Need help to convert html to PDF

ksheng · March 3, 2021, 3:07pm

I ran into some issues converting HTML to PDF: the conversions failed. I have attached some files here.

My other question is about performance: converting HTML to PDF is generally slow. Is there any way to improve it? Does the Aspose code fetch any content over the internet if there is a link in the HTML file?

thanks,
kevin

asad.ali · March 3, 2021, 8:27pm

@ksheng

Aspose.HTML does not fetch any content from the internet. However, could you please share the sample code snippet that you are using to convert the attached files? We will test the scenario in our environment and address it accordingly.

ksheng · March 3, 2021, 9:01pm

We use something like this.

pdfSaveOptions = new PdfSaveOptions();
// Inputs whose extension or file type starts with "M" are treated as MHTML.
if (StringUtils.startsWithIgnoreCase(FilenameUtils.getExtension(inputFile.getFilePath()), "M")
        || StringUtils.startsWithIgnoreCase(inputFile.getFileType(), "M")) {
    Converter.convertMHTML(inputFile.getFilePath(), pdfSaveOptions, responseFilePath);
} else {
    Converter.convertHTML(inputFile.getFilePath(), pdfSaveOptions, responseFilePath);
}

ksheng · March 4, 2021, 12:47am

@asad.ali I re-ran the failed HTML files and most of them converted to PDF successfully, except this one.

161974902.zip (1005.2 KB)

Interestingly, the failures appear to be related to multithreaded processing.
I am planning to simply rerun failed conversions unless you can find something wrong on my side.
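
The rerun approach mentioned above could be sketched roughly as follows. This is a minimal illustration, not part of Aspose.HTML: `RetryingRunner` and `runWithRetry` are hypothetical names, and the sketch only retries on `Exception`, so JVM `Error`s still propagate to the caller.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not an Aspose class): re-runs a task a fixed number of times. */
final class RetryingRunner {

    /**
     * Calls the task until it succeeds or maxAttempts is reached.
     * The last failure is wrapped in an unchecked exception so the caller can log it.
     */
    static <T> T runWithRetry(Callable<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                lastFailure = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException(
                "Task failed after " + maxAttempts + " attempts", lastFailure);
    }

    // Possible usage with the conversion call from the snippet earlier in this thread:
    //
    //   RetryingRunner.runWithRetry(() -> {
    //       Converter.convertHTML(inputPath, pdfSaveOptions, outputPath);
    //       return null;
    //   }, 3);
}
```

A fixed attempt count keeps a permanently broken file from being retried forever.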

Separately, I do have an HTML performance problem. The document attached below took 10 minutes to convert. Maybe you can find a fix for us. I have no idea why it took so long, since it is a simple HTML page.

160129694.zip (5.4 KB)

thanks,
kevin

asad.ali · March 4, 2021, 3:47pm

@ksheng

We were able to reproduce a java.lang.StackOverflowError with the above file while testing the scenario with Aspose.HTML for Java 20.12. We have therefore logged an issue as HTMLJAVA-736 in our issue tracking system so that it can be corrected.

We also observed the delay in the conversion process with the second HTML file that you shared. A ticket, HTMLJAVA-737, has been logged for this case so that we can investigate how conversion performance can be improved.

We will look into the details of both tickets and keep you posted on their status. Please be patient and give us some time.

We are sorry for the inconvenience.

ksheng · March 4, 2021, 6:14pm

thank you @asad.ali.

I wonder why the delay is so long. Is it trying to connect to the internet, or doing some check that could not get through?

Thank you so much for taking care of the issues.

I know you guys have always done a great job supporting us. I don’t know whether it changes anything, but we have purchased a license. If this information can expedite the process, please let me know.

thanks,
Kevin

asad.ali · March 4, 2021, 8:54pm

@ksheng

The API does not try to connect to the internet for conversion unless the HTML loads some resource from a URL, e.g. an image. However, we will further investigate the reasons behind this behavior of the API and let you know once additional updates are available. We have also recorded your concerns and will consider them during the investigation of the logged ticket. Please give us some time.

We apologize for the inconvenience.

ksheng · March 4, 2021, 9:37pm

Thanks @asad.ali, I will let your engineers work on that.

I just want to confirm one thing. If an HTML page has an img tag, the API will try to load the JPG from the source, which in this case is
https://secure.ivari.ca/ivari_UserManagement/img/ivari-med-fr-117x50.jpg

Is that correct? What if the resource is unreachable, deleted, or blocked by security rules?

Note that I removed the <> from the tag below so that you can see the HTML content. Otherwise you would see an empty image with the “ivari” text, as at the bottom of this message.

img width=117     height=50 id="_x0000_i1025"
    src="https://secure.ivari.ca/ivari_UserManagement/img/ivari-med-fr-117x50.jpg"
 style='display:block' alt=ivari

<img width=117     height=50 id="_x0000_i1025"
    src="https://secure.ivari.ca/ivari_UserManagement/img/ivari-med-fr-117x50.jpg"
 style='display:block' alt=ivari>

asad.ali · March 4, 2021, 11:31pm

@ksheng

Yes, that is correct. The API will try to load the JPG from the source, and the conversion is delayed because the source is unavailable or blocked. When we removed the image source from the HTML, the conversion sped up and the output PDF was generated in seconds.
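
For illustration only, the image source removal described above could be automated as a preprocessing step before the HTML is handed to the converter. The sketch below uses plain `java.util.regex`, not any Aspose.HTML API; `RemoteImageStripper` is a hypothetical name, and a regex is not a full HTML parser, so unusual markup (for example a `>` inside an attribute value) is not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical preprocessing helper (not an Aspose class). */
final class RemoteImageStripper {

    // Matches a whole <img ...> tag, case-insensitively.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches a src attribute whose value is an http(s) or protocol-relative URL,
    // whether double-quoted, single-quoted, or unquoted.
    private static final Pattern REMOTE_SRC = Pattern.compile(
            "\\s+src\\s*=\\s*(\"(?:https?:)?//[^\"]*\"|'(?:https?:)?//[^']*'|(?:https?:)?//[^\\s>]*)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with remote src attributes removed from img tags. */
    static String stripRemoteImageSources(String html) {
        Matcher tags = IMG_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tags.find()) {
            // Drop only the remote src attribute; width, height, alt, etc. stay.
            String cleaned = REMOTE_SRC.matcher(tags.group()).replaceAll("");
            tags.appendReplacement(out, Matcher.quoteReplacement(cleaned));
        }
        tags.appendTail(out);
        return out.toString();
    }
}
```

Local image references such as `src="logo.png"` are left untouched, so only resources that would require a network request are dropped.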

ksheng · March 5, 2021, 12:17am

That’s good to know @asad.ali, thank you. If the remote resource is not available, would it make sense to have a configurable setting, such as a wait timeout, so that the conversion does not wait too long?

Do you have any recommendation for how we can work around this type of problem?
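
One general Java-level workaround, independent of any Aspose.HTML setting, is to put an upper bound on the whole conversion call. The sketch below is an assumption-laden illustration: `BoundedTask` and `runWithTimeout` are hypothetical names, and cancelling only interrupts the worker thread, so a conversion blocked on network I/O may keep running in the background after the timeout.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not an Aspose class): bounds how long the caller waits for a task. */
final class BoundedTask {

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * Returns true if the task finished in time, false if the wait timed out.
     */
    static boolean runWithTimeout(Runnable task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-task");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // requests interruption; blocking I/O may ignore it
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Task failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The conversion call would be passed in as the `Runnable`; a `false` result could then be logged and the file set aside for later inspection.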

thanks/kevin

asad.ali · March 5, 2021, 9:17pm

@ksheng

This is something that we have yet to implement in the API. We have logged a feature request as HTMLJAVA-738 in our issue tracking system for its implementation. We will inform you as soon as the feature is available. Please give us some time.

We apologize for the inconvenience.

ksheng · March 23, 2021, 2:00pm

I want to follow up on this issue, @asad.ali.

Sometimes a hyperlink within an HTML file no longer resolves because the remote site has been deleted. Converting such a file to PDF then takes longer, around 2 minutes for a one-page HTML document. We are fine with simply ignoring those remote resources, but how do I shorten the conversion time? Is there a parameter I can pass to the HTML-to-PDF conversion that tells the program NOT to retrieve any remote resources? We run the conversion on a server that has no internet connection anyway.

I can set up a proxy server for HTTP requests. How do I tell the program to use the proxy, or can we change the HTTP request timeout setting? Either one should allow us to improve performance.
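
As a point of reference, the JVM itself exposes networking system properties for a proxy and for default timeouts. The sketch below sets them; the property names are standard JDK ones, but whether Aspose.HTML for Java routes its requests through the JDK's `java.net.URLConnection` handlers and therefore honors them is an untested assumption. `JvmNetworkDefaults` and the host, port, and timeout values are hypothetical.

```java
/** Hypothetical helper: applies JVM-wide proxy and timeout defaults for java.net connections. */
final class JvmNetworkDefaults {

    /**
     * Sets the JDK networking properties. The sun.net.client.* timeouts are in
     * milliseconds and are typically read once, so this should run before any
     * network activity (or be passed as -D options on the command line).
     * ASSUMPTION: the library doing the HTTP requests uses java.net.URLConnection.
     */
    static void apply(String proxyHost, int proxyPort, int timeoutMillis) {
        String port = Integer.toString(proxyPort);
        String timeout = Integer.toString(timeoutMillis);

        // Proxy used by the JDK's built-in HTTP and HTTPS protocol handlers.
        System.setProperty("http.proxyHost", proxyHost);
        System.setProperty("http.proxyPort", port);
        System.setProperty("https.proxyHost", proxyHost);
        System.setProperty("https.proxyPort", port);

        // Default connect and read timeouts for those handlers.
        System.setProperty("sun.net.client.defaultConnectTimeout", timeout);
        System.setProperty("sun.net.client.defaultReadTimeout", timeout);
    }
}
```

If the properties turn out to have no effect on conversion time, that would suggest the library uses its own HTTP stack, and only a library-level setting would help.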

Thanks for your help.
Kevin

asad.ali · March 24, 2021, 11:35pm

@ksheng

We regret to inform you that ticket HTMLJAVA-738 has not yet been fully investigated. We will certainly investigate and resolve the ticket; however, it will be resolved on a first-come, first-served basis. We will inform you as soon as significant progress is made towards its resolution. Please give us some time.

We apologize for the inconvenience.

gianfranco.dancelli · December 5, 2023, 1:16pm

Hello,
I’m interested in the same issue that @ksheng reported. I want to convert HTML to PDF via Aspose.HTML for Java. If the HTML doesn’t contain images, the PDF is produced correctly; if it contains images, the conversion never ends. What is the status of ticket HTMLJAVA-738 today?

Thank you.
Gian

asad.ali · December 7, 2023, 12:45pm

@gianfranco.dancelli

Ticket HTMLJAVA-738 has been under investigation but, due to some technical difficulties, could not yet be resolved. We are currently working on implementing a fix and will update you in this forum thread as soon as it is resolved. Please give us some time. We are sorry for the inconvenience.