HTML to PDF conversion issues

Hi, I am trying to convert HTML files to PDF, but I ran into a problem and did not get the expected result. The page breaks are not the same. In addition, the spacing between the headlines differs: compare the “Table of Contents” and “Introduction” headlines in the attached files. The line under the “OASIS” headline on the first page is also missing.

Is this caused by using the limited (evaluation) version, which adds an “Evaluation only…” text at the beginning of each page, or by something else?

files.zip (361.6 KB)

@ghaj17

You are using Aspose.PDF to convert HTML to PDF, but you have posted this query in the Aspose.Words forum. We have moved this thread to the Aspose.PDF forum, where you will be guided appropriately.

@ghaj17

We tested the scenario using Aspose.PDF for .NET 19.8 with valid license and following code snippet:

using Aspose.Pdf;

// Load the HTML with zero page margins, then save it as PDF
var opts = new HtmlLoadOptions();
opts.PageInfo.Margin = new MarginInfo(0, 0, 0, 0);
Document doc = new Document(dataDir + "input.html", opts);
doc.Save(dataDir + "htmltopdf.pdf");

HTMLtoPDF.pdf (304.3 KB)

Above is the output that we obtained in our environment. We were able to reproduce the page-break issue; however, the missing-line issue could not be replicated. We have logged the page-break issue as PDFNET-46853 in our issue tracking system for correction and will let you know as soon as it is resolved. Please allow us some time.

Yes, the text that appears on each page is due to using the API in trial mode. You can get a free 30-day temporary license to evaluate the API without any limitations.

We are sorry for the inconvenience.

Actually, I used Aspose.Words for Java (dependency aspose-words 19.7 jdk17) to convert HTML to PDF.
When I tried Aspose.PDF for Java (dependency aspose-pdf 19.5 jdk17), I got a worse result: the conversion took much longer and some HTML files were not converted at all. In addition, the documentation indicates that Aspose.Words also handles HTML files.
Here is the result of the HTML to PDF conversion using Aspose.PDF for Java, after I changed the code according to what you wrote:

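import com.aspose.pdf.Document;
import com.aspose.pdf.HtmlLoadOptions;
import com.aspose.pdf.MarginInfo;

HtmlLoadOptions options = new HtmlLoadOptions();
options.getPageInfo().setMargin(new MarginInfo(0, 0, 0, 0));
Document doc = new Document(dataDir + "input.html", options);
doc.save(dataDir + "htmltopdf.pdf");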

htmltopdf.pdf (223.0 KB)

You can see, for example, that the “OASIS” headline and its underline are missing.
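Because some of the HTML files reportedly did not convert at all, it may help to convert a whole folder in one run and collect the failures, so the exact failing files and error messages can be attached to a report. The sketch below is a minimal harness under stated assumptions: `BatchConvert` and the `HtmlToPdfConverter` interface are hypothetical names introduced here, and the actual Aspose.PDF load and save calls (as in the snippet above) would be supplied as the lambda.

```java
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BatchConvert {
    /** Hypothetical hook: wraps whichever HTML-to-PDF calls are being tested. */
    @FunctionalInterface
    public interface HtmlToPdfConverter {
        void convert(Path html, Path pdf) throws Exception;
    }

    /**
     * Converts every HTML file into outDir (same base name, .pdf extension)
     * and returns the files whose conversion threw, mapped to "ExceptionType: message".
     */
    public static Map<Path, String> convertAll(List<Path> htmlFiles, Path outDir,
                                               HtmlToPdfConverter converter) {
        Map<Path, String> failures = new LinkedHashMap<>();
        for (Path html : htmlFiles) {
            // Strip a trailing .html or .htm extension to build the output name
            String base = html.getFileName().toString().replaceFirst("\\.html?$", "");
            try {
                converter.convert(html, outDir.resolve(base + ".pdf"));
            } catch (Exception e) {
                failures.put(html, e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return failures;
    }
}
```

This only records conversions that throw; a conversion that hangs, or that finishes but produces a visually wrong PDF, would not show up in the returned map.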

@ghaj17

Could you kindly share a screenshot of the issue you are describing? It would help us understand the issue and address it accordingly.

Here is a screenshot of the original document: originalScreenshot.PNG (45.3 KB)

You can see that the highlighted “OASIS” headline and its underline are missing from the converted PDF file “htmltopdf.pdf” that I attached in my last comment.
I also saw that the HTML file can be converted with the Aspose.HTML library, and I tried the example code from this page: Java HTML API - HTML and CSS Markup Parser and Translator

// load the file to be rendered
HTMLDocument html = new HTMLDocument(dir + "template.html");
// render to PDF
HtmlRenderer renderer = new HtmlRenderer();
renderer.render(new PdfDevice(new PdfRenderingOptions(), dir + "output.pdf"), html);

but the PDF result was not good at all. Here is the PDF produced by Aspose.HTML:
doc_sample.pdf (110.7 KB)

I just need to know which library is best for HTML to PDF conversion and how to configure it so that the result looks exactly like the original HTML file. Please note that we use Java, not .NET.

@ghaj17

We have tested the scenario with Aspose.PDF for Java 19.7 and obtained attached output.

HTMLtoPDF_19.7.pdf (276.7 KB)

This issue was not replicated when using the API with a valid license. We also found that the document produced by the licensed version of the API looks better. The total time taken by the program was 27 seconds in our environment, i.e. Windows 10 EN x64, IntelliJ IDEA (console app), 8 GB RAM, Core i5 2.1 GHz.

Could you kindly share your environment details and the total time taken by the program? We will then proceed to assist you accordingly. We will also share our feedback regarding Aspose.HTML for Java soon.
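To make the reported conversion times comparable across environments, one option is to time only the conversion calls rather than the whole program run. A minimal sketch follows; `ConversionTimer` is a hypothetical helper, and the sleeping task in `main` is only a placeholder where the Aspose load and save calls would go.

```java
import java.time.Duration;
import java.util.Locale;

public class ConversionTimer {
    /** Runs the task once and returns the elapsed wall-clock time. */
    public static Duration time(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return Duration.ofNanos(System.nanoTime() - start);
    }

    /** Formats a duration as seconds with millisecond precision, e.g. "28.497 seconds". */
    public static String formatSeconds(Duration d) {
        return String.format(Locale.ROOT, "%.3f seconds", d.toNanos() / 1e9);
    }

    public static void main(String[] args) {
        // Placeholder workload: replace with the Aspose load + save calls being measured
        Duration elapsed = time(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Total conversion time: " + formatSeconds(elapsed));
    }
}
```

The first call in a process also pays class-loading and JIT warm-up costs, so timing a second conversion in the same run may give a lower figure than the first.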

The result I sent before was done by Aspose.PDF for Java 19.5.
Now I used the new version, Aspose.PDF for Java 19.7, and got a better result: doc_sample.pdf (315.2 KB)

However, it still took a long time to convert: 28.497 seconds.

My environment details: Windows 10 Enterprise EN x64, Eclipse, RAM 32GB, Core i7 2.71 GHz

Thanks for your help.

@ghaj17

We have logged an issue as PDFJAVA-38789 in our issue tracking system for further investigation. We will look into the details and keep you informed of its resolution status. Please allow us some time.

We are sorry for the inconvenience.

I cannot upload the HTML file, but here is the HTML:

Unable to upload an HTML file.

@ocecontent

Please upload your sample HTML in .zip format along with the code snippet that you are using. We will test the scenario in our environment and address it accordingly.
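For the upload itself, the HTML file and any local resources it references (images, stylesheets) can be packed into one archive with the JDK's `java.util.zip` package. The sketch below assumes the files sit on the local disk; `ZipSample` and the file names in the usage comment are illustrative only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSample {
    /** Packs the given files into one .zip archive, each stored under its bare file name. */
    public static void zip(Path zipPath, Path... files) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zos); // stream the file's bytes into the current entry
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipSample sample.zip input.html style.css ...
        if (args.length < 2) {
            System.err.println("Usage: java ZipSample <archive.zip> <file>...");
            return;
        }
        Path[] files = new Path[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            files[i - 1] = Path.of(args[i]);
        }
        zip(Path.of(args[0]), files);
    }
}
```

Because entries are stored by bare file name, two files with the same name in different folders would collide (the JDK reports a duplicate-entry error); if the HTML relies on relative subfolders, the entry names would need to keep those relative paths instead.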