HTML tag <hr> causes contents to extend beyond margins when converted to PDF

Dear Aspose Support Team

I am experiencing issues when converting certain HTML documents to PDF using Aspose.Words. When the hr tag is used inside a table, it can sometimes cause the contents of that table to extend beyond the margins of the converted PDF document. This happens when certain styling is applied, such as alignment and padding. The tag also misbehaves inside nested tables, where any text pushed beyond the margin is obscured.

You can use the attached HTML document to replicate this issue. I am also including, for reference, a PDF document that was converted using version 23.7 of the Aspose.Words Java library. I used the default HtmlLoadOptions and PdfSaveOptions.

Thank you for your assistance.

Kind regards
ML
hr-tag-issue-html.zip (848 Bytes)
hr-tag-issue.pdf (34.8 KB)

@ml42 Unfortunately, I cannot reproduce the problem on my side using the latest 23.8 version of Aspose.Words for Java. I have used the following simple code for testing:

Document doc = new Document("C:\\Temp\\in.html");
doc.save("C:\\Temp\\out.pdf");

out.pdf (31.7 KB)

Thank you for your swift response. I tried converting the example with the latest version, 23.8, and the same test code, but I am getting the same result as before. Is it possible that this issue only happens on Linux machines?
converted-with-aspose-words-23-8.pdf (34.8 KB)

@ml42 The document you have attached was produced by version 23.7, not 23.8. Also, as I can see, the fonts used in your document differ from the fonts used in my output PDF document. Most likely this occurs because the fonts required for document rendering are not available in your Linux environment. The fonts are required to build an accurate document layout. If Aspose.Words cannot find the fonts used in the document, the fonts are substituted. This might lead to differences in layout and appearance. You can implement IWarningCallback to get a notification when font substitution is performed.
Also, see our documentation to learn where Aspose.Words looks for fonts and how to specify the fonts location:
https://docs.aspose.com/words/java/specify-truetype-fonts-location/

@alexey.noskov Thank you, I double checked and the latest version improves the issue somewhat.
Some more complicated HTML is still showing problems for me, though, probably due to the font substitution as you are suggesting:

Font ‘Arial’ has not been found. Using ‘Liberation Sans’ font instead. Reason: table substitution.
Font ‘Times New Roman’ has not been found. Using ‘FreeSerif’ font instead. Reason: table substitution.
Font ‘Helvetica’ has not been found. Using ‘FreeSans’ font instead. Reason: table substitution.
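
For anyone collecting warnings like the ones above, here is a minimal sketch in plain Java (no Aspose.Words calls) that extracts the distinct missing font names from the warning text, so you know which TrueType fonts to install. The class name `FontWarningParser` and the method `missingFonts` are hypothetical, and the pattern assumes the message wording shown in this post; it may need adjusting if the wording differs in your version.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Aspose.Words.
public class FontWarningParser {
    // Assumed message shape: Font 'X' has not been found. Using 'Y' font instead.
    // The character classes accept both straight and typographic single quotes.
    private static final Pattern SUBSTITUTION = Pattern.compile(
        "Font ['\u2018\u2019]([^'\u2018\u2019]+)['\u2018\u2019] has not been found\\. "
        + "Using ['\u2018\u2019]([^'\u2018\u2019]+)['\u2018\u2019] font instead\\.");

    /** Returns the distinct missing font names, in order of first appearance. */
    public static List<String> missingFonts(List<String> warnings) {
        Set<String> missing = new LinkedHashSet<>();
        for (String warning : warnings) {
            Matcher m = SUBSTITUTION.matcher(warning);
            if (m.find()) {
                missing.add(m.group(1)); // group(1) is the font that was not found
            }
        }
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) {
        List<String> warnings = List.of(
            "Font 'Arial' has not been found. Using 'Liberation Sans' font instead. Reason: table substitution.",
            "Font 'Helvetica' has not been found. Using 'FreeSans' font instead. Reason: table substitution.");
        System.out.println(missingFonts(warnings));
    }
}
```

The warning strings would typically come from whatever your IWarningCallback implementation records; how you collect them is up to your application.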

I will try adding the TrueType fonts to my application and come back to you.

@alexey.noskov Providing the correct fonts did not improve my issue with text extending beyond document margins for certain HTML files.
But I managed to link it to my use of AutoFitBehavior.AUTO_FIT_TO_WINDOW and explicit setting of paper size to A4. I am setting this auto-fit behaviour for all tables in my document to make sure they can fit inside the PDF. I am currently working on an example that I could share here.
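
As a rough sanity check on what "fit inside the PDF" means for A4, here is a minimal sketch in plain Java (no Aspose.Words calls) that computes the usable width between the margins and tests whether a set of column widths would exceed it. The class and method names are hypothetical, and the 72 pt left and right margins are an assumption for illustration only; use the margins your document actually has.

```java
// Hypothetical helper: plain arithmetic, not part of Aspose.Words.
public class PageWidthCheck {
    private static final double POINTS_PER_INCH = 72.0;
    private static final double MM_PER_INCH = 25.4;

    /** Converts millimetres to typographic points. */
    public static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Width available to content between the left and right margins. */
    public static double usableWidth(double pageWidthPt, double leftMarginPt, double rightMarginPt) {
        return pageWidthPt - leftMarginPt - rightMarginPt;
    }

    /** True if the summed column widths exceed the usable width. */
    public static boolean overflows(double[] columnWidthsPt, double usableWidthPt) {
        double total = 0.0;
        for (double width : columnWidthsPt) {
            total += width;
        }
        return total > usableWidthPt;
    }

    public static void main(String[] args) {
        double a4Width = mmToPoints(210.0);               // A4 is 210 mm wide
        double usable = usableWidth(a4Width, 72.0, 72.0); // assumed 72 pt margins
        System.out.printf("A4 width: %.2f pt, usable: %.2f pt%n", a4Width, usable);
        System.out.println(overflows(new double[] {200.0, 200.0, 100.0}, usable));
    }
}
```

Comparing this usable width against the table and column widths in the converted document can help confirm whether the overflow comes from the column width calculation rather than from the page setup.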

@alexey.noskov Thank you for your patience. Can you please try to reproduce this issue with the attached HTML example? I am including my output PDF for reference.

I am loading the required font following the guide you provided and IWarningCallback shows no font substitution warnings on my end.
missing-text.zip (47.6 KB)

@ml42
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25902

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Hi @alexey.noskov,
Thanks so much for opening the ticket on your side.
We, Global Relay, have had paid support services with Aspose (Aspose.Words and Aspose.Email) for more than 10 years.
Could you let us know when the fix will be released?

Regards

@vsingh52 We have completed the analysis of the issue. The issue occurs because table column widths are not calculated the way MS Word calculates them for a table in HTML input.

The issue is to be addressed by the new logic being implemented per WORDSNET-832. Currently the logic is not applied to tables with wrapped shapes inside cells as content metrics for text wrapped around shapes are not implemented.

The issue has been postponed until metrics for tables with wrapped shapes are supported per WORDSNET-832. So at the moment we cannot provide you any reliable estimate.

Hi @alexey.noskov
Thank you for providing the update.
Can we keep this thread open so that we can track this?
Also, I saw this old thread about the same issue, posted around 8 years ago.
There was a fix mentioned in the last comment. Is that not relevant?

How frequently are fixes of this kind released, so that we can track and plan for them on our side?

Thanks

@vsingh52 We continuously work on improving our document layout engine to make it closer to MS Word. But unfortunately, we cannot provide you any estimates regarding the issue at the moment. We will be sure to let you know once the issue is resolved or we have additional information for you.

Thank you @alexey.noskov. I will wait for an update.

Hello @alexey.noskov
Is there any update on the fix timing? Could you prioritize this defect for paid customers like Global Relay?
It is getting difficult to explain this issue to our major clients and they are asking us for a fix.
Is there any workaround that we can try on our side?

-Varun

@vsingh52 We have completed analyzing the issue. The issue occurs because Aspose.Words does not calculate table column widths the way MS Word does for a table in HTML input.

The issue is to be addressed by the new logic being implemented per WORDSNET-832. Currently the logic is not applied to tables with wrapped shapes inside cells as content metrics for text wrapped around shapes are not implemented.

The issue has been postponed until metrics for tables with wrapped shapes are supported per WORDSNET-832.