Preserve Text Wrapping & Gap between Table Borders during DOCX to PDF Conversion using C# .NET API

Hi,

When converting a DOCX with some tables to PDF, I noticed that the table borders looked different and some one-line rows wrapped onto two lines.

Code:
using Aspose.Words;

var doc = new Document(@".\test14.docx");
doc.Save(@".\test14.pdf");

I’ve attached the test files and a screenshot for your reference. Could you please take a look?

image.png (150.4 KB)
test14.zip (162.2 KB)

Thanks,

@ServerSide527,

We tested the scenario and reproduced both problems on our end. We have logged the following issues in our issue tracking system:

WORDSNET-19471: related to gap between table borders missing in PDF
WORDSNET-19477: related to incorrect text wrapping in PDF

We will look into these issues further and keep you updated on the status of the fixes. We apologize for the inconvenience.

@ServerSide527,

Regarding WORDSNET-19477, we have completed our analysis and decided to close this issue as Not a Bug. Please see the following details:

This Word document has the Kerning feature enabled. Aspose.Words can render such documents correctly only with the help of the HarfBuzz shaping engine. This requires installing the Aspose.Words.Shaping.HarfBuzz NuGet package and adding one extra line of code:

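// Requires the Aspose.Words.Shaping.HarfBuzz NuGet package.
var doc = new Document(@".\test14.docx");
// Use HarfBuzz for text shaping so that kerning is applied during layout.
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
doc.Save(@".\test14.pdf");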

To learn more about OpenType features, please refer to the following article:
How to Use OpenType Features

Hi,

Thanks for your reply. I tried the new library and it seems to resolve WORDSNET-19477: the rows no longer wrap onto extra lines.

However, in the ticket status shown in this post, I saw that WORDSNET-19471 (related to gap between table borders missing in PDF) was closed as not fixed, while WORDSNET-19477 is still open.

According to the comment in Preserve Text Wrapping & Gap between Table Borders during DOCX to PDF Conversion using C# .NET API - #2 by awais.hafeez, it should be the other way around.

Could you check whether the ticket statuses are incorrect?

Thanks,

@ServerSide527,

The issue of “Gap between Table Borders missing in PDF” (WORDSNET-19471) is now fixed, and the fix will be included in the next version of Aspose.Words, i.e. 19.12. We will inform you via this thread as soon as Aspose.Words 19.12 containing the fix is released at the start of next month.

Thanks for your further feedback on this topic. We will close WORDSNET-19477 with “Not a Bug” status. Please use the workaround mentioned here.

Thank you.

I have read the article, but could you give me a bit more technical detail on how HarfBuzz works with Aspose.Words, since it is an external library?

The article says it is used for OpenType fonts, but the font involved here is TrueType (Arial). Could you let me know why the library still made a difference?

Could you also let me know where else the library might have an impact, and whether there are any known limitations or regressions if I use it?

Thanks,

@ServerSide527,

We have logged your questions/concerns in our issue tracking system and will keep you posted on further updates.

Hi,

Could we have an update on the questions in Preserve Text Wrapping & Gap between Table Borders during DOCX to PDF Conversion using C# .NET API - #6 by ServerSide527?

We’d like to understand the possible positive and negative impact of introducing the library for all our PDF rendering. Since we can’t know in advance which documents need it, we would have to turn it on for every document.

Thanks,

@ServerSide527,

Thanks for your inquiry. I am afraid we do not have any further update on this topic yet. We have logged your concerns in our issue tracking system and will keep you posted. We apologize for the delay.

@ServerSide527,

If your documents use features such as Kerning or number forms, or contain complex scripts such as Arabic, Thai, Hindi and many others, the only way to get correct rendering to PDF is to use the HarfBuzz library. Using this library will probably affect performance slightly. However, it is native code written in C++, so it should run very fast. It is also used in many Linux distributions nowadays, so we do not expect any serious issues with it.

Regarding your other questions about the relationship between OpenType and TrueType fonts, we suggest reading these articles for a better understanding:

The issue you reported earlier (filed as WORDSNET-19471) has been fixed in this Aspose.Words for .NET 19.12 update and this Aspose.Words for Java 19.12 update.