I am using the following code to convert a DOCX file to PDF:
Document doc = new Document("test.docx");
doc.Save("out.pdf", SaveFormat.Pdf);
The resulting PDF shows the characters in the wrong order, as shown in the following image.
TamilCharsBug.JPG (40.1 KB)
The following zip file contains “test.docx” and “out.pdf”.
Tamil.zip (69.7 KB)
We also see this incorrect order when extracting the characters from the original test.docx file, where the correct order is just as important to us.
I think this may be related to the topic “Bangla characters end up in the wrong order”.
@Jan_Kratzert
Please refer to the following article. You need to enable OpenType features, as shown below, to get the desired output.
Enable OpenType Features
Document doc = new Document(MyDir + "Test.docx");
doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;
doc.Save(MyDir + "21.9.pdf");
OK, thanks for the answer. Is it recommended to enable this OpenType feature in general? Or does it have any effect on non-OpenType fonts?
Your link mentions that “Text shaping is only performed when exporting to PDF or XPS formats.” So I assume this won’t fix the incorrect order when extracting the text (symbol, font, and position), am I right?
@Jan_Kratzert
Please note that extracting text from a document and rendering a document to PDF or XPS are different processes. You need to enable OpenType features when rendering a document to PDF, because OpenType provides better support for international languages and writing systems than PostScript and TrueType.
If you want to extract text from a document and save it to a flow format, e.g. DOCX or DOC, you do not need to enable OpenType features.
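To illustrate the difference between the two processes, here is a minimal sketch that runs both on the same document. It assumes the Aspose.Words.Shaping.HarfBuzz package is referenced alongside Aspose.Words, and it uses the file names from the original question. It is only an illustration of where the shaper setting applies, not a confirmed fix for the extraction order.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Shaping.HarfBuzz;

class Program
{
    static void Main()
    {
        Document doc = new Document("test.docx");

        // Text extraction: reads the characters stored in the document model.
        // No page layout is built, so the text shaper is not involved here.
        string text = doc.ToString(SaveFormat.Text);
        Console.WriteLine(text);

        // Rendering: the shaper is used when the layout is built for a
        // fixed-page format such as PDF or XPS.
        doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;
        doc.Save("out.pdf", SaveFormat.Pdf);
    }
}
```

Setting TextShaperFactory before or after the ToString call should make no difference to the extracted string, since, per the article quoted above, shaping is only performed when exporting to PDF or XPS.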