We are using Aspose.Words with the Aspose.Words.Shaping.HarfBuzz package to convert Word files to PDF. For most characters the PDF is rendered correctly, but there is a problem when characters are combined using a zero-width joiner. Let's assume we have the following fictitious sequence of characters:
[A][ZWJ][B]
If the zero-width joiner has different formatting than the characters before and after it, the characters are rendered as if the zero-width joiner were not there. When raw text is pasted into a document in MS Word, Word automatically picks a font that includes the characters in the pasted string. Whitespace characters get the default font, or more specifically, their run element has no formatting information; i.e. [A] and [B] get the same font, while [ZWJ] gets no font. This appears to cause the JoinRunsWithSameFormatting function to treat the character runs and the zero-width joiner run as having different formatting, so it does not join them. While this is technically correct, since they do have different formatting, it does not produce a correctly rendered PDF.
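To illustrate the behaviour described above, here is a minimal toy model in Python. It is not the Aspose.Words API: the `Run` class, both helper functions and the font name "Noto Sans" are hypothetical and exist only for this sketch. It shows why a joiner run with no font stops adjacent runs from being merged, and how copying the preceding run's font onto joiner-only runs would let them merge. Whether the same normalization fixes the rendering in Aspose.Words is an assumption we have not verified.

```python
from dataclasses import dataclass
from typing import List, Optional

ZWJ = "\u200d"   # zero-width joiner
ZWNJ = "\u200c"  # zero-width non-joiner


@dataclass
class Run:
    """Hypothetical stand-in for a document run: text plus a font name."""
    text: str
    font: Optional[str]  # None models a run with no formatting information


def join_runs_with_same_formatting(runs: List[Run]) -> List[Run]:
    """Toy model: merge adjacent runs only when their fonts are identical."""
    joined: List[Run] = []
    for run in runs:
        if joined and joined[-1].font == run.font:
            joined[-1] = Run(joined[-1].text + run.text, run.font)
        else:
            joined.append(Run(run.text, run.font))
    return joined


def inherit_font_for_joiners(runs: List[Run]) -> List[Run]:
    """Give runs made up only of ZWJ/ZWNJ the font of the preceding run."""
    fixed: List[Run] = []
    for run in runs:
        only_joiners = run.text != "" and all(ch in (ZWJ, ZWNJ) for ch in run.text)
        if only_joiners and fixed:
            fixed.append(Run(run.text, fixed[-1].font))
        else:
            fixed.append(Run(run.text, run.font))
    return fixed


# [A][ZWJ][B], where only the joiner run lacks a font
runs = [Run("A", "Noto Sans"), Run(ZWJ, None), Run("B", "Noto Sans")]

as_is = join_runs_with_same_formatting(runs)
workaround = join_runs_with_same_formatting(inherit_font_for_joiners(runs))

print(len(as_is), len(workaround))  # 3 1
```

In this model the unformatted joiner leaves three separate runs, so a text shaper would see [A] and [B] in isolation. After the joiner inherits its neighbour's font, the sequence collapses into a single run containing all three characters.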
While our example only shows this issue with the zero-width joiner, we expect similar issues with the zero-width non-joiner.
See the attached zip archive for:
- a Word file containing different sequences of characters, with and without a zero-width joiner, and with and without formatting for the zero-width joiner, each accompanied by an image of the expected rendering.
- a PDF file generated with Aspose.Words 23.8.0 and Aspose.Words.Shaping.HarfBuzz 23.8.0
- a PDF file generated with Microsoft Word 2307 (Build 16626.20170)
ZWJ_examples.zip (116.0 KB)