Zero-width joiner is not rendered correctly when exporting Word document to PDF

Hello!

We are using Aspose.Words with the Aspose.Words.Shaping.HarfBuzz package to convert Word files to PDF. For most characters the PDF is rendered correctly, but there is a problem when characters are combined using a zero-width joiner. Let's assume we have the following fictitious sequence of characters:
[A][ZWJ][B]

If the zero-width joiner has different formatting than the characters before and after it, the characters are rendered as if the zero-width joiner were not there. When pasting raw text into a document in MS Word, Word automatically picks a font that includes the characters in the pasted string. Whitespace characters get the default font, or more specifically, their run element has no formatting information, i.e. [A] and [B] get the same font, while [ZWJ] gets no font. This appears to cause the JoinRunsWithSameFormatting function to treat the character runs and the zero-width joiner run as having different formatting, so it does not join them. While this is technically correct, since they do have different formatting, it does not produce a correctly rendered PDF.
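
To illustrate the suspected mechanism, here is a minimal toy model in Python. It is not Aspose.Words code and makes no claim about how JoinRunsWithSameFormatting is actually implemented: the `Run` class, the `join_adjacent_runs` helper and the font name "Font X" are all hypothetical. It only assumes that adjacent runs are merged when their formatting is identical.

```python
from dataclasses import dataclass
from typing import List, Optional

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


@dataclass
class Run:
    """Hypothetical stand-in for a run of text in a Word document."""
    text: str
    font: Optional[str]  # None models a run with no formatting information


def join_adjacent_runs(runs: List[Run]) -> List[Run]:
    """Merge neighbouring runs whose formatting is identical (toy model only)."""
    joined: List[Run] = []
    for run in runs:
        if joined and joined[-1].font == run.font:
            # Same formatting as the previous run: append the text to it.
            joined[-1] = Run(joined[-1].text + run.text, run.font)
        else:
            # Different formatting: start a new run.
            joined.append(Run(run.text, run.font))
    return joined


# Pasted text: [A] and [B] get a font, the [ZWJ] run gets none.
pasted = [Run("A", "Font X"), Run(ZWJ, None), Run("B", "Font X")]

# Same sequence, but the [ZWJ] run carries the same font as its neighbours.
uniform = [Run("A", "Font X"), Run(ZWJ, "Font X"), Run("B", "Font X")]

pasted_joined = join_adjacent_runs(pasted)
uniform_joined = join_adjacent_runs(uniform)

print(len(pasted_joined))   # the unformatted ZWJ keeps the runs apart
print(len(uniform_joined))  # the whole sequence becomes a single run
```

In this model the pasted sequence stays split into three runs, so a text shaper that works one run at a time would never see [A][ZWJ][B] as a single sequence, whereas the uniformly formatted sequence collapses into one run.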

While our example only shows this issue with the zero-width joiner, we expect similar issues with the zero-width non-joiner.

See attached zip archive for:

  • a Word file containing different sequences of characters, with and without a zero-width joiner, with and without formatting for the zero-width joiner, each with the expected rendering as an image
  • a PDF file generated with Aspose.Words 23.8.0 and Aspose.Words.Shaping.HarfBuzz 23.8.0
  • a PDF file generated with Microsoft Word 2307 (Build 16626.20170)

ZWJ_examples.zip (116.0 KB)

@lars.olsson
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25784

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.


The issues you found earlier (filed as WORDSNET-25784) have been fixed in the Aspose.Words for .NET 23.9 update, which is also available on NuGet.
