Chinese surrogates displayed incorrectly after exporting to docx format

Hello All,
Chinese surrogates are displayed incorrectly after exporting to docx format. But when I export the same content to xlsx format, I do not see this issue.
Can you please help me understand how this can be fixed? Or is it a known issue that has not been fixed yet?
Thank you in advance for your help.

Hi Vasili,

Thanks for your inquiry. Could you please attach your input Word document here along with the code for testing? I will investigate the issue on my side and provide you with more information.

Hello Tahir,

Thank you for your answer.

During additional investigation, I found that the docx file displays Chinese surrogates correctly if I assign the “PMingLiU-ExtB” font during export, but in Excel it works without it.

Please see the attached archive:

  • Program.cs: example of test code

  • sample.xlsx: created xlsx file

  • sample.docx: created docx file which does not display Chinese surrogates correctly

  • sample2.docx: created docx file which displays Chinese surrogates when the “PMingLiU-ExtB” font is assigned

  • Unicode_symbols.txt: input data for export

Best Regards,

Vasili

Hi Vasili,

Thanks for sharing the details. Please note that Microsoft Word and Microsoft Excel documents are completely different file formats. The characters you are trying to insert into the document are not present in the regular font. They are inserted with the ‘Times New Roman’ font because it is the default font. So please use the ‘PMingLiU-ExtB’ font for such characters.

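using Aspose.Words;

// MyDir is the output folder path, defined elsewhere in the test code.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Use a font that contains glyphs for these characters; the default 'Times New Roman' does not.
builder.Font.Name = "PMingLiU-ExtB";
builder.Writeln("𩧱");
builder.Writeln("ヨーロッパ最大の自動車会社");
doc.Save(MyDir + "Out.docx");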
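
If only part of the text needs the special font, one possible approach is to split the input into runs of surrogate pairs and runs of ordinary characters, and then switch the font per run. The following is only a minimal sketch under that assumption: the SurrogateRuns class and its Split method are hypothetical names used for illustration, not part of Aspose.Words, and the code relies only on standard .NET calls.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SurrogateRuns
{
    // Hypothetical helper: splits text into consecutive runs.
    // IsSupplementary is true for runs made only of surrogate pairs
    // (code points above U+FFFF), false for all other characters.
    public static List<(bool IsSupplementary, string Text)> Split(string text)
    {
        var runs = new List<(bool IsSupplementary, string Text)>();
        var current = new StringBuilder();
        bool currentIsSupplementary = false;

        for (int i = 0; i < text.Length; i++)
        {
            bool isPair = i + 1 < text.Length
                          && char.IsSurrogatePair(text[i], text[i + 1]);

            // Close the current run when the kind of character changes.
            if (current.Length > 0 && isPair != currentIsSupplementary)
            {
                runs.Add((currentIsSupplementary, current.ToString()));
                current.Clear();
            }

            currentIsSupplementary = isPair;
            current.Append(text[i]);
            if (isPair)
            {
                // Keep both halves of the surrogate pair in the same run.
                current.Append(text[++i]);
            }
        }

        if (current.Length > 0)
        {
            runs.Add((currentIsSupplementary, current.ToString()));
        }
        return runs;
    }
}
```

Each run could then be written with builder.Write(run.Text), setting builder.Font.Name to "PMingLiU-ExtB" for runs where IsSupplementary is true and back to the regular font otherwise.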

Hello Tahir,
I understand the differences between the formats. But it is not clear why it works correctly for Excel with default fonts but does not work for Word.
Also, I understand that pre-processing of the input content can be done in my code, but I think it would be a better solution if it were applied in the Aspose component and were available to all other developers who may run into this issue. Do you plan to make these changes? At the moment, Chinese surrogates export correctly to xlsx format, but the export fails for docx, pdf and csv.
Thank you in advance.
Best Regards,
Vasili.

Hi Vasili,

Thanks for your inquiry. I have managed to reproduce the same issue on my side. I have logged this issue as WORDSNET-8401 in our issue tracking system. I have linked this forum thread to the issue, and you will be notified via this forum thread once it is resolved.

We apologize for the inconvenience.

Hello Tahir,
Thank you. I will wait for the notification.
Best Regards,
Vasili.

Hi Vasili,

Our development team will analyze the Chinese surrogates issue. Once the analysis is completed, we will share more details about this issue.

Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

Hi Vasili,

This is to inform you that our development team has completed work on the issue (WORDSNET-8401) and has concluded that the undesired behaviour you are observing is not actually a bug in Aspose.Words. So, we have closed this issue as ‘Not a Bug’.

The DocumentBuilder.Writeln method uses the font and paragraph formatting specified by the Font and ParagraphFormat properties. The font is taken from the default style. The characters you are trying to insert into the document are not present in the regular font. They are inserted with the ‘Times New Roman’ font because it is the default font. In your case, I suggest you use the ‘PMingLiU-ExtB’ font for such characters.

If we can help you with anything else, please feel free to ask.