Find and replace text with Emoji generates question mark boxes using C#

Hi,

I am testing Aspose.Words for .NET 19.11.0 and would like to see how Aspose inserts emoji into Word documents.

I found that they are displayed as question mark boxes. When I copied the text over to the forum, the emoji were shown as the decimal codes of the symbols. Please see below:

ª®iwwà¡Û☞♨ǼýèéùÉÈ😿😜🙉☨😻♭Ɨï😙😝ǀ ¤Ɗèéù'.=Wª®injdeÉÈÈÀ-f¤ƊÈÀ-'.W😿😜🙉☨😻♭Ɨï😙😝ǀ

When I copied it to other editors like Notepad, the emoji were displayed correctly.

ª®iwwà¡Û☞♨ǼýèéùÉÈ😿😜🙉☨😻♭Ɨï😙😝ǀ ¤Ɗèéù’.=Wª®injdeÉÈÈÀ-f¤ƊÈÀ-’.W😿😜🙉☨😻♭Ɨï😙😝ǀ

If I copy the above line from Notepad into the Word document, the emoji are displayed correctly.
If I then copy the original line and paste it into the same Word document, it also displays the correct emoji. But this only happens after I have copied and pasted the line from another application.

How can I make the generated Word document display the emoji correctly without the copy-and-paste workaround?

Thanks!

@Otter

To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input Word document.
  • The output file that shows the undesired behavior.
  • The expected output file that shows the desired behavior.
  • A standalone console application (source code without compilation errors) that helps us reproduce your problem on our end.

As soon as you have these pieces of information ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

Hi,

Please see the standalone console application attached. You will also find the following in the ASPOSETest folder:

  1. Input Word document: SingleFieldTemplateCreatedByOffice2016.docx
  2. Output file: Generated.docx
  3. Expected output file: Expected.docx

ASPOSETest.zip (52.5 KB)

I also found that the mail merge is not working correctly for the header and footer. Is it because I am using an evaluation license? We are using a licensed older version of Aspose in PROD, and there the mail merge works correctly with the header and footer. The code is very similar except for minor tweaks to accommodate API changes.

Thanks!

@Otter

Please note that Aspose.Words mimics the behavior of MS Word. If you insert the same content into the document using MS Word, you will get the same output. For example, copy the same content into Notepad and save it, then insert that text document into your input Word document using MS Word. You will get the same output. Please check the attached image for detail. Emoji.png (29.3 KB)

Moreover, if you perform a find and replace operation using MS Word, you will also get the same output. Please check the attached image for detail. find and replace.png (31.3 KB)

Hi Tahir,

That is exactly what I thought should happen. Unfortunately, it is not the case. Please see the screen capture of the debug line that I added to the console program I sent earlier. The Replacement property of the ReplacingArgs is assigned a string in which the emoji are displayed correctly.
ASPOSE_Debug.PNG (32.9 KB)

However, the generated word document displays the emoji as question mark boxes.

image.png (3.4 KB)

If I copy the line from the generated word document and paste to a notepad, the emoji are displayed correctly.
image.png (5.1 KB)

Now if I copy the line from Notepad and paste it to the same word document, it is displayed correctly.

image.png (2.1 KB)

From the above scenario, Aspose doesn’t seem to mimic the behavior of MS Word.
Also, why were the header and footer not processed in my test program? Could you please look into that as well? Thanks!

@Otter

Thanks for sharing the detail. Generated.docx does not show the reported issue on our side. Please check the attached image for detail. generated.png (20.4 KB)

We have generated the document using the latest version of Aspose.Words for .NET 19.11 and have not found any issue with it. Please check the attached output document and let us know if you see the question mark boxes at your end.
19.11.zip (11.0 KB)

Thank you, Tahir.

Yes, I do see the question mark boxes when I open the attached output document.
When I select the line and press CTRL+F, I see the emoji rendered correctly in the search box.
image.png (5.2 KB)

@Otter,

We are working on your query and will get back to you soon.

@Otter

The font used for the content in your document is “Segoe UI Emoji”. You can check it by clearing the font formatting of the text. Please check whether the font set for this text is installed on your system.

Could you please also share the MS Word version that you are using? Moreover, please set the font of the entire text to ‘Times New Roman’ or ‘Calibri’ using MS Word and share the behavior you notice. Thanks for your cooperation.

Hi Tahir,

I have confirmed that I have “Segoe UI Emoji” installed on my desktop. I am using Microsoft Office Professional Plus 2016.

If I explicitly change the font of the {Assigned} tag to “Segoe UI Emoji”, the generated document displays the emoji correctly.

If I set the font of the {Assigned} tag to Times New Roman or Calibri, the generated document doesn’t display the emoji correctly.

I noticed that the Chinese characters in the string were automatically formatted as “DengXian”. Unfortunately, the same font substitution did not happen for the emoji.

Thank you!

@Otter

It seems that MS Word 2016 behaves differently at your end and our end. We are using the English version of MS Word 2016 on our side, and it displays the emoji correctly.

Moreover, this does not appear to be a bug. The template document contains the tag {Assigned} with the font name ‘Times New Roman’, so Aspose.Words sets the font name of the inserted text to ‘Times New Roman’.
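
Since the inserted text takes the font of the {Assigned} tag, and the emoji display correctly when that tag is formatted as “Segoe UI Emoji”, one possible workaround is to apply that font to the replacement during find and replace instead of editing the template. The following is only a minimal sketch, not a confirmed fix. It assumes that FindReplaceOptions.ApplyFont is available in your Aspose.Words version and that it is acceptable for the whole replacement text to use “Segoe UI Emoji”. The file names are the ones listed for the attachment in this thread, and the sample replacement string is made up for illustration.

```csharp
using Aspose.Words;
using Aspose.Words.Replacing;

class EmojiReplaceSketch
{
    static void Main()
    {
        // Template name as listed in the ASPOSETest attachment.
        Document doc = new Document("SingleFieldTemplateCreatedByOffice2016.docx");

        FindReplaceOptions options = new FindReplaceOptions();
        // Assumption: formatting the entire replacement with an emoji-capable
        // font is acceptable. Non-emoji characters get this font as well.
        options.ApplyFont.Name = "Segoe UI Emoji";

        // Hypothetical sample text: hot springs, crying cat face, winking face.
        string replacement = "\u2668 \U0001F63F \U0001F61C";
        doc.Range.Replace("{Assigned}", replacement, options);

        doc.Save("Generated.docx");
    }
}
```

The trade-off is that ordinary text in the replacement also changes font, so this only suits fields where that is acceptable.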

Hi Tahir,

As you can see in my generated document, the first 2 Chinese characters were automatically converted to the DengXian (Body) font while the rest of the characters remain Calibri.

image.png (8.6 KB)

Does the Aspose API perform this conversion internally (from Calibri to DengXian when it detects Chinese characters), or is it MS Word’s magic?

Thanks!

@Otter

Your input document has the tag {Assigned}, and its style is “Title”, which has the font name ‘Times New Roman’.

In your shared document “Generated.docx” (generated by Aspose.Words), the same paragraph also has the style ‘Title’, and its font name is ‘Times New Roman’.

You can check this by unzipping the document: change the extension from .docx to .zip and extract it. You can find the details of the emoji in document.xml. Please check the attached image for detail. generated document.xml.png (29.9 KB)
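
The same check can also be scripted instead of renaming and unzipping by hand. The sketch below is an illustration only and uses no Aspose API: it opens the .docx as a ZIP archive, reads word/document.xml, and lists each run with any font set directly on it. The class and method names are hypothetical, and the default file name is simply the one from this thread. On .NET Framework, ZipFile additionally needs a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

static class DocxRunFonts
{
    // WordprocessingML main namespace used by word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Lists each non-empty run with the attributes of its direct w:rFonts element.
    // Fonts is empty when the run has no direct font, i.e. the font comes from
    // a style (such as "Title") or from the document defaults.
    public static List<(string Text, string Fonts)> ReadRuns(XDocument documentXml)
    {
        return documentXml.Descendants(W + "r")
            .Select(r => (
                Text: string.Concat(r.Elements(W + "t").Select(t => t.Value)),
                Fonts: string.Join(", ",
                    (r.Element(W + "rPr")?.Element(W + "rFonts")?.Attributes()
                        ?? Enumerable.Empty<XAttribute>())
                    .Select(a => $"{a.Name.LocalName}={a.Value}"))))
            .Where(run => run.Text.Length > 0)
            .ToList();
    }

    // Opens the .docx directly as a ZIP archive; renaming to .zip is not needed.
    public static XDocument LoadDocumentXml(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            using (Stream stream = entry.Open())
                return XDocument.Load(stream);
        }
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "Generated.docx";
        foreach (var run in ReadRuns(LoadDocumentXml(path)))
        {
            string fonts = run.Fonts.Length > 0 ? run.Fonts : "(from style or defaults)";
            Console.WriteLine($"{fonts}: {run.Text}");
        }
    }
}
```

A run that reports no direct font takes it from its paragraph style, so the style definition in styles.xml would be the next place to look.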

So, Aspose.Words generates the document Generated.docx correctly.

It seems that either you are using a different document or MS Word behaves differently on your machine. We suggest checking the same document on another system. Moreover, please unzip your document and check document.xml as described above.

Hope this answers your query. If you still face problem, please let us know.

Hi Tahir,

The generated document.xml does display correctly in IE. However, the document doesn’t display correctly in MS Word. Is there some metadata that we can add to the generated document to tell MS Word that it needs to fall back to other fonts? We are hoping to upgrade our Aspose version, but that is hard to justify if we can’t get this resolved.

It looks like we are dealing with a similar issue as:

With the example data from that post, I get quite a few boxes instead of the correct characters.

the quick brown fox jumped over the lazy dog ૂપા ૌહગમક વીદૈલ િદં રહસજા્ દનાી ૂપા તોબ ્દુ 快速的棕色狐狸跳過懶惰的狗 тхе љуицк броњн фоџ јумпед овер тхе лаѕз дог فاث ضعهؤن لاقخصى بخء تعةحثي خرثق فاث مشئغ يخل տհէ խըիգկ բրուն ֆոց ճըմպէդ ովէր տհէ լազե դոք ੂਪਾ ੌਹਗਮਕ ਵੀਦੈਲ ਿਦੰ ਰਹਸਜਾ੍ ਦਨਾੀ ੂਪਾ ਤੋਬ ੍ਦੁ тхе љуицк броњн фоџ јумпед овер тхе лаѕз дог otğ frnvm çıhgz ahö krspğe hcğı otğ lujd ehü:)Ended here last…😂😂😂😂😂😂 絵文字"

image.png (8.9 KB)

Is it something that is fixed in Java but not .Net?

Thanks!

@Otter

You may try converting the text to UTF-8 encoding before inserting it into the Word document. Hope this helps you.

Could you please share the screenshot of document.xml?

Could you please share the operating system detail that you are using?

We will further investigate this issue and provide you more information on it.

Hi Tahir,

I have tried explicitly converting the text to UTF-8 encoding before inserting it, but it didn’t help.
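
That result is what one would expect from .NET strings in general: they are UTF-16 internally, so encoding a string to UTF-8 bytes and decoding it back returns the same string, with each emoji still stored as a surrogate pair. Here is a minimal sketch of that round trip using only the .NET base class library. The class and method names are hypothetical and it does not involve Aspose at all.

```csharp
using System;
using System.Text;

static class EmojiEncodingCheck
{
    // Encodes a .NET string to UTF-8 bytes and decodes it back again.
    public static string RoundTripUtf8(string text) =>
        Encoding.UTF8.GetString(Encoding.UTF8.GetBytes(text));

    // True if the string contains at least one UTF-16 surrogate pair, which is
    // how characters outside the Basic Multilingual Plane (most emoji) are stored.
    public static bool ContainsSurrogatePair(string text)
    {
        for (int i = 0; i < text.Length - 1; i++)
        {
            if (char.IsSurrogatePair(text[i], text[i + 1]))
                return true;
        }
        return false;
    }

    static void Main()
    {
        string cat = "\U0001F63F"; // crying cat face, U+1F63F

        Console.WriteLine(cat.Length);                      // 2 (two UTF-16 code units)
        Console.WriteLine(Encoding.UTF8.GetByteCount(cat)); // 4 (four UTF-8 bytes)
        Console.WriteLine(RoundTripUtf8(cat) == cat);       // True (round trip changes nothing)
        Console.WriteLine(ContainsSurrogatePair(cat));      // True
        Console.WriteLine(ContainsSurrogatePair("\u2668")); // False (hot springs is in the BMP)
    }
}
```

So the text handed to the replace call is the same with or without the conversion, which suggests the difference lies in how the run is formatted or rendered and not in the encoding.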

Please find the screenshot of the document.xml.
image.png (47.3 KB)

Here is my OS information:
OS Name: Microsoft Windows 10 Enterprise
Version: 10.0.17134 Build 17134

I do have the necessary fonts installed on my desktop. Could it have something to do with a Word setting, since the same document works on your desktop but not on mine? Unfortunately, our organization deploys a standard configuration to all desktops, which means that no desktop in our organization will display the emoji correctly unless we can pinpoint which setting needs to be changed.

image.png (73.1 KB)

Thank you!

@Otter

Thanks for sharing the detail. The document.xml is correct at your end, and the style of the emoji is also correct, so MS Word should display the emoji correctly. Have you checked the same document on different systems? Please check it and share your findings here for our reference. Thanks for your cooperation.

Perhaps you are facing this issue because of your MS Word version. Please read the following article.
Emojis are not displayed in Office applications

Yes, I had my colleagues check it on their machines. The emoji were not displayed correctly there either.

Thanks for the article, but it is for Windows 7. All our machines are on Windows 10.

@Otter

Aspose.Words writes the emoji into the document correctly, and they are visible in document.xml. It seems that this issue is related to MS Word. Could you please share the complete version of MS Word, as shown in the attached image?
office version.png (18.0 KB)

A post was split to a new topic: DOCX to PDF conversion issue with Emojis rendering