Wrong interpretation of the symbol "—"

I have a .docx file. It contains the following paragraph:

“★ZOOM—Get closer to your ideas. Pinch to bring up zoom
without losing your place.”

I use the Aspose library to convert this .docx file to HTML.

The library creates:

★ ZOOM**—**Get closer to your ideas. Pinch to bring up zoom without losing your place.

The question is: why was the symbol “—” wrapped in a “span” tag? When I do the same conversion in MS Word, I get the following HTML, in which the symbol “—” is not wrapped in a “span” tag:

★ ZOOM**—**Get closer to your ideas. Pinch to bring up

zoom without losing your place.

Hi Tolik,

Thanks for your inquiry. Please note that Aspose.Words tries to mimic the behaviour of MS Word. Aspose.Words converts MS Word documents to HTML in the “Web Page, Filtered” format. Please convert your document to HTML (“Web Page, Filtered”) using MS Word and check the HTML output. Some features may be lost when HTML is processed. You can find a list of the limitations of HTML export and import here:
https://docs.aspose.com/words/java/load-in-the-html-html-xhtml-mhtml-format/
https://docs.aspose.com/words/java/save-in-the-html-html-xhtml-mhtml-format/

Hope this answers your query. Please let us know if you have any more queries.

Did you try to export the attached file to HTML?

MS Word 2010 exports the file to HTML as I expect (it doesn’t wrap the character “—”).

But your library wraps it in a “span” tag.

Hi Tolik,

Thanks for your inquiry. Please try converting your document to HTML (“Web Page, Filtered”) using MS Word and check the HTML output. The following HTML output shows the character ‘—’ inside a span tag. I have attached the MS Word HTML output to this post for your reference.

★<span lang=EN-US> ZOOM—Get closer to your ideas. Pinch to bring up zoom without losing your place.

The problem is that your library wraps the character “—” in its own span tag. The paragraph is wrapped in a span, and the character “—” is wrapped in another span. This is the problem. There should be only one span tag for the paragraph content.

Could you look at my first message?

Hi Tolik,

Thanks for your inquiry. In your case, I suggest using the Document.joinRunsWithSameFormatting method. This method joins runs that have the same formatting in all paragraphs of the document. Hope this helps.

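// MyDir is the folder that contains the input document.
Document dstDoc = new Document(MyDir + "whats-new.docx");
// Merge adjacent runs that share identical formatting.
dstDoc.joinRunsWithSameFormatting();
dstDoc.save(MyDir + "Out.html");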
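
To illustrate what "joining runs with the same formatting" means, here is a small toy model in plain Java. It is only a sketch and not how Aspose.Words is implemented: the `TextRun` class, the `joinRunsWithSameFont` helper and the font names are all hypothetical, and the real method presumably compares the full run formatting rather than the font name alone. It shows that adjacent runs are merged only when their formatting matches, so a run whose font differs stays separate and would still end up in its own span.

```java
import java.util.ArrayList;
import java.util.List;

public class RunJoinSketch {
    /** Hypothetical stand-in for a Word run: a piece of text plus its font name. */
    static final class TextRun {
        final String text;
        final String fontName;

        TextRun(String text, String fontName) {
            this.text = text;
            this.fontName = fontName;
        }
    }

    /** Merges neighbouring runs whose font name is identical; other runs are kept apart. */
    static List<TextRun> joinRunsWithSameFont(List<TextRun> runs) {
        List<TextRun> joined = new ArrayList<>();
        for (TextRun run : runs) {
            int last = joined.size() - 1;
            if (last >= 0 && joined.get(last).fontName.equals(run.fontName)) {
                // Same font as the previous run: append the text to it.
                TextRun prev = joined.get(last);
                joined.set(last, new TextRun(prev.text + run.text, prev.fontName));
            } else {
                // Different font (or first run): start a new run.
                joined.add(run);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Font names below are made up for illustration.
        List<TextRun> sameFont = List.of(
                new TextRun("ZOOM", "Arial"),
                new TextRun("\u2014", "Arial"),
                new TextRun("Get closer to your ideas.", "Arial"));
        List<TextRun> mixedFont = List.of(
                new TextRun("ZOOM", "Arial"),
                new TextRun("\u2014", "Cambria"),
                new TextRun("Get closer to your ideas.", "Arial"));

        System.out.println(joinRunsWithSameFont(sameFont).size());  // prints 1
        System.out.println(joinRunsWithSameFont(mixedFont).size()); // prints 3
    }
}
```

In the second case the dash keeps its own run because its font name differs from its neighbours, which is the situation where joining runs cannot reduce the number of spans.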

It doesn’t work because the Aspose library thinks the character “—” uses a different font.

Could you look at my first request?

Why does the library think that there are different fonts? This seems to be the reason why the Aspose library splits the sentence into separate spans with different fonts.

Hi Tolik,

Thanks for your inquiry. If you are using an older version of Aspose.Words, I suggest upgrading to the latest version (v13.6.0) from here:

https://downloads.aspose.com/words/java

tolik:

The library creates:

★ ZOOM—Get closer to your ideas. Pinch to bring up zoom without losing your place.

The question is: why was the symbol “—” wrapped in a “span” tag? When I do the same conversion in MS Word, I get the following HTML, in which the symbol “—” is not wrapped in a “span” tag:

★ ZOOM—Get closer to your ideas. Pinch to bring up

zoom without losing your place.

Please note that Aspose.Words tries to mimic the behaviour of MS Word. Aspose.Words converts MS Word documents to HTML in the “Web Page, Filtered” format. Please convert your document to HTML (“Web Page, Filtered”) using MS Word and check the HTML output.

After using the Document.joinRunsWithSameFormatting method, the output HTML has the same span tags. I have attached the output HTML files generated with both Aspose.Words and MS Word to this post for your reference.

Document dstDoc = new Document(MyDir + "whats-new.docx");
dstDoc.joinRunsWithSameFormatting();
dstDoc.save(MyDir + "Out.html");

MS Word output
★ ZOOM—Get closer to your ideas. Pinch to bring up zoom without losing your place.

Aspose.Words output
★ZOOM—Get closer to your ideas. Pinch to bring up zoom without losing your place.