Symbol, Wingdings, Webdings etc.

PatrickVB · April 4, 2018, 12:29pm

Dear Team,

I would like to get some advice on a possibly complex topic.
I need to process docx documents that often contain symbol characters (Greek characters, Webdings, etc.).
I need to process these documents using Java and convert the content into UTF-8 compliant XML.

Is there any support in the Aspose.Words API for converting text fragments (runs, I think) that use one of the special fonts, such as Symbol or Webdings, to their corresponding UTF-8 representation?

The only solution I could think of is to implement a kind of mapping that takes the input value together with the font name and converts it into its corresponding UTF-8 representation, if one exists.
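
The mapping idea can be sketched in plain Java as follows. This is a minimal sketch, not an Aspose.Words feature: the class name SymbolFontMapper and the method toUnicode are hypothetical, the table covers only a handful of Symbol font codes, and the handling of the U+F020 to U+F0FF range rests on the assumption that Word may store symbol-font characters in that private use area.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps (font name, character code) pairs to Unicode text. */
class SymbolFontMapper {
    // font name (lower case) -> (character code -> Unicode replacement)
    private static final Map<String, Map<Integer, String>> TABLES = new HashMap<>();

    static {
        // Partial table for the Symbol font; a real table needs every code in use.
        Map<Integer, String> symbol = new HashMap<>();
        symbol.put(0x61, "\u03B1"); // 'a' -> Greek small letter alpha
        symbol.put(0x62, "\u03B2"); // 'b' -> Greek small letter beta
        symbol.put(0x67, "\u03B3"); // 'g' -> Greek small letter gamma
        symbol.put(0x64, "\u03B4"); // 'd' -> Greek small letter delta
        symbol.put(0x70, "\u03C0"); // 'p' -> Greek small letter pi
        TABLES.put("symbol", symbol);
        // Wingdings and Webdings would each need their own table here.
    }

    /** Returns the text with every mapped character replaced; unmapped characters are kept. */
    static String toUnicode(String fontName, String text) {
        Map<Integer, String> table = TABLES.get(fontName.toLowerCase(Locale.ROOT));
        if (table == null) {
            return text; // not a font we have a table for
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Assumption: codes in U+F020..U+F0FF are the plain code plus 0xF000.
            int code = (c >= 0xF020 && c <= 0xF0FF) ? c - 0xF000 : c;
            String mapped = table.get(code);
            out.append(mapped != null ? mapped : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Text "abg" formatted in the Symbol font becomes alpha, beta, gamma.
        System.out.println(toUnicode("Symbol", "abg"));
    }
}
```

The font name is part of the key because the same character code means different things in Symbol, Wingdings and Webdings. Characters without a table entry are passed through unchanged so that gaps in the table stay visible in the output.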

Many thanks for your support

Patrick

tahir.manzoor · April 4, 2018, 4:47pm

@PatrickVB,

Thanks for your inquiry. Please ZIP and attach your input and expected output documents. We will then provide you more information about your query.

PatrickVB · April 5, 2018, 10:18pm

Hi Tahir,

Please find attached the zip containing a small docx document with some Greek characters (using the Symbol font), then some additional characters (entered using Insert Symbol) and some Wingdings characters.

The zip file also contains a manually crafted example (as I do not have the code yet) of how the output file needs to look. For the Symbol font, I know I made some errors when mapping the characters to their entity codes.
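
If "entity code" here means an XML numeric character reference (an assumption, since the attachment is not reproduced in this thread), the reference for a character can be computed from its Unicode code point rather than typed by hand. The sketch below uses a hypothetical class XmlCharRefs and only the Java standard library.

```java
/** Hypothetical helper: writes text as ASCII-only XML character data. */
class XmlCharRefs {
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);      // handles surrogate pairs
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:
                    if (cp < 0x80) {
                        out.append((char) cp);  // plain ASCII is copied as-is
                    } else {
                        // non-ASCII becomes a hexadecimal character reference
                        out.append("&#x")
                           .append(Integer.toHexString(cp).toUpperCase())
                           .append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u03B1 < \u03B2")); // prints "&#x3B1; &lt; &#x3B2;"
    }
}
```

In an XML file that is itself encoded as UTF-8, the characters can also be written directly; numeric references are only needed when the output must stay ASCII.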

I would like to understand how the Aspose API can help me achieve my goal.

Many thanks
specialFonts.zip (12.9 KB)

tahir.manzoor · April 6, 2018, 5:34am

@PatrickVB,

Thanks for sharing the details. Please note that Aspose.Words mimics the behavior of MS Word. If you convert your document to a text file using MS Word, you will get the same output as Aspose.Words produces. Unfortunately, the symbols are not exported to the text file in your case.

You can save DOCX to TXT file format using Aspose.Words with UTF-8 encoding.

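import com.aspose.words.Document;
import com.aspose.words.TxtSaveOptions;
import java.nio.charset.Charset;

Document doc = new Document(MyDir + "in.docx"); // MyDir: folder that holds the input file
TxtSaveOptions options = new TxtSaveOptions();
options.setEncoding(Charset.forName("UTF-8")); // write the text file as UTF-8
doc.save(MyDir + "output.txt", options);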

PatrickVB · April 6, 2018, 6:00am

So Tahir,

Is my understanding correct that the Aspose API has no support for converting characters in the Symbol, Wingdings and Webdings fonts to proper UTF-8 characters?

I’m aware that I can export to text using those save options. However, this does not give me the desired result. Saving as text in Word does not produce the outcome I need either.

Any advice on how to tackle the problem at hand?

Kind regards

Patrick

tahir.manzoor · April 6, 2018, 9:55am

@PatrickVB,

Could you please share your expected output file format? We will then provide you more information about your query.

PatrickVB · April 6, 2018, 11:00am

The output format was in the same zip file.
It is the xml file.

tahir.manzoor · April 6, 2018, 2:40pm

@PatrickVB,

We have logged a feature request as WORDSNET-16670 in our issue tracking system to export symbols in a Word document to an XML file. We will look into the possibility of implementing this feature. Once the analysis is completed, we will update you via this thread.

tahir.manzoor · April 11, 2018, 7:21am

@PatrickVB,

Thanks for your patience. We have completed the analysis of the WORDSNET-16670 feature and have come to the conclusion that we won’t be able to provide it. The issue has been closed with the “Won’t Fix” resolution.

PatrickVB · April 11, 2018, 7:35am

Dear Tahir,

Thanks for the feedback.
In the meantime, I have implemented what I needed with a simple mapping of each Symbol font character value to its corresponding Unicode value, according to the mapping in "Symbol font – Unicode alternatives for Greek and special characters in HTML".

It works like a charm.

Kind regards

Patrick

tahir.manzoor · April 11, 2018, 11:35am

@PatrickVB,

It is nice to hear that your problem has been resolved. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.