Greek symbols and Symbol font

tashiro · November 20, 2007, 10:02am

Hi,

I'm using Aspose.Words for Java and trying to read many documents containing Greek symbols, which are displayed using the Symbol font.

When I read these documents, I get strange results for those symbols, for example:

Run.getText():
small alpha = '\uf061'
small beta = '\uf062'
small gamma = '\uf067'

The results are neither regular Unicode Greek characters nor strings encoded in the Symbol charset.

So, how should I interpret these results?

Thank you,
Stephan Michels.

Konstantin · November 20, 2007, 10:27am

Hi,

Can you provide the document so that I can reproduce the problem? You can truncate the document and leave only a few Greek symbols in it.

Best Regards,

tashiro · November 20, 2007, 10:45am

Sure! I've attached a zip file that contains a basic test file.

Konstantin · November 21, 2007, 3:06am

Hi, Stephan,

1. MS Word really saves _these_ values. Aspose.Words just displays them.

2. This is Unicode.

Since Word 97, MS Word has translated all ASCII key input to Unicode using the current code page. With symbol fonts this translation is a little more complicated (or simpler?). Here is a short quote:

“When a symbol font is selected, text that is entered is translated to Unicode by a different translation than that provided by the current codepage. Since a symbol font could potentially contain any character or graphic symbol, there is no way to know if the character being entered is defined in Unicode or not. As a result, Word will translate the 8-bit value entered to a special range in Unicode known as the “Private Use Area”. Specifically, Word will translate the 8-bit value by adding 0xF000”
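The arithmetic in the quote can be checked with a small Java sketch. It is only an illustration of the "add 0xF000" rule: the class `SymbolPrivateUse` and the method `toPrivateUse` are hypothetical names, not part of Aspose.Words or MS Word. It assumes the usual Symbol font layout, in which the keys a, b and g are drawn as alpha, beta and gamma.

```java
// Hypothetical helper for illustration only; not an Aspose.Words API.
final class SymbolPrivateUse {
    // Offset that Word adds to the 8-bit value, per the quote above.
    static final int PUA_OFFSET = 0xF000;

    // Forward translation: 8-bit symbol-font value -> Private Use Area character.
    static char toPrivateUse(int symbolByte) {
        if (symbolByte < 0x00 || symbolByte > 0xFF) {
            throw new IllegalArgumentException("not an 8-bit value: " + symbolByte);
        }
        return (char) (PUA_OFFSET + symbolByte);
    }

    public static void main(String[] args) {
        // 'a', 'b', 'g' are the keys assumed to show alpha, beta, gamma in Symbol.
        for (char key : new char[] {'a', 'b', 'g'}) {
            System.out.printf("%c -> U+%04X%n", key, (int) toPrivateUse(key));
        }
    }
}
```

Under these assumptions, the three keys map to U+F061, U+F062 and U+F067, which matches the values reported by Run.getText() in the original question.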

You can use Greek characters from non-symbol fonts to get more consistent results.

Best Regards,

tashiro · November 21, 2007, 8:51am

Okay, so the characters are just shifted by 0xF000 and are otherwise encoded according to the Symbol code page.
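For anyone who needs real Greek letters from such text, here is a minimal sketch of the reverse step: subtract 0xF000, then look the resulting byte up in a Symbol-to-Unicode table. The class `SymbolTextDecoder`, its `decode` method and the three-entry table are assumptions made for illustration. The table covers only the letters mentioned in this thread, and a full Symbol mapping would have to be supplied separately. It requires Java 9 or later for `Map.of`.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not an Aspose.Words API.
final class SymbolTextDecoder {
    private static final int PUA_OFFSET = 0xF000;

    // Partial table (assumption): Symbol-font byte -> Unicode Greek letter.
    // Only the three letters from this thread are listed.
    private static final Map<Integer, Character> SYMBOL_TO_UNICODE = Map.of(
            0x61, '\u03B1',  // alpha
            0x62, '\u03B2',  // beta
            0x67, '\u03B3'); // gamma

    // Replaces known Private Use Area characters with their Greek equivalents.
    // Characters outside U+F000..U+F0FF, or without a table entry, are kept as-is.
    static String decode(String runText) {
        StringBuilder out = new StringBuilder(runText.length());
        for (int i = 0; i < runText.length(); i++) {
            char c = runText.charAt(i);
            if (c >= 0xF000 && c <= 0xF0FF) {
                Character mapped = SYMBOL_TO_UNICODE.get(c - PUA_OFFSET);
                out.append(mapped != null ? mapped.charValue() : c);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The three values from the original question.
        System.out.println(decode("\uf061\uf062\uf067"));
    }
}
```

This conversion only makes sense for text that was actually formatted with the Symbol font, so the caller would need to check the font of each run before applying it.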

That explains it, thank you.

Stephan Michels.