Free Support Forum - aspose.com

Special characters problem

Aspose.Slides version is 2.8.6.0
TextFrame.Text property returns wrong character codes for special characters.
The attached presentation contains a text frame with character 0xfc, but Aspose.Slides.TextFrame.Text[0] for that frame contains a character with code 0xf0fc. The same problem occurs with bullets.

Dear Dmitry,

Aspose.Slides is showing the right code. The code 0xfc is actually the character ü; you can verify this by writing ü in a text file, saving it with the extension .html, and opening it in a browser.

Sample app code is attached. Please build it with MSVS 2008 and use it to open the ppt file attached above. It shows the character code as 61692 (0xf0fc in hex), but the character has code 0xfc (252 in decimal). Am I doing anything wrong?

Ppt files can contain 8-bit ANSI or 16-bit Unicode characters. Most probably PowerPoint saved this character as Unicode. That’s ok.
Aspose.Slides also converts everything to Unicode, so the 0xfc character becomes 0xf0fc after conversion.
If you need to convert it back, just clear the high 8 bits.

ch = ch & 0x00ff;

I think this is the wrong solution… What if the inserted character code really is Unicode? (The attached PPT demonstrates such a case.)

Maybe it is a bug in the Aspose ANSI->Unicode converter?

That is definitely not a bug.
I wrote only about ANSI characters converted to Unicode.
Of course, that doesn’t mean you can use this approach for any Unicode character.

ANSI characters are mapped into the Unicode range 0xF000 - 0xF0FF.
In the second presentation, the Unicode character is 0x25CA, which is outside that range.
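
To make the distinction concrete, here is a minimal C# sketch that clears the high byte only for characters inside the 0xF000 - 0xF0FF band and leaves every other character (such as 0x25CA) untouched. The class and method names are hypothetical helpers for illustration, not part of the Aspose.Slides API.

```csharp
using System;

static class SymbolBand
{
    // Band that, per this thread, holds ANSI characters mapped to Unicode.
    const char First = '\uF000';
    const char Last = '\uF0FF';

    // Hypothetical helper: returns the original 8-bit code for mapped
    // characters and returns any other character unchanged.
    public static char Unmap(char ch)
    {
        if (ch >= First && ch <= Last)
        {
            return (char)(ch & 0x00FF); // clear the high 8 bits
        }
        return ch;
    }
}
```

With this guard, SymbolBand.Unmap('\uF0FC') gives 0x00FC, while SymbolBand.Unmap('\u25CA') stays 0x25CA.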

Ok. Is there a way to detect whether a portion contains Unicode or ANSI characters?
Are all characters in the range 0xF000 - 0xF0FF mapped from ANSI, or is there a special flag for such cases?

Yes, all characters in this range are mapped from ANSI.
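
Since the range alone is the signal, a text can be checked without any extra flag. The following C# sketch is one possible way to do that; the helper names are hypothetical and it operates on a plain string rather than on any Aspose.Slides type.

```csharp
using System;
using System.Linq;

static class SymbolDetection
{
    // Hypothetical helper: true when the character lies in 0xF000 - 0xF0FF.
    public static bool IsMappedFromAnsi(char ch)
    {
        return ch >= '\uF000' && ch <= '\uF0FF';
    }

    // Hypothetical helper: true when at least one character of the text
    // lies in that band; null or empty text yields false.
    public static bool ContainsMappedChars(string text)
    {
        return !string.IsNullOrEmpty(text) && text.Any(IsMappedFromAnsi);
    }
}
```

The string passed in would typically be the value of a portion's Text property.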

Ok. I found that MS uses 0xF000 - 0xF0FF as a band for symbol characters.
One more thing: is there any way to detect which font index should be used for rendering a portion (FontIndex, SymbolFontIndex, AnsiFontIndex or AsianOrComplexFontIndex)? Or should it just be detected from the character codes in the Text property value?

These are ranges we use in our renderer:

CJK and ComplexTextLayout == AsianOrComplexFontIndex
Symbol == SymbolFontIndex
All other ranges == FontIndex
AnsiFontIndex is not used.

SetRangetype(0xF000, 0xF0FF, CharacterType.Symbol); // Symbol font
// CJK (Chinese, Japanese, Korean) characters
SetRangetype(0x2E80, 0x2EFF, CharacterType.CJK); // CJK Radicals Supplement
SetRangetype(0x3000, 0x303F, CharacterType.CJK); // CJK Symbols and Punctuation
SetRangetype(0x3040, 0x309F, CharacterType.CJK); // Hiragana
SetRangetype(0x30A0, 0x30FF, CharacterType.CJK); // Katakana
SetRangetype(0x3100, 0x312F, CharacterType.CJK); // Bopomofo
SetRangetype(0x3130, 0x318F, CharacterType.CJK); // Hangul Compatibility Jamo
SetRangetype(0x31C0, 0x31EF, CharacterType.CJK); // CJK Strokes
SetRangetype(0x31F0, 0x31FF, CharacterType.CJK); // Katakana Phonetic Extensions
SetRangetype(0x3200, 0x32FF, CharacterType.CJK); // Enclosed CJK Letters and Months
SetRangetype(0x3300, 0x33FF, CharacterType.CJK); // CJK Compatibility
SetRangetype(0x3400, 0x4DBF, CharacterType.CJK); // CJK Unified Ideographs Extension A
SetRangetype(0x4E00, 0x9FFF, CharacterType.CJK); // CJK Unified Ideographs
SetRangetype(0xAC00, 0xD7AF, CharacterType.CJK); // Hangul Syllables
SetRangetype(0xF900, 0xFAFF, CharacterType.CJK); // CJK Compatibility Ideographs
SetRangetype(0xFE30, 0xFE4F, CharacterType.CJK); // CJK Compatibility Forms
// Additional: full- and half-width characters are treated by PowerPoint as CJK
SetRangetype(0xFF01, 0xFFFF, CharacterType.CJK);

// CTL (complex text layout) characters
SetRangetype(0x0590, 0x05FF, CharacterType.ComplexTextLayout); // Hebrew
SetRangetype(0x0600, 0x06FF, CharacterType.ComplexTextLayout); // Arabic
SetRangetype(0x0750, 0x077F, CharacterType.ComplexTextLayout); // Arabic Supplement
SetRangetype(0x0900, 0x097F, CharacterType.ComplexTextLayout); // Devanagari
SetRangetype(0x0980, 0x09FF, CharacterType.ComplexTextLayout); // Bengali
SetRangetype(0x0A00, 0x0A7F, CharacterType.ComplexTextLayout); // Gurmukhi
SetRangetype(0x0A80, 0x0AFF, CharacterType.ComplexTextLayout); // Gujarati
SetRangetype(0x0B00, 0x0B7F, CharacterType.ComplexTextLayout); // Oriya
SetRangetype(0x0B80, 0x0BFF, CharacterType.ComplexTextLayout); // Tamil
SetRangetype(0x0C00, 0x0C7F, CharacterType.ComplexTextLayout); // Telugu
SetRangetype(0x0C80, 0x0CFF, CharacterType.ComplexTextLayout); // Kannada
SetRangetype(0x0D00, 0x0D7F, CharacterType.ComplexTextLayout); // Malayalam
SetRangetype(0x0D80, 0x0DFF, CharacterType.ComplexTextLayout); // Sinhala
SetRangetype(0x0E00, 0x0E7F, CharacterType.ComplexTextLayout); // Thai
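
For anyone who wants to apply the same rule outside the renderer, the C# sketch below shows one way to look a character up in such a range table and pick the matching font index property. It is only an illustration: the table is abbreviated to a few of the ranges above, and the Range struct, the CharacterType.Other member and the FontIndexChooser class are hypothetical names, not Aspose.Slides internals.

```csharp
using System;

enum CharacterType { Other, Symbol, CJK, ComplexTextLayout }

static class FontIndexChooser
{
    struct Range
    {
        public readonly int First, Last;
        public readonly CharacterType Type;
        public Range(int first, int last, CharacterType type)
        {
            First = first; Last = last; Type = type;
        }
    }

    // Abbreviated table: only a few of the ranges listed in the thread.
    static readonly Range[] Ranges =
    {
        new Range(0xF000, 0xF0FF, CharacterType.Symbol),            // Symbol font
        new Range(0x3040, 0x309F, CharacterType.CJK),               // Hiragana
        new Range(0x4E00, 0x9FFF, CharacterType.CJK),               // CJK Unified Ideographs
        new Range(0x0590, 0x05FF, CharacterType.ComplexTextLayout), // Hebrew
        new Range(0x0600, 0x06FF, CharacterType.ComplexTextLayout), // Arabic
    };

    // Linear scan; characters outside every range are classified as Other.
    public static CharacterType Classify(char ch)
    {
        foreach (Range r in Ranges)
        {
            if (ch >= r.First && ch <= r.Last) return r.Type;
        }
        return CharacterType.Other;
    }

    // Returns the name of the font index property to use for the character,
    // following the rule stated above (AnsiFontIndex is never chosen).
    public static string FontIndexPropertyFor(char ch)
    {
        switch (Classify(ch))
        {
            case CharacterType.Symbol:
                return "SymbolFontIndex";
            case CharacterType.CJK:
            case CharacterType.ComplexTextLayout:
                return "AsianOrComplexFontIndex";
            default:
                return "FontIndex";
        }
    }
}
```

For example, FontIndexChooser.FontIndexPropertyFor('\uF0FC') returns "SymbolFontIndex", '\u3042' (Hiragana) returns "AsianOrComplexFontIndex", and 'A' returns "FontIndex".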