Special characters problem

x_ray_forever · August 13, 2008, 3:37am

Aspose.Slides version is 2.8.6.0
The TextFrame.Text property returns wrong character codes for special characters.
The attached presentation contains a text frame with character 0xfc, but Aspose.Slides.TextFrame.Text[0] for that frame contains a character with code 0xf0fc. The same problem occurs with bullets.

shakeel.faiz · August 13, 2008, 6:11am

Dear Dmitry,

Aspose.Slides is showing the right code. The code 0xfc is actually the character ü; you can verify this by writing ü in a text file, saving it with the .html extension, and opening it in a browser.

x_ray_forever · August 13, 2008, 8:06am

Sample app code is attached. Please build it with MSVS 2008 and use it to open the ppt file attached above. It shows the character code as 61692 (0xf0fc in hex), but the character has code 0xfc (252 in decimal). Am I doing anything wrong?

alcrus · August 13, 2008, 8:36am

Ppt files can contain 8-bit ANSI or 16-bit Unicode characters. Most probably PowerPoint saved this character as Unicode. That’s ok.
Aspose.Slides also converts everything to Unicode, so the 0xfc character becomes 0xf0fc after conversion.
If you need to convert it back, just clear the 8 high bits.

ch = ch & 0x00ff;

x_ray_forever · August 13, 2008, 9:47am

I think that is the wrong solution… What if the inserted character code really is Unicode? (The attached PPT demonstrates such a case.)

Maybe it is a bug in the Aspose ANSI->Unicode converter?

alcrus · August 13, 2008, 10:25am

That is definitely not a bug.
I wrote only about ANSI characters converted to Unicode.
That doesn’t mean you can use this approach for any Unicode character, of course.

ANSI characters are mapped to the Unicode range 0xF000 - 0xF0FF.
In the second presentation, the Unicode character is 0x25CA, which lies outside that range.
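
A minimal sketch of the rule described here, written in plain Java with no Aspose.Slides calls. The class and method names are hypothetical; the only facts taken from the thread are the 0xF000 - 0xF0FF range and the two sample codes (0xF0FC and 0x25CA). The high byte is cleared only when the character falls inside the mapped range, so genuine Unicode characters pass through unchanged.

```java
// Hypothetical helper, not part of Aspose.Slides.
final class SymbolBandSketch {
    // Range stated in the thread for ANSI characters mapped to Unicode.
    static final char BAND_START = '\uF000';
    static final char BAND_END = '\uF0FF';

    // True when the character lies in the mapped 0xF000 - 0xF0FF range.
    static boolean isMappedFromAnsi(char ch) {
        return ch >= BAND_START && ch <= BAND_END;
    }

    // Clears the high byte only for characters inside the mapped range;
    // every other character is returned as-is.
    static char toOriginalCode(char ch) {
        return isMappedFromAnsi(ch) ? (char) (ch & 0x00FF) : ch;
    }
}
```

With this guard, 0xF0FC comes back as 0xFC, while 0x25CA from the second presentation is left untouched.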

x_ray_forever · August 13, 2008, 11:06am

Ok. Is there a way to detect whether a portion contains Unicode or ANSI characters?
Are all characters in the range 0xF000 - 0xF0FF mapped from ANSI, or is there a special flag for such cases?

alcrus · August 13, 2008, 2:21pm

Yes, all characters in this range are mapped from ANSI.
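
Given that answer, one way to check a piece of text is to scan it for characters in that range. The following is only a sketch in plain Java; the class and method names are hypothetical, and the input is assumed to be an ordinary string such as the value read from a Text property.

```java
// Hypothetical helper, not part of Aspose.Slides.
final class SymbolScanSketch {
    // Returns true if any character of the text lies in 0xF000 - 0xF0FF.
    static boolean containsMappedChars(String text) {
        return countMappedChars(text) > 0;
    }

    // Counts the characters of the text that lie in 0xF000 - 0xF0FF.
    static int countMappedChars(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 0xF000 && ch <= 0xF0FF) {
                count++;
            }
        }
        return count;
    }
}
```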

x_ray_forever · August 14, 2008, 3:40am

Ok. I found that MS uses 0xF000-0xF0FF as a band for symbol characters.
And one more thing. Is there any way to detect which font index should be used for rendering a portion: FontIndex, SymbolFontIndex, AnsiFontIndex or AsianOrComplexFontIndex? Or should I just detect it from the character codes in the Text property value?

alcrus · August 14, 2008, 6:20am

These are ranges we use in our renderer:

CJK and ComplexTextLayout == AsianOrComplexFontIndex
Symbol == SymbolFontIndex
All other ranges == FontIndex
AnsiFontIndex is not used.

SetRangetype(0xF000, 0xF0FF, CharacterType.Symbol); // Symbol font
// CJK (Chinese, Japanese, Korean) characters
SetRangetype(0x2E80, 0x2EFF, CharacterType.CJK); // CJK Radicals Supplement
SetRangetype(0x3000, 0x303F, CharacterType.CJK); // CJK Symbols and Punctuation
SetRangetype(0x3040, 0x309F, CharacterType.CJK); // Hiragana
SetRangetype(0x30A0, 0x30FF, CharacterType.CJK); // Katakana
SetRangetype(0x3100, 0x312F, CharacterType.CJK); // Bopomofo
SetRangetype(0x3130, 0x318F, CharacterType.CJK); // Hangul Compatibility Jamo
SetRangetype(0x31C0, 0x31EF, CharacterType.CJK); // CJK Strokes
SetRangetype(0x31F0, 0x31FF, CharacterType.CJK); // Katakana Phonetic Extensions
SetRangetype(0x3200, 0x32FF, CharacterType.CJK); // Enclosed CJK Letters and Months
SetRangetype(0x3300, 0x33FF, CharacterType.CJK); // CJK Compatibility
SetRangetype(0x3400, 0x4DBF, CharacterType.CJK); // CJK Unified Ideographs Extension A
SetRangetype(0x4E00, 0x9FFF, CharacterType.CJK); // CJK Unified Ideographs
SetRangetype(0xAC00, 0xD7AF, CharacterType.CJK); // Hangul Syllables
SetRangetype(0xF900, 0xFAFF, CharacterType.CJK); // CJK Compatibility Ideographs
SetRangetype(0xFE30, 0xFE4F, CharacterType.CJK); // CJK Compatibility Forms
// additional
SetRangetype(0xFF01, 0xFFFF, CharacterType.CJK);
// Full-width and half-width characters are treated by PowerPoint as CJK

// CTL (complex text layout) characters
SetRangetype(0x0590, 0x05FF, CharacterType.ComplexTextLayout); // Hebrew
SetRangetype(0x0600, 0x06FF, CharacterType.ComplexTextLayout); // Arabic
SetRangetype(0x0750, 0x077F, CharacterType.ComplexTextLayout); // Arabic Supplement
SetRangetype(0x0900, 0x097F, CharacterType.ComplexTextLayout); // Devanagari
SetRangetype(0x0980, 0x09FF, CharacterType.ComplexTextLayout); // Bengali
SetRangetype(0x0A00, 0x0A7F, CharacterType.ComplexTextLayout); // Gurmukhi
SetRangetype(0x0A80, 0x0AFF, CharacterType.ComplexTextLayout); // Gujarati
SetRangetype(0x0B00, 0x0B7F, CharacterType.ComplexTextLayout); // Oriya
SetRangetype(0x0B80, 0x0BFF, CharacterType.ComplexTextLayout); // Tamil
SetRangetype(0x0C00, 0x0C7F, CharacterType.ComplexTextLayout); // Telugu
SetRangetype(0x0C80, 0x0CFF, CharacterType.ComplexTextLayout); // Kannada
SetRangetype(0x0D00, 0x0D7F, CharacterType.ComplexTextLayout); // Malayalam
SetRangetype(0x0D80, 0x0DFF, CharacterType.ComplexTextLayout); // Sinhala
SetRangetype(0x0E00, 0x0E7F, CharacterType.ComplexTextLayout); // Thai
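
The mapping from character ranges to font slots described in this post can be sketched as a small lookup. This is plain Java with hypothetical names, not the renderer's actual code. Only a subset of the ranges listed above is included for brevity, and the font slot is returned as a string naming the property rather than as a real Aspose.Slides value.

```java
// Hypothetical classifier, not part of Aspose.Slides.
final class FontSlotSketch {
    // Symbol band from the list above.
    private static final int[][] SYMBOL = { {0xF000, 0xF0FF} };

    // A subset of the CJK ranges from the list above.
    private static final int[][] CJK = {
        {0x3040, 0x309F}, // Hiragana
        {0x30A0, 0x30FF}, // Katakana
        {0x4E00, 0x9FFF}, // CJK Unified Ideographs
        {0xAC00, 0xD7AF}, // Hangul Syllables
        {0xFF01, 0xFFFF}  // Full-width and half-width forms
    };

    // A subset of the complex text layout ranges from the list above.
    private static final int[][] CTL = {
        {0x0590, 0x05FF}, // Hebrew
        {0x0600, 0x06FF}, // Arabic
        {0x0E00, 0x0E7F}  // Thai
    };

    // True when ch falls inside any of the inclusive [low, high] ranges.
    private static boolean inAny(int[][] ranges, char ch) {
        for (int[] range : ranges) {
            if (ch >= range[0] && ch <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Names the font index property to use for a character, following the
    // rule in the post: Symbol, then CJK or CTL, otherwise FontIndex.
    static String fontSlot(char ch) {
        if (inAny(SYMBOL, ch)) {
            return "SymbolFontIndex";
        }
        if (inAny(CJK, ch) || inAny(CTL, ch)) {
            return "AsianOrComplexFontIndex";
        }
        return "FontIndex";
    }
}
```

A text run with mixed content would need to be classified character by character, since each character may select a different slot.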

andrey.potapov · January 19, 2022, 9:58am

A post was split to a new topic: PowerPoint Found a Problem with Content for Special Symbol
