Hi Christophe,
Thanks for using our APIs.
I have tested the scenario and I am able to reproduce the problem. I have logged it in our issue tracking system as PDFNEWNET-38558 so that it can be fixed. We will investigate the issue in detail and keep you updated on the status of the fix.
We apologize for the inconvenience.
Hi Christophe,
Hi Team,
The product team has started investigating the earlier reported issue, but I am afraid it's not yet resolved. However, I have asked them to share the current status and any possible ETA. As soon as we have the required information, we will let you know.
Thanks for your patience.
The earlier logged issue PDFNET-38558 has been investigated. According to the findings of our product team, if the problem symbol (the vertical line) is copied into another text editor, such as MS Word, it is displayed as a horizontal line. The line of text containing the problem symbol was also decoded and tested with another font library, and there too the problem symbol was drawn as a horizontal line.
The root of the problem is that Aspose.Pdf and Acrobat get different Unicode values for the problem symbol (code 0x1ED3).
Aspose.Pdf gets U+30FC, whereas the value Acrobat gets is currently unknown. The Aspose.Pdf text decoding functionality was implemented in accordance with the PDF specification, and no violations were found in the input PDF document. However, it seems that Adobe Acrobat uses another mechanism to decode the input content to Unicode, because it is very unlikely that a font would supply a vertical line glyph for a Unicode value that normally denotes a horizontal line.
It was also found that if the font containing the problem symbol is embedded in the document, the resulting image is correct. In this case no Unicode value is needed to find the glyph: a different mechanism is used, which maps the input codes from the PDF directly to glyphs in the font. The embedded-font result supports the idea of a Unicode collision in Acrobat: with the font embedded, the whole PDF document is converted correctly (with the vertical line in the corresponding place).
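To make the two lookup paths easier to follow, here is a toy model in Python. It is only a sketch: the tables, the glyph descriptions and the `render` function are invented for illustration and do not come from Aspose.Pdf or Acrobat. Only the code 0x1ED3 and the value U+30FC are taken from the findings above.

```python
# Toy model only: every table below is invented for illustration.

# Code from the PDF content -> Unicode value, as Aspose.Pdf decoded it.
CODE_TO_UNICODE = {0x1ED3: "\u30fc"}

# Hypothetical substitute font: glyphs are looked up by Unicode value.
SUBSTITUTE_FONT_GLYPHS = {"\u30fc": "horizontal line"}

# Hypothetical embedded font: glyphs are looked up by the PDF code directly.
EMBEDDED_FONT_GLYPHS = {0x1ED3: "vertical line"}


def render(code, embedded):
    """Return a description of the glyph that would be drawn for a PDF code."""
    if embedded:
        # Direct mapping: no Unicode value is involved.
        return EMBEDDED_FONT_GLYPHS[code]
    # Indirect mapping: code -> Unicode -> glyph in another font.
    return SUBSTITUTE_FONT_GLYPHS[CODE_TO_UNICODE[code]]


print(render(0x1ED3, embedded=False))
print(render(0x1ED3, embedded=True))
```

In this model the same code gives a different glyph depending on whether the Unicode step is involved, which is the behaviour described above for the non-embedded and embedded cases.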
So if it is possible to embed the problem font (Ryumin-Medium) in the document, please use this approach; the document will then be converted correctly. A common CJK font such as MS Gothic can also be used instead of Ryumin-Medium, but it too has to be embedded to get a correct image.
If the font cannot be embedded in the document, unfortunately this error can’t be corrected, because there is no way to determine Acrobat’s decoding logic for problem documents like ‘news_no29.pdf’.
Some experiments were made with the content: the problem symbol (the vertical line) was copied and pasted into another place in the same document. Acrobat pasted it as a horizontal line, with Unicode U+30FC (horizontal line). A vertical line was then obtained via Acrobat’s “make text direction vertical” option, and it was found that Acrobat started to use a new glyph while linking it to the same Unicode value (U+30FC, horizontal line). This is a collision: Acrobat’s logic uses the same Unicode value for two different glyphs, the vertical line and the horizontal line.
The correct decoding logic, however, is to use Unicode U+007C for the vertical line and U+30FC for the horizontal line.
Both Unicode values are commonly supported by fonts. So we have a collision in that Acrobat decodes the same symbol differently for display and for copy/paste operations.
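For reference, the two code points can be checked with Python's standard `unicodedata` module. This is a small sketch; the `describe` helper is a name made up for this example. Note that the Unicode character database names U+30FC "KATAKANA-HIRAGANA PROLONGED SOUND MARK"; it is called the horizontal line in this thread because of its shape.

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ("\u007c", "\u30fc"):
    print(describe(ch))
# prints:
# U+007C VERTICAL LINE
# U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK
```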
It also shows that any copy/paste operation in Acrobat produces a horizontal line instead of a vertical line, and only displaying the symbol produces a vertical line, which again looks like a collision. Acrobat may have strong reasons for its current logic, but this decoding mechanism is currently unknown to Aspose.Pdf.
If you need any further assistance, please feel free to let us know.