Issue reading Hebrew text

adamp1 · February 1, 2016, 9:21am

I am running into an issue reading Hebrew characters with TextAbsorber.

Here is the text from my source PDF:
**** PDF DOCUMENT: \8934\אוניברסיטה.pdf

When I read this in, it shows up as:
pdf.הטיסרבינוא \PDF DOCUMENT: \8934 ****

Here is my code:

var textAbsorber = new TextAbsorber(new Aspose.Pdf.Text.TextOptions.TextExtractionOptions(Aspose.Pdf.Text.TextOptions.TextExtractionOptions.TextFormattingMode.Raw));

origPage.Accept(textAbsorber);

var pageText = textAbsorber.Text;
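
One way to check whether the characters are actually stored in reversed order, rather than only displayed that way by a bidi-aware viewer, is to print the code points of the extracted string. Below is a minimal sketch that uses only the .NET base class library. The `Describe` helper and the sample string are hypothetical and not part of Aspose.Pdf; in practice the value of `textAbsorber.Text` would be passed in instead.

```csharp
using System;
using System.Linq;

static class CodePointDump
{
    // Hypothetical helper: lists the UTF-16 code units of s as U+XXXX values,
    // in the order they are stored in memory (not the order a viewer draws them).
    public static string Describe(string s) =>
        string.Join(" ", s.Select(c => $"U+{(int)c:X4}"));

    static void Main()
    {
        // Placeholder sample (assumption); replace with textAbsorber.Text.
        string pageText = "\u05D0\u05D5.pdf";
        Console.WriteLine(Describe(pageText));
    }
}
```

If the Hebrew letters (U+05D0 to U+05EA) appear in the opposite sequence from the source file name, the string itself is reversed and the problem is not just in how the text is rendered.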

Can you provide insight on how I can ensure that the Hebrew characters are read in correctly?

Thanks,
Adam

tilal.ahmad · February 1, 2016, 11:11pm

Hi Adam,

Thanks for your inquiry. I tested your scenario with Aspose.Pdf for .NET 11.2.0 and was able to reproduce the reported issue. For further investigation, I have logged it in our issue tracking system as PDFNEWNET-40209 and linked your request to it. We will keep you updated in this thread on the status of the issue.

We are sorry for the inconvenience caused.

Best Regards,