I am running into an issue with reading Hebrew characters using TextAbsorber.
Here is the text from my source PDF:
**** PDF DOCUMENT: \8934\אוניברסיטה.pdf
When I read this in, it shows up as:
pdf.הטיסרבינוא \PDF DOCUMENT: \8934 ****
Here is my code:
// Extract the page text using the Raw formatting mode
var textAbsorber = new TextAbsorber(new Aspose.Pdf.Text.TextOptions.TextExtractionOptions(Aspose.Pdf.Text.TextOptions.TextExtractionOptions.TextFormattingMode.Raw));
// origPage is the Aspose.Pdf.Page being processed
origPage.Accept(textAbsorber);
var pageText = textAbsorber.Text;
Can you provide some insight into how I can ensure that the Hebrew characters are extracted correctly?
Thanks,
Adam
Hi Adam,
Thanks for your inquiry. I have tested your scenario using Aspose.Pdf for .NET 11.2.0 and was able to reproduce the reported issue. For further investigation, I have logged it in our issue tracking system as PDFNEWNET-40209 and linked your request to it. We will keep you updated on its status via this thread.
We are sorry for the inconvenience caused.
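In the meantime, one possible stopgap (an untested sketch, not the fix for PDFNEWNET-40209) is to post-process the extracted text. In the output above, the Hebrew word comes back with its letters in reverse order, so the sketch reverses every run of Hebrew letters and leaves all other characters alone. It assumes unpointed Hebrew text and treats only the letters U+05D0 to U+05EA as Hebrew; `HebrewRunFixer` and `ReverseHebrewRuns` are hypothetical helper names, not part of Aspose.Pdf.

```csharp
using System.Text;

public static class HebrewRunFixer
{
    // Assumption: only the Hebrew letters (alef U+05D0 to tav U+05EA, including
    // final forms) are treated as Hebrew; vowel points are not handled.
    private static bool IsHebrewLetter(char c) => c >= '\u05D0' && c <= '\u05EA';

    // Reverses every maximal run of Hebrew letters and copies all other
    // characters (Latin, digits, punctuation, spaces) through unchanged.
    public static string ReverseHebrewRuns(string text)
    {
        var sb = new StringBuilder(text.Length);
        int i = 0;
        while (i < text.Length)
        {
            if (IsHebrewLetter(text[i]))
            {
                // Find the end of this Hebrew run, then append it backwards.
                int start = i;
                while (i < text.Length && IsHebrewLetter(text[i])) i++;
                for (int j = i - 1; j >= start; j--) sb.Append(text[j]);
            }
            else
            {
                sb.Append(text[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Applied to the extracted line, `ReverseHebrewRuns(pageText)` turns `הטיסרבינוא` back into `אוניברסיטה`. It does not restore the order of the segments, though: the `pdf.` prefix and the `\PDF DOCUMENT: \8934 ****` part stay where the extractor put them, and a multi-word Hebrew phrase would still come out with its words in reverse order.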
Best Regards,