Free Support Forum - aspose.com

Custom Font Encoding when Extracting PDF Text

We are using the TextAbsorber class to extract text out of some PDF reports. Most of the PDFs we have come across have been encoded using ANSI, but we recently came across one with custom font encodings. When viewing the PDF in Adobe, everything appears to be correct, but the TextAbsorber is not extracting the text in a usable way. Is there something that we can do to handle this using Aspose.Pdf?

I have provided some PDF property info (PDFProperties_1, PDFProperties_2) as well as a CodeSnippet. I cannot provide the PDF itself, as the information is proprietary.

This seems to be related to PDFJAVA-36721 from this post

@seanJohnsonRSI

Thank you for contacting support.

We would like to let you know that handling custom font encodings is not supported yet. As you have noticed, PDFJAVA-36721 is already logged as a feature request. Please note that attachments are accessible only to the thread owner and Aspose staff. We need the source PDF document so that we can address your concerns efficiently.

It might be important to note that I’m using .NET. Sorry for not including that information earlier.

@seanJohnsonRSI

Thank you for the information.

The requested feature will be supported in both the .NET and Java versions. We will let you know as soon as any significant updates are available.