Text extracting problem

plariviere · July 6, 2018, 4:58pm

Greetings, we are currently using Aspose.PDF to extract text from PDF files, but for one particular document we are not getting the results we expected.

In the attached document, a search for “Bank of Washington” returns no results, because the text extraction seems to have concatenated “Number” with “Bank” (producing “NumberBank of Washington”).

Code deleted

Is there a possible fix for this? Thank you.

Farhan.Raza · July 6, 2018, 10:55pm

@plariviere

Thank you for contacting support.

The problem probably occurs when you trim the text extracted from the PDF document. Instead, you can simply iterate through the document and extract the text as-is. Aspose.PDF for .NET represents a new line as the escape sequence \r\n, so removing those characters would join the last word of one line to the first word of the next. You can avoid the problem by extracting or searching for text as described in the documentation articles below.
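The effect does not depend on the library, so it can be reproduced with plain strings. The following is only a rough sketch, written in Python for brevity rather than with the Aspose.PDF API. The sample string and the helper names (`strip_line_breaks`, `normalize_line_breaks`, `contains_phrase`) are made up for illustration, and it assumes the search matches on whole words.

```python
import re

# Hypothetical extracted text: two lines separated by "\r\n".
# This is sample data, not the content of the attached PDF.
extracted = "Account Number\r\nBank of Washington"


def strip_line_breaks(text):
    """Delete line breaks outright; the words on either side fuse together."""
    return text.replace("\r\n", "")


def normalize_line_breaks(text):
    """Replace each line break (and surrounding whitespace) with one space."""
    return re.sub(r"\s*\r?\n\s*", " ", text).strip()


def contains_phrase(text, phrase):
    """Whole-word, case-sensitive phrase search."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


fused = strip_line_breaks(extracted)
normalized = normalize_line_breaks(extracted)

print(repr(fused))       # 'Account NumberBank of Washington'
print(repr(normalized))  # 'Account Number Bank of Washington'
print(contains_phrase(fused, "Bank of Washington"))       # False
print(contains_phrase(normalized, "Bank of Washington"))  # True
```

Replacing the line break with a space, instead of removing it, keeps the word boundary intact, so the whole-word search finds the phrase again.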

We hope this will be helpful. Please feel free to contact us if you need any further assistance.