Read text from PDF

Dear Support,

I am trying to read text from a PDF file to extract certain data. When the data is in paragraph format it can be extracted easily; however, when it is in tabular format, it becomes hard to identify which row or column a given piece of text belongs to. I would like to know whether Aspose offers a solution for this.

image.jpg (95.1 KB)

Thanks,
Saurabh

@saurabhmauryabu

Can you please share the sample code you are using and the input file? We will be able to investigate the issue on our end once we receive the requested information.

Hello @mudassir.fayyaz,

Sorry for the late reply.
Please find below the code snippet that we are using:

using Aspose.Pdf;
using Aspose.Pdf.Text;

// Load the PDF and collect the plain text of every page
Document pdfDocument = new Document(FilePath);
TextAbsorber textAbsorber = new TextAbsorber();
pdfDocument.Pages.Accept(textAbsorber);
string extractedText = textAbsorber.Text;

Attached is the sample file we are using:
Sample.pdf (25.6 KB)

@saurabhmauryabu

Please try the code from the "Extract Table from Existing PDF Document" article and share your feedback.
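For reference, a minimal sketch of the table-based approach, assuming Aspose.PDF for .NET and its `TableAbsorber` class in `Aspose.Pdf.Text`. Unlike `TextAbsorber`, it walks each page's detected tables row by row and cell by cell, so every piece of text keeps its row and column position. The class `TableExtractor` and the helper `RowToTsv` are hypothetical names used only for illustration, and joining fragments with a space is an assumption about how you want multi-fragment cells rendered.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static class TableExtractor
{
    // Hypothetical helper: join cell texts with tabs so each output line keeps its column order
    public static string RowToTsv(IEnumerable<string> cells) => string.Join("\t", cells);

    // Returns one tab-separated line per table row found in the document
    public static List<string> ExtractTableRows(string filePath)
    {
        var lines = new List<string>();
        Document pdfDocument = new Document(filePath);

        foreach (Page page in pdfDocument.Pages)
        {
            // Detect the tables on this page
            TableAbsorber absorber = new TableAbsorber();
            absorber.Visit(page);

            foreach (AbsorbedTable table in absorber.TableList)
            {
                foreach (AbsorbedRow row in table.RowList)
                {
                    var cellTexts = new List<string>();
                    foreach (AbsorbedCell cell in row.CellList)
                    {
                        // A cell can hold several fragments, and each fragment several segments:
                        // concatenate segments within a fragment, separate fragments with a space
                        IEnumerable<string> fragmentTexts = cell.TextFragments
                            .Cast<TextFragment>()
                            .Select(f => string.Concat(
                                f.Segments.Cast<TextSegment>().Select(s => s.Text)));
                        cellTexts.Add(string.Join(" ", fragmentTexts).Trim());
                    }
                    lines.Add(RowToTsv(cellTexts));
                }
            }
        }
        return lines;
    }
}
```

Calling `TableExtractor.ExtractTableRows("Sample.pdf")` would then give one line per table row, with cells separated by tabs, which makes it straightforward to tell which column a value came from.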

Hello @mudassir.fayyaz,

I have used the code given in "Extract Table from PDF" and I get the output below. Notice that the entire text is not coming through in the output:
image.png (32.3 KB)

Please suggest how to get the entire text.

Thanks,
Saurabh

@saurabhmauryabu

Are you using the latest version with a valid license? I can extract the text fine on my end. You should apply the license before making any calls to API methods. If you do not have one, you can get a free 30-day temporary license to evaluate the API without any limitations. If you still face the issue, please share a sample application.
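A minimal sketch of that ordering, assuming Aspose.PDF for .NET: `Aspose.Pdf.License.SetLicense` is called once at startup, before the first `Document` is created. The class `LicensedExtraction`, its method names and the license file path are hypothetical and only illustrate where the call belongs.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static class LicensedExtraction
{
    // Apply the license once, before any other Aspose.PDF call is made
    public static void ApplyLicense(string licensePath)
    {
        License license = new License();
        license.SetLicense(licensePath);
    }

    // Same plain-text extraction as in the original snippet, run after licensing
    public static string ExtractAllText(string filePath)
    {
        Document pdfDocument = new Document(filePath);
        TextAbsorber textAbsorber = new TextAbsorber();
        pdfDocument.Pages.Accept(textAbsorber);
        return textAbsorber.Text;
    }
}
```

In an application this would typically look like `LicensedExtraction.ApplyLicense("Aspose.PDF.NET.lic");` at startup, followed by any number of extraction calls.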