Unable to extract all the text from PDF using TextAbsorber

manasiak · November 2, 2018, 6:33pm

I want to extract all the text from a PDF. I have the code below for that, but it does not extract all the text.

using System.IO;
using Aspose.Pdf.Text;

var pdfDocument = new Aspose.Pdf.Document(@"OriginalPdf.pdf");
TextAbsorber textAbsorber = new TextAbsorber();
// Run the absorber over every page, then read the accumulated text.
pdfDocument.Pages.Accept(textAbsorber);
string extractedText = textAbsorber.Text;
File.WriteAllText(@"demodata.txt", extractedText);

I want to extract the text from the PDF below.
Test.pdf (83.0 KB)

The generated text file contains only the headers:

Evaluation Only. Created with Aspose.PDF. Copyright 2002-2018 Aspose Pty Ltd.
Maintenance

Clean

Can you please help me with this?

asad.ali · November 2, 2018, 7:28pm

@manasiak

Thanks for contacting support.

We tested the scenario in our environment using a similar code snippet and did not notice any issue. For your reference, the output text file is attached. It seems you are using the API without setting a valid license, which is why the output is limited or incomplete.

Please make sure to set a valid license before calling your text extraction method. If you do not have a license yet, you can get a free 30-day temporary license to evaluate the API features without any limitations.
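For reference, applying the license before extraction might look like the minimal sketch below. The file name "Aspose.Pdf.lic" is a placeholder assumption; replace it with the path to your own license file. The class name is also only illustrative.

```csharp
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ExtractTextWithLicense
{
    static void Main()
    {
        // Assumption: "Aspose.Pdf.lic" is a placeholder for your own license file path.
        var license = new Aspose.Pdf.License();
        license.SetLicense("Aspose.Pdf.lic");

        // Same extraction as in the question, run after the license is applied.
        var pdfDocument = new Document("OriginalPdf.pdf");
        var textAbsorber = new TextAbsorber();
        pdfDocument.Pages.Accept(textAbsorber);

        File.WriteAllText("demodata.txt", textAbsorber.Text);
    }
}
```

The license only needs to be set once per application run, before the first document is processed.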

demodata.zip (981 Bytes)