Problem searching and extracting text

trenton.brundage · November 10, 2014, 3:25pm

I have a document where this example does not work …

//open document
Document pdfDocument = new Document("input.pdf");
//create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
//accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
//get the extracted text
string extractedText = textAbsorber.Text;
// create a writer and open the file
TextWriter tw = new StreamWriter("extracted-text.txt");
// write a line of text to the file
tw.WriteLine(extractedText);
// close the stream
tw.Close();

It crashes on the Accept() function. Please advise.

tilal.ahmad · November 11, 2014, 12:52am

Hi Trenton,

Thanks for your inquiry. Please share your sample document here so that we can look into the issue and advise you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

trenton.brundage · November 11, 2014, 1:20am

See attached file.

codewarior · November 11, 2014, 1:31am

Hi Trenton,

Thanks for sharing the resource file.

I have tested the scenario using Aspose.Pdf for .NET 9.7.0 in a Visual Studio 2012 application targeting .NET Framework 4.5 on Windows 7 (x64), and I am unable to reproduce the issue. The text is extracted properly. For your reference, I have also attached the output file generated on my end.

[C#]

// open document
Document pdfDocument = new Document("c:/pdftest/testFile.pdf");

// create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();

// accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);

// get the extracted text
string extractedText = textAbsorber.Text;

// create a writer and open the file
TextWriter tw = new StreamWriter("c:/pdftest/extracted-text.txt");

// write a line of text to the file
tw.WriteLine(extractedText);

// close the stream
tw.Close();

tilal.ahmad · November 11, 2014, 1:44am

Hi Trenton,

Thanks for sharing the source file. I have tested the scenario on Windows 7 (64-bit) using Aspose.Pdf for .NET 9.7.0 and was unable to replicate the issue.

Please download and try the latest version of Aspose.Pdf for .NET. If the issue persists, please share your environment details, so that we can investigate it further.

//open document
Document pdfDocument = new Document(myDir + "testFile.pdf");

//create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();

//accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);

//get the extracted text
string extractedText = textAbsorber.Text;

// create a writer and open the file
TextWriter tw = new StreamWriter(myDir + "report2.txt");

// write a line of text to the file
tw.WriteLine(extractedText);

// close the stream
tw.Close();
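
If the issue persists, the environment details requested above can be collected with a small console snippet like the one below. This is only a sketch: it uses standard .NET APIs, and it assumes the library's assembly is named "Aspose.Pdf" (compared case-insensitively). Because .NET loads assemblies lazily, the lookup only finds Aspose.Pdf if the process has already used a type from it, for example by creating a Document first.

```csharp
using System;
using System.Linq;

class EnvironmentReport
{
    static void Main()
    {
        // Basic platform details, from the standard Environment class
        Console.WriteLine("OS: " + Environment.OSVersion);
        Console.WriteLine("64-bit OS: " + Environment.Is64BitOperatingSystem);
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
        Console.WriteLine("CLR version: " + Environment.Version);

        // Assumption: the assembly's simple name is "Aspose.Pdf".
        // Only assemblies already loaded into this process are searched.
        var aspose = AppDomain.CurrentDomain.GetAssemblies()
            .FirstOrDefault(a => string.Equals(
                a.GetName().Name, "Aspose.Pdf", StringComparison.OrdinalIgnoreCase));

        Console.WriteLine(aspose != null
            ? "Aspose.Pdf version: " + aspose.GetName().Version
            : "Aspose.Pdf is not loaded in this process yet.");
    }
}
```

Posting this output along with the failing document makes it easier to tell whether the installed build matches the version the issue was tested against.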

Please feel free to contact us for any further assistance.

Best Regards,

trenton.brundage · November 11, 2014, 12:31pm

Updating from 9.6 to 9.7 fixed the issue for me. Thanks much.

codewarior · November 12, 2014, 1:50am

Hi Trenton,

Thanks for the acknowledgment. We are glad to hear that your problem is resolved in the latest release, Aspose.Pdf for .NET 9.7.0. Please continue using our API, and feel free to contact us if you have any further queries.