Problem searching and extracting text

I have a document for which the following example does not work:

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// open document
Document pdfDocument = new Document("input.pdf");
// create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
// get the extracted text
string extractedText = textAbsorber.Text;
// create a writer and open the file; "using" closes the stream even if writing fails
using (TextWriter tw = new StreamWriter("extracted-text.txt"))
{
    // write the extracted text to the file
    tw.WriteLine(extractedText);
}

It crashes on the Accept() call. Please advise.


Hi Trenton,


Thanks for your inquiry. Please share your sample document here so that we can look into the issue and provide you with more information.

We are sorry for the inconvenience caused.

Best Regards,

See attached file.

Hi Trenton,


Thanks for sharing the resource file.

I have tested the scenario using Aspose.Pdf for .NET 9.7.0 in a Visual Studio 2012 application targeting .NET Framework 4.5 on Windows 7 (x64), and I could not reproduce the issue: the text is extracted correctly. For your reference, I have attached the resulting file generated on my end.

[C#]

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// open document
Document pdfDocument = new Document("c:/pdftest/testFile.pdf");
// create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
// get the extracted text
string extractedText = textAbsorber.Text;
// create a writer and open the file; "using" closes the stream even if writing fails
using (TextWriter tw = new StreamWriter("c:/pdftest/extracted-text.txt"))
{
    // write the extracted text to the file
    tw.WriteLine(extractedText);
}

Hi Trenton,


Thanks for sharing the source file. I have tested the scenario on Windows 7 (64-bit) using Aspose.Pdf for .NET 9.7.0 and was unable to replicate the issue. Please download and try the latest version of Aspose.Pdf for .NET. If the issue persists, please share your environment details so that we can investigate it further.

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// hypothetical: folder containing testFile.pdf (adjust to your environment)
string myDir = @"c:\pdftest\";

// open document
Document pdfDocument = new Document(myDir + "testFile.pdf");
// create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
// get the extracted text
string extractedText = textAbsorber.Text;
// create a writer and open the file; "using" closes the stream even if writing fails
using (TextWriter tw = new StreamWriter(myDir + "report2.txt"))
{
    // write the extracted text to the file
    tw.WriteLine(extractedText);
}
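If the issue persists and environment details are needed, a minimal sketch like the following, assuming a plain C# console project, can collect the basics (OS, bitness, CLR version). `EnvironmentReport` and `Build` are hypothetical names used only for illustration; the calls themselves are standard .NET `System.Environment` members. The Aspose.Pdf version line is left as a comment because it assumes the library is referenced.

```csharp
using System;

// Hypothetical helper (not part of Aspose.Pdf): gathers environment details
// that are useful in a support request, using only standard .NET APIs.
public static class EnvironmentReport
{
    public static string Build()
    {
        // One "label: value" entry per line.
        return string.Join(Environment.NewLine, new[]
        {
            "OS: " + Environment.OSVersion,
            "64-bit OS: " + Environment.Is64BitOperatingSystem,
            "64-bit process: " + Environment.Is64BitProcess,
            "CLR version: " + Environment.Version
        });
    }

    public static void Main()
    {
        Console.WriteLine(Build());

        // Assumption: with Aspose.Pdf referenced, the loaded library version
        // could be reported via reflection, for example:
        // typeof(Aspose.Pdf.Document).Assembly.GetName().Version
    }
}
```

The process bitness is listed separately from the OS bitness because a 32-bit process can run on 64-bit Windows, and the two can differ.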


Please feel free to contact us for any further assistance.


Best Regards,



Updating from 9.6 to 9.7 fixed the issue for me. Thanks much.

Hi Trenton,


Thanks for the acknowledgment. We are glad to hear that your problem is resolved in the latest release, Aspose.Pdf for .NET 9.7.0. Please continue using our API, and if you have any further questions, feel free to contact us.