I am using the code below to extract text from each page of a PDF file. An exception is raised when the final page (page 158) is processed. The same exception is also raised if I extract text from all pages at once. The problematic PDF file is attached. Thanks.
Dim doc As New Document(strFile)
Dim strPageWords As String = String.Empty
Dim intPages As Integer = doc.Pages.Count
' Aspose.Pdf page indexes are 1-based
For intPage As Integer = 1 To intPages
    Dim ta As New Text.TextAbsorber()
    ' Throws NullReferenceException when the last page (158) is processed
    doc.Pages(intPage).Accept(ta)
    strPageWords = ta.Text
Next
System.NullReferenceException: Object reference not set to an instance of an object. at ...ctor( ) at ..( ) at ..( ) at ..( ) at ..() at ..(Queue , , ) at ..( , ) at ..() at ...ctor( ) at Aspose.Pdf.Text.TextAbsorber.Visit(Page page) at Aspose.Pdf.Page.Accept(TextAbsorber visitor)
Thank you for sharing the PDF file and sample code.
I have tested your scenario with the latest version, Aspose.Pdf for .NET v6.5, and could not reproduce the issue. Please download and try the latest version and check whether it works for you.
This continues to fail for me. I am using VB in VS 2010 and the Aspose.Pdf v6.5 DLL from the net4.0 folder of the DLL-only download. Please check whether something else is causing the issue. Thanks.
After further testing, I was able to reproduce your issue. It has been logged in our issue tracking system as PDFNEWNET-32894. You will be notified via this forum thread of any updates on it.
Thanks for your patience. I am pleased to share that issue PDFNEWNET-32894, reported earlier, has been fixed. However, we have now encountered another problem: the complete text of the PDF file is not being extracted. I have logged this separately as PDFNEWNET-33225 in our issue tracking system. We will look further into this problem and keep you updated on its status. Please bear with us a little longer; we are sorry for the inconvenience.
We have further investigated the issue “PDFNEWNET-33225: Complete text is not being extracted from PDF file”. To extract the complete text, please use the latest release, Aspose.Pdf for .NET 7.9.0, with the following code snippet. If you still face the same issue or have any further questions, please feel free to contact us.
C#
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Devices;

// Open the document
Document pdfDocument = new Document("c:/pdftest/LJP2015_use_enww.pdf");
// Accumulates the text of every page
StringBuilder extractedText = new StringBuilder();
foreach (Page pdfPage in pdfDocument.Pages)
{
    using (MemoryStream textStream = new MemoryStream())
    {
        // Create the text device and set the extraction mode (Raw or Pure)
        TextDevice textDevice = new TextDevice();
        Aspose.Pdf.Text.TextOptions.TextExtractionOptions textExtOptions =
            new Aspose.Pdf.Text.TextOptions.TextExtractionOptions(
                Aspose.Pdf.Text.TextOptions.TextExtractionOptions.TextFormattingMode.Pure);
        textDevice.ExtractionOptions = textExtOptions;
        // Convert this page and write its text to the stream
        textDevice.Process(pdfPage, textStream);
        // Decode the stream contents and append them to the result
        // (the using block disposes the stream, so no explicit Close is needed)
        extractedText.Append(Encoding.Unicode.GetString(textStream.ToArray()));
    }
}
File.WriteAllText("c:/pdftest/LJP2015_use_enww.txt", extractedText.ToString());
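Since you are working in VB.NET, the same approach might be written as follows. This is an untested sketch translated line by line from the C# snippet above. It assumes Aspose.Pdf for .NET 7.9.0, where TextExtractionOptions lives in the Aspose.Pdf.Text.TextOptions namespace, and it reuses the same example input and output paths, which you would replace with your own.

```vb
Imports System.IO
Imports System.Text
Imports Aspose.Pdf
Imports Aspose.Pdf.Devices
' Assumption: namespace as used in the C# snippet for Aspose.Pdf for .NET 7.9.0
Imports Aspose.Pdf.Text.TextOptions

Module ExtractText
    Sub Main()
        ' Example paths taken from the C# snippet; replace with your own
        Dim pdfDocument As New Document("c:/pdftest/LJP2015_use_enww.pdf")
        Dim extractedText As New StringBuilder()

        For Each pdfPage As Page In pdfDocument.Pages
            Using textStream As New MemoryStream()
                ' Text device using the Pure formatting mode, as in the C# snippet
                Dim textDevice As New TextDevice()
                textDevice.ExtractionOptions = New TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure)
                ' Write this page's text to the stream, then decode and append it
                textDevice.Process(pdfPage, textStream)
                extractedText.Append(Encoding.Unicode.GetString(textStream.ToArray()))
            End Using
        Next

        File.WriteAllText("c:/pdftest/LJP2015_use_enww.txt", extractedText.ToString())
    End Sub
End Module
```

Because each page is processed with its own TextDevice call, you can also wrap the Process call in a Try/Catch if you need to find out which page still causes trouble.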