Find Text in Pdf Error

alinprisecaru-1 · August 8, 2014, 10:35am

I have an issue when using the TextFragmentAbsorber.
First let me describe the bigger picture.

I have an HTML view.
I generate a PDF file from it using HTML-to-PDF conversion.
I temporarily save the file before returning it to the client.

Then I try to insert a form input at a certain location in the PDF file.
That location can be found by searching for “[textinput]” in the PDF.

To do that, I use the following code:

Document document = new Document(pdfFilePath);
//the file path is correct and the Document is initialised successfully
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("[textinput]");

//here I get an error - "Object reference not set to an instance of an object."
document.Pages.Accept(textFragmentAbsorber);

So what am I doing wrong? Please help me if you know a fix.

I forgot to specify: the PDF file HAS content, so it’s not empty. The algorithm just crashes.

I made it work on a dummy test pdf file.
I will look again; maybe the fault is in how I handle the PDF file.

I still get the error on a specific PDF file. I thought it shouldn’t care about the content.
I attached the file. I tested it and it crashes at the line of code I specified.

codewarior · August 8, 2014, 3:51pm

Hi Prisecaru,

Thanks for using our APIs.

I have tested the scenario and can reproduce the problem. I have logged it as PDFNEWNET-37321 in our issue tracking system so that it can be fixed. We will look further into the details and keep you updated on the status of the fix. Please be patient and give us a little time. We are sorry for the inconvenience.

alinprisecaru-1 · August 11, 2014, 1:13am

Thank you.
At least I know for the moment that I’m not doing anything wrong in the code itself.

alinprisecaru-1 · August 11, 2014, 1:54am

I don’t know if this is relevant or not, but here is some additional information:

The PDF document I am trying to parse with your methods is generated by the TuesPechkin HTML-to-PDF converter, which is built on the wkhtmltopdf open-source tool.

I don’t know, but maybe it is something about the PDF file itself, or some information in each page, that triggers the crash.

codewarior · August 11, 2014, 2:31am

Hi Prisecaru,

Thanks for sharing the information.

While resolving this problem, the development team will investigate the structure of the PDF file, and I have shared the information above with them. As soon as we have definite news regarding the fix, we will let you know.

aspose.notifier · October 8, 2014, 2:57pm

The issues you have found earlier (filed as PDFNEWNET-37321) have been fixed in Aspose.Pdf for .NET 9.7.0.
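
With the fix in place, the workflow described at the start of the thread (find the “[textinput]” placeholder, then put a form field at that spot) might look roughly like the sketch below. It is not code from the thread: the file names and the field naming scheme are hypothetical, the use of TextBoxField and Form.Add is an assumption about how the form input would be added, and the Aspose.Pdf.Forms namespace is the one used by recent releases (older releases may place the form classes in a different namespace).

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Forms;
using Aspose.Pdf.Text;

class InsertTextInput
{
    static void Main()
    {
        // Hypothetical paths, for illustration only.
        string pdfFilePath = "input.pdf";
        string outputPath = "output.pdf";

        Document document = new Document(pdfFilePath);

        // Search every page for the literal placeholder text.
        TextFragmentAbsorber absorber = new TextFragmentAbsorber("[textinput]");
        document.Pages.Accept(absorber);

        int index = 0;
        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // Create a text box covering the rectangle occupied by the placeholder.
            TextBoxField field = new TextBoxField(fragment.Page, fragment.Rectangle);

            // Hypothetical naming scheme: textinput0, textinput1, ...
            field.PartialName = "textinput" + index;
            index++;

            // Register the field with the document's form on the placeholder's page.
            document.Form.Add(field, fragment.Page.Number);
        }

        document.Save(outputPath);
    }
}
```

The sketch leaves the placeholder text itself in the page content; whether to remove or overwrite it depends on how the generated PDF is meant to look.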

This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

alinprisecaru-1 · October 9, 2014, 2:12am

Thanks! It works now.
I really appreciate that you guys fixed this bug.

tilal.ahmad · October 10, 2014, 2:19am

Hi Prisecaru,

Thanks for your feedback. It is good to know that your issue has been resolved.

Please keep using our API and feel free to raise any questions or concerns. We will be more than happy to extend our support.

Best Regards,