Search text inside a PDF document

Hi,

We are using Aspose.Pdf.Kit in an ASP.NET application.

The implementation works like this:
on user request, a PDF file (already uploaded to the portal) is shown in a custom PDF viewer interface. The PDF pages are displayed as images.

Here we need to incorporate a 'search text' facility in the viewer UI, where the user keys in some text and tries to find it across the pages of a PDF document. Ideally the search text needs to be highlighted on the PDF pages (images), and clicking a FindNext button should move between pages (images).

How can we achieve this using the Aspose.Pdf.Kit for .NET component?

Hi Jaydip,

You may try to use Aspose.Pdf.Kit for .NET to find and replace text, highlighting it at the same time, using the ReplaceText method. However, this will be done behind the scenes in your code, and you’ll be able to show the updated PDF after highlighting the text. Or, if you’re showing images, you can convert the PDF to images again after highlighting and then show the updated images.

I hope this helps. If you have any further questions, please do let us know.
Regards,

Thanks Shahzad for your help.

I need to check how realistic it will be to implement the suggested workaround, because it may slow down finding text inside the file, and the user would also need to browse each page to find the highlighted text.

However, is there any workaround so that we can at least get the page number(s) on which the text exists?

Hi Jaydip,

I would like to share with you that there is no direct way to get the list of the pages where a particular text segment is present; however, with the [merged Aspose.Pdf](https://blog.aspose.com/2011/07/01/first-version-of-merged-aspose.pdf-for-.net-is-now-available-for-download), we have provided more flexibility to find the text. Using this new API, you can navigate through pages and search the text from each page. This way, you’ll be able to keep track of any pages containing the particular text segment. Please see the following section for more details: Working with Text.
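As a rough illustration of the page-tracking idea, here is a minimal sketch in plain C#. It makes no Aspose calls: it assumes the text of each page has already been extracted into a list by whichever API you use. `PageSearch`, `FindPages` and `FindNext` are hypothetical names made up for this example, not part of any Aspose library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of any Aspose API.
public static class PageSearch
{
    // Returns the 1-based numbers of the pages whose text contains the term.
    // Assumption: pageTexts[i] holds the text already extracted from page i + 1.
    public static List<int> FindPages(IList<string> pageTexts, string term)
    {
        var hits = new List<int>();
        if (string.IsNullOrEmpty(term))
        {
            return hits; // nothing to search for
        }

        for (int i = 0; i < pageTexts.Count; i++)
        {
            string text = pageTexts[i] ?? string.Empty;
            // Case-insensitive match; switch to StringComparison.Ordinal for exact case.
            if (text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                hits.Add(i + 1);
            }
        }
        return hits;
    }

    // Returns the first hit page after currentPage, wrapping around to the
    // first hit; returns -1 when no page contains the term.
    // Assumption: hits is sorted ascending, as produced by FindPages.
    public static int FindNext(IList<int> hits, int currentPage)
    {
        if (hits.Count == 0)
        {
            return -1;
        }
        foreach (int page in hits)
        {
            if (page > currentPage)
            {
                return page;
            }
        }
        return hits[0];
    }
}
```

For example, with page texts "SOP overview", "nothing here" and "see sop 12", searching for "SOP" would report pages 1 and 3, and a FindNext button could call `FindNext` with the page currently shown to decide which page image to display next.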

Regards,

Hi,

Thanks for the reply. First I am trying the Aspose.Pdf.Kit component, following the workaround you suggested. But when I replace the text (same text with a color), it does not execute consistently, i.e. it does not change all the occurrences in a document.

Am I missing something? Or are there any other prerequisites for this?

Any help will be appreciated.

Thanks.

Hi Jaydip,

Please try to set the ReplaceTextStrategy property of the PdfContentEditor object. You need to set its ReplaceScope value to Scope.REPLACE_ALL.

I hope this helps. If you have any further questions, please do let us know.
Regards,

Hi Shahzad,

Thanks for your reply. I have used the ReplaceTextStrategy property like this:

contentEditor.ReplaceTextStrategy.ReplaceScope = ReplaceTextStrategy.Scope.REPLACE_ALL;

It ends up in one of two scenarios:

1) The ReplaceText method takes a long time to run and finally fails with the error 'System.OutOfMemoryException' was thrown.

2) For another file (converted from Word to PDF), it runs through successfully but without replacing any text.

Any clue why this is happening?

Thanks.

Hi Jaydip,

Please share both the input PDF files with us, so we could investigate the issue at our end. Also, please provide the code snippet which is causing the particular problem. Moreover, please mention which particular file is causing which problem, so we would be able to focus on the issues properly.

We’re sorry for the inconvenience and look forward to helping you out.
Regards,

Hi Shahzad,

I have attached a fileset for your reference.

Appreciate your effort in helping me out.

Thanks.

Hi Jaydip,

Thank you very much for sharing the sample files with us. We’ll investigate the issue at our end and you’ll be updated with the results as early as possible.

We’re sorry for the inconvenience.
Regards,

Hi Jaydip,

I have tested these issues using the latest version (5.7.0), but couldn’t reproduce the problems at my end. It works fine in both scenarios. Could you please make sure that you’re using the latest version? If it still doesn’t resolve your issue then please share the .NET Framework version, OS and your development/deployment environment where this issue is occurring, so we would be able to investigate the issue at our end using your particular scenario.

We’re sorry for the inconvenience.
Regards,

Hi Shahzad,

Yes, it is working now with the recommended version.

Only one problem persists when highlighting the searched text with:

textProperties.BackgroundColor = System.Drawing.Color.Yellow;

contentEditor.ReplaceTextStrategy.ReplaceScope = ReplaceTextStrategy.Scope.REPLACE_ALL;

// replace the text with itself, applying the background color

contentEditor.ReplaceText("SOP", "SOP", textProperties);

  • The highlight is not at the correct position of the word found in the document.
  • It seems to cover the foreground of the text rather than marking the background.
  • Opening the new PDF file (with the highlighted words) throws an error.

(I have attached the PDF for your reference.)

Please help me out with these last issues; we are close to the solution.

Thanks for all your help.

I have also attached another file. Try to find and replace the word 'LUXEON '.

It is not replaced with the new version either.

Hi Jaydip,

I have reproduced both the problems and logged them as shown below:

PDFKITNET-29494 - Background color is not set properly
PDFKITNET-29495 - Word is not replaced

Our team will look into these issues and you’ll be updated via this forum thread once they’re resolved.

We’re sorry for the inconvenience.
Regards,

Hi Shahzad,

Thanks for all your effort in solving the issue.

I will wait for the solution.

Regards.

Hi Shahzad,

Any findings on the problem?
I am waiting for a solution so that I can go ahead with the project using this component.

Thanks.

Hi Jaydip,

I have asked our team to share the ETA of these issues. You’ll be updated as soon as the response is received.

Regards,

Hello Shahzad,

This is Gan, and I am looking into this post. I downloaded the evaluation version and am getting the same error. Is there any fix or workaround for it?

Thanks,

Gan

Hi Gan,

I’m sorry to share with you that these two issues are not yet resolved. Our team is working on these issues. We’ll get in touch with our development team and you’ll be notified via this forum thread once these issues are resolved.

We’re sorry for the inconvenience.
Regards,