Search in Pages

WST · November 5, 2013, 4:26am

Hi all,

Is it possible to search for text with TextFragmentAbsorber in more than one page, but not in all pages? I have to search text in PDF documents with more than 5000 pages. Currently I search one page at a time; if I find the text, I go to page 4, 5 or 6 and put an image on the selected page.

I want to search a PDF with more than 500 pages in steps of 100 pages. Is this possible?

Thanks and Regards
Winfried

tilal.ahmad · November 6, 2013, 10:56am

Hi Winfried,

Thanks for your inquiry. I'm afraid TextFragmentAbsorber currently doesn't support searching a set of pages; it only supports searching a single page or all pages. We've logged an enhancement request, PDFNEWNET-36006, for this in our issue tracking system. We will notify you via this forum thread as soon as it's resolved.

We are sorry for the inconvenience caused.

Best Regards,

WST · November 7, 2013, 3:50am

Hi Tilal,

Thanks for your answer. It won't help me right now, but it will help in the future.

Thanks and Regards
Winfried

tilal.ahmad · November 8, 2013, 4:04am

Hi Winfried,

Thanks for your feedback. Yes, you are right: it will become part of the knowledge base and help other community members, and implementing this feature will also help others with similar requirements. We will keep you updated on the issue's progress via this forum thread.

Best Regards,

asad.ali · December 15, 2017, 8:52pm

@WST

Thanks for your patience.

Our product team has investigated the earlier logged issue PDFNET-36006, and according to the investigation, your scenario does not require any changes to the Aspose.Pdf code.

TextFragmentAbsorber can already visit pages in any order, and it stores the search results for all visited pages. If you need the text fragments for pages 101 to 200, create a new TextFragmentAbsorber and have that instance visit each of those pages in turn.

Please consider the following code, which uses the latest version, Aspose.Pdf for .NET 17.12:

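using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// myDir is assumed to hold the path of the folder containing the input PDF.
Document doc = new Document(myDir + "pdf_reference_1_5.pdf");

int i = 1;
string report = String.Empty;
// An absorber created without a search phrase collects every text fragment;
// pass a phrase, e.g. new TextFragmentAbsorber("text"), to search for specific text.
TextFragmentAbsorber absorber = new TextFragmentAbsorber();

// Visit the pages one by one; results accumulate in the current absorber.
while (i <= doc.Pages.Count)
{
    absorber.Visit(doc.Pages[i]);
    // After every 100th page, report the batch and start a fresh absorber.
    if (i % 100 == 0)
    {
        report = String.Format("{0} text fragments found on the pages from {1} to {2}.",
            absorber.TextFragments.Count, i - 99, i);
        Console.WriteLine(report);
        absorber = new TextFragmentAbsorber();
    }
    i++;
}
// Report the remaining pages, unless the page count is an exact multiple of 100
// (in that case every page was already reported inside the loop).
if ((i - 1) % 100 != 0)
{
    report = String.Format("{0} text fragments found on the last {1} pages.",
        absorber.TextFragments.Count, (i - 1) % 100);
    Console.WriteLine(report);
}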
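If you would rather work with explicit page ranges (for example, to search only pages 101 to 200), the batch boundaries can be computed up front. The following is a minimal, dependency-free sketch under that assumption; the `PageBatching` class and its `PageBatches` method are hypothetical helpers, not part of Aspose.Pdf. Each returned range is 1-based and inclusive, matching the `doc.Pages[i]` indexing used with Aspose.Pdf.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Aspose.Pdf): splits pages 1..pageCount
// into consecutive 1-based, inclusive ranges of at most batchSize pages.
public static class PageBatching
{
    public static List<(int First, int Last)> PageBatches(int pageCount, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<(int First, int Last)>();
        for (int first = 1; first <= pageCount; first += batchSize)
        {
            // The final range may be shorter than batchSize.
            int last = Math.Min(first + batchSize - 1, pageCount);
            batches.Add((first, last));
        }
        return batches;
    }
}
```

For a 250-page document and a batch size of 100, this yields the ranges 1 to 100, 101 to 200 and 201 to 250. Each range can then be searched with its own new TextFragmentAbsorber by calling `absorber.Visit(doc.Pages[p])` for every page `p` from `First` to `Last`, as described in the reply.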

If you need any further assistance, please feel free to let us know.