PDF Content and Page Numbers

ShaneRussell · January 21, 2009, 5:31pm

I want to parse the content of a PDF file and create bookmarks on particular pages. Can I search for text in a PDF and get back the page number the text was found on?

ShaneRussell · January 21, 2009, 5:46pm

Think I have found my own answer.

PDFExtractor.GetNextPageText

I am getting the text one page at a time. If there is a better way, I would be happy to hear it.

Cheers guys.

codewarior · January 21, 2009, 7:22pm

Hello Shane,

Thanks for considering Aspose.

Aspose.Pdf.Kit for Java has a class named PdfSearcher that can search for text within a rectangle, but I am afraid it cannot return the page number on which the text is found. You can work this out programmatically: use setStartPage and setEndPage to limit the page range being searched, and search for the text pattern with searchTextInRectangle.
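A rough sketch of that page-range idea, in case it helps: narrow the range to a single page, search, and move on to the next page. The RangeSearcher interface below is a hypothetical stand-in, not an Aspose type; in real code its body would wrap the setStartPage, setEndPage and searchTextInRectangle calls, whose exact signatures are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class PageByPageSearch {

    /**
     * Hypothetical stand-in for a searcher limited to a page range.
     * Not part of Aspose.Pdf.Kit; an implementation would wrap the real calls.
     */
    public interface RangeSearcher {
        boolean found(int startPage, int endPage, String text);
    }

    /** Returns the 1-based numbers of the pages on which the text is found. */
    public static List<Integer> pagesContaining(RangeSearcher searcher, int pageCount, String text) {
        List<Integer> pages = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            // Start and end are the same page, so a hit identifies that page.
            if (searcher.found(page, page, text)) {
                pages.add(page);
            }
        }
        return pages;
    }
}
```

This runs one search per page, so it may be slow on large documents.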

FYI: PdfExtractor is a class that extracts the text or image contents from a PDF file as a whole. To search for a particular text string, you would have to parse the extracted contents and search for the pattern yourself, and this method will not help in retrieving the page number.

ShaneRussell · January 21, 2009, 7:31pm

Thanks for that Nayyer.

I have gone with PdfExtractor. Using the GetNextPageText method, I can keep track of the page numbers myself, so I am able to write the correct bookmarks when I get a search match.
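For anyone finding this later, the page tracking looks roughly like the sketch below. It is not my actual code: the Iterator of strings is an assumed stand-in for whatever hands back one page of extracted text per call (GetNextPageText in my case), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PageTextSearch {

    /**
     * Returns the 1-based page numbers whose text contains the search term.
     * pageTexts is an assumed stand-in that yields one page of text per call.
     */
    public static List<Integer> findPages(Iterator<String> pageTexts, String term) {
        List<Integer> hits = new ArrayList<>();
        int pageNumber = 0;
        while (pageTexts.hasNext()) {
            pageNumber++; // count every page, including ones without a match
            String text = pageTexts.next();
            if (text != null && text.contains(term)) {
                hits.add(pageNumber);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("Introduction", "Chapter 1: Setup", "Chapter 2: Usage");
        System.out.println(findPages(pages.iterator(), "Chapter")); // prints [2, 3]
    }
}
```

The returned page numbers are what you would feed into the bookmark creation step.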

Got it working about 10 minutes ago.

Thanks for the quick response. You guys rock.

Cheers,

Shane