Parse PDF Document

Hi,

We have a PDF document with many pages.

On each page, one of the rows contains a tag such as "Tag ID: " followed by the ID number.

Is it possible to split the files and get the IDs for each page?

Thanks!

Hi,

Thank you for considering Aspose.

You can get your required results by performing the following operations:

  1. Split your original PDF document into single-page documents. Please refer to:

[http://www.aspose.com/Products/Aspose.Pdf.Kit/Api/Split-the-PDF-Document-into-Single-Page-Documents.html](http://www.aspose.com/Products/Aspose.Pdf.Kit/Api/Split-the-PDF-Document-into-Single-Page-Documents.html)

Alternatively, extract the pages you need from the PDF document. Please refer to:

[http://www.aspose.com/Products/Aspose.Pdf.Kit/Api/Extract-Pages-from-a-PDF-Document.html](http://www.aspose.com/Products/Aspose.Pdf.Kit/Api/Extract-Pages-from-a-PDF-Document.html)

  2. Extract the text from each page. Please refer to:

[http://www.aspose.com/Products/Aspose.Pdf.Kit/Api/Extract-Text-from-PDF-Document.html](http://www.aspose.com/Products/Aspose.Pdf.Kit/Api/Extract-Text-from-PDF-Document.html)

  3. Find the required string in the text extracted from each PDF and store it wherever you like.

Please let us know if you need more help.

Thanks.

Hi,

Thanks for the quick reply!

If we know the tag is somewhere in the file, how do we get the text after the tag?

Thanks!!

Hi,

The text extracted from the PDF can be stored in a simple text file or kept in memory. You can then use the C# string functions to find the text that follows the "Tag ID: " tag. Aspose.Pdf.Kit does not search text files or in-memory strings, so for this step you need to write a small piece of code yourself.
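
As a rough illustration of that last step, here is a minimal sketch using only the standard `IndexOf` and `Substring` string functions. It assumes the text of each page has already been extracted into a string, and that the ID is whatever follows "Tag ID:" on the same line. The names `TagIdFinder` and `ExtractTagId`, and the sample page text, are made up for this example and are not part of Aspose.Pdf.Kit.

```csharp
using System;
using System.Collections.Generic;

public static class TagIdFinder
{
    // The tag that precedes the ID on each page (taken from the question).
    private const string Tag = "Tag ID:";

    // Returns the text that follows the tag on the same line,
    // or null when the page text does not contain the tag.
    public static string ExtractTagId(string pageText)
    {
        int start = pageText.IndexOf(Tag, StringComparison.Ordinal);
        if (start < 0)
        {
            return null;
        }

        start += Tag.Length;

        // The ID is assumed to run to the end of the line.
        int end = pageText.IndexOfAny(new[] { '\r', '\n' }, start);
        string rest = end < 0
            ? pageText.Substring(start)
            : pageText.Substring(start, end - start);

        return rest.Trim();
    }

    public static void Main()
    {
        // Hypothetical extracted text, one string per page.
        var pages = new List<string>
        {
            "Invoice\nTag ID: 12345\nTotal: 10.00",
            "Invoice\nTag ID: 67890\nTotal: 25.50",
            "Cover page without a tag"
        };

        for (int i = 0; i < pages.Count; i++)
        {
            string id = ExtractTagId(pages[i]);
            Console.WriteLine("Page {0}: {1}", i + 1, id ?? "(no tag found)");
        }
    }
}
```

If the ID can be followed by other text on the same line, you would need to adjust where the ID ends, for example by stopping at the first space instead of the end of the line.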

Thanks.