Reading existing PDF files

tchai787 · January 7, 2010, 11:25am

I’m looking for a .NET component that can read existing PDF files and recognize what is on a page, be it tables with text, text in a box, or paragraphs of text. Basically, it should know which objects (tables, text, decision boxes, text in boxes, lines drawn with arrows, etc.) are on a page. I need to access the text within its context: for example, which table, row, and column the text is in. Can Aspose.Pdf handle this type of requirement? I have the same requirements for Word documents. Can Aspose.Words handle this? Thanks.

shahzadlatif · January 8, 2010, 4:37am

Hi Thomas,

Thank you very much for considering Aspose.

Currently, Aspose.Pdf.Kit allows you to extract text, images, annotations, bookmarks, attachments, etc. However, I’m afraid you can’t extract text along with its context (for example, the table, row, and column it belongs to); the ExtractText method returns raw text only.
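To show the gap between raw text and text in its context, here is a minimal Python sketch. It is not Aspose code and uses no Aspose API. It assumes you already have text fragments with page coordinates (the `Fragment` type, the top-down `y` axis, and the tolerance values are all hypothetical, for illustration only). It then groups the fragments into rows and columns by position, which is one common heuristic for recovering simple table structure.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text fragment with the position of its top-left corner."""
    text: str
    x: float  # assumed: points from the left edge of the page
    y: float  # assumed: points from the top edge of the page


def _bands(values, tol):
    """Cluster sorted positions into bands; a new band starts when a value
    is more than `tol` past the start of the current band."""
    starts = []
    for v in sorted(values):
        if not starts or v - starts[-1] > tol:
            starts.append(v)
    return starts


def to_grid(fragments, row_tol=2.0, col_tol=2.0):
    """Return a list of rows, each a list of cell strings (left to right),
    with "" for cells that received no text."""
    row_starts = _bands((f.y for f in fragments), row_tol)
    col_starts = _bands((f.x for f in fragments), col_tol)
    grid = [["" for _ in col_starts] for _ in row_starts]
    for f in fragments:
        # The band a fragment belongs to is the last band start <= its position.
        r = bisect.bisect_right(row_starts, f.y) - 1
        c = bisect.bisect_right(col_starts, f.x) - 1
        grid[r][c] = (grid[r][c] + " " + f.text).strip()
    return grid


page = [
    Fragment("Name", 50, 100), Fragment("Qty", 200, 100),
    Fragment("Bolt", 50, 120), Fragment("12", 201, 120.5),
    Fragment("Nut", 50, 140), Fragment("30", 200, 140),
]

raw = " ".join(f.text for f in page)  # plain extraction: no table structure
grid = to_grid(page)
print(raw)         # Name Qty Bolt 12 Nut 30
print(grid[1][1])  # 12 (row 1, column 1)
```

A position-based heuristic like this only works for simple, grid-aligned tables; merged cells, multi-line cells, or boxes and arrows drawn as graphics would need real layout analysis from the PDF library itself.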

Nevertheless, if you can elaborate on your requirements and share a sample PDF file, we’ll try to support this feature in a future version.

We’re sorry for the inconvenience.
Regards,