Extracting text and overlaying on image of PDF


We have a requirement to receive PDFs containing investment information from many sources, extract string and numeric data from them, and eventually load that data into a database.

The extractor method with gettext etc. gets the text out, but what we ideally want is to keep a stored template for each source, extract the text, and display it overlaid on an image of the original PDF, so that we can quickly verify that the data format has not changed.

Most of this is straightforward coding, but we need to extract not only the text but also its position on the page. We have installed Aspose.pdf.kit under an evaluation licence but cannot see any way of getting this coordinate information out. Is this supported?



Hi Paul,

I’m sorry to inform you that Aspose.Pdf.Kit only allows you to extract text in raw format, so it does not return the coordinate information you are asking about. If you want to process the extracted text further, you’ll have to do that in your own code. However, if you could describe your requirement in more detail, with some examples and screenshots, our team might be able to provide support for such a feature in a future version.
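
For illustration only, the template check described in the question might look roughly like the Python sketch below. It does not use Aspose.Pdf.Kit. It assumes that words with bounding boxes have already been obtained from some other tool, and that coordinates use a top-left origin (native PDF coordinates start at the bottom-left, so a conversion may be needed). The names `Word`, `Region`, `read_fields` and `missing_fields`, and the sample data, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One extracted word and its bounding box (assumed input format)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Region:
    """A rectangle in the stored template where a field is expected."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the region if its centre point lies inside it.
        cx = (word.x0 + word.x1) / 2
        cy = (word.y0 + word.y1) / 2
        return self.x0 <= cx <= self.x1 and self.y0 <= cy <= self.y1


def read_fields(words, template):
    """Return {field name: text found inside that field's region}."""
    fields = {}
    for name, region in template.items():
        hits = [w for w in words if region.contains(w)]
        # Simple reading order: top to bottom, then left to right.
        hits.sort(key=lambda w: (w.y0, w.x0))
        fields[name] = " ".join(w.text for w in hits)
    return fields


def missing_fields(fields):
    """Fields whose region came back empty, hinting at a layout change."""
    return [name for name, value in fields.items() if not value]


# Hypothetical sample data for one source.
words = [
    Word("Fund", 10, 10, 40, 20),
    Word("Alpha", 45, 10, 80, 20),
    Word("1,250.00", 200, 50, 250, 60),
]
template = {
    "fund_name": Region(0, 0, 100, 30),
    "nav": Region(180, 40, 260, 70),
    "currency": Region(300, 40, 340, 70),
}

fields = read_fields(words, template)
changed = missing_fields(fields)
```

An empty region is only a crude signal that a source has changed its layout. The same `Region` rectangles could also be drawn over a rendered page image for the visual check.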

