Free Support Forum - aspose.com

Extract Images or Tables from PDF along with previous paragraph

Hello,

We already use Aspose.Words to retrieve images and tables along with their corresponding text.
This works because we can iterate section by section and paragraph by paragraph, and we can get the images under a paragraph using 'paragraph.GetChildNodes(Aspose.Words.NodeType.Shape, true)'.

Our Word documents now have to be replaced by PDF files, and we would like to retain the same functionality. I have gone through the Aspose.PDF documentation and its examples, but I could not find a way to do this. Could you please let me know whether it is possible to retrieve images/tables (along with the preceding text) from a PDF file?

Just an example:
Paragraph1
Paragraph2
Image1
Image2
Paragraph3
Image3

We need to extract images and the corresponding text like this:
Document 1 : Paragraph 2 + Image 1
Document 2 : Paragraph 2 + Image 2
Document 3 : Paragraph 3 + Image 3

So basically we need to retrieve each image together with its preceding paragraph. We have already done this using Aspose.Words, but cannot find a way to do it in Aspose.PDF.
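
To make the requirement concrete, the pairing rule can be sketched in a few lines of Python. This is only an illustration of the desired output, not Aspose code: the flat list of ("paragraph" | "image", content) items and the function name are hypothetical stand-ins for whatever a flow-format document yields in reading order.

```python
from typing import List, Optional, Tuple

# Hypothetical model: document items in reading order, as (kind, content).
Item = Tuple[str, str]


def pair_images_with_preceding_paragraph(items: List[Item]) -> List[Tuple[str, str]]:
    """Return (paragraph, image) pairs, one per image.

    Each image is paired with the most recent paragraph seen before it.
    Images that appear before any paragraph are skipped.
    """
    pairs: List[Tuple[str, str]] = []
    last_paragraph: Optional[str] = None
    for kind, content in items:
        if kind == "paragraph":
            last_paragraph = content
        elif kind == "image" and last_paragraph is not None:
            pairs.append((last_paragraph, content))
    return pairs


# The example layout from this post.
items = [
    ("paragraph", "Paragraph1"),
    ("paragraph", "Paragraph2"),
    ("image", "Image1"),
    ("image", "Image2"),
    ("paragraph", "Paragraph3"),
    ("image", "Image3"),
]

pairs = pair_images_with_preceding_paragraph(items)
for number, (paragraph, image) in enumerate(pairs, start=1):
    print(f"Document {number} : {paragraph} + {image}")
```

The rule depends entirely on the items arriving in reading order, which is what a flow-format document provides.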

Thanks in advance.

@ssingh05

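We are afraid this is not possible with PDF files. Unlike Word files, which use a flow format, PDF files are fixed-format documents: content is placed at absolute positions on the page, with no paragraph-to-image hierarchy to iterate over. You can only extract images, paragraphs, and tables separately.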
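
Because the elements can only be extracted separately, any pairing has to be rebuilt afterwards. One possible heuristic, which is not an Aspose.PDF feature, is to match each image to the nearest text block above it on the same page. The sketch below assumes each extracted element already carries a page number and vertical bounds. The `Block` type, its fields, and the coordinates are all hypothetical, and the approach will misfire on multi-column layouts, captions below images, or images that start a new page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical extracted element with its position on the page."""
    content: str
    page: int
    y_top: float     # assumes PDF-style coordinates: y grows upward
    y_bottom: float


def nearest_text_above(image: Block, texts: List[Block]) -> Optional[Block]:
    """Return the text block on the same page that ends closest above the image."""
    candidates = [
        t for t in texts
        if t.page == image.page and t.y_bottom >= image.y_top
    ]
    # Smallest vertical gap between the text's bottom edge and the image's top edge.
    return min(candidates, key=lambda t: t.y_bottom - image.y_top, default=None)


# Made-up coordinates for illustration only.
texts = [
    Block("Paragraph1", page=1, y_top=780, y_bottom=760),
    Block("Paragraph2", page=1, y_top=750, y_bottom=730),
    Block("Paragraph3", page=1, y_top=400, y_bottom=380),
]
images = [
    Block("Image1", page=1, y_top=720, y_bottom=620),
    Block("Image2", page=1, y_top=610, y_bottom=510),
    Block("Image3", page=1, y_top=370, y_bottom=270),
]

matches = []
for image in images:
    text = nearest_text_above(image, texts)
    matches.append((text.content if text else None, image.content))
```

This recovers only visual proximity, not the document's logical structure, so results should be checked against real files before relying on them.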