Extract images from a document based on figure captions using Java

Swaran · June 15, 2020, 12:30pm

Hi team,
I have shared the input file with the parallel figure captions, as well as the expected output.

Input File :: Input.zip (175.9 KB)

Output File :: Output.zip (88.9 KB)

Thanks in advance!

tahir.manzoor · June 15, 2020, 7:53pm

In your case, we suggest the following solution:

1. Iterate over the paragraph nodes of the document. You can get them using the CompositeNode.getChildNodes method.
2. Get the text of each Paragraph node using the Node.toString method.
3. If the paragraph text contains “Fig” twice (a parallel caption), get the node that precedes the paragraph.
4. Get the Shape nodes from that previous node using the CompositeNode.getChildNodes method.
5. Import the Shape nodes into a new document and save it to PDF. You can use the NodeImporter.importNode method to import a node from one document into another.
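The selection logic in steps 1 to 4 can be sketched in plain Java without Aspose.Words. This is a minimal, library-independent sketch. It assumes a hypothetical `Para` class (paragraph text plus the names of the images it holds) in place of real document nodes. It also reads "contains Fig twice" as "at least two occurrences of the substring Fig". With Aspose.Words for Java, the paragraphs would likely come from `doc.getChildNodes(NodeType.PARAGRAPH, true)` and the previous node from `getPreviousSibling()`. Step 5 would then use `NodeImporter.importNode` before saving the new document as PDF. Those calls are not exercised here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FigCaptionMatcher {

    // Hypothetical, library-independent stand-in for a paragraph node:
    // its plain text plus the names of any images (shapes) it contains.
    static final class Para {
        final String text;
        final List<String> images;

        Para(String text, List<String> images) {
            this.text = text;
            this.images = images;
        }
    }

    // Counts non-overlapping occurrences of token in text.
    static int countOccurrences(String text, String token) {
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(token, from)) != -1) {
            count++;
            from += token.length();
        }
        return count;
    }

    // Walks the paragraphs in order. When a paragraph's text contains "Fig"
    // at least twice (assumed to mark a parallel caption), it collects the
    // images held by the paragraph immediately before it.
    static List<String> imagesForParallelCaptions(List<Para> paragraphs) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i < paragraphs.size(); i++) {
            if (countOccurrences(paragraphs.get(i).text, "Fig") >= 2) {
                result.addAll(paragraphs.get(i - 1).images);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample content, not taken from the attached Input.zip.
        List<Para> doc = Arrays.asList(
            new Para("Introduction text.", Collections.emptyList()),
            new Para("", Arrays.asList("image1.png", "image2.png")),
            new Para("Fig. 1 Left view    Fig. 2 Right view", Collections.emptyList()),
            new Para("", Arrays.asList("image3.png")),
            new Para("Fig. 3 Single figure", Collections.emptyList()));
        System.out.println(imagesForParallelCaptions(doc)); // [image1.png, image2.png]
    }
}
```

Running `main` prints `[image1.png, image2.png]`. Only the caption line with two "Fig" markers selects the images above it. A plain substring match also counts words such as "Figure", so real documents may need a stricter pattern, for example a regular expression for "Fig." followed by a number.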

Hope this helps you.