Thanks for your inquiry. Please share your input, existing output, and expected output documents here as a ZIP file. We will look into them and guide you accordingly.
Thanks for sharing the resources, but I am afraid the Wave Propagations(2).zip and expected output.zip files contain the same results.
page13_Fig_1_Fig_3.zip
However, as I understand it, you want to extract each image together with its caption. Please find the updated code snippet below. Hopefully it will help you accomplish the task.
import com.aspose.words.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List; // single-type import wins over com.aspose.words.List

public class ExtractFiguresWithCaptions
{
    public static void main(String[] args) throws Exception
    {
        Document interimdoc = new Document("Wave Propagation(004)_page13.docx");
        int i = 1;

        // Get the paragraphs that start with "Fig" (the captions).
        for (Paragraph paragraph : (Iterable<Paragraph>) interimdoc
                .getChildNodes(NodeType.PARAGRAPH, true))
        {
            if (!paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
                continue;

            // Collect the caption together with the image paragraphs directly above it.
            List<Node> nodes = new ArrayList<>();
            nodes.add(paragraph);

            Node previousPara = paragraph.getPreviousSibling();
            // Walk backwards over empty paragraphs that contain at least one shape.
            while (previousPara != null
                    && previousPara.getNodeType() == NodeType.PARAGRAPH
                    && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
                    && ((Paragraph) previousPara)
                            .getChildNodes(NodeType.SHAPE, true).getCount() > 0)
            {
                nodes.add(previousPara);
                previousPara = previousPara.getPreviousSibling();
            }

            // Restore document order: image(s) first, caption last.
            Collections.reverse(nodes);

            // Export the consecutive shapes and the caption into a new document.
            Document dstDoc = new Document();
            dstDoc.removeAllChildren();
            dstDoc.ensureMinimum();

            NodeImporter importer = new NodeImporter(interimdoc, dstDoc,
                    ImportFormatMode.KEEP_SOURCE_FORMATTING);
            for (Node node : nodes)
            {
                Node newNode = importer.importNode(node, true);
                dstDoc.getFirstSection().getBody().appendChild(newNode);
            }

            // Save once, after every node for this figure has been appended.
            dstDoc.save("E:/data/image_" + i + ".docx");
            i++;
        }
    }
}
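Thanks for your feedback. Please note that the while condition fails for Fig 6 because the paragraph containing its image also contains some text runs, so that paragraph's text is not empty. You can remove the condition that requires the previous paragraph's text to be empty (previousPara.toString(SaveFormat.TEXT).trim().length() == 0) from the while loop; this should resolve the issue.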
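The effect of dropping the empty-text check can be illustrated without Aspose.Words. The sketch below is only a model: the Para record (paragraph text plus shape count), the group method, and the sample paragraphs are hypothetical stand-ins for the real document, and the body is assumed to contain paragraphs only. It applies the same backward walk as the while loop, once with the empty-text check and once without it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CaptionGrouping {

    // Hypothetical stand-in for an Aspose.Words Paragraph:
    // its plain text and the number of shapes (images) it contains.
    record Para(String text, int shapeCount) {}

    // For each paragraph starting with "Fig", returns the indices that would be
    // exported: the shape paragraphs directly above it, then the caption itself.
    static List<List<Integer>> group(List<Para> body, boolean requireEmptyText) {
        List<List<Integer>> groups = new ArrayList<>();
        for (int c = 0; c < body.size(); c++) {
            if (!body.get(c).text().trim().startsWith("Fig")) continue;
            List<Integer> nodes = new ArrayList<>();
            nodes.add(c);
            int prev = c - 1;
            // Same walk as the while loop; the text check is optional here.
            while (prev >= 0
                    && (!requireEmptyText || body.get(prev).text().trim().isEmpty())
                    && body.get(prev).shapeCount() > 0) {
                nodes.add(prev);
                prev--;
            }
            Collections.reverse(nodes); // document order: images, then caption
            groups.add(nodes);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical sample body; index 3 holds an image plus a text run.
        List<Para> body = List.of(
                new Para("", 1),
                new Para("Fig 1 caption", 0),
                new Para("Body text", 0),
                new Para("(a)", 1),
                new Para("Fig 6 caption", 0));
        System.out.println(group(body, true));  // prints [[0, 1], [4]]
        System.out.println(group(body, false)); // prints [[0, 1], [3, 4]]
    }
}
```

With the check in place, the second caption is exported alone because the image paragraph above it also has text; without the check, that paragraph is picked up. The trade-off is that any text sitting in the same paragraph as the image is now exported as well.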
Furthermore, please check the Document Explorer example. It will help you understand the document object model (DOM) of a document and tune your code accordingly.