The images are extracted using the paragraph node that contains the fig caption as the keyword for the extraction process.
The extraction reads the next sibling and extracts the images.
But in the new source document the fig caption is the previous sibling of the images, and the document also has consecutive images.
So the extraction process skips some images.
Please help me extract the images using the fig caption as the previous sibling.
The input document has both previous-sibling and next-sibling captions, with consecutive images.
In that case, the images with a next-sibling caption are processed first.
Some input samples have previous captions only. In that case too, the images with a next-sibling caption are processed first and only then are the previous-sibling cases considered. It also takes the 2nd fig caption for the 1st image and finally skips images. Please help me resolve this issue, and please provide a solution for this and the previous post.
It also grabs the images that have the fig caption as their next sibling. Please let me know how to delete empty pages.
I am calling this method (dstdoc.removeallchildren) before appending into the document, but it does not remove the empty pages either. Also, how can I overcome the clashes between the previous-sibling and next-sibling captions?
I have attached the sample document: Test.zip (327.2 KB)
The expected output: expected output.zip (327.8 KB)
Please make sure that you are integrating the code correctly.
We have not found this issue while using the shared code example. Could you please share some more detail about this issue? We will investigate the issues and provide you more information on this.
The sample you shared is fine. It gave the expected output when the fig caption is the previous sibling.
But I am extracting the images from various sections.
For some documents the output is mismatched, because the actual image has the fig caption as its next sibling, but the code treats the previous sibling as the fig caption. For example, fig 5 comes out as fig 4.
The actual output: Actual output.zip (1.1 MB). The output folder contains empty documents.
Also, fig 5 is extracted as fig 4, and fig 8 is extracted as fig 7.
The code shared in this forum thread to extract the shapes works fine. We used Test2.docx as input document and have not found any issue.
As per my understanding, you have a document that contains shapes with Fig captions. Some Fig captions are before the shape node and some are after it. There is no exact condition by which we can decide whether the Fig caption is before or after the Shape node.
The code shared with you works fine. You just need to use it according to your requirement. Hope this answers your query.
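One way to reason about the before/after ambiguity is to pick a single convention per document or per section. The following is a minimal plain-Java sketch, not Aspose.Words code. It reduces each paragraph to a caption, an image, or other text, and assumes that whichever of caption or image appears first sets the convention for the whole sequence. The names `CaptionPairing`, `Kind`, `captionsComeFirst`, and `pair` are hypothetical, and the "first one wins" rule is an assumption that will not hold for a sequence that mixes both layouts.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model: every body paragraph is reduced to CAPTION, IMAGE or OTHER. */
class CaptionPairing {
    enum Kind { CAPTION, IMAGE, OTHER }

    /** Assumed heuristic: the first caption or image seen decides the layout. */
    static boolean captionsComeFirst(List<Kind> kinds) {
        for (Kind k : kinds) {
            if (k == Kind.CAPTION) return true;
            if (k == Kind.IMAGE) return false;
        }
        return true; // no captions or images at all, so the answer does not matter
    }

    /**
     * Returns one list per caption, in document order, holding the indices
     * of the images assigned to that caption. OTHER paragraphs are ignored.
     */
    static List<List<Integer>> pair(List<Kind> kinds) {
        boolean before = captionsComeFirst(kinds);
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> pending = new ArrayList<>(); // images waiting for a caption that follows them
        for (int i = 0; i < kinds.size(); i++) {
            Kind k = kinds.get(i);
            if (k == Kind.CAPTION) {
                if (before) {
                    groups.add(new ArrayList<>()); // images that follow will be added here
                } else {
                    groups.add(pending);           // this caption closes the images collected so far
                    pending = new ArrayList<>();
                }
            } else if (k == Kind.IMAGE) {
                if (before) {
                    // In this mode a caption has always been seen before the first image.
                    groups.get(groups.size() - 1).add(i);
                } else {
                    pending.add(i);
                }
            }
        }
        // In "caption after" mode, trailing images with no caption are left unassigned.
        return groups;
    }
}
```

With this rule, a sequence such as caption, image, image, caption, image yields two groups (two images, then one), and the mirrored sequence image, image, caption, image, caption yields the same group sizes with the captions following their images. In a real document the same decision could be made once per section before running the extraction loop.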
Thanks for your inquiry. The following code example shows how to bookmark the Fig caption and Shape nodes. It also removes the content of Bookmark1 (the first Fig caption and its Shape nodes). Hope this helps you.
import com.aspose.words.*;
import java.util.ArrayList;

// MyDir is the path of the working folder, defined elsewhere.
Document doc = new Document(MyDir + "Test2.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int i = 1;
int bookmark = 1;
ArrayList<Paragraph> nodes = new ArrayList<Paragraph>();
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);

// Get the paragraphs that start with "Fig".
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Start the bookmark at the beginning of the caption paragraph.
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.startBookmark("Bookmark" + bookmark);

        // Collect the consecutive text-free paragraphs that contain shapes.
        Paragraph lastPara = paragraph;
        Node nextPara = paragraph.getNextSibling();
        while (nextPara != null
                && nextPara.getNodeType() == NodeType.PARAGRAPH
                && nextPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph) nextPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            lastPara = (Paragraph) nextPara;
            nodes.add(lastPara);
            nextPara = nextPara.getNextSibling();
        }

        // nextPara is now the node after the last shape (usually the caption of
        // the next shape), or null at the end of the body, so it is not dereferenced.
        // Move the cursor to the end of the last collected paragraph.
        builder.moveToParagraph(paragraphs.indexOf(lastPara), -1);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;

        // Extract the consecutive shapes and export them into a new document.
        Document dstDoc = new Document();
        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        for (Paragraph para : nodes)
        {
            Node newNode = importer.importNode(para, true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
        }
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
        nodes.clear();
    }
}

// Remove the content of the first bookmark.
doc.getRange().getBookmarks().get("Bookmark1").setText("");
doc.save(MyDir + "output.docx");
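The empty output documents reported earlier in the thread are consistent with how the loop is written: a document is saved for every "Fig" paragraph, even when no shape paragraphs directly follow it. The following plain-Java sketch models that grouping step without Aspose.Words and skips captions that collect nothing. The `FigureGroups` class, the `countImagesAfterCaptions` method, and the `<image>` marker for a text-free shape paragraph are hypothetical stand-ins used only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the grouping loop; paragraphs are represented by their text. */
class FigureGroups {
    /** Hypothetical marker for a paragraph that has no text and holds a shape. */
    static final String IMAGE = "<image>";

    /**
     * For each paragraph starting with "Fig", counts the image paragraphs that
     * directly follow it. Captions followed by no images are left out, which
     * corresponds to not saving an output document for them.
     */
    static Map<String, Integer> countImagesAfterCaptions(List<String> paragraphs) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < paragraphs.size(); i++) {
            String text = paragraphs.get(i).trim();
            if (!text.startsWith("Fig")) {
                continue;
            }
            int count = 0;
            int j = i + 1;
            // Same stop condition as the loop above: end of list or first non-image paragraph.
            while (j < paragraphs.size() && paragraphs.get(j).equals(IMAGE)) {
                count++;
                j++;
            }
            if (count > 0) {
                result.put(text, count); // only non-empty groups produce output
            }
        }
        return result;
    }
}
```

Applied to the Aspose.Words loop, the equivalent change would be to save dstDoc and increment i only when the nodes list is not empty. Whether that matches the expected output numbering depends on the documents, so treat it as a suggestion to verify.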