Thanks for sharing your requirements in detail. Please allow us some time to analyze your desired output. We will get back to you soon with a code example that meets your requirements.
Best Regards,
Tahir Manzoor
Thank you… waiting eagerly for reply…
Thanks for your patience. Please use the following code example to achieve your requirement. Hope this helps you.
// Imports needed: com.aspose.words.*, java.util.ArrayList
// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "Imageproblem.docx");
int i = 1;
ArrayList<Paragraph> nodes = new ArrayList<Paragraph>();

// Remove empty paragraphs (no text, no shapes, no page break).
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (paragraph.toString(SaveFormat.TEXT).trim().length() == 0
            && paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
            && !paragraph.getText().contains(ControlChar.PAGE_BREAK)) {
        paragraph.remove();
    }
}

// Get the paragraphs that start with "Fig".
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")) {
        // Walk backwards over the image-only paragraphs directly above the caption.
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
            nodes.add((Paragraph) previousPara);
            previousPara = previousPara.getPreviousSibling();
        }

        // Extract the consecutive shapes and export them into a new document.
        Document dstDoc = new Document();
        for (Paragraph para : nodes) {
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode(para, true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
        }
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
        nodes.clear();
    }
}
Thank you Tahir… It is absolutely working.
Regards
Priya Dharshini J P
Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.
Thank you, Tahir.
When I run the above logic on the attached input document, test15.zip (363.8 KB), the output is created as expected, but many blank documents are created as well. Is there a way to create only the documents that contain images and avoid creating blank ones?
Regards
Priya Dharshini J P
Mismatch.zip (448.0 KB)
Hi team,
With the above logic, the grouped images from the problem document are written to the output in reverse order. The expected output is attached. Kindly help.
Regards
Priya Dharshini J P
Hi team,
I also request a way to delete the extracted content from the source document after it has been embedded in the new document, so that images are not repeated.
Thanking you
Due to time constraints, I request a solution as soon as possible.
Regards
Priya Dharshini J P
Thanks for your inquiry. You want to extract the images that appear before text starting with "Fig" or "Figure" in a Word document. You also want to remove the empty paragraphs from the output document. We already shared the solution to these queries in the following thread. Please use the same approach and modify the code according to your use cases.
Best Regards,
Tahir Manzoor
But my problem is that the grouped images are created in reverse order compared with the source document, and many blank documents are created during execution. In addition, I request a way to delete the extracted images from the source document after execution.
Thanking you
Thank you @tahir.manzoor
using the code mentioned above,
Document doc = new Document(MyDir + "Imageproblem.docx");
int i = 1;
ArrayList nodes = new ArrayList();
//Remove empty paragraphs
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (paragraph.toString(SaveFormat.TEXT).trim().length() == 0
            && paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
            && paragraph.getText().contains(ControlChar.PAGE_BREAK) == false) {
        paragraph.remove();
    }
}
//Get the paragraphs that start with "Fig".
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph)previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            if(previousPara != null)
                nodes.add(previousPara);
            previousPara = previousPara.getPreviousSibling();
        }
        //Extract the consecutive shapes and export them into new document
        Document dstDoc = new Document();
        for (Paragraph para : (Iterable<Paragraph>)nodes)
        {
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode(para, true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
        }
        dstDoc.save(MyDir + "output"+i+".docx");
        i++;
        nodes.clear();
    }
}
I have the following difficulties:
Many blank (empty) documents are created during execution.
Consecutive images (group images) appear in reverse order.
(For example: if the source document has 3 consecutive images, the first image appears last and the last image appears first in the output document.)
After the images are extracted to a new document, I request a workaround to delete those images from the source document, in order to avoid the same image being extracted again.
I am in need of such a workaround, hope you can help me out.
Thanking you for helping out.
The above mentioned solution is working fine for consecutive images except for the reversed order.
Regards
Priya
Thanks for your inquiry. We have modified the code according to your requirements. Please use the following modified code example.
// Imports needed: com.aspose.words.*, java.util.ArrayList, java.util.Collections
// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "Problem.docx");
int i = 1;
ArrayList<Paragraph> nodes = new ArrayList<Paragraph>();

// Remove empty paragraphs (no text, no shapes, no page break).
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (paragraph.toString(SaveFormat.TEXT).trim().length() == 0
            && paragraph.getChildNodes(NodeType.SHAPE, true).getCount() == 0
            && !paragraph.getText().contains(ControlChar.PAGE_BREAK)) {
        paragraph.remove();
    }
}

// Get the paragraphs that start with "Fig".
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig")) {
        // Walk backwards over the image-only paragraphs directly above the caption.
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && previousPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
            nodes.add((Paragraph) previousPara);
            previousPara = previousPara.getPreviousSibling();
        }

        // Skip captions with no images above them, so no blank document is saved.
        if (nodes.size() > 0) {
            // The paragraphs were collected bottom-up; reverse them to restore document order.
            Collections.reverse(nodes);

            // Extract the consecutive shapes and export them into a new document.
            Document dstDoc = new Document();
            for (Paragraph para : nodes) {
                NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                Node newNode = importer.importNode(para, true);
                dstDoc.getFirstSection().getBody().appendChild(newNode);
            }

            // Remove the first empty paragraph.
            if (dstDoc.getFirstSection().getBody().getFirstParagraph().toString(SaveFormat.TEXT).trim().length() == 0)
                dstDoc.getFirstSection().getBody().getFirstParagraph().remove();

            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
            nodes.clear();
        }
    }
}
The above code example does not import duplicate images into the output document.
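To illustrate why the two changes help, here is a minimal plain-Java sketch that does not use Aspose.Words at all. It is only a model under stated assumptions: each string stands in for a paragraph, an entry starting with "img" stands in for an image-only paragraph, and the names FigureGroupSketch, isImageOnly and collectGroup are hypothetical. Walking backwards from a caption collects the images bottom-up, so the list is reversed before use, and a caption with no images above it yields an empty group that is skipped instead of being saved as a blank document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FigureGroupSketch {
    // Hypothetical stand-in for "paragraph with no text but at least one shape".
    static boolean isImageOnly(String paragraph) {
        return paragraph.startsWith("img");
    }

    // Collects the image-only entries directly above the caption, in document order.
    static List<String> collectGroup(List<String> paragraphs, int captionIndex) {
        List<String> group = new ArrayList<>();
        // Walking backwards visits the last image first...
        for (int j = captionIndex - 1; j >= 0 && isImageOnly(paragraphs.get(j)); j--) {
            group.add(paragraphs.get(j));
        }
        // ...so the collected list is reversed to restore the original order.
        Collections.reverse(group);
        return group;
    }

    public static void main(String[] args) {
        List<String> paragraphs = Arrays.asList(
                "Intro text", "img1", "img2", "img3", "Fig 1. Caption", "Fig 2. No image");
        for (int k = 0; k < paragraphs.size(); k++) {
            if (!paragraphs.get(k).startsWith("Fig")) {
                continue;
            }
            List<String> group = collectGroup(paragraphs, k);
            if (group.isEmpty()) {
                continue; // nothing above this caption, so no output is produced
            }
            // Prints: Fig 1. Caption -> [img1, img2, img3]
            System.out.println(paragraphs.get(k) + " -> " + group);
        }
    }
}
```

Only the first caption produces output here; the second caption has no images above it and is skipped.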
Best Regards,
Tahir Manzoor
Thank you @tahir.manzoor, the code works exactly as we expected. However, because inline images are also extracted by the other mechanisms you described earlier, we get duplicate images. To avoid that, we request a way to delete the extracted images from the source document after extraction. I am very thankful for your continuous support and solutions; the replies from @tahir.manzoor have helped us a great deal.
Thanking You
Priya Dharshini J P
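One possible way to think about this request is sketched below in plain Java, without Aspose.Words. It is a model only: strings stand in for paragraphs, entries starting with "img" stand in for image-only paragraphs, and ExtractAndRemoveSketch, isImageOnly and extractAndRemove are hypothetical names. Each group is copied out in document order and then deleted from the source, so a later extraction pass over the same source cannot pick it up again. In the real code, the corresponding step would presumably be to call remove() on each collected source paragraph after it has been imported into the new document; this has not been tested against the attached files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExtractAndRemoveSketch {
    // Hypothetical stand-in for "paragraph with no text but at least one shape".
    static boolean isImageOnly(String paragraph) {
        return paragraph.startsWith("img");
    }

    // Moves each image group that sits directly above a "Fig" caption out of the
    // source list. Returns one list per output document; the source is modified.
    static List<List<String>> extractAndRemove(List<String> source) {
        List<List<String>> outputs = new ArrayList<>();
        for (int k = 0; k < source.size(); k++) {
            if (!source.get(k).startsWith("Fig")) {
                continue;
            }
            // Find where the run of image-only entries above the caption begins.
            int start = k;
            while (start > 0 && isImageOnly(source.get(start - 1))) {
                start--;
            }
            if (start == k) {
                continue; // no images above this caption, so nothing to extract
            }
            List<String> run = source.subList(start, k);
            outputs.add(new ArrayList<>(run)); // copy, already in document order
            run.clear();                       // delete the extracted entries from the source
            k = start;                         // the caption has shifted to this index
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(Arrays.asList(
                "Intro", "img1", "img2", "Fig 1", "text", "img3", "Figure 2"));
        System.out.println(extractAndRemove(source)); // extracted groups
        System.out.println(source);                   // source with the images removed
        System.out.println(extractAndRemove(source)); // second pass finds nothing
    }
}
```

Running the extraction a second time on the same source returns no groups, which is the behaviour the request is after.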
Thanks for your inquiry. We have not found the duplicate images issue in output documents. Could you please share the following resources here for testing?
Thanks for your cooperation.
Best Regards,
Tahir Manzoor
Code.zip (7.1 KB)
Support.zip (1.9 MB)
test (15).zip (363.8 KB)
test (8).zip (2.7 MB)
test (2).zip (2.7 MB)
Hi @tahir.manzoor,
I have attached the code we have been using for extraction, which is based on your solutions, along with the test files that produce duplicates when executed.
Thank you for all the help, @tahir.manzoor. We await a solution.
Regards
Priya Dharshini J P
Thanks for sharing the input documents. In case you are using an old version of Aspose.Words, we suggest that you use the latest version, Aspose.Words for Java 17.6. We have not found any duplicate images in the output documents while using the code example shared in the following post.
Output documents : output test 8.zip (1.9 MB)
output test (2).zip (1.6 MB)
output test (15).zip (260.5 KB)
Best Regards,
Tahir Manzoor