Remove the extracted image from the document

Hi Team,

My requirement is to copy/extract shapes from a DOCX file into a new DOCX file, based on paragraphs whose figure caption starts with “Fig”.

The issue is that I need a way to delete the extracted content from the source document after embedding it in the new document, in order to avoid repeated images. Please help me solve this issue.

Thanks & Regards,
Priyanga G

@priyanga,

Thanks for your inquiry. Please ZIP and attach your input Word document and your expected document here for testing. Please create the expected document by using MS Word. We will then provide you with code to produce output similar to your expected document by using Aspose.Words.

Hi @awais.hafeez,

Thanks for your feedback. Please help me solve the issues below.

  1. I have attached the input sample and the expected output.
  2. I have attached the sample code, which produces repeated images. Please help me modify the code so that the extracted content is removed from the source document after it is embedded in the new document.

1) Sample 1: sampleforpartfigure.zip (1.6 MB)

Expected output: expected Output.zip (1.4 MB)

Actual Output: Actual Output.zip (2.1 MB)

2) Sample source: sample source.zip (882 Bytes)

Thanks & Regards,
Priyanga G

@priyanga,

The following code example will copy the last Shape to a new Document and save it. It also removes the last Shape from the source document. You can build on the same logic.

Document doc = new Document("D:\\temp\\sampleforpartfigure\\sampleforpartfigure.docx");

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
int count = shapes.getCount();

// Save the target Shape in a new Document
Shape lastShape = (Shape) shapes.get(count - 1);
Document dstDoc = (Document) doc.deepClone(false);
dstDoc.ensureMinimum();
Shape importedShape = (Shape) dstDoc.importNode(lastShape, true);
dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(importedShape);
dstDoc.save("D:\\Temp\\sampleforpartfigure\\fig.docx");

// Remove the shape from Source Word document
lastShape.remove();

doc.save("D:\\Temp\\sampleforpartfigure\\awjava-18.6.docx");

Hi @awais.hafeez,

Thanks for your quick reply.

I am still facing the same problem. I call paragraph.remove() at the end of the extraction, but this does not work either.

I have attached the sample code, which produces repeated images. Please help me modify the code so that the extracted content is removed from the source document after it is embedded in the new document.

Sample source: sample source.zip (882 Bytes)

Thanks & Regards,
Priyanga G

@priyanga,

You can define an ArrayList. Inside the for loop, collect the nodes you want to delete (add the Paragraphs/Shapes to the ArrayList); after the loop ends, iterate through the ArrayList and remove each item. Hope this helps.

Hi @awais.hafeez,

Thanks for your feedback.

As you suggested, I defined an ArrayList, collected the nodes inside the for loop, and finally cleared the list. This produces the same issue. I have attached the modified source here. Please help me solve this issue.

Sample source: source sample.zip (1.1 KB)

Thanks & Regards,
Priyanga G

@priyanga,

The ArrayList.clear() method will not work: it only empties the list and does not detach the nodes from the document. You need to iterate through the items in the ArrayList and call remove() on each one. Here is sample code:

// Requires: import com.aspose.words.*; import java.util.ArrayList; import java.util.List;
List<Node> nodesToBeRemoved = new ArrayList<>();

Document doc = new Document("D:\\temp\\sampleforpartfigure\\sampleforpartfigure.docx");

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);

int i = 0;
for (Shape shape : (Iterable<Shape>) shapes) {
    // Copy the shape into a new, empty document that keeps the source document's settings
    Document dstDoc = (Document) doc.deepClone(false);
    dstDoc.ensureMinimum();
    Shape importedShape = (Shape) dstDoc.importNode(shape, true);
    dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(importedShape);
    dstDoc.save("D:\\Temp\\sampleforpartfigure\\fig_" + i + ".docx");

    // Add the Paragraphs or Shapes that you want to remove to this list
    nodesToBeRemoved.add(shape);
    i++;
}

// Detach the collected nodes from the source document only after the loop has finished
for (Node node : nodesToBeRemoved) {
    node.remove();
}

doc.save("D:\\Temp\\sampleforpartfigure\\awjava-18.6.docx");
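
To see why clearing the list has no effect on the document, here is a minimal plain-Java sketch. It does not use Aspose.Words at all: SimpleNode and ClearVersusRemoveDemo are hypothetical stand-ins, and the only assumption is that a document node, like SimpleNode here, stays attached to its parent until remove() is called on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document node; not an Aspose.Words class.
class SimpleNode {
    final String name;
    SimpleNode parent;
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name) {
        this.name = name;
    }

    // Attaches the child to this node and returns the child.
    SimpleNode append(SimpleNode child) {
        child.parent = this;
        children.add(child);
        return child;
    }

    // Detaches this node from its parent, if it has one.
    void remove() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
    }
}

class ClearVersusRemoveDemo {
    // Queues two children for removal, then only clears the queue.
    static int childrenLeftAfterClear() {
        SimpleNode body = new SimpleNode("body");
        List<SimpleNode> toRemove = new ArrayList<>();
        toRemove.add(body.append(new SimpleNode("shape")));
        toRemove.add(body.append(new SimpleNode("caption")));
        toRemove.clear(); // empties the list; the tree itself is untouched
        return body.children.size();
    }

    // Queues two children for removal, then removes each one from the tree.
    static int childrenLeftAfterRemove() {
        SimpleNode body = new SimpleNode("body");
        List<SimpleNode> toRemove = new ArrayList<>();
        toRemove.add(body.append(new SimpleNode("shape")));
        toRemove.add(body.append(new SimpleNode("caption")));
        for (SimpleNode node : toRemove) {
            node.remove();
        }
        return body.children.size();
    }

    public static void main(String[] args) {
        System.out.println("children left after clear():  " + childrenLeftAfterClear());  // 2
        System.out.println("children left after remove(): " + childrenLeftAfterRemove()); // 0
    }
}
```

The list is only a to-do list of references; what changes the tree is the remove() call on each node.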

Hi @awais.hafeez,

Thanks for your feedback.

Please help me implement the node removal in the following code.

Sample code: source sample.zip (1.1 KB)

Thanks & regards,
Priyanga G

@priyanga,

You can build on the following code to achieve what you are looking for:

// Requires: import com.aspose.words.*; import java.util.ArrayList; import java.util.List;
List<Node> nodesToBeRemoved = new ArrayList<>();
Document doc = new Document("D:\\temp\\sampleforpartfigure\\sampleforpartfigure.docx");

int i = 0;
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.toString(SaveFormat.TEXT).trim().startsWith("Figure")) {
        // The image is expected in the paragraph directly before the caption.
        // Skip captions where the previous sibling is missing or is not a Paragraph (e.g. a Table).
        Node prev = para.getPreviousSibling();
        if (!(prev instanceof Paragraph)) {
            continue;
        }
        NodeCollection prevShapes = ((Paragraph) prev).getChildNodes(NodeType.SHAPE, true);
        if (prevShapes.getCount() == 0) {
            continue;
        }
        Shape shape = (Shape) prevShapes.get(0);

        Document dstDoc = (Document) doc.deepClone(false);
        dstDoc.ensureMinimum();
        Shape importedShape = (Shape) dstDoc.importNode(shape, true);
        dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(importedShape);
        dstDoc.save("D:\\Temp\\sampleforpartfigure\\fig_" + i + ".docx");

        // Add the Paragraphs or Shapes that you want to remove to this list
        nodesToBeRemoved.add(shape);
        nodesToBeRemoved.add(para);
        i++;
    }
}

// Detach the collected nodes from the source document only after the loop has finished
for (Node node : nodesToBeRemoved) {
    node.remove();
}

doc.save("D:\\temp\\sampleforpartfigure\\awjava-18.6.docx");

We also recommend reading the following section of the documentation to get familiar with the Document Object Model of Aspose.Words:
Aspose.Words Document Object Model
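
The original question asks for captions starting with “Fig”, whereas the code above tests for "Figure". Here is a minimal sketch of a looser check, assuming a case-insensitive prefix test is acceptable for your documents. CaptionMatcher and isFigureCaption are hypothetical names, not part of Aspose.Words.

```java
import java.util.Locale;

class CaptionMatcher {
    // Hypothetical helper: returns true when the text, ignoring leading
    // whitespace and letter case, starts with "fig".
    static boolean isFigureCaption(String paragraphText) {
        if (paragraphText == null) {
            return false;
        }
        return paragraphText.trim().toLowerCase(Locale.ROOT).startsWith("fig");
    }

    public static void main(String[] args) {
        System.out.println(isFigureCaption("Fig. 1 Overview"));    // true
        System.out.println(isFigureCaption("  Figure 2: Layout")); // true
        System.out.println(isFigureCaption("Table 1: Results"));   // false
    }
}
```

It could stand in for the startsWith("Figure") test by passing it para.toString(SaveFormat.TEXT). Be aware that a prefix test this loose also matches ordinary sentences that happen to begin with "Fig", so a stricter pattern may be needed for real documents.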

Hi @awais.hafeez,

Thanks for your reply. I am still facing the same issue.

Following your code, I added the paragraphs to the list, then removed the nodes and saved the document. This still does not remove the image from the document. Please help me solve this issue. I have attached the modified code here.

I am awaiting your reply.

Source: source.zip (1.2 KB)

Thanks & Regards,
Priyanga G

@priyanga,

I think that, after executing your logic, you should remove all unwanted Shapes/Paragraphs from the source document at the end, just before calling the save method:

Document doc = new Document("D:\\temp\\sampleforpartfigure\\sampleforpartfigure.docx");

// your logic goes here

// Execute this new code just before Save. Note that it removes EVERY Shape in the document.
// toArray() takes a snapshot first, because the collection returned by getChildNodes
// is live and should not be modified while it is being enumerated.
for (Node shape : doc.getChildNodes(NodeType.SHAPE, true).toArray()) {
    shape.remove();
}
// Execute this new code to remove the caption Paragraphs just before Save
for (Node para : doc.getChildNodes(NodeType.PARAGRAPH, true).toArray()) {
    if (para.toString(SaveFormat.TEXT).trim().startsWith("Figure")) {
        para.remove();
    }
}

doc.save("D:\\temp\\sampleforpartfigure\\awjava-18.6.docx");
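
When removing items inside a loop, it is generally safer to iterate over a snapshot than over the collection being modified. Here is a plain-Java analogy, offered only as a sketch: SnapshotRemovalDemo is a hypothetical class, it uses no Aspose.Words types, and a java.util list fails loudly in this situation whereas a live document collection may misbehave in other ways. The remedy is the same in both cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class SnapshotRemovalDemo {
    static List<String> sampleParagraphs() {
        return new ArrayList<>(Arrays.asList("Figure 1", "Body text", "Figure 2"));
    }

    // Removes matching items while iterating the same list; reports whether that failed.
    static boolean removeWhileIteratingFails() {
        List<String> paragraphs = sampleParagraphs();
        try {
            for (String p : paragraphs) {
                if (p.startsWith("Figure")) {
                    paragraphs.remove(p); // modifies the list being iterated
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a copy, so removing from the original list is safe.
    static List<String> removeViaSnapshot() {
        List<String> paragraphs = sampleParagraphs();
        for (String p : new ArrayList<>(paragraphs)) {
            if (p.startsWith("Figure")) {
                paragraphs.remove(p);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        System.out.println("direct removal failed: " + removeWhileIteratingFails()); // true
        System.out.println("after snapshot removal: " + removeViaSnapshot());        // [Body text]
    }
}
```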

Hi @awais.hafeez,

Thanks for your reply.

In your previous post you included new code to remove unwanted Shapes just before Save. It deletes all shapes from the document, but I only want to remove the extracted images. I have attached the modified code here. Please help me delete only the extracted images from the document using the following code.

Source: source.zip (1.2 KB)

Please explain how to solve this issue.

Thanks & Regards,
Priyanga G

@priyanga,

Please try the following simple code and see how it goes on your end.

// Requires: import com.aspose.words.*; import java.util.ArrayList; import java.util.List;
List<Node> nodesToBeRemoved = new ArrayList<>();
Document doc = new Document("D:\\temp\\sampleforpartfigure\\sampleforpartfigure.docx");

int i = 0;
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.toString(SaveFormat.TEXT).trim().startsWith("Figure")) {
        // Walk backwards over the image paragraphs that directly precede the caption.
        // instanceof also stops the walk at a null sibling or a non-Paragraph node (e.g. a Table).
        Node prev = para.getPreviousSibling();
        int part = 0;
        while (prev instanceof Paragraph
                && ((Paragraph) prev).getChildNodes(NodeType.SHAPE, true).getCount() > 0) {
            Document dstDoc = (Document) doc.deepClone(false);
            dstDoc.ensureMinimum();
            Node importedPara = dstDoc.importNode(prev, true);
            dstDoc.getFirstSection().getBody().appendChild(importedPara);
            // Use a distinct file name per part so that a figure made of several
            // image paragraphs does not overwrite its own output.
            dstDoc.save("D:\\Temp\\sampleforpartfigure\\fig_" + i + "_" + part + ".docx");

            nodesToBeRemoved.add(prev);
            prev = prev.getPreviousSibling();
            part++;
        }

        nodesToBeRemoved.add(para);
        i++;
    }
}

// Detach only the extracted paragraphs and their captions from the source document
for (Node node : nodesToBeRemoved) {
    node.remove();
}

doc.save("D:\\temp\\sampleforpartfigure\\awjava-18.6.docx");