Removing equations from the document causes an issue

Hi @tahir.manzoor

Thanks for understanding.

The requirement is to read a Word document and extract its images by walking the paragraph nodes. Each image is stored in its own document in a separate folder. I have separate methods to handle single images, inline images and table images. Images that do not fall under these three categories are handled in section D of the source code. Section D extracts those images, but the equations are also extracted into the output. Please let me know how to remove the equations permanently.

Thank you very much.
priyanga

@priyanga,

Thanks for sharing the detail. You can use the following code snippet in DSMT4 method to remove the equation from the document. Hope this helps you.

if (shape.getOleFormat().getProgId().startsWith("Equation"))
{
    shape.getParentParagraph().remove();
}
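
A note on the check above, offered as a sketch rather than tested Aspose.Words code: as far as the API documentation describes it, getOleFormat() returns null for shapes that are not OLE objects, so the unguarded call may throw a NullPointerException on ordinary pictures. The helper below is hypothetical (EquationFilter and isEquationProgId are names made up for illustration, not part of Aspose.Words). It isolates the ProgId test in plain Java so it can be used after a null check on the OleFormat.

```java
/** Hypothetical helper, not part of Aspose.Words: decides whether an OLE ProgId names an equation object. */
final class EquationFilter {
    private EquationFilter() {
    }

    /**
     * Returns true when the ProgId starts with "Equation",
     * for example "Equation.3" or "Equation.DSMT4".
     * A null ProgId is treated as "not an equation".
     */
    static boolean isEquationProgId(String progId) {
        return progId != null && progId.trim().startsWith("Equation");
    }
}
```

Assumed usage inside the shape loop: read OleFormat ole = shape.getOleFormat(); and, only when ole != null && EquationFilter.isEquationProgId(ole.getProgId()), remove the equation. Calling shape.remove() instead of shape.getParentParagraph().remove() would delete only the equation and keep any other content in the same paragraph; which of the two fits depends on the expected output.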

Hi @tahir.manzoor

Thank you very much for the solution.

Once I used the parent paragraph remove() call in the DSMT4 method, it shows only the "extraction failed" message. I think it removes all paragraph nodes. Please let me know how to solve it.

Thanks & regards,
priyanga G

@priyanga,

Please open your input document “FeifeiShen-REVISE-2017_input.docx” in MS Word. There are equations on pages 5 and 6. Please share the problematic and expected output documents for these pages. We have already requested these documents. If you cannot supply this information, we will not be able to investigate your issue.

Please manually create your expected Word documents for pages 5 and 6 using Microsoft Word, then ZIP and attach them here for our reference. We will then provide a code example according to your requirement.

Hi @tahir.manzoor

Thank you very much.

The input document is FeifeiShen-REVISE-2017_input.zip (2.6 MB)

The expected output is FeifeiShen-REVISE-2017_old.zip (1.5 MB)

regards
priyanga

@priyanga,

Thanks for sharing the details. Perhaps you are not using the DSMT4 method and the extract-contents code correctly. Please use the DSMT4 method to remove the equations (shapes). This method works without any issue at our end. After removing the shapes, please use the same code shared with you earlier to extract the shapes.

In case you are using an older version of Aspose.Words, we suggest you upgrade to the latest version, Aspose.Words for Java 17.8.

We have tested the scenario using the following code example and have not found the shared issue. Please check the output documents: output documents.zip (2.1 MB)

DSMT4(MyDir + "FeifeiShen-REVISE-2017_input.docx");

/** SECTION D START **/
int i = 1;
Document interimdoc = new Document(MyDir + "FeifeiShen-REVISE-2017_input.docx");
NodeCollection shapes_otherimg = interimdoc.getChildNodes(NodeType.SHAPE, true);

for (Shape shape : (Iterable<Shape>) shapes_otherimg) {
    if (shape.hasImage() && shape.getParentParagraph().getNextSibling() != null
            && shape.getParentParagraph().getNextSibling().getNodeType() == NodeType.PARAGRAPH) {

        ArrayList nodes1 = ExtractContents.extractContent(shape.getParentParagraph(), shape.getParentParagraph(), true);

        ExtractContents.generateDocument(interimdoc, nodes1).save(MyDir + "output"+i+".docx");

        Paragraph fig = (Paragraph) shape.getParentParagraph();
        /**
         * REMOVAL OF NODE(START,END) FROM SOURCE WORD DOC START
         **/
        shape.getParentParagraph().insertBefore(new BookmarkStart(interimdoc, "Image_" + i), shape);
        fig.appendChild(new BookmarkEnd(interimdoc, "Image_" + i));
        i++;
        for (Bookmark bookmark : interimdoc.getRange().getBookmarks()) {
            if (bookmark.getName().startsWith("Image_")) {
                bookmark.setText("");
            }
        }

    }
}

Hi @tahir.manzoor

Now I am able to get the exact output. Thanks a lot.

Thanks & regards,
priyanga G

@priyanga,

Thanks for your feedback. It is nice to hear that your issue has been solved. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

Hi @tahir.manzoor,

We have a group shape in this document, but its images cannot be recovered; only the error message is shown.
We have already included the call NodeCollection Gshapes = interimdoc1.getChildNodes(NodeType.GROUP_SHAPE, true); but the images still cannot be recovered. Please let me know how to resolve it.

The input document is Test.zip (431.2 KB)

The output is Output.zip (837.1 KB)

Thanks & Regards
priyanga G

@priyanga,

Thanks for your inquiry. Please use the following code example to export each group shape into a new document. Hope this helps you.

Document srcDoc = new Document(MyDir + "Test.doc");
NodeCollection groupShapes = srcDoc.getChildNodes(NodeType.GROUP_SHAPE, true);
int i = 1;
for (GroupShape groupShape : (Iterable<GroupShape>) groupShapes) {

    Document doc = new Document();
    NodeImporter imp = new NodeImporter(srcDoc, doc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node impNode = imp.importNode(groupShape, true);
    doc.getFirstSection().getBody().getFirstParagraph().appendChild(impNode);
    doc.save(MyDir + "output"+i+".docx");
    i++;
}

Hi @Tahir,

Thank you very much.

Thanks & regards,
priyanga G

@priyanga,

Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

Hi @tahir,

Thanks for your update. I am able to extract the group shape images; thanks for your help. However, some of the images are still not extracted from the Word document. Please let me know how to extract those images.

The source code is NewAip(1).zip (46.3 KB)
The input document is Test.zip (2.7 MB)
Thanks & Regards
priyanga G

@priyanga,

Thanks for your inquiry. Could you please share the page numbers of the input document that contain group shapes which are not exported? We will investigate the issue and provide you with more information.

Hi @tahir.manzoor

Thank you very much for your timely reply. The page number is 40. Once the group shape image is reached, the extraction fails. Please help me resolve it.

Thanks and regards,
priyanga G

@priyanga,

Thanks for sharing the details. The shared page does not contain a GroupShape. It has shapes inside a table. Please check the attached image of the Aspose.Words DOM for details. You can use the same approach to export the Table into the output document. DOM.png (12.5 KB)

Thank you very much for providing the solution.

Thanks & Regards,
Priyanga.G

@priyanga,

Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

Hi tahir,

Thanks for all your help and support. The requirement now is to get the figure caption for a group shape and print the caption name along with the file name. Please help me get the figure caption.

Thanks & regards
priyanga G

@priyanga,

Thanks for your inquiry. In this case, we suggest the following solution.

  1. Iterate through all Shape nodes.

  2. If an ancestor node of the Shape is a Table, export the table into a new document.

  3. If an ancestor node of the Shape is a GroupShape, export the GroupShape into a new document.

  4. If the next sibling node of the Shape’s parent node is a Paragraph and it contains the text “Figure”, export the Shape’s paragraph and its next sibling into a new document.

     Document doc = new Document(MyDir + "test.docx");
     DocumentBuilder builder = new DocumentBuilder(doc);
     int i = 1;
     NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
     for (Shape shape : (Iterable<Shape>) shapes)
     {
         if (shape.hasChart() || shape.hasImage())
         {
             // Step 2: the shape sits inside a table, so export the whole table.
             if (shape.getAncestor(NodeType.TABLE) != null)
             {
                 Document dstDoc = new Document();
                 NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                 Node newNode = importer.importNode(shape.getAncestor(NodeType.TABLE), true);
                 dstDoc.getFirstSection().getBody().appendChild(newNode);
                 dstDoc.save(MyDir + "output" + i + ".docx");
                 i++;
             }

             // Step 3: the shape belongs to a group shape, so export the whole group.
             if (shape.getAncestor(NodeType.GROUP_SHAPE) != null)
             {
                 Document dstDoc = new Document();
                 NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                 Node newNode = importer.importNode(shape.getAncestor(NodeType.GROUP_SHAPE), true);
                 dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
                 dstDoc.save(MyDir + "output" + i + ".docx");
                 i++;
             }

             // Step 4: the node after the shape's parent is a caption paragraph.
             Node node = shape.getParentNode().getNextSibling();
             // Modify this condition according to your requirement
             if (node != null && node.getNodeType() == NodeType.PARAGRAPH
                     && node.toString(SaveFormat.TEXT).contains("Figure"))
             {
                 Document dstDoc = new Document();
                 NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                 Node newNode = importer.importNode(shape.getParentParagraph(), true);
                 dstDoc.getFirstSection().getBody().appendChild(newNode);

                 newNode = importer.importNode(shape.getParentParagraph().getNextSibling(), true);
                 dstDoc.getFirstSection().getBody().appendChild(newNode);

                 dstDoc.save(MyDir + "output" + i + ".docx");
                 i++;
             }
         }
     }
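
The question also asked for the caption name to appear in the file name, which the code above does not do (it always saves as "output" + i + ".docx"). One possible way, sketched here under assumptions: take the caption text that the last branch already reads with node.toString(SaveFormat.TEXT) and turn it into a safe file name. CaptionFileNames and fromCaption are hypothetical names invented for this illustration, and the 40-character limit and the "output" fallback are arbitrary choices.

```java
/** Hypothetical helper, not part of Aspose.Words: builds an output file name from a figure caption. */
final class CaptionFileNames {
    private CaptionFileNames() {
    }

    /**
     * Replaces whitespace and characters that are commonly invalid in file names
     * with underscores, shortens long captions, and appends the running index.
     * Falls back to "output" when the caption is null or blank.
     */
    static String fromCaption(String caption, int index) {
        String base = caption == null ? "" : caption.trim();
        // Collapse each run of whitespace or \ / : * ? " < > | into one underscore.
        base = base.replaceAll("[\\\\/:*?\"<>|\\s]+", "_");
        // Keep the name short (assumed limit of 40 characters).
        if (base.length() > 40) {
            base = base.substring(0, 40);
        }
        // Drop leading and trailing underscores or dots.
        base = base.replaceAll("^[_.]+|[_.]+$", "");
        if (base.isEmpty()) {
            base = "output";
        }
        return base + "_" + index + ".docx";
    }
}
```

Assumed usage in the caption branch: dstDoc.save(MyDir + CaptionFileNames.fromCaption(node.toString(SaveFormat.TEXT), i)); so a caption such as "Figure 3: Flow chart" with i = 2 would be saved as Figure_3_Flow_chart_2.docx.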