Removing empty pages

akshayapria · November 24, 2017, 12:32pm

Thanks a lot.

Another document contains group images.
I have added handling for those group images, but the image is still not extracted. Please help me resolve this.

The input Test.zip (431.2 KB)

The expected output expected output.zip (172.5 KB)

The actual output actual output.zip (111.2 KB)

regards,
pria

tahir.manzoor · November 24, 2017, 4:45pm

@akshayapria,

Thanks for your inquiry. In this scenario, your document contains GroupShape nodes. Please use the following code example to extract each GroupShape. Hope this helps you.

Document doc = new Document(MyDir + "test.doc");
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);

for (Node paragraph : (Iterable<Paragraph>) paragraphs)
{
    // Only caption paragraphs (text starting with "Fig") are of interest.
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk backwards in document order until a GroupShape or the Body is reached.
        Node node = paragraph.previousPreOrder(doc);
        while (node != null && node.getNodeType() != NodeType.GROUP_SHAPE && node.getNodeType() != NodeType.BODY)
        {
            node = node.previousPreOrder(doc);
        }
        if (node != null && node.getNodeType() == NodeType.GROUP_SHAPE)
        {
            // Import the GroupShape into a new document and save it under a numbered name.
            Document dstDoc = new Document();
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode(node, true);
            dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
        }
    }
}

akshayapria · November 27, 2017, 12:37pm

Hi @tahir.manzoor,

Thank you very much.

Issue 1: one more image is still not extracted.

The actual output is actual output.zip (121.5 KB)
Please help me extract the skipped one.

Thanks & regards,
pria

tahir.manzoor · November 28, 2017, 4:57am

@akshayapria,

Thanks for your inquiry. Please use the following code example to get the desired output.

// Needs: import java.util.LinkedHashSet; import java.util.Set;
Document doc = new Document(MyDir + "test.doc");
int i = 1;
// A set keeps each GroupShape once even if several captions resolve to it;
// LinkedHashSet also keeps the shapes in the order they were found.
Set<Node> groupShapes = new LinkedHashSet<>();

NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);

for (Node paragraph : (Iterable<Paragraph>) paragraphs)
{
    // Match any paragraph whose text contains "Fig", not only those starting with it.
    if (paragraph.toString(SaveFormat.TEXT).trim().contains("Fig"))
    {
        // Walk backwards in document order until a GroupShape or the Body is reached.
        Node node = paragraph.previousPreOrder(doc);

        while (node != null && node.getNodeType() != NodeType.GROUP_SHAPE && node.getNodeType() != NodeType.BODY)
        {
            node = node.previousPreOrder(doc);
        }
        if (node != null && node.getNodeType() == NodeType.GROUP_SHAPE)
        {
            groupShapes.add(node);
        }
    }
}

// Save every collected GroupShape into its own document.
for (Node node : groupShapes)
{
    Document dstDoc = new Document();
    NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node newNode = importer.importNode(node, true);
    dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
    dstDoc.save(MyDir + "output" + i + ".docx");
    i++;
}
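
The code above removes duplicates by passing the collected nodes through a set, since several captions can resolve to the same GroupShape. As a side note, here is a minimal standalone sketch of that step, assuming it matters that the numbered output files follow the order in which the shapes were found. A plain HashSet does not guarantee any iteration order, while a LinkedHashSet keeps first-seen order. The class and method names are hypothetical, and strings stand in for the nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper for illustration; it is not part of Aspose.Words.
final class Dedupe {
    // Removes repeated elements, keeping the first occurrence of each in its original position.
    static <T> List<T> keepFirstOccurrences(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    public static void main(String[] args) {
        // Strings stand in for GroupShape nodes; two captions resolved to "shapeB".
        List<String> found = Arrays.asList("shapeA", "shapeB", "shapeB", "shapeC");
        System.out.println(keepFirstOccurrences(found)); // prints [shapeA, shapeB, shapeC]
    }
}
```

With document nodes the set would compare node objects, so the same GroupShape reached from two captions is saved only once.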

akshayapria · November 28, 2017, 1:01pm

Hi @tahir.manzoor,

Thank you for considering my problem. I greatly appreciate your solutions.

The missed image is still not extracted, and some text is extracted instead. Please help me ignore the text and extract the skipped images.

The actual output actual output.zip (105.8 KB)

Thanks & regards,
pria

tahir.manzoor · November 28, 2017, 2:26pm

@akshayapria,

Thanks for your inquiry. Your input document contains eight group shapes. The code example shared in my previous post generates the correct output. We have attached the output documents with this post for your kind reference. output docs.zip (158.9 KB)

Please make sure that you are using the same code and the latest version, Aspose.Words for Java 17.11.

akshayapria · November 29, 2017, 12:54pm

Thanks for your feedback.

The same issue occurs again. I am using the latest version, Aspose.Words for Java 17.11.
Only three images are extracted.
The actual output is Output.zip (64.0 KB)
Please help me resolve the issue.

Thanks & regards,
pria.

akshayapria · November 29, 2017, 1:07pm

Hi @tahir.manzoor,

Thanks for your feedback.

The first document has images above the label.

I have already handled images above the label, but it is still not working.

The source code source.zip (44.4 KB)

The input test.zip (175.9 KB)

The actual output actual output.zip (155.1 KB)

The expected output expected output.zip (206.1 KB)

Please help me resolve the issue.

The second document has images without a figure caption (Fig 1). Please let me know how to extract those images.

The expected output expected output.zip (1.9 MB)

The input test.zip (2.2 MB)

Thanks & Regards,
pria.

tahir.manzoor · November 29, 2017, 2:12pm

@akshayapria,

Thanks for your inquiry.

In this case, we suggest the following solution.

Iterate through all paragraphs.
If a paragraph starts with the text “Fig”, walk back through the nodes that precede it and add them to an ArrayList until you reach a node that is not a shape. Also add to the ArrayList the paragraphs that contain sub-labels such as (a), (b), etc.
Use NodeImporter to import the collected nodes into a new document.

Please refer to the following article:
How to Extract Images from a Document
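
The steps above can be sketched as a small standalone model. This is only an illustration of the backward walk, not Aspose.Words code: Item, Kind and FigureCollector are hypothetical stand-ins for document nodes, the document is flattened into a list in reading order, and the sub-label format "(a)", "(b)" at the start of a paragraph is an assumption. In a real document the walk would use node navigation such as previousPreOrder, and the collected nodes would then be imported with NodeImporter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Standalone model of the suggested steps; nothing here calls Aspose.Words.
final class FigureCollector {
    enum Kind { SHAPE, PARAGRAPH }

    // Hypothetical stand-in for a document node: a kind plus its text.
    static final class Item {
        final Kind kind;
        final String text;
        Item(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Assumed sub-label format: "(a)", "(b)", ... at the start of a paragraph.
    private static final Pattern SUB_LABEL = Pattern.compile("^\\([a-z]\\)");

    // A node belongs to the figure if it is a shape or a sub-label paragraph.
    static boolean belongsToFigure(Item item) {
        return item.kind == Kind.SHAPE || SUB_LABEL.matcher(item.text.trim()).find();
    }

    // For each paragraph starting with "Fig", walks backwards and collects the shapes
    // and sub-label paragraphs directly before it, returned in document order.
    static List<List<Item>> collect(List<Item> items) {
        List<List<Item>> figures = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            Item item = items.get(i);
            if (item.kind != Kind.PARAGRAPH || !item.text.trim().startsWith("Fig")) {
                continue;
            }
            List<Item> figure = new ArrayList<>();
            for (int j = i - 1; j >= 0 && belongsToFigure(items.get(j)); j--) {
                figure.add(items.get(j));
            }
            Collections.reverse(figure); // restore document order
            if (!figure.isEmpty()) {
                figures.add(figure);
            }
        }
        return figures;
    }
}
```

Stopping at the first node that is neither a shape nor a sub-label keeps ordinary body text out of the result, which is the problem reported earlier in the thread. Images that have no “Fig” caption at all are not found by this walk and would need a separate pass over the shapes.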