Issue with a text box embedded in an image using Aspose.Words for Java

priyanga · March 27, 2018, 4:25am

Thank you for your feedback.

Could you please share a sample that bookmarks and extracts the content inside the text box?

Thanks and regards,
priyanga G

tahir.manzoor · March 27, 2018, 2:00pm

Thanks for your inquiry. Please use the following code example to get the desired output.

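// Requires: import com.aspose.words.*; import java.util.ArrayList;
// MyDir is assumed to be the folder that holds the input file; ExtractContents is a helper class not shown in this thread.
Document doc = new Document(MyDir + "Article reviewed [13-05-2017]_test.doc");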
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
{
    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        Node PreviousPara = paragraph.getPreviousSibling();
         
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0))
        { 
            PreviousPara = PreviousPara.getPreviousSibling();
        }

        if(PreviousPara == null)
        {
            builder.moveToDocumentStart();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            builder.moveToParagraph(paragraphs.indexOf((Paragraph)PreviousPara), -1);
            builder.startBookmark("Bookmark" + bookmark);
        }

        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}
 
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        dstDoc.save(MyDir + "output"+i+".docx");
        i++;
    }
}

priyanga · March 29, 2018, 1:13pm

Hi @tahir.manzoor,

Thank you very much.

It’s working fine. The code gives exactly the expected output.

Thanks and regards,
priyanga G

tahir.manzoor · March 29, 2018, 3:22pm

@priyanga,

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

priyanga · April 2, 2018, 10:06am

Hi @tahir.manzoor,

I have one more issue with the same scenario.

The positions of the extracted images have changed in Fig. 2 and Fig. 3.

Could you please help me extract the images exactly as they appear in the source document?

source document:

Test2: FiguresAAA-test2.zip (942.1 KB)

Test3: FiguresAAA-test3.zip (182.1 KB)

expected output:

Output Test2: Figures_2.zip (936.0 KB)

Output Test3: Figures_3.zip (181.0 KB)

actual output:

Output Test2: output2.zip (435.2 KB)

Output Test3: output3.zip (151.9 KB)

Thanks & regards,
priyanga G

tahir.manzoor · April 2, 2018, 4:27pm

@priyanga,

Thanks for your inquiry. Please use the following code example to get the desired output. Hope this helps you.

Document doc = new Document(MyDir + "FiguresAAA-test2.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
{
    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        Node PreviousPara = paragraph.getPreviousSibling();
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && (((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0 ||
                ((Paragraph)PreviousPara).getChildNodes(NodeType.GROUP_SHAPE, true).getCount() > 0 ||
                PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0)
                )
        {
            PreviousPara = PreviousPara.getPreviousSibling();
        }

        if(PreviousPara == null)
        {
            builder.moveToDocumentStart();
            builder.insertParagraph();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            Node node = ((Paragraph)PreviousPara).getParentNode().insertAfter(new Paragraph(doc), PreviousPara);
            builder.moveTo(node);
            //builder.moveToParagraph(paragraphs.indexOf((Paragraph)PreviousPara), -1);
            builder.startBookmark("Bookmark" + bookmark);
        }

        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(doc.getFirstSection().getPageSetup().getLeftMargin() - 30);
        dstDoc.getFirstSection().getPageSetup().setRightMargin(doc.getFirstSection().getPageSetup().getRightMargin() - 30);
        dstDoc.save(MyDir + "output"+i+".docx");
        i++;
    }
}

priyanga · April 3, 2018, 10:55am

Hi @tahir.manzoor,

It works fine for some files.

It does not work for the following two files. Could you please provide a solution for them based on the previously shared code?

Input_1: Manuscrit-ExperimentShearSlab-BUI-REVISED-[V2]-test.zip (1.4 MB)

expected output: shear_expected output.zip (1.5 MB)

Input_2: MSSP_gear_test.zip (901.0 KB)

expected output: Output1.zip (821.4 KB)

Thanks and regards,
priyanga G

tahir.manzoor · April 3, 2018, 5:41pm

@priyanga,

Thanks for your inquiry. Please note that the code example shared in this forum thread will not work for all of your cases. First, list all of your use cases, and then write the code accordingly. Use the same approach, i.e. bookmark the content and then extract it; only the condition in the while loop needs to change.

For your document “Manuscrit-ExperimentShearSlab-BUI-REVISED-[V2]-test.docx”, there are the following three cases:

1. The shapes are inside a Table. You need to use NodeImporter to import the table into a new document.
2. The paragraphs preceding the Fig caption contain the text (a), (b), (c) and a Shape node.
3. The paragraphs preceding the Fig caption contain a Shape node.

For your document “MSSP_gear_test.docx”, there are the following two cases:

1. The shapes are inside a GroupShape. You need to use NodeImporter to import the GroupShape into a new document.
2. The Shape node contains an OLE object with progID “Visio.Drawing.11”. You need to use NodeImporter to import the paragraph node that contains the OLE object.

We have already shared code examples for these cases in your other threads. We suggest that you list all of your use cases and write the code accordingly.
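
The advice to keep the bookmark-and-extract approach and vary only the while-loop condition can be illustrated without the library. The following is a minimal, library-free sketch: `FigureSpan`, `Block`, `Kind`, `belongsToFigure`, and `figureStart` are hypothetical names that stand in for document nodes and are not part of Aspose.Words. Under this model, each additional use case would become one more clause in `belongsToFigure`.

```java
import java.util.Arrays;
import java.util.List;

/** Library-free model of walking back from a figure caption (hypothetical names, not the Aspose.Words API). */
final class FigureSpan {
    enum Kind { PARAGRAPH, TABLE }

    /** Stand-in for a block-level node: its kind, its plain text, and whether it holds a shape. */
    static final class Block {
        final Kind kind;
        final String text;
        final boolean hasShape;

        Block(Kind kind, String text, boolean hasShape) {
            this.kind = kind;
            this.text = text;
            this.hasShape = hasShape;
        }
    }

    /** The "while condition": a paragraph that is empty or holds a shape belongs to the figure. */
    static boolean belongsToFigure(Block block) {
        if (block.kind != Kind.PARAGRAPH) {
            return false; // tables need their own handling, as noted in the thread
        }
        return block.hasShape || block.text.trim().isEmpty();
    }

    /** Returns the index of the first block that belongs to the figure captioned at captionIndex. */
    static int figureStart(List<Block> blocks, int captionIndex) {
        int i = captionIndex - 1;
        while (i >= 0 && belongsToFigure(blocks.get(i))) {
            i--;
        }
        return i + 1;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(Kind.PARAGRAPH, "Body text.", false),
                new Block(Kind.PARAGRAPH, "", true),    // paragraph holding the picture
                new Block(Kind.PARAGRAPH, "  ", false), // empty spacer paragraph
                new Block(Kind.PARAGRAPH, "Fig. 1 Caption", false));
        System.out.println(figureStart(blocks, 3)); // prints 1
    }
}
```

In this model the bookmark would start at the returned index and end at the caption, which mirrors the start/end bookmark pair used in the thread's code.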

priyanga · April 4, 2018, 1:10pm

Hi @tahir.manzoor

Thanks for your feedback.

Instead of changing the left and right margins, please provide a solution like the one below to set the page properties for this scenario:

Document dstDoc = new Document();
dstDoc.removeAllChildren();

Section section = ((Paragraph) previousPara).getParentSection();
NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
Node newNode = importer.importNode(section, false);

dstDoc.getSections().add(newNode);
dstDoc.updatePageLayout();
newNode = importer.importNode(previousPara, true);
dstDoc.getFirstSection().getBody().appendChild(newNode);
dstDoc.acceptAllRevisions();
dstDoc.save(MyDir + "output"+i+".docx");

Thanks & regards,
priyanga G

tahir.manzoor · April 4, 2018, 4:16pm

@priyanga,

Thanks for your inquiry. In your case, you need to set the page setup properties to match those of the source document. Please use the following generic code snippet to get the desired output.

for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);

        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
        dstDoc.save(MyDir + "output"+i+".docx");
        i++;
    }
}

priyanga · April 5, 2018, 6:49am

Hi @tahir.manzoor,

Thank you very much.

Great. It’s working fine.

Thanks & regards,
Priyanga G

priyanga · April 5, 2018, 8:16am

Hi @tahir.manzoor,

Thanks for your feedback.

While extracting the images, the text is also extracted as it appears in the source document. Could you please help me solve this issue?

source code:

source document: mssp_test (2).zip (2.9 MB)

actual output: MSSP_gear signature Revision submitted (2).zip (2.5 MB)

Thanks & regards,
Priyanga.G

awais.hafeez · April 5, 2018, 11:52pm

@priyanga,

We are working on your query and will get back to you soon.

tahir.manzoor · April 6, 2018, 4:32am

@priyanga,

Thanks for your patience. We have modified the code according to your requirement. Please use the following code example to get the desired output. Hope this helps you.

Document doc = new Document(MyDir + "mssp_test.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
{
    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        Node PreviousPara = paragraph.getPreviousSibling();
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && (((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0 ||
                ((Paragraph)PreviousPara).getChildNodes(NodeType.GROUP_SHAPE, true).getCount() > 0)
                )

        {
            PreviousPara = PreviousPara.getPreviousSibling();
            // Keep walking back only while the sibling exists and carries a sub-figure label (a)-(d).
            if(PreviousPara != null &&
                    PreviousPara.toString(SaveFormat.TEXT).trim().length() > 0 &&
                    (PreviousPara.toString(SaveFormat.TEXT).trim().contains("(a)") ||
                     PreviousPara.toString(SaveFormat.TEXT).trim().contains("(b)") ||
                     PreviousPara.toString(SaveFormat.TEXT).trim().contains("(c)") ||
                     PreviousPara.toString(SaveFormat.TEXT).trim().contains("(d)")
                    )
              )
                continue;
            else
                break;
        }

        if(PreviousPara == null)
        {
            builder.moveToDocumentStart();
            builder.insertParagraph();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            Node node = ((Paragraph)PreviousPara).getParentNode().insertAfter(new Paragraph(doc), PreviousPara);
            builder.moveTo(node);
            //builder.moveToParagraph(paragraphs.indexOf((Paragraph)PreviousPara), -1);
            builder.startBookmark("Bookmark" + bookmark);
        }

        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);

        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
        dstDoc.save(MyDir + "out\\output"+i+".docx");
        i++;
    }
}
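
The four chained `contains` checks for the sub-figure labels in the loop above can also be written as a single pattern match. This is a minimal sketch, assuming the labels are limited to (a) through (d) as in this thread; `SubFigureLabels` and `containsLabel` are hypothetical names and not part of Aspose.Words.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: detects sub-figure labels such as (a), (b), (c), (d) in a paragraph's text. */
final class SubFigureLabels {
    // Matches a single letter a-d wrapped in parentheses, anywhere in the text.
    private static final Pattern LABEL = Pattern.compile("\\([a-d]\\)");

    /** Returns true when the text is non-null and contains at least one label. */
    static boolean containsLabel(String text) {
        return text != null && LABEL.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLabel("(a) Front view")); // true
        System.out.println(containsLabel("Body text"));      // false
    }
}
```

In the loop, the argument would be the text returned by `PreviousPara.toString(SaveFormat.TEXT)`; the null check only guards the string, so the node itself still needs its own null check before that call.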

priyanga · April 6, 2018, 5:54am

Hi @tahir.manzoor,

Thank you very much. Thanks for all your support.

Thanks & regards,
Priyanga G

priyanga · April 6, 2018, 5:59am

Hi @tahir.manzoor,

There is an issue after integrating the shared code. Could you please help me solve this error?

Exception in thread "main" java.lang.IllegalArgumentException: Parameter name: paraIdx
at com.aspose.words.DocumentBuilder.zzZ(Unknown Source)
at com.aspose.words.DocumentBuilder.moveToParagraph(Unknown Source)
at com.proc.DisplayImageExtraction.MsspGear.main(MsspGear.java:69)

builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);

Thanks and regards,
Priyanga G

tahir.manzoor · April 6, 2018, 2:25pm

@priyanga,

Thanks for your inquiry. We tested this code with “FiguresAAA-test2.docx”, “FiguresAAA-test3.docx”, and “mssp_test.docx”. We have not faced any exception. Please share the input document for which you are facing this exception.

priyanga · April 10, 2018, 1:03pm

Hi @tahir.manzoor,

Thanks for your quick reply.

I am using the same code for the following document.

It shows the error: com.aspose.words.Table cannot be cast to com.aspose.words.Paragraph

Could you please help me solve the error on this line: builder.moveToParagraph(paragraphs.indexOf((Paragraph)PreviousPara), -1);

Input 1: Bernardo_et_al_RevisedPaper_aspose.zip (2.9 MB)

Thanks and regards,
Priyanga G

tahir.manzoor · April 10, 2018, 5:19pm

@priyanga,

We are investigating this issue and will share modified code that meets your requirement. We apologize for the inconvenience.

tahir.manzoor · April 12, 2018, 6:08am

@priyanga,

Thanks for your patience. In your document, the shapes are inside a table, and the document also contains charts. Please use the following code example to get the desired output.

Document doc = new Document(MyDir + "Bernardo_et_al_RevisedPaper_aspose.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
ArrayList tables = new ArrayList();
int bookmark = 1;
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph  paragraph : (Iterable<Paragraph>) paragraphs)
{
    if(paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        Node PreviousPara = paragraph.getPreviousSibling();
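        // Skip empty paragraphs that hold no shape; the node-type check avoids casting a Table to Paragraph.
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() == 0)
            PreviousPara = PreviousPara.getPreviousSibling();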

        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                &&   (((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0 || ((Paragraph)PreviousPara).getChildNodes(NodeType.GROUP_SHAPE, true).getCount() > 0)

                )

        {
            PreviousPara = PreviousPara.getPreviousSibling();
            // Keep walking back only while the sibling exists and carries a sub-figure label (a)-(d).
            if(PreviousPara != null &&
                    PreviousPara.toString(SaveFormat.TEXT).trim().length() > 0 &&
                    (PreviousPara.toString(SaveFormat.TEXT).trim().contains("(a)") ||
                            PreviousPara.toString(SaveFormat.TEXT).trim().contains("(b)") ||
                            PreviousPara.toString(SaveFormat.TEXT).trim().contains("(c)") ||
                            PreviousPara.toString(SaveFormat.TEXT).trim().contains("(d)")
                    )
                    )
                continue;
            else
                break;
        }

        if(PreviousPara == null)
        {
            builder.moveToDocumentStart();
            builder.insertParagraph();
            builder.startBookmark("Bookmark" + bookmark);
            builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
            builder.endBookmark("Bookmark" + bookmark);
            bookmark++;
        }
        else if(PreviousPara.getNodeType() == NodeType.PARAGRAPH)
        {
            Node node = ((Paragraph)PreviousPara).getParentNode().insertAfter(new Paragraph(doc), PreviousPara);
            builder.moveTo(node);
            builder.startBookmark("Bookmark" + bookmark);
            builder.moveTo(paragraph);
            //builder.writeln();
            builder.endBookmark("Bookmark" + bookmark);
            bookmark++;
        }
        else if(PreviousPara.getNodeType() == NodeType.TABLE)
        {
            if(((Table)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
                tables.add(((Table)PreviousPara));
        }

     }
}
 
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);

        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
        if(dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
            dstDoc.getLastSection().getBody().getLastParagraph().remove();
        dstDoc.save(MyDir + "out\\output"+i+".docx");
        i++;
    }
}

for(Table table : (Iterable<Table>)tables)
{
    Document dstDoc = new Document();

    NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node newNode = importer.importNode(table, true);
    dstDoc.getFirstSection().getBody().appendChild(newNode);
    dstDoc.save(MyDir + "out\\output" + i + ".docx");
    i++;
}
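
The code above branches three ways on the previous sibling: no sibling, a paragraph, or a table. Checking the node type before any cast appears to be what addresses the `Table cannot be cast to Paragraph` error reported earlier. Below is a library-free sketch of that dispatch; `PreviousSiblingDispatch`, `Node`, `Paragraph`, `Table`, and `Action` are hypothetical stand-ins defined inside the block and are not the Aspose.Words classes.

```java
/** Library-free model of choosing an action from the previous sibling's type (hypothetical names). */
final class PreviousSiblingDispatch {
    interface Node {}

    static final class Paragraph implements Node {}

    static final class Table implements Node {
        final boolean hasShape;

        Table(boolean hasShape) {
            this.hasShape = hasShape;
        }
    }

    enum Action { BOOKMARK_FROM_DOCUMENT_START, BOOKMARK_AFTER_PARAGRAPH, IMPORT_TABLE, SKIP }

    /** Picks what to do for a caption, based on the node that precedes its figure. */
    static Action choose(Node previous) {
        if (previous == null) {
            return Action.BOOKMARK_FROM_DOCUMENT_START;
        }
        if (previous instanceof Paragraph) {
            return Action.BOOKMARK_AFTER_PARAGRAPH;
        }
        if (previous instanceof Table) {
            // Cast only after the type check has succeeded.
            return ((Table) previous).hasShape ? Action.IMPORT_TABLE : Action.SKIP;
        }
        return Action.SKIP;
    }

    public static void main(String[] args) {
        System.out.println(choose(null));
        System.out.println(choose(new Paragraph()));
        System.out.println(choose(new Table(true)));
    }
}
```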