Text box embedded in an image is missing when extracting images with Aspose.Words for Java

Hi Team,

My requirement is to extract the images and save them into a new document.

The issue is that the text box is missing from the image (Fig. 1). Please help me extract the image along with its text box.

Input: test.zip (2.7 MB)

Actual output: actual output.zip (214.2 KB)

Expected output: expected output.zip (220.6 KB)

Thanks & regards,
priyanga G

@priyanga,

Thanks for your inquiry. The output document contains the same Shape as the input document. Please use OoxmlSaveOptions as shown below to get the expected output.

OoxmlSaveOptions options = new OoxmlSaveOptions();
options.setCompliance(OoxmlCompliance.ISO_29500_2008_STRICT);
doc.save(MyDir + "18.3.docx", options);

Hi @tahir.manzoor

Thank you for your feedback.

However, the content inside the text box is missing (issue: the text box content "setif" is extracted as "s" in the output). Please help me resolve this issue.

Input: Article reviewed [13-05-2017]_test.zip (231.9 KB)

Actual output: 18.3.zip (215.3 KB)

Thanks & regards,
priyanga G

@priyanga,

Thanks for your inquiry. In this case, we suggest that you bookmark the desired content and extract it using the code example shared in the following article:
Extract Content from a Bookmark

Hi @tahir.manzoor,

Thank you for your feedback.

Please share a sample that bookmarks and extracts the content inside the text box.

Thanks and regards,
priyanga G

@priyanga,

Thanks for your inquiry. Please use the following code example to get the desired output.

Document doc = new Document(MyDir + "Article reviewed [13-05-2017]_test.doc");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    // Treat any paragraph whose text starts with "Fig" as a figure caption.
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk backwards over empty paragraphs and paragraphs that contain shapes.
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && (previousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                ((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0))
        {
            previousPara = previousPara.getPreviousSibling();
        }

        // Start the bookmark just before the figure content.
        if (previousPara == null)
        {
            builder.moveToDocumentStart();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            builder.moveToParagraph(paragraphs.indexOf((Paragraph) previousPara), -1);
            builder.startBookmark("Bookmark" + bookmark);
        }

        // End the bookmark at the start of the caption paragraph.
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}
 
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        dstDoc.save(MyDir + "output"+i+".docx");
        i++;
    }
}
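
The caption test above matches any paragraph whose text starts with "Fig", which would also match ordinary text that begins with a word such as "Figures". A stricter check might look like the sketch below. It is plain Java with no Aspose types; `FigureCaptions` and `isCaption` are hypothetical names, and the pattern assumes captions of the form "Fig. 1", "Fig 2" or "Figure 3", so adjust it to your documents.

```java
import java.util.regex.Pattern;

final class FigureCaptions {
    // Assumed caption shapes: "Fig. 1", "Fig 2", "Figure 3" (an assumption, not taken from the documents).
    private static final Pattern CAPTION =
            Pattern.compile("^Fig(?:ure|\\.)?\\s*\\d+");

    /** Returns true when the trimmed paragraph text begins like a figure caption. */
    static boolean isCaption(String paragraphText) {
        return paragraphText != null && CAPTION.matcher(paragraphText.trim()).find();
    }
}
```

In the loop above, the call would replace the `startsWith("Fig")` test, for example `FigureCaptions.isCaption(paragraph.toString(SaveFormat.TEXT))`.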

Hi @tahir.manzoor,

Thank you very much.

It’s working fine. The code gives exactly the expected output.

Thanks and regards,
priyanga G

@priyanga,

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

Hi @tahir.manzoor,

I have one more issue with the same scenario.

The positions of the extracted images have changed in Fig. 2 and Fig. 3.

Please help me extract the images so that they appear the same as in the source document.

Source documents:

Test2: FiguresAAA-test2.zip (942.1 KB)

Test3: FiguresAAA-test3.zip (182.1 KB)

Expected output:

Output Test2: Figures_2.zip (936.0 KB)

Output Test3: Figures_3.zip (181.0 KB)

Actual output:

Output Test2: output2.zip (435.2 KB)

Output Test3: output3.zip (151.9 KB)

Thanks & regards,
priyanga G

@priyanga,

Thanks for your inquiry. Please use the following code example to get the desired output. Hope this helps you.

Document doc = new Document(MyDir + "FiguresAAA-test2.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    // Treat any paragraph whose text starts with "Fig" as a figure caption.
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk backwards over paragraphs that are empty or contain shapes or group shapes.
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && (((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0 ||
                ((Paragraph) previousPara).getChildNodes(NodeType.GROUP_SHAPE, true).getCount() > 0 ||
                previousPara.toString(SaveFormat.TEXT).trim().length() == 0))
        {
            previousPara = previousPara.getPreviousSibling();
        }

        if (previousPara == null)
        {
            // The figure is the first content: start the bookmark in a new paragraph at the document start.
            builder.moveToDocumentStart();
            builder.insertParagraph();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            // Insert an empty paragraph after the last non-figure node and start the bookmark there.
            // No cast is needed: previousPara may be a non-paragraph node such as a table.
            Node node = previousPara.getParentNode().insertAfter(new Paragraph(doc), previousPara);
            builder.moveTo(node);
            builder.startBookmark("Bookmark" + bookmark);
        }

        // End the bookmark at the start of the caption paragraph.
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(doc.getFirstSection().getPageSetup().getLeftMargin() - 30);
        dstDoc.getFirstSection().getPageSetup().setRightMargin(doc.getFirstSection().getPageSetup().getRightMargin() - 30);
        dstDoc.save(MyDir + "output"+i+".docx");
        i++;
    }
}

Hi @tahir.manzoor,

It’s working fine for some files.

It does not work for the following two files. Please provide a solution for these files based on the previously shared code.

Input_1: Manuscrit-ExperimentShearSlab-BUI-REVISED-[V2]-test.zip (1.4 MB)

Expected output: shear_expected output.zip (1.5 MB)

Input_2: MSSP_gear_test.zip (901.0 KB)

Expected output: Output1.zip (821.4 KB)

Thanks and regards,
priyanga G

@priyanga,

Thanks for your inquiry. Please note that the code example shared in this forum thread will not work for all of your cases. First, you need to list all of your use cases and then write the code accordingly. The approach stays the same, i.e. bookmark the content and extract it; only the condition in the while loop needs to change.

For your document “Manuscrit-ExperimentShearSlab-BUI-REVISED-[V2]-test.docx”, there are the following three cases:

  1. The shapes are inside a Table. You need to use NodeImporter to import the table into the new document.
  2. The paragraphs preceding the Fig caption contain the text (a), (b), (c) and a Shape node.
  3. The paragraphs preceding the Fig caption contain only a Shape node.

For your document “MSSP_gear_test.docx”, there are the following two cases:

  1. The shapes are inside a GroupShape. You need to use NodeImporter to import the GroupShape into the new document.
  2. The Shape node contains an OLE object with progID “Visio.Drawing.11”. You need to use NodeImporter to import the paragraph node that contains the OLE object.

We have already shared code examples for these cases in your other threads. We suggest that you list all of your use cases and write the code accordingly.
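
The idea of keeping the bookmark-and-extract flow fixed and varying only the while-loop condition can be modelled without Aspose types. The sketch below is a simplified illustration under stated assumptions: `Para`, `findFigureStart`, `SHAPE_OR_EMPTY`, `SHAPE_OR_LABEL` and `sample` are hypothetical names, and a real implementation would test Aspose nodes instead of this stand-in class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FigureWalk {
    /** Simplified stand-in for a paragraph; hypothetical, not an Aspose type. */
    static final class Para {
        final String text;
        final boolean hasShape;

        Para(String text, boolean hasShape) {
            this.text = text;
            this.hasShape = hasShape;
        }
    }

    /** Use case: shape paragraphs separated by empty spacer paragraphs. */
    static final Predicate<Para> SHAPE_OR_EMPTY =
            p -> p.hasShape || p.text.trim().isEmpty();

    /** Use case: shape paragraphs interleaved with "(a)", "(b)", ... labels. */
    static final Predicate<Para> SHAPE_OR_LABEL =
            p -> p.hasShape || p.text.trim().matches("\\([a-z]\\).*");

    /**
     * Walks backwards from the caption and returns the index of the first
     * paragraph that belongs to the figure (captionIndex if none does).
     */
    static int findFigureStart(List<Para> paras, int captionIndex, Predicate<Para> belongsToFigure) {
        int start = captionIndex;
        while (start > 0 && belongsToFigure.test(paras.get(start - 1))) {
            start--;
        }
        return start;
    }

    /** Sample body: text, shape, label, shape, caption. */
    static List<Para> sample() {
        return Arrays.asList(
                new Para("Intro text", false),
                new Para("", true),
                new Para("(a)", false),
                new Para("", true),
                new Para("Fig. 1 Test setup", false));
    }
}
```

With the sample list and the caption at index 4, `SHAPE_OR_EMPTY` stops at index 3 because the "(a)" label ends the walk, while `SHAPE_OR_LABEL` continues to index 1. Each use case then becomes one predicate rather than a separate copy of the loop.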

Hi @tahir.manzoor

Thanks for your feedback.

Instead of changing the left and right margins, please provide a solution like the one below to set the page properties for this scenario.

Document dstDoc = new Document();
dstDoc.removeAllChildren();

Section section = ((Paragraph) previousPara).getParentSection();
NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
Node newNode = importer.importNode(section, false);

dstDoc.getSections().add(newNode);
dstDoc.updatePageLayout();
newNode = importer.importNode(previousPara, true);
dstDoc.getFirstSection().getBody().appendChild(newNode);
dstDoc.acceptAllRevisions();
dstDoc.save(MyDir + "output" + i + ".docx");

Thanks & regards,
priyanga G

@priyanga,

Thanks for your inquiry. In your case, you need to set the page setup properties to match the source document. Please use the following generic code snippet to get the desired output.

for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);

        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
        dstDoc.save(MyDir + "output"+i+".docx");
        i++;
    }
}

Hi @tahir.manzoor,

Thank you very much.

Great. It’s working fine.

Thanks & regards,
Priyanga G

Hi @tahir.manzoor,

Thanks for your feedback.

While extracting the images, the surrounding text is also extracted as it appears in the source document. Please help me solve this issue.

source code:

Source document: mssp_test (2).zip (2.9 MB)

Actual output: MSSP_gear signature Revision submitted (2).zip (2.5 MB)

Thanks & regards,
Priyanga.G

@priyanga,

We are working on your query and will get back to you soon.

@priyanga,

Thanks for your patience. We have modified the code according to your requirement. Please use the following code example to get the desired output. Hope this helps you.

Document doc = new Document(MyDir + "mssp_test.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    // Treat any paragraph whose text starts with "Fig" as a figure caption.
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk backwards over paragraphs that contain shapes or group shapes.
        Node previousPara = paragraph.getPreviousSibling();
        while (previousPara != null
                && previousPara.getNodeType() == NodeType.PARAGRAPH
                && (((Paragraph) previousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0 ||
                ((Paragraph) previousPara).getChildNodes(NodeType.GROUP_SHAPE, true).getCount() > 0))
        {
            previousPara = previousPara.getPreviousSibling();
            // Stop at the start of the body; this avoids a NullPointerException below.
            if (previousPara == null)
                break;

            // Keep going only past sub-figure labels such as (a), (b), (c), (d).
            String text = previousPara.toString(SaveFormat.TEXT).trim();
            if (text.contains("(a)") || text.contains("(b)") ||
                    text.contains("(c)") || text.contains("(d)"))
                continue;
            else
                break;
        }

        if (previousPara == null)
        {
            // The figure is the first content: start the bookmark in a new paragraph at the document start.
            builder.moveToDocumentStart();
            builder.insertParagraph();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            // Insert an empty paragraph after the last non-figure node and start the bookmark there.
            // No cast is needed: previousPara may be a non-paragraph node such as a table.
            Node node = previousPara.getParentNode().insertAfter(new Paragraph(doc), previousPara);
            builder.moveTo(node);
            builder.startBookmark("Bookmark" + bookmark);
        }

        // End the bookmark at the start of the caption paragraph.
        // Note: moveToParagraph expects an index within the current story of the current
        // section, so a document-wide index may be out of range for some documents.
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

for (Bookmark bm : doc.getRange().getBookmarks())
{
    if(bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes =  ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);

        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
        dstDoc.save(MyDir + "out\\output"+i+".docx");
        i++;
    }
}
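
The sub-figure label check inside the while loop above can also be written as a single pattern, which makes it harder to skip or duplicate a letter. The following is a minimal sketch in plain Java, not part of the Aspose API: `SubFigureLabels` and `hasLabel` are hypothetical names, and the range (a) to (d) plus the tolerance for spaces inside the parentheses are assumptions about the documents.

```java
import java.util.regex.Pattern;

final class SubFigureLabels {
    // Matches "(a)" to "(d)", optionally with spaces inside the parentheses, e.g. "( c )".
    private static final Pattern LABEL = Pattern.compile("\\(\\s*[a-d]\\s*\\)");

    /** Returns true when the text contains a sub-figure label anywhere. */
    static boolean hasLabel(String text) {
        return text != null && LABEL.matcher(text).find();
    }
}
```

In the loop, the chain of `contains` calls could then become `SubFigureLabels.hasLabel(previousPara.toString(SaveFormat.TEXT))`. Widening the character class (for example `[a-h]`) covers figures with more panels.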

Hi @tahir.manzoor,

Thank you very much. Thanks for all your support.

Thanks & regards,
Priyanga G

Hi @tahir.manzoor,

There is an issue after integrating the shared code. Please help me resolve the following error.

Exception in thread “main” java.lang.IllegalArgumentException: Parameter name: paraIdx
at com.aspose.words.DocumentBuilder.zzZ(Unknown Source)
at com.aspose.words.DocumentBuilder.moveToParagraph(Unknown Source)
at com.proc.DisplayImageExtraction.MsspGear.main(MsspGear.java:69)

builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);

Thanks and regards,
Priyanga G