Hi @tahir.manzoor,
Thank you for your feedback.
Please share a sample that shows how to bookmark and extract the content inside the text box.
Thanks and regards,
priyanga G
Thanks for your inquiry. Please use the following code example to get the desired output.
import com.aspose.words.*;
import java.util.ArrayList;

// ExtractContents is assumed to be a helper class that provides the
// extractContent and generateDocument methods used below.
Document doc = new Document(MyDir + "Article reviewed [13-05-2017]_test.doc");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;

// Bookmark the content that precedes each caption paragraph starting with "Fig".
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk back over empty paragraphs and paragraphs that contain shapes.
        Node PreviousPara = paragraph.getPreviousSibling();
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && (PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0 ||
                    ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0))
        {
            PreviousPara = PreviousPara.getPreviousSibling();
        }

        if (PreviousPara == null)
        {
            builder.moveToDocumentStart();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            // Start the bookmark at the end of the last paragraph that is not part of the figure.
            // This cast assumes PreviousPara is a paragraph; a table here would fail the cast.
            builder.moveToParagraph(paragraphs.indexOf((Paragraph)PreviousPara), -1);
            builder.startBookmark("Bookmark" + bookmark);
        }

        // End the bookmark at the start of the caption paragraph.
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

// Save the content of each bookmark to its own document.
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if (bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes = ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}
Hi @tahir.manzoor,
Thank you very much.
It’s working fine. The code gives exactly the expected output.
Thanks and regards,
priyanga G
Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.
Hi @tahir.manzoor,
I have one more issue with the same scenario.
The positions of the extracted images have changed in Fig. 2 and Fig. 3.
Please help me extract the images so that they appear as they do in the source document.
Source documents:
Test2: FiguresAAA-test2.zip (942.1 KB)
Test3: FiguresAAA-test3.zip (182.1 KB)
Expected output:
Test2: Figures_2.zip (936.0 KB)
Test3: Figures_3.zip (181.0 KB)
Actual output:
Test2: output2.zip (435.2 KB)
Test3: output3.zip (151.9 KB)
Thanks & regards,
priyanga G
Thanks for your inquiry. Please use the following code example to get the desired output. Hope this helps you.
Document doc = new Document(MyDir + "FiguresAAA-test2.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;

// Bookmark the content that precedes each caption paragraph starting with "Fig".
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk back over paragraphs that contain shapes or group shapes, and over empty paragraphs.
        Node PreviousPara = paragraph.getPreviousSibling();
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && (((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0 ||
                    ((Paragraph)PreviousPara).getChildNodes(NodeType.GROUP_SHAPE, true).getCount() > 0 ||
                    PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0)
              )
        {
            PreviousPara = PreviousPara.getPreviousSibling();
        }

        if (PreviousPara == null)
        {
            builder.moveToDocumentStart();
            builder.insertParagraph();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            // Insert a new empty paragraph after PreviousPara and start the bookmark in it.
            Node node = ((Paragraph)PreviousPara).getParentNode().insertAfter(new Paragraph(doc), PreviousPara);
            builder.moveTo(node);
            //builder.moveToParagraph(paragraphs.indexOf((Paragraph)PreviousPara), -1);
            builder.startBookmark("Bookmark" + bookmark);
        }

        // End the bookmark at the start of the caption paragraph.
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

// Save the content of each bookmark to its own document.
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if (bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes = ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        // Make both margins 30 points narrower than those of the source document's first section.
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(doc.getFirstSection().getPageSetup().getLeftMargin() - 30);
        dstDoc.getFirstSection().getPageSetup().setRightMargin(doc.getFirstSection().getPageSetup().getRightMargin() - 30);
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}
Hi @tahir.manzoor,
It works fine for some files.
It does not work for the following two files. Please provide a solution for these files based on the previously shared code.
Input 1: Manuscrit-ExperimentShearSlab-BUI-REVISED-[V2]-test.zip (1.4 MB)
Expected output: shear_expected output.zip (1.5 MB)
Input 2: MSSP_gear_test.zip (901.0 KB)
Expected output: Output1.zip (821.4 KB)
Thanks and regards,
priyanga G
Thanks for your inquiry. Please note that the code examples shared in this forum thread will not work for all your cases. First, list all your use cases, then write the code accordingly. The approach stays the same: bookmark the content and extract it. Only the condition in the while loop needs to change.
For your document “Manuscrit-ExperimentShearSlab-BUI-REVISED-[V2]-test.docx”, there are the following three cases:
For your document “MSSP_gear_test.docx”, there are the following two cases:
We have already shared code examples for these cases in your other threads. We suggest that you list all your use cases and write the code accordingly.
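As a rough illustration of this advice, the sketch below keeps the backward scan fixed and isolates the per-document rule in one predicate, so a new use case only changes that predicate. It is plain Java and does not use Aspose.Words; the names `FigureStartFinder`, `Block`, `belongsToFigure`, and `findFigureStart` are hypothetical stand-ins for the paragraphs and the while-loop condition.

```java
import java.util.Arrays;
import java.util.List;

final class FigureStartFinder {

    /** Hypothetical stand-in for a paragraph: its text and whether it holds a shape. */
    static final class Block {
        final String text;
        final boolean hasShape;

        Block(String text, boolean hasShape) {
            this.text = text;
            this.hasShape = hasShape;
        }
    }

    /** Per-document rule; this is the only part that changes between use cases. */
    static boolean belongsToFigure(Block block) {
        return block.hasShape || block.text.trim().isEmpty();
    }

    /** Walks back from the caption and returns the index of the first block of the figure. */
    static int findFigureStart(List<Block> blocks, int captionIndex) {
        int start = captionIndex;
        while (start > 0 && belongsToFigure(blocks.get(start - 1))) {
            start--;
        }
        return start;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block("Body text before the figure.", false),
                new Block("", true),     // paragraph holding the image
                new Block("   ", false), // empty spacer paragraph
                new Block("Fig. 1 Test setup", false));
        System.out.println(findFigureStart(blocks, 3)); // prints 1
    }
}
```

In the real code, the equivalent of `belongsToFigure` would inspect the Aspose.Words node (its type, its shapes, its text) instead of these two fields.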
Thanks for your feedback.
Instead of changing the left and right margins, please provide a solution like the one below to set the page properties for this scenario:
Document dstDoc = new Document();
dstDoc.removeAllChildren();
Section section = ((Paragraph) previousPara).getParentSection();
NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
Node newNode = importer.importNode(section, false);
dstDoc.getSections().add(newNode);
dstDoc.updatePageLayout();
newNode = importer.importNode(previousPara, true);
dstDoc.getFirstSection().getBody().appendChild(newNode);
dstDoc.acceptAllRevisions();
dstDoc.save(MyDir + "output"+i+".docx");
Thanks & regards,
priyanga G
Thanks for your inquiry. In your case, you need to set the page setup properties according to the source document. Please use the following generic code snippet to get the desired output.
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if (bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes = ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}
Hi @tahir.manzoor,
Thanks for your feedback.
While extracting the images, the text is also extracted as it appears in the source document. Please help me solve this issue.
source code:
Source document: mssp_test (2).zip (2.9 MB)
actual output:
MSSP_gear signature Revision submitted (2).zip (2.5 MB)
Thanks & regards,
Priyanga.G
Thanks for your patience. We have modified the code according to your requirement. Please use the following code example to get the desired output. Hope this helps you.
Document doc = new Document(MyDir + "mssp_test.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int bookmark = 1;
int i = 1;

// Bookmark the content that precedes each caption paragraph starting with "Fig".
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        // Walk back over paragraphs that contain shapes or group shapes.
        Node PreviousPara = paragraph.getPreviousSibling();
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && (((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0 ||
                    ((Paragraph)PreviousPara).getChildNodes(NodeType.GROUP_SHAPE, true).getCount() > 0)
              )
        {
            PreviousPara = PreviousPara.getPreviousSibling();
            // Stop when there is no earlier sibling, to avoid a NullPointerException.
            if (PreviousPara == null)
                break;
            // Keep walking back only while the node carries a sub-figure label.
            String text = PreviousPara.toString(SaveFormat.TEXT).trim();
            if (text.contains("(a)") || text.contains("(b)") ||
                text.contains("(c)") || text.contains("(d)"))
                continue;
            else
                break;
        }

        if (PreviousPara == null)
        {
            builder.moveToDocumentStart();
            builder.insertParagraph();
            builder.startBookmark("Bookmark" + bookmark);
        }
        else
        {
            // Insert a new empty paragraph after PreviousPara and start the bookmark in it.
            Node node = ((Paragraph)PreviousPara).getParentNode().insertAfter(new Paragraph(doc), PreviousPara);
            builder.moveTo(node);
            //builder.moveToParagraph(paragraphs.indexOf((Paragraph)PreviousPara), -1);
            builder.startBookmark("Bookmark" + bookmark);
        }

        // End the bookmark at the start of the caption paragraph.
        builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
        builder.endBookmark("Bookmark" + bookmark);
        bookmark++;
    }
}

// Save the content of each bookmark to its own document, copying the source page setup.
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if (bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes = ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
        dstDoc.save(MyDir + "out\\output" + i + ".docx");
        i++;
    }
}
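The sub-figure label check in the loop above could also be written as a single pattern match, which is easier to extend when a figure has more panels. The following is a minimal sketch in plain Java, independent of Aspose.Words; `SubFigureLabels` and `hasSubFigureLabel` are hypothetical names, and the label range (a) through (d) is an assumption.

```java
import java.util.regex.Pattern;

final class SubFigureLabels {

    // Matches "(a)" through "(d)"; widen the range if figures have more panels.
    private static final Pattern LABEL = Pattern.compile("\\([a-d]\\)");

    /** Returns true when the text contains a sub-figure label such as "(a)". */
    static boolean hasSubFigureLabel(String text) {
        return text != null && LABEL.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSubFigureLabel("(a) Front view")); // prints true
        System.out.println(hasSubFigureLabel("Introduction"));   // prints false
    }
}
```

In the loop, the result of such a helper would be applied to the trimmed text of the previous node to decide between `continue` and `break`.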
Hi @tahir.manzoor,
There is an issue after integrating the shared code. Please help me resolve this error.
Exception in thread "main" java.lang.IllegalArgumentException: Parameter name: paraIdx
at com.aspose.words.DocumentBuilder.zzZ(Unknown Source)
at com.aspose.words.DocumentBuilder.moveToParagraph(Unknown Source)
at com.proc.DisplayImageExtraction.MsspGear.main(MsspGear.java:69)
builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
Thanks and regards,
Priyanga G
Thanks for your inquiry. We tested this code with “FiguresAAA-test2.docx”, “FiguresAAA-test3.docx”, and “mssp_test.docx”. We have not faced any exception. Please share the input document for which you are facing this exception.
Hi @tahir.manzoor,
Thanks for your quick reply.
I am using the same code for the following document.
It shows the error: com.aspose.words.Table cannot be cast to com.aspose.words.Paragraph
Please help me resolve the error in the line: builder.moveToParagraph(paragraphs.indexOf((Paragraph)PreviousPara), -1);
Input 1: Bernardo_et_al_RevisedPaper_aspose.zip (2.9 MB)
Thanks and regards,
Priyanga G
We are investigating this issue and will share the modified code according to your requirement. We apologize for the inconvenience.
Thanks for your patience. In your document, the shapes are inside a table, and the document also contains charts. Please use the following code example to get the desired output.
Document doc = new Document(MyDir + "Bernardo_et_al_RevisedPaper_aspose.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
// Tables that contain shapes; each one is saved to its own document at the end.
ArrayList<Table> tables = new ArrayList<Table>();
int bookmark = 1;
int i = 1;

NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
    {
        Node PreviousPara = paragraph.getPreviousSibling();

        // Skip empty paragraphs without shapes. The node type check keeps the cast
        // from failing when the previous sibling is a table.
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && PreviousPara.toString(SaveFormat.TEXT).trim().length() == 0
                && ((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() == 0)
            PreviousPara = PreviousPara.getPreviousSibling();

        // Walk back over paragraphs that contain shapes or group shapes.
        while (PreviousPara != null
                && PreviousPara.getNodeType() == NodeType.PARAGRAPH
                && (((Paragraph)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0 ||
                    ((Paragraph)PreviousPara).getChildNodes(NodeType.GROUP_SHAPE, true).getCount() > 0)
              )
        {
            PreviousPara = PreviousPara.getPreviousSibling();
            // Stop when there is no earlier sibling, to avoid a NullPointerException.
            if (PreviousPara == null)
                break;
            // Keep walking back only while the node carries a sub-figure label.
            String text = PreviousPara.toString(SaveFormat.TEXT).trim();
            if (text.contains("(a)") || text.contains("(b)") ||
                text.contains("(c)") || text.contains("(d)"))
                continue;
            else
                break;
        }

        if (PreviousPara == null)
        {
            builder.moveToDocumentStart();
            builder.insertParagraph();
            builder.startBookmark("Bookmark" + bookmark);
            builder.moveToParagraph(paragraphs.indexOf(paragraph), 0);
            builder.endBookmark("Bookmark" + bookmark);
            bookmark++;
        }
        else if (PreviousPara.getNodeType() == NodeType.PARAGRAPH)
        {
            // Insert a new empty paragraph after PreviousPara and start the bookmark in it.
            Node node = ((Paragraph)PreviousPara).getParentNode().insertAfter(new Paragraph(doc), PreviousPara);
            builder.moveTo(node);
            builder.startBookmark("Bookmark" + bookmark);
            builder.moveTo(paragraph);
            //builder.writeln();
            builder.endBookmark("Bookmark" + bookmark);
            bookmark++;
        }
        else if (PreviousPara.getNodeType() == NodeType.TABLE)
        {
            // The figure sits inside a table: remember the table and export it separately.
            if (((Table)PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
                tables.add((Table)PreviousPara);
        }
    }
}

// Save the content of each bookmark to its own document, copying the source page setup.
for (Bookmark bm : doc.getRange().getBookmarks())
{
    if (bm.getName().startsWith("Bookmark"))
    {
        ArrayList nodes = ExtractContents.extractContent(bm.getBookmarkStart(), bm.getBookmarkEnd(), true);
        Document dstDoc = ExtractContents.generateDocument(doc, nodes);
        PageSetup sourcePageSetup = ((Paragraph)bm.getBookmarkStart().getParentNode()).getParentSection().getPageSetup();
        dstDoc.getFirstSection().getPageSetup().setPaperSize(sourcePageSetup.getPaperSize());
        dstDoc.getFirstSection().getPageSetup().setLeftMargin(sourcePageSetup.getLeftMargin());
        dstDoc.getFirstSection().getPageSetup().setRightMargin(sourcePageSetup.getRightMargin());
        // Drop the trailing caption paragraph if it was extracted with the figure.
        if (dstDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().startsWith("Fig"))
            dstDoc.getLastSection().getBody().getLastParagraph().remove();
        dstDoc.save(MyDir + "out\\output" + i + ".docx");
        i++;
    }
}

// Save each collected table to its own document.
for (Table table : tables)
{
    Document dstDoc = new Document();
    NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node newNode = importer.importNode(table, true);
    dstDoc.getFirstSection().getBody().appendChild(newNode);
    dstDoc.save(MyDir + "out\\output" + i + ".docx");
    i++;
}