Hi,
Please help me extract the unnumbered images from the Word document.
Sample.zip (156.5 KB)
Thanks in advance.
Thanks for your inquiry. Please ZIP and attach your expected output documents here for our reference. We will then provide more information about your query, along with code.
Thanks for sharing the detail. Please use the following code example to get the desired output.
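// Assumes: import com.aspose.words.*; MyDir is the folder holding the sample files.
Document doc = new Document(MyDir + "sample.docx");
int i = 1;
// Save the paragraph that hosts each shape as its own document.
for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true))
{
    // getParentParagraph() can return null (for example, for shapes inside a group shape), so skip those.
    if (shape.getParentParagraph() == null)
        continue;

    Document dstDoc = new Document();
    NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    Node newNode = importer.importNode(shape.getParentParagraph(), true);
    dstDoc.getFirstSection().getBody().appendChild(newNode);
    dstDoc.save(MyDir + "output" + i + ".docx");
    i++;
}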
Hi Tahir,
I am able to extract only two images; the third image is not extracted. Sample.zip (121.1 KB) sample2.zip (424.7 KB)
In sample2.zip, not all of the images are extracted.
Thanks for your inquiry. The code example shared in my previous post works for the shared document (sample.docx).
In this case, we suggest the following solution.
Can you kindly share the code?
Thanks for your inquiry. Please use the following code example to get the desired output.
// Assumes: import com.aspose.words.*; MyDir is the folder holding the sample files.
Document doc = new Document(MyDir + "sample2.docx");
int i = 1;
for (Paragraph paragraph : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    // Only look at caption or label paragraphs ("Fig", "(a)", "(b)").
    String text = paragraph.toString(SaveFormat.TEXT);
    if (text.contains("Fig") || text.contains("(a)") || text.contains("(b)"))
    {
        System.out.println(paragraph.getText());
        Paragraph imagePara = null;
        if (paragraph.getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            // Case 1: the image is in the same paragraph as the caption or label.
            imagePara = paragraph;
        }
        else if (paragraph.getPreviousSibling() != null
                && paragraph.getPreviousSibling().getNodeType() == NodeType.PARAGRAPH
                && ((Paragraph) paragraph.getPreviousSibling()).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
        {
            // Case 2: the image is in the paragraph directly above.
            imagePara = (Paragraph) paragraph.getPreviousSibling();
        }

        if (imagePara != null)
        {
            // Copy the image paragraph into a new document and save it.
            Document dstDoc = new Document();
            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            Node newNode = importer.importNode(imagePara, true);
            dstDoc.getFirstSection().getBody().appendChild(newNode);
            dstDoc.save(MyDir + "output" + i + ".docx");
            i++;
        }
    }
}
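A note on the text checks in the code above: `contains("Fig")` also matches body text such as "As shown in Fig. 2", which may explain unexpected extra output on some documents. Below is a minimal sketch of stricter matching with regular expressions. It is plain Java and independent of Aspose.Words; the class `CaptionText` and its methods are hypothetical helpers, and it assumes that captions begin with "Fig" and that sub-figure labels are a single lowercase letter in parentheses.

```java
import java.util.regex.Pattern;

public class CaptionText {
    // Assumption: a caption paragraph starts with "Fig" (leading whitespace allowed).
    private static final Pattern CAPTION = Pattern.compile("^\\s*Fig");
    // Assumption: sub-figure labels look like "(a)", "(b)", ... "(z)".
    private static final Pattern SUB_LABEL = Pattern.compile("\\([a-z]\\)");

    /** True if the text starts with "Fig", ignoring leading whitespace. */
    public static boolean isCaption(String text) {
        return text != null && CAPTION.matcher(text).find();
    }

    /** True if the text contains at least one sub-figure label. */
    public static boolean hasSubLabel(String text) {
        return text != null && SUB_LABEL.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isCaption("Fig. 2 Test setup"));   // true
        System.out.println(isCaption("As shown in Fig. 2"));  // false
        System.out.println(hasSubLabel("(a)\t(b)"));          // true
    }
}
```

In the loop above, the result of `paragraph.toString(SaveFormat.TEXT)` could be passed to these helpers in place of the three `contains` calls.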
I am not able to extract the labelled images (a) and (b). I have attached the input and output files.
ReferenceDoclet_withSingleCell.zip (22 Bytes)
doc2_output.zip (303.6 KB)
Thanks for your inquiry. The ReferenceDoclet_withSingleCell.zip contains no document. Could you please ZIP and attach your input Word document for testing? We will investigate the issue on our side and provide you more information.
doc2_Sample.zip (297.4 KB)
Please find attached the input document. In the expected output, every image must be saved as a separate image.
The output must match doc2_output.zip (303.6 KB)
Thanks for your inquiry. Please use the following code example to get the desired output. Hope this helps you.
// Assumes: import com.aspose.words.*; MyDir is the folder holding the sample files.
Document doc = new Document(MyDir + "doc2_Sample.docx");
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    // Start from each caption paragraph ("Fig ...").
    if (!paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        continue;

    // The node above the caption is expected to hold the "(a)" / "(b)" labels.
    Node labelPara = paragraph.getPreviousSibling();
    if (labelPara == null)
        continue;
    String labelText = labelPara.toString(SaveFormat.TEXT);
    if (!labelText.contains("(a)") && !labelText.contains("(b)"))
        continue;

    // The images are expected in the paragraph above the labels.
    // Check the node type before casting, since the sibling may be a table.
    Node imagePara = labelPara.getPreviousSibling();
    if (imagePara == null || imagePara.getNodeType() != NodeType.PARAGRAPH)
        continue;

    // Save every shape in that paragraph as a separate document.
    for (Shape shape : (Iterable<Shape>) ((Paragraph) imagePara).getChildNodes(NodeType.SHAPE, true))
    {
        Document dstDoc = new Document();
        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        Node newNode = importer.importNode(shape, true);
        dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}
Thanks for the feedback. I have a document where I am not able to extract the (a) and (b) images separately. The sample input is 3.zip (642.2 KB)
The expected output is Expected_Output.zip (664.4 KB)
Thanks in advance.
Thanks for your inquiry. In this case, the images are inside a table node. You need to list all of your use cases and extract the images accordingly. Please use the following modified code to get the desired output.
// Assumes: import com.aspose.words.*; MyDir is the folder holding the sample files.
Document doc = new Document(MyDir + "3.docx");
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    // Start from each caption paragraph ("Fig ...").
    if (!paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        continue;

    // Here the node above the caption is a table holding both the images and the "(a)" / "(b)" labels.
    Node previousNode = paragraph.getPreviousSibling();
    if (previousNode == null || !previousNode.isComposite())
        continue;
    String previousText = previousNode.toString(SaveFormat.TEXT);
    if (!previousText.contains("(a)") && !previousText.contains("(b)"))
        continue;

    // Save every shape found anywhere inside that node as a separate document.
    for (Shape shape : (Iterable<Shape>) ((CompositeNode) previousNode).getChildNodes(NodeType.SHAPE, true))
    {
        Document dstDoc = new Document();
        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        Node newNode = importer.importNode(shape, true);
        dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}
I have extracted the images using this code. I also need the label, (a) or (b), to appear with each extracted image. The input document is Sample1.zip (596.6 KB)
The expected output is sample_output.zip (610.6 KB)
// filearg and folderName are not declared in this snippet; they are presumably fields of the enclosing class.
private static void unNumberedImageExtrac(Document interimdoc) throws Exception
{
    Document doc = new Document(filearg);
    int i = 1;
    NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
    for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
    {
        if (paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        {
            Node PreviousPara = paragraph.getPreviousSibling();
            if (PreviousPara != null &&
                    (PreviousPara.toString(SaveFormat.TEXT).trim().contains("(a)") ||
                     PreviousPara.toString(SaveFormat.TEXT).trim().contains("(b)") ||
                     PreviousPara.toString(SaveFormat.TEXT).trim().contains("(c)") ||
                     PreviousPara.toString(SaveFormat.TEXT).trim().contains("(d)")))
            {
                // Step back from the label paragraph to the paragraph holding the images.
                PreviousPara = PreviousPara.getPreviousSibling();
                try
                {
                    if (PreviousPara != null && ((Paragraph) PreviousPara).getChildNodes(NodeType.SHAPE, true).getCount() > 0)
                    {
                        for (Shape shape : (Iterable<Shape>) ((Paragraph) PreviousPara).getChildNodes(NodeType.SHAPE, true))
                        {
                            Document dstDoc = new Document();
                            NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                            Node newNode = importer.importNode(shape, true);
                            dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
                            // Save each image as DOCX, JPEG and PDF.
                            dstDoc.save(folderName + "output_B" + i + ".docx");
                            dstDoc.save(folderName + "output_B" + i + ".jpeg");
                            dstDoc.save(folderName + "output_B" + i + ".pdf");
                            i++;
                        }
                    }
                }
                catch (Exception e)
                {
                    // Report the failure (for example, a failed cast when the sibling is a table) and continue.
                    e.printStackTrace();
                }
            }
        }
    }
}
Thanks for your inquiry. We are working on your query and will get back to you with a code example. Thanks for your cooperation.
For this case, please use the following code example. Hope this helps you.
// Assumes: import com.aspose.words.*; MyDir is the folder holding the sample files.
Document doc = new Document(MyDir + "Sample1.docx");
int i = 1;
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph paragraph : (Iterable<Paragraph>) paragraphs)
{
    // Start from each caption paragraph ("Fig ...").
    if (!paragraph.toString(SaveFormat.TEXT).trim().startsWith("Fig"))
        continue;

    // The node above the caption holds the "(a)" / "(b)" labels.
    Node label = paragraph.getPreviousSibling();
    if (label == null)
        continue;
    String labelText = label.toString(SaveFormat.TEXT);
    if (!labelText.contains("(a)") && !labelText.contains("(b)"))
        continue;

    // The node above the labels holds the images.
    Node imageNode = label.getPreviousSibling();
    if (imageNode == null || !imageNode.isComposite())
        continue;

    for (Shape shape : (Iterable<Shape>) ((CompositeNode) imageNode).getChildNodes(NodeType.SHAPE, true))
    {
        Document dstDoc = new Document();
        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        // Put the image in the first paragraph and the label paragraph below it.
        dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(importer.importNode(shape, true));
        dstDoc.getFirstSection().getBody().appendChild(importer.importNode(label, true));
        // Remove the other image's label. This relies on every figure having exactly two images,
        // so that odd counters are "(a)" images and even counters are "(b)" images.
        String otherLabel = (i % 2 == 0) ? "(a)" : "(b)";
        dstDoc.getFirstSection().getBody().getLastParagraph().getRange().replace(otherLabel, "", new FindReplaceOptions());
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}
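The `i % 2` test in the code above only works when every figure has exactly two images, labelled (a) and (b). For figures with (c), (d) and so on, one option is to pick the label by the image's position within its figure. The following is a minimal sketch in plain Java, independent of Aspose.Words; `SubLabels` is a hypothetical helper, and it assumes that the labels appear in the label text in the same order as the images.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubLabels {
    // Assumption: sub-figure labels look like "(a)", "(b)", ... "(z)".
    private static final Pattern SUB_LABEL = Pattern.compile("\\([a-z]\\)");

    /** Returns the labels found in the text, in reading order. */
    public static List<String> find(String labelText) {
        List<String> labels = new ArrayList<>();
        Matcher m = SUB_LABEL.matcher(labelText);
        while (m.find()) {
            labels.add(m.group());
        }
        return labels;
    }

    /** Returns the label for the image at shapeIndex (0-based), or "" if there is none. */
    public static String labelFor(String labelText, int shapeIndex) {
        List<String> labels = find(labelText);
        return (shapeIndex >= 0 && shapeIndex < labels.size()) ? labels.get(shapeIndex) : "";
    }

    public static void main(String[] args) {
        String labelText = "(a)\t(b)\t(c)";
        System.out.println(find(labelText));         // [(a), (b), (c)]
        System.out.println(labelFor(labelText, 2));  // (c)
    }
}
```

With a per-figure shape counter that starts at 0 for each caption, the value returned by `labelFor` could be written under the image instead of importing the whole label paragraph and deleting the other labels.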