Extract charts from Word document

Hi Team,
I need to extract charts from a Word document using the figure caption, and finally remove the figure caption.
Hi Priya,

Thanks for your inquiry. We have already answered your query here in this post. Please follow that thread for further proceedings.

Hi Team,

I am able to extract the images using the following code, but I am not able to extract the chart objects. My query now is how to extract the chart objects from a Word document.

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes) {
    if (shape.hasImage() && shape.getParentParagraph().getNextSibling() != null
            && shape.getParentParagraph().getNextSibling().getNodeType() == NodeType.PARAGRAPH) {
        if (shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT).startsWith("Fig")
                || shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT).startsWith("Sch")) {
            caption = shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT);
            name = null;

Thanks in advance,
kind regards,
priyanga G
Hi Team,
Thank you for your quick response. I am able to extract the images using the following code, but I am not able to extract the chart objects. My query now is how to extract the chart objects from a Word document and store each chart in its own document. Finally, the figure caption under each chart should be removed.

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes) {
    if (shape.hasImage() && shape.getParentParagraph().getNextSibling() != null
            && shape.getParentParagraph().getNextSibling().getNodeType() == NodeType.PARAGRAPH) {
        if (shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT).startsWith("Fig")
                || shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT).startsWith("Sch")) {
            caption = shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT);
            name = null;



Thanks in advance,
kind regards,
priyanga G

Hi Priya,

Thanks for your inquiry. Could you please share your input and expected output documents here for our reference? We will then share the code example according to your requirement.

Hi Tahir,

Thank you for your quick reply. I have enclosed the input document and output files. I hope this will help you.

Thank you
kind regards,
Priyanga.G
Hi Priyanga,

Thanks for sharing the documents. Please use the following code example to achieve your requirements. Hope this helps you.

We suggest you read about the Aspose.Words Document Object Model (DOM).

// Requires the Aspose.Words classes: import com.aspose.words.*;
Document doc = new Document(MyDir + "test+(4).docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int i = 1;
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes)
{
    // Only shapes that hold a chart are exported.
    if (shape.hasChart())
    {
        // Each chart goes into its own new document.
        Document dstDoc = new Document();

        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        Node newNode = importer.importNode(shape, true);
        dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}
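
The example above exports the charts but does not touch the figure captions. As a minimal sketch of the caption check used in the snippet earlier in this thread (text starting with "Fig" or "Sch"), the test could be factored into a small helper. The class name `CaptionFilter` and the method `isCaption` are hypothetical names chosen for illustration; they are not part of Aspose.Words, and the helper works on plain strings only.

```java
// Hypothetical helper, not part of Aspose.Words: decides whether the text of
// the paragraph that follows a shape looks like a figure or scheme caption.
class CaptionFilter {

    // Prefixes taken from the snippet earlier in this thread ("Fig", "Sch").
    private static final String[] CAPTION_PREFIXES = {"Fig", "Sch"};

    static boolean isCaption(String paragraphText) {
        if (paragraphText == null) {
            return false;
        }
        // Paragraph text exported as plain text may carry leading whitespace
        // or a trailing line break, so trim before comparing.
        String text = paragraphText.trim();
        for (String prefix : CAPTION_PREFIXES) {
            if (text.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isCaption("Figure 1. Monthly sales\r\n"));
        System.out.println(isCaption("Scheme 2. Reaction pathway"));
        System.out.println(isCaption("Results are discussed below."));
    }
}
```

Inside the loop, the string passed in would presumably be `shape.getParentParagraph().getNextSibling().toString(SaveFormat.TEXT)`, after the null and node-type checks shown earlier. When the helper returns true, the caption paragraph could then be dropped by calling `remove()` on that sibling node. This has not been tested against the documents attached to this thread.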

Hi Team,

Thank you for your help. Now I am able to extract chart objects from the document, but I am having some issues with a bar chart. The issue is that the Y-axis text is cut off in the master PDF, and the text also overlaps the images.
Hi Priyanga,

Thanks for your inquiry. Could you please share your input document and code example to reproduce this issue at our end? We will investigate the issue on our side and provide you more information.

Hi Team,

Thank you for your help. Now I am able to extract chart objects from the document, but I am having some issues with a bar chart. The issue is that the Y-axis text of the bar chart is cut off in the master PDF, and the text also overlaps the images. I have enclosed my document for reference. I am using the following code.

Document doc = new Document(MyDir + "test+(4).docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int i = 1;
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes)
{
    if (shape.hasChart())
    {
        Document dstDoc = new Document();

        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        Node newNode = importer.importNode(shape, true);
        dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
        dstDoc.save(MyDir + "output" + i + ".docx");
        i++;
    }
}

Thanks in advance,
Priyanga G
Hi Priyanga,

Thanks for sharing the details. We have tested the scenario using the latest version, Aspose.Words for Java 17.6, and could not reproduce the issue. Please upgrade to Aspose.Words for Java 17.6. We have attached the output PDF to this post for your reference.

Hi Tahir,

Apologies for the late reply. Nice output; it is working fine. Thank you to the Aspose team as well.


Thanks and kind regards,
Priyanga G
Hi Priyanga,

Please feel free to ask if you have any questions about Aspose.Words. We will be happy to help you.

Hi Tahir,

I have successfully extracted the chart objects. Thank you for your kind help. My task is to extract the images between paragraphs. The PNG, JPEG, and chart objects are now extracted and saved as PDF, but some of the images are not extracted and cannot be saved as PDF. I have enclosed the input and output documents.


Thanks & regards
priyanga G

@priyanga,

Thanks for your inquiry. Please use the following modified code example to get the desired output. Hope this helps you.

Document doc = new Document(MyDir + "test+(14).docx");
DocumentBuilder builder = new DocumentBuilder(doc);
int i = 1;
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes)
{
    if(shape.hasChart() || shape.hasImage())
    {
        Document dstDoc = new Document();

        NodeImporter importer = new NodeImporter(doc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        Node newNode = importer.importNode(shape, true);
        dstDoc.getFirstSection().getBody().getFirstParagraph().appendChild(newNode);
        dstDoc.save(MyDir + "output"+i+".docx");
        i++;
    }
}

Hi Tahir,
I have faith in your solutions. Sorry, I need the solution in Java.

Thanks and kind regards,
Priyanga G

Hi Tahir,
Thank you very much for your solution. Sorry for the confusion. Please ignore my previous message.

Thanks & kind regards,
Priyanga G

@priyanga,

Please feel free to ask if you have any questions about Aspose.Words. We will be happy to help you.