Extract images whose figure captions start with a, b, c, d... numbered-list labels

Hi team,

As before, we are requesting a workaround to extract images whose captions start with a, b, c, d… numbered-list labels. The a, b… at the start of each caption is not plain text; it is a numbered-list label. Kindly help us find a workaround to extract these images as well.

Regards
Priya Dharshini J P

Attachment: test8.zip (2.7 MB)

@priyadharshini,

Thanks for your inquiry. You can build on the following code to find the Shape associated with an a, b, c, d numbered-list item:

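// Requires: using Aspose.Words; using Aspose.Words.Drawing;
//           using Aspose.Words.Rendering; using Aspose.Words.Saving;
Document doc = new Document(MyDir + @"test (8).docx");
// List labels can only be read correctly after this call.
doc.UpdateListLabels();

foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (para.IsListItem && para.ListLabel.LabelString == "(a)")
    {
        // Ascend the document hierarchy from this paragraph to find the Shape
        // (how to do so depends on the document structure), then use
        // ShapeRenderer to save it to disk as shown below.
        Shape shape = null; // placeholder: assign the Shape you located
        if (shape != null)
        {
            ShapeRenderer renderer = shape.GetShapeRenderer();
            renderer.Save(MyDir + "image1.jpg", new ImageSaveOptions(SaveFormat.Jpeg));
        }
    }
}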

Hope this helps.

Best regards,
Awais Hafeez

Thanks for the reply, Awais Hafeez, but I need to extract the images that have a bulleted-list caption by using paragraph nodes, and then embed each image separately in another DOCX file. Please help.

Due to time constraints, I am requesting a solution as soon as possible.

Regards
Priya Dharshini J P

@priyadharshini,

Thanks for your inquiry.

For example, page 37 of ‘test (8).docx’ contains an image with a ‘bulleted list caption’, i.e. the text ‘(a) Hydrogen mass flow rate’. This list item is a Paragraph, and you can detect it with the Paragraph.IsListItem property. You also need to call the Document.UpdateListLabels method first so that list labels can be read correctly. Once you have found the list item, walk back up the document hierarchy to find the Shape (the exact traversal depends on the document structure). After the Shape is rendered to disk, you can insert the saved image into a new Document. Please see the following code and articles:

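// Requires: using Aspose.Words; using Aspose.Words.Drawing;
//           using Aspose.Words.Rendering; using Aspose.Words.Saving;
Document doc = new Document(MyDir + @"test (8).docx");
doc.UpdateListLabels();

int i = 0;
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (para.IsListItem && para.ListLabel.LabelString == "(a)")
    {
        // Look two nodes back from the caption paragraph in pre-order;
        // other document structures may need a different traversal.
        // The null checks avoid an exception near the start of the document.
        Node candidate = para.PreviousPreOrder(doc)?.PreviousPreOrder(doc);
        if (candidate != null && candidate.NodeType == NodeType.Shape)
        {
            Shape shape = (Shape)candidate;
            ShapeRenderer renderer = shape.GetShapeRenderer();
            // One JPEG per matching shape: image0.jpg, image1.jpg, ...
            renderer.Save(MyDir + "image" + i + ".jpg", new ImageSaveOptions(SaveFormat.Jpeg));
            i++;
        }
    }
}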

Aspose.Words Document Object Model
Use DocumentBuilder to Insert Document Elements

Hope this helps.

Best regards,
Awais Hafeez

Thank you @awais.hafeez

Could you provide the solution in Java?

Regards
Priya Dharshini J P

@priyadharshini,

Thanks for your inquiry. Please try using the following Java code:

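// Requires: import com.aspose.words.*;
Document doc = new Document("D:\\temp\\test (8).docx");

// List labels can only be read correctly after this call.
doc.updateListLabels();

int i = 0;
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true))
{
    if (para.isListItem() && para.getListLabel().getLabelString().equals("(a)"))
    {
        // Look two nodes back from the caption paragraph in pre-order;
        // other document structures may need a different traversal.
        // The null checks avoid an exception near the start of the document.
        Node previous = para.previousPreOrder(doc);
        Node candidate = (previous != null) ? previous.previousPreOrder(doc) : null;
        if (candidate != null && candidate.getNodeType() == NodeType.SHAPE)
        {
            Shape shape = (Shape) candidate;
            ShapeRenderer renderer = shape.getShapeRenderer();
            // One JPEG per matching shape: image0.jpg, image1.jpg, ...
            renderer.save("D:\\temp\\image" + i + ".jpg", new ImageSaveOptions(SaveFormat.JPEG));
            i++;
        }
    }
}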
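
Note that the code above matches only the label "(a)". If the captions labelled (b), (c), (d) and so on are needed as well, the string comparison could be replaced with a small predicate. The following is only a sketch: the class name ListLabelMatcher and the method isLetterLabel are hypothetical helpers (not part of Aspose.Words), and it assumes every caption label is a single lowercase letter in parentheses, which may not hold for every document.

```java
import java.util.regex.Pattern;

public class ListLabelMatcher {
    // Assumption: caption labels look like "(a)", "(b)", ... "(z)".
    private static final Pattern LETTER_LABEL = Pattern.compile("\\([a-z]\\)");

    /** Returns true when the label is one lowercase letter wrapped in parentheses. */
    public static boolean isLetterLabel(String label) {
        // Trim first so stray whitespace around the label does not break the match.
        return label != null && LETTER_LABEL.matcher(label.trim()).matches();
    }

    public static void main(String[] args) {
        // Quick manual check of a few label strings.
        for (String label : new String[] {"(a)", "(d)", "1.", "(ab)"}) {
            System.out.println(label + " -> " + isLetterLabel(label));
        }
    }
}
```

In the loop above, the condition para.getListLabel().getLabelString().equals("(a)") would then become ListLabelMatcher.isLetterLabel(para.getListLabel().getLabelString()).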

Hope this helps.

Best regards,
Awais Hafeez

Thank you.

Regards
Priya Dharshini J P