Extract Images from Table Cells of Word DOCX Document & Save Images to Separate PDF Files using C# or Java

jan.kathir · July 23, 2020, 12:39pm

Hi Team,
I am having trouble extracting images, as described below. My document contains tables, and each table holds two images, each with its own figure caption. When I try to extract the images, both end up in the same file. My requirement is to extract each image separately. For your reference, I have attached a sample input and the required output. Kindly provide a solution for this scenario.

Input: Input.zip (96.0 KB)
Output: output.zip (593.4 KB)

Thanks.

awais.hafeez · July 23, 2020, 1:36pm

@jan.kathir,

You can use the following C# code with the Aspose.Words for .NET API. It extracts the images from the table cells of a DOCX Word document and saves each image to a separate PDF file, named after the caption text in the cell directly below it:

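using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Tables;

Document doc = new Document(@"E:\Temp\input\List of figures .docx");

// Visit every shape in the document, including shapes nested inside tables.
foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
{
    // Only handle shapes that sit inside a table cell.
    Cell cell = (Cell)shape.GetAncestor(NodeType.Cell);
    if (cell != null)
    {
        // The caption is expected in the same column of the next row.
        int cellIdx = cell.ParentRow.Cells.IndexOf(cell);
        string fileName = ((Row)cell.ParentRow.NextSibling).Cells[cellIdx].ToString(SaveFormat.Text).Trim();

        // Copy the shape into a new, empty document and save it as its own PDF.
        DocumentBuilder builder = new DocumentBuilder();
        builder.InsertNode(builder.Document.ImportNode(shape, true));
        builder.Document.Save(@"E:\Temp\input\" + fileName + ".pdf");
    }
}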

jan.kathir · July 25, 2020, 5:03am

@awais.hafeez
Thanks for your reply.

I need this for Aspose.Words for Java, not the .NET API. Kindly provide the solution.

Thanks

awais.hafeez · July 25, 2020, 7:40am

@jan.kathir,

Please try using the following Aspose.Words for Java equivalent code:

import com.aspose.words.*;

Document doc = new Document("E:\\Temp\\input\\List of figures .docx");

// Visit every shape in the document, including shapes nested inside tables.
for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true)) {
    // Only handle shapes that sit inside a table cell.
    Cell cell = (Cell) shape.getAncestor(NodeType.CELL);
    if (cell != null) {
        // The caption is expected in the same column of the next row.
        int cellIdx = cell.getParentRow().getCells().indexOf(cell);
        String fileName = ((Row) cell.getParentRow().getNextSibling()).getCells().get(cellIdx).toString(SaveFormat.TEXT).trim();

        // Copy the shape into a new, empty document and save it as its own PDF.
        DocumentBuilder builder = new DocumentBuilder();
        builder.insertNode(builder.getDocument().importNode(shape, true));
        builder.getDocument().save("E:\\Temp\\input\\" + fileName + ".pdf");
    }
}
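
One thing to watch: the code above uses the caption text directly as the PDF file name. If a caption contains characters that Windows does not allow in file names (for example the colon in "Fig. 1: Overview"), the save call may fail. Below is a minimal sketch of a helper you could pass the caption through first. It uses only the Java standard library; the class name CaptionFileName, the method toSafeFileName, and the fallback parameter are hypothetical names chosen for illustration and are not part of Aspose.Words.

```java
import java.util.regex.Pattern;

public class CaptionFileName {
    // Characters Windows rejects in file names, plus control characters such as tabs and line breaks.
    private static final Pattern ILLEGAL = Pattern.compile("[\\\\/:*?\"<>|\\p{Cntrl}]");

    // Hypothetical helper: turns caption text into a file name stem that is safe to save under.
    public static String toSafeFileName(String caption, String fallback) {
        String cleaned = caption == null ? "" : caption.trim();
        // Replace every illegal character with an underscore.
        cleaned = ILLEGAL.matcher(cleaned).replaceAll("_");
        // Windows also rejects names that end in a dot or a space.
        cleaned = cleaned.replaceAll("[. ]+$", "");
        // Fall back to a default name when nothing usable is left.
        return cleaned.isEmpty() ? fallback : cleaned;
    }

    public static void main(String[] args) {
        System.out.println(toSafeFileName("Fig. 1: Overview", "image"));
    }
}
```

In the loop above, this would mean wrapping the caption lookup, for example CaptionFileName.toSafeFileName(captionText, "image"), before building the output path. Note that two identical captions would still map to the same file name, so the later PDF would overwrite the earlier one.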

jan.kathir · July 26, 2020, 9:54am

@awais.hafeez

Thank you, it's working fine.