Getting Images of MS Word document?

How can I get images of the individual pages of an MS Word document? Also, which formats are supported?

Hi Wahaj,

Thanks for your inquiry.

*Wahaj Khan:

How to get images of individual pages of MS Word document?*

Images in Word documents are represented by either Shape nodes or DrawingML nodes when loaded into Aspose.Words’ DOM. Please read about the members of the DrawingML and Shape classes here:
https://reference.aspose.com/words/net/aspose.words.drawing/shape/

In your case, I suggest using the “PageSplitter” example project, as shown below. You can find the PageSplitter project in the Aspose.Words for .NET examples repository on GitHub.

Document doc = new Document(MyDir + "in.docx");
// Create and attach collector to the document before page layout is built.
LayoutCollector layoutCollector = new LayoutCollector(doc);
// This will build layout model and collect necessary information.
doc.UpdatePageLayout();
// Split nodes in the document into separate pages.
DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);
Document newDoc = splitter.GetDocumentOfPage(1);
NodeCollection shapes = newDoc.GetChildNodes(NodeType.Shape, true);
int imageIndex = 0;
foreach (Shape shape in shapes)
{
    if (shape.HasImage)
    {
        string imageFileName = string.Format(
        "Image.ExportImages.{0} Out{1}", imageIndex, FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType));
        shape.ImageData.Save(MyDir + imageFileName);
        imageIndex++;
    }
}
// Newer Microsoft Word documents (such as DOCX) may contain a different type of image container called DrawingML.
// Repeat the process to extract these if they are present in the loaded document.
NodeCollection dmlShapes = newDoc.GetChildNodes(NodeType.DrawingML, true);
foreach (DrawingML dml in dmlShapes)
{
    if (dml.HasImage)
    {
        string imageFileName = string.Format(
        "Image.ExportImages.{0} Out{1}", imageIndex, FileFormatUtil.ImageTypeToExtension(dml.ImageData.ImageType));
        dml.ImageData.Save(MyDir + imageFileName);
        imageIndex++;
    }
}
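
// Alternatively, work on the whole document and use LayoutCollector and LayoutEnumerator
// to find the page on which each image is rendered (see the "page" variable below).
Document doc = new Document(MyDir + "in.docx");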
LayoutCollector layoutCollector = new LayoutCollector(doc);
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int imageIndex = 0;
foreach (Shape shape in shapes)
{
    var renderObject = layoutCollector.GetEntity(shape);
    layoutEnumerator.Current = renderObject;
    int page = layoutEnumerator.PageIndex;
    if (shape.HasImage)
    {
        string imageFileName = string.Format(
        "Image.ExportImages.{0} Out{1}", imageIndex, FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType));
        shape.ImageData.Save(MyDir + imageFileName);
        imageIndex++;
    }
}
// Newer Microsoft Word documents (such as DOCX) may contain a different type of image container called DrawingML.
// Repeat the process to extract these if they are present in the loaded document.
NodeCollection dmlShapes = doc.GetChildNodes(NodeType.DrawingML, true);
foreach (DrawingML dml in dmlShapes)
{
    var renderObject = layoutCollector.GetEntity(dml);
    layoutEnumerator.Current = renderObject;
    int page = layoutEnumerator.PageIndex;
    if (dml.HasImage)
    {
        string imageFileName = string.Format(
        "Image.ExportImages.{0} Out{1}", imageIndex, FileFormatUtil.ImageTypeToExtension(dml.ImageData.ImageType));
        dml.ImageData.Save(MyDir + imageFileName);
        imageIndex++;
    }
}

*Wahaj Khan:

Also which formats are supported?*

Aspose.Words supports many image formats, e.g. JPEG, BMP, PNG, EMF, TIFF and SVG.

Maybe there is some misunderstanding… Basically, I want to generate images of the pages of the Word document so that I can render those, instead of extracting the images inside the Word document. Does the code supplied above do that?

Also, which output image formats are supported?

Hi Wahaj,

Thanks for sharing the details. Please use the following code example to convert each page of the document to an image file.

The ImageSaveOptions class allows you to specify additional options when rendering document pages or shapes to images. The output save format can be Tiff, Png, Bmp, Emf or Jpeg.

Please let us know if you have any more queries.

Document doc = new Document(MyDir + "in.docx");
ImageSaveOptions options = new ImageSaveOptions(SaveFormat.Png);
// Render exactly one page per output file.
options.PageCount = 1;
for (int i = 0; i < doc.PageCount; i++)
{
    // PageIndex is zero-based.
    options.PageIndex = i;
    doc.Save(MyDir + "Page_" + i + ".png", options);
}

ASA Tahir,

Is this also supported in Aspose.Total for Java?

Wahaj

Hi Wahaj,

Thanks for your inquiry. Aspose.Words for .NET and Aspose.Words for Java are “twin brother” products that together cover most of the popular development environments and deployment platforms.

Yes, you can use Aspose.Words for Java to achieve the same result. Please check the following code example.

Document doc = new Document(MyDir + "in.docx");
ImageSaveOptions options = new ImageSaveOptions(SaveFormat.PNG);
options.setPageCount(1);
for (int i = 0; i < doc.getPageCount(); i++)
{
    options.setPageIndex(i);
    doc.save(MyDir + "Page_" + i + ".png", options);
}
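
A side note on the output file names: the loop above names files by the zero-based page index ("Page_0.png", "Page_1.png", ...), so once a document has ten or more pages the files no longer sort in page order in a file listing. The following is only a minimal sketch of one way to build names that do sort correctly. It uses plain Java only, and PageFileNames, forPage and forAllPages are hypothetical helper names, not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Aspose.Words: builds per-page output file names.
public class PageFileNames {

    // pageIndex is zero-based, like the loop variable in the snippet above.
    // The returned name uses a 1-based page number, zero-padded to the width
    // of pageCount, so that the names sort in page order.
    public static String forPage(String prefix, int pageIndex, int pageCount, String extension) {
        if (pageIndex < 0 || pageIndex >= pageCount) {
            throw new IllegalArgumentException("pageIndex out of range: " + pageIndex);
        }
        int width = String.valueOf(pageCount).length();
        return String.format(Locale.ROOT, "%s_%0" + width + "d.%s", prefix, pageIndex + 1, extension);
    }

    // Builds the names for every page of a document with pageCount pages.
    public static List<String> forAllPages(String prefix, int pageCount, String extension) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            names.add(forPage(prefix, i, pageCount, extension));
        }
        return names;
    }

    public static void main(String[] args) {
        // Print the file names for a 12-page document saved as PNG.
        for (String name : forAllPages("Page", 12, "png")) {
            System.out.println(name);
        }
    }
}
```

In the loop above, the result of forPage("Page", i, doc.getPageCount(), "png") could be passed to doc.save in place of the concatenated name, assuming you prefer 1-based, padded page numbers in the file names.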

Ok thanks.

How about MS Excel to image?

Also, I believe this supports the older MS Office document formats, i.e. .xls and .doc?

Wahaj

Hi Wahaj,

Thanks for your inquiry. Yes, you can convert the pages of a DOC file to images using Aspose.Words. You can load any of the file formats listed at the following documentation link into the Aspose.Words DOM.
https://reference.aspose.com/words/java/com.aspose.words/LoadFormat

Once you have loaded a file in one of the LoadFormat formats into the Aspose.Words DOM, you can easily convert it to an image file format.

Regarding converting Excel documents to images, please read the following documentation link.
https://docs.aspose.com/cells/java/converting-worksheet-to-different-image-formats/

Hope this answers your query. Please let us know if you have any more queries.