How to extract images from a document?

Hi Support,

I would like to know how to extract all or selected images from a document and save them as JPEG/PNG/GIF/BMP/TIFF files.
My current VB.NET code is as follows:

Private Function ExtractImages(ByVal sFile As String, ByVal Path As String) As Long
    Dim doc As Global.Aspose.Words.Document = New Global.Aspose.Words.Document(sFile)
    Dim shapes As Global.Aspose.Words.NodeCollection = doc.GetChildNodes(Global.Aspose.Words.NodeType.Shape, True)
    Dim imageIndex As Long = 0
    Dim imageFileName As String
    Dim Pn As Long = shapes.Count
    ' Nodes of NodeType.Shape are Shape objects, so iterate them as Shape.
    For Each shape As Global.Aspose.Words.Drawing.Shape In shapes
        If shape.ImageData.HasImage Then
            imageFileName = String.Format("Image.ExportImages.{0}_out.tiff", imageIndex)
            'shape.ImageData.getImageData().save(Path & "\" & imageFileName)
            shape.ImageData.Save(Path & "\" & imageFileName)
            imageIndex = imageIndex + 1
            Application.DoEvents()
        End If
    Next
    ' Return the number of images saved.
    Return imageIndex
End Function

My questions are:

  1. How can I save an extracted image as a JPEG/PNG/GIF/BMP file?
  2. How can I save an image with parameters such as resolution (DPI), new size, quality, grayscale/black-and-white mode, and compression value or mode?
  3. How can I get the original size of the picture? For example, the embedded picture is originally 3000x2000, but shape.Width may return 565.
  4. How can I get and save the source image? For example, a 3000x2000 picture with full EXIF information and DPI=300 is embedded in the main document. How can I save it to a JPEG file without any change, that is, the saved image is still 3000x2000 rather than 565x423, the DPI is still 300 rather than 72 or 96, and all EXIF information is kept?

How can I achieve this?

Thanks for your help!

Ducaisoft

@ducaisoft,

I think you can meet these requirements with the following code:

// Requires: using Aspose.Words; using Aspose.Words.Drawing;
//           using Aspose.Words.Rendering; using Aspose.Words.Saving;
Document doc = new Document("E:\\temp\\input.docx");

int i = 0;
foreach (Shape img in doc.GetChildNodes(NodeType.Shape, true))
{
    if (img.HasImage)
    {
        ShapeRenderer renderer = img.GetShapeRenderer();

        ImageSaveOptions opts = new ImageSaveOptions(SaveFormat.Png);
        opts.ImageColorMode = ImageColorMode.Grayscale;
        // Resolution is a DPI value, not a pixel count; 300 is only an example.
        opts.HorizontalResolution = 300;
        opts.VerticalResolution = 300;

        renderer.Save("E:\\temp\\img_" + i + ".png", opts);
        i++;
    }
}
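As a rough sketch of how the same renderer could target several output formats (question 1), the helper below renders one shape to PNG, JPEG, BMP and TIFF. The class name `ShapeFormatExporter`, the method name, the `baseName` parameter and the 300 DPI value are illustrative assumptions, not part of the original reply; it needs the Aspose.Words assembly and has not been checked against every library version.

```csharp
using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Saving;

public static class ShapeFormatExporter
{
    // Hypothetical helper: renders the given shape once per raster format.
    // baseName is a path without extension, e.g. "E:\\temp\\img_0".
    public static void SaveShapeInFormats(Shape shape, string baseName)
    {
        SaveFormat[] formats =
        {
            SaveFormat.Png, SaveFormat.Jpeg, SaveFormat.Bmp, SaveFormat.Tiff
        };

        foreach (SaveFormat format in formats)
        {
            ImageSaveOptions options = new ImageSaveOptions(format);
            // Example DPI only; choose the value your output needs.
            options.HorizontalResolution = 300;
            options.VerticalResolution = 300;

            // SaveFormatToExtension returns the extension with a leading dot.
            string path = baseName + FileFormatUtil.SaveFormatToExtension(format);
            shape.GetShapeRenderer().Save(path, options);
        }
    }
}
```

Note that this renders the shape as it appears in the document, so the result is a re-encoded picture rather than the embedded file itself.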

Please try the different options of the ImageSaveOptions and Shape classes to get the desired results. You can also convert the above C# code to VB.NET with a code converter.

Thanks for your suggestion.
It can extract images and save them in different image formats.
However, it does not seem able to extract and save images at their original size and quality while keeping their EXIF information. For example, in MS Word this can be achieved like this:
ActiveDocument.InlineShapes(I).Reset ' reset the scaled inserted image to its original size
ActiveDocument.SaveAs HtmlFile ' extracts the inserted images with original size and EXIF information

Is it possible to achieve this with Aspose.Words.dll, or might this feature be added in a future version?

@ducaisoft,

Please ZIP and attach a sample Word document (.docx file) containing the images here for testing. What original size do you expect for the extracted images? We will then investigate the scenario on our end and provide you with more information.

Ok!
Please refer to the demo document and view my VBA code for your investigation.
Input(embeded images).zip (842.7 KB)

@ducaisoft,

We are working on your query and will get back to you soon.

@ducaisoft,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-18157. We will further look into the details of this requirement and will keep you updated on the status of the linked issue.

@ducaisoft,

You can use “ShapeRenderer” and “ImageSaveOptions” to complete these tasks, e.g.

// doc is an already loaded Aspose.Words.Document.
Shape img = doc.FirstSection.Body.Shapes[0];
ShapeRenderer renderer = img.GetShapeRenderer();
// Pass required format.
ImageSaveOptions so = new ImageSaveOptions(SaveFormat.Jpeg);
// Set resolution in DPI.
so.VerticalResolution = 204;
so.HorizontalResolution = 196;
// Set quality (0-100; 100 is the best quality and lowest compression).
so.JpegQuality = 100;
// Set grayscale/black-and-white/none mode.
so.ImageColorMode = ImageColorMode.Grayscale;
// Render the shape to a file ("output.jpg" is a placeholder name).
renderer.Save("output.jpg", so);

Note that the “ImageSaveOptions” constructor description does not list GIF among the supported formats. We will update the description soon.

Also, the document package you attached contains only small pictures without EXIF information. We used the https://exifinfo.org site to check this.

We checked this macro in MS Word 2003 with the Microsoft Office Compatibility Pack:

Application.ActiveDocument.InlineShapes(i).Reset

We observed that the image sizes did not change, i.e. the images are already shown at their original sizes. The same operation is available in the MS Word GUI via the menu “Format->Reset Picture->Reset Picture & Size”.

We created “imageTest.docx” (ImageTest.zip (858.1 KB)) and saved it in MS Word with the “Do not compress images in file” option. In this case the shape in the document measures 292x194 points, while the original image that fills the shape measures 3204x2136 points (4272x2848 pixels). Both sizes can be read with the following code:

Document doc = new Document(@"imageTest.docx");

Shape shape = doc.FirstSection.Body.Shapes[0];
// Shape size in points.
double width = shape.Width;
double height = shape.Height;

// Size in points of the image which fills the shape.
double fillHeight = shape.ImageData.ImageSize.HeightPoints;
double fillWidth = shape.ImageData.ImageSize.WidthPoints;

The result is as expected. We think these are the sizes you want to obtain, but your file was saved with the image compression option enabled, so the original fill size was lost.
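The point and pixel figures quoted above for imageTest.docx are related by the usual conversion (1 point = 1/72 inch), which can be sketched without any Aspose types. The 96 DPI value is an inference from the quoted numbers (3204 points and 4272 pixels), not something stated in the reply, and `PointPixelConverter` is a hypothetical helper name; Aspose.Words offers `ConvertUtil.PointToPixel` for the same calculation.

```csharp
using System;

public static class PointPixelConverter
{
    // 1 point = 1/72 inch, so pixels = points / 72 * dpi.
    public static int PointsToPixels(double points, double dpi)
    {
        return (int)Math.Round(points * dpi / 72.0);
    }

    public static void Main()
    {
        // Image size from the reply, assuming 96 DPI.
        Console.WriteLine(PointsToPixels(3204, 96)); // 4272
        Console.WriteLine(PointsToPixels(2136, 96)); // 2848
    }
}
```

This is why the shape width (in points, as laid out on the page) and the pixel width of the embedded picture are different numbers even when no data has been lost.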

Regarding keeping EXIF information: currently there is only one option, i.e. saving the image as is from the package. We created and attached one more document in which the image has EXIF information, i.e. “imageTest2.docx” (imageTest2.zip (1.8 MB)).

Document doc = new Document(@"imageTest2.docx");
Shape shape = doc.FirstSection.Body.Shapes[0];
shape.ImageData.Save(@"imageTest2_extractedImage.jpg");

However, if you want to change the resolution, color mode or quality, you have to use “ShapeRenderer”, which currently does not preserve EXIF information.

We assume that you do not need an API to access the EXIF information itself, and that your aim is to re-save the picture while preserving its original size and EXIF data. This can be done today with the “ImageData.Save” method. However, it also looks like you want to change the resolution, color mode and quality while preserving the EXIF data. That case requires changes in “ShapeRenderer”.

There are many questions here and we do not fully understand your needs. As far as we understand, the main question is how to get the ORIGINAL source image. For this, you can use any of the following: Shape.ImageData.Save(), Shape.ImageData.ToImage() or Shape.ImageData.ToByteArray().
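As a minimal sketch of the "save as is" route for every picture in a document, the helper below writes each embedded image with an extension that matches its stored type, so the bytes are not re-encoded. `OriginalImageExtractor`, `ExtractOriginalImages` and the `image_N` naming scheme are hypothetical; the sketch assumes `FileFormatUtil.ImageTypeToExtension` recognizes every image type present in the document.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Drawing;

public static class OriginalImageExtractor
{
    // Hypothetical helper: saves every embedded image unchanged and
    // returns the number of files written.
    public static int ExtractOriginalImages(string docPath, string outputDir)
    {
        Document doc = new Document(docPath);
        Directory.CreateDirectory(outputDir);

        int count = 0;
        foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
        {
            if (!shape.HasImage)
                continue;

            // Extension (with leading dot) for the stored image type.
            string extension =
                FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType);
            string fileName = Path.Combine(outputDir, "image_" + count + extension);

            // Writes the image bytes exactly as stored in the document package.
            shape.ImageData.Save(fileName);
            count++;
        }
        return count;
    }
}
```

Because the file is written as stored, the original pixel size, DPI and EXIF data survive only if MS Word kept them when the document was saved (for example with “Do not compress images in file” enabled).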

Please share your further input on this topic. Thanks for your cooperation.