Free Support Forum - aspose.com

Extracting different types of embedded objects

Dear,

I would like to understand how I can retrieve all embedded objects from a DOCX document.
I understand how to retrieve shapes, but they cover only a limited set of types.

How can I detect/retrieve an embedded TIFF image?
Would this be using OLEObject/OleControl of some kind?

A small code snippet would be appreciated.

Many thanks.
Patrick

@PatrickVB

Thanks for your inquiry. Please check the following code snippet to extract OLE objects and images from a Word document. Please note that Aspose.Words currently does not support extracting TIFF images. As a workaround, you can check for the Unknown image type and save it as PNG. We have already logged an issue, WORDSNET-15524, for TIFF image extraction. We have linked your post to the issue ID and will notify you as soon as it is resolved.

using System;
using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document("Doc_with_tiff.docx");

NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int imageIndex = 0;
foreach (Shape shape in shapes)
{
    // OLE objects: save each embedded object with the extension Aspose.Words suggests
    if (shape.OleFormat != null)
    {
        Console.Out.WriteLine(shape.OleFormat.SuggestedExtension);
        shape.OleFormat.Save(String.Format("E:\\Data\\out_{0}.{1}", imageIndex++, shape.OleFormat.SuggestedExtension));
    }
    // Images: an unrecognized type (such as TIFF) is reported as Unknown and saved as .PNG
    if (shape.HasImage)
    {
        string extension = shape.ImageData.ImageType == ImageType.Unknown
            ? ".PNG"
            : FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType);
        string imageFileName = string.Format("E:/Data/Image.ExportImages.{0}_out{1}", imageIndex, extension);
        shape.ImageData.Save(imageFileName);
        imageIndex++;
    }
}

Best Regards,

Many thanks for your answer.

If I want more certainty that the image is actually a TIFF, I could probably use a MIME type detection library or something similar.

We have a requirement in our system that a document can only contain TIFF, JPEG, and PNG images. No other image formats are acceptable.

So before accepting or rejecting such a document, we must be 100% sure whether the image is indeed a TIFF image.
What do you think of that approach?

Do you have any MIME type detection capabilities in the Aspose product suite (e.g. in Aspose.Imaging)?

Kind regards

Patrick

Dear Titla,

I forgot to mention that I’m using the Java version of Aspose.Words, not the .NET version. But I assume that this does not make any difference?

Regards

Patrick

@PatrickVB

Thanks for your feedback. As a workaround to detect a TIFF image, you can manually check the first two bytes of the unknown image, or use Aspose.Imaging to detect its image type, until the issue logged above is resolved. The other two image types (PNG and JPEG) can easily be detected with Aspose.Words.

import com.aspose.words.*;

com.aspose.words.Document doc = new com.aspose.words.Document("D:/Downloads/Doc_with_tiff.docx");
int i = 0;

// Get collection of shapes
NodeCollection<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);

// Loop through all shapes
for (Shape shape : shapes)
{
    if (shape.hasImage())
    {
        int imageType = shape.getImageData().getImageType();
        if (imageType == ImageType.UNKNOWN)
        {
            // Type not recognized by Aspose.Words (for example TIFF): save with a .PNG extension
            System.out.println(imageType);
            String imageFileName = "Image.ExportImages_" + i++ + ".PNG";
            shape.getImageData().save(imageFileName);
        }
        else
        {
            String imageFileName = "Image.ExportImages_" + i++
                    + FileFormatUtil.imageTypeToExtension(imageType);
            shape.getImageData().save(imageFileName);
        }
    }
}

Aspose.Imaging code:

int imageType = (int)com.aspose.imaging.Image.getFileFormat(shape.getImageData().toStream());

String image = null;
switch (imageType) {
    case 0:   image = "Undefined";
              break;
    case 1:   image = "Custom";
              break;
    case 2:   image = "BMP";
              break;
    case 4:   image = "GIF";
              break;
    case 8:   image = "JPEG";
              break;
    case 16:  image = "PNG";
              break;
    case 32:  image = "TIFF";
              break;
    case 64:  image = "PSD";
              break;
    case 128: image = "DXF";
              break;
}
System.out.println(image);
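
The manual byte check mentioned above could look roughly like the sketch below. It is plain Java with no Aspose dependency, and the class and method names (ImageSniffer, detect, isAccepted) are hypothetical, chosen only for illustration. It compares the leading bytes against the well-known signatures of TIFF ("II" or "MM" followed by the number 42 in the matching byte order), PNG, and JPEG. It checks four bytes for TIFF rather than only two, to reduce false positives. It assumes the raw image bytes are already available as a byte array, for example from shape.getImageData().getImageBytes(), and it does not cover variants such as BigTIFF.

```java
import java.util.Arrays;

// Hypothetical helper, not part of any Aspose API: identifies an image by its leading bytes.
final class ImageSniffer {
    // Well-known file signatures ("magic bytes") of the three accepted formats.
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] TIFF_LE = {'I', 'I', 0x2A, 0x00}; // little-endian TIFF
    private static final byte[] TIFF_BE = {'M', 'M', 0x00, 0x2A}; // big-endian TIFF

    // True if data begins with exactly the bytes in prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Returns "TIFF", "PNG", "JPEG", or "OTHER" for anything else.
    static String detect(byte[] data) {
        if (startsWith(data, TIFF_LE) || startsWith(data, TIFF_BE)) {
            return "TIFF";
        }
        if (startsWith(data, PNG)) {
            return "PNG";
        }
        if (startsWith(data, JPEG)) {
            return "JPEG";
        }
        return "OTHER";
    }

    // Allowlist check for the "only TIFF, JPEG, and PNG" requirement.
    static boolean isAccepted(byte[] data) {
        return !"OTHER".equals(detect(data));
    }

    public static void main(String[] args) {
        byte[] tiff = {'I', 'I', 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00};
        byte[] gif = {'G', 'I', 'F', '8', '9', 'a'};
        System.out.println(detect(tiff) + " accepted=" + isAccepted(tiff));
        System.out.println(detect(gif) + " accepted=" + isAccepted(gif));
    }
}
```

A signature check only shows that the data starts like a TIFF, PNG, or JPEG file; it does not prove the rest of the file is valid. If that matters for acceptance, decoding the image with an imaging library is a stronger test.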

Best Regards,