Free Support Forum - aspose.com

Extract embedded objects (gif, jpg, png, txt, html) from a Word document

Hi,
I understand that I can extract Office objects and PDFs from a Word document with the Aspose component by checking the ProgId. But if my Word document contains embedded objects such as gif, jpg, png, txt, or html files, is it possible to extract them? Aspose reports all of those objects as “Package”.
If they can be extracted, could I have sample code showing how to do it with Aspose?
Thanks
Andy

Hi
Thanks for your request. I think that you can try using the following code as a workaround.

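using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document(@"Test188\in.doc");
// Get the collection of all shapes in the document
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
foreach (Shape shape in shapes)
{
    if (shape.OleFormat != null)
    {
        // Generate a base file name from the ProgId and a running index
        string name = @"Test188\obj_" + shape.OleFormat.ProgId + i.ToString();
        // Only objects reported as "Package" are handled here
        if (shape.OleFormat.ProgId == "Package")
        {
            // Save the embedded object to memory
            using (MemoryStream strm = new MemoryStream())
            {
                shape.OleFormat.Save(strm);
                // Rewind so the image decoder reads from the start of the data
                strm.Position = 0;
                try
                {
                    // If the data is a valid image, save it as a PNG image
                    using (Image img = Image.FromStream(strm))
                    {
                        img.Save(name + ".png", ImageFormat.Png);
                    }
                }
                catch
                {
                    // Not an image: write the raw bytes to a generic file
                    using (Stream file = new FileStream(name + ".object", FileMode.Create))
                    {
                        strm.WriteTo(file);
                    }
                }
            }
        }
        i++;
    }
}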

Also note that MS Word itself reports the ProgId of these objects as “Package”.
Best regards.

Hi,
Thanks for the suggestion. I have tried the code, and it takes quite some time to create the image file.
Besides that, embedded HTML and TXT files are also recognized as “Package”. How can I extract them from the Word document?
Thanks for your help
Regards,
Andy

Hi
Thanks for your request. Aspose.Words allows you to extract any embedded OLE object from the document. You just need to detect the format of the extracted file. For example, you can use the Document.DetectFileFormat method to detect HTML files:
LoadFormat format = Document.DetectFileFormat(oleStream);
You can also write your own logic to detect the formats of other files.
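
As a rough illustration of what such custom detection logic might look like, the sketch below compares the first bytes of a stream against a few well-known file signatures (PNG, JPEG, GIF, PDF). The class name OleContentSniffer and the method GuessExtension are hypothetical and not part of Aspose.Words. The sketch also assumes the stream is seekable and begins with the raw bytes of the embedded file, which may not hold for every "Package" object.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.Words.
public static class OleContentSniffer
{
    // Guesses a file extension from the leading bytes of a seekable stream.
    // Returns ".object" when no known signature matches.
    public static string GuessExtension(Stream stream)
    {
        byte[] header = new byte[8];
        stream.Position = 0;
        int read = stream.Read(header, 0, header.Length);
        stream.Position = 0; // leave the stream rewound for the caller

        if (StartsWith(header, read, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }))
            return ".png";
        if (StartsWith(header, read, new byte[] { 0xFF, 0xD8, 0xFF }))
            return ".jpg";
        if (StartsWith(header, read, new byte[] { 0x47, 0x49, 0x46, 0x38 })) // "GIF8"
            return ".gif";
        if (StartsWith(header, read, new byte[] { 0x25, 0x50, 0x44, 0x46 })) // "%PDF"
            return ".pdf";
        return ".object";
    }

    // True if the first 'length' bytes of 'data' begin with 'signature'.
    private static bool StartsWith(byte[] data, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

A check like this only reads a few bytes, so it avoids decoding every object as an image just to find out whether it is one. Text-based formats have no such signature and need a different approach.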
Best regards.

Hi.

Thanks for the suggestion. It works.

But I have run into a new problem: Aspose.Words doesn’t recognize an embedded text document. How can I extract a text document (.txt) embedded in a Word document using Aspose.Words?

Thanks
Andy

Hi
Thanks for your inquiry. Unfortunately, there is no reliable way to do this. A stream is just a sequence of bytes, and nothing in it directly indicates whether it holds a text document. Please see the following link to learn more about this problem:
https://bytes.com/topic/c-sharp/answers/235065-how-determine-stream-type
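
Although a stream cannot be identified as text with certainty, a heuristic is sometimes good enough in practice. The sketch below samples the start of the stream and treats it as probable text when it contains no NUL bytes and no control characters other than tab, line feed, and carriage return. The names TextHeuristic and LooksLikeText are hypothetical and not part of Aspose.Words. The check is only a guess: it rejects UTF-16 text, for example, and can accept binary data that happens to avoid those bytes.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.Words.
public static class TextHeuristic
{
    // Returns true if the first 'sampleSize' bytes of a seekable stream
    // look like single-byte or UTF-8 text. This is a heuristic only.
    public static bool LooksLikeText(Stream stream, int sampleSize = 4096)
    {
        byte[] buffer = new byte[sampleSize];
        stream.Position = 0;
        int read = stream.Read(buffer, 0, buffer.Length);
        stream.Position = 0; // leave the stream rewound for the caller

        // An empty stream gives no evidence either way; treat it as "not text".
        if (read == 0) return false;

        for (int i = 0; i < read; i++)
        {
            byte b = buffer[i];
            bool allowedControl = b == 0x09 || b == 0x0A || b == 0x0D;
            // NUL bytes and other control characters rarely occur in plain text.
            if (b < 0x20 && !allowedControl) return false;
        }
        return true;
    }
}
```

A caller could run this check on objects that match no known format and save those that pass with a .txt extension, accepting that some of the guesses will be wrong.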
Best regards.