How to extract an embedded non-Office OLE object such as a ZIP file


I have a Word file, say abc.doc.

I embedded a ZIP file inside abc.doc using
Insert > Object > Create from File.

Now I want to extract that ZIP file and save it to a local path, say D:\myDirectory.

Is it possible to do that? If yes, how?

Thank you

Thanks for your request. Yes, of course you can extract OLE embedded objects using Aspose.Words. Please see the following code:

using System;
using Aspose.Words;
using Aspose.Words.Drawing;

// Open the document
Document doc = new Document(@"Test091\in.doc");
// OLE embedded objects are available through the OleFormat property of the Shape object,
// so first we collect all shapes in the document
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
// Index used to generate a unique name for each extracted object
int oleIndex = 0;
// Output path template: {0} = index, {1} = file extension
string basePath = @"Test091\obj_{0}.{1}";
// Loop through all shapes
foreach (Shape shape in shapes)
{
    // Skip shapes that do not contain an OLE object
    if (shape.OleFormat == null)
        continue;

    // We can determine the type of an OLE object using the ProgId property
    switch (shape.OleFormat.ProgId)
    {
        // Files inserted via Insert > Object > Create from File (such as this ZIP archive) have ProgId = "Package"
        case "Package":
            shape.OleFormat.Save(String.Format(basePath, oleIndex, "zip"));
            break;
        // DOC files
        case "Word.Document.8":
            shape.OleFormat.Save(String.Format(basePath, oleIndex, "doc"));
            break;
        // By default, save the OLE object with the .object extension
        default:
            shape.OleFormat.Save(String.Format(basePath, oleIndex, "object"));
            break;
    }
    // Increase the index so the next object gets a unique file name
    oleIndex++;
}

I hope this helps.
Best regards.

Thanks for replying,

I just want to ask: what if a RAR file, a ZIP file, a TAR file, or a file in any other non-Office format is embedded inside the DOC file?

Can we distinguish them if they all show up as “ProgId=Package”?

Thank you once again.

Thanks for your request. Aspose.Words does not allow determining this. Unfortunately, Aspose.Words also does not allow getting the original file name of an OLE object. This is issue #3257 in our defect database. You can create your own logic to determine the file format.
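One way to build that logic yourself is to look at the first bytes of each saved "Package" object, because many archive and image formats begin with a fixed signature ("magic number"). The sketch below is a hypothetical helper: `FileSignatureSniffer` and `GuessExtension` are illustrative names, not part of the Aspose.Words API. It covers only a few formats and falls back to "object" for anything it cannot recognize, including plain-text files such as .cpp sources, which have no signature at all.

```csharp
using System;
using System.Text;

// Hypothetical helper (not Aspose.Words API): guesses the format of an
// extracted OLE "Package" object from well-known signature bytes.
public static class FileSignatureSniffer
{
    private static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };             // "PK\x03\x04"
    private static readonly byte[] Rar = { 0x52, 0x61, 0x72, 0x21, 0x1A, 0x07 }; // "Rar!\x1A\x07" (RAR 4 and RAR 5)
    private static readonly byte[] SevenZip = { 0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C }; // "7z\xBC\xAF\x27\x1C"
    private static readonly byte[] TarMagic = Encoding.ASCII.GetBytes("ustar");  // POSIX tar magic at offset 257
    private static readonly byte[] Bmp = { 0x42, 0x4D };                         // "BM"

    // Returns a suggested file extension, or "object" when no known signature matches.
    public static string GuessExtension(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        if (HasSignature(data, 0, Zip)) return "zip";
        if (HasSignature(data, 0, Rar)) return "rar";
        if (HasSignature(data, 0, SevenZip)) return "7z";
        // Check tar before BMP: a tar file starts with an arbitrary member name,
        // so its real signature is the "ustar" magic further into the header
        if (HasSignature(data, 257, TarMagic)) return "tar";
        if (HasSignature(data, 0, Bmp)) return "bmp";

        // Text formats (e.g. .cpp, .rdp) have no reliable signature
        return "object";
    }

    // True if 'signature' appears in 'data' starting at 'offset'
    private static bool HasSignature(byte[] data, int offset, byte[] signature)
    {
        if (data.Length < offset + signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[offset + i] != signature[i]) return false;
        }
        return true;
    }
}
```

In practice you could first save each "Package" object to a temporary file (as in the earlier loop), read it back with `System.IO.File.ReadAllBytes`, call `GuessExtension`, and then rename the file with `System.IO.File.Move`. Note that ZIP-based formats such as DOCX or JAR also start with "PK", so this check can only say "ZIP container", not which application created it.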
Best regards.


Since Aspose.Cells has an OLEObject that provides SourceFullName (i.e. the original file name), I thought Aspose.Words would have a similar OLE object class that gives us SourceFullName. But since, as you said, “Aspose.Words also does not allow getting an original file name of OLE object”, do we have to use the “DocumentVisitor” class with our own logic to identify the format of the embedded object?

If there is any other way to identify the file format of an object embedded in a Word file, I would be happy to hear it.

Thanks once again.

Thanks for your request. Actually, OLE objects can be either embedded (stored in the document) or linked.
You can get the name of a linked object using Aspose.Words. See the following link for more information:
What kind of OLE object are you using? Could you please attach a sample document for testing?
Best regards.

Thank you for your help.

We were able to extract non-Office OLE objects such as RAR, TAR, ZIP, CPP, RDP, and BMP files from the MS Word file.
To identify which file is inside the package, we use GetOLEEntry.

Once again thanks for your time.

The issue you found earlier (filed as 3257) has been fixed in this update.
