Extract OLE Object (PDF) from a Word file

Hi,
Can someone please help me extract an embedded PDF object from a Word file? This is a huge headache for me right now. I am new to Aspose. Please give me the exact code I should use, along with the references and DLLs I need to download.

Please, guys, it's really very urgent for me.

Hello,
Thank you for your request.
Please look at the code proposed by Alexey here in this post.
Perhaps this is what you are looking for.

Hi Sathish,
Thanks for your request. The code provided in the thread mentioned by Victor is a bit obsolete. It will work, but the current API provides a more convenient way to determine the file extension. Here is the updated code:

// Open document
Document doc = new Document(@"Test091\in.doc");
// OLE embedded objects are available through OleFormat property of Shape object
// So first we should extract shapes from the document
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
// Index will be used to generate unique name for each extracted object
int oleIndex = 0;
// Define variable that specifies base path
string basePath = @"Test091\obj_{0}.{1}";
// Loop through all shapes
foreach(Shape shape in shapes)
{
    // Check whether shape contains OLE object
    if (shape.OleFormat != null)
    {
        shape.OleFormat.Save(String.Format(basePath, oleIndex, shape.OleFormat.SuggestedExtension));
        // Increase index
        oleIndex++;
    }
}

Best regards,

Thanks a lot, guys. I somehow managed to extract the PDF object from the Word document.

But now I am stuck with HTM. It saves HTM documents as PNG or JPG. Any idea how to resolve this? Can you give me the code?

I am calling FileFormatInfo Format = FileFormatUtil.DetectFileFormat(SourceFile); to detect the file format, but unfortunately it is not detecting the exact source file format.

This is the last hurdle in my project and I have to overcome it.

Thanks to Aspose.
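
If FileFormatUtil.DetectFileFormat does not recognize an extracted file, one possible workaround is to look at the first bytes of the file yourself. The following is only a minimal sketch and not part of Aspose.Words: the class name MagicSniffer and its methods are hypothetical, only a handful of common signatures are covered, and the HTML check is a rough text heuristic that ignores byte order marks.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not an Aspose API): guesses an extension from leading bytes.
public static class MagicSniffer
{
    // Returns a guessed extension without the dot, or null if nothing matches.
    public static string GuessExtension(byte[] header)
    {
        if (header == null) return null;

        if (StartsWith(header, 0x25, 0x50, 0x44, 0x46)) return "pdf"; // "%PDF"
        if (StartsWith(header, 0x89, 0x50, 0x4E, 0x47)) return "png"; // PNG signature
        if (StartsWith(header, 0xFF, 0xD8, 0xFF)) return "jpg";       // JPEG start marker
        if (StartsWith(header, 0x50, 0x4B, 0x03, 0x04)) return "zip"; // ZIP container (docx, xlsx, pptx)
        if (StartsWith(header, 0xD0, 0xCF, 0x11, 0xE0)) return "ole"; // OLE compound file (doc, xls, ppt)

        // HTML has no binary signature, so fall back to a crude text check.
        string text = Encoding.ASCII.GetString(header).TrimStart();
        if (text.StartsWith("<!DOCTYPE html", StringComparison.OrdinalIgnoreCase) ||
            text.StartsWith("<html", StringComparison.OrdinalIgnoreCase))
        {
            return "htm";
        }

        return null;
    }

    // Reads up to 64 bytes from the file and applies the same checks.
    public static string GuessExtensionFromFile(string path)
    {
        byte[] buffer = new byte[64];
        using (FileStream stream = File.OpenRead(path))
        {
            int read = stream.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read);
        }
        return GuessExtension(buffer);
    }

    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

A "zip" or "ole" result only identifies the container, so the ProgId of the OLE object is still needed to tell, for example, a DOCX from an XLSX.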

Hello,
Thank you for additional information.
Could you send this file to us? I will analyze it and give you some advice.

This is the code I am using. The final default case executes when the object is a package file, so it saves all other types as JPG. Please let me know what I should modify. You can also try embedding any HTML document in a Word file; that is what I want to extract.
Please let me know how I can send my entire solution to you. I don't find any attach-file option here.

public void Extract(string SourceFile, string DestinationFolder, string SourceFileName)
{
    // Load the source document (the extra empty Document instance was not needed).
    Aspose.Words.Document doc = new Aspose.Words.Document(SourceFile);
    Aspose.Words.NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
    int i = 1;
    foreach(Shape shape in shapes)
    {
        if (shape.OleFormat != null)
        {
            // Get the extension of the OLE object.
            string ext = "object";
            switch (shape.OleFormat.ProgId)
            {
                case "Excel.Sheet.8":
                    ext = "xls";
                    break;
                case "Excel.Sheet.12":
                    ext = "xlsx";
                    break;
                case "AcroExch.Document.7":
                    ext = "pdf";
                    break;
                case "Word.Document.8":
                    ext = "doc";
                    break;
                case "Word.Document.12":
                    ext = "docx";
                    break;
                case "PowerPoint.Show.8":
                    ext = "ppt";
                    break;
                case "PowerPoint.Show.12":
                    ext = "pptx";
                    break;
                default:
                    ext = "jpg";
                    break;
            }
            shape.OleFormat.Save(String.Format(DestinationFolder + "/" + SourceFileName + "_{0}.{1}", i, ext));
            i++;
        }
    }
}
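
The default branch above is what turns every unrecognized object (including package objects such as an embedded HTML file) into a .jpg. One way around it, sketched below, is to keep the ProgId table but fall back to the extension suggested by the API, as in Alexey's code, instead of a hard-coded "jpg". The class OleExtensionResolver, its Resolve method, and the "bin" fallback are hypothetical names chosen for illustration. The leading dot is trimmed because the suggested extension may or may not include one depending on the Aspose.Words version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Aspose API): picks an extension for an extracted OLE object.
public static class OleExtensionResolver
{
    // Same ProgId table as in the switch statement above.
    private static readonly Dictionary<string, string> KnownProgIds =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Excel.Sheet.8", "xls" },
            { "Excel.Sheet.12", "xlsx" },
            { "AcroExch.Document.7", "pdf" },
            { "Word.Document.8", "doc" },
            { "Word.Document.12", "docx" },
            { "PowerPoint.Show.8", "ppt" },
            { "PowerPoint.Show.12", "pptx" }
        };

    // Returns the extension without a leading dot.
    public static string Resolve(string progId, string suggestedExtension)
    {
        string ext;
        if (progId != null && KnownProgIds.TryGetValue(progId, out ext))
        {
            return ext;
        }

        // Unknown ProgId (for example a package): trust the suggested extension if present.
        string suggested = (suggestedExtension ?? string.Empty).Trim().TrimStart('.');
        return suggested.Length > 0 ? suggested : "bin";
    }
}
```

Inside the loop, the switch could then be replaced with string ext = OleExtensionResolver.Resolve(shape.OleFormat.ProgId, shape.OleFormat.SuggestedExtension); before calling shape.OleFormat.Save.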

Most likely you clicked the "Quick reply" link.
Next to (below) the title of my response there is a "Reply" button. Clicking it opens the expanded reply form, where you can attach the file.
I have now created a simple DOC file and embedded a simple HTML file in it. Using the code that Alexey gave, I managed to extract the embedded document (see my attachments).

Thank you for all your excellent support. I was finally able to achieve it. I really appreciate your prompt response and support.