
Free Support Forum - aspose.com

Extracting embedded object

In my application, I need to extract OLE objects. Sometimes the program ID returned for an OLE object is "Package", and if I save the OLE data to a file, the file cannot be opened; it looks like there is some kind of "wrapper" around it. Is there a way to save just the embedded OLE data?

Please see the attached example: a Word document with an embedded .msg file. I saved the extracted data as EmbeddeMsg_0.unknown. If I rename EmbeddeMsg_0.unknown to EmbeddeMsg_0.msg, Outlook cannot open it.

Thanks for looking into it.

Hi,

Thank you for reporting the issue. I have logged it as #3154; however, we do not plan to improve OLE object extraction anytime soon. As far as I know, Word does add some wrapping bytes around the real data, but this is not documented and would require significant effort to research. Thank you for understanding.

We noticed that we can extract embedded Word, Excel, PowerPoint and PDF files without the "wrapper". Are these OLE types handled differently from the "Package" OLE type? What would it take to "unwrap" OLE objects? We are trying to figure out a solution and would really appreciate it if you could shed some light on this.

Thanks.

Sorry I don’t know. It’s just not documented.

In this case the embedded object is an OLE1 object, and it takes a few extra steps to extract. A sample project is attached here:
https://apireference.aspose.com/words/net/aspose.words.drawing/oleformat

The next version of Aspose.Words will provide an easier way to extract OLE objects, regardless of whether they are OLE1 or OLE2.
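For readers who want to unwrap a "Package" object by hand, here is a minimal sketch; it is not the code from the attached sample project. It assumes you already have the raw bytes of the OLE1 object's `\x01Ole10Native` stream, and that the stream follows the layout commonly described by community tools such as oletools. That layout is not officially documented by Aspose, so the field meanings in the comments are assumptions. The class `Ole10Native` and its `Unwrap` method are hypothetical names, and the code targets .NET 5 or later (for `Encoding.Latin1`).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Aspose.Words: unwraps the payload of an OLE1
// "Package" object. ASSUMED layout of the \x01Ole10Native stream (community-described):
//   uint32 total size | uint16 flags | label\0 | source path\0 |
//   uint32 unknown | uint32 temp-path length | temp path\0 | uint32 data size | data
public static class Ole10Native
{
    public static (string FileName, byte[] Data) Unwrap(byte[] stream)
    {
        using var reader = new BinaryReader(new MemoryStream(stream));
        reader.ReadUInt32();              // total size of the data that follows
        reader.ReadUInt16();              // flags (assumed; often 2)
        string label = ReadAnsiZ(reader); // original file name, e.g. "message.msg"
        ReadAnsiZ(reader);                // original source path
        reader.ReadUInt32();              // unknown field (assumed; often 0x00030000)
        reader.ReadUInt32();              // length of the temp path that follows
        ReadAnsiZ(reader);                // temp path used when the object was packaged
        int size = checked((int)reader.ReadUInt32()); // size of the embedded file
        byte[] data = reader.ReadBytes(size);
        if (data.Length != size)
            throw new EndOfStreamException("Ole10Native stream is truncated.");
        return (label, data);
    }

    // Reads a null-terminated single-byte string; Latin-1 approximates the ANSI code page.
    private static string ReadAnsiZ(BinaryReader reader)
    {
        var buffer = new MemoryStream();
        byte b;
        while ((b = reader.ReadByte()) != 0)
            buffer.WriteByte(b);
        return Encoding.Latin1.GetString(buffer.ToArray());
    }
}
```

If the data saved for the object is a structured-storage (compound) file rather than the bare stream, the `\x01Ole10Native` stream has to be read out of it first with a compound-file reader. After unwrapping, `FileName` should hold the original name (for example a .msg file), and `Data` can be written to disk with `File.WriteAllBytes`.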

Roman,

I emailed you the documents. Here are the details.

\ExtractNotExtractingPdfs: The PDFs have an unknown program ID even though they are PDF files.
\ExtractSkipped: All of the embedded objects are skipped.

Let me know what you find.

Thanks,
Nathan

Thanks for providing the sample documents. We will research the issue and inform you of the results in a couple of days.

Best regards,

Roman,

It looks like the issues in \ExtractNotExtractingPdfs are now handled by using the code for OLE1 and OLE2. However, Aspose is still not able to find the embedded objects in \ExtractSkipped.

Thanks,
Nathan

Nathan,

I have checked the ExtractSkipped file with the following code:

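using System.IO;
using Aspose.Words;
using Aspose.Words.Drawing;

// Open the problem document (the original used an internal TestUtil helper for this).
string dir = @"nparis\ExtractSkipped";
Document doc = new Document(Path.Combine(dir, "CV-file1039680_326_001.doc"));

// Collect every shape in the document, including nested ones (deep = true).
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
foreach (Shape shape in shapes)
{
    if (shape.ShapeType == ShapeType.OleObject)
    {
        // Save the embedded object's data; every object gets a .doc extension here.
        shape.OleFormat.Save(Path.Combine(dir, "oleObject" + i + ".doc"));
        i++;
    }
}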

It successfully extracted all 12 files: eleven Word documents and one Excel spreadsheet. All of them open in MS Office without any problem. Please let me know exactly what problem you are experiencing so that I can reproduce it on my side.

I have also noticed that the original file name of the embedded object is not exposed in our API. Do you think it would be convenient if we extracted and exposed it, so that you could easily check the file name? At the very least, it would help you choose the correct extension when saving the extracted OLE object.
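Until the original file name is available, one possible workaround is to choose the extension from the program ID, which the API already returns (for example "Package" for OLE1 objects). The sketch below is a hypothetical helper, `OleExtensions.FromProgId`, not part of Aspose.Words; its table covers only a few common Office and Acrobat ProgIDs and is an assumption, not an exhaustive mapping. Newer Aspose.Words versions also expose `OleFormat.SuggestedExtension`, which should be preferred where it is available.

```csharp
using System;

// Hypothetical helper (not an Aspose.Words API): maps an OLE ProgID to a file extension.
// Only a few common ProgIDs are covered; anything else falls back to ".bin".
public static class OleExtensions
{
    public static string FromProgId(string progId)
    {
        string id = progId ?? string.Empty;
        bool Is(string prefix) => id.StartsWith(prefix, StringComparison.OrdinalIgnoreCase);

        // Check the more specific "12" (Office 2007+) ProgIDs before the generic prefixes.
        if (Is("Word.Document.12")) return ".docx";
        if (Is("Word.Document")) return ".doc";
        if (Is("Excel.Sheet.12")) return ".xlsx";
        if (Is("Excel.Sheet")) return ".xls";
        if (Is("PowerPoint.Show.12")) return ".pptx";
        if (Is("PowerPoint.Show")) return ".ppt";
        if (Is("AcroExch.Document")) return ".pdf";

        // "Package" (OLE1) and unknown ProgIDs: the real type is only known after unwrapping.
        return ".bin";
    }
}
```

In the extraction loop, `"oleObject" + i + OleExtensions.FromProgId(shape.OleFormat.ProgId)` would then give the Excel object an .xls name instead of .doc.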

Best regards,

I will test this later today and make sure there isn't something in our code that we are missing. As for providing the name of the file, that would be very useful; we were even going to request it, so you are one step ahead of us.

Thanks,
Nathan

I tested it and am still not seeing the embedded objects. After the following code runs, the NodeCollection contains only 2 shapes, and neither is of type ShapeType.OleObject:

NodeCollection shapes = asposeDoc.GetChildNodes(NodeType.Shape, true);
foreach (Shape shape in shapes)
{
    if (shape.ShapeType == ShapeType.OleObject)
    {
        // ... (extraction code not shown in the original post)
    }
}

The version of Aspose.Words we are using is 4.2.6.0.

Thanks

OK, then please double-check two things: whether you are opening the correct file, and whether the license is properly activated. Note that if the license is not activated properly, the document may be truncated by the evaluation version limitation, which effectively cuts off the embedded objects.

If that does not help, please create and attach a small test project that reproduces the problem, and I will check it myself.

Best regards,

I checked, and our license code was commented out because of a bug we used to have. Once the license code was uncommented, we see all of the files.

Thanks

OK, it is great that the problem is solved.

Also, your request to expose the original file name of OLE objects in the Aspose.Words API has been logged in our defect base as issue #3257. We will try to implement it in one of the next hotfixes, and I will post a notification in this thread as soon as it is done.

Best regards,

Great, thank you.

The issues you have found earlier (filed as 3257) have been fixed in this update.

