In my application, I need to extract OLE objects. Sometimes the program ID returned for the OLE object is “Package”, and if I save the OLE data to a file, the file cannot be opened; it looks like there is some kind of “wrapper” around it. Is there a way to save just the embedded OLE data?
Please see the attached example: a Word document with an embedded .msg file. I saved the extracted data as EmbeddeMsg_0.unknown. If I rename EmbeddeMsg_0.unknown to EmbeddeMsg_0.msg, Outlook cannot open it.
Thank you for reporting the issue to us. I have logged it as #3154; however, we do not plan to improve OLE object extraction anytime soon. As far as I know, Word does add some wrapping bytes around the real data, but the format is not documented and would require significant effort to research. Thank you for your understanding.
We noticed that we can extract embedded Word, Excel, PPT, and PDF files without the “wrapper”. Are these OLE types handled differently from the OLE type “Package”? What would it take to “unwrap” OLE objects? We are trying to figure out a solution, and we would really appreciate it if you could shed some light on this.
It looks like the issues in \ExtractNotExtractingPdfs are now handled by using the code for Ole1 and Ole2. However, Aspose is still not able to find the embedded object in \ExtractSkipped.
I have checked the ExtractSkipped file with the following code:
Document doc = TestUtil.Open(@"nparis\ExtractSkipped\CV-file1039680_326_001.doc");
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
foreach (Shape shape in shapes)
{
    if (shape.ShapeType == ShapeType.OleObject)
    {
        shape.OleFormat.Save(TestUtil.BuildTestFileName(@"nparis\ExtractSkipped\oleObject" + i + ".doc"));
        i++;
    }
}
And it successfully extracted all 12 files: eleven Word documents and one Excel spreadsheet. All of them open in MS Office without problems. Please let me know exactly what problem you are experiencing so that I can reproduce it on my side.
I have also noticed that the original file name of the embedded object is not exposed in our API. Do you think it would be convenient if we extracted and exposed it, so that you could easily check the name of the file? At the very least, it would help you choose the correct extension when saving the extracted OLE object.
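Until the file name is available, one possible workaround is to pick the extension from the program ID. The following is only a sketch: the class name `OleExtensionGuesser`, the method `GuessExtension`, and the table of ProgIDs are illustrative assumptions, not part of the Aspose.Words API, and the ProgIDs that actually appear depend on the application version that created the embedded object.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Aspose.Words): maps an OLE ProgID to a file extension.
public static class OleExtensionGuesser
{
    // Illustrative table only; extend it with the ProgIDs seen in your own documents.
    private static readonly Dictionary<string, string> KnownProgIds =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Word.Document.8", ".doc" },
            { "Word.Document.12", ".docx" },
            { "Excel.Sheet.8", ".xls" },
            { "Excel.Sheet.12", ".xlsx" },
            { "PowerPoint.Show.8", ".ppt" },
            { "PowerPoint.Show.12", ".pptx" },
        };

    // Returns the mapped extension, or ".unknown" for unlisted ProgIDs such as "Package".
    public static string GuessExtension(string progId)
    {
        string extension;
        if (progId != null && KnownProgIds.TryGetValue(progId, out extension))
        {
            return extension;
        }
        return ".unknown";
    }
}
```

In the extraction loop, the hard-coded ".doc" could then be replaced with `OleExtensionGuesser.GuessExtension(shape.OleFormat.ProgId)`, so that the Excel spreadsheet is not saved with a Word extension. A “Package” object would still be saved with its wrapper intact.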
I will test this later today and make sure there isn't something in the code we are missing. With regard to providing the name of the file, that would be very useful, and we were even going to request it. So you are one step ahead of us.
I tested and am still not seeing the embedded objects. After the following code, the count for the NodeCollection is only 2, and neither shape is of type ShapeType.OleObject:
NodeCollection shapes = asposeDoc.GetChildNodes(NodeType.Shape, true);
foreach (Shape shape in shapes)
{
    if (shape.ShapeType == ShapeType.OleObject)
    …
The version of Aspose.Words we are using is 4.2.6.0.
OK, then please double-check two things: whether you are opening the correct file and whether the license is properly activated. Note that if the license is not activated properly, the document can be truncated by the evaluation version limitation, which effectively cuts off the embedded objects.
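For the license check, a minimal sketch follows. The file names `Aspose.Words.lic` and `input.doc` are placeholders, and the `using Aspose.Words.Drawing;` line assumes a release in which `Shape` lives in that namespace; on older versions it may need to be removed.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing; // assumption: Shape is in this namespace in your version

public static class LicenseCheck
{
    public static void Main()
    {
        // Apply the license before opening any document.
        // "Aspose.Words.lic" is a placeholder file name (assumption).
        License license = new License();
        license.SetLicense("Aspose.Words.lic");

        // "input.doc" is a placeholder for the document under test (assumption).
        Document doc = new Document("input.doc");

        // Count the shapes that are reported as OLE objects.
        int oleCount = 0;
        foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
        {
            if (shape.ShapeType == ShapeType.OleObject)
            {
                oleCount++;
            }
        }
        Console.WriteLine("OLE shapes found: " + oleCount);
    }
}
```

If the count differs between a licensed and an unlicensed run on the same file, evaluation-mode truncation is the likely cause.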
Then, if nothing helps, please put together and attach a small test project that reproduces the problem, and I will check it myself.
Also, your request to expose the original file name of an OLE object in the Aspose.Words API has been logged in our defect base as issue #3257. We will try to implement it in one of the next hotfixes. I will post a notification here in this thread as soon as it is done.