Extracting MS Word file from OLE Object (C# .NET)

@rausch,

If you have a paid support subscription, you can log a ticket separately in the Paid Support help desk.

@rausch,

We have investigated the issue you reported.

Please note that a compound file is just storage for an OLE object, so the OLE object must be created before it is placed into compound storage. For more information, please see this link. In your code you can use the existing compound storage:


// Requires Aspose.Slides, Aspose.Words and OpenMcdf (CompoundFile).
var _presentation = new Presentation(source);

var presentationSlides = _presentation.Slides;
foreach (var presentationSlide in presentationSlides)
{
    var shapes = presentationSlide.Shapes;
    foreach (var shape in shapes)
    {
        // Only process OLE frames that hold a Word document.
        if (!(shape is OleObjectFrame)) continue;
        var ole = shape as OleObjectFrame;
        if (!ole.ObjectProgId.Contains("Word.Document")) continue;

        using (var ms = new MemoryStream(ole.ObjectData))
        {
            // The embedded document lives in the "Package" stream of the compound file.
            var compoundFile = new CompoundFile(ms);
            var stream = compoundFile.RootStorage.GetStream("Package");
            var packageData = stream.GetData();

            using (var packageDataStream = new MemoryStream(packageData))
            {
                var word = new Document(packageDataStream);
                word.Save(target);
                var wordStream = new MemoryStream();

                word = new Document(target);
                word.Save(wordStream, Aspose.Words.SaveFormat.Docx);
                wordStream.Position = 0;

                // Do not create new compound storage. Just change package stream in existing one.
                compoundFile.RootStorage.Delete("Package");
                compoundFile.RootStorage.AddStream("Package").SetData(wordStream.ToArray());

                // Save compound file back to ObjectData. Not stream but compound storage.
                MemoryStream compMs = new MemoryStream();
                compoundFile.Save(compMs);
                ole.ObjectData = compMs.ToArray();
            }
        }
    }
}

Please note that the OLE control window currently opens in the wrong size. This issue is likely to be fixed in Aspose.Slides for .NET 19.11. Please also note that the new OLE object won’t show an updated image representation, because Aspose.Slides can’t create images from non-presentation objects. I hope the shared information is helpful.

The issues you found earlier (filed as SLIDESNET-39130, SLIDESNET-39874) have been fixed in this update.

Hello,
I tried the new component. The OLE object in PowerPoint is not corrupted anymore. That’s a good step.
Nevertheless, the provided solution still does not work for me.

It looks like this does not update correctly:

            // Do not create new compound storage. Just change package stream in existing one.
            compoundFile.RootStorage.Delete("Package");
            compoundFile.RootStorage.AddStream("Package").SetData(wordStream.ToArray());

If I do

          compoundFile.Save("c:\\test.docx");

and check it afterwards in Word, the compoundFile content is still the same as before the modification.

NOTE: If I save the wordStream directly to the file system and check it, I have a modified Word document.

In addition, do I still need to use OpenMcdf, or can’t I just assign wordStream.ToArray() to ole.ObjectData? This works perfectly in Aspose.Cells…

Please try it yourself: get the embedded Word document, modify anything inside it after loading, and save the modified Word document back to the PowerPoint OLE object. I think you did not try this before, did you?

@rausch,

It’s good to know things are fine on your end. I would like to point out that you need to use the sample I shared with you previously for extraction.

I think what I wrote was misunderstood. Of course I tried your code sample.
The Word document gets extracted correctly, and the PowerPoint file is no longer corrupted after saving, as it was before the latest release, but the modified Word content is not embedded.
That’s why I asked whether you changed any content and compared the PowerPoint file before and after.

@rausch

Can you please describe your query in more detail? Please also share a comparison screenshot with us so that we can investigate further and help you out.

I used your code; unfortunately, too much debug code and testing messed up some of my tests. Finally I managed to update the PowerPoint file with a valid Word object! I have already written the code to generate my own preview image, which means I can finally implement the feature to load, modify and save embedded Word documents inside PowerPoint.
Took a while :) but nevertheless thanks for your support.

Side note: this is not part of Aspose, but since you recommended OpenMCDF: I just created a new template PowerPoint file and tested with it. Now I get

	$exception	{"Invalid OLE structured storage file"}	OpenMcdf.CFFileFormatException

I checked the github.com repo for OpenMCDF and used the latest NuGet version, 2.2.1.3, but newly created PowerPoint files in which I embedded Word files now produce this error.

It would be far easier and more robust if you just made it possible to use:

ole.ObjectData = word.WordStream.ToArray();

to set it directly. Note: as I have said several times, I do this with Aspose.Cells and it works fine there.

Here is my latest PowerPoint test file that fails to work.

powerpoint.pptx.zip (78.6 KB)

@rausch ,

I have read your last post and would like to point out that Aspose.Cells and Aspose.Slides are two different APIs internally and cannot be compared. We appreciate that you have been able to extract the required information based on what we provided. We have already shared the mechanism that you can adopt for extraction using Aspose.Slides.

Hello,
they are probably two different APIs, but handling OLE objects in Office is more or less the same in PowerPoint and Excel, isn’t it? That is why I compared the functionality: Aspose.Cells provides direct usage without OpenMCDF, and I would be very happy not to face the problem that OpenMcdf now throws exceptions for the newest Office documents… It looks like Microsoft changed something inside…

@rausch,

I humbly disagree here. These are two completely different APIs internally and are managed by separate teams. I agree with your point that both appear to be part of the Office suite. I am not aware of any change being made in MS Office. However, if you encounter any issue, please feel free to share it with us.

There is nothing to disagree about, I just asked about it :).
I understand that there are two teams; I was just wondering how the Cells team seems to manage this without any third-party component.
However, before I spend hours on this, can you check whether you can handle the PowerPoint file I posted last? It won’t load in my test system using your posted sample code.

@rausch,

I have read your comments above. Can you please share the sample code or project that you used to reproduce the issue with the new presentation you shared in your earlier post? That way we will be on the same page, and I can log the issue based on a code sample that reproduces it.

Hello, I used exactly the same code you kindly provided. With the “old” PowerPoint file I shared in the first place, it works. With a PowerPoint file newly created in the latest Office 365 version, I get the OpenMcdf error mentioned before.

@rausch,

We have worked with the sample presentation you shared. You can’t open this OLE embedded data via OpenMcdf because the data is not a Microsoft Compound File; it is a plain DOCX. This can be checked using the ole.EmbeddedFileExtension property. OleObjectFrame.ObjectData can contain different types of data (compound streams, zip files, Word or Excel documents, etc.). In this case, you can use the data directly from OleObjectFrame.ObjectData: if EmbeddedFileExtension is “.docx”, there’s no need for any further processing with CompoundFile.

Please try using the following sample project.

TestOLEExtract-modified.zip (100.0 KB)
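
As an illustration of the distinction described in this reply, the two kinds of embedded data can also be told apart by their leading bytes. The following is only a minimal sketch and is not taken from the attached sample project: EmbeddedDataSniffer, EmbeddedKind and Detect are hypothetical names, and the check relies solely on the well-known file signatures of compound files and ZIP packages. The reply itself recommends checking ole.EmbeddedFileExtension; a byte check like this could serve as an additional safeguard before handing the data to OpenMcdf.

```csharp
// Hypothetical helper, not part of Aspose.Slides or OpenMcdf:
// classifies raw embedded bytes by their leading file signature.
public enum EmbeddedKind { CompoundFile, ZipPackage, Unknown }

public static class EmbeddedDataSniffer
{
    // Header signature of a Microsoft Compound File (OLE structured storage).
    private static readonly byte[] CompoundSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local file header signature of a ZIP archive ("PK" 0x03 0x04).
    // DOCX, XLSX and PPTX files are ZIP packages.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    public static EmbeddedKind Detect(byte[] data)
    {
        if (StartsWith(data, CompoundSignature)) return EmbeddedKind.CompoundFile;
        if (StartsWith(data, ZipSignature)) return EmbeddedKind.ZipPackage;
        return EmbeddedKind.Unknown;
    }

    // Returns true when data begins with the given prefix bytes.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the loop from the earlier sample, a call such as EmbeddedDataSniffer.Detect(ole.ObjectData) could decide whether to go through CompoundFile and its "Package" stream (CompoundFile) or to load the bytes directly as a Word document (ZipPackage).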

Hello, thank you, I will check your code. I did not even know that there is a difference.
In both files I did the same thing: I embedded a Word file using PowerPoint to create the template. I do not understand when the result is a compound file and when it is a plain Word document.

@rausch,

I have read your comments, and I am not sure how you added the file in PowerPoint. However, as I explained in my earlier post, OLE object data can be of different types. You can use the sample project we shared to extract the data after checking the ole.EmbeddedFileExtension property.

Hello,
I am pretty sure that I embedded the document the same way in both files, the one with the plain Word document and the one with the compound file. Nevertheless, your hint about the file extension helped to solve the issue, so now I can open, modify and save Word documents back to PowerPoint regardless of the format.
Thanks for your help and patience!
Bye Yves

@rausch,

Thank you for your understanding too. It’s good that things are finally resolved.