Extracting MS Word file from OLE Object (C# .NET)

Hello,

I have a PowerPoint presentation which contains a Word document in the master slide.
I load the presentation, go through all slides and fetch the OLE object. If it is a Word document, I want to extract it and process it with Aspose.Words functionality.

My approach is to embed a Word document as the header in PowerPoint so that I can handle field updates for special fields. The reason is that PowerPoint has no content fields like Word does, and setting values of ActiveX fields corrupts the document (see Form Controls corrupting on saving PPTX (C# .NET)). I wanted to do the same in PowerPoint and later in Visio (but step by step).

Here is my code that handles it.

var presentationMasterSlides = _presentation.Masters;
foreach (var presentationMasterSlide in presentationMasterSlides)
{
    var shapes = presentationMasterSlide.Shapes;
    foreach (var shape in shapes)
    {
        var ole = shape as OleObjectFrame;
        if (ole == null || !ole.ObjectProgId.Contains("Word.Document")) continue;
        ole.UpdateAutomatic = true;

        var word = new Word();
        using (var ms = new MemoryStream(ole.ObjectData))
        {
            // Creates a file, but it is corrupt; Word can repair it on open and then I see the correct content.
            // For debugging purposes only.
            File.WriteAllBytes(@"D:\directstream.docx", ms.ToArray());
            // Now I load the stream into my Word class.
            word.LoadFromStream(ms);
            // Works fine in Excel, crashes here.
            word.SetFormFieldValue(name, value);
            ole.ObjectData = word.WordStream.ToArray();
        }
    }
}

I attached the powerpoint.cs and word.cs classes and a sample PowerPoint file.
Hope you can tell me what I need to do to make it work.



Hi Yves,

I have reviewed the information you shared, and unfortunately I have not been able to understand what issue is occurring on your end. Can you please share the details of your requirement, along with the expected output and the output you are getting from Aspose.Slides? Also, please share a working sample project that I can test on my end to help you out.

Many Thanks,

Hello,
my problem is that ole.ObjectData contains data that cannot be loaded by the Aspose.Words component. Therefore the component in my implementation throws a null pointer exception.
If I wrap ole.ObjectData in a MemoryStream and save it, the resulting file is corrupt.
So I do not know whether ole.ObjectData is defective or something else is not working.





Hi Yves,

I have worked on the issue in light of the information you shared. I was able to observe the unsupported format exception when loading the extracted OLE object data as a stream using Aspose.Words. An issue with ID SLIDESNET-37527 has been created in our issue tracking system to investigate and resolve the problem on the Aspose.Slides end. This thread has been linked with the issue so that you will be notified automatically once it is fixed.

For reference, I have tried using the following sample code on my end to reproduce the issue.

public static void LoadPresWithOle()
{
    String path = @"D:\Aspose Data\Archiv\";
    Presentation _presentation = new Presentation(path + "powerpoint.pptx");
    var presentationMasterSlides = _presentation.Masters;

    foreach (var presentationMasterSlide in presentationMasterSlides)
    {
        var shapes = presentationMasterSlide.Shapes;
        foreach (var shape in shapes)
        {
            var ole = shape as OleObjectFrame;
            if (ole == null || !ole.ObjectProgId.Contains("Word.Document")) continue;
            // Console.WriteLine("progid", Company, "=======>" + ole.ObjectProgId + " => " + ole.ObjectData.Length);
            ole.UpdateAutomatic = true;

            using (var ms = new MemoryStream(ole.ObjectData))
            {
                ms.Position = 0;
                // Loading the extracted OLE data here raises the unsupported format exception.
                var word = new Aspose.Words.Document(ms);
                word.Save(path + "Document.docx", Aspose.Words.SaveFormat.Docx);
            }
        }
    }
}

We are sorry for your inconvenience,

Hi Yves,


The issue seems to be related to Aspose.Words, as the OLE data is saved properly when working with PowerPoint. Looking at the above code, the problem occurs when Aspose.Words loads the extracted OLE data. I will also ask the Aspose.Words support team to verify the issue on their end.

Best Regards,

Hello,
do we have any news about this?

Hi Yves,

Our product team has investigated the issue on their end. The extracted data can be saved as a Word file, and we don't change anything in it; the data is returned as-is. Word can't open this file, but the same file retrieved from the PowerPoint document itself (without any Aspose.Slides interaction) cannot be opened either.

Can you please share how you created this PowerPoint document? Did you embed the Word document yourself? Can you please give us the document as it was before it was embedded?

Best Regards,

Hello,
I used the current version of Office 2016 to create the PowerPoint and Word documents.
I saw that the Word document was in compatibility mode. I converted it, but nothing changed.

I also made a completely new PowerPoint file, added an object of type Word document and inserted just one line of text. Same error.

I attach this file

Edit: Using PowerPoint 2016 MSO (16.0.6925.1018) 32bit

Hi Yves,


Thank you for sharing the information. I had also requested that you share the presentation without the embedded document inside it. Please provide the requested information so that I may share it with our product team.

Best Regards,

Didn’t see you want a blank powerpoint also, I just added it.

Hi Yves,

Thank you for sharing the requested information. I have added it to our issue tracking system for our product team's consideration. I will share further feedback with you as soon as our product team provides it.

Many Thanks,

Anything new about this issue?

Hi Yves,


I have observed your comments. Our product team is investigating this issue in detail. We will share good news with you soon.

Best Regards,

Any news about this issue?
I cannot use PowerPoint in my project at the moment, not until I can embed and handle the Word object without errors.
Thanks for update.

Hi Yves,


I have observed your comments. I regret to share that the issue is still unresolved. I have requested our product team to share further feedback regarding this issue. I request your patience until the issue is resolved.

We are sorry for your inconvenience,

Still no news about this?
PowerPoint is still not usable in my project until this is solved.

Hi Yves,


I have observed your comments. I regret to share that the issue is still unresolved. I have requested our product team to share further feedback regarding this issue. I request your patience until the issue is resolved.

We are sorry for your inconvenience,

Hello,
I tested the issue with the latest Slides DLL component (17.5) and the problem still remains.

The bug was reported close to a year ago; any news?
I wonder why no one else has run into problems reading or setting Word OLE objects with the Slides component.

Hi Yves,

Our product team has investigated the issue on their end. The data contained in the OleObjectFrame.ObjectData property is not a genuine Word document. Please note that embedded Office documents are not stored in a presentation as the document itself, but inside an OLE container.
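
To check which of the two formats a given byte array holds, the leading signature bytes can be compared. The following is only a minimal sketch and not part of the Aspose.Slides API; the names EmbeddedDataKind and EmbeddedDataSniffer are made up for illustration. It relies on two well-known signatures: an OLE compound file starts with D0 CF 11 E0 A1 B1 1A E1, while a .docx package is a ZIP archive and starts with 50 4B 03 04.

```csharp
using System;

// Hypothetical helper types, introduced here for illustration only.
public enum EmbeddedDataKind { Unknown, OleCompoundFile, ZipPackage }

public static class EmbeddedDataSniffer
{
    // Header signature of an OLE compound file (Compound File Binary format).
    private static readonly byte[] OleSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local file header signature of a ZIP archive; .docx files are ZIP packages.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies the data by its leading bytes only; the content is not validated.
    public static EmbeddedDataKind Detect(byte[] data)
    {
        if (StartsWith(data, OleSignature)) return EmbeddedDataKind.OleCompoundFile;
        if (StartsWith(data, ZipSignature)) return EmbeddedDataKind.ZipPackage;
        return EmbeddedDataKind.Unknown;
    }

    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Calling EmbeddedDataSniffer.Detect(ole.ObjectData) before handing the bytes to Aspose.Words would show whether the data is already a .docx package or still wrapped in an OLE container.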

We don't have a public API that extracts this stream as a ready-to-use Word document, but this can be achieved easily with a third-party open source library such as OpenMcdf.

First, you need to reference this library. It's available as a NuGet package, and you can add it to your project using the Package Manager Console:

Install-Package OpenMcdf

Then, the following slightly modified code can be used to work with the embedded document using Aspose.Words:


using (Presentation pres = new Presentation("Presentation.pptx"))
{
    // This loop scans the normal slides; the code in the original question iterates _presentation.Masters instead.
    ISlideCollection slides = pres.Slides;

    foreach (ISlide slide in slides)
    {
        IShapeCollection shapes = slide.Shapes;
        foreach (IShape shape in shapes)
        {
            OleObjectFrame oleObjectFrame = shape as OleObjectFrame;
            if (oleObjectFrame == null || !oleObjectFrame.ObjectProgId.Contains("Word.Document"))
            {
                continue;
            }

            oleObjectFrame.UpdateAutomatic = true;

            using (MemoryStream memoryStream = new MemoryStream(oleObjectFrame.ObjectData))
            {
                memoryStream.Position = 0;
                // ObjectData is an OLE container; the actual document is in its "Package" stream.
                OpenMcdf.CompoundFile compoundFile = new OpenMcdf.CompoundFile(memoryStream);
                OpenMcdf.CFStream stream = compoundFile.RootStorage.GetStream("Package");
                byte[] packageData = stream.GetData();

                using (MemoryStream packageDataStream = new MemoryStream(packageData))
                {
                    Aspose.Words.Document word = new Aspose.Words.Document(packageDataStream);
                    word.Save("Document.docx", Aspose.Words.SaveFormat.Docx);
                }
            }
        }
    }
}

I hope the shared information will be helpful.

Many Thanks,

Thank you for your response.
I will check your code and try it.

But for clarification:
- I load a PowerPoint file that includes a Word document as an OLE object
- I get the OLE stream and use Aspose.Words to load from the stream (this fails!)
- I only saved the OLE data to disk to check it (I won't do this once everything runs fine)

The same thing works within Excel!
There I do the same: load the OLE object, load the stream into Word using Aspose.Words, work on the Word document and save it back to Excel.

So what is the difference, given that it works in Aspose.Cells but not in Aspose.Slides, that makes another third-party component necessary?