Extracting MS Word file from OLE Object (C# .NET)

Any news about this issue?
I cannot use PowerPoint in my project at the moment, as long as I cannot embed and handle the Word object without errors.
Thanks for any update.

Hi Yves,


I have read your comments. I regret to say that the issue is still unresolved. I have asked our product team to share further feedback on it, and I ask for your patience until the issue is resolved.

We are sorry for the inconvenience.

Still no news about this?
PowerPoint is still not usable in my project until this is solved.

Hi Yves,


I have read your comments. I regret to say that the issue is still unresolved. I have asked our product team to share further feedback on it, and I ask for your patience until the issue is resolved.

We are sorry for the inconvenience.

Hello,
I tested the issue with the latest Slides DLL (17.5) and the problem remains.

The bug was reported close to a year ago. Any news?
I am surprised that no one else has run into problems reading or setting Word OLE objects with the Slides component.

Hi Yves,

Our product team has investigated the issue on their end. The data contained in the OleObjectFrame.ObjectData property is not a genuine Word document. Please note that embedded Office documents are not stored in the presentation as the document itself, but inside an OLE container.
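To see which kind of data you are dealing with, you can check the first bytes of the array. The following is only a minimal sketch and not part of any Aspose API: the class name OleDataSniffer and the method Describe are hypothetical names chosen for illustration. It relies on two well-known file signatures: an OLE compound file starts with D0 CF 11 E0 A1 B1 1A E1, and a .docx file is a ZIP package starting with "PK" 03 04. Note that a legacy .doc file is itself a compound file, so the signature alone does not tell you whether a wrapper is present.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Aspose.Slides or OpenMcdf.
public static class OleDataSniffer
{
    // Header signature of an OLE compound file.
    private static readonly byte[] CompoundFileSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Local file header signature of a ZIP archive ("PK" 03 04); .docx files are ZIP packages.
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns a short description of the container type based on the leading bytes.
    public static string Describe(byte[] data)
    {
        if (StartsWith(data, CompoundFileSignature)) return "OLE compound file";
        if (StartsWith(data, ZipSignature)) return "ZIP package";
        return "unknown";
    }

    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

Calling OleDataSniffer.Describe(oleObjectFrame.ObjectData) before handing the bytes to Aspose.Words would show whether the data is a ZIP package that can be loaded directly or a compound file that has to be unwrapped first.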

We don't have a public API that extracts this stream as a ready-to-use Word document, but this can be achieved easily with a third-party open source library such as OpenMcdf.

First, you need to reference this library. It is available as a NuGet package, and you can add it to your project using the Package Manager Console:

Install-Package OpenMcdf

Then the following slightly modified code can be used to work with the embedded document using Aspose.Words:


// Required namespaces: System.IO, Aspose.Slides (plus references to Aspose.Words and OpenMcdf).
using (Presentation pres = new Presentation("Presentation.pptx"))
{
    // Counter so that several embedded documents do not overwrite each other on disk.
    int documentIndex = 0;

    foreach (ISlide slide in pres.Slides)
    {
        foreach (IShape shape in slide.Shapes)
        {
            if (shape is OleObjectFrame)
            {
                OleObjectFrame oleObjectFrame = (OleObjectFrame)shape;

                // Skip OLE objects that are not Word documents.
                if (!oleObjectFrame.ObjectProgId.Contains("Word.Document"))
                {
                    continue;
                }

                oleObjectFrame.UpdateAutomatic = true;

                // ObjectData holds the OLE container, not the Word document itself.
                using (MemoryStream memoryStream = new MemoryStream(oleObjectFrame.ObjectData))
                {
                    OpenMcdf.CompoundFile compoundFile = new OpenMcdf.CompoundFile(memoryStream);

                    // The "Package" stream inside the container holds the actual document bytes.
                    OpenMcdf.CFStream stream = compoundFile.RootStorage.GetStream("Package");
                    byte[] packageData = stream.GetData();

                    using (MemoryStream packageDataStream = new MemoryStream(packageData))
                    {
                        Aspose.Words.Document word = new Aspose.Words.Document(packageDataStream);
                        word.Save("Document" + documentIndex + ".docx", Aspose.Words.SaveFormat.Docx);
                        documentIndex++;
                    }
                }
            }
        }
    }
}

I hope the shared information will be helpful.

Many Thanks,

Thank you for your response.
I will check your code and try it.

But for clarification:
- I load a PowerPoint file that includes a Word document as an OLE object
- I get the OLE stream and use Aspose.Words to load it from the stream (this fails!)
- I only saved the OLE data to disk to check it (I would not do this if everything ran fine)

The same thing works with Excel!
I do the same there: load the OLE object, load the stream into Word using Aspose.Words, work on the Word document, and save it back to Excel.

So what is the difference? Why does it work in Aspose.Cells but not in Aspose.Slides, where another third-party component is needed?


Hi Yves,

I would like to share that this applies to both MS Excel and MS Word OLE objects. You can use the same approach for MS Excel OLE objects as well.

Many Thanks,

Hello,
Maybe I did not make myself clear, as English is not my native language.

In Excel I do NOT need any third-party library, because it works there!

See example:

public void UpdateEmbeddedWords()
{
    if (_fieldsToUpdate.Count == 0) return;
    if (string.IsNullOrEmpty(_path)) return;
    var workbookSheets = _workbook.Worksheets;
    try
    {
        foreach (var sheet in workbookSheets)
        {
            _embeddedWords = sheet.OleObjects;
            foreach (var ole in _embeddedWords)
            {
                // Only handle OLE objects whose file format type is a Word document.
                if (ole.FileFormatType != FileFormatType.Doc && ole.FileFormatType != FileFormatType.Docx) continue;

                var ms = new MemoryStream();
                ms.Write(ole.ObjectData, 0, ole.ObjectData.Length);
                var word = new Word();
                word.LoadFromStream(ms);

                foreach (var pair in _fieldsToUpdate)
                {
                    word.SetFormFieldValue(pair.Key, pair.Value);
                }

                // We have to create our own preview image.
                using (var renderedImage = new Bitmap(word.WordImageStream))
                {
                    var bitmap = new Bitmap(ImageTrim(renderedImage));
                    using (var bitmapStream = new MemoryStream())
                    {
                        bitmap.Save(bitmapStream, ImageFormat.Png);

                        // Set the OleObject's frame size to the image size
                        // ole.Height = bitmap.Height;
                        // ole.Width = bitmap.Width;
                        // Set the OleObject's image data to the image stream
                        ole.ImageData = bitmapStream.ToArray();
                    }
                }

                ole.FileFormatType = FileFormatType.Docx;
                ole.ObjectData = word.WordStream.ToArray();
                // Update the preview content of the OLE object
                // ole.AutoLoad = true;
            }
        }
    }
    catch (Exception ex)
    {
        _logger.ToLog("Error in UpdateEmbeddedWords:\n" + ex, Company, "File: " + _path, Logfile, "component");
    }
}

EDIT
The code above shows a method that gets all embedded objects in an Excel document, loads the OLE stream, and uses the Word class to handle it.
.Load() in Word does not fail, so the OLE object is fine.

I use the same approach in PowerPoint, but there it fails.


Hi Yves Rausch,

Thank you for sharing the information. I understand your point: with Aspose.Cells you do not need a third-party tool to extract an MS Word OLE object, whereas with Aspose.Slides a third-party tool is required to extract the OLE data. I have added this information to the associated ticket and will get back to you as soon as our product team shares feedback.

Many Thanks,

Any news about a fix for this in Aspose.Slides?
I have been waiting for two years now for a way to handle FormFields OR OLE objects OR ActiveX fields in Slides (I have several posts about these three possibilities).

@rausch,

Our product team has investigated the extraction of embedded Word or Excel files from a PowerPoint file. This is not a limitation of Aspose.Slides but the implementation behavior of PowerPoint: when you embed a Word or Excel file in PowerPoint directly, it adds the file as an OLE object. You can try adding a Word file to a PowerPoint presentation, saving the presentation, and then extracting the saved presentation using WinRAR or other archiving software. The extracted presentation will contain “\Presentation with embedded Word.pptx\ppt\embeddings\oleObject1.bin”. If you rename it to “oleObject1.docx” (for example) and try to open it in Word, you will get an error, because it is not a valid Word document. Aspose.Words will not be able to open this embedded object either. We have internally added an issue with ID SLIDESNET-39130 to investigate a workaround for extracting the embedded Word OLE object, and will share feedback with you as soon as our product team provides it.
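The manual check with an archiving tool can also be sketched in code. This is an illustration only, assuming the package layout described above (embedded objects stored under ppt/embeddings/); the class EmbeddingLister and its method ListEmbeddings are hypothetical names, and the sketch uses only System.IO.Compression from the .NET base library. It lists each embedded entry together with its first bytes in hexadecimal, so you can see whether an entry is an OLE compound file or a ZIP-based document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only; not part of Aspose.Slides.
public static class EmbeddingLister
{
    // Returns one line per entry under ppt/embeddings/: "<entry name>: <first bytes as hex>".
    public static List<string> ListEmbeddings(Stream pptxStream)
    {
        var result = new List<string>();

        // A .pptx file is a ZIP archive; leave the caller's stream open afterwards.
        using (var archive = new ZipArchive(pptxStream, ZipArchiveMode.Read, true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (!entry.FullName.StartsWith("ppt/embeddings/", StringComparison.OrdinalIgnoreCase))
                {
                    continue;
                }

                var header = new byte[8];
                int total = 0;
                using (Stream entryStream = entry.Open())
                {
                    // Read may return fewer bytes than requested, so loop until done.
                    int read;
                    while (total < header.Length &&
                           (read = entryStream.Read(header, total, header.Length - total)) > 0)
                    {
                        total += read;
                    }
                }

                result.Add(entry.FullName + ": " + BitConverter.ToString(header, 0, total));
            }
        }

        return result;
    }
}
```

For example, opening the saved presentation with File.OpenRead and passing the stream to EmbeddingLister.ListEmbeddings gives one line per embedded object. An entry whose bytes start with D0-CF-11-E0 is an OLE compound file rather than a .docx package, which is consistent with the rename test failing.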

Thank you for the feedback.
As Microsoft changed its behavior for “non-goddess-programmers” like us, is there any chance to address this on their side?

@rausch,

We can only help you with issues related to Aspose.Slides, and anything that is limited by PowerPoint is limited in Aspose.Slides as well.

So what can you offer as a solution for the following?

  • I cannot work with FormFields ActiveX, as it corrupts the file
  • I cannot work with embedded Word documents, as they are corrupt
  • I did not manage to set up a text area or similar that is usable as a FormField to read/write a text value

So for my purpose I cannot use PowerPoint at all at the moment.
My Total.Net renewal is up for decision. Any suggestions?

@rausch,

I regret to say that, at present, support for extracting OLE data for use with public APIs like Aspose.Words or Aspose.Cells is unavailable. An issue with ID SLIDESNET-39130 has already been created in our issue tracking system and shared with you. I have already shared the only approach possible at the moment at the following link. I ask for your patience until our product team provides the requested support.

You wrote that you posted a link with a possible solution/workaround, but I can’t find it.
You also mentioned that there are third-party libraries, but your post ends with …

Please share that information so I can take a look.

Side note: Why can I do all of this fine with Excel using Aspose.Cells, but not with PowerPoint using Aspose.Slides?

@rausch,

Please click my name “mudassir” in the following post link. It will expand the post with the workaround information that I shared with you earlier.

Secondly, Aspose.Slides and Aspose.Cells are completely different APIs and cannot be compared.

Thanks for the hint, I didn’t know I could expand the quoted answer like this. I can take a look at it tomorrow. Thanks.

About the comparison: even if the APIs are different, that does not change the way Microsoft handles OLE objects. So it seems PowerPoint has a different OLE implementation than Excel.

@rausch,

I have read your comments and would like to mention that issue SLIDESNET-39130 has been created for this purpose: to implement an internal mechanism for accessing the OLE frame data. For now, the suggested option is the workaround sample code that I have shared with you.