Load Excel with embedded Word and modify Word

Hello,
I need to load a Excel document which contains Word embedded OLE documents.
In my case I need to see if a Excel has such OLE objects, if so load them and modify the OLE content. After modifications I need to update the OLE objects and save the Excel document.

Example:
Excel document contains Header as embedded Word.
I want to load the Excel file, then load the Word content, modify a content or form field.
Then I save the Excel document which contains the updated/embedded Word content.

How this can be achieved?

Thanks for help. Kind regards, Yves

Hi Yves,


Thank you for contacting Aspose support.

Please review the article on managing OleObjects from Aspose.Cells’ perspective, however, for your requirement, you can access the OleObject from the Worksheet, retrieve its object data to create an instance of Aspose.Words.Document, manipulate the document and save the result back as object data of the existing OleObject. Here is the code snippet that demonstrates how to access object data and re-set it with updated copy. Please note, for testing purposes, I have just replaced some text from the document.

C#

MemoryStream outStream = new MemoryStream();
var book = new Aspose.Cells.Workbook(dir + “Embedded-Word-Document.xlsx”);
var ole = book.Worksheets[0].OleObjects[0];
if (ole != null && (ole.FileFormatType == FileFormatType.Docx || ole.FileFormatType == FileFormatType.Doc))
{
var doc = new Aspose.Words.Document(new MemoryStream(ole.ObjectData));
doc.Range.Replace(“testing”, “Aspose”, false, true);
doc.Save(outStream, Aspose.Words.SaveFormat.Docx);
ole.ObjectData = outStream.ToArray();
}
book.Save(dir + “modified.xlsx”);


Please note, for any concerns regarding Aspose.Words for .NET API, you have create a thread in respective forum, that is; Aspose.Words support forum.

Hello,
thank your for your feedback. I implemented it as suggested but run into 2 problems.

1. Some of the ole objects are doc, some docx.
Is there a way to transform embedded ole object from doc to docx?

2. Return the modified word to the ole in excel do not update this in my scenario. I can not find out why.

Part of my Excel class that set form field values:

public void SetFormFieldValue(string name, string value)
{
if (string.IsNullOrEmpty(_path)) return;
var workbookSheets = _workbook.Worksheets;
try {
foreach (var sheet in workbookSheets)
{
var boxes = sheet.TextBoxes;
foreach (var box in boxes)
{
if ( box.Name == name ) box.Text = value;
}

_embeddedWords = sheet.OleObjects;
foreach (var ole in _embeddedWords)
{
// Specify each file format based on the oleobject format type.
if (ole.FileFormatType != FileFormatType.Doc && ole.FileFormatType != FileFormatType.Docx) continue;

var word = new Word();
var ms = new MemoryStream();
ms.Write(ole.ObjectData, 0, ole.ObjectData.Length);
word.LoadFromStream(ms);
word.SetFormFieldValue(name, value);
word.SaveAs(@“D:\debug.docx”, “docx”);
ole.ObjectData = word.WordStream.ToArray();
}
}
}
catch (Exception ex)
{
_logger.ToLog(“Error in SetFormFieldValue:\n” + ex, Company, "File: " + _path, Logfile);
}
}

Part of the Word class that handles the Setformfield value:

public MemoryStream WordStream
{
get
{
var ms = new MemoryStream();
_document.Save(ms, SaveFormat.Doc);
return ms;
}
}

As you can see I save a debug.docx file to check if the content gets updated. The doc file contains my changes. So loading the ole object into my word class and modify it works as expected.

But returning the word document back to the OLE in Excel do not work as I want. Any ideas?

I added my two classes for your reference.

I just found out, my Excel document got updated correct!
BUT, the excel display old content. The display is a picture preview??? If so, how to update this, too?

The Excel must update its preview information of the embedded word OLE object somehow?!

EDIT Added a screenshot. Getting incompatible messages on a checkup with Excel (using Excel 2016). Maybe there is a issue?

EDIT2 Found some reference in the net that describe the same problem: embed - Update embedded excel file programmatically - Stack Overflow . So it seems that the WMF file is not updated?! Shouldn’ the aspose component do this?

Hi Yves,

rausch:


1. Some of the ole objects are doc, some docx.
Is there a way to transform embedded ole object from doc to docx?

This needs to be inquired in Aspose.Words support forum for detailed answer. However, as per my understanding of Aspose.Words APIs, you can transform the format of a document by specifying the appropriate value from the SaveFormat enumeration in the Document.Save method.

rausch:


2. Return the modified word to the ole in excel do not update this in my scenario. I can not find out why.
As you can see I save a debug.docx file to check if the content gets updated. The doc file contains my changes. So loading the ole object into my word class and modify it works as expected.

But returning the word document back to the OLE in Excel do not work as I want. Any ideas?

I believe the above problem is that preview of the embedded document is not being updated in the spreadsheet file so you are thinking that the embedded object (document) has not been updated. Let me explain it in my next post.
Hi again,

rausch:
I just found out, my Excel document got updated correct!
BUT, the excel display old content. The display is a picture preview??? If so, how to update this, too?

The Excel must update its preview information of the embedded word OLE object somehow?!

**EDIT2** Found some reference in the net that describe the same problem: http://stackoverflow.com/questions/5391438/update-embedded-excel-file-programmatically . So it seems that the WMF file is not updated?! Shouldn' the aspose component do this?

Regarding your main concerns as quoted above, unfortunately, the current implementation of Aspose.Cells APIs do not provide any means to auto refresh the preview of the embedded object. Excel behaves in a similar way, that is; the preview of updated OleObject does not refresh unless you double click on it. That said, we can provide an option with future releases of the Aspose.Cells APIs which could allow you to output the spreadsheet in Protected View. As soon as the user clicks on Enable Editing, the preview of the OleObject will be refreshed automatically. Please check the attached spreadsheets for your reference. The spreadsheet Modified.xlsx was generated from Embedded-Word-Document.xlsx by updating the OleObject (document) using the exact code shared earlier in my response. You will notice, that the resultant file is in Protected View. Try clicking the Enable Editing button to see the effect on the preview of the OleObject. If this is an acceptable solution to you then please let us know so we can log an appropriate ticket for proper feasibility analysis.

rausch:


**EDIT** Added a screenshot. Getting incompatible messages on a checkup with Excel (using Excel 2016). Maybe there is a issue?

I am not sure why you are getting the error as shown in the attached snapshot. In order to investigate the matter, please create a spreadsheet manually in Excel 2016 and embed the same file (which caused this error) and share it here along with steps to replicate the error.
babar.raza:
Hi Yves,

This needs to be inquired in Aspose.Words support forum for detailed answer. However, as per my understanding of Aspose.Words APIs, you can transform the format of a document by specifying the appropriate value from the SaveFormat enumeration in the Document.Save method.


I found already. What I ment was to change the FormatType of the ole object. I saw the ole object has a setter so I could change the type and then add the content as docx.
babar.raza:
Regarding your main concerns as quoted above, unfortunately, the current implementation of Aspose.Cells APIs do not provide any means to auto refresh the preview of the embedded object. Excel behaves in a similar way, that is; the preview of updated OleObject does not refresh unless you double click on it. That said, we can provide an option with future releases of the Aspose.Cells APIs which could allow you to output the spreadsheet in Protected View. As soon as the user clicks on Enable Editing, the preview of the OleObject will be refreshed automatically. Please check the attached spreadsheets for your reference. The spreadsheet Modified.xlsx was generated from Embedded-Word-Document.xlsx by updating the OleObject (document) using the exact code shared earlier in my response. You will notice, that the resultant file is in Protected View. Try clicking the Enable Editing button to see the effect on the preview of the OleObject. If this is an acceptable solution to you then please let us know so we can log an appropriate ticket for proper feasibility analysis.


OK, I understand. Problem is that won't solve my requirements.
I do load a Excel document, update embedded word content and finally save the Excel AND create a PDF document. So I need a solution that the preview gets updated, too. Any possibilities?


Hi again,


The solution I have shared earlier will be OK if you wish to save the result in spreadsheet format and pass on the spreadsheet to the customer. However, in case of saving the result in PDF format, the preview of the OleObject will not be updated accordingly. Let me discuss this matter with the product team to find any other solution for the later case. In the meanwhile, please confirm if I should log a feature request to handle the scenario for resultant spreadsheet formats as discussed above.
babar.raza:
Hi again,

The solution I have shared earlier will be OK if you wish to save the result in spreadsheet format and pass on the spreadsheet to the customer. However, in case of saving the result in PDF format, the preview of the OleObject will not be updated accordingly. Let me discuss this matter with the product team to find any other solution for the later case. In the meanwhile, please confirm if I should log a feature request to handle the scenario for resultant spreadsheet formats as discussed above.


Yes, please try find a solution, maybe it is possible to create a preview image using word and then pass the image, too?
And yes, please make a feature request.

Hi Yves,


Sure, I will investigate the matter to propose a solution for PDF rendering. For now, I have logged a feature request under the ticket CELLSNET-44482 to provide an option to save the spreadsheet in Protected View so that when user clicks the Enable Editing, the preview of OleObject refreshes automatically.
Hi again,

rausch:

Yes, please try find a solution, maybe it is possible to create a preview image using word and then pass the image, too?

Based on your suggestion as quoted above, I have found a way to create the snapshot of Word document (first page) and use it as OleObject's image data. Please check the following piece of code and its resultant PDF. Please note, the code is setting the OleObject's size to the image size obtained from the Aspose.Words for .NET API so you can adjust it according to your requirements.

C#

MemoryStream outStream = new MemoryStream();
var book = new Aspose.Cells.Workbook(dir + "Embedded-Word-Document.xlsx");
var ole = book.Worksheets[0].OleObjects[0];
if (ole != null && (ole.FileFormatType == FileFormatType.Docx || ole.FileFormatType == FileFormatType.Doc))
{
var doc = new Aspose.Words.Document(new MemoryStream(ole.ObjectData));
doc.Range.Replace("testing", "Aspose", false, true);

// Render first page of Word document to an image stream
var options = new Aspose.Words.Saving.ImageSaveOptions(Aspose.Words.SaveFormat.Jpeg);
options.PageIndex = 0;
options.PageCount = 1;
MemoryStream imgStream = new MemoryStream();
doc.Save(imgStream, options);
// Create a Bitmap from stream to get image size Bitmap bitmap = new Bitmap(imgStream);
// Set OleObject's frame size to the image size
ole.Height = bitmap.Height; ole.Width = bitmap.Width;
// Set OleObject's image date to the image stream ole.ImageData = imgStream.ToArray();
doc.Save(outStream, Aspose.Words.SaveFormat.Docx); ole.ObjectData = outStream.ToArray(); } book.Save(dir + "modified.pdf", new Aspose.Cells.PdfSaveOptions() { OnePagePerSheet = true });

Hi,

Thanks for using Aspose.Cells.

This is to inform you that we have fixed your issue CELLSNET-44482 now. We will soon provide the fix after performing QA and including other enhancements and fixes.

Hello,
code works like you suggested.
Problem I run into now: the preview image is a full DINA4 page, not only the “content” which is much smaller. The word contains a “document content header with logo and some parameter infos”.
Any chance to create a preview only from the pure content?

Hi Yves,


Thank you for the confirmation on provided solution.

In order to address your recently shared concerns, I have to get help from the experts of Aspose.Words APIs so I am going to move this thread to Aspose.Total support forum. Moreover, I have intimated the concerned team to check the scenario and provide a solution for it.

Hi,

Thanks for your using Aspose.Cells.

Please download and try the latest fix: Aspose.Cells for .NET v8.8.2.1 and let us know your feedback.

Please try OleObject.AutoLoad property with this fix for your needs.

Hello,
this DLL is compatible with older versions? When replacing this in my project I get many errors. For example for workbook sheets there is no .Textbox anymore.
The DLL is like 1MB smaller then my previous one.
I am using .NET 4 full client version.

Hi,

Thanks for your posting and using Aspose.Cells.

You actually need a .NET ClientProfile DLL or .NET 4.0 DLL. To get it, you will have to wait for official release v8.8.3 which is expected somewhere in late June and mid July. However, we have logged your comment in our database against this issue for product team consideration. Product team may provide you your required DLL earlier if possible. Once, there is some news for you, we will let you know asap.

Hi Yves,

Thank you for your inquiry.

With Aspose.Words, you can save Word document’s page to image. The Word document’s pages have specific page height and width. Aspose.Words renders the complete page to image. If you want to remove the empty area of image, please use following .NET code. Hope this helps you.

var doc = new Aspose.Words.Document(MyDir + "Test020.docx");
doc.Range.Replace("testing", "Aspose", false, true);

PageInfo pageInfo = doc.GetPageInfo(0);
const float MyScale = 1.0f;

// Let's say we want the image at this resolution.
const float MyResolution = 200.0f;

Size pageSize = pageInfo.GetSizeInPixels(MyScale, MyResolution);
MemoryStream stream = new MemoryStream();
using (Bitmap img = new Bitmap(pageSize.Width, pageSize.Height))
{
    img.SetResolution(MyResolution, MyResolution);

    using (Graphics gr = Graphics.FromImage(img))
    {
        // You can apply various settings to the Graphics object.
        gr.TextRenderingHint = TextRenderingHint.AntiAliasGridFit;

        // Fill the page background.
        gr.FillRectangle(Brushes.Transparent, 0, 0, pageSize.Width, pageSize.Height);

        // Render the page using the zoom.
        doc.RenderToScale(0, gr, 0, 0, MyScale);
    }
    img.Save(stream, ImageFormat.Png);
}

Bitmap croppedImage;

// Load the image into a new bitmap.
using (Bitmap renderedImage = new Bitmap(stream))
{
    // Extract the actual content of the image by cropping transparent space around
    // the rendered shape.
    Rectangle cropRectangle = FindBoundingBoxAroundNode(renderedImage);

    //croppedImage = new Bitmap(cropRectangle.Width, cropRectangle.Height);
    croppedImage = new Bitmap(cropRectangle.Width, cropRectangle.Height);
    croppedImage.SetResolution(200, 200);

    // Create the final image with the proper background color.
    using (Graphics g = Graphics.FromImage(croppedImage))
    {
        g.Clear(Color.White);
        g.DrawImage(renderedImage, new Rectangle(0, 0, croppedImage.Width, croppedImage.Height), cropRectangle.X, cropRectangle.Y, cropRectangle.Width, cropRectangle.Height, GraphicsUnit.Pixel);
    }
}

croppedImage.Save(MyDir + "Out.bmp", ImageFormat.Bmp);

///
/// Finds the minimum bounding box around non-transparent pixels in a Bitmap.
///

public static Rectangle FindBoundingBoxAroundNode(Bitmap originalBitmap)
{
    Point min = new Point(int.MaxValue, int.MaxValue);
    Point max = new Point(int.MinValue, int.MinValue);

    for (int x = 0; x < originalBitmap.Width; ++x)
    {
        for (int y = 0; y < originalBitmap.Height; ++y)
        {
            // Note that you can speed up this part of the algorithm by using LockBits and unsafe code instead of GetPixel.
            Color pixelColor = originalBitmap.GetPixel(x, y);

            // For each pixel that is not transparent calculate the bounding box around it.
            if (pixelColor.ToArgb() != Color.Empty.ToArgb())
            {
                min.X = Math.Min(x, min.X);
                min.Y = Math.Min(y, min.Y);
                max.X = Math.Max(x, max.X);
                max.Y = Math.Max(y, max.Y);
            }
        }
    }

    // Add one pixel to the width and height to avoid clipping.
    return new Rectangle(min.X, min.Y, (max.X - min.X) + 1, (max.Y - min.Y) + 1);
}

Hi,


Please try our latest version/fix: Aspose.Cells for .NET v8.8.2.3 (attached) for .NET Framework 4.0 for your requirements.

Let us know if you still have any issue regarding Aspose.Cells APIs.

Thank you.