Free Support Forum - aspose.com

Extracting embedded content from MS Office files


#1

Hi,
I am posting this question about an MS Office DOCX file, but I have the same question for Excel and PowerPoint files as well.
I am attaching a sample DOCX file that contains an embedded XLSX file. When I convert the DOCX file to HTML using Aspose.Words, the content of the embedded XLSX is not included in the output HTML.

embedded_doc.zip (53.5 KB)

Is this feature supported, and if so, how do I use it?


#2

@investigation,

An Excel file embedded in a Word document is represented as a Shape node in the Aspose.Words DOM. You can loop through the collection of Shape nodes, get a reference to the original embedded object, and then convert the Excel file to HTML separately using Aspose.Cells. You can build your logic on the following code snippet:

using System.IO;
using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document("E:\\embedded_doc\\embedded_doc.docx");

NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
// Loop through all shapes in the document, including nested ones
foreach (Shape shape in shapes)
{
    // Skip shapes that are not OLE objects, and OLE objects that are only linked
    if (shape.OleFormat != null && !shape.OleFormat.IsLink)
    {
        // Process OLE Word object
        if (shape.OleFormat.ProgId == "Word.Document.12")
        {
            MemoryStream stream = new MemoryStream();
            shape.OleFormat.Save(stream);
            // Rewind the stream before loading the embedded document from it
            stream.Position = 0;

            Document newDoc = new Document(stream);
            // Uncomment to write each embedded Word document out as HTML
            //newDoc.Save(string.Format(@"E:\temp\outEmbeded_{0}.html", i));
            i++;
        }

        // Process OLE Excel object
        if (shape.OleFormat.ProgId == "Excel.Sheet.12" || shape.OleFormat.ProgId == "Excel.Sheet.8")
        {
            // Here you can use Aspose.Cells API to convert to HTML
        }
    }
}
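
For the Excel branch, a minimal sketch could look like the following. It assumes Aspose.Cells for .NET is referenced alongside Aspose.Words. The EmbeddedExcelExporter class, the TrySaveAsHtml method, and the output path passed to it are hypothetical names used only for illustration, not part of either library. Type names are fully qualified because both libraries define a SaveFormat type.

```csharp
using System.IO;
using Aspose.Words.Drawing;

static class EmbeddedExcelExporter
{
    // Hypothetical helper: saves an embedded Excel OLE object as HTML.
    // Returns false when the shape is not an embedded Excel workbook.
    public static bool TrySaveAsHtml(Shape shape, string outputPath)
    {
        OleFormat ole = shape.OleFormat;
        if (ole == null || ole.IsLink)
            return false;

        // Same ProgId values as checked in the snippet above
        if (ole.ProgId != "Excel.Sheet.12" && ole.ProgId != "Excel.Sheet.8")
            return false;

        using (MemoryStream stream = new MemoryStream())
        {
            // Copy the embedded workbook bytes into memory
            ole.Save(stream);
            // Rewind before handing the stream to Aspose.Cells
            stream.Position = 0;

            Aspose.Cells.Workbook workbook = new Aspose.Cells.Workbook(stream);
            workbook.Save(outputPath, Aspose.Cells.SaveFormat.Html);
        }
        return true;
    }
}
```

You could call it from inside the loop, for example EmbeddedExcelExporter.TrySaveAsHtml(shape, @"E:\temp\outEmbeddedXlsx.html"), and then merge or link the resulting HTML with the output of the main document as your scenario requires.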

Please also check:
https://docs.aspose.com/display/cellsnet/Export+Worksheet+CSS+Separately+in+Output+HTML

Hope this helps.