Extracting excel tables from a word file

Hi,

We have taken an Aspose licence.We are trying to convert word
files to HTML. In these conversions, we have some files which contain
excel objects embedded within them. The issue is that the embedded excel
sheets are
converted to images. So I am not able to manipulate the content of
those excel sheets in the converted HTML file.
We are needed to replace the images with actual tables. Is that possible?
If not, the other approach we might take includes extracting and storing the excel tables as separate files and adding the name of the file in the converted HTML file. If you can suggest a way to work out any of the 2 approaches, it’ll be of great help.

Thanks and Regards,
Nikhil

Hi Nikhil,


Well, first of all, Aspose.Cells for .NET is an Excel spreadsheet management component, so it cannot load or work on Word documents in any case. Since you have some Excel Ole Object (your so called images in the converted HTML file via Aspose.Words for .NET), the question is originally related to Aspose.Words product. The main issue here is Aspose.Words coverts a word file to Html, it will simply process an ole object (of any type) as an image in the output HTML file. Similarly, Aspose.Cells will process an Ole Object as an image too in converting an Excel file to Html.

I think your other approach may give you better results, here is complete scenario and steps involved. The devised approach (given below) may not be very accurate but might help you a bit to acquire your needs.

You may try:
1) Open the Word file with Aspose.Words APIs.
2) Iterate the Ole Objects with Aspose.Words APIs.
3) If an Ole Object is Excel file (type), open the Ole Object Data (of byte arrays etc.) with Aspose.Cells APIs, then convert it as separate files, you may also use your own .NET code (using FileStream APIs) to save the Ole object data to Excel file on some location/path.
4) Add a hyperlink for that Ole object with Aspose.Words APIs in the original Word file.
5) Now convert the Word file to Html with Aspose.Words for .NET APIs.


Thank you.

Hi,

Thanks for your questions and using Aspose.

When Excel object is embedded inside the Word document, then Excel object is embedded as an Ole Object.

You can extract all types of Ole Objects from Word document using Aspose.Words.

Once, you will extract the Excel object as an Ole Object, you can then create a Workbook object of it using Aspose.Cells and manipulate it further.

In order to know how to extract Excel object as an Ole Object from your Word document, please post your query on Aspose.Words forum. Aspose.Words team will help you asap.

Hi Amjad & MShakeel,
Thanks for you quick reply. Will check out this approach and get back to you in case any help is required.

Regards,
Nikhil

Hi,

Thanks for your posting and considering Aspose.Cells.

Let us know if you face any issue relating to Aspose.Cells. We will be glad to help you asap.

Hello,Could you please provide me with a code snippet to extract information from an Excel shape and save it as an html file? I am trying the code below, but there seem to be some problem.

Aspose.Words.Document d = new Document(filename.ToString());

int i = 0;

//Save embedded excel as a separate file
NodeCollection shapes = d.GetChildNodes(NodeType.Shape, true);
foreach (Aspose.Words.Drawing.Shape shape in shapes)
{
if (shape.OleFormat.ProgId.Equals("Excel.Sheet.12"))
{
byte[] byteArr = Encoding.ASCII.GetBytes(shape.ToString());
MemoryStream stream = new MemoryStream(byteArr);

Aspose.Cells.Workbook newExcelDoc = new Aspose.Cells.Workbook(stream);
newExcelDoc.Save(destinationFileName + "_" + i);
++i;
}
}

HtmlSaveOptions htmlOptions = new HtmlSaveOptions(SaveFormat.Html);
d.Save(destinationFileName, htmlOptions);Aspose.Words.Document d = new Document(filename.ToString());

int i = 0;

//Save embedded excel as a separate file
NodeCollection shapes = d.GetChildNodes(NodeType.Shape, true);
foreach (Aspose.Words.Drawing.Shape shape in shapes)
{
if (shape.OleFormat.ProgId.Equals("Excel.Sheet.12"))
{
byte[] byteArr = Encoding.ASCII.GetBytes(shape.ToString());
MemoryStream stream = new MemoryStream(byteArr);

Aspose.Cells.Workbook newExcelDoc = new Aspose.Cells.Workbook(stream);
newExcelDoc.Save(Destinationfilename + "_" + i);
++i;
}
}

HtmlSaveOptions htmlOptions = new HtmlSaveOptions(SaveFormat.Html);
d.Save(Destinationfilename.ToString(), htmlOptions);

Hi,


Well, I am afraid, I am not familiar with Aspose.Words APIs, so you got to contact them for any issue you found to extract Excel’s OLE object shape. Please ask them on how to extract Excel Ole Object to save to an Excel file to some path. As I suggested, if you could get the Excel ole shape’s ObjectData (in the form of byte array), then you may use either Aspose.Cells.Workbook object to fetch the Excel file to save to Excel file or even HTML file. Or you may even use .NET’s FileStream APIs, e.g

//…
//This is just a hint (I am not familiar with the relevant Aspose.Words APIs), please check the relevant APIs with Aspose.Words team

byte[] objectData = oleExcel.ObjectData;
FileStream fileStream = File.Create(dir + "ExcelFile.xls");
fileStream.Write(objectData, 0, objectData.Length);



And, for saving to HTML file, it is very simple, see a sample code below:
Aspose.Cells.Workbook book = new Aspose.Cells.Workbook(stringExcelFilePath/stream);
HtmlSaveOptions options = new HtmlSaveOptions();
book.Save("e:\\test2\\output.htm", options);

For complete reference on opening and saving different file formats and conversions, see the documents:
http://www.aspose.com/docs/display/cellsnet/Opening+Files
http://www.aspose.com/docs/display/cellsnet/Saving+Files


Thank you.


I'm sorry but I have already asked their help and they had redirected me to the Cells team.

We have trusted Aspose and now we are at a critical stage. So please help me out instead of pushing the responsibility on other teams.

Thanks,

Nikhil

Hi,


I am not putting any responsibility to Aspose.Words team rather I am asking you to kindly do contact them on how to get Excel Ole Object to create the Excel file on path/disk. I am suggesting you to kindly get the Ole ObjectData using Aspose.Words APIs because you want to extract Ole Object in Word document and not in Excel file. If you have any issue or problem regarding extracting Ole Objects in Excel file, then I can help you and will have all the responsibility to get this done and make sure to fix any issue if it happens any all the way.

Also, I can still request you to kindly create a sample console application using Aspose.Words and Aspose.Cells APIs, zip it and post it here (we will also check it out) with all the template files to show the issue regarding Aspose.Cells component. We can look into it soon.

Thank you.

Hi,


There are three issues in your code:

a) Getting ole object from ole object with Aspose.Words.

b) Saving the file to html file with Aspose.Cells.

c) There is no relation between the image and exported html.

Please check the following codes with the attached file.

string filename = @"D:\FileTemp\oleobject.docx";

string destinationFileName = @"D:\FileTemp\dest.html";

Aspose.Words.Document d = new Document(filename.ToString());

int i = 0;

//Save embedded excel as a separate file

NodeCollection shapes = d.GetChildNodes(NodeType.Shape, true);

foreach (Aspose.Words.Drawing.Shape shape in shapes)

{

if (shape.OleFormat.ProgId.Equals("Excel.Sheet.12"))

{

//a) Getting ole object from ole object with Aspose.Words.

MemoryStream stream = new MemoryStream();

shape.OleFormat.Save(stream);

stream.Seek(0, SeekOrigin.Begin);

//b) Saving the file to html file with Aspose.Cells.

Aspose.Cells.Workbook newExcelDoc = new Aspose.Cells.Workbook(stream);

Aspose.Cells.HtmlSaveOptions hOptions = new Aspose.Cells.HtmlSaveOptions();

newExcelDoc.Save(destinationFileName + "_" + i +".html", hOptions);

//c) There is no relation between the image and exported html.

shape.HRef = destinationFileName + "_" + i + ".html";

++i;

}

}

Aspose.Words.Saving.HtmlSaveOptions htmlOptions = new Aspose.Words.Saving.HtmlSaveOptions();

d.Save(destinationFileName, htmlOptions);

Hi Amjad,

Here is the code snippet suggested by the Words team. however, they have left the excel part for me to figure out. Can you please help me with this?

Document doc = new Document(@"C:\test\in.docx");
// Get collection of shapes
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
//Loop through all shapes
foreach (Shape shape in shapes)
{
if (shape.OleFormat != null)
{
if (!shape.OleFormat.IsLink)
{
//Extract OLE Word object
if (shape.OleFormat.ProgId == "Word.Document.12")
{
MemoryStream stream = new MemoryStream();
shape.OleFormat.Save(stream);
Document newDoc = new Document(stream);
newDoc.Save(string.Format(@"C:\test\outEmbeded_{0}.html", i));
i++;
}
//Extract OLE Excel object
if (shape.OleFormat.ProgId == "Excel.Sheet.12")
{
// Here you can use Aspose.Cells component
// to be able to convert MS Excel files to separate HTML files
}
}
}
}
Document doc = new Document(@"C:\test\in.docx");
// Get collection of shapes
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
//Loop through all shapes
foreach (Shape shape in shapes)
{
if (shape.OleFormat != null)
{
if (!shape.OleFormat.IsLink)
{
//Extract OLE Word object
if (shape.OleFormat.ProgId == "Word.Document.12")
{
MemoryStream stream = new MemoryStream();
shape.OleFormat.Save(stream);
Document newDoc = new Document(stream);
newDoc.Save(string.Format(@"C:\test\outEmbeded_{0}.html", i));
i++;
}
//Extract OLE Excel object
if (shape.OleFormat.ProgId == "Excel.Sheet.12")
{
// Here you can use Aspose.Cells component
// to be able to convert MS Excel files to separate HTML files
}
}
}
}
doc.Save(@"C:\test\out.html");doc.Save(@"C:\test\out.html");

Hi Amjad,

Here is the code snippet suggested by the Words team. however, they have left the excel part for me to figure out. Can you please help me with this?

Document doc = new Document(@"C:\test\in.docx");

// Get collection of shapes

NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);

int i = 0;

//Loop through all shapes

foreach (Shape shape in shapes)

{

if (shape.OleFormat != null)

{

if (!shape.OleFormat.IsLink)

{

//Extract OLE Word object

if (shape.OleFormat.ProgId == "Word.Document.12")

{

MemoryStream stream = new MemoryStream();

shape.OleFormat.Save(stream);

Document newDoc = new Document(stream);

newDoc.Save(string.Format(@"C:\test\outEmbeded_{0}.html", i));

i++;

}

//Extract OLE Excel object

if (shape.OleFormat.ProgId == "Excel.Sheet.12")

{

// Here you can use Aspose.Cells component

// to be able to convert MS Excel files to separate HTML files

}

}

}

}
Document doc = new Document(@"C:\test\in.docx");

// Get collection of shapes

NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);

int i = 0;

//Loop through all shapes

foreach (Shape shape in shapes)

{

if (shape.OleFormat != null)

{

if (!shape.OleFormat.IsLink)

{

//Extract OLE Word object

if (shape.OleFormat.ProgId == "Word.Document.12")

{

MemoryStream stream = new MemoryStream();

shape.OleFormat.Save(stream);

Document newDoc = new Document(stream);

newDoc.Save(string.Format(@"C:\test\outEmbeded_{0}.html", i));

i++;

}

//Extract OLE Excel object

if (shape.OleFormat.ProgId == "Excel.Sheet.12")

{

// Here you can use Aspose.Cells component

// to be able to convert MS Excel files to separate HTML file

}

}

}

}
doc.Save(@"C:\test\out.html");doc.Save(@"C:\test\out.html");

There are 2 problems in this code.

  1. I do not know whether the ‘Shape’ class belongs to Microsoft.Office.Interop.Excel or Aspose.Words.Drawing.Shape.

  2. Once I get the Shape object, how can I extract the Excel sheet and save it to the file?

I’ll be thankful if you can resolve my doubts.

Thanks,

Nikhil

Hi Shakeel,

This is exactly what I needed, especially those lines you added in red. Thanks a tonne for your help.

Regards,

Nikhil

Hi Amjad,

Shakeel has resolved the problem. Will work on it and trouble you guys if I get stuck again. Thanks for the support.

Regards,

Nikhil

Hi,

Thanks for your feedback and using Aspose.

It’s good to know your issue is resolved now.

Let us know if you face any other issue, we will be glad to assist you further.

Hi Shakeel, As suggested, here is the code I am using to add a link to the image of the Excel sheet embedded in the word doc.

int i = 0;

foreach (Aspose.Words.Drawing.Shape shape in shapes)

{

if (null != shape.OleFormat && shape.OleFormat.ProgId.Equals("Excel.Sheet.12"))

{

MemoryStream stream = new MemoryStream();

shape.OleFormat.Save(stream);

stream.Seek(0, SeekOrigin.Begin);

Aspose.Cells.Workbook newExcelDoc = new Aspose.Cells.Workbook(stream);

destinationFilePath
= @“D:\Excel Embed Test\Output” + Destinationfilename +
“_” + i + “.html”;

newExcelDoc.Save(destinationFilePath);

shape.HRef
= destinationFilePath;

++i;

}

}

This works perfectly. The excel object is an image in the converted file, but since it’s href is set, clicking on it takes me to the excel file for modification.

However, the requirement is that I want to replace the image by the anchor tag. Is that possible? Could you please provide a code snippet for the same?

Thanks and regards,

Nikhil

Hi,

Thanks for your posting.

We think you should check this feature with Aspose.Words that how to process an ole object as an anchor tag.

Aspose.Cells can only convert the data of the ole object to html file if the embedded file is excel file. We cannot decide how to place the html file in the parent html (generated by Aspose.Words)

Hi Shakeel,

Thanks for the prompt reply. I will take up this issue with the Words team.

Thanks and regards,

Nikhil

Hi,

Thanks for using Aspose components

In case, you need any assistance relating to Aspose.Cells, please feel free to post on our forums.

We will be glad to assist you.

Have a good day.