Convert excel file to html and upload to azure blob as a stream

Hi,

We are considering aspose.cells for one of our below requirement.

We have a excel file and we want to display in browser in read only mode. So we noticed Aspose.cells can convert excel into html and then we can show converted html file in browser. Below is the process that we can follow/coding steps :

1) read/download excel file from azure blob storage,
2) using Aspose.cells convert that excel file (from step 1) into html and store that html file on somewhere (folder in website).
3) Then convert that generated html file (from step 2) into stream and upload same on azure blob storage.

Limitation in above process is, one of the folder in our website should have write permission to saved converted (refer step 2) html file.

Question to Aspose.Cells experts:
Is there a possibility we can simply read excel file and convert excel file into html in somewhere memory (may be in memory stream) instead of disk or folder and then upload that stream to azure blob? By converting excel into html in memory will help us to avoid to give write permission to any folder in our website.

Ultimately target that we want to achieve:
Read excel file and convert to html and upload converted html file to azure blob.
Please note in our project azure blob can upload file as a stream/FileStream.

Please suggest any preferred approach to achieve same.
Thanks




Hi,


Thanks for providing us some details.

Aspose.Cells does support to read/save Excel/HTML spreadsheet from/to disk as well as streams (e.g MemoryStream), see the sample code below:
e.g
Sample code:

var wb = new Workbook(“e:\test2\Book1.xlsx”);
var saveOptions = new HtmlSaveOptions(SaveFormat.Html);
var ms = new System.IO.MemoryStream();
wb.Save(ms, saveOptions);

Hope, this helps a bit.

Thank you.

Thanks for your reply, but it’s not working.


i tried below code:

// Creating-Reading excel file
var wb = new Workbook(“e:\test2\Book1.xlsx”);
var saveOptions = new HtmlSaveOptions(SaveFormat.Html);
var myStream = new System.IO.MemoryStream();
wb.Save(myStream, saveOptions);


//****** Reading stream into html file - NOT WORKING

string htmlFile = “C:\Softwares\temp\asposeHtmlStream.html”;
FileStream wrfileStream = new FileStream(htmlFile, FileMode.Create, FileAccess.Write);
myStream.WriteTo(wrfileStream);
wrfileStream.Close();

When I am executing above code, it is generating below file and that is not opening in browser

Link to File: Error

Please suggest when we save to stream then how to read that stream to show in browser as html?

Thanks

Hi Vikramjeet,


Thank you for your feedback.

I have tested the code snippet against a sample of my own and I am not able to observe any problem with the resultant HTML. Please check the attached archive for the input & output files. In order to further investigate the matter, please share your input spreadsheet so we could perform tests on our side to assist you further in this regard.

By the way, I have used the latest revision of Aspose.Cells for .NET 8.9.0.3 for my testing.

Hi Thank you for your prompt reply.


I am also using Aspose cells 8.9 version. Please find attached POC project, please download it and run it to see where we are missing anything that is leading us to generate incorrect html file. you can also find incorrect html file inside project folder “\ConvertedFiles\BOOK1Output.html”. After downloading you can run my code and then check under “ConvertedFiles” folder inside project structure if it is generating correct html file.
Main code is under : AsposeWebApiController.cs under controller folder
excel Files under: App_data\excelFiles
Output html should be generated under “ConvertedFiles” folder

Thanks
Link to download POC code:
https://synechron-my.sharepoint.com/personal/vikramjeet_singh_synechron_com/_layouts/15/guestaccess.aspx?guestaccesstoken=Xba7K12E0vVEXtiYRz59NaPjEJ56Y%2fSlRwKNyBB%2b2Ik%3d&docid=02b07107a53dd40d399264f2cffdccc0c&expiration=2016-07-27T13%3a11%3a40.000Z

Hi Vikramjeet,


Thank you for sharing the sample project for testing. Upon executing the said project, I am able to replicate the said issue on my side, that is; resultant HTML is not correctly saved to disc. The reason behind the said behaviour is that you are testing the functionality in evaluation mode, that is; without setting a valid license. Please note, when in evaluation mode, Aspose.Cells APIs inject worksheets with evaluation watermark to the resultant spreadsheet formats. In case of saving the result in HTML format, the API injects an extra page with evaluation warning on it. If you review your sample spreadsheet book1.xlsx you will notice that it contains only one worksheet in it. The expected HTML should consists of a single page without any other resources when saved in licensed mode whereas in case of evaluation mode, the resultant HTML will consists of 3 HTML pages (1 main & 2 sub-pages) whereas the sub-pages needs to be saved separately in a folder. As your code is not telling the API in which folder you wish to save the extra resources therefore sub-pages are not being saved and consequently the resultant HTML cannot be viewed in a browser. You can avoid the said problem by setting the HtmlSaveOptions.AttachedFilesDirectory property. Please specify a location which is in the same folder as of the .HTML file. Please check the code snippet at the bottom of this post for better understanding.

You can also avoid the said issue for spreadsheets containing only one worksheet by setting the license in your application. Please note, you have to first request for a 30 day temporary license by following the instructions shared here. Once you have received the temporary license via an email, please place it in any directory and pass the path of .LIC file to Aspose.Cells.License.SetLicense method before initializing the Workbook class. In case of spreadsheet containing more than one worksheet, it is advised to set the HtmlSaveOptions.AttachedFilesDirectory property while saving the resultant HTML in an instance of Stream.

C#

string outputFile = dataDir + “ConvertedFiles\”;
string returnFileName = string.Empty;
Workbook wb;
if (fileName.ToUpper().Contains(“.XLS”))
{
fileName = “Book1.xlsx”; // Always read this file for testing
wb = new Workbook(dataDir + “\App_Data\ExcelFiles\” + fileName);
// Converting excel to stream…
var saveOptions = new HtmlSaveOptions();
saveOptions.AttachedFilesDirectory = outputFile + “output_files”;
var myStream = new System.IO.MemoryStream();
wb.Save(myStream, saveOptions);

// Reading stream into HTML file –
returnFileName = fileName.ToUpper().Replace(“.XLSX”, “”) + “Output.html”;
FileStream wrfileStream = new FileStream(outputFile + returnFileName, FileMode.Create, FileAccess.Write);
myStream.WriteTo(wrfileStream);
wrfileStream.Close();
}

Hi Babar,

Thank you very much for detailed response. Much appreciated your knowledge to understand the problem and way of explanation.
Although I was doubtful on same (sheet files) and you confirmed! And fyi, yes in my requirement our excel files will be having multiple sheets and big size (aprox 2MB) as well.

Anyway next challenge:
now when i am using above (in your prev response) code with below line that works fine at local but
##############
<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>saveOptions.AttachedFilesDirectory = outputFile + “output_files”;
##############<br style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>
that means attachment (folder with files-Book1Output_files) will be created at given path and it will not be part of my stream (i.e holding main workbook - Book1.xlsx), so challenge is, if i will pass/store my stream on azure blob or at some other location and next time when i will get/access same stream it will not be having attachments so its again bad html that will not work.

So considering above situation of excel having multiple sheets and html file need folder for sheets data, i can think of 2 solutions:

Option-1)First i should create that folder (<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>AttachedFilesDirectory) locally (like above code) and then I should upload same folder separately on my azure blob or anywhere. And it does not seems very optimised approach ?

Option-2) If somehow, we can make that folder(<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>AttachedFilesDirectory) as a part of stream (under wb.save) then whenever we will store/pass our stream, everything will go as a kind of complete package (html file and folder) and then wherever we will access/read/get same stream we will be having complete html with folder as well.

What you suggest, any chance for option-2 or any other better 3rd option ?

Thanks

Hi Vikramjeet,


Thank you for the confirmation on previously reported problem. Regarding your recent concerns, there are a few worth considering solutions as detailed below.

1) Save the result in MHTML format instead of HTML because MHTML is a web page archive format used to combine in a single document the HTML code and its companion resources. If you choose to save the spreadsheet in MHTML format, you do not need to worry about the additional resources as they will be injected into the single MHTML file. Moreover, MHTML can be viewed in browsers similar to HTML without any additional plugins/3rd party tools. Note: You may need to enable ActiveX controls/scripts to properly view the contents in IE and some other browsers.

C#

<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”>var book = <span class=“kwrd” style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre;”>new<span style=“font-family: “Courier New”, Consolas, Courier, monospace; font-size: small; white-space: pre; background-color: rgb(255, 255, 255);”> Workbook(dir + “Book1.xlsx”);
var myStream = new System.IO.MemoryStream();
book.Save(myStream, SaveFormat.MHtml);
FileStream wrfileStream = new FileStream(dir + “stream.mht”, FileMode.Create, FileAccess.Write);
myStream.WriteTo(wrfileStream);
wrfileStream.Close();

2) Another possible solution could be to save the individual worksheet to separate HTML file. In case a given worksheet contains any embedded images, you can tell the API to export the images as base64 strings, hence no additional folder will be required to save the resources.

C#

var saveOptions = new HtmlSaveOptions(SaveFormat.Html);
saveOptions.ExportActiveWorksheetOnly = true;
saveOptions.ExportImagesAsBase64 = true;

var book = new Workbook(dir + "Book1.xlsx");
for (int i = 0; i < book.Worksheets.Count; i++)
{
book.Worksheets.ActiveSheetIndex = i;
using (var myStream = new System.IO.MemoryStream())
{
book.Save(myStream, saveOptions);
FileStream wrfileStream = new FileStream(dir + book.Worksheets[i].Name + ".html", FileMode.Create, FileAccess.Write);
myStream.WriteTo(wrfileStream);
wrfileStream.Close();
}
}

Please execute the above code in licensed mode only, otherwise the process will be stuck in an endless loop because with each Workbook.Save call, the API will inject a worksheet with evaluation watermark to the workbook.