HTML to Excel Worksheets

We loose the potential of the formatting and styles to be in one file if we Save As a “html” (Web Page). We create the mockup in xlsx, then Save As “mhtml” Single File Web Page. This we can open in our text editor and it has all the same things the “html” file would have, but also the formatting, html elements of the tables, and styles. Including the xml for the Worksheets. Can you be more explicit in what we need our input file to contain in order for the html to .xlsx conversion to work, and can you help with the Aspose.cells API examples? Thank you.

@afitz,

Thanks for the details.

We will evaluate and update you soon.

@afitz
When we read MHT file,
1, Check Content-Type: multipart/related; boundary="----=_NextPart_01D8D1DD.28209070" to get “boundary”
2,Split MHT to html files by boundary and named each html file with the value of “Content-Location”
3,Parse “x:ExcelWorkbook” xml section to get HRef of “x:WorksheetSource”,
4,Gets the html stream of the worksheet by comparing HRef of “x:WorksheetSource” and “Content-Location”
5, Read the data of the worksheet.

@afitz
If you can split the your compound html to several html streams, you also can implement IStreamProvider to provide each worksheet stream as the following codes:
internal class StreamProviderZip : IStreamProvider
{
Hashtable _streamMap;//split each stream to this hashtable.
public void InitStream(StreamProviderOptions options)
{
options.Stream = (Stream)_streamMap[options.DefaultPath];
}

    public void CloseStream(StreamProviderOptions options)
    {
        //options.CloseStream();
    }

  
}
class Program
{   static void Main(string[] args)
    {
        HtmlLoadOptions loadOptions = new HtmlLoadOptions();
        loadOptions.StreamProvider = new StreamProviderZip();
        Stream mainStream = null;//main html file with xml section.
        Workbook workbook = new Workbook(mainStream, loadOptions);

}
}

Thank you @simon.zhao for detailing how the .mht file is read. With that process we can use the Aspose.cells API like this without needing to use the StreamProvider? Do we need to add any code here?

LoadOptions loadOptions = new LoadOptions(getLoadFormat(isMhtml));
Workbook workbook = new Workbook(inputStream, loadOptions);
workbook.save(outputStream, SaveFormat.XLSX); 

The StreamProvider is a suggestion, not necessary?

*** I just ran some tests of the .mht to .xlsx and was able to successfully output a two page Multisheet Workbook-- thanks to your recommendations and efforts! Styles are a little finicky. Certain inline styles can cause the file to break. But, the good news is we have an output with correct tabs and sheets. From here I will incrementally add more sheets, styles, and data, so that I can catch each bug as it occurs. Is there a way to hard code row heights and column widths? So far I am successful at one or the other. In the test I just ran I was able to hard code column widths, but the rows would not respond to any heights, and I had to manually expand the rows to fit the content.

@afitz
1, The StreamProvider is a suggestion.
2, Certain inline styles can cause the file to break;Is there a way to hard code row heights and column widths?
Please share your template file here. We will check it ASAP.

RowHeightColWidthBugs.7z (3.1 KB)

Hi @simon.zhao, thank you for your help. In the file I commented where the bugs occur.
In setting column widths and row heights with Aspose.cells, if we want the columns and rows to expand with content, we must implement the AutoFit property detailed AutoFit Columns and Rows while loading HTML in Workbook|Documentation. If that property is always set to true, does hardcoding row widths and heights ever override the property? …Or will we need to choose one or the other…i.e.

  1. If AutoFit property is set to false, hard code row heights and column widths
  2. If AutoFit is set to true, the columns and rows will always expand with content and ignore any hardcoded widths and heights.
    Thanks!

@afitz,

Thanks for the sample file with details.

I have logged it with your existing ticket “CELLSJAVA-44907” into our database. Let us evaluate your requirements and concerns in details. We will get back to you soon.

@afitz
1,Please remove "Content-Transfer-Encoding: quoted-printable " in the your file for it’s not encoded with “quoted-printable”.
2, If row widths are set, MS Excel will not auto extend row heights again.

Worked like a charm! Thanks @simon.zhao , I appreciate your swift feedback.

@afitz,

It is nice to know that following the suggestion works for your needs. Feel free to write us back if you have further queries or comments.

Hi there. Everything is going great based on your recommendations. Currently building a six sheet Workbook, and there is one odd thing with the output file, that has to do with cell range. It appears as dashes for each instance that the range bug occurs on the cover page at the very top (maybe it has something to do with the tab links?). If I take the .xlsx Workbook and convert back to .mht you can see how that range bug shows up in the html. Have you seen this bug before? Thank you.Screenshot (1400).png (41.1 KB)
Screenshot (1401).png (18.1 KB)

@afitz,

Thanks for the screenshots and findings.

Please spare us little time as we need to evaluate your provided details/findings using your sample file(s). We will get back to you soon.

@afitz
Could you post your template html file and excel file ? We will check it soon.
“<0xa0>” is very strange.

Hi @simon.zhao . I figured it out. By removing the tab sheet from the input .mht file the underlines that appear (seen on the cover page of the Workbook, detailed in the screen shot I sent ) do not show up in the output file. The tab sheet is not necessary. I simply can modify the tabs in the xml of the head ExcelWorkbook.

The <0xa0> shows up whenever we convert the Excel Workbook to .mht. We always just delete them or if there are just a few ignore them. It has to do with spaces, I believe.

Is it possible to hide and expand rows? If a rows height is set as 0 in the input file, would the Aspose.cells api be able to output a row that is initially hidden, but if the expand button is clicked would show the hidden rows? I just tried it, and unfortunately did not work, and hidden child rows are not in the output all together. What I would like to accomplish is the feature in the image Screenshot (1430).png (2.0 KB)

@afitz,

We will evaluate your requirements and get back to you soon.

@afitz
1, Is "<0xa0> " generated by your program?
2, hide and expand rows
We will read “mso-outline-level” property in the next version to support this feature as MS Excel

Thank you @simon.zhao

  1. No. It can happen if you build a document in Word or workbook in Excel and “Save As” a Single File Web page. If you were to open that Single file as .mht in a text editor it most often will have an occurrence/s of “<0xa0>”
  2. When is the next version release?
    I appreciate your time.