HTML to Excel Worksheets

Hello, I am having trouble exporting HTML to a Multisheet XLSX report. There is not a lot of documentation on doing this conversion. I am curious, when the HTML string is parsed, how does it know what are separate worksheets? i.e. how does the Workbook object deliminate its WorksheetCollections?

Hi @afitz,

Can you please tell us what version of Aspose Cells are you using. Can you also attach an example of the input file?

@mlyra My company uses 20.11

@mlyra I will put together an example, but I am most curious, what are the delimiters within the HTML that Aspose.cells looks for to create the WorkbookCollection for a Multisheet?

hi @afitz

There is no delimiter to separate multiple worksheets in an HTML file. However, I can create some strategies that can use other methods from rich Aspose APIs that can provide such functionality. Please send me the input file.

Thank you @mlyra
We need it to be dynamic in the sense that the input file could have x amount of Worksheets. I have attached two files compressed. One file has mhtml boundaries, the other does not. We utilize the single file web page because we modify it for reporting with Velocity and a Java application. The Velocity is run before it goes into our Aspose.cells conversion, so not a concern. We are able to output single file Excel xlsx Workbooks with it. Our conversion is generally:
LoadOptions loadOptions = new LoadOptions(getLoadFormat(isMhtml));
Workbook workbook = new Workbook(inputStream, loadOptions);
workbook.save(outputStream, SaveFormat.XLSX);
But the conversion is sensitive- i.e. it does not like inline styling. And we need to implement AutoFit and you do have some docs with examples on that which is great. Another possible concern-if we do the conversion with mhtml boundaries will the six sheets link to the same styles, or if we do no boundaries, how can we make it so the six sheets refer to the same CSS styles in the first head tags.html_to_xlsx.7z (17.7 KB)

Thank you so much for your help!

@afitz,

Generally, Aspose.Cells APIs check the records about Multisheet when loading html file. Do you need those elements/delimiters within the HTML to differentiate if the HTML file has multiple sheets in it?

We will get back to you to provide details on it.

Hi there,
Ideally we input the file, and the Aspose.cells conversion to Workbook contains the WorksheetCollection, and outputs a Excel xlsx file with the proper tabs, and sheet styling. If there are x amount of sheets, it is easier for us to keep all the report styles in one location in the .mhtml file. But if we need to have them in the head tags for each sheet we can. We are flexible in the sense that we just need a working solution. i.e. we can use the mhtml boundaries if Aspose.cells needs them as delimiters, but we can also omit them if Aspose.cells does not need them, and can successfully determine the separate sheets and corresponding sheet tabs. Thank you for your help!

@afitz,

Thanks for providing further details.

We will soon share the internal working and details.

@afitz
Could you provide your input sample files and expected output file? We can check it ASAP.
And now please use MHT file to contains X sheets.

Hi Simon, I attached them 3 days ago to @mlyra, they are attached as a .7z “html_to_xlsx.7z”

html_to_xlsx.7z (17.7 KB)
Here they are again @simon.zhao Thank you!

@afitz,

Thanks for the sample files.

Your zipped archive contains only MHT files, it does not include your expected (e.g., XLSX) file. Please also provide your expected Excel file which you want to acquire from MHT files.

Thank you @Amjad_Sahi, Here is the compressed output example. Thank you!xlsx_Output_example.7z (41.5 KB)

@afitz,

Thanks for the expected output Excel file.

We are checking your issue with your expected output file and will get back to you soon.

@afitz,

Thanks for the expected output Excel file.

Please note, generally, we follow MS Excel standards and specifications when rendering HTML to Excel spreadsheets. Anyways, I have logged a ticket with an id “CELLSJAVA-44907” for your issue/requirements. We will evaluate HTML to Excel rendering if we could enhance as per your custom needs.

Once we have an update on it, we will let you know.

Thank you @Amjad_Sahi. We look forward to any insight, knowledge or improvements of html to .xlsx Workbooks with WorksheetCollections.

@afitz,

Sure, we will keep you posted with new updates once available on it.

@afitz
We check Excel Worksheets with the following section in the main.html.

<!--[if gte mso 9]><xml>
 <x:ExcelWorkbook>
  <x:ExcelWorksheets>
   <x:ExcelWorksheet>
    <x:Name>Report1</x:Name>
    <x:WorksheetSource HRef="dest_files/sheet001.htm"/>
   </x:ExcelWorksheet>
   <x:ExcelWorksheet>
    <x:Name>DataSheet1</x:Name>
    <x:WorksheetSource HRef="dest_files/sheet002.htm"/>
   </x:ExcelWorksheet>
  </x:ExcelWorksheets>
  <x:Stylesheet HRef="dest_files/stylesheet.css"/>
  <x:WindowHeight>12456</x:WindowHeight>
  <x:WindowWidth>23256</x:WindowWidth>
  <x:WindowTopX>-108</x:WindowTopX>
  <x:WindowTopY>-108</x:WindowTopY>
  <x:RefModeR1C1/>
  <x:TabRatio>600</x:TabRatio>
  <x:ActiveSheet>0</x:ActiveSheet>
 </x:ExcelWorkbook>
</xml>

@afitz

Can you split the file into normal html format as MSExcel saved file?