Excel to HTML conversion. Increasing file size

Hello.

I use Aspose.Cells to convert Excel files to HTML.
The resulting HTML is significantly larger than the original document.
example.zip (2.0 MB)
I found that Aspose.Cells duplicates the styles for each td in the HTML, which can increase the file size.

I believe this needs optimization.
A CSS class could probably be generated once and reused wherever the styling is the same.
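
To make the suggestion concrete, here is a minimal sketch of the general idea, independent of Aspose.Cells: each distinct inline style string is mapped to one generated CSS class, and every element that carried that style references the class instead. The `InlineStyleDeduplicator` class, its `Deduplicate` method, and the `s0`, `s1`, ... class names are hypothetical and are not part of any Aspose API. The sketch also assumes the elements have no existing `class` attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Aspose.Cells.
public static class InlineStyleDeduplicator
{
    // Matches a style='...' or style="..." attribute, including the leading whitespace.
    private static readonly Regex StyleAttribute =
        new Regex("\\sstyle=(['\"])(.*?)\\1", RegexOptions.Compiled | RegexOptions.Singleline);

    // Replaces every inline style with a generated class and returns the rewritten HTML.
    // The CSS rules for the generated classes are returned through the css parameter.
    // Assumption: elements do not already carry a class attribute.
    public static string Deduplicate(string html, out string css)
    {
        var classByStyle = new Dictionary<string, string>();
        var cssBuilder = new StringBuilder();

        string result = StyleAttribute.Replace(html, match =>
        {
            string style = match.Groups[2].Value.Trim();
            if (!classByStyle.TryGetValue(style, out string className))
            {
                // First time this style is seen: create one class for it.
                className = "s" + classByStyle.Count;
                classByStyle[style] = className;
                cssBuilder.Append('.').Append(className)
                          .Append(" { ").Append(style).Append(" }\n");
            }
            return " class=\"" + className + "\"";
        });

        css = cssBuilder.ToString();
        return result;
    }

    public static void Main()
    {
        string html = "<tr><td style='border:1px solid;color:red'>a</td>" +
                      "<td style='border:1px solid;color:red'>b</td></tr>";

        string deduplicated = Deduplicate(html, out string css);

        Console.WriteLine(css);          // one rule shared by both cells
        Console.WriteLine(deduplicated); // both cells now reference class "s0"
    }
}
```

With thousands of identically styled cells, the style text is then stored once in the stylesheet rather than once per cell.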

I have a 10 MB Excel document that grows to 300 MB after conversion.

@Andrei86,

Thanks for the template file.

I was able to reproduce the issue by converting your template file to HTML. The size of the output file (“sheet001.htm”) does increase in the Excel to HTML conversion.
Sample code:

using Aspose.Cells;

// Load the source workbook.
Workbook workbook = new Workbook("e:\\test2\\example.xlsx");

// Remove blank rows and columns from every sheet before export.
foreach (Worksheet sheet in workbook.Worksheets)
{
    sheet.Cells.DeleteBlankColumns();
    sheet.Cells.DeleteBlankRows();
}

// Skip hidden worksheets and exclude styles that are not used.
HtmlSaveOptions options = new HtmlSaveOptions();
options.ExportHiddenWorksheet = false;
options.ExcludeUnusedStyles = true;

workbook.Save("e:\\test2\\out1.html", options);

I have logged a ticket with the ID “CELLSNET-49377” for your issue. We will look into it soon.

Once we have an update on it, we will let you know.

@Andrei86,

When the file is saved as HTML with MS Excel, sheet001.htm is also larger than 70 MB. We found that the HTML file contains many empty cells. Please try the following code, which merges empty cells to reduce the size:

HtmlSaveOptions options = new HtmlSaveOptions();
options.ExportHiddenWorksheet = false;
options.ExcludeUnusedStyles = true;
options.MergeEmptyTdForcely = true;

workbook.Save(dir + "dest.html", options);

Hope this helps a bit.
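
For readers unfamiliar with the technique, the following is a minimal sketch of what merging empty cells means in general: a run of consecutive empty `td` elements in a row is replaced by a single `td` with a `colspan`. This illustrates the idea only and is not a description of how Aspose.Cells implements `MergeEmptyTdForcely`. The `EmptyCellMerger` class is hypothetical, and the sketch assumes the empty cells carry no attributes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Aspose.Cells.
public static class EmptyCellMerger
{
    private const string EmptyCell = "<td></td>";

    // Matches two or more consecutive attribute-free empty cells.
    private static readonly Regex EmptyRun =
        new Regex("(?:<td></td>){2,}", RegexOptions.Compiled);

    // Collapses each run of empty cells into one cell spanning the same number of columns.
    public static string MergeEmptyCells(string rowHtml)
    {
        return EmptyRun.Replace(rowHtml, match =>
        {
            int count = match.Length / EmptyCell.Length;
            return "<td colspan=\"" + count + "\"></td>";
        });
    }

    public static void Main()
    {
        string row = "<tr><td>1</td><td></td><td></td><td></td><td>2</td></tr>";
        Console.WriteLine(MergeEmptyCells(row));
        // <tr><td>1</td><td colspan="3"></td><td>2</td></tr>
    }
}
```

Merging this way shrinks the markup, but it also discards any per-cell differences between the merged cells, so the output should be checked against the original layout.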

The ExcludeUnusedStyles option does not fix the problem with the large generated file.
The MergeEmptyTdForcely option breaks other files. Details here:

@Andrei86,

As we mentioned, when the file is saved as HTML using MS Excel, sheet001.htm is also larger than 70 MB. We found that the HTML file contains many empty cells. Please try the code suggested in the previous post to merge empty cells and reduce the size.

Please follow the other thread for the latest updates or a fix.

What about the style duplication?
Is it fixed in the new version 21.10?

@Andrei86,

I believe you will get more or less the same style rendering when converting to HTML with MS Excel. Is that not the case, or do you see different rendering?

As we found, the HTML file contains many empty cells, so please try the following code to merge empty cells and reduce the size:

HtmlSaveOptions options = new HtmlSaveOptions();
options.ExportHiddenWorksheet = false;
options.ExcludeUnusedStyles = true;
options.MergeEmptyTdForcely = true; // Please try this option for this file.

workbook.Save(dir + "dest.html", options);

@Andrei86,

Please try the latest version, Aspose.Cells for Java 21.11, with the following code:

using (Aspose.Cells.Workbook workbook = new Aspose.Cells.Workbook(dir + "example.xlsx"))
{
    HtmlSaveOptions options = new HtmlSaveOptions();
    options.ExportHiddenWorksheet = false;
    options.ExcludeUnusedStyles = true;
    workbook.Save(dir + "dest.html", options);
}

The size of the generated “sheet001.html” is about 2 MB on our end.

We use the .NET version.

@Andrei86,

Sorry, please try Aspose.Cells for .NET v21.11 (available from the Downloads section or the NuGet repository).