Save to HTML: irrelevant data included

Hi,

Using Aspose Cells for Java, version 17.11.
We’re using Aspose Cells in a server side application where we visualize Excel in HTML clients, overlaying the HTML exported with Aspose with additional data. The consumer of the HTML is a web browser. Importing the HTML back into Excel or Aspose is not relevant for us.

Sample code:

Workbook workbook = new Workbook("input.xlsx");
HtmlSaveOptions options = new HtmlSaveOptions();
Worksheet sheet = workbook.getWorksheets().get("Versioning");
workbook.getWorksheets().setActiveSheetIndex(sheet.getIndex());
options.setExportActiveWorksheetOnly(true);
options.setExportBogusRowData(false);
options.setExportFrameScriptsAndProperties(false);
workbook.save("output.html", options);

See attached Excel input and HTML output files.html-irrelevant-data.zip (1.4 MB)

Some observations on the quality and usefulness of the output from Aspose Cells:

  1. The generated HTML contains over 5000 CSS styles, 99% of which are not used in the generated HTML. This amounts to multiple megabytes of unneeded CSS. Would it be possible to provide a property for HtmlSaveOptions (“ExcludeUnusedStyles”?), that would trigger Aspose Cells to only write out the CSS for styles actually being used by the Excel content being saved?
  2. Another large part of the output consists of <x:ExcelName> data, describing names defined in the workbook. Note that most (if not all) of the names being written by Aspose do not actually relate to the sheet being saved. As mentioned before, our use of the exported HTML is to show it in a web browser (on a PC or smartphone client), so in fact any of the Office-specific XML being embedded in the HTML is of no use for us. In fact, it consumes unneeded bandwidth and likely impacts performance without any benefit. Since we are not interested in being able to re-import the generated HTML into Excel or Aspose, we’d like to generate only content that is relevant to the viewer. An option to completely disable generation of the <x:ExcelWorkbook> element and content below it would be a great improvement for out use of Aspose Cells.
  3. A similar case applies to the <o:DocumentProperties> element generated by Aspose Cells. There is no use for this if the consumer is an HTML client. In addition, including this element might unintentionally expose privacy-sensitive data, so it would be good if there was an option controlling generation of this element.

Overall, we are quite happy with the fidelity of the HTML generated by Aspose Cells. However, for the use-case where the consumer of the HTML is a regular web-browser or smartphone, and re-import of the generated HTML is not needed, it would be nice to have more control to exclude irrelevant content from the exported HTML.

Kind regards,
Taras

@TarasTielkes

Thanks for using Aspose APIs.

We were able to find irrelevant data included issue when save to html. We tried to fix it using Workbook.removeUnusedStyles() but it threw exception. We hided all worksheets except the Versioning sheet but it did not fix the issue. Besides we found, if we don’t use HtmlSaveOptions.setExportActiveWorksheetOnly() method, the output Html size is quite less.

This issue has been logged as

  • CELLSJAVA-42471 - Irrelevant Data included when Save to Html

@TarasTielkes

Thanks for using Aspose APIs.

Please download and try the following fix and let us know your feedback.


About CELLSJAVA-42471, we have some tips for you:

Adds HtmlSaveOptions.ExcludeUnusedStyles property which indicates whether excluding unused styles. The default value is false.

If you want to import the HMTL or MHTML file to Excel, please keep the default value.


Adds HtmlSaveOptions.ExportDocumentProperties property which indicates whether exporting document properties. The default value is true.

If you want to import the HTML or MHTML file to Excel, please keep the default value.


Adds HtmlSaveOptions.ExportWorksheetProperties property which indicates whether exporting worksheet properties. The default value is true.

If you want to import the HTML or MHTML file to Excel, please keep the default value.


Adds HtmlSaveOptions.ExportWorkbookProperties property which indicates whether exporting workbook properties. The default value is true.

If you want to import the HTML or MTHML file to excel, please keep the default value.


The sample code in Java is as follows

Java

Workbook workbook = new Workbook(filePath + "input.xlsx");

Worksheet sheet = workbook.getWorksheets().get("Versioning");

workbook.getWorksheets().setActiveSheetIndex(sheet.getIndex());

HtmlSaveOptions options = new HtmlSaveOptions();
options.setExportActiveWorksheetOnly(true);
options.setExportBogusRowData(false);
options.setExportFrameScriptsAndProperties(false);
options.setExcludeUnusedStyles(true);
options.setExportDocumentProperties(false);
options.setExportWorkbookProperties(false);
options.setExportWorksheetProperties(false);

workbook.save(filePath + "out_java.html", options);

Hi Shakeel,

These are very useful improvement, thank you.

We’ve done some quick initial testing with 17.12.4.
We still observe a large amount of unused styles being generated into the CSS section, even though we specify options.setExcludeUnusedStyles(true). Please see the following sample code, and the attached input and output files:

Workbook workbook = new Workbook("input.xlsx");
WorksheetCollection worksheets = workbook.getWorksheets();
Worksheet worksheet = worksheets.get("Entry points");
worksheets.setActiveSheetIndex(worksheet.getIndex());

HtmlSaveOptions options = new HtmlSaveOptions();
options.setExportActiveWorksheetOnly(true);
options.setExportBogusRowData(false);
options.setExportFrameScriptsAndProperties(false);
options.setExcludeUnusedStyles(true);
options.setExportDocumentProperties(false);
options.setExportWorkbookProperties(false);
options.setExportWorksheetProperties(false);

workbook.save("output.html", options);

export-html-17.12.4.zip (1.4 MB)

If you inspect the generated HTML and CSS, you’ll see that many declared CSS styles are not used in the generated HTML. Examples of unused style class declarations: x3586, x3588, x3634. In fact, the large majority of generated CSS style declarations are (still) not actually used.

Thanks again for improving this part (HTML export) of Aspose Cells. This will make a big different for our use cases, and we’ll be happy to provide further feedback after additional testing.

Kind regards,
Taras

@TarasTielkes

We were able to replicate this issue and logged it in our database for investigation and for a fix. Once, there is some fix or other news for you, we will let you know asap.

This issue has been logged as

  • CELLSJAVA-42494 - Large amount of unused styles being generated into the CSS section

@shakeel.faiz
One other remark on the HTML/CSS generated in 17.12.4.
When setExcludeUnusedStyles is enabled, I see a style that actually is used not being generated.
In the attached example, see range “Entry points!O4:R4”.

The generated HTML looks like this:
<td class=‘x4834’><font class=“font0”>b+l</font><font class=“font39”>&nbsp;(including&nbsp;ECB&nbsp;add-on)</font></td>
<td class=‘x4834’><font class=“font0”>a</font><font class=“font39”>&nbsp;(including&nbsp;ECB&nbsp;add-on)</font></td>
<td class=‘x4834’><font class=“font0”>p+r </font><font class=“font39”>(including&nbsp;ECB&nbsp;add-on)</font></td>
<td class=‘x4835’><font class=“font0”>o</font><font class=“font39”>&nbsp;(including&nbsp;ECB&nbsp;add-on)</font></td>

The CSS class declaration for “font0” and “font39” seems to be missing.

@TarasTielkes

We have logged your update in our database for product team consideration and investigation. We will look into it as well and once, we will get some fix or other update for you, we will let you know asap.

@TarasTielkes

Thanks for using Aspose APIs.

We are afraid, we cannot fix your issue logged as CELLSJAVA-42494 because of its complexity. For now, we cannot extract unused styles from the whole style collection.

If you want to delete all the unused styles in some specific worksheet, please delete other worksheets firstly and then export the Excel file to Html.

The issues you have found earlier (filed as “CELLSJAVA-42471” and its sub-tasks (“CELLSJAVA-42490”, “CELLSJAVA-42491” and “CELLSJAVA-42493”)) have been fixed in this Aspose.Cells for Java 18.1 update.

Please also check the following document/blog for your reference:

@TarasTielkes,
Please try our latest version/fix: Aspose.Cells for .NET v21.6.3 (attached)
Aspose.Cells21.6.3 For .Net2_AuthenticodeSigned.Zip (5.5 MB)
Aspose.Cells21.6.3 For .Net4.0.Zip (5.5 MB)
Aspose.Cells21.6.3 For .NetStandard20.Zip (5.5 MB)

Your issue should be fixed in it.
Let us know your feedback.

@John.He,

Is a Java version of Aspose Cells with these change also available?

Thanks in advance,
Taras

@TarasTielkes,

The parallel Java fix is not published yet. We will share the Java fix in the next few days.

Keep in touch.

@TarasTielkes,
Please use HtmlSaveOptions.ExcludeUnusedStyles property when exporting file to html.
Please try our latest fix:
aspose-cells-21.6.5-java.zip (7.3 MB)

The issues you have found earlier (filed as CELLSJAVA-42494) have been fixed in this update.

Hi @John.He,

I’ve tried version 21.6.5, but I still see a large amount of unused styles being added to the generated HTML.

Sample code:

Workbook workbook = new Workbook("input.xlsx");
WorksheetCollection worksheets = workbook.getWorksheets();
Worksheet worksheet = worksheets.get("Entry points");
int index = worksheet.getIndex();

worksheets.setActiveSheetIndex(index);

HtmlSaveOptions saveOptions = new HtmlSaveOptions(SaveFormat.HTML);
saveOptions.setParseHtmlTagInCell(false);
saveOptions.setExportHiddenWorksheet(false);
saveOptions.setExportActiveWorksheetOnly(true);

saveOptions.setExportDocumentProperties(false);
saveOptions.setExportWorkbookProperties(false);
saveOptions.setExportWorksheetProperties(false);
saveOptions.setExportFrameScriptsAndProperties(false);
saveOptions.setExportBogusRowData(false);
saveOptions.setExportComments(false);
saveOptions.setEncoding(Encoding.getUTF8());

saveOptions.setExcludeUnusedStyles(true);

workbook.save("output.html", saveOptions);

From looking at the generated HTML, the very large majority of CSS styles generated are not being used. From the top down, here are the first 20 unused ones:

  • x15
  • x3535
  • x3536
  • x3537
  • x3538
  • x3539
  • x3540
  • x3541
  • x3542
  • x3543
  • x3544
  • x3545
  • x3546
  • x3547
  • x3548
  • x3549
  • x3550
  • x3551
  • x3552
  • x3553

These are just the first 20 unused styles - in fact, there is a total of 1398 unused styles in the generated HTML, making up 90% (or more) of the total size of the output :wink:

Input XLSX and output HTML attached.

Kind regards,
Taras

aspose-cells-unused-styles-in-html.zip (1.4 MB)

@TarasTielkes,
We have observed this issue and logged it with the ticket. You will be notified here once any update is ready for sharing.

@TarasTielkes,

Please note, deleting unused styles happens at the workbook level and not on worksheet level. Although you only export one worksheet, but all the styles used in the workbook will be exported. If you want to export only one worksheet and delete all unused styles, you can delete other worksheets before exporting. In this way, all unused styles will be removed during style collection.

@TarasTielkes,

Furthermore, we found there are still some issues in the APIs end. Please wait for the next version to be published. We will provide a new fixed version within the next few days or so.

@TarasTielkes,
Please try our latest version/fix: Aspose.Cells for Java v21.7.3 (attached)
aspose-cells-21.7.3-java.zip (7.4 MB)

You can use the following code to test it:

    Workbook workbook = new Workbook(filePath + "input.xlsx");
    WorksheetCollection worksheets = workbook.getWorksheets();        
    
    System.out.println("before collecting styles: " + workbook.getCountOfStylesInPool());
    for (int i = 0; i < worksheets.getCount(); i++)
    {
        if (!worksheets.get(i).getName().equals("Entry points"))
        {
            worksheets.removeAt(i);
            i--;
        }
    }
    System.out.println("worksheet count: " + worksheets.getCount());
    System.out.println("after delete other worksheets: " + workbook.getCountOfStylesInPool());

    Worksheet worksheet = worksheets.get("Entry points");
    int index = worksheet.getIndex();
    worksheets.setActiveSheetIndex(index);

    HtmlSaveOptions saveOptions = new HtmlSaveOptions(SaveFormat.HTML);
    saveOptions.setParseHtmlTagInCell(false);
    saveOptions.setExportHiddenWorksheet(false);
    saveOptions.setExportDocumentProperties(false);
    saveOptions.setExportWorkbookProperties(false);
    saveOptions.setExportWorksheetProperties(false);
    saveOptions.setExportFrameScriptsAndProperties(false);
    saveOptions.setExportBogusRowData(false);
    saveOptions.setExportComments(false);
    saveOptions.setEncoding(Encoding.getUTF8());

    saveOptions.setExcludeUnusedStyles(true);

    workbook.save(filePath + "out_java.html", saveOptions);

    System.out.println("after collecting styles: " + workbook.getCountOfStylesInPool()); 

Your issue should be fixed now.
Let us know your feedback.

Hi @John.He,

We will try that, however, it seems a bit silly to manually delete all the other sheets.
I am already indicating which specific sheet I want to export, should Aspose Cells not be able to use that information to determine which styles are applicable in the resulting HTML?

Kind regards,
Taras