Removing unused styles from Excel workbooks

There is a common bug in recent versions of Excel, where on a copy/paste between different workbooks, all the styles from the source workbook are added to the destination workbook.

As a result, at some point (when reaching ~65000 styles), workbooks become unusable.

The problem is documented in the following KB article, among others:

http://support.microsoft.com/kb/213904

We are using Aspose Cells, and are using the HTML export feature. The problem we are running into is that Aspose Cells exports all styles present in the Excel workbook to CSS class selector declarations in the generated HTML. For some of workbooks we’ve seen, close to 15 MB of CSS text is generated! This breaks the rendering in Internet Explorer, and greatly slows down other web browsers. At the same time, out of the tens of thousands of styles present in the workbook, only 20 or so are used.

Unfortunately, Aspose Cells does not provide any way to delete styles from a workbook, making it impossible to implement an automatic fix or workaround for this problem.

Given that this is a very common problem when working with Excel, it would be a very good improvement if Aspose Cells were extended to deal with this. One way would be to implement operations that allow the deletion of styles. Another option would be to make the HTML export smarter, and allow it to only transform those styles to CSS that are actually used in the workbook or worksheet being exported.

Thanks in advance

Hi Taras,


Thank you for floating the idea.

We have evaluated your presented scenarios while using the latest version of Aspose.Cells for Java 8.3.1, and we have logged the following tickets for the product team’s review.

  • CELLSJAVA-41144 has been logged as a feature request to provide the ability to delete/remove styles from the StyleCollection.
  • CELLSJAVA-41145 has been logged as an enhancement to the HTML rendering core to make the CSS generation process smart enough to include only used styles in the resultant CSS file.

We have used the following piece of code to test the scenario of dynamically creating too many styles in the workbook before saving the result in HTML, XLS and XLSX formats. In case of HTML format, the resultant CSS is of 20+ MB in size, whereas you can see that only one of the styles is actually used in the spreadsheet to format a cell.

Java

Workbook book = new Workbook(“D:/temp/book1.xlsx”);
Worksheet sheet = book.getWorksheets().add(“test”);

Style style = null;
for (int i = 0; i < 70000; i++)
{
//style = book.createStyle();
int index = book.getStyles().add();
style = book.getStyles().get(index);
style.setName(“style”+i);
style.getFont().setName(“Arial”);
style.getFont().setSize(20);
style.getFont().setBold(true);
}

sheet.getCells().get(“A1”).putValue(“A”);
sheet.getCells().get(“A1”).setStyle(style);
book.save(“D:/out.xlsx”, SaveFormat.XLSX);
book.save(“D:/out.xls”, SaveFormat.EXCEL_97_TO_2003);
HtmlSaveOptions html = new HtmlSaveOptions();
html.setAttachedFilesDirectory(“out”);
book.save(“D:/out.html”, html);

You have to spare us little time to properly analyze the feasibility of the above discussed feature/enhancement. As soon as we complete the preliminary analysis, we will update you here for your reference.

Hi,

Thanks for using Aspose.Cells for Java.

We have fixed this issue (CELLSJAVA-41145).

Please download and try this fix: Aspose.Cells for Java v8.3.1.5 and let us know your feedback.

Hi,


We’ve tried 8.3.1.5.

While the generation of HTML does not longer seem to include unused styles, we’re seeing a number of differences in the appearance of the generated HTML, compared to 8.3.0, so at this point we’re not sure yet.

We’ll investigate these one by one, the first one is:
8.3.0 vs 8.3.1.5: changes in cross cell text cropping?

Kind regards,
Taras

Hi Taras,


Sure, we will review your post and assist you accordingly. Please note, it would be appropriate to create a new thread for every distinct problem for better viability and tracking of your reported issues.

The issues you have found earlier (filed as CELLSJAVA-41144;CELLSJAVA-41145) have been fixed in this update.


This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

Hi,


Could you provide a code example of how to remove styles? Sample code showing how to remove unused styles only would be even better. I’m referring to the feature implemented as CELLSJAVA-41144).

I noticed also that Aspose Cells for Java version 8.3.2 marks the StyleCollection class as deprecated. What is the intended replacement for it?

Thanks in advance,
Taras

Hi Taras,

Thanks for your posting and using Aspose.Cells.

It is not possible to remove style objects, however we will analyze its feasibility. We have therefore logged this issue in our database for investigation. Once, there is some update for you, we will let you know asap.

This issue has been logged as

  • CELLSJAVA-41207 - Possibility of removing Style objects from Workbook

Hi Taras,

Thanks for using Aspose.Cells.

Currently
we do not provide api for users to remove/delete style manually. On the
contrary, whether the style collection needs cleanup operation and
which style should be removed is determined automatically when saving
the html files. It is because removing style is a complicated and
time-consumed operation: when one style was removed from the
collection (even though it is not used by any object in the workbook),
some or all style references of objects in the workbook (such as cell,
row,…etc.) maybe need to be updated.



Anyways, if you do require to do this operation by yourself manually,
we may provide a method named like Cleanup() or so. However, you should
keep it in mind that this operation should be called only when it is
really required.


For the deprecated class StyleCollection, user uses it commonly for
creating new style or getting named style. For creating new style,
please use Workbook.CreateStyle() method instead. For getting named
style, please use Workbook.GetNamedStyle(string). Also please refer to
the “Obsoletes StyleCollection” section in the release note at http://www.aspose.com/community/files/51/.net-components/aspose.cells-for-.net/entry601617.aspx or http://www.aspose.com/community/files/72/java-components/aspose.cells-for-java/entry601680.aspx.

                <br>
Since CELLSJAVA-41144 was defined as "a feature request to provide the ability to delete/remove styles from the StyleCollection" and was marked as "Fixed" subsequently, I assumed this feature had been implemented.

Hi Taras,


First of all, please accept our apologies for not informing you earlier that the product team decided not to publicly expose the mechanism to remove a particular style from the collection. Reasons for their decision have already been elaborated in this post. Instead, a cleanup operation has been implemented internally in reference to your request that was logged earlier as CELLSJAVA-41145 (Smart Generation of CSS while exporting Spreadsheets to HTML). If you require us to expose the Cleanup, we will be happy to make it publicly accessible with future releases of Aspose.Cells APIs.

Hi Babar,

Exposing the feature to remove unused styles from workbooks would be very valuable for us.
We deal with a wide range of sources of Excel workbook in our product, and in the real world, the "accumulating unused styles" issue is quite prevalent.

Since it has a real impact on performance, memory usage and file size, we would greatly appreciate a way to automate the cleaning. At the moment, we rely on various external tools, which need to be manually invoked, and do not provide a realistic solution.

Thanks in advance,
Taras

Hi Taras,

Let me discuss the matter again and log appropriate ticket to expose the style cleanup method that you may call in your code to control the process a bit. I will shortly get back to you with updates in this regard.

Hi again,


This is to inform you that upon discussion with the product team, we have reopened the ticket (CELLSJAVA-41144) that was originally logged to provide the same feature (Ability to remove unused Style objects). As soon as the requested feature is available, we will notify you here in reference to the aforesaid ticket.

Hi Babar,


Will this feature be provided in the future?
Dealing with “style pollution” in Excel is a daily battle for us.

Thanks,
Taras

Hi Taras,

Thanks for your posting and considering Aspose.Cells.

We are afraid, there is no update for you regarding this issue at this moment. However, we have logged your comment in our database against this issue and requested the product team to provide some ETA or fix for it. Once, there is some news for you, we will share it with you asap.

Hi Taras,

Thanks for considering Aspose.Cells.

We will try to provide a fix for this feature in next week.

Hi again,


This is to update you that the ticket logged earlier as CELLSJAVA-41144 has been resolved now. We will shortly provide the next revision of the Aspose.Cells for Java API here once we have ensured the quality of the build and incorporated other enhancements.

Hi,

Thanks for using Aspose.Cells

For this requirement, please try Workbook.RemoveUnusedStyles() method with our latest release: Aspose.Cells for Java 8.6.2.