New Workbook() on HTML stream very slow since 23.6

Hi, we recently started experiencing a performance issue in part of our code after upgrading to Aspose.Cells for Java 23.10. We create an HTML input stream (using non-Aspose code) and save it as XLSX. The performance problem is in the “new Workbook()” call, not in the save() we do right afterward. (Which I find surprising… I’d expect a constructor to be fast and a save() to be slow.) It used to create the Workbook instance in less than 1 second; now it takes at least 40 seconds for the sample contents.

I did some debugging and experimenting and found that it worked great in 23.1 and degraded in 23.6. Releases 23.2 through 23.5 throw various index-out-of-bounds errors and OOMEs from the constructor code, so they could not be tested for performance.

Our code is much larger than could be attached here, but I isolated the functionality into a standalone code snippet and an HTML file to feed to it. Our actual code generates this HTML from a reporting system that reads rows from a database, runs them through a report template, and feeds the resulting HTML to Aspose. We can’t include that system here, so I extracted the HTML it generated and put it in a file.

The standalone code snippet that shows the issue is:

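```java
import com.aspose.cells.LoadFormat;
import com.aspose.cells.LoadOptions;
import com.aspose.cells.SaveFormat;
import com.aspose.cells.Workbook;

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Date;

public class MainExcelExport {
    public static void main(String[] args) throws Exception {
        // Both streams are closed automatically when the try block exits
        try (InputStream inputStream = new FileInputStream("excelinput.html");
             ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
            LoadOptions loadOptions = new LoadOptions(LoadFormat.HTML);
            System.err.println("DEBUG: here 1, time = " + new Date());

            // This will be extremely slow from 23.6 onward, is fast in 23.1
            Workbook workbook = new Workbook(inputStream, loadOptions);
            System.err.println("DEBUG: here 2, time = " + new Date());

            // Save it as XLSX into memory - this is always fast
            workbook.save(outputStream, SaveFormat.XLSX);
            System.err.println("DEBUG: here 3, time = " + new Date());

            // Write the in-memory XLSX bytes out to a file
            byte[] byteArray = outputStream.toByteArray();
            try (OutputStream fileOutputStream = new FileOutputStream("output.xlsx")) {
                fileOutputStream.write(byteArray);
            }
            System.err.println("DEBUG: here 4, time = " + new Date());
        }
        catch (Exception ex) {
            // Print the full stack trace, not just the message, to aid diagnosis
            System.err.println("Error happened: " + ex.getMessage());
            ex.printStackTrace();
        }
    }
}
```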

Our sample input file is:

excelinput.html.zip (24.9 KB)

I also found that performance degrades quadratically with input size: double the number of rows and it takes four times as long to create the Workbook.
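Doubling the input and seeing the time quadruple indicates quadratic (O(n²)) growth rather than exponential growth. From two timings t₁ and t₂ taken at sizes n₁ and n₂, the growth exponent can be estimated as k = log(t₂/t₁) / log(n₂/n₁). The sketch below only shows that arithmetic; the class name is hypothetical and the timings are illustrative placeholders, not measurements from this thread.

```java
public class GrowthExponent {
    // Estimates k in t(n) ≈ c * n^k from two timings, where the larger run
    // used `sizeRatio` times as many rows as the smaller one.
    static double exponent(double tSmall, double tLarge, double sizeRatio) {
        return Math.log(tLarge / tSmall) / Math.log(sizeRatio);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: doubling the rows quadruples the load time.
        double k = exponent(10.0, 40.0, 2.0);
        System.out.printf("estimated exponent k = %.2f%n", k);
    }
}
```

A k close to 2 that stays roughly constant as the input grows points to quadratic behavior. With truly exponential growth, the ratio t₂/t₁ for a doubling would itself keep increasing as n grows.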

Are there any workarounds? And can you please look into this as a possible Aspose defect? If there are changes we could make to the HTML stream that might work around the issue (other than reporting less data, which we can’t do because we have no control over what our customers need to report on), please let us know. But please keep in mind that this HTML is generated from a report template (many of which are out in the wild, outside our control) and from the specifics of each customer’s data, so we’re limited in what we can change in the generated HTML.

Thanks very much for any help!

@chuckw,

Thanks for the sample HTML file and details.

After an initial test, I am able to reproduce the issue as you described, using your sample HTML file and code segment. Calling new Workbook() on the HTML stream is slow: it takes a long time to load the HTML from the stream into the Workbook object model.

We need to evaluate your issue in detail. We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): CELLSJAVA-45902

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.


@chuckw,

We are pleased to inform you that your issue has been resolved. The fix will be included in an upcoming release (Aspose.Cells v24.4) that we plan to release in the first half of April 2024. You will be notified when the next version is released.


Great job, thank you very much!

@chuckw
You are welcome. Once v24.4 is released, we will notify you promptly.

By the way, is there anything we can do before we get the fix to help our customers, in terms of what in the HTML might be triggering the issue? If we knew what causes the slowdown (other than lots of data, which we have no control over since customers run this with whatever large amount of data they have), we might be able to recommend or make a change to the report template so that it generates something different. (The report template is a Velocity template, which is basically HTML with dynamic content inserted into it; it looks a bit like a JSP if you’re not familiar with Velocity.)

Thanks!

@chuckw,

We will investigate to determine the root cause and provide further details on the issue. If we are able to provide a workaround, it may only be temporary and not entirely reliable. We will follow up with you soon. However, we recommend waiting for the next release and trying it.

@chuckw ,

There is an extra “<” character in front of “<style>” in the HTML (it appears as “<<style>”), which causes subsequent parsing errors.
Please remove the extra “<” and it will work fine.
Hope this helps a bit.
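For cases where the template cannot be edited right away, one possible stopgap is to clean the generated HTML before it reaches the Workbook constructor. The sketch below is not an Aspose API and only targets this one defect: it collapses runs of “<” directly before a “style” tag name into a single “<”. The class and method names are hypothetical, and UTF-8 is assumed as the HTML encoding; adjust it to whatever your report system actually emits. Requires Java 9+ for readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class HtmlPreCleaner {
    // Two or more '<' immediately before the tag name "style" (any case),
    // e.g. "<<style>" or "<<<STYLE type='text/css'>".
    private static final Pattern DOUBLED_STYLE_OPEN =
            Pattern.compile("<{2,}(?=style\\b)", Pattern.CASE_INSENSITIVE);

    // Collapses the stray '<' characters so "<<style>" becomes "<style>".
    static String collapseDoubledStyleOpen(String html) {
        return DOUBLED_STYLE_OPEN.matcher(html).replaceAll("<");
    }

    // Reads the raw stream (assumed UTF-8), cleans it, and returns a new stream.
    static InputStream cleaned(InputStream raw) throws IOException {
        String html = new String(raw.readAllBytes(), StandardCharsets.UTF_8);
        byte[] fixed = collapseDoubledStyleOpen(html).getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(fixed);
    }

    public static void main(String[] args) {
        String generated = "<html><head><<style>td{color:red}</style></head></html>";
        System.out.println(collapseDoubledStyleOpen(generated));
    }
}
```

The cleaned stream can then be passed to `new Workbook(HtmlPreCleaner.cleaned(inputStream), loadOptions)` in place of the original stream. Fixing the template itself, or upgrading to a release that contains the fix, remains the more robust option.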


That’s it! I edited the original template which had that <<style in there, and now it works great! And without waiting for an upgrade. Thanks very much!
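Because many templates are out in the wild, a simple check over the template files (or over generated HTML) could help spot the same mistake elsewhere. The sketch below is a hypothetical lint, not part of Aspose or Velocity: it reports the 1-based line numbers where a “<” is immediately followed by another tag opener such as “<style”, “</td” or “<!--”.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DoubledTagCheck {
    // A '<' immediately followed by another tag opener: "<<style", "<</td", "<<!--".
    private static final Pattern DOUBLED_OPEN = Pattern.compile("<(?=<[A-Za-z/!])");

    // Returns the 1-based numbers of lines that contain a doubled tag opener.
    static List<Integer> findDoubledOpeners(String text) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (DOUBLED_OPEN.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String template = "<html>\n<head>\n<<style>\n</style>\n</head>";
        System.out.println(findDoubledOpeners(template));
    }
}
```

Running this over each template as part of a build or review step would flag the “<<style” line before it ships; it is a heuristic and may need tuning if a template legitimately emits “<<” in non-tag content.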

@chuckw,

It is good to know that after removing the extra chars (as suggested), your issue is sorted out now. Please feel free to write us back if you have further queries or comments.

The issues you found earlier (filed as CELLSJAVA-45902) have been fixed in Aspose.Cells for Java 24.4.