Performance regression for LoadDataFilterOptions.STRUCTURE

Hi,

Using Aspose Cells for Java, version 19.12.2.

There are many places in our code, where we only want to retrieve a small subset of (meta)data about a given Excel workbook. Sometimes we only need to know the names of the sheets present in the workbook (LoadDataFilterOptions.STRUCTURE), other times only the document properties (LoadDataFilterOptions.DOCUMENT_PROPERTIES) or macro content (LoadDataFilterOptions.VBA).

While evaluating the upgrade from 19.9 to 19.12.2, we encountered what seems to be a performance regression on a use case that is quite important to us: retrieving only the names of the sheets present in the workbook.

Sample code:

System.out.println("CellsHelper.getVersion() = " + CellsHelper.getVersion());
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
Files.copy(Paths.get("input.xlsx"), byteArrayOutputStream);
byte[] bytes = byteArrayOutputStream.toByteArray();

int iterations = 60;
long dummy = 0;
long start = System.currentTimeMillis();
for (int i = 0; i < iterations; i++) {
    LoadOptions loadOptions = new LoadOptions();
    LoadFilter loadFilter = new LoadFilter(LoadDataFilterOptions.STRUCTURE);
    loadOptions.setLoadFilter(loadFilter);
    Workbook workbook = new Workbook(new ByteArrayInputStream(bytes), loadOptions);
    dummy += workbook.getWorksheets().getCount();
    System.out.print(".");
}
System.out.println();
long finish = System.currentTimeMillis();
long delta = finish - start;
double average = delta / (double) iterations;

System.out.println("delta: " + delta + ", iterations: " + iterations + ", average: " + average);
System.out.println("dummy = " + dummy);

As you can see, the code above is a benchmark that times how long it takes, on average, to retrieve the names of the sheets present in a given workbook.

Using Aspose Cells for Java, version 19.9, the output is:

CellsHelper.getVersion() = 19.9.0
............................................................
delta: 22705, iterations: 60, average: 378.4166666666667
dummy = 9000

Using Aspose Cells for Java, version 19.12.2, the output is:

CellsHelper.getVersion() = 19.12.2
............................................................
delta: 154796, iterations: 60, average: 2579.9333333333334
dummy = 9000

Both runs are on the same machine, using Java 8u202, using the JVM options “-server -Xms3g -Xmx3g”.

As you can see, the performance of 19.12.2 is almost 7x worse compared to 19.9. This is unexpected, and is a blocker for us to migrate to the newer version of Aspose Cells for Java.

The sample file used in the benchmark above is attached.

Kind regards,
Taras

input.zip (6.1 MB)

@TarasTielkes,

Thanks for the template file and sample code.

After an initial test, I am able to reproduce the issue as you mentioned by using your sample code with your template file. I noticed approx. similar results when evaluating the code segment with v19.9.x and latest version/fix. I have logged a ticket with an id “CELLSJAVA-43079” for your issue. We will look into it soon.

Once we have an update on it, we will let you know.

@TarasTielkes,
We have investigated it more. Please check release notes of 19.12 on the topic of “Public API and Backwards Incompatible Changes”. “Adds LoadOptions.KeepUnparsedData option”. For your case, please add the code:


loadOptions.setKeepUnparsedData(false);
Workbook workbook = …