Unexpected CPU hot spot in ZipOutputStream when opening XLSB


#1

Hi,

We are using Aspose.Cells for Java, version 19.8.

Profiling the loading of XLSB files, we observe an unexpected CPU hot spot.
See below a “Flame graph” rendering of the sampled CPU data:
aspose-cells-19.8.png (42.0 KB)

During loading of XLSB, Aspose Cells seems to spend significant time compressing, which is not what we would expect (we would expect decompressing, not compressing).

Can you explain the two observed call paths leading to com.aspose.cells.a.f.zk#write(byte[], int, int) and com.aspose.cells.a.f.zk#a(com.aspose.cells.a.f.zi)?

Thanks in advance,
Taras


#2

@TarasTielkes,
I have tried to reproduce this issue using my own sample XLSB file but could not. Please share your sample file and code snippet with us for testing. We will reproduce the problem and provide our feedback on the above calls after analysis.


#3

@ahsaniqbalsidiqui

Please find the test XLSB file attached:
EIOPA_SolvencyII_DPM_Dictionary_2.1.0.zip (395.5 KB)

The performance test simply loads the provided file in a loop.


#4

@TarasTielkes,
I have tried loading it a hundred times with the code below but did not notice any problem.

for (int i = 1 ; i<=100 ; i++)
{
    Workbook workbook = new Workbook("EIOPA_SolvencyII_DPM_Dictionary_2.1.0.xlsb");
}

Could you share your code snippet for our testing if you still face the problem with the latest version?


#5

@ahsaniqbalsidiqui your code is sufficient to reproduce the problem.
If you have trouble reproducing the CPU profiling results, simply put a breakpoint in com.aspose.cells.a.f.zk#write(byte[], int, int). It will be repeatedly triggered during loading of the provided file.

I’d like to understand why Aspose Cells is compressing data during loading of XLSB.
The flamegraph from the original post will also provide you the complete call stack, starting from the Workbook constructor.

Kind regards,
Taras


#6

Also note that a similar CPU hotspot does not occur when loading XLSX files; it is specific to the handling of the XLSB format by Aspose Cells.


#7

@TarasTielkes,

As you know, an XLSX/XLSB file is an archive of the many parts of the Workbook. While loading the template file, we do not always parse all of those entries. Some un-parsed entries must be kept in memory for later use, such as re-saving the Workbook or parsing them further for other processes. However, for memory performance reasons, instead of keeping all of the original file's data in memory, we keep only those un-parsed entries, compressed into one data block.
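To make the idea concrete, here is a minimal sketch of keeping un-parsed entries as a single compressed in-memory block, using only java.util.zip. It is an illustration under assumptions, not the actual Aspose.Cells implementation: the class name UnparsedEntryStore, the methods pack/unpack, and the entry names are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: NOT part of Aspose.Cells, for illustration only.
public class UnparsedEntryStore {

    /** Compresses all retained (un-parsed) entries into one in-memory ZIP block. */
    public static byte[] pack(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(block)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                // The deflate CPU cost is paid in this write call.
                zip.write(e.getValue(), 0, e.getValue().length);
                zip.closeEntry();
            }
        }
        return block.toByteArray();
    }

    /** Restores the entries from the block, e.g. when the workbook is saved again. */
    public static Map<String, byte[]> unpack(byte[] block) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(block))) {
            ZipEntry entry;
            byte[] buf = new byte[8192];
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative entry names and contents; real workbook parts will differ.
        Map<String, byte[]> unparsed = new LinkedHashMap<>();
        unparsed.put("xl/printerSettings/printerSettings1.bin", new byte[64 * 1024]);
        unparsed.put("xl/theme/theme1.xml", "<theme/>".getBytes(StandardCharsets.UTF_8));

        byte[] block = pack(unparsed);
        System.out.println("retained block size: " + block.length + " bytes");
        System.out.println("entries restored: " + unpack(block).keySet());
    }
}
```

The trade-off is the one described above: the block is smaller than the raw entries, but building it costs CPU time during loading, even if the entries are never read back.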

Thanks for your understanding.


#8

Hi @Amjad_Sahi

Given your explanation, it would be beneficial if I could indicate that I only want to read the file, so that Aspose Cells could both skip the compression effort and consume less memory. A very large part of our interactions with the Aspose API are read-only (i.e. only loading data, and not saving files).

I still wonder why this specific behavior is only happening for XLSB, as it has a lot of internal structure similarities to XLSX.

When we tried to optimize the speed of some of our batch processing flows, we expected to get a performance increase from switching from XLSX to XLSB. However, the bottleneck we see in the profiler causes our overall performance to degrade instead when switching from XLSX to XLSB, which is a bit disappointing.

Kind regards,
Taras


#9

@TarasTielkes,

We will evaluate it and get back to you soon.

Yes, that is strange. Please give us a little time to evaluate it and provide our feedback.


#10

@TarasTielkes,

We need to investigate and evaluate your issue thoroughly. I have logged a ticket with the ID “CELLSJAVA-43002” for your issue. We will look into it soon.

Once we have an update on it, we will let you know.


#11

@TarasTielkes,
This is to inform you that we have fixed your issue (logged earlier as “CELLSJAVA-43002”) now. We will soon provide you the fixed version after performing QA and incorporating other enhancements and fixes.


#12

@TarasTielkes,

Please try our latest version/fix: Aspose.Cells for Java v19.8.6 (attached)

Your issue should be fixed in it.

Let us know your feedback.
aspose-cells-19.8.6-java.zip (6.7 MB)


#13

Hi @Amjad_Sahi,

The XLSB parsing performance of 19.8.6 is much better, good work :+1:
In some of our test cases, the performance is close to double that of 19.8.0.

That said, in the performance profile data of 19.8.6, I still observe a fair amount of CPU time being spent on compression. In one of my tests it is around 7% now, which is much better than the ~22% observed using 19.8.0.

The remaining call path I see with the profiler is:

main  Runnable CPU usage on sample: 968ms
  java.util.zip.Deflater.deflateBytes(long, byte[], int, int, int) Deflater.java (native)
  java.util.zip.Deflater.deflate(byte[], int, int, int) Deflater.java:444
  java.util.zip.Deflater.deflate(byte[], int, int) Deflater.java:366
  java.util.zip.DeflaterOutputStream.deflate() DeflaterOutputStream.java:251
  java.util.zip.DeflaterOutputStream.write(byte[], int, int) DeflaterOutputStream.java:211
  java.util.zip.ZipOutputStream.write(byte[], int, int) ZipOutputStream.java:331
  com.aspose.cells.a.f.zk.write(byte[], int, int)
  com.aspose.cells.a.c.zab.a(zm, zm)
  com.aspose.cells.zrz.a(HashMap)
  com.aspose.cells.zapn.a(Workbook, LoadOptions, boolean)
  com.aspose.cells.zjp.a(zm)
  com.aspose.cells.zjp.a(String, zm, LoadOptions)
  com.aspose.cells.Workbook.a(String, LoadOptions)
  com.aspose.cells.Workbook.<init>(String)

It would be interesting to know the background of this remaining CPU hotspot, and whether we can somehow prevent it, for example by indicating that we are opening the workbook for reading only and do not need to save it later.
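For anyone who wants to exercise the JDK frames from the trace above in isolation, here is a minimal sketch that writes a buffer through ZipOutputStream at different Deflater levels and reports time and size. It assumes the retained data goes through a plain ZipOutputStream, as the trace suggests; the class name DeflateCostProbe and the synthetic test data are hypothetical, and nothing here implies that Aspose.Cells exposes the compression level.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical probe: NOT part of Aspose.Cells, for illustration only.
public class DeflateCostProbe {

    /** Writes data as a single ZIP entry at the given Deflater level and returns the block size. */
    public static int compressedSize(byte[] data, int level) throws IOException {
        ByteArrayOutputStream block = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(block)) {
            zip.setLevel(level);
            zip.putNextEntry(new ZipEntry("part.bin"));
            // Same JDK path as in the trace: ZipOutputStream.write -> Deflater.deflate
            zip.write(data, 0, data.length);
            zip.closeEntry();
        }
        return block.size();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic, repetitive data; real workbook parts will compress differently.
        byte[] data = new byte[4 * 1024 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }

        int[] levels = {
            Deflater.NO_COMPRESSION,
            Deflater.BEST_SPEED,
            Deflater.DEFAULT_COMPRESSION,
            Deflater.BEST_COMPRESSION
        };
        for (int level : levels) {
            long start = System.nanoTime();
            int size = compressedSize(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("level " + level + ": " + size + " bytes in " + micros + " us");
        }
    }
}
```

A single run like this is not a rigorous benchmark (no warm-up, one sample), but it shows the CPU versus memory trade-off behind the hotspot.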

Kind regards,
Taras


#14

@TarasTielkes,

Good to know that XLSB parsing performance has improved. I have logged your profiler trace and concerns against your issue in our database. We will evaluate them, and once we have an update, we will let you know.