Severe performance degradation in HTML to PDF conversion from version 19.8 to 23.10

I am trying to upgrade to version 23.10, but have noticed severe performance degradation.

I have set up a sample project, posted on GitHub:

david-garcia-garcia/aspose-performance-issue (github.com)

The project is a simple web application that runs the benchmark and outputs the result. Change the Aspose.PDF reference to different versions to see the differences.

This image is when using 19.8

image.png (33,1 KB)

And this one is when using 23.10

image.png (22,4 KB)

The images include both the benchmark results and perfmon monitoring of CPU user time, % of time in GC, and total private bytes.

You can see that the regression is huge. I can identify several issues:

  • The output PDF has 3 pages in 23.10 but only 2 in 19.8.
  • The output PDF size has increased from 0.43 MB to 1.4 MB (roughly x3).
  • Throughput has halved: the same test takes 13 s in 19.8 but 27 s in 23.10.
  • Garbage collection is crazy in 23.10. This can easily take down applications, as it halts all threads during GC.
  • CPU usage is much higher in 23.10, and it stays high for twice as long.

I tried switching to intermediate releases to see how they behaved, but I was not able to find anything stable since 2019. Some have memory leaks, some have even crazier CPU and GC consumption. Plus, I believe the focus should be on the latest release (23.10).

I forgot to mention: in 23.10, while running the load tests/benchmark repeatedly, I was able more than once to get into a “lock” where PDF generation in Aspose was stuck with constant CPU usage and never finished.

@deivid.garcia.garcia
Thank you for writing to us, I will check and write to you.

@deivid.garcia.garcia
It seems to me that, for a more accurate comparison, it is better to test the library in isolation, in a console application, and to eliminate parallelism by running the conversions sequentially.
The conversion is carried out directly in

public static void GeneratePdf(string localPath, bool bVertical, string strOutputPath)

Did I understand that correctly?
image.png (59.5 KB)
Are you working on Linux?
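
A sequential, in-process run of the kind suggested here could be sketched roughly as follows. This is only a minimal timing harness under stated assumptions: `SequentialBenchmark` and its `Run` method are hypothetical names that are not part of the sample project or of Aspose.PDF, and the conversion itself is passed in as a delegate so that the harness does not depend on any particular library version.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the sample project or of Aspose.PDF).
public static class SequentialBenchmark
{
    // Runs `convert` `iterations` times, strictly one after another on the
    // calling thread, and returns the wall-clock time of each run.
    public static TimeSpan[] Run(Action<int> convert, int iterations)
    {
        if (convert == null) throw new ArgumentNullException(nameof(convert));
        if (iterations < 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        var timings = new TimeSpan[iterations];
        for (int i = 0; i < iterations; i++)
        {
            var sw = Stopwatch.StartNew();
            convert(i);              // the conversion under test goes here
            sw.Stop();
            timings[i] = sw.Elapsed;
        }
        return timings;
    }
}
```

In a console project that references the sample's code, the delegate could wrap the `GeneratePdf(localPath, bVertical, strOutputPath)` method quoted above, for example by passing a distinct output path per iteration. The paths used would be placeholders chosen by whoever runs the test.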

@sergei.shibanov yes, that (contact.aspx) is where the PDF is generated. Default.aspx is the one that runs the benchmark and outputs the results. This setup is meant to reproduce, as closely as possible, a real-life scenario where an IIS-hosted application is serving concurrent end-user requests.

The level of concurrency in the test is adjustable and should be set to a number that does not exceed the number of CPUs in the machine you run the test on, to avoid saturating resources.
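
To illustrate the idea of capping concurrency at the CPU count, here is a minimal in-process sketch. It is an assumption-laden simplification: the actual sample drives requests against an IIS-hosted page, whereas this uses `Parallel.For` directly, and `BoundedLoad`, `ClampConcurrency` and `Run` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not taken from the sample project).
public static class BoundedLoad
{
    // Caps the requested concurrency at the number of logical processors,
    // and never returns less than 1.
    public static int ClampConcurrency(int requested)
    {
        return Math.Max(1, Math.Min(requested, Environment.ProcessorCount));
    }

    // Invokes `work` once per request index with at most the clamped number
    // of invocations in flight. Returns the peak concurrency observed.
    public static int Run(Action<int> work, int requests, int concurrency)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = ClampConcurrency(concurrency)
        };
        int inFlight = 0;
        int peak = 0;

        Parallel.For(0, requests, options, i =>
        {
            int now = Interlocked.Increment(ref inFlight);

            // Lock-free update of the observed peak.
            int seen = Volatile.Read(ref peak);
            while (now > seen)
            {
                int prior = Interlocked.CompareExchange(ref peak, now, seen);
                if (prior == seen) break;
                seen = prior;
            }

            try { work(i); }
            finally { Interlocked.Decrement(ref inFlight); }
        });

        return peak;
    }
}
```

The returned peak makes it possible to check that a run never exceeded the intended limit, which helps rule out resource saturation as the cause of a slow result.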

In any case, just by switching the Aspose version of the NuGet package you can clearly see the huge difference.

Setting up a console-based scenario without parallelism can be misleading when evaluating the stability and performance of the process (unless Aspose is NOT thread safe?).

I confirm this is running on Windows.

@deivid.garcia.garcia
Aspose.Pdf does not guarantee correct behavior when the same PDF document is processed in parallel. Thanks for the clarification, I will continue to study the issue.

@sergei.shibanov just to clarify, the example test case I built never works on the same PDF file; it creates a new target PDF file for each request:

image.png (6,3 KB)

Plus, all the Aspose objects are created and disposed on independent threads, and are not shared concurrently.

What happens inside the library in terms of thread safety (static, thread-static or similar state), I cannot tell.

@deivid.garcia.garcia
I ran a single conversion in the console and reproduced the differences in the size of the resulting document and the number of pages in it for versions 19.8 and 23.10.
Based on this, I will create a task for the development team and continue to study this issue.

@deivid.garcia.garcia
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-55814

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@deivid.garcia.garcia

With a single conversion in the console, I reproduced the first three points and created a task for the development team based on them.
Next week I will continue with the next two points.

@sergei.shibanov thanks for looking into this.

I believe that points 3, 4 and 5 are closely intertwined and all stem from the same cause. High GC and high CPU as per the graphs seem directly related (GC can consume a lot of CPU).
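
One way to check whether the extra time tracks GC activity, independently of perfmon, is to count collections around a workload. The following is a minimal sketch using only standard .NET APIs (`GC.CollectionCount`, `GC.MaxGeneration`, `Stopwatch`); `GcProbe`, `Result` and `Measure` are hypothetical names, and the counts are process-wide, so they are only meaningful when nothing else is allocating at the same time.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not part of the sample project or of Aspose.PDF).
public static class GcProbe
{
    public sealed class Result
    {
        public TimeSpan Elapsed;     // wall-clock time of the workload
        public int[] Collections;    // collections per generation during the workload
    }

    // Runs `work` once and reports elapsed time plus how many garbage
    // collections of each generation occurred in the process meanwhile.
    public static Result Measure(Action work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        int generations = GC.MaxGeneration + 1;
        var before = new int[generations];
        for (int g = 0; g < generations; g++)
            before[g] = GC.CollectionCount(g);

        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();

        var delta = new int[generations];
        for (int g = 0; g < generations; g++)
            delta[g] = GC.CollectionCount(g) - before[g];

        return new Result { Elapsed = sw.Elapsed, Collections = delta };
    }
}
```

Running the same conversion under both library versions and comparing the per-generation counts alongside the elapsed time would show whether the slowdown and the GC pressure move together, as the graphs suggest.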

@deivid.garcia.garcia
Thank you for your comment. I will pass on your opinion when setting the task to the development team.

@deivid.garcia.garcia
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-55875

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@deivid.garcia.garcia
I created a task (PDFNET-55875) for the development team covering points 4 and 5. If there is any news on either of these tasks, I will post it here.

Thanks for looking into this.