Hi,
I am testing the performance of Aspose.HTML when converting HTML pages to PDF. Is it true that the evaluation license only allows the conversion of one document at a time? Multithreading does not seem to increase performance: each document takes 2-4 seconds no matter how many threads are used.
@matthias77,
We recommend evaluating the Aspose.HTML for .NET API at full capacity, without the evaluation version limitations, by using a test (temporary) license. You can request a 30-day Temporary License. Please refer to "How to get a Temporary License?" for details.
I am also moving your thread to the respective forum, where one of our colleagues from the Aspose.HTML team will assist you soon.
The license I used for that test is a temporary license. The first test looked promising, so my department ordered a license (in process on our side). I continued with a full load test and discovered that it did not scale as expected. I assume that there is a lock somewhere that prevents me from converting documents in several threads simultaneously. I just want to have that confirmed as a limitation of the temp-license and/or of the fully licensed version.
@matthias77
HTML documents are parsed in accordance with the official HTML Standard specification, which describes which processes should occur in parallel and which sequentially. Therefore, the impact of multithreading depends on the content of the document. For example, loading images, parsing CSS styles and processing some scripts occur in parallel.
Also, every conversion after the first one is faster because the static classes have already been initialized. Please share your sample code snippet and sample files with us so that we can log a ticket in our issue tracking system and analyze the problem further.
This issue seems to be a deal breaker, so I investigated it further. We ran a comparison on a virtual machine with 12 CPUs, and the parallel version was sometimes much slower.
Then I ran the same comparison on a normal barebone Windows machine with 32 cores, and the improvement over the single-threaded version was about 5%, not more.
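A rough way to read these numbers is through Amdahl's law. The sketch below assumes that "about 5%" means a 5% reduction in wall-clock time on 32 cores and that the workload fits Amdahl's simple model; the 20x figure is only an illustrative target, and the function names are made up for this example and are not part of any Aspose API.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: S = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def implied_parallel_fraction(speedup: float, workers: int) -> float:
    """Invert Amdahl's law to get p from an observed speedup S on N workers."""
    if workers < 2 or speedup <= 0:
        raise ValueError("need at least 2 workers and a positive speedup")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


if __name__ == "__main__":
    cores = 32
    # Assumption: a 5% shorter run time, i.e. speedup = 1 / 0.95.
    observed = 1.0 / 0.95
    p_observed = implied_parallel_fraction(observed, cores)
    # Illustrative target: a 20x speedup on the same machine.
    p_needed = implied_parallel_fraction(20.0, cores)
    print(f"implied parallel fraction: {p_observed:.3f}")   # 0.052
    print(f"fraction needed for 20x:   {p_needed:.3f}")     # 0.981
```

Under these assumptions, only about 5% of the work per document would be running in parallel, whereas a 20x speedup on 32 cores would need roughly 98%. That is consistent with, but does not prove, some form of serialization inside the process.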
Comparison.7z (2.6 KB)
The program takes an option “-s”, which makes it run the tasks in sequence; in that mode each task finishes quickly. Without “-s”, it starts as many tasks as there are files and then waits for all of them. The other argument is the directory that contains the HTML files to be converted. In parallel mode, the whole run takes almost as much time as the sequential one, and each individual task runs quite slowly. With 32 cores, one would expect the parallel run to take no more than about 1/20 of the sequential time. Running the same program in three separate PowerShell windows at the same time shows no slowdown. Hence, there must be a limitation within the process, such as a shared lock or the like.
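The behaviour described above can be reproduced in miniature. The following is only a sketch of the same kind of comparison, not the attached program: `fake_convert` is a hypothetical stand-in for one conversion (it sleeps instead of calling Aspose.HTML), and `_shared_lock` simulates the suspected process-wide lock. Sleeping is used so that Python's own interpreter lock does not distort the timing.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical process-wide lock, standing in for the suspected bottleneck.
_shared_lock = threading.Lock()


def fake_convert(name: str, work_seconds: float, use_shared_lock: bool) -> str:
    """Stand-in for one HTML-to-PDF conversion (not a real Aspose call)."""
    if use_shared_lock:
        with _shared_lock:          # only one task may "convert" at a time
            time.sleep(work_seconds)
    else:
        time.sleep(work_seconds)    # tasks can overlap freely
    return name + ".pdf"


def run(files, sequential: bool, use_shared_lock: bool, work_seconds: float = 0.05):
    """Convert all files and return (elapsed seconds, list of output names)."""
    start = time.perf_counter()
    if sequential:                  # corresponds to the "-s" option
        results = [fake_convert(f, work_seconds, use_shared_lock) for f in files]
    else:                           # one task per file, then wait for all
        with ThreadPoolExecutor(max_workers=len(files)) as pool:
            results = list(
                pool.map(lambda f: fake_convert(f, work_seconds, use_shared_lock), files)
            )
    return time.perf_counter() - start, results


if __name__ == "__main__":
    files = [f"doc{i}.html" for i in range(8)]
    t_seq, _ = run(files, sequential=True, use_shared_lock=False)
    t_locked, _ = run(files, sequential=False, use_shared_lock=True)
    t_free, _ = run(files, sequential=False, use_shared_lock=False)
    print(f"sequential:            {t_seq:.2f}s")
    print(f"parallel, shared lock: {t_locked:.2f}s")
    print(f"parallel, no lock:     {t_free:.2f}s")
```

With the simulated lock, the parallel run takes about as long as the sequential one, which matches the pattern reported here; without it, the parallel run finishes in roughly the time of a single task. Separate processes do not share such a lock, which would also explain why three PowerShell windows show no slowdown.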
@matthias77
We have opened the following new ticket(s) in our internal issue tracking system to investigate your case.
Issue ID(s): HTMLNET-4919
We will look into it and let you know as soon as the ticket is resolved. Please be patient and allow us some time.
@matthias77
Could you please provide examples of the documents used in the tests so that we can more specifically investigate the problem with multithreading?
I am sorry, I cannot attach the HTML documents, since they contain sensitive information. They were HTML5 documents (with header and similar tags), each with a fairly long CSS block in the same document and about 2-4 pages of structured text.
I am sure that the problem can be reproduced with similar documents. There were no embedded pictures or other objects.
@matthias77
Thanks for the feedback. We will try to investigate using HTML documents with the same structure as you described and will let you know if we have further updates.
I just discovered that the HTMLDocument needs a Dispose call; otherwise resources are not freed and the number of threads rises beyond anything reasonable (that is what gave it away). Even so, it still needs 2 seconds per simple document.
@matthias77
Could you provide an example of the CSS contained in one of the HTML pages?
If that is not possible, could you tell us the number of lines in that CSS?