Performance Issue With Aspose.Pdf version 17.10

Hi,

I wanted to upgrade our .NET project from the currently referenced Aspose.Pdf version 11.7 to the latest version (17.10) because of performance issues. However, version 11.7 performs much better than the latest version in the code that converts HTML to a TIFF image: using Aspose.Pdf.Document is much slower and uses far more memory than the deprecated Aspose.Pdf.Generator.Pdf.

I have attached a copy of a small test project, SolutionAndImages.zip (396.8 KB), that showcases the memory and CPU usage of both versions with diagnostic tool snapshots, along with my approach to using the Aspose API to achieve my goal.

Also, I didn't succeed in reproducing the exact margins of the old version with the new version. Please check the generated images: Printed_v17_10.tiff (new version) and Printed_11_7.tiff (old version).

Please advise me on the matter.

Thanks,

@khaled2,

We have tested your source HTML document with the latest version 17.11 of Aspose.Pdf for .NET API, and it takes 10 seconds to run your entire scenario. You can convert an HTML document directly to TIF format without first converting it to PDF. This is the output TIF: Printed_v17_11.zip (138.3 KB). Please download and try the latest version 17.11 and let us know how it goes in your environment.

The recent version 17.11 of Aspose.Pdf for .NET API imports and exports documents more accurately than the old version 11.7. Did you compare the results in any other way?

Bro, you haven't answered any of my questions. Would you have asked me to use version 17.11 if I had asked my question 2 days ago?

If version 17.11 takes 10 seconds as you said, then it is worse than both v11.7 and v17.10! And what about the memory consumption? I have also tested 17.11, and it has the same issue as 17.10. How can a new version take 6 seconds and 100 MB of memory to process one page, while the old version takes 3 seconds and 13 MB? I thought I had done something wrong. If you had looked at my code, you would have noticed that I have another method that uses HtmlFragment instead of passing an HtmlLoadOptions as in the running method. I have also tried the Aspose.Html library, and it has the same issue. Could you please point me to a code snippet that can do the job without throwing an OutOfMemoryException after 10 seconds? Or should I stick with v11.7, which is itself bad, and that's why I wanted to upgrade?

Regarding the margins: the old code sets margins on the section, and line 55 also has section.PageInfo.Margin.Outer = 0. How can I apply the same margins using the new document structure? Were outer margins removed in newer versions?

@khaled2,

The export of the HTML document to PDF takes 7-8 seconds and consumes up to 200 MB of memory in our environment. We have logged an investigation under the ticket ID PDFNET-43664 in our bug tracking system. We have linked your post to this ticket and will keep you informed of any available updates. Please note that, in our environment, the performance of the old version 11.7.0 of Aspose.Pdf for .NET API is the same, with only a slight difference.

There is no Section element in the new DOM (Document Object Model) approach. The structure of a PDF file is hierarchical, and Aspose.Pdf for .NET API accesses its elements in the same way. In the new DOM structure, the MarginInfo class has no outer and inner properties, and we use a Page instance instead of a Section instance. To better understand the difference between the new DOM approach and the old legacy approach, please refer to these help topics:

  1. Introduction to the DOM API
  2. Introduction to DOM (legacy)
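
For the margin question above, a minimal sketch of one possible approach follows. It assumes that the HtmlLoadOptions.PageInfo property is available in the installed version and that its Margin object exposes Left, Right, Top and Bottom; the class name, method name and file paths are hypothetical, and this has not been checked against the attached test project.

```csharp
using Aspose.Pdf;

public static class ZeroMarginImportSketch
{
    // Hypothetical helper: loads an HTML file with all page margins set to 0.
    public static Document Load(string htmlPath)
    {
        HtmlLoadOptions loadOptions = new HtmlLoadOptions();

        // Assumption: the page layout used during HTML import is taken from
        // loadOptions.PageInfo, so margins are set here rather than on a Section.
        loadOptions.PageInfo.Margin.Left = 0;
        loadOptions.PageInfo.Margin.Right = 0;
        loadOptions.PageInfo.Margin.Top = 0;
        loadOptions.PageInfo.Margin.Bottom = 0;

        return new Document(htmlPath, loadOptions);
    }
}
```

Whether this reproduces the old section.PageInfo.Margin.Outer = 0 layout exactly would need to be confirmed by comparing the generated images.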

Hi,

Thank you for your answer. The diagram showing the DOM document structure is interesting, but it doesn't show the type names, i.e. which classes are abstract, which are concrete, and what they are called. Is there a class diagram for the document structure?

Also, when converting HTML to PDF, what is the recommended approach: using HtmlFragment or using HtmlLoadOptions?

Another question: you mentioned that I can convert from HTML to TIFF directly without going through PDF. Where can I find a code snippet that demonstrates this capability?

Last comment: we started experiencing performance issues with the Aspose library when we changed the target platform to x64. Do you produce separate binaries for x86 and x64?

@khaled2,

All classes under the Aspose.Pdf namespace are based on the new DOM approach. Please refer to the code examples: Working with Aspose.Pdf. The HtmlLoadOptions class lets you import an HTML document into Aspose.Pdf API as follows:

[C#]

using Aspose.Pdf;

// if the HTML references external resources, pass their base path to the HtmlLoadOptions constructor
HtmlLoadOptions loadOptions = new HtmlLoadOptions();
// load the HTML document
Document document = new Document("path/to/myHtml.html", loadOptions);

To convert the loaded HTML document to TIFF, please refer to this code example: Convert PDF Pages to TIFF Image
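
Putting the two steps together, a rough sketch could look like the following. It still loads the HTML through a Document object but renders straight to TIFF without saving an intermediate PDF file. The class name, method name, 300 DPI resolution and LZW compression are illustrative assumptions, not values taken from this thread.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Devices;

public static class HtmlToTiffSketch
{
    // Hypothetical helper: renders every page of an HTML file into one TIFF file.
    public static void Convert(string htmlPath, string tiffPath)
    {
        HtmlLoadOptions loadOptions = new HtmlLoadOptions();

        // Dispose the document once rendering is done to release its resources.
        using (Document document = new Document(htmlPath, loadOptions))
        {
            // Assumed settings; tune resolution and compression to the use case.
            Resolution resolution = new Resolution(300);
            TiffSettings settings = new TiffSettings
            {
                Compression = CompressionType.LZW,
                Depth = ColorDepth.Default
            };

            TiffDevice device = new TiffDevice(resolution, settings);
            device.Process(document, tiffPath);
        }
    }
}
```

A lower resolution should reduce rendering time and memory use, which may matter given the performance concerns raised earlier in this thread.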

We do not release separate libraries. We build a single 32-bit library, and it works perfectly on 64-bit operating systems. If you are facing a performance issue on 64-bit operating systems, please share all details of the use case as well as the local environment details. We will investigate and share our findings with you.