I have a 30 MB PDF file with 50 pages, and loading it makes Aspose.Pdf.Document consume a whopping 21.4 GB of memory. memoryUsage.png (91.1 KB)
I’ve prepared a very simple project, which loads the document, iterates over the pages reading Contents.Count (i.e. no editing), then waits for a key press (so that I can capture the memory usage at the end). Please note that you need to add a license file yourself, because the evaluation version limits the number of pages it can access.
(I’ll provide a link to it soon in my next post)
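For clarity, the core of the project looks roughly like this (a minimal sketch reconstructed from the description above; the file names are placeholders):

```csharp
using System;
using Aspose.Pdf;

class Program
{
    static void Main()
    {
        // A license file must be supplied; the evaluation version
        // limits the number of pages that can be accessed.
        new License().SetLicense("Aspose.Pdf.lic");

        using (var document = new Document("input.pdf"))
        {
            foreach (Page page in document.Pages)
            {
                // Read-only access: just count the content operators.
                Console.WriteLine(page.Contents.Count);
            }

            // Keep the process alive so the memory usage can be captured.
            Console.ReadKey();
        }
    }
}
```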
I’ve tried a few approaches to reduce the memory usage:
Calling the GC
Issuing a Document.FreeMemory()
“Incremental saving” with Document.Save()
These calls were made after every page. To my surprise, none of them had any positive effect, and Document.Save() seemingly closed the underlying stream, so it even broke the program.
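The per-page mitigation attempts looked roughly like this (again a sketch, not the exact project code):

```csharp
foreach (Page page in document.Pages)
{
    Console.WriteLine(page.Contents.Count);

    // Attempt 1: force a full garbage collection.
    GC.Collect();
    GC.WaitForPendingFinalizers();

    // Attempt 2: ask Aspose.PDF to release cached data.
    document.FreeMemory();

    // Attempt 3: "incremental saving"; this closed the underlying
    // stream and broke the following iterations, so it is commented out.
    // document.Save();
}
```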
My questions would be the following:
1.) What is the reason for this disproportionately large memory usage? The PDF file doesn’t seem to be unusually large, which makes the whole thing annoying.
2.) What can I do to make the processing of such files feasible on machines with fewer resources (say, “just” 16 GB of RAM)?
3.) How should the “incremental save” feature be used?
It is quite difficult to answer such questions, because CPU performance and memory usage both depend on the complexity and size of the documents you are loading/generating.
When a PDF document is closed, all of its DOM data is purged from memory during the next garbage-collector cycle. Please note that the memory may not be released until you close the application.
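Assuming the .NET API, Document implements IDisposable, so the deterministic way to close it is (a minimal sketch):

```csharp
using (var document = new Document("input.pdf"))
{
    // ... work with the document ...
}
// After disposal, the DOM data becomes eligible for collection
// during the next garbage-collector cycle.
```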
I’m using the latest version (22.2), so these symptoms apply to that version (and to older ones too).
Have you checked the project, especially the PDF file I sent? Maybe there is something about it that explains the memory usage and could lead to a solution.
Do you have any suggestions for questions 2.) and 3.)?
Unfortunately, the input PDF file is corrupted. We cannot open it in Adobe Reader, and Aspose.PDF does not import it either. Please check the attached image. image.png (28.1 KB)
16 GB of memory is enough to process such documents. However, one should make sure to run/debug in x64 mode while processing large files. Also, a simple or small document may contain a complex structure with a lot of elements, which leads to large memory consumption, because the API loads all required resources into memory while processing it.
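A quick way to verify the x64 point at runtime (a hypothetical guard, just for illustration):

```csharp
// A 32-bit process caps the address space far below what large PDFs need.
if (!Environment.Is64BitProcess)
    throw new InvalidOperationException("Run this tool as a 64-bit process.");
```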
The incremental save approach is recommended during the PDF generation process. You can use it to gradually build a document by adding content and other objects into it. In the case of an existing PDF document, it overwrites the file and closes all of the opened resources.
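Based on that description, the intended pattern during generation is presumably something like this (a sketch, assuming the parameterless Save() writes the pending changes back to the file the document was opened from):

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Generate the initial document and save it once.
var document = new Document();
document.Pages.Add().Paragraphs.Add(new TextFragment("Page 1"));
document.Save("output.pdf");

// Reopen and extend it; the parameterless Save() writes the changes
// back into "output.pdf" incrementally.
document = new Document("output.pdf");
document.Pages.Add().Paragraphs.Add(new TextFragment("Page 2"));
document.Save();
```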
That’s strange. Have you checked the checksum of the merged zip file? My coworker and I could open it without problems after downloading, merging, and unzipping the project.
Since this forum limits uploaded files to 5 MB (the error message says 50000kb, though), I had to split the zip up.
Anyway, now you can also access the merged version here, but only for the next 7 days:
The checksum of the zip should be the same. The PDF within has the following SHA-256 sum:
4e3570fcebb8bfa93fbda084b3e9bcbed32eb85f3bdd9fd98d2ce78aa78c6a88
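In case you want to double-check on your side, the sum can be computed with a few lines of C# (a hypothetical snippet; any sha256sum-style tool works just as well):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Compute and print the SHA-256 of the extracted PDF.
using (var stream = File.OpenRead("input.pdf"))
using (var sha = SHA256.Create())
{
    byte[] hash = sha.ComputeHash(stream);
    Console.WriteLine(BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant());
}
```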