PDF export of PNG images has terrible performance (in time and memory)

We’re using Aspose.Words for Java (14.9.0) to create PDF and DOCX files with lots of images.

A typical case would be a document with 100 pages, each page containing two ~200KB PNG images. That’s a total of 40MB of images, and the final document’s size is about 40MB, for both PDF and DOCX.

We add images to the document using DocumentBuilder.insertImage(String filename).

Creating the document is cheap (68MB of heap, 1 second).
Exporting to DOCX is cheap (135MB of heap, 3 seconds).
Exporting to PDF is nuts (1.8GB of heap, 68 seconds).

I’ve attached a screenshot of the heap usage and a single-class example program that reproduces this problem. MS Word and Adobe both handle these documents very smoothly, so I don’t think they’re unreasonable. Any advice?
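Numbers like the ones above (time and heap per step) can be collected without attaching a profiler by wrapping each step in a small harness. The sketch below is an assumption and not the attached test program: it uses only the standard `java.lang.management` API, times a task with `System.nanoTime()`, and reads the peak usage of the JVM heap memory pools. `ExportProfiler` and `measure` are hypothetical names, and the workload in `main` is a placeholder where the document build or the `doc.save(...)` call would go.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

public final class ExportProfiler {

    /** Wall-clock time and peak heap usage observed while running one task. */
    public static final class Measurement {
        public final long elapsedMillis;
        public final long peakHeapBytes;

        Measurement(long elapsedMillis, long peakHeapBytes) {
            this.elapsedMillis = elapsedMillis;
            this.peakHeapBytes = peakHeapBytes;
        }
    }

    /** Runs the task once, timing it and tracking the peak usage of every heap pool. */
    public static Measurement measure(Runnable task) {
        List<MemoryPoolMXBean> heapPools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                heapPools.add(pool);
            }
        }
        System.gc(); // only a hint to the JVM; helps start from a less cluttered heap
        for (MemoryPoolMXBean pool : heapPools) {
            pool.resetPeakUsage(); // peak restarts from current usage (includes live objects)
        }

        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;

        // Pools can peak at different moments, so this sum is an upper bound on the
        // true simultaneous peak; still fine for comparing DOCX vs PDF export.
        long peakHeapBytes = 0;
        for (MemoryPoolMXBean pool : heapPools) {
            peakHeapBytes += pool.getPeakUsage().getUsed();
        }
        return new Measurement(elapsedMillis, peakHeapBytes);
    }

    public static void main(String[] args) {
        // Placeholder workload: allocate ~40 MiB. Replace with e.g. doc.save("out.pdf").
        Measurement m = measure(() -> {
            List<byte[]> chunks = new ArrayList<>();
            for (int i = 0; i < 40; i++) {
                chunks.add(new byte[1 << 20]);
            }
            System.out.println("allocated " + chunks.size() + " MiB");
        });
        System.out.printf("%d ms, %.1f MB peak heap%n", m.elapsedMillis, m.peakHeapBytes / 1e6);
    }
}
```

Running the DOCX save and the PDF save through `measure` separately, in fresh JVMs, keeps one run's garbage from inflating the other's peak.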

Hi Edgar,

Thanks for your inquiry. Please note that performance and memory usage depend on the complexity and size of the documents you are generating. When rendering a document to fixed-page formats (e.g. PDF), Aspose.Words needs to build two models in memory: one for the document and another for the rendered document.

The process of building the layout model is not linear; it may take a minute to render one page and only a few seconds to render 100 pages. Aspose.Words also has to create an APS (Aspose Page Specification) model in memory, which can take additional time for some documents. Rest assured, we’re always working on improving performance, but rendering will always be slower than simply saving to flow formats (e.g. DOC/DOCX).

I have logged this performance issue as WORDSNET-11008 in our issue tracking system. We will update you via this forum thread once this issue is resolved.

We apologize for your inconvenience.

I agree that performance is a difficult problem, so I put a lot of effort into making this test application so that you can easily profile our case. I also made it adaptable, so that you can profile other users' cases as well. If I had the source code, this is exactly the application I would use to debug this performance issue. In fact, I used it to try different image layout strategies to see if I could optimize things on my end, but Aspose was consistently extremely slow.

It’s fairly common to use PDF as a “portfolio” of sorts to hold a collection of images. This use-case is well-suited to a streaming approach - there aren’t any complex page layout constraints.

For reference, Word exports the DOCX file to PDF in 5 seconds (more than 12x faster) with no noticeable change in memory consumption. Requiring 1.8GB of heap to generate a 40MB document represents a memory overhead of over 40x.

I’m sure that if an engineer runs this test application in a debugger, he/she will be able to spot low-hanging fruit fairly quickly. Can you let me know when you’ve had a chance to run the test application? I think it will make it easy for you to make progress on this problem for us and for other users who are interested in using your software for graphical report generation.

Btw, we’re very happy with the quality of your product and your support, and don’t mean to rush you! I just want to make sure that this test application doesn’t slip through the cracks, as I believe it is likely to be a helpful testbench.

Hi Edgar,

Thanks for sharing the details. Please note that issues are addressed and resolved on a first-come, first-served basis. Your issue is currently pending analysis and is in the queue. We will update you via this forum thread once there is any update available on this issue.

Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help.

Understood. Thanks for your time.

I just tested with Aspose.Words for Java 14.10.0: same problem, possibly a little slower. I noticed that the issue key is WORDSNET, but I am using the Java version. Perhaps a typo?

Hi Edgar,

Thanks for your inquiry. Please note that the latest version of Aspose.Words for Java is completely auto-ported from .NET, i.e. we do not write the code for Aspose.Words for Java by hand; it is generated automatically from the C# code of Aspose.Words for .NET. So there should not be any significant difference in functionality between the Java and .NET versions, because the code is mostly the same.

As a workaround for this issue, we suggest using the following PdfSaveOptions. Hope this helps.

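import com.aspose.words.*;

// Keep images at their original resolution, but store them JPEG-compressed in the PDF.
PdfSaveOptions options = new PdfSaveOptions();
options.getDownsampleOptions().setDownsampleImages(false);
options.setImageCompression(PdfImageCompression.JPEG);
options.setJpegQuality(80); // 0-100; higher means better quality and larger output
doc.save("output.pdf", options); // doc: the Document built with DocumentBuilder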

Thanks Tahir. Your JPEG suggestion reduces memory usage considerably, but the run time is still very poor. Additionally, our images happen to be big white diagrams with single-pixel black lines drawn all over them. That’s a worst case for JPEG and a best case for PNG.

With JPEG: 56 seconds, 800MB (it looks like this is just limited by GC frequency; it could probably run in less memory).
Without JPEG: 65 seconds, 1.8GB (same as before).
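The PNG vs JPEG trade-off for this kind of content can be checked locally. The sketch below is an assumption, not part of the thread's test program: using only the JDK's `javax.imageio`, it draws a synthetic "line diagram" (white canvas, non-antialiased 1-pixel black lines), encodes it as PNG and as JPEG with ImageIO's default writer settings, and counts how many pixels change after decoding. The class name, image size, line count and seed are arbitrary, hypothetical choices, and byte counts will vary with them.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import javax.imageio.ImageIO;

public final class LineDiagramSample {

    /** White canvas with random 1-pixel black lines, drawn without antialiasing. */
    public static BufferedImage lineDiagram(int width, int height, int lines, long seed) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_OFF);
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLACK); // default stroke is 1 pixel wide
            Random rnd = new Random(seed);
            for (int i = 0; i < lines; i++) {
                g.drawLine(rnd.nextInt(width), rnd.nextInt(height),
                           rnd.nextInt(width), rnd.nextInt(height));
            }
        } finally {
            g.dispose();
        }
        return img;
    }

    /** Encodes with an ImageIO format name such as "png" or "jpg" (default writer settings). */
    public static byte[] encode(BufferedImage img, String format) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(img, format, out)) {
            throw new IOException("No ImageIO writer for " + format);
        }
        return out.toByteArray();
    }

    /** Number of pixels whose RGB value differs after an encode/decode round trip. */
    public static int changedPixels(BufferedImage original, byte[] encoded) throws IOException {
        BufferedImage decoded = ImageIO.read(new ByteArrayInputStream(encoded));
        int changed = 0;
        for (int y = 0; y < original.getHeight(); y++) {
            for (int x = 0; x < original.getWidth(); x++) {
                if ((original.getRGB(x, y) & 0xFFFFFF) != (decoded.getRGB(x, y) & 0xFFFFFF)) {
                    changed++;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = lineDiagram(2000, 1500, 300, 42L);
        for (String format : new String[] {"png", "jpg"}) {
            byte[] bytes = encode(img, format);
            System.out.printf("%s: %d bytes, %d pixels changed after round trip%n",
                    format, bytes.length, changedPixels(img, bytes));
        }
    }
}
```

PNG is lossless, so its changed-pixel count is always zero; the JPEG count shows how much the thin lines get smeared at a given quality.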

Hi Edgar,

Thanks for your feedback. We will update you via this forum thread once this issue is resolved. Thanks for your patience.

Hi Edgar,

Further to my last post, the best approach is to use source images in JPEG format instead of PNG. In that case, memory usage and speed when saving to PDF will be almost the same as when saving to DOCX. Hope this helps.
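If the diagrams can be pre-converted, this suggestion amounts to re-encoding each PNG as JPEG before passing it to DocumentBuilder.insertImage. A minimal sketch with the standard `javax.imageio` API follows; the class name, the 0.9 quality value and the white-background flattening are illustrative assumptions, not something Aspose prescribes. Transparent pixels are composited onto white because JPEG has no alpha channel. As noted earlier in the thread, JPEG is lossy and tends to blur single-pixel lines, so the output is worth checking visually.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public final class PngToJpeg {

    /** Re-encodes an image as JPEG bytes, flattening any transparency onto white. */
    public static byte[] toJpeg(BufferedImage src, float quality) throws IOException {
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(src, 0, 0, null); // source-over: transparent pixels leave white behind
        } finally {
            g.dispose();
        }

        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG writer available");
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality); // 0.0f (smallest) .. 1.0f (best quality)

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } finally {
            writer.dispose();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PngToJpeg input.png output.jpg
        BufferedImage png = ImageIO.read(new File(args[0]));
        if (png == null) {
            throw new IOException("Unreadable image: " + args[0]);
        }
        Files.write(Paths.get(args[1]), toJpeg(png, 0.9f));
    }
}
```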

The issues you found earlier (filed as WORDSNET-11008) have been fixed in the Aspose.Words for .NET 23.10 update, which is also available on NuGet.