PDF document - first save is slow

Hi,

I’m investigating some performance issues when saving a PDF document. I tested a few different scenarios, as follows:

Scenario #1
A loop where each iteration of the loop

  • Creates an InputStream from a Word file
  • Creates a Document from the InputStream
  • Saves the Document to an OutputStream with default PDFSaveOptions

The first save is substantially slower than subsequent saves. E.g. 1 second for first save versus 100ms for subsequent saves.
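For anyone wanting to reproduce this measurement, a minimal sketch of a timing loop follows. `SaveTimer` and `timeRuns` are hypothetical names, not part of Aspose.Words, and the task in `main` is a stand-in; in the real test the task would create the InputStream, build the Document and save it to PDF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical helper (not an Aspose.Words class): times repeated runs of a task
// so the first run can be compared with the later ones.
final class SaveTimer {

    /** Runs the task the given number of times and returns each run's elapsed time in nanoseconds. */
    static List<Long> timeRuns(Callable<?> task, int iterations) throws Exception {
        List<Long> elapsed = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.call();
            elapsed.add(System.nanoTime() - start);
        }
        return elapsed;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in task (assumption): replace with the load-and-save-to-PDF step.
        Callable<Integer> task = () -> new StringBuilder("placeholder").reverse().length();

        List<Long> times = timeRuns(task, 10);
        for (int i = 0; i < times.size(); i++) {
            System.out.printf("run %d: %.3f ms%n", i + 1, times.get(i) / 1_000_000.0);
        }
    }
}
```

Printing every run separately, rather than an average, is what makes a slow first iteration visible.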

Scenario #2

  • Create an empty Document
  • Use a DocumentBuilder to write a simple string to Document
  • Save Document
  • Execute loop described in scenario 1 where I load a Word doc and save to PDF

In this scenario, all 10 iterations of the loop have similar execution times. So creating an initial Document with basic text improves performance for first save in the loop.

Scenario #3
Same as scenario 2, but I didn’t write any text to the initial document. This scenario gave similar results to scenario 1: the first iteration of the loop was considerably slower. So creating and saving an initial empty Document does not improve performance for the first save in the loop.

I suspect some of this may be explained by class loading. However, is there some other reason why first execution is slow? The delta between scenarios #2 and #3 is interesting. Is this some internal caching taking place on first execution? If so, is there a TTL in play that would see the save time degrade some time later due to lack of invocations?

Any info would be great, thanks.

@personiodm

You are observing a common performance pattern where the first operation in a sequence is slower due to initialization overhead, including class loading, JIT compilation, and warming up internal caches. This explains why the first save in Scenario #1 is significantly slower than subsequent saves.

Creating an initial Document with basic text (Scenario #2) helps because it triggers early initialization of internal components, which reduces the overhead during the actual conversion process. In contrast, when no text is written to the initial document (Scenario #3), the internal components required for PDF conversion remain uninitialized until the conversion begins, leading to the same slow first save as in Scenario #1.

This behavior is typical in applications that perform heavy initialization on first use, and it’s not related to a TTL (time-to-live) mechanism affecting performance over time. The performance improvement seen in Scenario #2 is due to pre-warming these internal resources.

To summarize:

  1. The first PDF save is slower due to initialization overhead such as class loading, JIT compilation, and internal caching mechanisms that warm up on first use.
  2. Creating an initial Document with basic text helps improve performance for the first save in a loop because it triggers early initialization of internal components, reducing the overhead during subsequent operations.
  3. If no text is written to the initial document, the first save remains slow because the internal components required for PDF conversion are not initialized until the actual conversion process begins.

@personiodm The problem occurs because, on the first call, Aspose.Words initializes static resources, such as fonts, which are then reused on subsequent calls.

You can create a simple document with some text and save it as PDF, for example when your application starts, to force Aspose.Words to initialize these resources and avoid a “cold” start on real requests.
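A rough sketch of such a startup hook is shown below. `PdfWarmUp` and `runOnce` are hypothetical names, not part of Aspose.Words, and the task passed in `main` is a stand-in so the example runs without the library. The commented lines show roughly what the real task would do with Aspose.Words for Java on the classpath; they are untested here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical startup hook (not an Aspose.Words API): runs a warm-up task at most once.
final class PdfWarmUp {
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final Callable<?> warmUpTask;

    PdfWarmUp(Callable<?> warmUpTask) {
        this.warmUpTask = warmUpTask;
    }

    /** Attempts the warm-up task at most once; returns true if this call attempted it. */
    boolean runOnce() {
        if (!started.compareAndSet(false, true)) {
            return false; // already attempted by an earlier call
        }
        try {
            warmUpTask.call();
        } catch (Exception e) {
            // A failed warm-up should not block startup; the first real request is just slower.
            System.err.println("PDF warm-up failed: " + e);
        }
        return true;
    }

    public static void main(String[] args) {
        // With Aspose.Words for Java available, the task would be roughly:
        //   Document doc = new Document();
        //   DocumentBuilder builder = new DocumentBuilder(doc);
        //   builder.writeln("warm-up");
        //   doc.save(new ByteArrayOutputStream(), SaveFormat.PDF);
        PdfWarmUp warmUp = new PdfWarmUp(() -> "stand-in for the Aspose.Words save");
        System.out.println("warm-up attempted: " + warmUp.runOnce());
    }
}
```

Calling `runOnce()` from the application's startup path, before it starts accepting requests, moves the one-off initialization cost off the first real conversion.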

Thanks for the detailed response.

Could you give some more info on the initialisation of the internal components? E.g. does writing some text to the document initialise all the internal components, or just those components that handle text/font related processing?

If I want to process documents with headers, footers, page breaks, tables etc., would upfront creation and saving of a document with these components give optimal performance for subsequent documents?

FYI, I’m using Aspose in a microservice so I’m considering a “warm-up” process when the microservice starts up where I create an initial document to initialise the system. Just wondering what the shape of that document should be.

@personiodm

You are asking about the initialization of internal components in Aspose.Words and whether creating a warm-up document with headers, footers, page breaks, and tables can improve performance for subsequent document processing.

Writing text to a document initializes internal components related to text and font processing, but not necessarily all internal components required for handling headers, footers, page breaks, and tables. Creating a warm-up document with various elements like headers, footers, page breaks, and tables can improve performance for subsequent document processing by pre-initializing more internal components. The performance improvement observed in Scenario #2 (creating an initial document with basic text) is due to early initialization of internal resources, reducing overhead during actual conversions.

For your microservice use case, a warm-up process that creates a document with common elements (headers, footers, tables, etc.) would be beneficial to reduce the latency of the first few document processing tasks after startup.

@personiodm

It would be enough to create a simple document with some simple text and render it to PDF. This makes Aspose.Words initialize the required internal resources and “warms up” Aspose.Words for subsequent calls.
