Free Support Forum

Slow saveToPdf because of fonts not getting cached

In my project we are using Aspose.Words for Java (ver. ?) to generate PDF documents. We are using the doc.saveToPdf function.

While investigating performance issues I noticed that the fonts were being read far too many times. Generating 12 documents, a total of 5.6 MB was read from a single font file of only 303 KB. With 4 fonts, that amounted to 20 MB of disk activity which could have been served from a 1 MB cache. This might not sound like much, but our production environment is extremely sensitive to disk activity, because many concurrent processes are competing for disk IO.

Please have a look at the process log I've attached. It was created with Process Monitor from SysInternals. It shows how the fonts are read and then written into the PDF file; this repeats for each of the 12 documents.

I expected a caching flag to exist, searched the documentation, and was happy to find pdf.IsTruetypeFontMapCached = true;

Unfortunately, this only seems to apply to Aspose.Pdf, even though the caching issue is clearly present in Aspose.Words as well. Please look into fixing this.

I hope this will be fixed soon, because it is a much bigger issue in production environments with concurrent access than in sandboxes or single-threaded apps.
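The caching being requested here can be sketched generically. To be clear, this is not an Aspose API: the class and method names below are hypothetical, and only illustrate the idea of reading each font file from disk once and serving all subsequent renders from memory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a process-wide font cache: each font file
// is read from disk at most once, no matter how many documents are
// rendered, and concurrent renders share the same cached bytes.
public class FontCache {
    private static final Map<String, byte[]> CACHE = new ConcurrentHashMap<>();

    // Returns the font bytes, loading them from disk only on first use.
    public static byte[] getFontBytes(String path) {
        return CACHE.computeIfAbsent(path, p -> {
            try {
                return Files.readAllBytes(Paths.get(p));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }
}
```

With four ~300 KB fonts, such a cache would hold about 1.2 MB in memory, which matches the ~1 MB figure mentioned above.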

Hi Christian,

Thank you for your interest in Aspose.Words and for your detailed performance analysis. We will look into including this functionality in one of our future versions. Could you please clarify why this is specifically needed on your side, i.e. why exactly your production environment is extremely sensitive to disk activity? This information may help us better understand the situation and provide a better solution to the issue.

Also, please note that the Java version is currently being synchronized with the existing .NET code. Once complete, all of the functionality available in .NET will be available in Java. This will most likely bring speed improvements in the rendering of PDF documents in Java.


We have an app where multiple users interact with the system and sometimes generate documents. This interaction spawns a fair amount of disk activity on old-fashioned (non-SSD) hard drives. Since non-SSD hard drives incur latency when accessing files, because they must move the read head to the correct position, they scale very badly when accessing multiple files concurrently. Most modern hard drives have a sequential throughput of around 100 MB/s but a random-read throughput of less than 1 MB/s. The random-read scenario occurs when the disk has to access two files at two separate locations and must move the read head back and forth between them.
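The throughput figures above can be turned into a back-of-envelope estimate. This is a sketch using the assumed numbers from this thread (100 MB/s sequential, 1 MB/s random, 20 MB of redundant font reads), not measurements:

```java
// Rough model of the penalty described above: the same 20 MB of font
// data costs very different amounts of wall-clock time depending on
// whether the disk can stream it or must seek between files.
public class DiskMath {
    // Milliseconds to read `megabytes` at `mbPerSec` throughput.
    public static double readMillis(double megabytes, double mbPerSec) {
        return megabytes / mbPerSec * 1000.0;
    }

    public static void main(String[] args) {
        double fontMb = 20.0; // 4 fonts re-read across 12 documents
        System.out.printf("sequential (100 MB/s): %.0f ms%n", readMillis(fontMb, 100.0));
        System.out.printf("random    (  1 MB/s): %.0f ms%n", readMillis(fontMb, 1.0));
    }
}
```

Under these assumptions the same reads go from roughly 0.2 seconds when uncontended to roughly 20 seconds under seek-heavy contention, which is consistent with the per-document penalty described below.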

So in the worst case, one user is writing a file at one position on the disk while another starts generating documents. Suddenly there is a 3-second penalty per document, because the fonts must be read each and every time while the read head keeps moving away to service the other user's write. In the single-user case, reading the fonts took perhaps less than 0.1 seconds.

Scale that up to 3 or 5 concurrent users, and unnecessary disk IO becomes an absolute disaster.

This is why I always look at the disk IO of any app I'm analyzing; in my experience it is one of the worst scalability bottlenecks.

I very much recommend using SysInternals Process Monitor for this analysis. It's free, it can log all OS events for a given process ID, and afterwards it can produce a per-file summary of the activity on specific files.

Hi Christian,

Thank you for this additional information. We will look into including this in a future version and will keep you informed of any developments.