Using Aspose.Words to save docx template to PDF slows down over time

Hello Aspose Team,
I’d like some technical input concerning my Aspose.Words instances slowing down over time.

What we are doing:
We’re using Aspose.Words to generate a PDF from a .docx template.
This is done in a .NET 7.0 application, packaged into a Docker container built on the “mcr.microsoft.com/dotnet/aspnet:7.0” base image and hosted on an internal Kubernetes cluster for our applications to hit.
We’re using containers and deploying to Kubernetes because of its ability to scale to varying demands.

The .docx templates we’re using contain a bunch of keywords that are replaced with values from our underlying system and then it’s saved to a PDF.

What we are experiencing
We’re finding that the doc.Save() method gets slower over time. When we initially deploy the application, each save takes around 1-1.5 seconds, but it progressively slows down. A few days later it takes 5-6 seconds to save the doc to PDF, and then it suddenly explodes and doc.Save() takes several minutes.
At a certain point this starts causing the applications calling our API to timeout.

To debug this issue, we’re currently running a version in our production environment that contains stopwatches to show us exactly which part of the code is slowing the whole thing down. It’s very clearly the doc.Save() method that gets increasingly slow over time.
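
For readers who want to instrument their own code in a similar way, here is a minimal sketch of this kind of stopwatch timing. `StepTimer` and the labels are hypothetical names, not taken from the attached project, and the commented usage line assumes an Aspose.Words `Document` named `doc` and a `Stream` named `output`.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times one step and logs the elapsed seconds.
public static class StepTimer
{
    public static TimeSpan Measure(string label, Action step)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        step();
        stopwatch.Stop();
        Console.WriteLine($"{label}: {stopwatch.Elapsed.TotalSeconds:F3} s");
        return stopwatch.Elapsed;
    }
}

// Usage (doc and output are assumed to exist in the calling code):
// StepTimer.Measure("doc.Save", () => doc.Save(output, SaveFormat.Pdf));
```

Wrapping each stage separately (data loading, keyword replacement, saving) makes it easy to see which one grows over time.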

Redeploying the container instantly solves the issue, and the Save() method takes the initial 1-1.5 seconds again.

Right now, our workaround is to automatically redeploy the container using a Cron job. This is an unacceptable solution long term, which is why I’m looking for a solution to the issue.

Other things that might be nice to know
The data being input into the template is very uniform in its structure. It always contains names, addresses, blood pressure readings, and a lot of boolean values. This means the progressively slower saves can’t realistically be attributed to differences in the data between saved documents.

Code
I’ve attached a shortened and redacted version of the PDFGenerator code. I hope the redactions don’t make the screenshot completely incomprehensible. The real code contains quite a few more builder subclasses that replace certain sections of the template but these shouldn’t impact performance as it’s the doc.Save() method that gets slow.

@plspao If possible, could you please create a simple console application that will allow us to reproduce the problem on our side? We have not observed such a problem before. Does the problem occur with one particular document or with any document? Is the problem reproducible if you run the process in a simple loop?

Hi Alexey,
Thanks for getting back to me.

We have 2 separate PDFs being generated in this application. They are generated from 2 separate templates but using more or less the same data. Both get real slow over time.

The curious part is that it doesn’t seem to be the application as a whole. It’s just the doc.Save() method.
As I wrote in my original post, I’ve placed stopwatches all around the code to find out which part slows down, and they show that there’s no slowdown of the application as such. The application gets the data from our database and readies the doc to be saved just as fast on a slow run as on a fast one; it all takes a consistent 0.5 seconds.

That said, I’m not convinced it’s necessarily anything to do with Aspose.Words. My prime suspect is the fact that we’re using it in a container. I’ve used Aspose products at a previous employer for a similar use case but on an old-fashioned server and we never experienced any issues back then.

I have this idea that it might be the fact that it’s containerized combined with the fact I need to save and reload the PDF to get it into a stream that might cause trouble. I haven’t been able to show that though.
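
If the only goal of the save-and-reload step is to obtain the PDF as a stream, Aspose.Words can write straight into one through the `Document.Save(Stream, SaveFormat)` overload. A minimal sketch, assuming a hypothetical helper name and template path (neither is from the attached project):

```csharp
using System.IO;
using Aspose.Words;

public static class PdfStreams // hypothetical helper name
{
    // Renders a .docx template to PDF entirely in memory.
    public static MemoryStream RenderToPdf(string templatePath)
    {
        var doc = new Document(templatePath);
        var output = new MemoryStream();
        doc.Save(output, SaveFormat.Pdf);
        output.Position = 0; // rewind so the caller can read from the start
        return output;
    }
}
```

Whether this applies depends on why the reload was needed in the first place, which the thread does not say.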

If you’re comfortable using docker desktop, I’ll be happy to create a small project with the same setup, docker-file, and so on, as we use and copy our code into it.
I obviously can’t guarantee that this will show the same behavior but I think it’s worth a shot - but I think it’s important that it’s run in a container though.

Is there a way I can get this project over to you outside of posting it here on the Forum?
The code or what it does isn’t sensitive as such and we have no issue providing you with it but because of the type of data it handles, we would like to avoid sharing it with everyone.

Thanks again for getting back to me!

All the best
Alex Østergaard

@plspao It is safe to attach files in the forum. Only you as a topic starter and Aspose staff can access the attachments. In addition, I can make the post private, so only you and Aspose staff can see it.

Thanks Alexey,

No worries. The post being open for everyone is just fine. :+1:

Great that the attachments are private though. That makes it a lot easier for me.
Thank you.

I’ll get started on that project for you right away.

Hi Alexey,

I’ve now created two projects for you to check out.
Here in my local environment, they are actually exhibiting the same behavior as we see in our production environment.

AposePDFGeneration.zip

AsposePDFConsumer.zip

One is the PDF Generator.
This is mostly code lifted straight from our Production branch with a few tweaks to make it run without all the bloat.
There’s still a lot of our internal datamodels included to make it as close to real world as possible.
This project will startup from Visual Studio and get hosted in Docker Desktop so it’s possible to evaluate whether the containerized use of the Aspose.Words component or maybe the container itself might be to blame.

If you have Docker Desktop on your machine this should work. It runs on localhost port 5000.

The other project is a small console application consumer that just calls the API 10 million times and outputs how long it took to the console, simulating our clients calling our internal PDF generation API.

This is where stuff gets really interesting as I’ve managed to reproduce the behavior seen in our production environment.
When I first started up the API the consumer generating a PDF would take less than 0.5 seconds but on attempt 300 it took 3.5 to 4 seconds.
Restarting the consumer did not help this but restarting the API does.

(btw, sorry for the misspelling of Aspose in one of the projects. When I noticed, it was too late to change it)

I really appreciate your time!

All the best
Alex Østergaard

@plspao Thank you for the additional information. Unfortunately, I do not observe any performance degradation on my side:

Calling API: Attempt 3075
0.4185885
Calling API: Attempt 3076
0.4189594
Calling API: Attempt 3077
0.3878862
Calling API: Attempt 3078
0.6535515
Calling API: Attempt 3079
0.4889502
Calling API: Attempt 3080
0.4438227
Calling API: Attempt 3081
0.4178787
Calling API: Attempt 3082
0.4150751
Calling API: Attempt 3083
0.4155754
Calling API: Attempt 3084
0.4034725
Calling API: Attempt 3085
0.4082595

The test is still running…

From what I see in your code:

  1. You apply the license on each API call. This is not required; it is enough to apply the license once per application domain. For example, you can apply the license in the static constructor of the class that uses Aspose.Words.

  2. Fonts are also set up on each API call. This is not required either; you can configure FontSettings.DefaultInstance once per application domain. When you run the font setup on each API call, more and more duplicate folder font sources are added:

public static void FontSetup()
{
    // Here we already have previously added font folders.
    List<FontSourceBase> fontSources = FontSettings.DefaultInstance.GetFontsSources().ToList();

    FolderFontSource folderFontSource = new FolderFontSource("/usr/share/fonts", true);
    fontSources.Add(folderFontSource);

    FontSettings.DefaultInstance.SetFontsSources(fontSources.ToArray());
}
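
A minimal sketch of what points 1 and 2 suggest: do the license and font setup once, in a static constructor, and assign the font sources as a fixed list instead of appending to the existing one. The class name and license file name are hypothetical, and the `File.Exists` guard is an assumption added only so the sketch also runs in evaluation mode without a license file.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Fonts;

public class PdfGenerator // hypothetical class name
{
    // A static constructor runs at most once per application domain,
    // before the first use of the class.
    static PdfGenerator()
    {
        const string licensePath = "Aspose.Words.lic"; // hypothetical file name
        if (File.Exists(licensePath))
            new License().SetLicense(licensePath);

        // Assign a fixed list rather than appending to GetFontsSources(),
        // so this setup could not accumulate duplicates even if called again.
        FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[]
        {
            new SystemFontSource(),
            new FolderFontSource("/usr/share/fonts", true)
        });
    }
}
```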

@plspao After some time I do see a small performance degradation:

Calling API: Attempt 11221
0.6992437
Calling API: Attempt 11222
0.6753073
Calling API: Attempt 11223
0.676998
Calling API: Attempt 11224
0.7112891
Calling API: Attempt 11225
0.6683882
Calling API: Attempt 11226
0.6660184
Calling API: Attempt 11227
0.6842771
Calling API: Attempt 11228
0.7021709
Calling API: Attempt 11229
0.7049617

I will try applying the above-mentioned changes and continue testing tomorrow.

Wow. This is extremely interesting.

I’ve just redownloaded the projects from here and run them fresh on my work laptop, so I know the two of us should be testing with the exact same code, and to make sure I don’t have something stale in my Bin folder.
I still experience the same behavior.

I’ll also have a look at the rest of the great input you’ve provided as well. Thank you!

Have a good night.

//Alex

@plspao After the above-mentioned changes (moved the license and font setup into the static constructor of the DataFactory class):

Calling API: Attempt 1
4.181407
Calling API: Attempt 2
0.5748722
Calling API: Attempt 3
0.4231122
Calling API: Attempt 4
0.2713384
Calling API: Attempt 5
0.2406317
Calling API: Attempt 6
0.3049818
Calling API: Attempt 7
0.2307053
Calling API: Attempt 8
0.235265
Calling API: Attempt 9
0.2939627
Calling API: Attempt 10
0.2447253

Calling API: Attempt 15154
0.4313614
Calling API: Attempt 15155
0.4254233
Calling API: Attempt 15156
0.3945022
Calling API: Attempt 15157
0.3882086
Calling API: Attempt 15158
0.4335504
Calling API: Attempt 15159
0.4470642
Calling API: Attempt 15160
0.3912347
Calling API: Attempt 15161
0.3993739
Calling API: Attempt 15162
0.4396405
Calling API: Attempt 15163
0.3999765

Looks better to me.

Hi Alexey,

Since last night I’ve tried several things, and now I have fixed the faulty behaviour on my machine as well.

I tried deploying the packages to Docker Desktop on my personal workstation and suddenly couldn’t reproduce the issue - I essentially had the same experience as you.
So I concluded that the behavior we experienced on our side must be related to the settings of whatever hosts the container (e.g. Docker Desktop or Kubernetes) rather than the actual code.

But when I tried the changes you recommended it suddenly worked on my laptop as well.
I moved the license setup and font setup into the constructor of the PDFGenerators and then registered the PDFGenerators as singletons for dependency injection instead.
Apart from this being a much nicer solution in general it also meant that the PDF generation sped up significantly and now shows no sign of degrading performance.
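
A minimal sketch of such a singleton registration in an ASP.NET Core (.NET 7) `Program.cs`, assuming the web SDK's default implicit usings. The endpoint path, the `PdfGenerator` class, and its `Generate` method are hypothetical stand-ins, not the code from the attached project.

```csharp
var builder = WebApplication.CreateBuilder(args);

// One shared instance for the whole process, so the constructor
// (license and font setup) runs once instead of once per request.
builder.Services.AddSingleton<PdfGenerator>();

var app = builder.Build();

// Hypothetical endpoint: the singleton is injected into the handler.
app.MapGet("/pdf", (PdfGenerator generator) =>
    Results.File(generator.Generate(), "application/pdf"));

app.Run();

// Hypothetical stand-in for the real generator class.
public class PdfGenerator
{
    public PdfGenerator()
    {
        // License and font setup would go here.
    }

    public byte[] Generate() => Array.Empty<byte>(); // placeholder body
}
```

With a transient or scoped registration the constructor would run again for every request, which is how a per-call font setup can creep back in.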


(Yes yes, I know I should place an interface in front of these classes but this is just a dummy implementation anyway :sweat_smile:)

So, you ended up being right!
The problem was the FontSetup() method being run every time a new PDF was generated.

Takeaways from this
First off, I still can’t explain why this would only show as performance degradation on some environments. :man_shrugging: This weirds me out a little bit.

Secondly, one would humbly suggest that some kind of functionality be added to the FontSettings.DefaultInstance.SetFontsSources() method to prevent it from just adding existing items to the list repeatedly. I can’t be the only one who made this mistake! :face_with_peeking_eye:
Of course, this could cause a performance penalty in itself so one would have to measure which way to handle this would be the better solution.
In any case, this would be a job for whenever there are two Thursdays in a week.
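
Until something like that exists, a caller-side guard is one way to get the same effect. A minimal sketch, assuming a hypothetical helper (`FontSourceGuard` is not part of Aspose.Words) that compares the `FolderPath` of the existing `FolderFontSource` entries before adding a new one:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Words.Fonts;

public static class FontSourceGuard // hypothetical helper, not part of Aspose.Words
{
    // Adds a folder font source only if no source for that folder exists yet.
    // Returns true when a source was added.
    public static bool AddFolderOnce(FontSettings settings, string folder, bool recursive)
    {
        List<FontSourceBase> sources = settings.GetFontsSources().ToList();

        bool alreadyPresent = sources
            .OfType<FolderFontSource>()
            .Any(s => string.Equals(s.FolderPath, folder, StringComparison.Ordinal));
        if (alreadyPresent)
            return false;

        sources.Add(new FolderFontSource(folder, recursive));
        settings.SetFontsSources(sources.ToArray());
        return true;
    }
}
```

Calling `FontSourceGuard.AddFolderOnce(FontSettings.DefaultInstance, "/usr/share/fonts", true)` on every request would then leave the source list unchanged after the first call.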

Thank you so much for your help with this, Alexey! I really do appreciate it. I probably wouldn’t have found the issue without your help!

All the best
Alex Østergaard

@plspao It is perfect that you managed to resolve the problem on your side.

My only guess is that the difference lies in the number of fonts available in the specified fonts folder. Aspose.Words needs to read them all, so if there are a lot of fonts, it takes more time. However, as I can see in the Dockerfile, there should not be problems with this.

In your case, it is actually not required to add a FolderFontSource. By default there is only one SystemFontSource in the default font settings instance, and Aspose.Words looks for fonts in all of the following locations: /usr/share/fonts, /usr/local/share/fonts, /usr/X11R6/lib/X11/fonts. So there is no need to add an additional FolderFontSource that points to /usr/share/fonts.

Sorry for not getting back to you, Alexey. I got sick and then Christmas happened.

Exactly. As you mention, the Dockerfile installs a bunch of fonts, so whether it’s run on one environment or another, the number of actual fonts installed shouldn’t be different — unless maybe some are unavailable during the container’s build, and this doesn’t result in a hard error.

Yes, that was my understanding as well.
But this line (in which I updated FolderFontSource) was actually added previously to fix an issue where Aspose.Words couldn’t find the fonts after they were installed via the Dockerfile, so all the PDFs came out using the Fanwood font.
This wasn’t acceptable to our business team.

After adding that line, the PDFs came out fine, so I assumed that I had misunderstood something and that setting FolderFontSource was necessary.

@plspao Have you tried checking which folders Aspose.Words uses in your environment? You can try using the SystemFontSource.GetSystemFontsFolders method.
I also checked the code, and here are the folders where Aspose.Words should look for fonts when running on Linux:

public static string[] GetLinuxFontFolders()
{
    return new string[]
    {
        GetLinuxUserFontFolder(), // user's local fonts' directory ~/.fonts
        "/usr/share/fonts", // Ubuntu, openSUSE
        "/usr/local/share/fonts", // Fedora?
        "/usr/X11R6/lib/X11/fonts" // RHEL?
    };
}
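
A minimal sketch of that check, which could be run once inside the container. `FontDiagnostics` is a hypothetical helper name; the Aspose.Words calls used are `SystemFontSource.GetSystemFontsFolders`, `FontSettings.GetFontsSources`, and `FontSourceBase.GetAvailableFonts`.

```csharp
using System;
using System.IO;
using Aspose.Words.Fonts;

public static class FontDiagnostics // hypothetical helper name
{
    public static void Print()
    {
        // Folders Aspose.Words treats as system font folders on this machine.
        foreach (string folder in SystemFontSource.GetSystemFontsFolders())
            Console.WriteLine($"{folder} (exists: {Directory.Exists(folder)})");

        // Sources currently configured, with the number of fonts each one provides.
        foreach (FontSourceBase source in FontSettings.DefaultInstance.GetFontsSources())
            Console.WriteLine($"{source.Type}: {source.GetAvailableFonts().Count} fonts");
    }
}
```

If the folder the Dockerfile installs fonts into is missing from the first list, or a source reports zero fonts, that would point to why the extra FolderFontSource seemed necessary.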

Awesome, thanks.
I’ll give that a try.
