Question: Is the .NET 8+ Aspose.Words DLL better at memory management than the .NET Framework 4.8 version?

Hi,

We have previously had problems producing large documents from a 32-bit EXE that uses the Aspose.Words for .NET DLL: memory often spiked to very high levels and caused out-of-memory errors. If we switched to the .NET 8+ build of Aspose.Words.dll, would it benefit from that runtime's many improvements in performance, memory optimization and garbage collection?

Do you have any example benchmarks?

@GMiddleton I am afraid we do not have such benchmarks. However, as far as I know, .NET 8+ has better memory management than .NET Framework 4.8.
In addition, to reduce memory usage when processing extremely large documents, you can try the LoadOptions.TempFolder, SaveOptions.TempFolder and SaveOptions.MemoryOptimization properties.


Thanks, I’ll definitely try those options out.
I’ll produce my own benchmarks, but I understand it’s difficult, since the results depend entirely on document content, which is infinitely variable.
However, I don’t think our issue was with loading or saving, but with building the document in memory.
Do these settings help with that, or are there other settings that let us use a temporary file cache while building the document in memory?

@GMiddleton Yes, you are absolutely right: memory consumption depends on many factors, such as the input document format, its complexity and its content. The following article might be useful for you:
https://docs.aspose.com/words/net/memory-requirements/

I have done some experimentation comparing the .NET 4.8 and .NET 8 builds of Aspose.Words, particularly for InsertHtml (inserting multiple copies of the same HTML table with inline styles and images), UpdatePageLayout and saving.
In many of my tests, the output document ran to 1,600 pages.

Conclusions so far:

Inserting HTML via DocumentBuilder
The time and memory used by HTML insertion appear to scale linearly, although some exceptional results were noticed.

Update Page Layout
In both .NET 4.8 and .NET 8, calling this method can suddenly increase memory by a gigabyte or more.

Example 1: .NET 4.8. Private memory was 523 MB before calling UpdatePageLayout; after the call, it had increased by 995 MB.

Example 2: .NET 8. Private memory was 187 MB before calling UpdatePageLayout; after the call, it had increased by 1,709 MB.
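To make before/after figures like these easier to reproduce on both runtimes, the call can be wrapped in a small measurement helper. The sketch below is a hypothetical helper (`MemoryProbe` is not part of Aspose.Words). It forces a full, blocking garbage collection before each reading, so that differences in GC timing between .NET Framework 4.8 and .NET 8 distort the numbers less, and it reports both process private bytes (presumably comparable to the "private memory" figures above) and the managed heap size. It uses only C# 7.3 syntax so it compiles for both targets. Note that it captures the settled state after the call, not the transient peak during it.

```csharp
using System;
using System.Diagnostics;

// Hypothetical measurement helper; not part of Aspose.Words.
public static class MemoryProbe
{
    // Force a full, blocking collection (including objects released by finalizers)
    // so readings reflect live data rather than garbage awaiting collection.
    private static void Settle()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }

    // Private bytes of the current process as reported by the OS.
    private static long PrivateBytes()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            current.Refresh();
            return current.PrivateMemorySize64;
        }
    }

    // Runs the action and returns the change in private bytes (Item1)
    // and in managed heap size (Item2), both in bytes.
    public static Tuple<long, long> Measure(string label, Action action)
    {
        Settle();
        long privateBefore = PrivateBytes();
        long managedBefore = GC.GetTotalMemory(false);

        action();

        Settle();
        long privateDelta = PrivateBytes() - privateBefore;
        long managedDelta = GC.GetTotalMemory(false) - managedBefore;

        const long MB = 1024 * 1024;
        Console.WriteLine("{0}: private {1:N0} MB, managed {2:N0} MB",
            label, privateDelta / MB, managedDelta / MB);
        return Tuple.Create(privateDelta, managedDelta);
    }
}
```

For example, `MemoryProbe.Measure("UpdatePageLayout", () => doc.UpdatePageLayout());`, where `doc` is the Document being built. A large managed delta that survives the forced collection indicates live .NET objects (such as a cached layout) rather than garbage the GC simply had not reclaimed yet.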

Saving
I didn’t use the memory-optimized save, but in general, without it, saving can add up to 300 MB of memory pressure. In my earlier tests I used much larger HTML inserts, and on many occasions I gave up waiting for Save to finish (10+ minutes), so in these tests I trimmed the HTML down to a maximum of 3,000 lines/800 KB and inserted it hundreds of times.

General
Speed improves slightly if the HTML is simpler and doesn’t include images.
.NET 8 showed marginal speed improvements and somewhat earlier garbage collection. However, the UpdatePageLayout spike being considerably worse in .NET 8 was a worry.

@GMiddleton Thank you for sharing your testing results with us. If possible, could you please also share your test project? We will test with your data on our side too.

Building the document layout is quite a resource-consuming operation, so it is expected that memory usage grows when UpdatePageLayout is executed. In addition, memory consumption is not linear in document size: the bigger the document, the disproportionately more memory is required. For example, if building the layout of a 10-page document requires 10 MB of memory, a 20-page document might require 100 MB.
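One way to check how non-linear layout memory is for a particular kind of document is to measure it at two sizes and fit a power law, memory ≈ c · pages^k. The power-law model is an assumption made for illustration, not something Aspose states, and `LayoutScaling` is a hypothetical helper. Applied to the illustrative 10-page/10 MB and 20-page/100 MB figures above, it gives k ≈ 3.32, whereas k = 1 would mean linear growth.

```csharp
using System;

// Hypothetical helper: assumes memory ≈ c * pages^k (an illustrative model only).
public static class LayoutScaling
{
    // Solves for k from two measurements: k = ln(m2 / m1) / ln(p2 / p1).
    public static double ScalingExponent(double pages1, double memory1, double pages2, double memory2)
    {
        if (pages1 <= 0 || pages2 <= 0 || memory1 <= 0 || memory2 <= 0 || pages1 == pages2)
            throw new ArgumentException("Need two distinct, positive measurements.");
        return Math.Log(memory2 / memory1) / Math.Log(pages2 / pages1);
    }

    // Extrapolates memory at a new page count from one measurement and a fitted k.
    public static double Predict(double pages1, double memory1, double k, double pages)
    {
        return memory1 * Math.Pow(pages / pages1, k);
    }
}
```

In practice, one might measure layout memory at, say, 400 and 800 pages of representative content and use `Predict` to estimate a 1,600-page run before committing to a 32-bit process. The estimate is only as good as the power-law assumption, so it is best treated as a rough warning sign rather than a forecast.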

examples.zip (117.6 KB)

Please find enclosed the projects I used, the test data and the results.
I have removed the licence file, of course, so you’ll have to hook that up.

The code runs through permutations of HTML size, HTML complexity, whether images are included, and the number of iterations.

The figures in the .NET 8 example are sometimes unreliable because of its more aggressive GC (which I’m quite happy about).
Of particular concern, though, is the test case below, which balloons by 1.7 GB in .NET 8 versus 1 GB in .NET 4.8.

new TestSpec(){ numberOfTablesToInsert=400,sizeOfTableToUse=2,stripHTMLToMinimum=false,includeImages=true },
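The permutation run described above might be structured along the following lines. This is a hypothetical reconstruction, not the code in examples.zip: the field names come from the quoted TestSpec initializer, but their types and the value ranges are assumptions. Generating every combination explicitly makes it easy to rerun a single problem case, such as the 400-table one, on both runtimes.

```csharp
using System;
using System.Collections.Generic;

// Field names follow the quoted TestSpec initializer; the types are assumed.
public class TestSpec
{
    public int numberOfTablesToInsert;
    public int sizeOfTableToUse;
    public bool stripHTMLToMinimum;
    public bool includeImages;

    public override string ToString()
    {
        return string.Format("tables={0}, size={1}, strip={2}, images={3}",
            numberOfTablesToInsert, sizeOfTableToUse, stripHTMLToMinimum, includeImages);
    }
}

public static class TestMatrix
{
    // Builds every combination (Cartesian product) of the given parameter values.
    public static List<TestSpec> Build(int[] tableCounts, int[] tableSizes)
    {
        var specs = new List<TestSpec>();
        bool[] flags = { false, true };
        foreach (int count in tableCounts)
            foreach (int size in tableSizes)
                foreach (bool strip in flags)
                    foreach (bool images in flags)
                        specs.Add(new TestSpec
                        {
                            numberOfTablesToInsert = count,
                            sizeOfTableToUse = size,
                            stripHTMLToMinimum = strip,
                            includeImages = images
                        });
        return specs;
    }
}
```

Each spec can then drive one document-building run, with memory recorded before and after UpdatePageLayout, so every result line is tied to an exact, repeatable configuration.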

I’ve also noticed that UpdateFields has a similar tendency to add bloat.

Question: When is it absolutely necessary to call UpdateFields or UpdatePageLayout?

Using Aspose, I’ve just created a test document with PAGE of NUMPAGES in the footer and a bookmark reference to the top, and saved it as DOCX.

It appears I didn’t need to update the fields or the layout. Are these calls only needed if the caller saves the document and then tries to do more, as per Document.UpdatePageLayout | Aspose.Words for .NET?

I’m wondering whether I should skip calling these methods altogether, or under what circumstances I really must call them (I never load and modify an existing document; I always create a new one, although I might use a template document).

@GMiddleton Thank you for the additional information. Some fields in MS Word documents are updated automatically. For example, PAGE and NUMPAGES fields in the document’s header/footer are updated automatically by MS Word.

Calling UpdateFields might be required when you insert fields into the document from scratch, for example a TOC field. UpdateFields might also be required if an IF field is used and its condition has changed.

Calling UpdatePageLayout is not required if the output is saved to a flow document format, such as DOCX, DOC or RTF. It might be required when the document is saved to a fixed-page format, such as PDF. Building the document layout is quite a resource-consuming operation, which is why Aspose.Words caches the layout after building it once. If the document is modified after the layout has been built, the changes will not be reflected in the PDF, because it will be rendered from the cached layout. In this case, calling UpdatePageLayout is required.

If you do not actively modify fields in your document and do not work with document layout information, it is not necessary to call these two methods at all.
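The guidance above can be condensed into a small decision helper. This is a hypothetical sketch that only encodes the rules stated in this answer: flow formats (DOCX, DOC, RTF) do not need UpdatePageLayout; fixed-page formats (PDF) need it only when the document changed after the layout was built and cached; UpdateFields is for fields inserted from scratch whose results must be computed (such as TOC) or IF fields whose condition changed. The enum, class and parameter names are assumptions, not Aspose.Words API.

```csharp
using System;

// Hypothetical types; not part of Aspose.Words.
public enum OutputKind { Flow, FixedPage }   // Flow: DOCX, DOC, RTF; FixedPage: PDF

public static class UpdateAdvice
{
    // UpdatePageLayout matters only for fixed-page output, and only when a layout
    // was already built (and cached) before the document was modified.
    public static bool NeedsUpdatePageLayout(OutputKind output, bool layoutBuiltEarlier, bool modifiedAfterLayout)
    {
        return output == OutputKind.FixedPage && layoutBuiltEarlier && modifiedAfterLayout;
    }

    // UpdateFields is needed for fields inserted from scratch whose results must be
    // computed (e.g. TOC), or IF fields whose condition changed. PAGE/NUMPAGES in
    // headers/footers are updated automatically by MS Word.
    public static bool NeedsUpdateFields(bool insertedFieldsNeedingResults, bool ifConditionChanged)
    {
        return insertedFieldsNeedingResults || ifConditionChanged;
    }
}
```

For a document created from scratch and saved as DOCX without TOC or IF fields, both helpers return false, which is consistent with the test described in the question.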