Hi Aspose -
I’m wondering about memory usage during a mail merge. Let’s say I’m creating letters to 500 recipients and I have a Word template that includes a graphic logo in the header. (The 500 recipients are contained in a DataView.)
When I execute the mail merge, does the resulting Aspose.Words Document reside entirely in the server’s memory? In my scenario, the resulting Word document is about 12 MB. In other words, would the entire 12 MB occupy the server’s RAM before the file is written to disk or sent to the browser?
Additionally, if the Word template has a graphic logo in the header, does the graphic get repeated for each iteration of the merge? That would explain why my resulting file (after merging 500 names) is 12 MB. I’ve tested without the graphic, and the resulting file is about 1.7 MB.
Do you have any recommendations on how to keep memory usage to a minimum? I’m running Aspose.Words in a shared hosting environment in which my application is allocated 100 MB of RAM, so if a single user consumes 10 or 20 MB, that’s a huge percentage of my available memory.
I appreciate any thoughts you may have on this.
In Aspose.Words, the complete document is kept in memory, so a larger document takes more memory.
When you do a mail merge or otherwise duplicate content with images in Aspose.Words, the images are not deep-cloned. So whether you have 500 records or one, there is only one copy of the image in memory.
However, when you save that document to disk, the story is different. The image does get saved 500 times if it was repeated 500 times. This may sound strange, but it is simply because we have not yet implemented a mechanism for reusing identical images when saving files. We will add this in a future version.
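Taking the figures from the question at face value, the saved file's size can be split into a per-record text part and a per-record logo part. This is a rough sketch that assumes both parts grow linearly with the record count and that the whole 12 MB vs 1.7 MB difference comes from the logo being written once per record, as described above; `per_record_sizes_kb` is a hypothetical helper, not part of Aspose.Words.

```python
def per_record_sizes_kb(with_image_mb: float, without_image_mb: float, records: int):
    """Split a merged file's on-disk size into per-record text and logo parts.

    Assumption: both parts scale linearly with the number of records, and the
    with/without difference is entirely the logo being saved once per record.
    """
    text_kb = without_image_mb * 1024 / records
    image_kb = (with_image_mb - without_image_mb) * 1024 / records
    return text_kb, image_kb


# Figures from the question: 12 MB with the logo, 1.7 MB without, 500 records.
text_kb, image_kb = per_record_sizes_kb(12, 1.7, 500)
print(f"text per record: {text_kb:.1f} KB")  # text per record: 3.5 KB
print(f"logo per record: {image_kb:.1f} KB")  # logo per record: 21.1 KB
```

On these assumptions, each saved copy of the logo costs about six times as much disk space as the merged text of a letter, which is consistent with the logo dominating the file size.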
Also note that a document’s size on disk is usually smaller than its size in memory. There is no fixed ratio between the two; it depends on the complexity of the document, its formatting, and so on, but the memory footprint will always be larger than the size on disk. A document that is 1.7 MB on disk and 12 MB in memory could be reasonable, at least at some point in its life (remember that the garbage collector is always running).
A document occupying 10 MB of memory is not necessarily a problem, because you often keep it in memory only for a fraction of a second: you open it, populate it, and save it, and then the document goes out of scope and everything is reclaimed by the garbage collector.
Some of our larger clients have about 500K contacts. What happens if they use a fairly complicated Word template and the process runs out of memory before it finishes the job?
Is there anything in the API that lets me stream the output to disk while the mail merge is executing?
Thanks in advance,
Unfortunately, Word documents cannot be generated in a streaming manner; their structure and format essentially rule it out, so the document has to be built in memory all at once. Please note that creating very large documents is generally a bad idea: they will probably cause problems and slowdowns when opened in MS Word.
So the best option is to generate several smaller documents instead of one huge document, by splitting the generation process into parts.
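Splitting the generation into parts mostly comes down to feeding the mail merge one batch of records at a time. The sketch below shows only that batching logic; `merge_batch` is a hypothetical callback standing in for the real Aspose.Words work (load the template, merge just this batch, save, let the document go out of scope), and the batch size of 1000 is an arbitrary example value.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def generate_in_parts(
    records: Iterable[T],
    size: int,
    merge_batch: Callable[[List[T], int], str],
) -> List[str]:
    """Run one merge per batch and collect the output paths it reports.

    `merge_batch` (hypothetical) should merge and save only its own batch,
    so at most one batch-sized document is alive at any time.
    """
    outputs: List[str] = []
    for index, chunk in enumerate(batches(records, size)):
        outputs.append(merge_batch(chunk, index))
    return outputs


def fake_merge(chunk: List[int], index: int) -> str:
    # Placeholder for the real mail merge; it only names the output file.
    return f"letters_{index:04d}.docx"


files = generate_in_parts(range(2500), 1000, fake_merge)
print(files)  # ['letters_0000.docx', 'letters_0001.docx', 'letters_0002.docx']
```

With 500K contacts and batches of 1000, this would produce 500 separate files; peak memory is then bounded by one batch rather than by the whole data set.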
It’s been a little more than a year since I originally posted this thread, and I’m wondering whether Aspose.Words has changed or offers any other solutions to the problem I originally described.
To refresh your memory, the issue is using a mail merge template that contains graphics and the resulting file size when merging a large data set.
I’m still struggling with this, and I’m hoping there’s now a way to solve it.
Even if I could create each document separately, and write each to disk… if only I had a way to join all of the files into a single file without having to keep the conglomerated file in memory.
Thanks for updating me on this.
Sorry for the delay. I consulted with our developers, and unfortunately I have no good news for you. A mechanism for reusing identical images when saving files is still not implemented. Also, Aspose.Words still needs about 10 times as much memory to build the document model as the size of the source document.
Thanks for the message, Alexey.
So you’re saying that Aspose.Words needs ten times more memory than the original document size (multiplied by the number of records in my data set)? Whoa! That’s a lot! (But understandable.)
For example, say I have a Word template that is 100 KB and my data set has 200 records. Do I need 200 MB of RAM available for that operation? (100 KB × 200 records × 10)
I think I’m gonna need more RAM!
Thanks for clarifying. If I’ve understood correctly, I’ll plan accordingly.
Thanks for your request. I don’t think you need 200 MB of memory to generate the document. The duplicated images are not deep-cloned in memory, so memory usage can be estimated with the following formula:
memory_usage ≈ duplicated_image_size + source_file_size × 10 + size_of_new_content
You can test this on your side.
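To see why the two readings differ so much, the sketch below evaluates both the per-record reading from the question and the formula from the reply, using the 100 KB template and 200 records from the example. The 20 KB image and the roughly 5 KB of new content per record are hypothetical placeholders, not measurements, and the reply does not say whether size_of_new_content is an on-disk or in-memory size, so treat the output as an order-of-magnitude comparison only.

```python
def naive_estimate_kb(template_kb: float, records: int, factor: float = 10) -> float:
    """The reading in the question: every record costs `factor` x the template."""
    return template_kb * records * factor


def formula_estimate_kb(
    duplicated_image_size: float,
    source_file_size: float,
    size_of_new_content: float,
    factor: float = 10,
) -> float:
    """The reply's formula: the shared image is held once, the template's model
    costs about `factor` x its file size, plus the content the merge adds.
    All arguments use the same unit (KB here)."""
    return duplicated_image_size + source_file_size * factor + size_of_new_content


naive = naive_estimate_kb(100, 200)
# Hypothetical inputs: a 20 KB logo and 200 records adding about 5 KB each.
formula = formula_estimate_kb(20, 100, 200 * 5)
print(naive, formula)  # 200000 2020
```

The factor of 10 is the rough figure quoted in the reply; the key difference is that the formula applies it once to the template rather than once per record.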