'Out Of Memory' Exception on Document.Save with docx and doc

I am using Aspose.Words for .NET to generate large reports from database data. The reports can be saved in three formats: html, docx, and doc.
When calling Document.Save with the Html format, the report is saved correctly. In this format I can successfully produce documents containing 20,000 records, which results in a 182MB html file.
However, when saving the document in either docx or doc format I run into the ‘Out of Memory’ exceptions detailed below. Even 5000+ records in these formats cause the problem. (Expected docx size 3MB, expected doc size 30MB)
The puzzling part is that the PC’s Available Memory performance counter shows plenty of free memory (almost 1.3GB) when the exception occurs. A process ‘virtual memory analysis’ utility shows that the process also has 1GB of free virtual memory. However, due to memory fragmentation the largest contiguous block is approx 250MB.
Therefore it appears (from examining the exceptions below) that Aspose is trying to create a very large memorystream that is bigger than the largest contiguous block available in the process’s virtual address space, resulting in an ‘Out of Memory’ exception.
Although we were aware of an ‘Out of Memory’ issue before choosing Aspose, we believed it occurred only when memory had really run out. We had planned to work around it by monitoring memory usage and splitting the files into multiple documents when memory became an issue. However, as explained above, we cannot simply predict when this ‘Out of Memory’ exception will occur, because it depends on the largest contiguous block currently available in the process’s virtual address space.
Therefore answers to any of the following would help us to determine the best way forward with this issue:
Is Aspose.Words trying to stream the whole document into one huge memorystream before writing to file in docx/doc format? (Hopefully not.) Or does Aspose.Words stream just one section/chunk at a time before writing it to file?
Is Aspose aware of the above issue, and are there any plans to resolve it by writing in chunks?
Why can Aspose successfully save such a large Html document without running into this issue?
Is there any way I can add sections/settings to the document such that this problem will not occur?
Is there a recommended max document size or any better method of predicting when this ‘Out of Memory’ problem will occur?
Thanks in advance
Daniel Finkelstein
Senior Developer - HP Software
Docx Exception:
Message:

Exception of type 'System.OutOfMemoryException' was thrown.
Stack Trace:
at System.IO.MemoryStream.set_Capacity(Int32 value) 
at System.IO.MemoryStream.EnsureCapacity(Int32 value) 
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count) 
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder) 
at System.IO.StreamWriter.Write(Char value) 
at System.Xml.XmlTextWriter.WriteEndAttributeQuote() 
at System.Xml.XmlTextWriter.AutoComplete(Token token) 
at System.Xml.XmlTextWriter.WriteEndAttribute() 
at System.Xml.XmlWriter.WriteAttributeString(String localName, String value) 
at ؓ.⎛.⎪(String ހ, String Ӷ) 
at ⎃.⎂.⇿(String ⎌, String Ӷ) 
at ㍺.§×.VisitParagraphStart(Paragraph para) 
at Aspose.Words.Paragraph.Accept(DocumentVisitor visitor) 
at Aspose.Words.CompositeNode.Ս(DocumentVisitor Վ) 
at Aspose.Words.Tables.Cell.Accept(DocumentVisitor visitor) 
at Aspose.Words.CompositeNode.Ս(DocumentVisitor Վ) 
at Aspose.Words.Tables.Row.Accept(DocumentVisitor visitor) 
at Aspose.Words.CompositeNode.Ս(DocumentVisitor Վ) 
at Aspose.Words.Tables.Table.Accept(DocumentVisitor visitor) 
at Aspose.Words.CompositeNode.Ս(DocumentVisitor Վ) 
at Aspose.Words.Body.Accept(DocumentVisitor visitor) 
at ㍺.㎓.㎬(CompositeNode ㎭) 
...
at ㍺.㎍.DoWrite() 
...
at Aspose.Words.Document.ԅ(Stream Ӿ, String Ӽ, SaveFormat Ԇ) 
at Aspose.Words.Document.Save(String fileName, SaveFormat fileFormat)
Doc Exception:
Message:
Exception of type 'System.OutOfMemoryException' was thrown.
Stack Trace:
at System.IO.MemoryStream.set_Capacity(Int32 value) 
at System.IO.MemoryStream.EnsureCapacity(Int32 value) 
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count) 
...
at Aspose.Words.Document.ԅ(Stream Ӿ, String Ӽ, SaveFormat Ԇ) 
at Aspose.Words.Document.Save(String fileName, SaveFormat fileFormat)

Hi

Thanks for your request and detailed description of the problem. Could you please also specify which version of Aspose.Words you use and attach your template here for testing? I will check the issue and provide you more information.
Also, please note that there is little sense in generating such huge documents, because you will not be able to open them in MS Word. MS Word will hang for a few minutes when opening such huge documents. In addition, I think an HTML file of 182 MB is also useless.
You should use a few smaller documents instead of one large one.
Best regards.

Please find attached a large docx file for testing. This file was successfully generated in doc format using Aspose (I got lucky with memory fragmentation) and I then converted it to docx using Word.
I have never tried to open this document using Aspose, but I am fairly sure that calling Document.Save in docx format will reproduce the problem.
I agree with you that documents of this size are not really usable. However, if our users want to create such a document (and they often do), we do not want to stop them.
What is important, however, is to protect the user from running into an ‘Out of Memory’ exception by automatically splitting the document into more manageable chunks when we detect that memory problems are likely to occur. The problem is that we cannot (easily) predict when this will happen. It depends on the process’s memory fragmentation and on the availability of a contiguous block of virtual address space for the memorystream used in ‘Document.Save’. In extreme cases of fragmentation, could it occur with a document of only 200-300 pages?
One additional question that might help us find a resolution:
Is there any method of knowing how large the memorystream required to save the document in docx or doc format will be before calling save?
Thanks
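
Editorial note: the thread does not offer a way to learn the required stream size in advance, so the following is only a rough sketch of how an application might test for a contiguous block before calling save. It assumes the caller supplies its own size estimate; `ContiguousMemoryProbe` and `CanAllocate` are hypothetical names, not part of Aspose.Words. A successful probe is a hint, not a guarantee, because the address space can change between the probe and the save.

```csharp
using System;

public static class ContiguousMemoryProbe
{
    // Returns true if a single byte array of the requested size can be
    // allocated right now. An array needs one contiguous block of address
    // space, which is the same requirement a MemoryStream buffer has.
    public static bool CanAllocate(int bytes)
    {
        try
        {
            byte[] probe = new byte[bytes];
            // Keep the array referenced so the allocation is not optimized away.
            GC.KeepAlive(probe);
            return true;
        }
        catch (OutOfMemoryException)
        {
            // No contiguous block of this size is available at the moment.
            return false;
        }
    }
}
```

A caller could run `ContiguousMemoryProbe.CanAllocate(estimatedBytes)` before `Document.Save` and fall back to splitting the report when it returns false, where `estimatedBytes` is the application's own guess. The .NET Framework class `System.Runtime.MemoryFailPoint` is intended for a similar kind of pre-check and may be worth evaluating as an alternative.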

Hi

Thank you for additional information. I managed to reproduce the problem on my side. Your request has been linked to the appropriate issue. You will be notified as soon as it is resolved.
Memory usage depends on document complexity and the destination file format, so it is difficult to say how much memory is needed to save a particular document. We will investigate the problem thoroughly on our side and provide you with more information.
Best regards.

Please advise if there is any progress on investigating this issue.
We are in the process of purchasing an Enterprise License for Aspose.Words, and the ability to deal with this issue is very important for the successful implementation of our reporting feature.
It is our hope that reporting with the Aspose component will become the main reporting mechanism within HP Quality Center, which is one of HP Software’s flagship and most successful business technology products.
Would very much appreciate any information that would help to move this issue forward.
Thanks and Regards
Daniel Finkelstein

Hi Daniel,

Thanks for your request. Unfortunately, at the moment I cannot provide you any additional information regarding this issue. I added your request into my monthly report, this will push the issue up in the priority list. We will let you know once the issue is resolved.
Best regards.

I am also having this issue with particular documents. I will attach one in a new private thread.
The error occurs on this line:
doc.Save(dstStream, SaveFormat.Html)

Hi Rick,

Thanks for your request. I already answered this question in the following thread
https://forum.aspose.com/t/77208
Best regards,

Sorry, I thought I had the latest version. After installing the latest version, the issue I was having is resolved.

Hi Daniel,
I had a look at your documents and how they behave in Aspose.Words.
My Testing
The 5MB DOCX file - if you look inside it has a 111MB document.xml that Aspose.Words needs to read and parse. That is quite a lot of data.
I was able to successfully load the file and save it both as DOC and DOCX on a machine with 1GB of memory, with Visual Studio running and taking some memory as well.
When the test was running I noticed peak usages of memory by the process to be around 500MB. This is not too bad I think considering the resulting DOC file is 50MB.
The resulting DOCX file is just great and it is as good as your original. So I was not yet able to reproduce the problem you are having.
Your Document
It is hard to give a recommendation about the size of document Aspose.Words will be able to process comfortably.
Your document has 2,500 pages (which is quite a lot) and it also contains a lot of tables with rich formatting - this all contributes to the size of the document. The tables store lots of border, shading, width and other attributes for all cells. A 10,000 page document consisting of reasonably simple paragraphs would probably not have been a problem to process.
But as a rule, I would try to keep documents processed by Aspose.Words under 1,000 pages these days. Such documents will process “reasonably quickly” in “reasonable amounts of RAM”. You can of course process large documents if you are prepared to throw more memory and processing time at them.
Why save as HTML Works and save as DOCX or DOC does not
As I mentioned above, I was able to save successfully in all formats in my test. But you have probably made a valid point about memory fragmentation and finding a large block of contiguous memory.
When Aspose.Words saves as HTML it saves into the provided stream directly, e.g. into a file stream. Memory is only used for housekeeping tasks during the save and is all garbage collected when the save is completed.
However, when saving to DOCX and DOC (and to RTF) - Aspose.Words uses memory differently.
DOCX is a ZIP file consisting of multiple document parts. Aspose.Words writes each document part into a memory stream and only when writing is complete compresses them all into the ZIP file. In the case of your document it will require at least 111MB to keep the document.xml stream in memory. Knowing how MemoryStream allocates memory - I think it doubles the size of the byte array every time it needs to grow - it might require as much as 200MB during save. That looks like the symptom you are experiencing.
Saving to DOC is somewhat similar. The DOC file is a Structured Storage document that consists of streams inside it. All streams are at first collected in memory and then written to the structured storage (our own implementation).
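
Editorial note: the MemoryStream growth described above can be observed with a small sketch. This is not Aspose.Words code; `MemoryStreamGrowth` and its methods are hypothetical names for illustration. The projection assumes that the buffer starts at 256 bytes and doubles on each resize, which matches the behaviour of the .NET implementations I am aware of when data arrives in small writes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MemoryStreamGrowth
{
    // Writes totalBytes into a MemoryStream in chunkSize pieces and records
    // each distinct Capacity value the stream reports along the way.
    public static List<int> ObserveCapacities(int totalBytes, int chunkSize)
    {
        var capacities = new List<int>();
        var chunk = new byte[chunkSize];
        using (var stream = new MemoryStream())
        {
            int written = 0;
            while (written < totalBytes)
            {
                int count = Math.Min(chunkSize, totalBytes - written);
                stream.Write(chunk, 0, count);
                written += count;
                if (capacities.Count == 0 ||
                    capacities[capacities.Count - 1] != stream.Capacity)
                {
                    capacities.Add(stream.Capacity);
                }
            }
        }
        return capacities;
    }

    // Projects the final buffer size for a payload written in small pieces,
    // ASSUMING the buffer starts at 256 bytes and doubles on every resize.
    public static long ProjectFinalCapacity(long payloadBytes)
    {
        long capacity = 256;
        while (capacity < payloadBytes)
        {
            capacity *= 2;
        }
        return capacity;
    }
}
```

Under that assumption, `ProjectFinalCapacity(111L * 1024 * 1024)` gives 134,217,728 bytes (128 MiB). While the stream copies into that new buffer, the previous 64 MiB buffer is still alive, so roughly 192 MiB is held at the peak, and the 128 MiB array must fit into a single contiguous block. That is in line with the estimate of "as much as 200MB" above.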
Summary
In general, I agree that the way Aspose.Words writes DOCX and DOC into streams in memory first should be optimized so that they are written to the disk straight away.
But this is not likely to be an easy fix we can do right now. We will need to take some time to research and assess the changes required. For example, I am not sure if the ZIP code we use for DOCX will allow “simultaneous” writing of multiple ZIP streams into a file. When the document model is being saved, multiple memory streams are created at the same time, one for each document part; whether the same will be possible when writing the streams to the ZIP file straight away, I don’t yet know. Maybe only the main document part, document.xml, will need to be written directly. But what about large images or embedded objects that might be present? We would want to write them directly too. So there is a bit of work.
Therefore I do not want to promise a definite timeframe for fixing. I can only say we will try to address this sometime this year.
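
Editorial note: to illustrate the idea of writing a large document part straight into the ZIP file, here is a minimal sketch using the .NET `System.IO.Compression.ZipArchive` class (available from .NET Framework 4.5). This is a swapped-in library, not the ZIP code Aspose.Words uses, and `StreamedZipSketch` is a hypothetical name. My understanding is that in Create mode `ZipArchive` allows only one entry to be open for writing at a time, which is the same "simultaneous streams" constraint raised above.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StreamedZipSketch
{
    // Streams `repeat` copies of `fragment` into one ZIP entry on disk,
    // without first collecting the whole part in a MemoryStream.
    public static void WriteLargePart(string zipPath, string entryName,
                                      string fragment, int repeat)
    {
        using (var file = new FileStream(zipPath, FileMode.Create))
        using (var archive = new ZipArchive(file, ZipArchiveMode.Create))
        {
            ZipArchiveEntry entry = archive.CreateEntry(entryName);
            // UTF8Encoding(false) avoids writing a byte order mark.
            using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
            {
                for (int i = 0; i < repeat; i++)
                {
                    writer.Write(fragment);
                }
            }
        }
    }

    // Reads back the uncompressed length of an entry, or -1 if it is missing.
    public static long ReadEntryLength(string zipPath, string entryName)
    {
        using (var file = File.OpenRead(zipPath))
        using (var archive = new ZipArchive(file, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = archive.GetEntry(entryName);
            return entry == null ? -1 : entry.Length;
        }
    }
}
```

For example, `StreamedZipSketch.WriteLargePart(path, "word/document.xml", "<w:p/>", 1000)` writes the part sequentially; a second part would have to be written after the first entry's stream is closed, so parts produced in parallel would still need buffering or reordering.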

Hi Roman
Thanks for the detailed analysis of this problem and breakdown of the issues involved. Your diagnosis of the symptoms and problem we have when saving large Docx and Doc files certainly matches the problems that we are experiencing.
The reason we experience these issues even with a document of this size (5MB Doc File) is probably that we generate reports within a thick client application that uses 300MB of memory in normal running.
I would like to stress again the importance of this issue for our customers, who often run very large reports.
Additionally, we have found that while a large report may succeed the first time it is run in the process, it becomes more likely to exhibit this problem the more often it is run in the same process. This is particularly problematic because customers become confused as to exactly what size of report is too large for us to handle. We are therefore struggling to recommend to our customers at what level to set their data filters to ensure that report generation will be successful.
Therefore please advise if you have made any progress with investigating possible fixes, or whether you have completed any design for resolving this issue.
Thanks again
Daniel Finkelstein
HP Quality Center - Senior Software Engineer

Hi

Thanks for your request. Both issues (regarding DOCX and DOC) are still unresolved. Unfortunately, there is also no timeline specified for these issues in our defect database, so I cannot promise you a date for the fix. My apologies for the inconvenience.
Best regards.

Hi,

Just following up on whether any progress has been made on this issue.

We are attempting to use Aspose.Words for a complex merge and are experiencing similar out-of-memory exceptions. This is using v9.4 of the components.

Hi Toby,

Thanks for your inquiry. The issues are still unresolved. Both issues are scheduled to be fixed in 10.1.0 version of Aspose.Words, which will come out roughly in 6-8 weeks. We will notify you.
Best regards,

Thanks Alexey.

We have a simple test harness with a sizable word template & data source that we’d be happy to share with you if it would assist with ‘real-life’ examples/volumes for your testing.

Cheers,
–Toby.

Hi Toby,

It would be great if you share your test project. You can attach it here.
Best regards,

Hi Alexey,
I have attached for you the test harness that Toby mentioned and the Template which we are experiencing the out of memory exception with. I have also attached a sample dat file for you.
I have removed our licence file from the test harness before uploading as it says not to send a licence.
If you have any questions about it then let me know.
Chris

Hi Chris,

Thank you for additional information. We will let you know once the issues are resolved.
Best regards,


The issues you have found earlier (filed as 15430) have been fixed in this update.