Memory consumption issues in Aspose.Words

Hello there,

I am running into some memory problems when using Aspose.Words for .NET (version 9.4) to save a Word document (either .DOC or .DOCX) to PDF format.

It seems that for big documents (about 20 MB DOCX) memory consumption, while executing the document.Save method, starts to build up until one of two things happens:

- If I am running the code inside a web application (which is the case for the product we are using Aspose.Words in), after a while I get a System.OutOfMemoryException in my web app and the document does not get converted.

- If I am running the code inside a small Windows Forms test application (it just opens the document and saves it as PDF, nothing else), it takes a while but does manage to convert the document. However, watching Task Manager during the conversion, I notice that the memory used by the process running Aspose.Words climbs until it reaches 1 GB and then stays there, even after the document is converted.

Given these situations what I would like to know is the following:

- Can you advise on the best way to dispose of your components in code, so that we can prevent them from holding memory resources when they are no longer needed?

- Do you know of any reason why it works in a Windows Forms app and not in a web app? Could it have something to do with IIS (version 6) limiting the amount of memory used by a running web app? The app is running on IIS in Mixed Mode; I will try Isolated Mode as well.

- I noticed in other similar threads that you advise against using such large documents. Is there a limit on the size of documents that your component can handle without running into memory problems?

- Do you plan to correct this memory consumption issue in future releases of Aspose.Words? If so, can you give me an estimate of when these issues will be corrected?

We are using Aspose.Words on a product, and even though we can advise our clients not to use large documents we can’t really guarantee that they won’t, so we need to have a way to prevent such high memory consumption when dealing with large documents.

Thank you for your time,

Joao Maia

Hi

Thank you for your interest in Aspose.Words.

  1. You do not need to dispose Aspose.Words objects. Memory should be released automatically once an object is no longer used. If memory is not released, there might be something wrong, so you should attach your document and code here for testing.

  2. The only reason I can see for such a problem is lack of memory. Also, if you convert such a large document to PDF, it will have approximately 16000 pages (if the document is mostly text). Aspose.Words renders approximately 10 pages per second, so rendering such a document should take about 1600 seconds, or 27 minutes. I suppose that is too long for a web request; I think you will get a RequestTimeout exception.

  3. There is no limit on document size. The only limit is the amount of available memory on your side.

  4. You should attach your document here for testing. We will check the issue and provide you more information.

Now let me explain why Aspose.Words uses more memory than the document size. After loading, the document is stored in memory as a DOM (Document Object Model). If the document contains mostly text, Aspose.Words requires approximately 40 times more memory than the original DOCX file size (10 times more than the DOC file size). So in your case, if your DOCX document is 20MB, you need about 800MB of memory to load it. Then, when you save the document to PDF, Aspose.Words needs to build the document layout, which is also stored in memory. So I think you need approximately 2GB of available memory to convert such a huge document to PDF.
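
As a rough illustration of the estimate above, the arithmetic can be sketched as follows. This is only a sketch: the class and method names are hypothetical (they are not part of the Aspose.Words API), the multipliers are the rule-of-thumb figures quoted in this reply for text-heavy documents, and the extra memory needed for the PDF layout is not modelled.

```csharp
using System;

public static class MemoryEstimate
{
    // Rule-of-thumb multipliers quoted above for mostly-text documents (assumed, not measured).
    public const int DocxFactor = 40;
    public const int DocFactor = 10;

    // Rough memory (in MB) needed to hold the DOM of a text-heavy document.
    public static long EstimateDomMegabytes(long fileSizeMb, bool isDocx)
    {
        return fileSizeMb * (isDocx ? DocxFactor : DocFactor);
    }

    public static void Main()
    {
        Console.WriteLine(EstimateDomMegabytes(20, true));   // prints 800
        Console.WriteLine(EstimateDomMegabytes(20, false));  // prints 200
    }
}
```

A check like this could be used to reject or queue uploads whose estimated footprint exceeds what the server can spare, although documents with many images may behave differently from the text-heavy case assumed here.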

By the way, even MS Word does not like such large documents.

Best regards,

Hello Aleksei, and thank you for your reply.

In order for you to try and understand the problem I am attaching the following:

- The document I am trying to convert

- The code (actually the solution)

- A few screen captures of my task manager, before and after running the code.

As you can see the code is very simple, just a page with a button that, when clicked, loads up the document and then converts it into PDF, nothing else.

Regarding what you say in #2, I am not getting any timeouts from the web server. What I am getting is a System.OutOfMemoryException and not RequestTimeoutException.

The 20 MB document is not just text; it has some images as well. Even though I agree that these documents are large, MS Word opens this document with no problems and I can work on it normally on my machine.

Even so, your product is being used by us in a document management platform; therefore, we do not know whether our clients will be using small or large documents. We need a solution that, within reasonable limits, will work with both large and small documents.

Notice the following in the task manager screenshots that I sent:

- I am not running any programs on my machine other than Visual Studio 2010 and Internet Explorer (to access the web app), plus the other services that Windows (and my company) requires.

- Before I click the button on the web app, the ASP.NET worker process is occupying only about 30 MB, and the total memory occupied by all applications and services is about 927 MB, which gives me about 2GB available.

- After I click the button I immediately see the ASP.NET worker process occupying more memory and some CPU as well, which is perfectly normal.

- After the exception, though, the ASP.NET worker process stops consuming CPU (which is normal because it stopped whatever it was doing); however, its memory occupation remains at about 1GB (TaskManagerAfter_1.jpg).

- Even though the process is occupying 1GB memory, total memory occupation is about 1.98GB, which means I still have 1GB available, therefore I don’t understand why I get the OutOfMemory exception.

I suspected that the memory occupation after the exception could be caused by the exception itself: maybe resources were not being freed when the exception was raised. However, I tried running the same code in a Windows Forms application (same simple app, just one button that loads the document and then saves it to PDF), and it converts the document with no OutOfMemory exception. When the conversion ends, though, the behavior in terms of memory occupation is exactly the same (about 1GB occupied by the Windows Forms app).

Any help you can give me will be greatly appreciated.

Regards,

Joao Maia

Hi

Thank you for the additional information. I cannot reproduce the OutOfMemoryException on my side, but I do see that memory is not released for some reason. Actually, the memory is no longer in use, but the garbage collector does not reclaim it for some reason. You can easily fix this by calling the GC.Collect() method after converting your document:

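using System;
using Aspose.Words;

private static void ConvertDocument()
{
    // Load the whole document into the Aspose.Words DOM.
    Document doc = new Document(@"C:\Temp\BigDocument.docx");
    // The .pdf extension selects PDF as the output format.
    doc.Save(@"C:\Temp\out.pdf");
    // Ask the garbage collector to reclaim unused memory right away.
    GC.Collect();
}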

Hope this helps.

Best regards,

Hello there,

Did you try with the solution I sent you, or did you use your own solution?

One thing I noticed with the solution I sent is that when running the web app from within Visual Studio 2010 (either by pressing F5 or by selecting View in Browser), I don’t get the Out of Memory Exception. I believe this is because the application is not running under IIS, but rather under the web server that comes with Visual Studio (you can easily see this because the process in Task Manager is not aspnet_wp.exe, but rather WebDev.WebServer20.exe).

However, when I publish the application to a web server (in this case localhost), and then run it from that web server (http://localhost/AsposeTestWeb), that’s when I get the Out of Memory Exception. It seems that IIS is somehow limiting the amount of used memory, and somehow this causes the exception.

Can you please check whether you also see this behavior?

Thank you for your time and best regards,

Joao Maia

Hi

Thank you for the additional information. I tried both scenarios, i.e. running the application in Visual Studio and running the application in IIS. In both cases I was able to convert your document to PDF without any problems.

I have 4GB of RAM on my test PC, and I ran the test on Windows 7 64-bit.

Best regards,

Hi,

We have a similar issue with the memory consumption of Aspose.Words in IIS.
We use Aspose.Words to scan Word documents for hyperlinks and media items and parse them with our own hyperlinks/media.
Some of our users get out of memory exceptions when saving their Word documents through our web product; the documents that fail are around 30 MB in size.
We see that the application pool then uses around 1 - 1.5GB of memory to save the document.
So if 2 users do this, we already need 3 GB of memory to be free on the web server!?

What we would like to see is that the whole document is not loaded, but only the parts of the DOM that we call (hyperlinks/media in our case).

I hope you can find a solution for this problem.

Regards Guy

Hi

Thanks for your request. Unfortunately, there is currently no way to process such large documents with Aspose.Words without using a lot of memory. A 30MB document is simply too large.

Best regards,

Does Aspose.Words need to load the whole document into memory before any operation, such as saving to text, is possible? That is, is there no way to stream a DOC/DOCX file in and stream the response out?

Any other tips on converting very large documents to text?

Thanks.

Hi Dave,

Thanks for your inquiry.

I’m afraid that when you load a document into Aspose.Words, the whole document is loaded into memory and stored in the DOM so that it can be used. There is no way to stream a document from one format to another without loading it into the DOM.
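
To make the point concrete, a text conversion would look roughly like the sketch below. It assumes the Aspose.Words for .NET assembly is referenced; the class name, method name, and paths are hypothetical. The Document constructor builds the full DOM before Save can write any output, so memory use follows the size of the whole document rather than the size of the text produced.

```csharp
using Aspose.Words;

public static class TextExport
{
    // Hypothetical helper: converts a DOC/DOCX file to plain text.
    public static void ConvertToText(string inputPath, string outputPath)
    {
        // The constructor parses the entire file into the in-memory DOM.
        Document doc = new Document(inputPath);
        // Only after the DOM is built is the content written out as text.
        doc.Save(outputPath, SaveFormat.Text);
    }
}
```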

Thanks,

We’re encountering a similar issue.

  1. If the Windows paging file is increased, is it possible to solve the issue? Are any changes in code needed to accommodate this scenario?

  2. Can the DOM structures be created on disk (as an option for large files)? Automatic assignment of space for DOM structures, in memory or on disk as a function of document size, would be the ideal solution.