System.OutOfMemoryException when loading and saving a large DOC file

Hi Aspose,
When I load and save a large Word document (http://www.fileshare.ro/e3172439276) using Aspose.Words 17.2.0, an exception is thrown. The document opens fine in Microsoft Office. I’m aware that this document is large and contains a large number of pictures. Loading the file takes 800 MB of RAM, and the exception is thrown when saving the file.
My sample code:

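try
{
    Aspose.Words.Document doc = new Aspose.Words.Document(inputDocFile);
    doc.Save(outputDocFile);
}
catch (Exception e)
{
    // Log the error instead of swallowing it; this is where the OutOfMemoryException surfaces.
    Console.WriteLine(e);
}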

Thank you.

Hi there,

Thanks for your inquiry. We have introduced an option, SaveOptions.MemoryOptimization, to reduce memory consumption in scenarios like this. Setting it to true reduces the document's memory footprint, at the cost of extra processing time. This optimization is applied only during the save operation. Please try this property; we hope it helps.

If the problem persists, please let us know which output format you are saving to. We will investigate the issue and get back to you with more information.

Thanks for your reply. I tested the MemoryOptimization option, and the OutOfMemoryException no longer appears. The sample file is linked in my first post.
There are two other issues:
- 1st issue: The process's memory usage is very high (1034 MB) and does not drop after ConvertTo() returns. More specifically:
+ 1st ConvertTo: process memory is 1034 MB
+ 2nd ConvertTo: process memory is 1485 MB
+ 3rd ConvertTo: process memory is 1734 MB
+ 4th ConvertTo: process memory is 734 MB
My question is: Is there a way to release the memory manually when ConvertTo finishes, instead of waiting for the garbage collector?
- 2nd issue: Converting DOC to PDF has taken more than an hour and still has not finished.
My question is: Is there a timeout option?

using System;
using Aspose.Words;
using Aspose.Words.Saving;

class Program
{
    static void Main(string[] args)
    {
        License wordlicense = new License();
        wordlicense.SetLicense("Aspose.Total.lic");
        // Verbatim strings: in "F:\big.doc" the "\b" would be parsed as a backspace escape.
        string strInFile = @"F:\big.doc";
        // No extension here: ConvertTo appends ".doc" or ".pdf" itself.
        string strOutFile = @"F:\outfile";
        for (int i = 0; i < 10; i++)
        {
            ConvertTo(strInFile, strOutFile, "doc");
        }
    }

    static bool ConvertTo(string strInFile, string strOutFile, string strType)
    {
        try
        {
            Document doc = new Document(strInFile);
            if (strType == "doc")
            {
                // DocSaveOptions writes binary .doc; WordML2003SaveOptions would write Word 2003 XML.
                DocSaveOptions option = new DocSaveOptions();
                option.MemoryOptimization = true;
                doc.Save(strOutFile + ".doc", option);
            }
            else if (strType == "pdf")
            {
                PdfSaveOptions option = new PdfSaveOptions();
                option.MemoryOptimization = true;
                doc.Save(strOutFile + ".pdf", option);
            }
        }
        catch (Exception e)
        {
            // Report the failure instead of hiding it.
            Console.WriteLine(e.Message);
            return false;
        }
        return true;
    }
}

Hi there,

Thanks for your inquiry. Please note that performance and memory usage both depend on the complexity and size of the documents you are processing. When rendering a document to a fixed-page format (e.g. PDF), Aspose.Words needs to build two models in memory: one for the document itself and one for the rendered document.

Aspose.Words imposes no memory limit of its own. However, loading a huge Word document into the Aspose.Words DOM requires correspondingly more memory, because the whole document must be held in memory during processing. As a rule of thumb, Aspose.Words needs about 10 times the size of the original document to build its DOM in memory.
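To make the rule of thumb concrete: an 80 MB document would need roughly 800 MB for its DOM. The helper below is a minimal sketch of a pre-flight check built on that rule. The class name, the fixed factor of 10 and the budget parameter are assumptions for illustration only; they are not part of the Aspose.Words API, and real memory use depends on the document's content.

```csharp
using System;

// Hypothetical helper based on the "about 10x the file size" rule of thumb.
// The factor is an approximation, not a guarantee.
public static class DomMemoryEstimate
{
    // Assumed multiplier taken from the 10x guideline.
    public const long RuleOfThumbFactor = 10;

    // Approximate number of bytes needed to hold the document's DOM.
    public static long EstimateBytes(long fileSizeBytes)
    {
        if (fileSizeBytes < 0)
            throw new ArgumentOutOfRangeException(nameof(fileSizeBytes));
        return checked(fileSizeBytes * RuleOfThumbFactor);
    }

    // True if the estimate fits within the memory budget the caller allows.
    public static bool FitsInBudget(long fileSizeBytes, long budgetBytes)
    {
        return EstimateBytes(fileSizeBytes) <= budgetBytes;
    }
}
```

In practice the file size would come from something like `new System.IO.FileInfo(path).Length`, and the budget from whatever the application can afford to spend on a single document.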

We’re always working on improving performance, but rendering will always be slower than saving to flow formats such as DOC or DOCX.

truongminhlong:

My question is: Is there a way to release the memory manually when ConvertTo finishes, instead of waiting for the garbage collector?

When the document is closed, all the DOM data is purged from memory during the next garbage collector cycle.

truongminhlong:

- 2nd issue: Converting DOC to PDF has taken more than an hour and still has not finished.

My question is: Is there a timeout option?

Using the Thread.Abort method to interrupt Aspose.Words is unsafe. Aspose.Words provides a dedicated interruption mechanism, the InterruptionToken class. Please use the following code example, which saves a document with a timeout.

Hope this helps you. Please let us know if you have any more queries.

using System;
using System.Threading;
using Aspose.Words;

/// <summary>
/// My Aspose document.
/// </summary>
public class MyDocument : Aspose.Words.Document
{
    public MyDocument(string fileName) : base(fileName) { }

    public bool TryToSave(string fileName, int timeout)
    {
        InterruptionToken token = new InterruptionToken();
        bool finished = SaveWithTimeout(token,
        () =>
        {
            token.BindToCurrentThread();
            try
            {
                Save(fileName);
            }
            catch (Exception)
            {
                // An interrupted Save ends with an exception; any other failure is reported the same way.
                Console.WriteLine("Interrupted");
            }
        }, timeout);
        return finished;
    }

    private bool SaveWithTimeout(InterruptionToken token, ThreadStart threadStart, int timeout)
    {
        Thread workerThread = new Thread(threadStart);
        workerThread.Start();
        bool finished = workerThread.Join(timeout);
        if (!finished)
        {
            token.Interrupt();
        }
        return finished;
    }
}
// MyDir is the folder holding the input and output files; the timeout is in milliseconds.
MyDocument myDoc = new MyDocument(MyDir + "in.docx");
bool done = myDoc.TryToSave(MyDir + "Out.pdf", 1000);
Console.WriteLine(done ? "Converted" : "Interrupted by timeout.");
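The same pattern (run the save on a worker thread, wait with Thread.Join(timeout), then signal interruption) can be tried without Aspose.Words. The sketch below is a hypothetical, self-contained analogue: .NET's CancellationTokenSource stands in for InterruptionToken, and LongRunningWork is an invented placeholder for doc.Save(...) that checks the token between steps, which is presumably what Aspose.Words does internally with a bound InterruptionToken. Unlike the example above, this sketch also waits for the worker to stop after requesting cancellation, so nothing is still running when it returns.

```csharp
using System;
using System.Threading;

// Hypothetical analogue of the InterruptionToken example, using only the .NET BCL.
public static class TimeoutRunner
{
    // Runs `work` on a worker thread and waits up to timeoutMs for it.
    // On timeout it requests cancellation, then waits for the worker to stop.
    // Returns true only if the work ran to completion.
    public static bool RunWithTimeout(Action<CancellationToken> work, int timeoutMs)
    {
        using (var cts = new CancellationTokenSource())
        {
            bool completed = false;
            Exception failure = null;

            var worker = new Thread(() =>
            {
                try
                {
                    work(cts.Token);
                    completed = true;
                }
                catch (OperationCanceledException) when (cts.IsCancellationRequested)
                {
                    // Stopped because we asked it to; completed stays false.
                }
                catch (Exception ex)
                {
                    failure = ex; // rethrown on the calling thread below
                }
            });
            worker.Start();

            if (!worker.Join(timeoutMs))
            {
                cts.Cancel();  // signal the worker (the role of token.Interrupt())
                worker.Join(); // wait until it has actually unwound
            }

            if (failure != null)
                throw new InvalidOperationException("The work failed.", failure);
            return completed;
        }
    }

    // Hypothetical stand-in for doc.Save(...): checks the token between steps.
    public static void LongRunningWork(int steps, int stepMs, CancellationToken token)
    {
        for (int i = 0; i < steps; i++)
        {
            token.ThrowIfCancellationRequested();
            Thread.Sleep(stepMs);
        }
    }
}
```

For example, `TimeoutRunner.RunWithTimeout(t => TimeoutRunner.LongRunningWork(1000, 50, t), 200)` returns false, because the job would need about 50 seconds but is cancelled after 200 ms.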

Thank you for your reply.
“When the document is closed, all the DOM data is purged from memory during the next garbage collector cycle.”

Does that mean the document is closed when “doc.Save(…);” executes?
Because this file’s DOM data took 1000 MB of RAM, I want to release that memory manually after doc.Save(…). Is there any way to release the DOM data manually?

Hi there,

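Thanks for your inquiry. Saving does not close the document: its DOM becomes eligible for garbage collection once nothing references the Document object any more, for example after ConvertTo returns. Please note that the memory may not be released until you close the application. You can call the GC.Collect method after saving the document to force a collection. Note, however, that Aspose.Words keeps some internal static objects alive even after GC.Collect, because static fields are GC roots.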
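A minimal sketch of that advice, using only the .NET base class library: once nothing references a large object graph, a forced full collection reclaims it. Here a set of byte arrays stands in for a Document's DOM, and the class and method names are hypothetical. GC.GetTotalMemory reports the managed heap only, so the process figure shown in Task Manager may stay higher, since the runtime can keep freed memory reserved for reuse.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical demo: a large, short-lived object graph stands in for a loaded Document.
public static class ReleaseMemoryDemo
{
    // Allocates about `megabytes` MB, measures the managed heap while it is alive,
    // and returns. NoInlining keeps the arrays' only reference inside this method,
    // so they are unreachable as soon as it returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static long AllocateAndMeasure(int megabytes)
    {
        byte[][] chunks = new byte[megabytes][];
        for (int i = 0; i < megabytes; i++)
        {
            chunks[i] = new byte[1024 * 1024];
        }
        long peak = GC.GetTotalMemory(false);
        GC.KeepAlive(chunks); // keep the arrays live up to the measurement
        return peak;
    }

    // Forces a full blocking collection, lets finalizers run, collects what they
    // released, and returns the managed heap size afterwards.
    public static long CollectNow()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return GC.GetTotalMemory(true);
    }
}
```

In the ConvertTo loop earlier in this thread, the equivalent would be calling GC.Collect() after ConvertTo returns, when `doc` is already out of scope; the Aspose.Words static objects mentioned above would still remain.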

Please let us know if you have any more queries.