Aspose.Words Memory Issue

Similar to this post…
https://forum.aspose.com/t/112850

We found that using the latest version of the components fixed the extreme memory issues we were seeing when using PDFKit.

However, in addition to .pdf files we’re also using Aspose.Words to generate thumbnails from Word documents uploaded to our Document Management system. Unfortunately, despite using the latest components (Aspose.Words v9.7.0.0, dated 07 Feb 2011), we’re finding similar problems with memory not being released - it’s not quite as bad as with .pdf files, but still present nonetheless.

Can this be looked at/logged as a defect?

Hi
Thanks for your request. Could you please attach a sample document that causes the problem and your code here for testing? We will check the issue and provide you with more information.
Best regards,

Hi, thank you for your reply. Here’s a bit more info. We use a very similar approach to extract thumbnails from Word documents as we do for .pdfs, and this works OK - but there is still a memory leak (see below).
For Word documents, I have provided a simplified version of our code… a graphic is extracted from a Word document, which is then uploaded to our Document Management database… this works fine, but the memory is not being released. Unfortunately, I cannot supply a test document as they are company confidential, but the memory increase occurs with any Word document we try (.doc or .docx)…
We are confident objects, streams etc. are being closed and disposed of correctly… because of the behaviour we saw with PDFKit before your fixes to it - i.e. nothing we tried released the memory until we tried your updated .dll, and then it worked straight away… We believe a similar thing is happening here with your Words component; although the memory leaks aren’t as big, they still exist.

// Main Calling class...
public class UploadThumbnail
{
    public void Execute(string FileName)
    {
        Bitmap bmp;
        AsposeThumbnailCreatorWord thumbnailCreator = new AsposeThumbnailCreatorWord();
        bmp = thumbnailCreator.CreateThumbnail(FileName);
        UploadThumbnailToStore(bmp);
        bmp.Dispose();
        IDisposable disposableThumbnailCreator = thumbnailCreator as IDisposable;
        if (disposableThumbnailCreator != null)
        {
            disposableThumbnailCreator.Dispose();
        }
    }
    public void UploadThumbnailToStore(Bitmap bmp)
    {
        // Do upload to SQL Server filestream based db
    }
}
public class AsposeThumbnailCreatorWord : IDisposable
{
    public System.Drawing.Bitmap CreateThumbnail(string filePath)
    {
        // Do licensing stuff.. edited for brevity..

        // Get document
        Aspose.Words.Document doc = new Aspose.Words.Document(filePath);
        try
        {
            // create memory stream
            using(MemoryStream ms = new MemoryStream())
            {
                Aspose.Words.Saving.ImageSaveOptions options = new Aspose.Words.Saving.ImageSaveOptions(SaveFormat.Jpeg);
                options.JpegQuality = 50;
                // save image as jpeg in memory stream
                doc.Save(ms, options);
                Bitmap bmp = new Bitmap(ms);
                return bmp;
            }
        }
        catch (Exception exc)
        {
            throw new Exception(exc.Message);
        }
    }

    public void Dispose()
    {
        // Minimal Dispose so the class satisfies IDisposable and callers can
        // dispose it uniformly; the real implementation may release more here.
    }
}

Hi
Thank you for additional information. But, unfortunately, I cannot reproduce the problem on my side. I used the following code for testing:

[Test]
public void Test001()
{
    TestMemoryUsage(1000);
}
[Test]
public void Test002()
{
    TestMemoryUsage(2000);
}
[Test]
public void Test003()
{
    TestMemoryUsage(3000);
}
[Test]
public void Test004()
{
    TestMemoryUsage(4000);
}
[Test]
public void Test005()
{
    TestMemoryUsage(5000);
}
[Test]
public void Test006()
{
    TestMemoryUsage(6000);
}
private void TestMemoryUsage(int iterations)
{
    Console.WriteLine("Iterations = {0}", iterations);
    DateTime start = DateTime.Now;
    // Measure starting point memory use
    long memoryStart = System.GC.GetTotalMemory(true);
    for (int i = 0; i < iterations; i++)
    {
        Document doc = new Document(@"Test001\in.doc");
        using(MemoryStream ms = new MemoryStream())
        {
            ImageSaveOptions options = new ImageSaveOptions(SaveFormat.Jpeg);
            options.JpegQuality = 50;
            // save image as jpeg in memory stream
            doc.Save(ms, options);
            using(Bitmap bmp = new Bitmap(ms))
            {
            // Do something with bitmap
            }
        }
    }
    // Obtain measurements again after processing all the documents.
    long memoryEnd = System.GC.GetTotalMemory(true);
    Console.WriteLine("Memory used: {0} bytes = {1} MB", memoryEnd - memoryStart, (memoryEnd - memoryStart) / (1024 * 1024));
    Console.WriteLine("Test time: {0} seconds", (DateTime.Now - start).TotalSeconds);
}

Theoretically, if memory were not being released, memory usage should increase linearly with the number of iterations, just as the time does. But as you can see from the results, memory usage does not increase. Here are my results:

Tests.Test001 : Passed
Iterations = 1000
Memory used: 3321456 bytes = 3 MB
Test time: 106,4540888 seconds
Tests.Test002 : Passed
Iterations = 2000
Memory used: 1353356 bytes = 1 MB
Test time: 211,1390765 seconds
Tests.Test003 : Passed
Iterations = 3000
Memory used: 1353560 bytes = 1 MB
Test time: 316,4280987 seconds
Tests.Test004 : Passed
Iterations = 4000
Memory used: 1353548 bytes = 1 MB
Test time: 425,304326 seconds
Tests.Test005 : Passed
Iterations = 5000
Memory used: 1353548 bytes = 1 MB
Test time: 529,9013086 seconds
Tests.Test006 : Passed
Iterations = 6000
Memory used: 1353548 bytes = 1 MB
Test time: 633,969261 seconds

My test document contains text, images and tables.
Best regards,

Hi, and thank you for your detailed response. I have done some more testing based on your information above and incorporated the memory usage figures into the diagnostic trace files that our Document Management service generates…
Using Word documents between 1.5Mb and 3Mb, the “before” and “after” values reported by System.GC.GetTotalMemory(true) do indeed show that memory is being released after calling our Word thumbnail generation code…
I compared these values alongside a simple check of the memory usage in Task Manager, and initially the results are very positive… For each test, the Document Service was restarted to get it back to a default state…
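For reference, the before/after figures come from something like the following helper (a simplified sketch of our trace logging; the class and label names are just illustrative):

using System;
using System.Diagnostics;

public static class MemorySnapshot
{
    // Logs the managed heap size (what GC.GetTotalMemory reports after forcing a collection)
    // alongside the process working set (roughly the figure Task Manager shows).
    public static void Log(string label)
    {
        long managedBytes = GC.GetTotalMemory(true);
        long workingSetBytes = Process.GetCurrentProcess().WorkingSet64;
        Trace.WriteLine(string.Format("{0}: managed = {1} MB, working set = {2} MB",
            label, managedBytes / (1024 * 1024), workingSetBytes / (1024 * 1024)));
    }
}

We take one snapshot immediately before and one immediately after the thumbnail generation and upload.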

*** 1st Test!!
1st Upload…

Document Service before upload of 1.5Mb file… Task man shows approx 29Mb
Document Service after upload of 1.5Mb file… Task man shows approx 49Mb (we do expect some increases here for numerous reasons)
2nd Upload…

Document Service before upload of 1.5Mb file… Task man shows approx 49Mb
Document Service after upload of 1.5Mb file… Task man shows approx 49Mb
So far so good… :slight_smile:
3rd Upload…

Document Service before upload of 1.5Mb file… Task man shows approx 49Mb
Document Service after upload of 1.5Mb file… Task man shows approx 49Mb
Excellent! However… using larger Word files… Not typical, but definitely a possible requirement of our customers…
*** 2nd Test!!
1st Upload…

Document Service before upload of 105Mb file… Task man shows approx 29Mb
Mem usage then MASSIVELY increases to above 1GB, before finally dropping to approx 62Mb

I think we can live with the huge memory spike if the memory is definitely released; however, unlike the 1st test, even though there was a memory release, the overall footprint has increased…
Also, I tested another 105Mb file that was NOT a Word document, ensuring the thumbnail creator was not accessed… this was to help determine what other memory usage our Document Service introduced (it utilises SQL Server filestreaming), and to check whether we had introduced any leaks…
*** 3rd Test!!
1st Upload…

Document Service before upload of 105Mb file… Task man shows approx 29Mb
Document Service after upload of 105Mb file… Task man shows approx 36Mb
Document Service before upload of 105Mb file… Task man shows approx 36Mb
Document Service after upload of 105Mb file… Task man shows approx 36Mb
This means that when we are not using the thumbnail creator (Aspose.Words) we are not getting any memory leakage other than the initial increase we expect… My tests also seem to suggest that there is a possible problem in dealing with large Word documents, though I don’t know at what sizes your product encounters problems…
Hope this helps

Hi
Thank you for the additional information. Maybe the problem is neither in Aspose.Words nor in your code; maybe the garbage collector just has not collected the disposed objects yet, so some of them are still in memory. Have you tried calling the GC.Collect() method after converting your document?
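For example, something along these lines immediately after the conversion (just a sketch; GC.WaitForPendingFinalizers() is the usual companion call):

// Force a full collection once the Document and Bitmap objects are out of scope,
// so finalizable objects are actually reclaimed before the next measurement.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();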
Also, the problem might be in the document itself, so it would be great if you could share this document for testing.
Best regards,

OK, I will try calling GC.Collect() explicitly and re-test with the large document to see if there’s any effect.
As for getting you the 105Mb document for further testing, what do you suggest? Zipped up it is still 66Mb…

Hi
Please let me know the result of testing.
You can share your file on Rapidshare or any other file-sharing host and provide me with a link where I can download it. Or you can split the archive into parts and attach these parts here in the forum.
Best regards,

Calling GC.Collect() at the appropriate time after thumbnail creation did not have any effect.
During operation, using the 105Mb Word doc, the memory usage still increased to over 1GB before coming back down to approx 65Mb (from an initial value of 29Mb).
Rapidshare and most other file sharing services are blocked here; however, we do have access to Dropbox. Do you have an email address that I can send a “shared folder” invite to? I can send you a link to the file in my public Dropbox folder, but would prefer a share (it’s a little more “secure”).

Hi
Thank you for the additional information. Please send an invitation.
Waiting for your input.
Best regards.

Hi,
I have created a folder share, sent an invite to that email address, and uploaded a zipped copy of the large Word document for testing.
Regards

Hi
Thank you for the additional information. However, on my side the memory is still released once the conversion process is finished.
Here are the new results of the tests on my side.

Iterations = 1
Memory used: 965383944 bytes = 920 MB
Test time: 150,5456107 seconds
Tests.Test002 : Passed
Iterations = 2
Memory used: 962729396 bytes = 918 MB
Test time: 287,7664593 seconds
Tests.Test003 : Passed
Iterations = 3
Memory used: 962729732 bytes = 918 MB
Test time: 425,2083205 seconds
Tests.Test004 : Passed
Iterations = 4
Memory used: 962729684 bytes = 918 MB
Test time: 565,9263691 seconds
Tests.Test005 : Passed
Iterations = 5
Memory used: 962729720 bytes = 918 MB
Test time: 704,813313 seconds
Tests.Test006 : Passed
Iterations = 6
Memory used: 962729732 bytes = 918 MB
Test time: 850,3566376 seconds

As I can see, memory usage does not increase here. The 918 MB, I suppose, is the size of the loaded document in memory (usually Aspose.Words needs about 10 times more memory to load a document than the original document size). It seems the GC did not have time to release this memory, which is why you see it still in use after the conversion. But in Task Manager all memory is released.
I am really sorry for the inconvenience. But maybe you can create a simple application that will allow me to reproduce the problem on my side. Thank you for your cooperation.
Best regards,

Hi Alexey,
Thanks for taking a look at this… I will see if I can create a simple application at this end to send to you… As I probably said, the architecture we’re using the Aspose components within is a bit complicated to distribute (clients connect to our custom Windows Service via WCF; this service streams the file into SQL Server 2008 filestream, generates a thumbnail from the source file (using Aspose), then streams that into SQL Server 2008 filestream as well).
To recap… The massive (temporary) memory spike is a worry, but now that you’ve explained it, that helps us to understand… The real issue is that not enough memory appears to be getting released (when viewed in Task Manager) after uploading large files, which is a possible customer requirement, resulting in a gradual memory leak during daily usage…
… We do expect a small overhead after the first upload, but then we expect it to stabilise - which is definitely what we see when using your PDF.Kit component…
So, from before…
“1st Upload…
Document Service before upload of 105Mb file… Task man shows approx 29Mb
Mem usage then MASSIVELY increases to above 1GB!! Before finally dropping to approx 62Mb…”
…in the above case, let’s say we always expect a first-time rise to approx 36Mb… somewhere along the line 27Mb isn’t getting released - at least as far as the figures reported in Task Manager are concerned.
When bypassing thumbnail creation for such a large file entirely… memory rises to 36Mb and stays there…

Hi Marc,
Thank you for the additional information. I will wait for the simple application.
In the meantime, for testing purposes, could you please try converting a few large documents one by one? Will the amount of used memory increase or not? Will you reach an OutOfMemoryException?
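Something along these lines would be enough (just a rough sketch; the file paths are placeholders):

// Convert several large documents back to back and log the working set after each pass,
// to see whether the figure Task Manager reports keeps growing.
// Requires: using System; using System.Diagnostics; using System.IO; using Aspose.Words;
string[] files = { @"C:\temp\large1.docx", @"C:\temp\large2.docx", @"C:\temp\large3.docx" };
foreach (string file in files)
{
    Aspose.Words.Document doc = new Aspose.Words.Document(file);
    using (MemoryStream ms = new MemoryStream())
    {
        Aspose.Words.Saving.ImageSaveOptions options =
            new Aspose.Words.Saving.ImageSaveOptions(Aspose.Words.SaveFormat.Jpeg);
        options.JpegQuality = 50;
        doc.Save(ms, options);
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Console.WriteLine("Working set after {0}: {1} MB",
        file, Process.GetCurrentProcess().WorkingSet64 / (1024 * 1024));
}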
Best regards,