Memory leak when converting Office docs to PDF

edtsoftware · April 24, 2012, 7:53am

We use Aspose.Cells, Words and Slides to convert Office documents to PDFs. A typical batch might contain tens of thousands of files. We notice that as a batch is being processed, its memory usage steadily increases until the point where we start getting OutOfMemoryException errors.

I have written a small application to demonstrate the problem (the source is attached to this post). There are three batches each of Word, Excel and PowerPoint files, with 10 files per batch (9 batches in total). When you click a button, the corresponding batch of 10 files is converted to PDF and saved in the system temp folder.

After each batch is processed, a garbage collection is run, then the resulting working set size for the application is displayed, along with the relative increase/decrease in the working set compared to its size prior to the batch being processed.

You will notice that each time a new batch is run, the working set size increases by a few MB. If you re-run a batch that has already been run, then the working set size stays roughly the same.

For example, when I run the first batch of Word files, the working set increases 23MB to 62MB. When I run the second batch, it increases to 64MB. The third batch increases it to 72MB. If I re-run the first, second or third batch, it stays at 72MB.

I would expect that when a batch of files is converted to PDF, the working set size should be roughly the same before and after the batch is run. It seems like the Aspose libraries are retaining data about files that they have processed.

Am I missing something obvious, like Dispose() methods that I should be calling? Is there any way to purge data that is stored about previously processed documents?

Thanks

edtsoftware · April 24, 2012, 7:55am

Btw, you will need to place an Aspose.Total license file in the output directory in order to run the application.

imran.rafique · April 25, 2012, 9:18am

Hi Reuben,

Thanks for your inquiry and sorry for the delayed response. We've started working on your query and will get back to you soon.

hassan.farrukh · April 25, 2012, 12:49pm

Hi Reuben,

I'm representing Aspose.Slides.

I have observed the memory leak issue with Aspose.Slides and have asked our development team to share their thoughts on it. As soon as I receive a response, I will share it with you.

I have also executed multiple runs and can see that the first run shows a substantial memory leak, while subsequent runs show only a minute one.

We are sorry for the inconvenience.

imran.rafique · April 25, 2012, 1:36pm

Hi Reuben,

Thanks for the query. First off, please note that the EmptyWorkingSet function removes as many pages as possible from the working set of the specified process. For more details please visit the Microsoft documentation.

Moreover, please try the following code snippet as a workaround and let us know how it goes on your side.

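// Usage: trim the working set, then read its new size.
MemoryHelper.ClearMemory();
long newWorkingSet = Environment.WorkingSet;

// Requires: using System; using System.Diagnostics; using System.Runtime.InteropServices;
static class MemoryHelper
{
    // Win32 API that removes as many pages as possible from the process working set.
    [DllImport("psapi.dll")]
    static extern int EmptyWorkingSet(IntPtr hwProc);

    public static void ClearMemory()
    {
        try
        {
            // Force a full, blocking collection before trimming the working set.
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            EmptyWorkingSet(Process.GetCurrentProcess().Handle);
        }
        catch
        {
            // Best effort: ignore failures (for example, if psapi.dll is unavailable).
        }
    }
}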

I hope this will help.

edtsoftware · April 26, 2012, 2:27am

Hi Hassan,

There seems to be a memory leak when processing new files that haven't been processed before. If you reprocess files that have already been processed in the same session, then there is little or no memory leak.

edtsoftware · April 26, 2012, 9:04am

Hi Imran,

Thanks for the code snippet. It was my mistake to measure the working set size, because the working set only includes the physical memory (RAM) being used by the process. Calling EmptyWorkingSet does indeed reduce the working set to about 1MB, because it causes almost all the memory to be swapped out to the swap file.

This isn’t of much benefit though, as the memory is later swapped into RAM again when it is accessed the next time a file is processed. All it’s really achieving is a lot of unnecessary swapping to and from disk.

Private memory is a better measurement to use, as it gives a better idea of the total amount of memory that the process is using (physical memory + paged memory in the swap file).

I have added a private memory counter to the application (updated source file attached). You'll see that the private memory keeps going up with each new document that is converted to PDF, despite calling GC.Collect() and EmptyWorkingSet(). Reprocessing documents that have already been converted to PDF in the same session causes the memory usage to remain more or less unchanged.
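
For anyone who wants to reproduce the measurement without the attachment, a minimal sketch of such a counter might look like the following. This is not the attached source: MemoryReport and its members are names made up for illustration, and it relies only on System.Diagnostics.Process.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not the attached application): reads the working set
// and private memory of the current process after a forced collection.
static class MemoryReport
{
    public static void Snapshot(out long workingSet, out long privateBytes)
    {
        // Collect, let finalizers run, then collect what they released.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        using (Process self = Process.GetCurrentProcess())
        {
            workingSet = self.WorkingSet64;          // physical memory (RAM) only
            privateBytes = self.PrivateMemorySize64; // private memory, resident or paged out
        }
    }

    // Formats a reading in KB together with its change since the previous reading.
    public static string Format(string label, long before, long after)
    {
        long deltaKb = (after - before) / 1024;
        return string.Format("{0}: {1} KB ({2}{3} KB)",
            label, after / 1024, deltaKb >= 0 ? "+" : "", deltaKb);
    }
}
```

Taking a snapshot before and after each batch and passing the two private memory values to Format gives the kind of before/after comparison described above.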

adam.skelton · April 27, 2012, 6:21am

Hi Reuben,

Thanks for your inquiry.

Our developer for Aspose.Words is currently looking into this issue. We will provide you some feedback as soon as it is available.

Thanks,

edtsoftware · May 2, 2012, 12:07am

In the next couple of weeks, we are looking at releasing the first version of our product to use the Aspose libraries. We are seeing instability when converting datasets in the order of 30,000 documents to PDF, which we believe is due to the memory issues I have raised. Real world datasets may be larger than this.

We would appreciate an update on how long you think it will take for fixes to be implemented.

Thanks

adam.skelton · May 4, 2012, 9:32am

Hi Reuben,

Thanks for your inquiry. We will inform you as soon as the developer has found the root of the issue.

Thanks,

phaselden · May 16, 2012, 9:52pm

Hi Adam

Any idea when there will be more information (a fix, schedule, or workaround etc) on this report? It seems like a pretty critical issue.

Cheers,

Phil

adam.skelton · May 21, 2012, 4:14am

Hi Phil,

Thank you for your patience.

The developer has taken a look into your issue and unfortunately was unable to locate any memory leaks in the Aspose.Words rendering engine.

Could you please run some tests on your side and see if you can reproduce the issue you are having with the simple code below:

for (int i = 1; i <= 3; i++)
{
    TestMem(i);
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Console.WriteLine("Press enter to continue...");
    Console.ReadLine();
}

private static void TestMem(int i)
{
    string sourceDirectoryPath = @"C:\AsposeMemoryLeak\AsposeMemoryLeakTest\TestData\Word" + i.ToString();
    string[] filePaths = Directory.GetFiles(sourceDirectoryPath);
    foreach (String filePath in filePaths)
    {
        if (filePath.EndsWith(".pdf"))
            continue;
        Document doc = new Document(filePath);
        PdfSaveOptions pdfSaveOptions = new PdfSaveOptions();
        doc.Save(filePath + ".pdf", pdfSaveOptions);
        pdfSaveOptions = null;
        doc = null;
    }
}

Thanks,

edtsoftware · May 30, 2012, 3:36am

Hi Adam,

I ran the code you posted, and here are my results (values taken from Task Manager):

Stage	Working Set	Private Working Set	Commit Size
Before processing	34,340 K	14,668 K	32,848 K
After processing Word 1	59,796 K	34,108 K	52,704 K
After processing Word 2	61,284 K	35,464 K	54,020 K
After processing Word 3	69,716 K	42,538 K	61,080 K

You can see the memory usage increases after each set of 10 documents is converted to PDF, despite forcing a garbage collection each time.

Thanks,

Reuben

adam.skelton · June 5, 2012, 6:17am

Hi Reuben,

Thanks for this additional information. I will do some more testing and have a chat to the developers again and get back to you.

Thanks,

adam.skelton · June 15, 2012, 1:15am

Hi Phil,

Thanks for waiting. The developer has investigated this issue again and has ruled out any memory leak within the rendering process. We have traced the private memory that is never released to the font cache system.

When a document is rendered, the fonts are loaded into memory and used during rendering. These are cached in memory to improve performance. We found that if you leave a conversion process running long enough, with many unique documents, the font cache becomes very large and this may lead to an OutOfMemoryException.
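
To illustrate why memory grows only with new documents, here is a toy cache that never evicts anything. It is purely an illustration and not the actual Aspose.Words font cache; ToyFontCache, the font names and the 64 KB allocation are all invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a font cache. NOT Aspose's implementation.
class ToyFontCache
{
    private readonly Dictionary<string, byte[]> loaded = new Dictionary<string, byte[]>();

    // Returns the cached data for a font, "loading" it on first use.
    public byte[] Get(string fontName)
    {
        byte[] data;
        if (!loaded.TryGetValue(fontName, out data))
        {
            data = new byte[64 * 1024]; // pretend this is parsed font data
            loaded[fontName] = data;    // kept until Clear() is called
        }
        return data;
    }

    public int Count { get { return loaded.Count; } }

    // Releases every cached entry so the garbage collector can reclaim it.
    public void Clear() { loaded.Clear(); }
}
```

Asking for a font that is already cached allocates nothing, which would match re-running the same batch. Each font seen for the first time adds an entry that stays alive until the cache is cleared.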

This means it is not a bug; it is the expected behavior after so many documents are converted. There are two options I can suggest to solve this:

Run Aspose.Words in a separate application domain and "restart" that domain when memory usage becomes high.

If that does not suffice, I can log a new issue for the development team to introduce a member to clear the font cache. However note that this may not be implemented straight away.
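
For the first option, a rough sketch might look like the following. It assumes the .NET Framework (creating application domains is not supported on .NET Core and later), and ConversionWorker, RecyclingConverter and the memory limit are hypothetical names and values. The actual Aspose call is left as a placeholder comment.

```csharp
using System;
using System.Diagnostics;

// Hypothetical worker that lives inside the child application domain.
public class ConversionWorker : MarshalByRefObject
{
    public void ConvertToPdf(string sourcePath, string pdfPath)
    {
        // Placeholder: perform the Aspose.Words conversion here, for example
        // new Aspose.Words.Document(sourcePath).Save(pdfPath);
    }
}

// Hypothetical host-side wrapper that unloads the child domain when the
// process's private memory exceeds a limit.
public class RecyclingConverter : IDisposable
{
    private readonly long limitBytes;
    private AppDomain domain;
    private ConversionWorker worker;

    public RecyclingConverter(long limitBytes) { this.limitBytes = limitBytes; }

    public void ConvertToPdf(string sourcePath, string pdfPath)
    {
        if (worker == null) Start();
        worker.ConvertToPdf(sourcePath, pdfPath);
        if (ShouldRecycle(CurrentPrivateBytes(), limitBytes)) Stop();
    }

    public static bool ShouldRecycle(long privateBytes, long limitBytes)
    {
        return privateBytes > limitBytes;
    }

    private static long CurrentPrivateBytes()
    {
        using (Process self = Process.GetCurrentProcess())
            return self.PrivateMemorySize64;
    }

    private void Start()
    {
        domain = AppDomain.CreateDomain("AsposeWorker");
        worker = (ConversionWorker)domain.CreateInstanceAndUnwrap(
            typeof(ConversionWorker).Assembly.FullName,
            typeof(ConversionWorker).FullName);
    }

    private void Stop()
    {
        worker = null;
        AppDomain.Unload(domain); // drops the domain's static state, caches included
        domain = null;
    }

    public void Dispose() { if (domain != null) Stop(); }
}
```

Private memory is measured for the whole process, not per domain, so the limit needs some headroom above the host's own usage. Otherwise the domain would be recycled after every document.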

Please let us know your thoughts.

Thanks,

aspose.notifier · July 2, 2012, 12:00am

The issues you have found earlier (filed as WORDSNET-6341) have been fixed in this .NET update and this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

edtsoftware · July 2, 2012, 7:18am

Hi Adam,

I see that the ticket WORDSNET-6341 was referenced in the release notes for Aspose.Words 11.5. Were any changes implemented, or was the issue just investigated?

Thanks,

Reuben

adam.skelton · July 2, 2012, 7:22pm

Hi Reuben,

Thanks for your inquiry. Yes, the developer completed his analysis, but once it was done we closed the issue as not a bug. Please see my earlier post in this thread, which explains this.

Please let us know how this sounds.

Thanks,

adam.skelton · July 29, 2012, 5:59am

Hi Reuben,

I happened to notice a particular remark on one of the FontSettings methods which may be useful to you.

The FontSettings.SetFontsSources method carries the remark "Setting this property resets the cache of all previously loaded fonts." You could use this method to reset the font cache instead of creating a separate application domain. Something like this would probably work:

FontSettings.SetFontsSources(FontSettings.GetFontsSources());

Please let me know if this helps.

Thanks,

edtsoftware · July 29, 2012, 8:21pm

Hi Adam,

Thanks for the suggestion.

After converting 100 Word documents to PDF, resetting the fonts after each conversion, there was 67MB of private memory in use, compared to 78MB in the same scenario without the resets.

It did slow things down a fair bit though. I guess I shouldn't call it after each conversion.
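
One way to do that would be to run the reset only every N conversions. Below is a minimal sketch of the idea. PeriodicReset is a made-up helper, not part of Aspose.Words, and it takes the reset as a delegate so it does not depend on any particular API.

```csharp
using System;

// Hypothetical helper: runs a reset action once every `interval` conversions.
class PeriodicReset
{
    private readonly int interval;
    private readonly Action reset;
    private int sinceLastReset;

    public PeriodicReset(int interval, Action reset)
    {
        if (interval < 1) throw new ArgumentOutOfRangeException("interval");
        if (reset == null) throw new ArgumentNullException("reset");
        this.interval = interval;
        this.reset = reset;
    }

    // Call after each conversion; returns true when the reset action ran.
    public bool AfterConversion()
    {
        sinceLastReset++;
        if (sinceLastReset < interval) return false;
        sinceLastReset = 0;
        reset();
        return true;
    }
}
```

With the call suggested above, it could be constructed as new PeriodicReset(50, () => FontSettings.SetFontsSources(FontSettings.GetFontsSources())), where 50 is an arbitrary interval that would need tuning against the slowdown.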

Cheers,

Reuben