We use Aspose.Cells, Words and Slides to convert Office documents to PDFs. A typical batch might contain tens of thousands of files. We notice that as a batch is being processed, its memory usage steadily increases until the point where we start getting OutOfMemoryException errors.
I have written a small application to demonstrate the problem (the source is attached to this post). There are three batches each of Word, Excel and PowerPoint files, with 10 files per batch (9 batches in total). When you click a button, the corresponding batch of 10 files is converted to PDF and saved in the system temp folder.
After each batch is processed, a garbage collection is run, then the resulting working set size for the application is displayed, along with the relative increase/decrease in the working set compared to its size prior to the batch being processed.
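A minimal sketch of the measurement described above, for readers without the attachment. This is not the attached source: `BatchMemoryProbe`, `Measure` and `Describe` are hypothetical names used only for illustration, and the batch itself is passed in as a delegate so the sketch does not depend on any Aspose calls.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not taken from the attached application.
static class BatchMemoryProbe
{
    // Runs one batch, forces a full garbage collection, and reports the
    // working set size in bytes before and after the batch.
    public static void Measure(Action runBatch, out long before, out long after)
    {
        before = Environment.WorkingSet;
        runBatch();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        after = Environment.WorkingSet;
    }

    // Formats the new size and the change relative to the previous size, in MB.
    public static string Describe(long beforeBytes, long afterBytes)
    {
        const double MB = 1024.0 * 1024.0;
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0:F1} MB ({1:+0.0;-0.0;0.0} MB)",
            afterBytes / MB,
            (afterBytes - beforeBytes) / MB);
    }
}
```

A button handler would call `BatchMemoryProbe.Measure(...)` with a delegate that converts the 10 files of the chosen batch, then display the result of `Describe(before, after)`.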
You will notice that each time a new batch is run, the working set size increases by a few MB. If you re-run a batch that has already been run, then the working set size stays roughly the same.
For example, when I run the first batch of Word files, the working set increases 23MB to 62MB. When I run the second batch, it increases to 64MB. The third batch increases it to 72MB. If I re-run the first, second or third batch, it stays at 72MB.
I would expect that when a batch of files is converted to PDF, the working set size should be roughly the same before and after the batch is run. It seems like the Aspose libraries are retaining data about files that they have processed.
Am I missing something obvious, like Dispose() methods that I should be calling? Is there any way to purge data that is stored about previously processed documents?
Thanks
Btw, you will need to place an Aspose.Total license file in the output directory in order to run the application.
I'm representing Aspose.Slides.
I've observed a memory leak in Aspose.Slides and have asked our development team to share their thoughts on it. As soon as I receive a response, I will share it with you.
I've also executed multiple runs and can see that the first run shows a substantial memory leak, while subsequent runs show only a minute one.
We are sorry for the inconvenience,
Hi Reuben,
long newWorkingSet = Environment.WorkingSet;

// Requires: using System; using System.Diagnostics; using System.Runtime.InteropServices;
static class MemoryHelper
{
    [DllImport("psapi.dll")]
    static extern int EmptyWorkingSet(IntPtr hwProc);

    public static void ClearMemory()
    {
        try
        {
            // Force a full collection, then trim the process working set.
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            EmptyWorkingSet(Process.GetCurrentProcess().Handle);
        }
        catch
        {
            // Failures are deliberately ignored.
        }
    }
}
Hi Hassan,
There seems to be a memory leak when processing new files that haven't been processed before. If you reprocess files that have already been processed in the same session, then there is little or no memory leak.
imran.rafique:
Hi Reuben,
Thanks for the query. First off, please note that the EmptyWorkingSet function removes as many pages as possible from the working set of the specified process. For more details, please see the Microsoft documentation. Moreover, please try the code snippet as a workaround and let us know how it goes on your side.
....
I hope this will help.
Hi Imran,
Thanks for the code snippet.
It was my mistake to measure the working set size, because the working set only includes the physical memory (RAM) being used by the process. Calling EmptyWorkingSet does indeed reduce the working set to about 1MB, because it causes almost all the memory to be swapped out to the swap file. This isn't of much benefit though, as the memory is later swapped into RAM again when it is accessed the next time a file is processed. All it's really achieving is a lot of unnecessary swapping to and from disk.
Private memory is a better measurement to use, as it gives a better idea of the total amount of memory that the process is using (physical memory + paged memory in the swap file).
I have added a private memory counter to the application (updated source file attached). You'll see that the private memory keeps going up with each new document that is converted to PDF, despite calling GC.Collect() and EmptyWorkingSet(). Reprocessing documents that have already been converted to PDF in the same session causes the memory usage to remain more or less unchanged.
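As a rough sketch of such a counter (not the code in the attached source; `ProcessMemory` and its method names are hypothetical), both figures can be read from `System.Diagnostics.Process`:

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration only.
static class ProcessMemory
{
    // Private memory of the current process in bytes: memory that cannot be
    // shared with other processes, whether resident in RAM or paged out.
    public static long PrivateBytes()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.PrivateMemorySize64;
        }
    }

    // Physical memory (RAM) currently used by the process, in bytes.
    public static long WorkingSetBytes()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return current.WorkingSet64;
        }
    }
}
```

Each call obtains a fresh `Process` object, so the values reflect the state at the time of the call rather than a cached snapshot.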
Hi Reuben,
In the next couple of weeks, we are looking at releasing the first version of our product to use the Aspose libraries. We are seeing instability when converting datasets in the order of 30,000 documents to PDF, which we believe is due to the memory issues I have raised. Real world datasets may be larger than this.
We would appreciate an update on how long you think it will take for fixes to be implemented.
Thanks
Hi Reuben,
Hi Adam
Hi Phil,
for (int i = 1; i <= 3; i++)
{
    TestMem(i);
    GC.Collect();
    GC.WaitForPendingFinalizers();
    Console.WriteLine("Press enter to continue...");
    Console.ReadLine();
}

private static void TestMem(int i)
{
    string sourceDirectoryPath = @"C:\AsposeMemoryLeak\AsposeMemoryLeakTest\TestData\Word " + i.ToString();
    string[] filePaths = Directory.GetFiles(sourceDirectoryPath);
    foreach (String filePath in filePaths)
    {
        if (filePath.EndsWith(".pdf"))
            continue;
        Document doc = new Document(filePath);
        PdfSaveOptions pdfSaveOptions = new PdfSaveOptions();
        doc.Save(filePath + ".pdf", pdfSaveOptions);
        pdfSaveOptions = null;
        doc = null;
    }
}
Hi Adam,
I ran the code you posted, and here are my results (values taken from Task Manager):
Stage                   | Working Set | Private Working Set | Commit Size
Before processing       | 34,340 K    | 14,668 K            | 32,848 K
After processing Word 1 | 59,796 K    | 34,108 K            | 52,704 K
After processing Word 2 | 61,284 K    | 35,464 K            | 54,020 K
After processing Word 3 | 69,716 K    | 42,538 K            | 61,080 K
You can see the memory usage increases after each set of 10 documents is converted to PDF, despite forcing a garbage collection each time.
Thanks,
Reuben
Hi Reuben,
Hi Phil,
The issues you have found earlier (filed as WORDSNET-6341) have been fixed in this .NET update and this Java update.
This message was posted using Notification2Forum from Downloads module by aspose.notifier.
Hi Adam,
I see that the ticket WORDSNET-6341 was referenced in the release notes for Aspose.Words 11.5. Were any changes implemented, or was the issue just investigated?
Thanks,
Reuben
Hi Reuben,
Hi Reuben,
Hi Adam,
Thanks for the suggestion.
After converting 100 Word documents to PDF, resetting the fonts after each conversion, there was 67MB of private memory in use, compared to 78MB in the same scenario without the resets.
It did slow things down a fair bit, though. I guess I shouldn't call it after each conversion.
Cheers,
Reuben