We are using .NET Framework 4.5 on Windows 7 with Visual Studio 2012 and the most recent version of the Aspose DLLs.
RAM usage seems to increase continuously while extracting text from a very large PDF page by page. Ideally, each page would take a constant amount of RAM and the memory profile would be flat or nearly flat over time. It seems that Aspose.Pdf is holding on to something internally, so reserved memory slowly grows with each page that is processed. Running the code below, we produced this memory profile with ANTS Memory Profiler 8.6. Note that garbage collection is explicitly triggered after each page; running the same extraction with other tools (iText) does produce a flat profile.
Here's the C# code we used:
public static IEnumerable<string> getFileContents(string filePath)
{
    if (File.Exists(filePath))
    {
        Document myDoc = new Document(filePath);
        string contents = "";
        // Create text device
        TextDevice textDevice;
        // Set text extraction options - set text extraction mode (Raw or Pure)
        TextExtractionOptions textExtOptions = new TextExtractionOptions(
            TextExtractionOptions.TextFormattingMode.Raw);
        foreach (Page pdfPage in myDoc.Pages)
        {
            // The using block closes the memory stream once the page text has been read
            using (MemoryStream textStream = new MemoryStream())
            {
                textDevice = new TextDevice();
                textDevice.ExtractionOptions = textExtOptions;
                // Convert a particular page and save its text to the stream
                textDevice.Process(pdfPage, textStream);
                contents = Encoding.Unicode.GetString(textStream.ToArray());
            }
            // Hand back one page of text at a time so the caller never holds the whole document
            yield return contents;
        }
    }
}
static void Main(string[] args)
{
    // Set the license for Aspose. Without this it will not fully extract text!
    AsposeTotalLicense asposeLic = new AsposeTotalLicense();
    asposeLic.ApplyLicensePdf();
    int i = 0;
    foreach (String s in getFileContents(@"C:\Users\ADS\Desktop\Aspose Tests\Aus.pdf"))
    {
        Console.WriteLine(i++);
        GC.Collect();
    }
}
Here is a link to the file in question. It is that large only to give the problem enough time to materialize. We are aware that this is a scanned PDF that contains no text; it is here only to show the increasing trend in RAM use over time.
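For anyone who wants to check the trend without a profiler, here is a minimal sketch of how the per-page growth could be logged from code. `MemoryTrend`, `Sample`, and `AverageGrowthPerItem` are hypothetical helper names, not part of Aspose or of our project. It relies only on `GC.GetTotalMemory(true)`, which reports the managed heap after a forced collection, so it will not show any unmanaged memory that a profiler such as ANTS would include.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper, not part of Aspose: samples the managed heap once per item
// while a lazily evaluated sequence (such as a page-by-page extractor) is enumerated.
public static class MemoryTrend
{
    public static List<long> Sample(IEnumerable items)
    {
        var samples = new List<long>();
        foreach (object item in items)
        {
            // Passing true forces a full collection before the heap size is read,
            // so each sample reflects memory that is still reachable.
            samples.Add(GC.GetTotalMemory(true));
        }
        return samples;
    }

    // Average change in bytes per item between the first and the last sample.
    // A value near zero corresponds to a flat profile.
    public static double AverageGrowthPerItem(IList<long> samples)
    {
        if (samples.Count < 2)
        {
            return 0.0;
        }
        long delta = samples[samples.Count - 1] - samples[0];
        return delta / (double)(samples.Count - 1);
    }
}
```

Under these assumptions, passing the result of `getFileContents(...)` to `MemoryTrend.Sample` and printing `MemoryTrend.AverageGrowthPerItem(samples)` would give a single number to compare between Aspose.Pdf and another library on the same file.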