Hi,
When extracting text from the attached PDF with the following code, the memory usage of a 64-bit test console application grows to as much as 12 GB and the extraction never finishes.
    static void ExtractTextFromPDF()
    {
        Aspose.Pdf.License lic = new Aspose.Pdf.License();
        lic.SetLicense("Aspose.Total.lic");

        Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(@"c:\@@tmp\Layers_1.pdf");

        TextAbsorber textAbsorber = new TextAbsorber();
        textAbsorber.ExtractionOptions.FormattingMode = TextExtractionOptions.TextFormattingMode.Raw;
        textAbsorber.Visit(pdf);
        // Same result with:
        // pdf.Pages.Accept(textAbsorber);

        Console.WriteLine(textAbsorber.Text);
    }

    static void Main(string[] args)
    {
        try
        {
            ExtractTextFromPDF();
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
            Console.WriteLine("Workingset: {0}", System.Diagnostics.Process.GetCurrentProcess().WorkingSet64);
        }
        Console.ReadLine();
    }
Aspose_Memory.png (24.0 KB)
Another issue:
I am not able to access the layers of the document through the Page.Layers property. Page.Layers is always null.
I saw this behaviour with every document I checked.
    foreach (Aspose.Pdf.Page page in pdf.Pages)
    {
        if (page.Layers != null) // page.Layers is always null!
        {
            // do something
        }
    }
I was not able to upload the example PDF (22 MB).
Best Regards
Chris2Stein