Free Support Forum - aspose.com

High memory usage on PDF Document object

Hello. We are using Aspose.PDF in our product to search documents for certain patterns with regular expressions.
My IDE reports very large memory allocations for the PDF Document object.
Here is a code example that shows this:

using var pdfDocument = new Pdf.Document(inputStream);
var patterns = input.Texts.Aggregate((acc, next) => $"({acc})" + "|" + $"({next})");

var textFragmentAbsorber =
    new TextFragmentAbsorber(new Regex($"{patterns}", RegexOptions.IgnoreCase));

pdfDocument.Pages.Accept(textFragmentAbsorber);

…Process result

The IDE shows that 18425 MB is allocated on this line: pdfDocument.Pages.Accept(textFragmentAbsorber);

The test document is a 140-page document with a size of 1.6 MB.

What can we do about this? I think it may lead to problems once a large number of users are using this feature. We are still in development and going to production soon, and we don't want to run into memory issues.

@grinaypps

Memory usage depends on the document's size as well as the complexity of its structure. A small PDF can still have a complex structure and many elements on a single page, so large memory allocations may be required while processing it. Make sure to debug in x64 mode when working with larger and more complex documents.

Also, you can absorb text page by page instead of processing the whole document at once. This way memory consumption should be lower and the code will produce results sooner. For example:

foreach (Page page in pdfDocument.Pages)
{
    var textFragmentAbsorber = new TextFragmentAbsorber(new Regex($"{patterns}", RegexOptions.IgnoreCase));
    page.Accept(textFragmentAbsorber);
    // do some stuff
}
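
As a side note, the loop above builds a new Regex for every page, although the pattern never changes. A minimal sketch of building it once before the loop, using only the .NET base class library; PatternBuilder and BuildSearchRegex are hypothetical names (not part of Aspose.PDF), and the sketch assumes every search text is already a valid regex fragment, as in the original snippet:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternBuilder
{
    // Hypothetical helper: joins the search texts into one flat alternation,
    // (a)|(b)|(c), instead of the nested groups that Aggregate produces.
    // Assumption: each text is already a valid regex fragment.
    public static Regex BuildSearchRegex(IEnumerable<string> texts)
    {
        string pattern = string.Join("|", texts.Select(t => $"({t})"));
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample search texts, made up for illustration.
        Regex regex = PatternBuilder.BuildSearchRegex(new[] { @"\d{4}", "invoice" });

        Console.WriteLine(regex.ToString());            // prints (\d{4})|(invoice)
        Console.WriteLine(regex.IsMatch("INVOICE 17")); // prints True
    }
}
```

The resulting Regex instance can then be passed to each page's TextFragmentAbsorber inside the loop. Note that an empty list of texts yields an empty pattern, which matches everywhere, so you may want to skip the search in that case.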