Hi,
We are looking to redact thousands of objects in some documents. I have tried various ways to redact with Aspose.PDF and noticed that memory usage and execution time increase sharply as more redaction annotations are applied. For example, applying 400 redactions to a very small file takes ~1.7 GB of memory, whereas applying 800 takes 4x as long as applying 400 and uses upwards of 6.8 GB of memory. In other words, doubling the number of redactions roughly quadruples both time and memory, which suggests the cost grows quadratically rather than linearly. Is there a more efficient way to redact? Or is there a way to clean up the document as redactions are applied so that the execution time and memory spike doesn't happen?
I tried re-opening the file after every X redactions and then applying new redactions, but the time and memory cost per redaction still seemed to spike.
Repro code and file below (although this is happening on all files we’ve tried so far):
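using Aspose.Pdf;
using Aspose.Pdf.Annotations;

using (var document = new Document("SimpleText.pdf"))
{
    // Aspose.PDF page collections are 1-based, so this is the first page.
    var page = document.Pages[1];
    for (var j = 0; j < 1_000; j++)
    {
        // One 1x1 redaction rectangle per iteration, stepping along the diagonal.
        var annotation = new RedactionAnnotation(page, new Rectangle(j, j, j + 1, j + 1))
        {
            FillColor = Color.Black,
            Color = Color.Black,
            BorderColor = Color.Black
        };
        page.Annotations.Add(annotation);
        // Apply the redaction immediately; time and memory grow with each call.
        annotation.Redact();
    }
}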
Looped (still has memory spike issue, just slightly more manageable):
using (var document = new Document("SimpleText.pdf"))
{
    var page = document.Pages[1];
    for (var i = 0; i < 10; i++)
    {
        for (var j = 0; j < 100; j++)
        {
            var offset = (100 * i) + j;
            var annotation = new RedactionAnnotation(page, new Rectangle(offset, offset, offset + 1, offset + 1))
            {
                FillColor = Color.Black,
                Color = Color.Black,
                BorderColor = Color.Black
            };
            page.Annotations.Add(annotation);
            annotation.Redact();
        }
    }
}
SimpleText.zip (36.3 KB)
On a related note, there also seems to be an issue with releasing memory after redactions have been made: disposing the Aspose.Pdf.Document did not completely free the memory used within our .NET application.
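In case it helps with reproducing the disposal observation, here is a minimal sketch of one way the retained memory could be measured. The `RedactionProbe` class and its `Measure` method are hypothetical names introduced for illustration and are not part of Aspose.PDF. The helper only relies on standard .NET calls (`Stopwatch`, `GC`, `Process`), and it assumes the whole `using` block from the repro above is passed in as the delegate, so that the document has already been disposed when the numbers are read.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of Aspose.PDF.
static class RedactionProbe
{
    // Runs `work`, then reports elapsed time and the memory still held afterwards.
    public static void Measure(string label, Action work)
    {
        var stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();

        // Force full collections so that only memory that is still reachable
        // (or held natively) shows up in the figures below.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        long managedBytes = GC.GetTotalMemory(forceFullCollection: true);
        long privateBytes;
        using (var process = Process.GetCurrentProcess())
        {
            privateBytes = process.PrivateMemorySize64;
        }

        const long BytesPerMegabyte = 1024 * 1024;
        Console.WriteLine(
            $"{label}: {stopwatch.ElapsedMilliseconds} ms, " +
            $"managed {managedBytes / BytesPerMegabyte} MB, " +
            $"private {privateBytes / BytesPerMegabyte} MB");
    }
}
```

A call would look like `RedactionProbe.Measure("1000 redactions", () => { /* using block from the repro */ });`. Comparing the managed and private figures before and after such a call is one way to tell whether the leftover memory is managed heap that was never collected or memory held outside the managed heap.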
Thanks!