We are using Aspose.PDF for .NET to replace sensitive text in PDF documents. We want to replace every match of a set of regular expressions, on every page of the PDF, with ###.
When the PDF is small and has only a few matches, everything works fine. But when the PDF is large (many pages and multiple regular expressions to replace), the replacement takes too long and uses too much CPU.
This is our code, using the TextFragmentAbsorber and TextFragmentCollection classes:
public byte[] ReplaceSensitiveText(byte[] docPdf, List<string> regularExpressions)
{
    using MemoryStream ms = new MemoryStream(docPdf);
    using Document pdfDocument = new Document(ms);

    foreach (var item in regularExpressions)
    {
        // Each pattern arrives Base64-encoded; decode it to get the regex text.
        TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(Encoding.UTF8.GetString(Convert.FromBase64String(item)));
        // true = treat the search phrase as a regular expression.
        TextSearchOptions textSearchOptions = new TextSearchOptions(true);
        textFragmentAbsorber.TextSearchOptions = textSearchOptions;

        foreach (var page in pdfDocument.Pages)
        {
            page.Accept(textFragmentAbsorber);
            TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
            foreach (TextFragment textFragment in textFragmentCollection)
            {
                textFragment.Text = "###";
            }
        }
    }

    using MemoryStream mso = new MemoryStream();
    pdfDocument.Save(mso);
    return mso.ToArray();
}
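One variation that might reduce the number of passes over each page is to merge all the decoded patterns into a single alternation, so that one search covers every expression. The following is only a minimal sketch: PatternCombiner and Combine are hypothetical names (not part of Aspose.PDF), it assumes the patterns do not depend on numbered backreferences or conflicting inline options, and whether the combined pattern is actually faster in Aspose.PDF has not been measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of Aspose.PDF.
public static class PatternCombiner
{
    // Decodes Base64-encoded regex patterns and joins them into one alternation.
    public static string Combine(IEnumerable<string> base64Patterns)
    {
        var wrapped = base64Patterns
            // Patterns arrive Base64-encoded, as in the method above.
            .Select(p => Encoding.UTF8.GetString(Convert.FromBase64String(p)))
            // Wrap each pattern in a non-capturing group so a "|" inside one
            // pattern cannot bleed into its neighbours.
            .Select(p => "(?:" + p + ")");

        return string.Join("|", wrapped);
    }
}
```

The combined string could then be passed once to the TextFragmentAbsorber constructor instead of creating one absorber per pattern.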
We have also tried another approach using the PdfContentEditor class. It is faster, but it consumes too much memory.
The code is:
public byte[] ReplaceSensitiveText(byte[] docPdf, List<string> regularExpressions)
{
    using var ms = new MemoryStream(docPdf);
    using var mso = new MemoryStream();
    using PdfContentEditor pdfContent = new PdfContentEditor();
    pdfContent.BindPdf(ms);

    foreach (var item in regularExpressions)
    {
        pdfContent.ReplaceTextStrategy = new ReplaceTextStrategy()
        {
            IsRegularExpressionUsed = true,
            ReplaceScope = ReplaceTextStrategy.Scope.ReplaceAll
        };
        pdfContent.ReplaceText(Encoding.UTF8.GetString(Convert.FromBase64String(item)), "###");
    }

    pdfContent.Save(mso);
    pdfContent.Close();
    return mso.ToArray();
}
We have recently licensed the latest version of Aspose.Total.
Could you tell us the fastest and most efficient way to replace text in a PDF document?