Can you tell me whether your PDF API supports text redaction? I cannot find any mention of it on the website, and there is nothing in the samples on GitHub.
I need to be able to remove confidential text from PDF files (e.g. dates of birth, tax IDs), either by supplying the text I want to remove as a string or, even better, by supplying a regex, so that I can remove IDs when I know their format but not their exact values.
Thanks
@wingers999
Thanks for contacting support.
Yes. You can use Aspose.PDF to find text with a regular expression (via `TextFragmentAbsorber`) and then redact each match with the `RedactionAnnotation` class. Please check the following code snippet:
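```csharp
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

string dataDir = @"C:\Temp\"; // placeholder: folder containing input.pdf
Document pdf = new Document(dataDir + "input.pdf");

// Matches 19-character runs of digits/hyphens or digits/spaces, or 16 consecutive digits
// (card-number-like values). Replace with a pattern for the data you need to remove.
string regex = @"[0-9-]{19}|[0-9 ]{19}|[0-9]{16}";

// Passing true to TextSearchOptions makes the absorber treat the phrase as a regular expression.
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(regex, new TextSearchOptions(true));
textFragmentAbsorber.TextReplaceOptions.ReplaceAdjustmentAction = TextReplaceOptions.ReplaceAdjustment.None;
pdf.Pages.Accept(textFragmentAbsorber);

foreach (TextFragment textFragment in textFragmentAbsorber.TextFragments)
{
    // Cover the fragment's bounding rectangle with a black redaction annotation.
    RedactionAnnotation ra = new RedactionAnnotation(textFragment.Page, textFragment.Rectangle);
    ra.FillColor = Aspose.Pdf.Color.Black;
    ra.BorderColor = Aspose.Pdf.Color.Black;
    ra.Color = Aspose.Pdf.Color.Black;
    textFragment.Page.Annotations.Add(ra);
    // Redact() applies the annotation, removing the page content underneath it.
    ra.Redact();
}

pdf.Save(dataDir + "redacted.pdf");
```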
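The pattern in the snippet above targets card-number-like values, so dates of birth or tax IDs need a different one. A quick way to try a pattern is to run it against sample text with .NET's own `System.Text.RegularExpressions` before handing it to the absorber. This is only a sketch: the two formats (dd/mm/yyyy or dd-mm-yyyy dates, and US-style `123-45-6789` tax IDs), the class name `RedactionPatterns` and the sample string are hypothetical. It also assumes the absorber interprets the pattern with .NET regex syntax, and text that is split across several fragments inside the PDF may not match the way a plain string does here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RedactionPatterns
{
    // Hypothetical example formats: dd/mm/yyyy or dd-mm-yyyy dates, and a
    // US-style tax ID such as 123-45-6789. Replace with the formats you need.
    public const string DatePattern = @"\b\d{2}[/-]\d{2}[/-]\d{4}\b";
    public const string TaxIdPattern = @"\b\d{3}-\d{2}-\d{4}\b";

    // Alternation lets a single search look for either format.
    public const string Combined = DatePattern + "|" + TaxIdPattern;

    // Returns every substring of 'text' matched by the combined pattern, left to right.
    public static List<string> FindMatches(string text)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, Combined))
        {
            results.Add(m.Value);
        }
        return results;
    }
}

class Program
{
    static void Main()
    {
        string sample = "DOB: 04/11/1985, Tax ID: 123-45-6789, Ref: 2024-001";
        foreach (string hit in RedactionPatterns.FindMatches(sample))
        {
            Console.WriteLine(hit);
        }
    }
}
```

Running it prints `04/11/1985` and `123-45-6789`, while `2024-001` is left alone. If the matches look right, `RedactionPatterns.Combined` could be passed in place of `regex` in the snippet above.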
If you face any issues while implementing this, please share a sample PDF document along with the information you want to redact. We will test the scenario in our environment and address it accordingly.
Thank you, that seems to work well. I will continue my testing and come back if I have any questions.