Find and redact text in PDF using Aspose.PDF for .NET

// Search the document for the pattern "inc", treated as a regular expression.
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("inc");
textFragmentAbsorber.TextSearchOptions.IsRegularExpressionUsed = true;
pdfDocument.Pages.Accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

foreach (TextFragment textFragment in textFragmentCollection)
{
    // Cover each match with a redaction annotation in the page background color.
    Aspose.Pdf.Rectangle myrectange = textFragment.Rectangle;
    RedactionAnnotation ra = new RedactionAnnotation(textFragment.Page, myrectange);
    textFragment.Page.Annotations.Add(ra);
    ra.FillColor = textFragment.Page.Background;
    ra.Color = textFragment.Page.Background;
    ra.BorderColor = textFragment.Page.Background;
}

input file:
test.pdf (282.8 KB)
output file:
test_R.pdf (282.2 KB)
The result is not what I expected. How can I solve this? Many thanks!

@tomgreen

We could not find the word “inc” in your PDF when searching for it in Adobe Reader. Would you please share an expected output PDF in which the redaction annotation is added at your desired location? We will test the scenario in our environment and address it accordingly.

With the code I gave above, Aspose runs the search as a regular expression and draws a rectangle that encloses the text “cosθ + y sin” (as shown in the result file “test_R.pdf”).

@tomgreen

We tested the scenario at our end and did not notice any issues. We used Aspose.PDF for .NET 21.2, and it generated the expected output. The code snippet we used for testing is the same as the one you shared, except that we added ra.Redact() at the end of the loop body to apply the redaction:

Document pdfDocument = new Document(dataDir + "test.pdf");
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("inc");
textFragmentAbsorber.TextSearchOptions.IsRegularExpressionUsed = true;
pdfDocument.Pages.Accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

foreach (TextFragment textFragment in textFragmentCollection)
{
    Console.WriteLine(textFragment.Text);
    Aspose.Pdf.Rectangle myrectange;
    myrectange = textFragment.Rectangle;
    RedactionAnnotation ra = new RedactionAnnotation(textFragment.Page, myrectange);
    textFragment.Page.Annotations.Add(ra);
    ra.FillColor = textFragment.Page.Background;
    ra.Color = textFragment.Page.Background;
    ra.BorderColor = textFragment.Page.Background;
    // Apply the redaction so the covered content is actually removed.
    ra.Redact();
}
pdfDocument.Save(dataDir + "redacted.pdf");

Redacted.pdf (282.9 KB)

Would you please try the latest version at your end and let us know if you still face any issues?

What I mean is that the text absorber behaves strangely when finding text. The text it returns is not what I want. I want to find “inc”, but Aspose catches “cosθ + y sin”. Maybe the direction is wrong, which causes the “false positive” result.

@tomgreen

Aspose.PDF mimics the behavior of Adobe Reader, and when we searched for the text “inc” after opening your PDF in Adobe Reader, it did not return any results either. Would you please share a screenshot of this text in the PDF that was shared here? We will then proceed to assist you accordingly.

As you said, “found nothing” is the result I expect. However, Aspose still returns the false positive result, which is kind of strange.

@tomgreen

We apologize for the confusion caused during the initial investigation. Yes, you are right: the API should not return any result when searching for a word that is not present in the PDF. We have logged an investigation ticket as PDFNET-49492 in our issue management system to analyze this behavior of the API. We will look into its details and keep you posted on the status of the ticket resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.
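
While the ticket is open, one possible stopgap is to re-check each returned fragment's text against the search pattern before creating the redaction annotation. The sketch below is only an illustration: MatchFilter and IsGenuineMatch are hypothetical names, not part of Aspose.PDF, and the approach assumes that textFragment.Text for the false match holds the visible text (“cosθ + y sin”) rather than the pattern itself, which has not been verified against test.pdf.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.PDF): re-checks a fragment's text
// against the search pattern before a redaction annotation is created.
public static class MatchFilter
{
    // Returns true only when fragmentText really contains a match for pattern.
    public static bool IsGenuineMatch(string fragmentText, string pattern)
    {
        // Null or empty text cannot contain a match for a non-empty pattern.
        if (string.IsNullOrEmpty(fragmentText))
        {
            return false;
        }

        return Regex.IsMatch(fragmentText, pattern);
    }
}
```

Inside the foreach loop, a guard such as `if (!MatchFilter.IsGenuineMatch(textFragment.Text, "inc")) continue;` placed before the RedactionAnnotation is created would then skip the fragment reported above, since “cosθ + y sin” contains no match for “inc”.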