Aspose.pdf redaction is not removing the underlying text

I am trying to redact the underlying text in a PDF. The PDF is sent to the method as a multipart file object.
Option 1 (preferred):

  1. Getting the ink strokes to get the annotation data.
  2. Saving the output as a byte stream for the client download.

Option 2:

  1. Saving the file as PNG to rasterize the PDF document.
  2. Converting the PNG back to PDF.
  3. Saving the output PDF to a byte stream.

When the downloaded file is opened in Edge and I click on "Hide all annotations", the underlying text is displayed.

I am using aspose-pdf 21.7 for JDK 17.

Please advise what I can do.

@Hbharati

To effectively redact underlying text in a PDF using Aspose.PDF, you can utilize the RedactionAnnotation class. This class allows you to specify a region of the PDF that you want to redact, and it can remove the text and images beneath that area. Here’s a step-by-step guide on how to implement this:

Option 1: Using RedactionAnnotation

  1. Open the PDF Document: Load your PDF document using the Document class.
  2. Create a RedactionAnnotation: Define the area you want to redact by creating a RedactionAnnotation instance and specifying the coordinates of the rectangle that covers the text.
  3. Set Properties: You can set properties such as fill color, border color, overlay text, and text alignment for the redaction.
  4. Redact the Content: Call the redact() method on the annotation to remove the underlying content.
  5. Save the Document: Save the modified document to a desired output location.

Here’s a code example in Java:

import com.aspose.pdf.Color;
import com.aspose.pdf.Document;
import com.aspose.pdf.HorizontalAlignment;
import com.aspose.pdf.Rectangle;
import com.aspose.pdf.RedactionAnnotation;

public class RedactExample {
    public static void main(String[] args) {
        // Load the PDF document
        Document doc = new Document("input.pdf");

        // Define the area to redact
        Rectangle rect = new Rectangle(200, 500, 300, 600);
        RedactionAnnotation annot = new RedactionAnnotation(doc.getPages().get_Item(1), rect);
        
        // Set properties for the redaction
        annot.setFillColor(Color.getBlack());
        annot.setOverlayText("REDACTED");
        annot.setTextAlignment(HorizontalAlignment.Center);
        annot.setRepeat(true);
        
        // Add the annotation to the page
        doc.getPages().get_Item(1).getAnnotations().add(annot);
        
        // Perform the redaction
        annot.redact();
        
        // Save the document
        doc.save("RedactedOutput.pdf");
    }
}

Option 2: Rasterizing the PDF

If the redaction does not work as expected, you can consider rasterizing the PDF as a PNG and then converting it back to PDF. This method effectively removes all underlying text but may result in a loss of quality and searchability.

  1. Convert PDF to PNG: Use a method to convert the PDF pages to PNG images.
  2. Create a New PDF: Convert the PNG images back to a new PDF document.
  3. Save the New PDF: Save the new PDF document for download.

Troubleshooting

If the underlying text is still visible after redaction, ensure that you are calling the redact() method on the annotation after adding it to the page. Until that call runs, the annotation is only an overlay that a viewer can hide, and the text beneath it stays in the file. Additionally, check that the coordinates of the rectangle accurately cover the text you wish to redact.
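
Since the redaction area comes from ink strokes in your case, one way to check the coordinates is to compute the bounding box of the stroke points yourself before building the Rectangle. The sketch below is plain Java and does not call Aspose.PDF. The RedactionBounds class, its Point and Box records, and the padding value are hypothetical names for illustration, and it assumes the client sends stroke points as x/y pairs. PDF page coordinates start at the bottom-left corner, so the flipVertically step is only needed if your client reports points from the top-left instead, which is an assumption you would have to verify.

```java
import java.util.List;

// Hypothetical helper, not part of Aspose.PDF: derives a redaction box from stroke points.
final class RedactionBounds {

    /** One stroke point as reported by the client (assumed shape). */
    record Point(double x, double y) {}

    /** Box in lower-left / upper-right order: llx, lly, urx, ury. */
    record Box(double llx, double lly, double urx, double ury) {}

    /** Smallest axis-aligned box containing every point, grown by padding on each side. */
    static Box fromStroke(List<Point> stroke, double padding) {
        if (stroke.isEmpty()) {
            throw new IllegalArgumentException("stroke has no points");
        }
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        double maxX = Double.NEGATIVE_INFINITY;
        double maxY = Double.NEGATIVE_INFINITY;
        for (Point p : stroke) {
            minX = Math.min(minX, p.x());
            minY = Math.min(minY, p.y());
            maxX = Math.max(maxX, p.x());
            maxY = Math.max(maxY, p.y());
        }
        return new Box(minX - padding, minY - padding, maxX + padding, maxY + padding);
    }

    /** Converts a box measured from the top-left corner into bottom-left (PDF) coordinates. */
    static Box flipVertically(Box box, double pageHeight) {
        return new Box(box.llx(), pageHeight - box.ury(), box.urx(), pageHeight - box.lly());
    }

    public static void main(String[] args) {
        // Sample points and a 792-point page height, both made up for the demo.
        List<Point> stroke = List.of(new Point(210, 520), new Point(290, 505), new Point(250, 580));
        Box box = fromStroke(stroke, 2.0);
        System.out.println(box);
        System.out.println(flipVertically(box, 792.0));
    }
}
```

The four values of the resulting box can be passed in the same order to new Rectangle(llx, lly, urx, ury), as in the example above. A small padding helps when a stroke only partly covers the glyphs you want removed.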

For further assistance, you may want to refer to the official Aspose documentation or community forums for more specific use cases and examples.
