Hello Aspose Support,
We need to perform true, irreversible redaction across Word (DOC/DOCX) and PDF using Aspose.Total — not just masking, but fully deleting sensitive text and metadata so nothing remains searchable or recoverable.
Could you confirm:
- Which built-in functions/APIs exist for full redaction (text/content, annotations, embedded objects, metadata)?
- Any best-practice samples or recommended patterns for applying and verifying these redactions?
Thanks,
Ashiq
@ashiqshanavas
Hello Ashiq,
To achieve true, irreversible redaction in Aspose.Total for both Word and PDF documents, follow these steps:
Cause: It is essential to redact sensitive information to protect privacy and comply with data protection regulations.
Solution: Utilize the following built-in functions and APIs for effective redaction:
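- For PDF redaction:
  - Use the RedactionAnnotation class to mark the area of a page to redact, then call its Redact() method. Redact() removes the text and images under that area and flattens the annotation into the page, so the content is deleted rather than just covered.
  - The PdfContentEditor class (Aspose.Pdf.Facades) provides lower-level content editing such as text replacement, but it is not needed for area-based redaction.
- For Word redaction:
  - Aspose.Words has no dedicated redaction API. Use Document.Range.Replace (with a string or a Regex) to find sensitive text and replace it with an empty string or a placeholder.
  - To clear metadata as well, empty Document.CustomDocumentProperties and blank the built-in properties in Document.BuiltInDocumentProperties (Author, LastSavedBy, Title, Comments, and so on).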
- Annotations and embedded objects:
  - For both formats, remove any annotations, comments, or embedded objects that may contain sensitive data. In PDF, delete them through each page's Annotations collection; file attachments are stored in Document.EmbeddedFiles.
- Metadata removal:
  - In Word, clear Document.CustomDocumentProperties and blank the relevant Document.BuiltInDocumentProperties.
  - In PDF, use Document.Info (a DocumentInfo object) to clear the document information entries, and remove the XMP metadata as well.
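A minimal C# sketch of the PDF cleanup described above, assuming a recent Aspose.PDF for .NET release. AnnotationCollection.Delete(), EmbeddedFileCollection.Delete() and especially Document.RemoveMetadata() are assumed to be available in your version, so please confirm them in the API reference; PdfSanitizer and the file names are hypothetical.

```csharp
using Aspose.Pdf;

public static class PdfSanitizer
{
    // Strips annotations, attachments and metadata from an already-redacted PDF.
    // Run this AFTER RedactionAnnotation.Redact(); otherwise the redaction
    // annotations themselves would be deleted before they are applied.
    public static void Sanitize(string inputPath, string outputPath)
    {
        using (var document = new Document(inputPath))
        {
            // Remove every remaining annotation (comments, highlights, links, ...).
            foreach (Page page in document.Pages)
            {
                page.Annotations.Delete();
            }

            // Remove file attachments embedded in the PDF.
            document.EmbeddedFiles.Delete();

            // Blank the classic document information entries.
            document.Info.Title = string.Empty;
            document.Info.Author = string.Empty;
            document.Info.Subject = string.Empty;
            document.Info.Keywords = string.Empty;
            document.Info.Creator = string.Empty;

            // Drop the XMP metadata stream (assumed API, see the note above).
            document.RemoveMetadata();

            document.Save(outputPath);
        }
    }
}
```

Usage: PdfSanitizer.Sanitize("redacted.pdf", "sanitized.pdf"); note that this also deletes ordinary link annotations.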
Best Practices:
- Always create a backup of your documents before performing redactions.
- After redaction, reopen the saved output and search its extracted text for the sensitive strings to confirm they have been completely removed.
- Consider using a test document to validate your redaction process before applying it to sensitive documents.
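To make the verification step repeatable, the saved files can be reopened and their extracted text searched. This C# sketch assumes Aspose.PDF's TextAbsorber and Aspose.Words' Node.GetText(); RedactionVerifier, its method names and the file names are hypothetical. It only checks text, so content inside images or metadata still needs a separate review.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Pdf.Text;
using PdfDocument = Aspose.Pdf.Document;
using WordDocument = Aspose.Words.Document;

public static class RedactionVerifier
{
    // Returns the terms that still occur in the extracted text (case-insensitive).
    public static List<string> FindLeaks(string text, IEnumerable<string> terms)
    {
        return terms
            .Where(t => text.IndexOf(t, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    // Extracts the text of every page of a PDF.
    public static string ExtractPdfText(string path)
    {
        using (var document = new PdfDocument(path))
        {
            var absorber = new TextAbsorber();
            document.Pages.Accept(absorber);
            return absorber.Text;
        }
    }

    // GetText() returns the text of the document node and all of its children
    // (body, headers/footers, comments, text boxes).
    public static string ExtractWordText(string path)
    {
        return new WordDocument(path).GetText();
    }
}
```

Usage: RedactionVerifier.FindLeaks(RedactionVerifier.ExtractWordText("output.docx"), new[] { "sensitive text" }) returns an empty list when none of the terms were found.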
Next Steps: Here’s a minimal code example for redacting text in a PDF:
```csharp
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

// Load the PDF document
Document pdfDocument = new Document("input.pdf");
// Create a redaction annotation over the area to remove
// (Rectangle(llx, lly, urx, ury) in points, origin at the bottom-left of the page)
RedactionAnnotation redaction = new RedactionAnnotation(pdfDocument.Pages[1], new Rectangle(100, 100, 200, 200));
redaction.OverlayText = "Redacted";
pdfDocument.Pages[1].Annotations.Add(redaction);
// Apply the redaction: removes the content under the area and flattens the annotation
redaction.Redact();
// Save the document
pdfDocument.Save("output.pdf");
```
And for Word:
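```csharp
using Aspose.Words;
using Aspose.Words.Replacing;

// Load the document
Document doc = new Document("input.docx");
// Remove sensitive text
doc.Range.Replace("sensitive text", "", new FindReplaceOptions());
// Clear metadata: custom properties plus common built-in properties
doc.CustomDocumentProperties.Clear();
doc.BuiltInDocumentProperties.Author = "";
doc.BuiltInDocumentProperties.LastSavedBy = "";
doc.BuiltInDocumentProperties.Title = "";
doc.BuiltInDocumentProperties.Subject = "";
doc.BuiltInDocumentProperties.Comments = "";
doc.BuiltInDocumentProperties.Keywords = "";
// Save the document
doc.Save("output.docx");
```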
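Range.Replace only changes the current text of the document. If the file contains tracked changes or comments, the original wording can survive there. A C# sketch that handles those two cases before replacing, assuming Document.AcceptAllRevisions() and NodeCollection.Clear() behave as documented in Aspose.Words for .NET; WordRedactor and the file names are hypothetical.

```csharp
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

public static class WordRedactor
{
    // Removes every match of the pattern and returns the number of replacements.
    public static int Redact(string inputPath, string outputPath, Regex pattern)
    {
        var doc = new Document(inputPath);

        // Tracked deletions keep the deleted text in the file; accepting all
        // revisions makes the current state permanent before we search it.
        doc.AcceptAllRevisions();

        // Comments can quote sensitive text, so remove them all.
        doc.GetChildNodes(NodeType.Comment, true).Clear();

        // Delete every match in the document's range.
        int count = doc.Range.Replace(pattern, string.Empty, new FindReplaceOptions());

        doc.Save(outputPath);
        return count;
    }
}
```

Usage: WordRedactor.Redact("input.docx", "output.docx", new Regex("sensitive text", RegexOptions.IgnoreCase)).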
Please let me know if this information helps! If you have any further questions or need more examples, feel free to ask. I’m here to assist you.
@ashiqshanavas
1) You may use Aspose.PDF for .NET to achieve redaction using the RedactionAnnotation class. This allows you to specify a region of a page that you want to redact.
e.g.,
```csharp
using (var document = new Aspose.Pdf.Document("input.pdf"))
{
    var annot = new Aspose.Pdf.Annotations.RedactionAnnotation(document.Pages[1], new Aspose.Pdf.Rectangle(200, 500, 300, 600));
    annot.FillColor = Aspose.Pdf.Color.Green;
    annot.BorderColor = Aspose.Pdf.Color.Yellow;
    annot.Color = Aspose.Pdf.Color.Blue;
    annot.OverlayText = "REDACTED";
    annot.TextAlignment = Aspose.Pdf.HorizontalAlignment.Center;
    annot.Repeat = true;
    document.Pages[1].Annotations.Add(annot);
    annot.Redact();
    document.Save("RedactPage_out.pdf");
}
```
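The example above redacts a fixed rectangle. When the positions of the sensitive text are not known in advance, one option is to locate them with TextFragmentAbsorber and redact each fragment's rectangle. This is a C# sketch, not an official sample: PdfPhraseRedactor is a hypothetical name, and it collects all positions before redacting on the assumption that redacting can change the page content the absorber read from.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

public static class PdfPhraseRedactor
{
    // Redacts every occurrence of the phrase and returns how many areas were redacted.
    public static int Redact(string inputPath, string outputPath, string phrase)
    {
        using (var document = new Document(inputPath))
        {
            var absorber = new TextFragmentAbsorber(phrase);
            document.Pages.Accept(absorber);

            // Collect page/rectangle pairs first, so later redactions do not
            // change page content while fragment positions are still being read.
            var targets = new List<(Page, Rectangle)>();
            foreach (TextFragment fragment in absorber.TextFragments)
            {
                targets.Add((fragment.Page, fragment.Rectangle));
            }

            foreach (var (page, area) in targets)
            {
                var annot = new RedactionAnnotation(page, area)
                {
                    FillColor = Color.Black
                };
                page.Annotations.Add(annot);
                // Removes the text/images under the area and flattens the box.
                annot.Redact();
            }

            document.Save(outputPath);
            return targets.Count;
        }
    }
}
```

Usage: PdfPhraseRedactor.Redact("input.pdf", "output.pdf", "sensitive text"). Reopening the output with a new TextFragmentAbsorber for the same phrase should then find nothing.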
2) For redacting content in a Word document (DOC/DOCX), you may try the Find/Replace options provided by Aspose.Words. See the following article for examples: Find and Replace in C# | Aspose.Words for .NET
Moreover, to give you better guidance and complete details, my colleagues from Aspose.Words and Aspose.PDF teams will assist you soon. @alexey.noskov, @asad.ali FYI.
@ashiqshanavas I am afraid there is no built-in method for making redactions in MS Word documents with Aspose.Words, and I am not sure true redaction is possible in the MS Word format. As a workaround, you can replace the content in the document with some dummy content and give it a black background.
@alexey.noskov
Thanks for the suggestions so far. I was able to get PDF redaction working fine using annotations and the Redact() method; that part is solid.
I was able to handle Word redaction by replacing sensitive text with black square characters. However, this approach still exposes information — for example, when the document is opened, someone can select the redacted text and infer the original word length (since the number of replacement characters matches the original count).
Is there a way in Aspose.Words to:
- Completely remove the original text instead of just replacing it, and insert a fixed-length placeholder (e.g., one ■ regardless of original length)?
- Or alternatively, draw a black rectangle/shape over the removed content so the underlying length cannot be guessed?
Any recommendations or best practices for achieving true irreversible redaction in Word would be much appreciated.
@ashiqshanavas As an option, you can try to fully remove the redacted text and replace it with a black rectangle. For example, see the following code:
```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Layout;
using Aspose.Words.Replacing;

string[] redactionWords = new string[] { "James", "Bond", "Agent", "007", "Viktor", "Malenkov", "Nyx", "Prague", "Astronomical", "Clock", "Tower", "Charles", "Singapore", "AURORA-9", "SPECTRE" };
string redactionWordsRegexString = @"(" + String.Join("|", redactionWords.Select(s => Regex.Escape(s))) + ")";
Regex redactionWordsRegex = new Regex(redactionWordsRegexString, RegexOptions.IgnoreCase);

Document doc = new Document(@"C:\Temp\in.docx");

// Replace the words to be redacted with themselves so that each one ends up in its own Run.
FindReplaceOptions opt = new FindReplaceOptions();
opt.UseSubstitutions = true;
opt.ApplyFont.HighlightColor = Color.Red;
doc.Range.Replace(redactionWordsRegex, "$1", opt);

// Wrap each word in a bookmark so its size can be measured with LayoutCollector and LayoutEnumerator.
List<string> tempBookmarks = new List<string>();
int i = 0;
foreach (Run r in doc.GetChildNodes(NodeType.Run, true))
{
    // LayoutCollector and LayoutEnumerator do not work with nodes in headers/footers or inside shapes,
    // so matches there are skipped and are NOT redacted by this code.
    if (r.GetAncestor(NodeType.HeaderFooter) != null || r.GetAncestor(NodeType.Shape) != null)
        continue;

    if (redactionWordsRegex.IsMatch(r.Text))
    {
        string bkName = $"tmp_{i}";
        tempBookmarks.Add(bkName);
        r.ParentNode.InsertBefore(new BookmarkStart(doc, bkName), r);
        r.ParentNode.InsertAfter(new BookmarkEnd(doc, bkName), r);
        i++;
    }
}

LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
foreach (string bkName in tempBookmarks)
{
    Bookmark bk = doc.Range.Bookmarks[bkName];

    // Calculate the rectangle occupied by the word to be redacted.
    enumerator.Current = collector.GetEntity(bk.BookmarkStart);
    RectangleF start = enumerator.Rectangle;
    enumerator.Current = collector.GetEntity(bk.BookmarkEnd);
    RectangleF end = enumerator.Rectangle;
    RectangleF result = RectangleF.Union(start, end);

    // Create a black inline shape with the same size as the redacted word.
    Shape s = new Shape(doc, ShapeType.Rectangle);
    s.WrapType = WrapType.Inline;
    s.FillColor = Color.Black;
    s.Width = result.Width;
    s.Height = result.Height;
    // Lower the inline shape by half its height.
    s.Font.Position = -(result.Height / 2);

    // Remove the redacted text.
    bk.Text = "";
    // Insert the shape in its place.
    bk.BookmarkStart.ParentNode.InsertAfter(s, bk.BookmarkStart);
    // Remove the temporary bookmark.
    bk.Remove();
}

doc.Save(@"C:\Temp\out.docx");
```
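If a single fixed-length placeholder is acceptable (the first option asked about above), a simpler sketch is to replace each match with one ■ character on a black highlight. It relies only on Range.Replace and FindReplaceOptions.ApplyFont, which the code above already uses, so it does not need LayoutCollector; it should therefore also reach matches in headers/footers and shapes that the bookmark loop skips, although that is an assumption worth verifying on your documents. FixedPlaceholderRedactor is a hypothetical name.

```csharp
using System.Drawing;
using System.Text.RegularExpressions;
using Aspose.Words;
using Aspose.Words.Replacing;

public static class FixedPlaceholderRedactor
{
    // Replaces every match with a single U+25A0 (■) and returns the match count.
    public static int Redact(string inputPath, string outputPath, Regex pattern)
    {
        var doc = new Document(inputPath);

        var options = new FindReplaceOptions();
        // Black highlight so the placeholder reads as a solid redaction bar.
        options.ApplyFont.HighlightColor = Color.Black;

        // Whatever the length of the match, exactly one character is inserted,
        // so the original word length cannot be inferred from the placeholder.
        int count = doc.Range.Replace(pattern, "\u25A0", options);

        doc.Save(outputPath);
        return count;
    }
}
```

Note that adjacent matches each get their own ■, so the number of redacted words is still visible; matching whole phrases (for example "James Bond" as one alternative) avoids that.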