How to redact text in Word, PDF, Excel and PPTX documents using Aspose.Total for .NET

Hi,

We have an Aspose.Total for .NET license and want to redact text in Word, PDF, Excel and PPTX documents. Please suggest an approach; a sample code snippet would be very much appreciated.


This topic was created by tilal.ahmad using the Email to Topic tool.

@stvarghese,

For PDF, please try the following code snippet, which uses the Aspose.PDF API:

using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

// dataDir (input folder) and searchTerm (the text to redact) are defined by the caller.
// TextSearchOptions(true) enables regular-expression search, so searchTerm is treated as a regex.
Document doc = new Document(dataDir + "test.pdf");
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(searchTerm);
textFragmentAbsorber.TextSearchOptions = new TextSearchOptions(true);
doc.Pages.Accept(textFragmentAbsorber);

foreach (TextFragment textFragment in textFragmentAbsorber.TextFragments)
{
    Page page = textFragment.Page;
    // Cover the found text with a black redaction annotation...
    RedactionAnnotation annot = new RedactionAnnotation(page, textFragment.Rectangle);
    annot.FillColor = Aspose.Pdf.Color.Black;
    page.Annotations.Add(annot, true);
    // ...and apply it: Redact() removes the content under the annotation.
    annot.Redact();
}
doc.Save(dataDir + "output.pdf");
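Because `TextSearchOptions(true)` makes the absorber treat `searchTerm` as a regular expression, a literal term containing characters such as `.`, `(` or `+` may match more or less than intended. The sketch below uses only `System.Text.RegularExpressions` (not Aspose.PDF) to build a whole-word pattern from a literal term. The helper name `WholeWordPattern` is hypothetical, and the sketch assumes the absorber accepts .NET regular expression syntax, including lookarounds; please verify that against your Aspose.PDF version.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: turns a literal term into a whole-word regex.
// Regex.Escape neutralizes metacharacters such as '.', '(' and '+'.
// Lookarounds are used instead of \b so that terms starting or ending
// with punctuation, like "(555)", are still bounded correctly.
string WholeWordPattern(string literalTerm) =>
    @"(?<!\w)" + Regex.Escape(literalTerm) + @"(?!\w)";

string pattern = WholeWordPattern("(555) 123");
Console.WriteLine(pattern);
Console.WriteLine(Regex.IsMatch("Call (555) 123 today", pattern));  // True
Console.WriteLine(Regex.IsMatch("Call x(555) 123 today", pattern)); // False
```

The resulting string could then be passed as `searchTerm`, assuming the regex engine behaves as described above.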

For Excel files, you can find the specified text in the cells and replace it with dummy data using the Aspose.Cells APIs. See the following sample code for your reference:

1) Find/Replace in a worksheet

using System.Text.RegularExpressions;
using Aspose.Cells;

Workbook wb = new Workbook("e:\\test2\\Input.xlsx");
Worksheet sheet = wb.Worksheets[0];
FindOptions opts = new FindOptions();
opts.LookInType = LookInType.Values;
opts.LookAtType = LookAtType.Contains;
opts.RegexKey = true;
Aspose.Cells.Cell cell = null;
do
{
    // Find only the whole word "KIM", not part of another word (e.g. "KIMBERLY").
    cell = sheet.Cells.Find("\\bKIM\\b", cell, opts);
    if (cell != null)
    {
        // Replace with the same whole-word regex so "KIM" inside longer words is left intact.
        string celltext = cell.Value.ToString();
        celltext = Regex.Replace(celltext, "\\bKIM\\b", "^^^^^^^^^^");
        cell.PutValue(celltext);
    }
}
while (cell != null);

wb.Save("e:\\test2\\out1.xlsx");


2) Find/Replace in the whole workbook

using Aspose.Cells;

Workbook wb = new Workbook("e:\\test2\\Input.xlsx");
// RegexKey = true makes "\\bKIM\\b" a whole-word regular expression.
wb.Replace("\\bKIM\\b", "^^^^^^^^", new ReplaceOptions() { RegexKey = true });
wb.Save("e:\\test2\\output.xlsx");
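Whether the replacement is whole-word matters: a plain `string.Replace("KIM", ...)` also rewrites the "KIM" inside "KIMBERLY", while the `\bKIM\b` pattern used for the search does not. The plain .NET sketch below (no Aspose.Cells; the helper name `MaskWholeWord` is hypothetical) compares the two, so you can check a pattern before running it on a workbook.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: masks only whole-word occurrences of `word`.
// Regex.Escape keeps characters such as '.' from acting as wildcards.
string MaskWholeWord(string text, string word, string mask) =>
    Regex.Replace(text, @"\b" + Regex.Escape(word) + @"\b", mask);

string cellText = "KIM met KIMBERLY";

// Plain substring replacement also hits the "KIM" inside "KIMBERLY".
Console.WriteLine(cellText.Replace("KIM", "^^^^^^^^"));         // ^^^^^^^^ met ^^^^^^^^BERLY
// Whole-word replacement leaves "KIMBERLY" untouched.
Console.WriteLine(MaskWholeWord(cellText, "KIM", "^^^^^^^^"));  // ^^^^^^^^ met KIMBERLY
```

Note that the match is case-sensitive by default, so "kim" in lower case is not masked.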

For PPT/PPTX, redaction is not currently supported by the Aspose.Slides API. We have logged a feature request with the ID “SLIDESNET-42091” and will look into it soon.

For Word documents, we will share the details soon.

@stvarghese,

For MS Word documents, you can use the Aspose.Words approach described in the document here:

@stvarghese,

With Aspose.Slides, you can use the text highlight color feature to redact text: replace the sensitive text with spaces of the same length, then highlight those spaces in black. Please consider the following sample code:

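using System.Drawing;
using Aspose.Slides;
using Aspose.Slides.Export;
using Aspose.Slides.Util;

using (Presentation pres = new Presentation("pres.pptx"))
{
    // Plain text to redact (not a regular expression).
    const string toRedact = "KIM";
    // Spaces of the same length keep the surrounding text roughly in place.
    string stub = new string(' ', toRedact.Length);

    foreach (ISlide slide in pres.Slides)
    {
        ITextFrame[] textFrames = SlideUtil.GetAllTextBoxes(slide);
        foreach (ITextFrame textFrame in textFrames)
        {
            // Leave frames that do not contain the text untouched.
            if (!textFrame.Text.Contains(toRedact))
                continue;

            // Replace the sensitive text with spaces, then paint those spaces black.
            textFrame.Text = textFrame.Text.Replace(toRedact, stub);
            textFrame.HighlightText(stub, Color.Black);
        }
    }

    pres.Save("pres-edited.pptx", SaveFormat.Pptx);
}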
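One caveat with the highlight approach: since `HighlightText` searches the frame for the stub text, spaces that were already there (for example a run of three spaces when redacting a three-letter word) may be highlighted as well. The plain .NET sketch below (no Aspose.Slides; the helper name `RedactWithSpaces` is hypothetical) performs the same replacement as the sample and reports whether the stub already occurred in the original text, which is one way to detect that situation before calling `HighlightText`.

```csharp
using System;

// Hypothetical helper mirroring the sample: replaces `toRedact` with an
// equal-length run of spaces and reports whether that run of spaces was
// already present in the original text (and so could be highlighted
// along with the redacted spots).
(string Redacted, bool StubCollides) RedactWithSpaces(string text, string toRedact)
{
    string stub = new string(' ', toRedact.Length);
    return (text.Replace(toRedact, stub), text.Contains(stub));
}

var (redacted, collides) = RedactWithSpaces("Name: KIM", "KIM");
Console.WriteLine($"[{redacted}] collides={collides}");

var second = RedactWithSpaces("A   B KIM", "KIM");
Console.WriteLine($"collides={second.StubCollides}"); // collides=True
```

If a collision is reported, a stub made of a character that does not otherwise occur in the text would avoid highlighting unrelated spaces; that variation is a suggestion, not part of the Aspose.Slides sample.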

Hi @mudassir.fayyaz,

Could you also help me with a code snippet to find and replace a word in PDF?

Regards,
Stefy Varghese

@stvarghese,

This was already provided in one of the previous posts in this thread; please see the PDF snippet above, which uses TextFragmentAbsorber and RedactionAnnotation.

Let us know if it does not suit your needs.

OK, I saw the fill color and thought this was for highlighting. So which one is for highlighting?
@Amjad_Sahi/@mudassir.fayyaz

@stvarghese,

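See the following code snippet, which adds a highlight annotation in PDF. Note the difference from the earlier snippet: a RedactionAnnotation combined with Redact() removes the text under the rectangle and fills the area with FillColor, whereas a HighlightAnnotation only overlays a color and leaves the text in the document.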

// textFragmentCollection comes from a TextFragmentAbsorber, as in the redaction snippet above.
foreach (TextFragment textFragment in textFragmentCollection)
{
    // Rectangle from the fragment's position plus its width and height.
    Aspose.Pdf.Rectangle rect = new Aspose.Pdf.Rectangle(textFragment.Position.XIndent, textFragment.Position.YIndent,
        textFragment.Position.XIndent + textFragment.Rectangle.Width, textFragment.Position.YIndent + textFragment.Rectangle.Height);
    Aspose.Pdf.Annotations.HighlightAnnotation highlightText = new Aspose.Pdf.Annotations.HighlightAnnotation(textFragment.Page, rect);
    highlightText.Opacity = 0.5;
    highlightText.Color = Aspose.Pdf.Color.Yellow;
    textFragment.Page.Annotations.Add(highlightText);
}
// Then save the document with doc.Save(...).

Hope this helps.

The issues you have found earlier (filed as SLIDESNET-42091) have been fixed in this update.