In a PDF document, replace the text between two strings with ***

ortasa · May 10, 2017, 1:12am

Thank you for the quick reply.

Can you please help me with the second part: replacing all text from the expression until the end of the document? With the current expression, only the text until the end of the line is replaced.

Thanks,

Ortal

ortasa · May 10, 2017, 2:38am

Good morning,

I have another question: how can I remove the underline that remains after I replace the text with ***?

Source text: Examples of text

Current result after the replacement: ***

Desired result after the replacement: ***

Thanks,

Ortal

imran.rafique · May 10, 2017, 1:41pm

Hi,

We have shared a code snippet and an output PDF in our earlier reply (here). It replaces text till the end of the PDF document by incorporating line breaks. However, if the code example is not working for your other PDF documents, then you may need to enhance the regular expression or share the source PDF and details with us. We will then assist you appropriately.
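As a rough illustration of why an expression can stop at the end of a line: in .NET regular expressions, `.` does not match `\n` unless `RegexOptions.Singleline` is set, whereas `(.|\n)*` also consumes line breaks. The sketch below uses only `System.Text.RegularExpressions` on a made-up string (the `Sample` text and the `LineBreakDemo` helper names are hypothetical). Whether a given PDF exposes its line breaks to the search in the same way depends on the file, so treat this as an assumption to verify against your document.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineBreakDemo
{
    // Hypothetical sample text standing in for text found on a PDF page.
    public const string Sample = "Intro\nConclusion: first line\nsecond line\nlast line";

    // "." does not match "\n" by default, so the match stops at the end of the line.
    public static string MatchToEndOfLine(string text, string start)
    {
        return Regex.Match(text, Regex.Escape(start) + ".*").Value;
    }

    // "(.|\n)*" also consumes line breaks, so the match runs to the end of the text.
    public static string MatchToEndOfText(string text, string start)
    {
        return Regex.Match(text, Regex.Escape(start) + "(.|\n)*").Value;
    }

    public static void Main()
    {
        Console.WriteLine(MatchToEndOfLine(Sample, "Conclusion"));
        Console.WriteLine("---");
        Console.WriteLine(MatchToEndOfText(Sample, "Conclusion"));
    }
}
```

The first call returns only the remainder of the line that contains the start phrase; the second returns everything from the start phrase to the end of the sample text.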

ortasa:
Source text: Examples of text
Current result after the replacement: ***
Desired result after the replacement: ***

To remove the underline, you can set the Underline property of the text fragment's TextState to false, as below:

[.NET, C#]

textFragment.TextState.Underline = false;

If this does not help, then please send us your PDF template. We will investigate and share our findings with you.

ortasa · May 10, 2017, 11:28pm

Hi,

textFragment.TextState.Underline = false; did not work for me (see the attached before and after PDFs).
Can you please help me with the second part: replacing all text from the expression until the end of the document? With the current expression, only the text until the end of the line is replaced.

Thanks,

Ortal

imran.rafique · May 11, 2017, 9:24am

Hi Ortal,

Thank you for sending PDF documents.

ortasa:
textFragment.TextState.Underline = false; did not work for me (see the attached before and after PDFs).

Please use the following code. We have attached the output PDF to this reply.

[.NET, C#]

using System;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

// Open the document
Document pdfDocument = new Document(@"C:\Pdf\test33\Underline befotr.pdf");
// Create a TextFragmentAbsorber that finds all text between the two phrases (regex search enabled)
String from = "<'RepeatingView2'";
String till = "lbj";
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(from + "((.|\n)*)" + till, new TextSearchOptions(true));
// Accept the absorber for all pages of the document
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments into a collection
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// Loop through the text fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    // Segments is a 1-based collection
    for (int count = 1; count <= textFragment.Segments.Count; count++)
    {
        // Create a RedactionAnnotation for the segment's region
        // (this snippet assumes the match is on page 1)
        RedactionAnnotation annot = new RedactionAnnotation(pdfDocument.Pages[1], textFragment.Segments[count].Rectangle);
        annot.FillColor = Aspose.Pdf.Color.White;
        //annot.BorderColor = Aspose.Pdf.Color.Yellow;
        annot.Color = Aspose.Pdf.Color.Black;
        // Do not repeat the overlay text over the redaction annotation
        annot.Repeat = false;
        // Write the replacement text only once, over the first segment
        if (count == 1)
            annot.OverlayText = from + "***" + till;
        // Add the annotation to the annotations collection of the first page
        pdfDocument.Pages[1].Annotations.Add(annot);
        // Flatten the annotation and redact the page contents
        // (i.e. remove the text and images under the redaction annotation)
        annot.Redact();
    }
}
pdfDocument.Save(@"C:\Pdf\test33\output.pdf");

ortasa:
Can you please help me with the second part: replacing all text from the expression until the end of the document? With the current expression, only the text until the end of the line is replaced.

Kindly prepare and send us your input and expected output PDF documents. We will take a closer look and reply accordingly. We await your response.

ortasa · May 14, 2017, 4:07am

Good morning,

Thank you for the rapid reply.

I am looking for a way to catch all the text after the word “Conclusion” (": Normal lCT Angiography, no acute pathology.") and replace it with *** in the PDF “Example anonymous until the end of the document.pdf”.
I could not get rid of the underline (you can see the result in the PDF “Example cannot remove underline.pdf”). Do you have another idea?

My code:

// Get the extracted text fragments
Aspose.Pdf.Text.TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
foreach (Aspose.Pdf.Text.TextFragment textFragment in textFragmentCollection)
{
    for (int count = 1; count <= textFragment.Segments.Count; count++)
    {
        // Create RedactionAnnotation instance for specific page region
        Aspose.Pdf.Annotations.RedactionAnnotation annot = new Aspose.Pdf.Annotations.RedactionAnnotation(pdfDocument.Pages[1], textFragment.Segments[count].Rectangle);
        annot.FillColor = Aspose.Pdf.Color.White;
        //annot.BorderColor = Aspose.Pdf.Color.Yellow;
        annot.Color = Aspose.Pdf.Color.Black;
        // Repeat Overlay text over redact Annotation
        annot.Repeat = false;
        textFragment.TextState.FontStyle = Aspose.Pdf.Text.FontStyles.Regular;
        // Update text and other properties
        textFragment.Text = " *** ";
        // Add annotation to annotations collection of first page
        pdfDocument.Pages[1].Annotations.Add(annot);
        // Flattens annotation and redacts page contents (i.e. removes text and image
        // Under redacted annotation)
        annot.Redact();
        LogManager.GetCachedLogger("R2DocumentConverter").Debug(string.Format("The text that start {0} and end {1} was anonymous.", startText, endText));
    }
}

Thanks,

Ortal

imran.rafique · May 14, 2017, 3:49pm

Hi Ortal,

Thank you for the inquiry.

ortasa:
I am looking for a way to catch all the text after the word “Conclusion” (“: Normal lCT Angiography, no acute pathology.”) and replace it with *** in the PDF “Example anonymous until the end of the document.pdf”.

There is no direct way to detect the last element of a PDF document. Instead, you can extract the text from all pages of the PDF document and take the ending text phrase from it. You can then use this ending phrase in the regular expression to replace the text after the word “Conclusion” (the starting phrase) in the PDF document. Please refer to this help topic: Extract Text From All the Pages of a PDF Document
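The second half of that approach can be sketched in plain C#, assuming the document text has already been extracted into a string (for example with a TextAbsorber, as the linked help topic describes). The `EndPhraseDemo` class, its method names, and the sample text are hypothetical and only illustrate the idea: take the last non-empty line as the ending phrase and combine it with the starting phrase into one pattern. How well the escaped pattern matches inside a real PDF is an assumption to check, since the layout of the text in the file can differ from the extracted string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EndPhraseDemo
{
    // Returns the last non-empty line of the extracted text, trimmed.
    public static string LastNonEmptyLine(string extractedText)
    {
        string[] lines = extractedText.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = lines.Length - 1; i >= 0; i--)
        {
            string trimmed = lines[i].Trim();
            if (trimmed.Length > 0)
                return trimmed;
        }
        return string.Empty;
    }

    // Builds a pattern that spans from the start phrase to the end phrase,
    // crossing line breaks. Both phrases are escaped so they match literally.
    public static string BuildPattern(string startPhrase, string endPhrase)
    {
        return Regex.Escape(startPhrase) + "((.|\n)*)" + Regex.Escape(endPhrase);
    }

    public static void Main()
    {
        // Hypothetical extracted text; in practice this would come from the PDF.
        string extracted = "Report\nConclusion: line one\nline two.\n\n";
        string endPhrase = LastNonEmptyLine(extracted);
        string pattern = BuildPattern("Conclusion", endPhrase);
        Console.WriteLine(Regex.Match(extracted, pattern).Value);
    }
}
```

The resulting pattern string could then be passed to the TextFragmentAbsorber constructor with regular expression search enabled, in the same way as the `from + "((.|\n)*)" + till` expression in the earlier snippet.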

ortasa:
I could not get rid of the underline (you can see the result in the PDF “Example cannot remove underline.pdf”). Do you have another idea?

We see that you have modified the code snippet we shared in the previous post (here). Kindly let us know what problem you are facing with our original code snippet, because it is working fine on our side. In your modified code, you are not using the OverlayText property of the RedactionAnnotation class to replace the text. We await your response.