Delete all selected text from PDF document using TextFragmentAbsorber Class in Aspose.PDF for .NET

I am using the TextFragmentAbsorber to iterate over the text fragments contained in a PDF document, and I would like to delete selected TextFragment objects. How can I do this? The TextFragmentCollection class does not seem to have a Remove() option.

Hi Avi,


Thanks for contacting support.

To remove a TextFragment, you can replace the fragment with a blank instance. Once the contents are replaced, you can re-arrange the page contents to avoid any formatting issues. Please visit the following links for related information on


In case you encounter any issue, please share your resource files, so that we can test the scenario in our environment.

Please elaborate and explain what you mean by “replace the fragment with blank instance” (I examined the links, but they don’t have any examples of blank instances).



Hi Avi,


To accomplish your requirement, the text can be replaced with a blank value and the page contents can be auto-adjusted, as shown in the following code snippet. However, during my testing I observed that the page contents are not being re-arranged automatically. I have logged this problem as PDFNEWNET-39171 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the correction. Please be patient and allow us a little time. We are sorry for this inconvenience.

[C#]

using Aspose.Pdf;
using Aspose.Pdf.Text;

// Load source PDF file
Document doc = new Document("c:/pdftest/42441893 (1).pdf");

// Create a TextFragmentAbsorber that searches for the given phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("organisationer");

// Request space-width adjustment so the surrounding text is re-arranged after replacement
textFragmentAbsorber.TextReplaceOptions.ReplaceAdjustmentAction = TextReplaceOptions.ReplaceAdjustment.AdjustSpaceWidth;

// Run the absorber over all pages of the document
doc.Pages.Accept(textFragmentAbsorber);

// Replace each matching TextFragment
foreach (TextFragment textFragment in textFragmentAbsorber.TextFragments)
{
    // Set font of text fragment being replaced
    textFragment.TextState.Font = FontRepository.FindFont("Arial");
    // Set font size
    textFragment.TextState.FontSize = 12;
    // Set text color
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Navy;
    // Replace the text with an empty string, which removes it
    textFragment.Text = "";
}

// Save resultant PDF
doc.Save("c:/pdftest/TextRemove.pdf");


Dear Nayyer,


1] Thank you for this explanation. I now understand my mistake. Previously I was setting the TextSegment.Text field to a blank string, which had no effect at all. I see now that I have to reset the TextFragment.Text field, rather than the TextSegment.Text field.

2] Indeed, as you note, after deleting selected characters, the rest of the text is completely misarranged. Worse, many of the text fields become doubled (e.g. fragments that were “2” are now “22”), and others have disappeared from the page completely. Please do notify us when this has been resolved.

Sincerely,
Avi

Hi Avi,

Concerning the above-stated scenario, could you please share some resource files and a code snippet, so that we can look further into this matter?

Please note that during my testing, I used one of my sample PDF files and did not notice the content duplication issue.

@ashmid_a

Concerning your initial inquiry, we would like to let you know that you can now remove all text from a PDF document using TextFragmentAbsorber, which is a faster way to remove text. Please use the linked example with the latest version of the API, and if you need further information, please feel free to let us know.
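
For readers without access to the linked example, the approach might look roughly like the sketch below. It assumes that the installed Aspose.PDF for .NET version exposes a RemoveAllText(Document) method on TextFragmentAbsorber. The file names and the class name are hypothetical placeholders, so please check them against your own setup and the API reference.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

class RemoveAllTextSketch
{
    static void Main()
    {
        // Hypothetical input path, for illustration only
        Document doc = new Document("input.pdf");

        // An absorber created without a search phrase is not limited to one phrase
        TextFragmentAbsorber absorber = new TextFragmentAbsorber();

        // Assumption: RemoveAllText(Document) exists in the API version in use
        absorber.RemoveAllText(doc);

        // Hypothetical output path
        doc.Save("output.pdf");
    }
}
```

Note that this removes every piece of text in the document. To delete only selected fragments, setting TextFragment.Text to an empty string, as shown earlier in this thread, remains the applicable approach.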