Is there a way to remove all the text from a PDF file? I only need the vector images. I cannot extract vector images from a PDF file directly, so stripping the text would help a lot. Thanks.
Hi Tony,
To remove all the text from a PDF file, you can try our new merged Aspose.Pdf for .NET 6.2.0. You need to replace every text fragment with the empty string, i.e. "". Please use the following sample to remove all the text from the PDF file:
using Aspose.Pdf;
using Aspose.Pdf.Text;

// open the document
Document pdfDocument = new Document("input.pdf");
// create a TextFragmentAbsorber object to find all text fragments
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
// accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    // replace the text with an empty string
    textFragment.Text = "";
}
// save the document without its text
pdfDocument.Save("output.pdf");
Regards,
Your example code does not work with the attached PDF. I can confirm that I tried version 6.2.0.0.
Hi Tony,
I have reproduced this problem at my end and logged it as PDFNEWNET-30701 in our issue tracking system. Our team will look into this issue and you’ll be updated via this forum thread once it is resolved.
We’re sorry for the inconvenience.
Regards,
Hi Tony,
We would like to share with you that Aspose.PDF for .NET now offers a much faster way to delete all the text from a PDF document. Please check the following snippet, which deletes the text-showing operators from each page's contents:
using System.Collections;
using System.Collections.Generic;
using Aspose.Pdf;

// myDir is the folder that contains the input PDF
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(myDir + @"2.pdf");
// operators that show text; used as templates for the selector
Operator[] operators = new Operator[]
{
    new Operator.ShowText(),
    new Operator.SetGlyphsPositionShowText(new List()),
    new Operator.MoveToNextLineShowText(),
    new Operator.SetSpacingMoveToNextLineShowText(0,0,""),
};
foreach (Page page in pdfDocument.Pages)
{
    // collects every text-showing operator found on this page
    ArrayList list = new ArrayList();
    OperatorCollection pageOperators = page.Contents;
    foreach (Operator op in operators)
    {
        // select all operators of the same kind as op
        OperatorSelector operatorSelector = new OperatorSelector(op);
        pageOperators.Accept(operatorSelector);
        list.AddRange(operatorSelector.Selected);
    }
    // delete the selected operators from the page contents
    pageOperators.Delete(list);
}
pdfDocument.Save(myDir + "TextRemoved_operators_18.4.pdf");