
Free Support Forum - aspose.com

PDF redaction text

Hello. I have PDF documents with redactions. How can I get the marked text?

An example PDF is attached.

Thanks

Hi Ilya,


Thanks for contacting support.

I am afraid you currently cannot directly extract text from the redacted area of a PDF document. However, you may consider extracting the text of the complete PDF document using Aspose.Pdf for .NET and then trimming/selecting the desired text from the complete contents. Please try the code snippet from the documentation article "Extract Text from all the Pages using Text Device".

In case this does not satisfy your requirements or you have any further queries, please feel free to contact us. We are sorry for the inconvenience.

Nayyer,


I just want to make sure I understand your reply properly.

Do I understand correctly there is no way to specifically extract text ‘marked for redaction’ from a PDF using Aspose.Pdf?

I understand that I could extract ALL the text, but then how would I know what text is marked for redaction?

NOTE:
Text ‘marked for redaction’ is still visible in the document. It is NOT completely redacted. When it is completely redacted, it is irrevocably blacked out. In Acrobat Pro X, the text ‘marked for redaction’ has red borders around it. Please see attached.

Thanks,
Nick

Hi Nick,

Nick Miller:
Do I understand correctly there is no way to specifically extract text 'marked for redaction' from a PDF using Aspose.Pdf?

Yes, at the moment Aspose.Pdf for .NET does not provide the feature to specifically extract text ‘marked for redaction’. I have created a new feature request in our issue tracking system with issue id: PDFNEWNET-34521 for our development team to further analyze the issue. They will check if this can be supported in future.

Sorry for the inconvenience,

Hi,

Is there any update on the progress of PDFNEWNET-34521?
I’ve got a patched version of iText which will do this, but would prefer to use Aspose if possible.
Nick

Hi Nick,


Thanks for your inquiry. I’m afraid we haven’t scheduled the feature yet due to other, higher-priority tasks. However, we will update you via this forum thread as soon as we plan its analysis and implementation.

Thanks for your patience.

Best Regards,

Hi Nick,


Thanks for your patience. Please consider the following code snippet to extract PDF redaction text. It iterates through each page's annotations and extracts the text that lies inside each annotation's rectangle.

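using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;
using Aspose.Pdf.InteractiveFeatures.Annotations;

// inputFile: path of the source PDF; outFile: path of the text file to write
Document pdfDocument = new Document(inputFile);
StringBuilder builder = new StringBuilder();

foreach (Page page in pdfDocument.Pages)
{
    foreach (Aspose.Pdf.InteractiveFeatures.Annotations.Annotation annotation in page.Annotations)
    {
        // Popup annotations are the note windows of other annotations;
        // their rectangle is not the marked area, so skip them.
        // Note: every other annotation type (not only redactions) is still processed.
        if (annotation is PopupAnnotation)
            continue;

        // Extract only the text that lies inside the annotation's rectangle
        Aspose.Pdf.Rectangle rect = annotation.Rect;
        TextAbsorber absorber = new TextAbsorber();
        absorber.TextSearchOptions.LimitToPageBounds = true;
        absorber.TextSearchOptions.Rectangle = rect;
        absorber.Visit(page);

        // Separate the text of consecutive annotations with a blank line
        builder.Append(absorber.Text);
        builder.Append("\r\n\r\n");
    }
}

File.WriteAllText(outFile, builder.ToString());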

Please feel free to contact us for any further assistance.


Best Regards,

The issues you have found earlier (filed as PDFNEWNET-34521) have been fixed in Aspose.Pdf for .NET 10.0.0.

