How to read low-level properties of the object representing a form (/Subtype/Form) in a page using Aspose.Pdf

Scenario

Kofax Power PDF allows users to add text to PDF pages with the Edit > Modify > Typewriter tool. My task is to remove all text created by the Typewriter tool from a PDF while leaving all other text intact.

Findings

What I figured out is that the editor adds this text inside a form (/Subtype/Form); this form can be accessed via pdf.Pages[N].Resources.Forms[N].

So, theoretically, I need a way to identify the forms that were created using the Typewriter tool and remove them. I looked at the PDF file structure and found that such a form is represented by the following object:

<<
    /Subtype/Form
    /BBox[ 242.2637 546.4844 612.0137 558.7783]
    /Matrix[ 1 0 0 1 0 0]
    /Length 136
    /IT/Typewriter
    /Rotate 0
    /Contents(ccccccccccccccc\r)
    /RC(<?xml version="1.0" ?> <body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:7.0.0" xfa:spec="2.0.2" style = "font-size:12.00pt;font-family:'Times New Roman'"><p><span style="text-decoration:;font-size:9.00pt;font-family:'Avenir Black'">ccccccccccccccc&#13;</span></p></body>)
    /DS(text-decoration:;font-size:12.00pt;font-family:'Times New Roman')
    /GUID(e0b013a0-4a19-40cf-af-7f16701f2ebb60)
    /Rect[ 242.2637 546.4844 612.0137 558.7783]
    /Resources
    <<
        /ProcSet[/PDF/Text/ImageB/ImageC/ImageI]
        /Font<</F0 471 0 R>>
    >>
>>stream.....

It seems that I can use the /Subtype/Form and /IT/Typewriter key-value pairs to identify such forms, but I do not know how to access these properties using Aspose.Pdf.
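
For illustration, the identification rule described above can be sketched independently of Aspose.Pdf. The Python snippet below is only a sketch under assumptions: it works on the raw dictionary text (as in the dump above), the function name is_typewriter_form is hypothetical, and a regex scan like this is not a real PDF parser (it ignores compressed object streams and could be fooled by the same text appearing inside a string value).

```python
import re

# A name must end where the pattern ends, so /Form does not match /FormX.
SUBTYPE_FORM = re.compile(rb"/Subtype\s*/Form(?![A-Za-z0-9])")
IT_TYPEWRITER = re.compile(rb"/IT\s*/Typewriter(?![A-Za-z0-9])")


def is_typewriter_form(dictionary: bytes) -> bool:
    """Return True if the raw dictionary text contains both
    /Subtype/Form and /IT/Typewriter (whitespace between key and value is optional)."""
    return bool(SUBTYPE_FORM.search(dictionary)) and bool(IT_TYPEWRITER.search(dictionary))


# Shortened version of the object shown above.
sample = (
    b"<< /Subtype/Form /BBox[ 242.2637 546.4844 612.0137 558.7783] "
    b"/Matrix[ 1 0 0 1 0 0] /IT/Typewriter /Rotate 0 >>"
)
# A form without the /IT/Typewriter entry (made up for comparison).
plain = b"<< /Subtype/Form /BBox[0 0 10 10] >>"

print(is_typewriter_form(sample))
print(is_typewriter_form(plain))
```

The same two checks are what I would like to perform through Aspose.Pdf on each entry of the page's Forms collection.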

Questions

  • Is it possible to access the low-level object properties mentioned above in order to identify the form for removal?
  • Is there an alternative way to identify and remove text inserted by the Typewriter tool?

sample for aspose.pdf (6.2 KB)

Related Issue?

I tried to remove forms from the page using the latest Aspose.PDF 20.6:

using Aspose.Pdf;

var pdf = new Document(@"...");
pdf.Pages[1].Resources.Forms.Clear(); // expected to empty the form collection of page 1

but forms are still present in the collection.

UPDATE 2020.06.06

Added more details to the initial post.

Thanks.

@licenses

Please try removing the text from the PDF using operators or TextFragmentAbsorber, and check whether the output still contains the forms collection. If the issue persists, please share details about the PDF editor you are using to add the text. We will test the scenario in our environment and address it accordingly.

Hi.

Here are more details/comments:

  • The text that I need to identify and remove is added using the Edit > Modify > Typewriter tool in the Kofax Power PDF Standard application.
  • I cannot use TextFragmentAbsorber, because it will remove all text entries in the PDF, and that is not what I need. I need to remove only the text that has been added using the Typewriter tool.

I attached a new sample PDF. It contains 2 text entries:

  • one added while editing new (blank) document in Kofax Power PDF Standard;
  • second added using Edit > Modify > Typewriter tool.

Created then edited in Kofax Power PDF Standard.pdf (7.7 KB)

I need to remove the 2nd text entry. This text entry is added inside a /Subtype/Form form that has the /IT/Typewriter property.

Thanks.

@licenses

Thanks for providing further details.

We regret to share that Aspose.PDF does not provide a feature to extract or remove text on the basis of subtype/markup. However, a feature request has been logged as PDFNET-48381 in our issue tracking system. We will investigate the feasibility of this feature and keep you posted on the status of its implementation. Please spare us some time.

We are sorry for the inconvenience.

Could you please create a ticket to investigate why the forms were not removed? See the initial post for more info.

@licenses

This behavior of the API has also been logged along with the ticket and we will also investigate it while analyzing feasibility of the logged feature. We will inform you as soon as we have additional updates in this regard. Please spare us some time.

Off topic question.

Why don't I receive an email notification when a new comment is added to a thread that I created (like this one)?

My Profile > General > Email is set to my email address.

Thanks.

@licenses

Would you kindly check the Spam or Others folder in your mailbox? If the issue still persists, please create a topic under the Complaint forum category, where you will be assisted accordingly.

Hi asad.ali

We worked with support on the notification issue. Could you please post another message here so that I can check that the issue has been resolved everywhere?

Thanks.

@licenses

This is just a test message. Maybe in the past it was a propagation issue between the identity server and the forums. I hope this message reaches your inbox.

The issues you have found earlier (filed as PDFNET-48381) have been fixed in Aspose.PDF for .NET 23.12.