How do I find / Remove this element?

dlaskey · December 4, 2019, 5:19pm

We have some older documents which are PDFs that were created from Word documents, but instead of adding a “watermark” to the document, it looks like they added two text boxes to the header. One text box contains the text, the other does the rotation / transparency.

I am able to find and remove these text boxes when I have the Word document, but I am unsure what the element gets converted into in the PDF. I have made many attempts to find the “Closed to Accrual” text with fragments and/or an Absorber, but have been unsuccessful.

Is it getting converted to a drawing layer?

dlaskey · December 4, 2019, 5:21pm

05-060 PDF CF.pdf (146.9 KB)

asad.ali · December 5, 2019, 6:25am

@dlaskey

The elements are converted into form fields in the PDF document. You can call the Document.Flatten() method in order to disable them. In case you need further assistance, please feel free to let us know.

dlaskey · December 5, 2019, 10:52am

Not sure I understand.

I am trying to REMOVE the element that looks like a watermark (“Closed to Accrual”).

It does not appear to be a watermark, nor an annotation.

I believe that when these were originally created, the “Closed to Accrual” text was put into a text box in Word that was then rotated and set so it appears on top of the underlying content.

These documents already exist as PDFs, so I am trying to figure out in the PDF how to identify what element the “Closed to Accrual” is and how to remove it.

When these elements were created as watermarks, I can remove them fine, but I need a second set of code that works for the “faux watermark” like the one in the uploaded document.

asad.ali · December 5, 2019, 9:50pm

@dlaskey

Thanks for explaining the issue again.

Would you kindly provide the sample code snippet that you have tried so far to remove the watermark? We will then proceed to assist you accordingly.

dlaskey · December 6, 2019, 1:40pm

I tried going at it as annotations.

Aspose.Pdf.Document doc = new Aspose.Pdf.Document(@convertDIR.ToString() + inFileName);

String watermarkText = theMark.ToString();

foreach (Aspose.Pdf.Page page in doc.Pages)
{
    // Delete() clears every annotation on the page; removing items inside a
    // foreach over the same collection can skip entries or throw.
    page.Annotations.Delete();
}

This is successful at removing the text boxes we add as part of our stamping process, but no luck with the watermark.

This code WORKS when we have the underlying Word document. But in most cases, we only have the PDF stored or available, since that is what we publish to the investigators. I tried to use this logic against PDFs with no luck:

Aspose.Words.Document doc = new Aspose.Words.Document(@convertDIR.ToString() + inFileName);

String watermarkText = theMark.ToString();
foreach (Aspose.Words.HeaderFooter hf in doc.GetChildNodes(NodeType.HeaderFooter, true))
{
    Console.WriteLine(hf.HeaderFooterType);
    Console.WriteLine(hf.IsHeader);
    // ToArray() takes a snapshot so shapes can be removed while looping.
    foreach (Aspose.Words.Drawing.Shape shape in hf.GetChildNodes(NodeType.Shape, true).ToArray())
    {
        string myText = shape.GetText();
        Console.WriteLine(myText);
        Console.WriteLine(shape.Name);
        Console.WriteLine(shape.TextPath.Text);

        // The case-insensitive TextPath check also covers the exact-case match.
        if (shape.Name.Contains("Not For Subject Use")
            || myText.Contains(watermarkText)
            || shape.TextPath.Text.ToLower().Contains(watermarkText.ToLower()))
        {
            shape.Remove();
        }
    }
}

Similarly, I tried Text Fragments to at least loop through in debug and see if I can see the item in the collection… it came back with 0 items.

Aspose.Pdf.Text.TextFragmentAbsorber TFA = new Aspose.Pdf.Text.TextFragmentAbsorber("Closed \r\nto\r\nAccrual");
doc.Pages[1].Accept(TFA);
TextFragmentCollection TFC = TFA.TextFragments;

foreach (TextFragment TF in TFC)
{
    TF.Text = "";
}
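One thing that might explain the empty collection is that the literal search phrase, with its embedded "\r\n", may not match how the text is actually stored in the PDF. As a rough diagnostic sketch (not verified against the attached file), an absorber created without a search phrase collects every text fragment on the page, so each one can be printed and checked against a whitespace-tolerant pattern. The class name, the command-line path argument, and the regular expression are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class FindFauxWatermark
{
    static void Main(string[] args)
    {
        // Hypothetical input: path to the PDF passed on the command line.
        Document doc = new Document(args[0]);

        // An absorber with no search phrase collects all text fragments.
        TextFragmentAbsorber absorber = new TextFragmentAbsorber();
        doc.Pages[1].Accept(absorber);

        // Assumed pattern: the phrase with any (or no) whitespace between words.
        Regex phrase = new Regex(@"closed\s*to\s*accrual", RegexOptions.IgnoreCase);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // Print every fragment so the stored form of the text is visible.
            Console.WriteLine("[{0}] at {1}", fragment.Text, fragment.Rectangle);
            if (phrase.IsMatch(fragment.Text))
            {
                Console.WriteLine("  -> candidate watermark fragment");
            }
        }
    }
}
```

If the three words show up as separate fragments, the pattern will not match any single one and each word would have to be matched on its own. If the phrase does not appear in the output at all, the text is probably not stored as ordinary page text, which would be worth mentioning in the logged issue.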

Tried Stamps

Aspose.Pdf.Facades.PdfContentEditor contentEditor = new Aspose.Pdf.Facades.PdfContentEditor();
contentEditor.BindPdf(doc);

Aspose.Pdf.Facades.StampInfo[] stampInfo = contentEditor.GetStamps(1);

for (int i = 0; i < stampInfo.Length; i++)
{
}

In the above, I was just trying to see if it gave me a collection that I could inspect in debug to work out how best to identify the element, but it came back with a size of 0.

asad.ali · December 7, 2019, 4:59am

@dlaskey

We have logged an issue as PDFNET-47420 in our issue tracking system for further investigation of this scenario. We will check it in detail and keep you posted as soon as we have definite updates about the issue resolution. Please be patient and allow us a little time.

We are sorry for the inconvenience.