Unable to clear Keywords in existing PDF

I am trying to clear several properties of an existing PDF document, but the keywords field doesn't get cleared when I also clear the subject and title fields. Is this a bug?

Here's some VB code that shows the issue:

Dim Document = New Aspose.Pdf.Document(FileName)
Document.Info.Title = ""
Document.Info.Subject = ""
Document.Info.Keywords = ""
Document.Save(FileName)

If you comment out the second or third lines, then it works correctly. But, if you leave it as is, the keywords field isn't cleared - it simply adds a new blank keyword to the keywords list ("keywords" becomes " ; keywords"). Is there a way to remove the keywords from the PDF properties?

Hi Michael,


Thanks for contacting support.

I have tested the scenario using Aspose.Pdf for .NET 7.7.0 with one of my sample PDF files, and I could not reproduce the issue: the information inside the PDF file is being removed. I used the following code snippet.

[C#]

//open document
Document pdfDocument = new Document("c:/pdftest/Wiht_Info.pdf");
//specify document information
DocumentInfo docInfo = new DocumentInfo(pdfDocument);
docInfo.Title = "";
docInfo.Subject = "";
docInfo.Keywords = "";
//save output document
pdfDocument.Save("c:/pdftest/Wihtout_info.pdf");


Can you please share the source PDF file that causes the problem so that we can test the scenario at our end? We are sorry for the inconvenience.

I've attached the before and after PDFs that are used in my testing. The after document still has the keyword "nokeys" displayed in the properties.

Note: I've been accessing the document properties through the Info property of the Document object. Is it preferred to use a new DocumentInfo object, as you demonstrate in your sample? Both appear to behave the same to me - is there a difference?

Either way, neither your code (translated to VB.NET - we do a lot of work with MS Office, and most of our development is done in VB.NET, with some legacy VB6 and VBA) nor mine worked in my testing. If you open the document using Adobe Reader, the old keywords are still there in the properties dialog. In addition, "nokeys" appears in the PDF markup.

Thanks for looking into this.

Hi Michael,


Thanks for sharing the details.

I have tested the scenario again using Aspose.Pdf for .NET 7.7.0 on Windows 7 (x64) and, as per my observations, the Keywords and other information are being removed from the PDF file. For your reference, I have also attached the updated PDF file generated at my end.

Furthermore, please note that using DocumentInfo is the recommended approach.


[VB.NET]

'open document
Dim pdfDocument As New Document("c:/pdftest/sample.pdf")
'specify document information
Dim docInfo As New DocumentInfo(pdfDocument)
docInfo.Title = ""
docInfo.Subject = ""
docInfo.Keywords = ""
'save output document
pdfDocument.Save("c:/pdftest/Removed_Info_output.pdf")

The document you attached still has the keywords in it. Open the document in Adobe Reader and look at the file's properties and you'll see them. I've attached a screen shot of what I'm seeing.

In addition, when I open the file in notepad and look at the markup, the original keywords are still stored in the tag. We need to be able to completely remove keywords from the document, including the markup. Is there a way that works to do this with Aspose?

Note: your test file was marked as having been changed by "Aspose.PDF for .NET 7.8.0", so you are testing on a later version than I have available, but it appears the issue still exists.

Thanks for the help on this.

Hi Michael,


Thanks for providing the additional information. I've managed to reproduce the issue at my end and have logged it as PDFNEWNET-35067 in our issue tracking system for further investigation and resolution. We will keep you updated on the issue status via this thread.


Sorry for the inconvenience faced.


Best Regards,

Hi Michael,


Thanks for the information and for sharing the details. In my earlier attempt, I tested the scenario with Aspose.Pdf for .NET 7.7.0 and also with the upcoming 7.8.0 release, and I did not see any keywords in the properties section of the resulting file. However, I have tested the scenario again on a separate machine, and I can see that the properties are still there.

Since my colleague has also been able to reproduce it and it has been logged in our issue tracking system, we will investigate this problem further and keep you posted on the status of the fix. We are really sorry for this inconvenience.

PS: I will try to figure out the difference between the two machines.

Thanks for the information. One detail I forgot to mention that might help when investigating this issue: if you don't change both the Title and the Subject properties, and don't touch the other properties at all, then the keywords appear to be correct when opened in Adobe Reader, but they still appear in the markup. For our purposes, we need all of the keywords completely removed from the document, and not just from the properties dialog - if someone can read the PDF markup and see what the keywords were, then it's just as much of a problem as if they're in the properties dialog.

Thanks again.
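
The Notepad check described above can be automated. The following is only a minimal sketch in C#, not an Aspose.Pdf feature: the names PdfMarkerScan, ContainsBytes, and FileContainsMarker are made up for illustration. It assumes the old keyword is stored uncompressed in the file, either as plain ASCII text or as UTF-16BE. A keyword held in a compressed stream, a hex string, or an encrypted file would not be found, so a negative result does not prove that the keyword is gone.

```csharp
using System.IO;
using System.Text;

public static class PdfMarkerScan
{
    // Returns true if the byte sequence `needle` occurs anywhere in `haystack`.
    public static bool ContainsBytes(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0) return true;
        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return true;
        }
        return false;
    }

    // Scans the raw file bytes for `marker`, encoded as ASCII and as UTF-16BE.
    // Assumption: the text is stored uncompressed and unencrypted.
    public static bool FileContainsMarker(string path, string marker)
    {
        byte[] raw = File.ReadAllBytes(path);
        return ContainsBytes(raw, Encoding.ASCII.GetBytes(marker))
            || ContainsBytes(raw, Encoding.BigEndianUnicode.GetBytes(marker));
    }
}
```

Calling PdfMarkerScan.FileContainsMarker on the saved output file with "nokeys" as the marker would return true whenever that text is still readable in the file, regardless of what the properties dialog shows.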

Hi Michael,

Thanks for sharing this valuable information. We will definitely take it into account while resolving this problem. Furthermore, I have tested the scenario again using the code snippet shared in 453808 and, as per my observations, the keywords do not appear when the resulting PDF file is viewed in Adobe Reader 7, but they do appear when the same file is viewed in Adobe Reader 10.1.4 or Adobe Acrobat 10. In my earlier attempt, when I stated that I could not see the keywords in the resulting file, I was using Adobe Reader 7.

For your reference, I have also attached the image file explaining this behavior.

Hi Michael,


To remove the Keywords information from the PDF file, please try using the following code snippet.

[C#]

Document pdfDocument = new Document("c:/pdftest/Sample (2).pdf");
pdfDocument.RemoveMetadata(); // remove XML metadata
pdfDocument.Info.Remove("Title");
pdfDocument.Info.Remove("Subject");
pdfDocument.Info.Remove("Keywords");
pdfDocument.Save("c:/pdftest/RemovedInfo.pdf");

The issues you have found earlier (filed as PDFNEWNET-35067) have been fixed in Aspose.Pdf for .NET 8.0.0.

