Allowing extraction of PDF data without allowing modification

I need to use the toolkit to create a PDF that users cannot modify, but from which third-party products can still extract text and data. What is the proper way to do this?

Right now I’m using:

PdfFileSecurity pdfFileSecurity = new PdfFileSecurity(theInFile, theOutFile);

DocumentPrivilege priv = DocumentPrivilege.AllowAll;
priv.AllowModifyContents = false;

pdfFileSecurity.SetPrivilege(priv);

This does stop users from modifying the document, but it apparently also prevents extraction. See the attached screenshots of the Properties/Security tab of the generated PDF. Thanks!

Bryan

Hello Steve,

To allow the contents of a PDF document to be copied, set the IsCopyingAllowed property of the Security class to true. The following code snippet creates a simple PDF document in which content modification is not allowed but the user can copy the contents. A screenshot of the Security tab of the document properties and the resulting PDF document are also attached.

[C#]

using Aspose.Pdf;

//Instantiate Pdf instance by calling its empty constructor
Pdf pdf1 = new Pdf();
//Assign a security instance to Pdf object
pdf1.Security = new Security();

//Restrict annotation modification
pdf1.Security.IsAnnotationsModifyingAllowed = false;
//Restrict contents modification
pdf1.Security.IsContentsModifyingAllowed = false;
//Allow copying the data
pdf1.Security.IsCopyingAllowed = true;
//Restrict document assembly
pdf1.Security.IsDocumentAssemblyingAllowed = false;
//Restrict form filling
pdf1.Security.IsFormFillingAllowed = false;

//Add a section in the Pdf
Aspose.Pdf.Section sec1 = pdf1.Sections.Add();
//Create a text paragraph
Text text1 = new Text(sec1,"this is text content");
//Add the text paragraph to the section
sec1.Paragraphs.Add(text1);
//Save the Pdf
pdf1.Save(@"d:/pdftest/PdfSecurityTest.pdf");

If this does not resolve your problem, or if you have any further questions, please feel free to contact us.

We are sorry for the inconvenience.

Thanks for the help. I’m trying to get this to work with the PdfFileSecurity object rather than a Pdf object. I followed the same steps you did above, using this object instead, but the results are the same as in my earlier screenshots.

I’m using the Aspose.Words Document object to save a Word document as a PDF, and then I need to set the privileges on that PDF:

Document doc = new Document("WordDoc.doc");
doc.SaveToPdf("unprotected.pdf");

PdfFileSecurity pdfFileSecurity = new PdfFileSecurity("unprotected.pdf", "protected.pdf");

DocumentPrivilege priv = DocumentPrivilege.AllowAll;
priv.AllowModifyContents = false;
priv.AllowModifyAnnotations = false;
priv.AllowCopy = true;
priv.AllowAssembly = false;
priv.AllowFillIn = false;

pdfFileSecurity.SetPrivilege(priv);

Any ideas?

Hello Steve,

Thanks for sharing the detailed information.

Please ignore my previous post: it showed how to apply security constraints to a PDF document while it is being generated with Aspose.Pdf for .NET. In your case, however, you are converting a Word document to PDF with Aspose.Words and then applying the security constraints to the resulting PDF with Aspose.Pdf.Kit.

For your convenience, I am moving this thread to the Aspose.Pdf.Kit forum, where I believe the team responsible for that product can answer your query more appropriately.

We apologize for the inconvenience.

Hi Steve,

Please share the problematic PDF (converted from the Word file) with us so we can test the issue at our end; we’ll update you accordingly.

We’re sorry for the inconvenience.
Regards,

Sure - please see attached.

Bryan

Hi Bryan,

Can you please try the following code at your end?


Aspose.Pdf.Kit.PdfFileSecurity fileSec = new Aspose.Pdf.Kit.PdfFileSecurity("input.pdf", "output.pdf");

Aspose.Pdf.Kit.DocumentPrivilege priv = Aspose.Pdf.Kit.DocumentPrivilege.AllowAll;
priv.AllowCopy = false;
priv.AllowModifyContents = false;
// Set any other privileges you don't need to false as well.

fileSec.SetPrivilege(priv);


This code allows text extraction while disallowing content copying and modification. Please try it at your end and see if it helps in your scenario.

If you have any more questions or need further assistance, please let us know.
Regards

Sorry, but this still doesn’t work. The resulting PDF shows the following in View/Properties (see screenshot below).

Here is my exact code, where theInFile is an existing PDF that allows copying and page extraction:

public static void SetPdfPriviledges(string theInFile, string theOutFile) {
    Aspose.Pdf.Kit.PdfFileSecurity fileSec = new Aspose.Pdf.Kit.PdfFileSecurity(theInFile, theOutFile);

    Aspose.Pdf.Kit.DocumentPrivilege priv = Aspose.Pdf.Kit.DocumentPrivilege.AllowAll;
    priv.AllowCopy = false;
    priv.AllowModifyContents = false;

    fileSec.SetPrivilege(priv);
}

Hi Bryan,

Yes, I can see the current View/Properties settings. However, I have noticed that after setting these attributes you can’t copy or modify the contents, but you can still extract text from the PDF file; I extracted the text from it successfully.

Can you please try extracting the text and copying the content at your end, and let us know whether that helps in your scenario?

We look forward to helping you out.
Regards,

The problem is that I don’t know which tool my customer’s client is using to extract the information. I only know that they say their tool can’t get the information out.

I figured that getting the “Page Extraction” property to show as “true” would do the trick. How can I use Aspose to set that flag to true while still disallowing content modification? Or are the two mutually exclusive?

Bryan

Hi Bryan,

I have logged this issue as PDFKITNET-15796 in our issue tracking system. Our team will investigate this issue in detail and you’ll be updated accordingly.

We’re sorry for the inconvenience.
Regards,

Hi Bryan,

An update on this issue: text copying and text extraction are the same thing. You can either allow both (copying and extraction) or neither. The PDF specification treats them as a single permission, so it is not possible to allow only one of the two actions.
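The claim that copying and extraction are one permission can be checked against the PDF specification's "User access permissions" table (ISO 32000-1), which defines the bits of the /P integer stored in an encrypted PDF's encryption dictionary. Below is a minimal C# sketch, independent of Aspose.Pdf.Kit, that decodes a /P value using those bit positions (security handler revision 3 or greater). Copying and extraction share bit 5; the only other extraction-related flag is bit 10, which covers extraction for accessibility. The class name, method names, and the sample value are hypothetical and are not part of any Aspose API.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch (hypothetical names, not an Aspose API): decodes the /P
// permissions integer of a PDF encryption dictionary using the 1-based bit
// positions from the "User access permissions" table of ISO 32000-1.
public static class PdfPermissionDecoder
{
    // 1-based bit position -> meaning given in the specification.
    private static readonly (int Bit, string Meaning)[] Bits =
    {
        (3,  "Print"),
        (4,  "Modify contents"),
        (5,  "Copy or otherwise extract text and graphics"),
        (6,  "Add or modify annotations; fill in form fields"),
        (9,  "Fill in existing form fields"),
        (10, "Extract text and graphics for accessibility"),
        (11, "Assemble document (insert, rotate, delete pages)"),
        (12, "High-quality (faithful) print"),
    };

    // True when the given 1-based bit is set in the /P value.
    public static bool IsSet(int p, int bit) => (p & (1 << (bit - 1))) != 0;

    // Lists the permissions granted by a /P value.
    public static List<string> Allowed(int p)
    {
        var allowed = new List<string>();
        foreach (var (bit, meaning) in Bits)
        {
            if (IsSet(p, bit)) allowed.Add(meaning);
        }
        return allowed;
    }

    public static void Main()
    {
        // Hypothetical /P value: everything allowed (-4, i.e. 0xFFFFFFFC)
        // except "Modify contents" (bit 4, value 8), giving -12.
        int p = -4 & ~(1 << 3);
        foreach (string permission in Allowed(p))
        {
            Console.WriteLine(permission);
        }
        // There is no separate "copy" flag and "extract" flag: bit 5 covers both.
        Console.WriteLine($"Copy/extract allowed: {IsSet(p, 5)}");
    }
}
```

With the sample value -12, the sketch lists every permission except "Modify contents" and reports copy/extract as allowed; clearing bit 5 would remove copying and extraction together, which matches the reply above.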

We’re sorry for the inconvenience. If you have any further questions, please do let us know.
Regards,



I know it’s been a while, but this is coming back to haunt me. I have the following code:

public static void SetPdfPriviledges(string theInFile, string theOutFile) {
    PdfFileSecurity pdfFileSecurity = new PdfFileSecurity(theInFile, theOutFile);
    DocumentPrivilege priv = DocumentPrivilege.AllowAll;
    priv.AllowModifyContents = false;
    priv.AllowModifyAnnotations = false;
    priv.AllowCopy = true;
    priv.AllowAssembly = false;
    priv.AllowFillIn = false;

    pdfFileSecurity.SetPrivilege(priv);
}

Note that I have "AllowCopy" set to "true". However, the PDF that's created still has "Page Extraction" set to "False". Please see the attached PDF as an example.

Question:
  - What controls the "Page Extraction" setting on the resulting PDF?

Thanks,

Bryan

Hi Bryan,

We’re looking into this issue and will get back to you shortly.

We’re sorry for the inconvenience.
Regards,

Hi Bryan,

I have looked into your requirement and would like to share the details with you. When privileges are applied to a PDF file, a change password (owner password) must also be set on the file; Adobe Acrobat likewise requires a change or owner password when setting privileges. However, once the file is encrypted with an owner password, page extraction is still not allowed, even if all privileges are allowed. Adobe Acrobat behaves the same way. The only way to make page extraction allowed is to decrypt the PDF using the owner password. I’m afraid it is not possible to allow page extraction while also setting the other privileges. You can verify this with Adobe Acrobat as well.
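For context on why an owner password is involved: the /P permissions value only exists in a PDF's encryption dictionary, so setting any privilege means encrypting the file. Within the specification's "User access permissions" table the bits are independent, so "modify contents" can be cleared while "copy" stays set, but the table contains no flag named "page extraction". The C# sketch below composes a /P value for roughly the combination requested by the SetPdfPriviledges code above. It assumes that DocumentPrivilege's AllowModifyContents, AllowCopy, AllowModifyAnnotations, AllowFillIn, and AllowAssembly correspond to bits 4, 5, 6, 9, and 11, and that AllowAll leaves the print and accessibility bits set; that mapping is inferred from the property names, not taken from Aspose documentation, and all names in the sketch are hypothetical.

```csharp
using System;

// Illustrative sketch (hypothetical names, not an Aspose API): composes a /P
// permissions value (security handler revision 3 or greater) following the
// "User access permissions" table of ISO 32000-1. The DocumentPrivilege
// property mapping in the comments is an assumption based on property names.
public static class PdfPermissionEncoder
{
    [Flags]
    public enum Permission
    {
        None                 = 0,
        Print                = 1 << 2,  // bit 3
        ModifyContents       = 1 << 3,  // bit 4  (assumed: AllowModifyContents)
        CopyOrExtract        = 1 << 4,  // bit 5  (assumed: AllowCopy)
        ModifyAnnotations    = 1 << 5,  // bit 6  (assumed: AllowModifyAnnotations)
        FillInForms          = 1 << 8,  // bit 9  (assumed: AllowFillIn)
        AccessibilityExtract = 1 << 9,  // bit 10
        Assemble             = 1 << 10, // bit 11 (assumed: AllowAssembly)
        HighQualityPrint     = 1 << 11, // bit 12
    }

    // Bits 1-2 must be 0; bits 7-8 and 13-32 are reserved and must be 1.
    private const uint Reserved = 0xFFFFF0C0;

    // Returns the signed 32-bit integer written as /P in the encryption dictionary.
    public static int Encode(Permission granted) =>
        unchecked((int)(Reserved | (uint)granted));

    public static void Main()
    {
        // Start from "allow all", then leave out contents, annotation,
        // form-fill and assembly changes; copying stays allowed.
        Permission granted = Permission.Print
                           | Permission.CopyOrExtract
                           | Permission.AccessibilityExtract
                           | Permission.HighQualityPrint;

        int p = Encode(granted);
        Console.WriteLine($"/P = {p} (0x{unchecked((uint)p):X8})");
        // No member of Permission corresponds to Acrobat's "Page Extraction" entry.
    }
}
```

This produces a valid permissions value with modification off and copying on, so the bits themselves are not mutually exclusive. Going by the reply above, though, Acrobat reports "Page Extraction" as not allowed whenever the file is encrypted with an owner password, regardless of how these bits are set.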

We’re sorry for the inconvenience. If you have any further questions, please do let us know.
Regards,