but it is not clear how I can get the metadata (To, From, Subject, Keywords, etc.) that is
associated with portfolio PDF attachments. I can see how to extract attachments from the PDF, which
is well documented, but not how to extract the associated metadata.
For example, see a screenshot of a portfolio PDF with an attachment that has
From, Subject, To, Date and other metadata fields.
As we understand it, you want to extract the custom metadata properties of a PDF that is present as an attachment (not of the portfolio PDF that contains the attachment). Would you please confirm whether our understanding is correct and share your sample PDF document with us? We will then test the scenario in our environment and address it accordingly.
Your understanding is correct.
This is a sample file:
I would like to extract the From, Subject, Date and other metadata fields associated with the
file attached to this portfolio PDF.
For example, I would expect the value 5/3/2011 1:13:43 PM to be returned for the Date field.
Would you please try the following code snippet, which returns the respective metadata values:
using System;
using System.IO;
using Aspose.Pdf;

// dataDir is assumed to hold the path of the folder containing the sample file
// Open the portfolio document
Document pdfDocument = new Document(dataDir + "PorfolioWithCustomMetadata.pdf");
// Get a particular embedded file
FileSpecification fileSpecification = pdfDocument.EmbeddedFiles[1];
// Get the file properties stored in the portfolio
Console.WriteLine("Name: {0}", fileSpecification.Name);
Console.WriteLine("Description: {0}", fileSpecification.Description);
Console.WriteLine("Mime Type: {0}", fileSpecification.MIMEType);
// Copy the attachment into memory; a single Stream.Read call is not guaranteed to fill the buffer
MemoryStream attachmentStream = new MemoryStream();
fileSpecification.Contents.CopyTo(attachmentStream);
attachmentStream.Position = 0;
// Open the attachment as a PDF and get its document information
Document attachedDocument = new Document(attachmentStream);
DocumentInfo docInfo = attachedDocument.Info;
// Show document information of the attached PDF
Console.WriteLine("Author: {0}", docInfo.Author);
Console.WriteLine("Producer: {0}", docInfo.Producer);
Console.WriteLine("Creation Date: {0}", docInfo.CreationDate);
Console.WriteLine("Keywords: {0}", docInfo.Keywords);
Console.WriteLine("Modify Date: {0}", docInfo.ModDate);
Console.WriteLine("Subject: {0}", docInfo.Subject);
Console.WriteLine("Title: {0}", docInfo.Title);
The value exists in the fileSpecification.Description property. Furthermore, if we open the attached PDF separately and check its properties, the created and modified dates differ from those in the description. Would you please check the shared code snippet and share your feedback with us? We will proceed accordingly.
PART 2 (the DocumentInfo values) is metadata stored inside the attached PDF itself, which is different from the
metadata held in the portfolio PDF, so it does not help us in this particular case.
PART 1 (the FileSpecification properties) is metadata associated with the file in the portfolio PDF, but it does not include all the
information we are trying to extract. For reference, we are trying to migrate from another PDF
library, which provides the portfolio metadata as a dictionary with a set of arbitrary
keys and values. For the file attached to this portfolio PDF, for example, it provides a dictionary
with the following 7 entries:
So the FileSpecification.Description field that Aspose PDF returns collapses 3 of these fields
(From, Date and Subject) into a single one. We have to process arbitrary PDFs, which might
have different metadata fields, so this API makes it hard to retrieve even 3 of the 7 fields we’d need.
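To make the request concrete, here is a minimal sketch of the dictionary shape described above. It is not an Aspose.PDF API: the PortfolioFields class and its Resolve method are hypothetical, as are the field keys and the placeholder subject value; only the date value is taken from this thread. It assumes, as I understand the PDF specification, that a portfolio declares its fields in a collection schema and that each attached file carries a collection item dictionary with values for those fields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model only: none of these types come from Aspose.PDF.
public static class PortfolioFields
{
    // Pairs each schema field's display name with the attachment's value.
    // Fields the attachment does not carry are skipped.
    public static Dictionary<string, string> Resolve(
        IDictionary<string, string> schemaDisplayNames,
        IDictionary<string, string> collectionItem)
    {
        var result = new Dictionary<string, string>();
        foreach (var field in schemaDisplayNames)
        {
            if (collectionItem.TryGetValue(field.Key, out var value))
            {
                result[field.Value] = value;
            }
        }
        return result;
    }

    public static void Main()
    {
        // Placeholder schema: field key -> display name.
        var schema = new Dictionary<string, string>
        {
            ["from"] = "From",
            ["subject"] = "Subject",
            ["date"] = "Date",
        };
        // Placeholder per-attachment values; the date is the one quoted in this thread.
        var item = new Dictionary<string, string>
        {
            ["date"] = "5/3/2011 1:13:43 PM",
            ["subject"] = "(placeholder subject)",
        };
        foreach (var pair in Resolve(schema, item))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```

Because the keys are not fixed in advance, a result of this shape would let us handle portfolios whose schemas differ from file to file, which a single Description string does not.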
We have logged an enhancement request as PDFNET-46141 in our issue tracking system for your requirements. We will investigate the ticket with a view to implementing the required functionality in the API. As soon as there are significant updates regarding the ticket's resolution, we will let you know. Please be patient and allow us some time.