Extract attachments from a PDF document generated with Acrobat XI

321654987 · August 21, 2014, 10:38am

Hi,

Is it normal that the Aspose Java library cannot get the attachments from a PDF document generated with Acrobat 11? (pdfDocument.getEmbeddedFiles().size() returns 0.) Or do I need a license to do this? I haven't got an evaluation license. How can I get one, and how should I install it?

Thanks in advance

Richard

P.S. I successfully ran the “getAllAttachementsFromPDF.java” example code with your data, but not when I replaced the input.pdf file with an input.pdf file generated with Acrobat XI.

tilal.ahmad · August 22, 2014, 12:27am

Hi Richard,

Thanks for your inquiry. I am afraid we are not aware of any such issue. Please share your sample PDF document here; we will test the scenario at our end and guide you accordingly.

Moreover, regarding the license, please note that the Aspose.Pdf evaluation version has two limitations: an evaluation watermark, and at most four elements of any collection can be viewed. You may request a 30-day temporary license to evaluate our product without any limitation.

Best Regards,

321654987 · August 25, 2014, 10:25am

Hi Ahmad,

Thank you for your quick answer! I tested several example "input" files to identify when the problem appears. I finally found a case that you should be able to reproduce in your environment.

Before this, I have to explain how I use Aspose:

1) My source is an email (Microsoft Outlook Professional 2010) with an attached document (Word Document Attached.docx).

2) Then I "convert it to Adobe PDF file format" (right-click on the selected email). This function appeared in Outlook when I installed Acrobat Pro 11 on my desktop. I get a PDF file (actually a PDF "folder" ["input.pdf" attached here]) which includes my email (in PDF format) with its attached document.

3) When I execute my "GetAllTheAttachementsFromPDF.java" on "input.pdf", I successfully extract one file called "Aspose attached Mail document.pdf". Good to know that Aspose can extract PDF elements from PDF folders!

Now "Aspose attached Mail document.pdf" is my email in PDF format with "Word Document Attached.docx" attached. ["inputAutoAttached.pdf" attached here]

4) When I try to extract the Word document from my "Aspose attached Mail document.pdf" (renamed "input.pdf") using Java, pdfDocument.getEmbeddedFiles().size() returns 0!

5) But if I manually extract the Word document from this same "input.pdf" and attach it again ["inputManuallyAttached.pdf" attached here], I can successfully extract it using Java.

You can also notice that after step 5), "inputManuallyAttached.pdf" is 5 KB smaller than "inputAutoAttached.pdf".

Thanks in advance for your help. I would really like to use your Java library for this purpose.

best regards

Richard
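
For context on steps 4) and 5): a PDF can carry an attached file in more than one place. Document-level attachments are listed under the /EmbeddedFiles name tree, while a file attached to a page is stored as an annotation with /Subtype /FileAttachment. A rough way to see which kind a given file uses is to scan its raw bytes for those two names. The sketch below does that with the Java standard library only; the class name AttachmentMarkerScan and its count helper are illustrative and not part of Aspose. It is a heuristic: names stored inside compressed object streams will not be found, so a count of 0 proves nothing.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library: counts PDF name markers in raw bytes.
class AttachmentMarkerScan {

    // Counts non-overlapping occurrences of an ASCII marker in the data.
    static int count(byte[] data, String marker) {
        byte[] m = marker.getBytes(StandardCharsets.US_ASCII);
        if (m.length == 0) {
            return 0; // nothing to search for
        }
        int hits = 0;
        for (int i = 0; i + m.length <= data.length; i++) {
            int j = 0;
            while (j < m.length && data[i + j] == m[j]) {
                j++;
            }
            if (j == m.length) {
                hits++;
                i += m.length - 1; // skip past this match
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java AttachmentMarkerScan <file.pdf>");
            return;
        }
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        // Document-level attachments (name tree entry)
        System.out.println("/EmbeddedFiles:  " + count(pdf, "/EmbeddedFiles"));
        // Page-level attachments (annotation subtype)
        System.out.println("/FileAttachment: " + count(pdf, "/FileAttachment"));
    }
}
```

Running it on both "inputAutoAttached.pdf" and "inputManuallyAttached.pdf" may show whether the two files store the Word document differently.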

tilal.ahmad · August 26, 2014, 2:37am

Hi Richard,

Thanks for sharing the source document. While testing the scenario with the latest version of Aspose.Pdf for Java 9.3.1, we have managed to reproduce the reported issue and logged it in our bug tracking system as PDFNEWJAVA-34397 for further investigation and resolution. We will notify you via this thread as soon as it is resolved.

We are sorry for the inconvenience caused.

Best Regards,

321654987 · August 26, 2014, 2:41am

Hi Ahmad,

Thank you for your answers. I will be waiting for your notification (as soon as possible, I hope).

Thanks

Richard

rajeevkrmathur · August 27, 2014, 11:14am

Hi Richard,

I saw your post and thought I would share the code I use to extract attachments. I am not sure whether it will be helpful for you, but I hope it helps.

import java.io.InputStream;
import com.aspose.pdf.Document;
import com.aspose.pdf.FileSpecification;
import com.aspose.pdf.facades.Form;

// PDF_FILE is the path to the source PDF
Form pdfForm = new Form();
pdfForm.bindPdf(PDF_FILE);

// Get the first attachment (index 1)
FileSpecification fileSpecification = pdfForm.getDocument().getEmbeddedFiles().get_Item(1);
// Retrieve the contents of the attachment
InputStream input = fileSpecification.getContents();
// Store the attached file name
String fileName = fileSpecification.getName();
// Delete the existing attachment
pdfForm.getDocument().getEmbeddedFiles().delete(fileName);

// Load the attachment into a new PDF document, ready for flattening
Document obj = new Document(input);

Thanks
Rajeev

codewarior · August 28, 2014, 8:04am

Hi Rajeev,

Thanks for sharing a workaround. I hope that Richard will test it and will share his findings. Besides this, the problem has already been logged in our issue tracking system. The development team will definitely consider resolving the originally reported issue.

tilal.ahmad · August 28, 2014, 8:26am

Hi Rajeev,

Thanks for your suggestion. I am afraid the suggested code does not extract the attachment from Richard's document either. As soon as our development team has investigated the scenario, we will share our findings and an ETA for a solution.

Best Regards,

321654987 · August 28, 2014, 9:41am

Hi Rajeev,

Thank you for your suggestion. I have been trying to fix my problem using your code, but even after changing the document source object from a com.aspose.pdf.Document to a com.aspose.pdf.facades.Form, pdfForm.getDocument().getEmbeddedFiles().size() still returns 0.

I think there is a real bug, as noted by Tilal Ahmad from the Aspose staff, and I am waiting for an update from Aspose to fix it as soon as possible.

Thanks

Richard

codewarior · August 29, 2014, 12:33pm

Hi Richard,

Thanks for sharing your feedback. As Tilal shared earlier, he also reproduced the issue using the code stated above. As soon as we have made definite progress towards resolving this issue, we will be more than happy to update you on its status. Please be patient and allow us a little time.

tilal.ahmad · April 15, 2016, 12:01pm

Hi Richard,

Thanks for your patience. Our product team has investigated the issue and suggests using the following code snippet to find FileAttachmentAnnotations. You may use the getContents() method to extract the file.

// Imports assumed by this snippet
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import com.aspose.pdf.Annotation;
import com.aspose.pdf.FileAttachmentAnnotation;
import com.aspose.pdf.FileSpecification;
import com.aspose.pdf.Page;

// myDir is the folder that holds the input PDF and receives the extracted file
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(myDir + "Aspose attached Mail document.pdf");

if (pdfDocument.getEmbeddedFiles().size() != 0)
{
    // Get a particular embedded file (index 1, as in the snippet earlier in this thread)
    FileSpecification fileSpecification = pdfDocument.getEmbeddedFiles().get_Item(1);
    // Print the file properties
    System.out.println("Name: - " + fileSpecification.getName());
    System.out.println("Description: - " + fileSpecification.getDescription());
    System.out.println("Mime Type: - " + fileSpecification.getMIMEType());
}
else if (pdfDocument.getPages().get_Item(1).getAnnotations().size() != 0)
{
    Page page = pdfDocument.getPages().get_Item(1);
    // Look through all the annotations on the first page
    for (Annotation annotation : (Iterable<Annotation>) page.getAnnotations())
    {
        //Annotation annotation = pdfDocument.getPages().get_Item(1).getAnnotations().get_Item(2);//The fileAttachment is third
        if (annotation instanceof FileAttachmentAnnotation)
        {
            FileSpecification file = ((FileAttachmentAnnotation) annotation).getFile();
            if (file != null && file.getName() != null)
            {
                System.out.println("Name: - " + file.getName());
                System.out.println("Description: - " + file.getDescription());
                System.out.println("Mime Type: - " + file.getMIMEType());
                // Read the attachment contents and write them to a file in myDir
                InputStream is = file.getContents();
                byte[] buffer = new byte[is.available()];
                System.out.println("available bytes: - " + buffer.length);
                is.read(buffer);
                OutputStream os = new FileOutputStream(myDir + file.getName());
                os.write(buffer);
                os.close();
                is.close();
            }
        }
    }
}
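
A note on the file-writing step: the snippet sizes its buffer from available() and issues a single read(). InputStream.available() is documented as an estimate, and a single read() may return fewer bytes than requested, so a large attachment could in principle be truncated. A more defensive sketch, using only the Java standard library, is shown below. The class name AttachmentSaver and the save helper are illustrative and not part of Aspose; in the snippet above, the stream would come from file.getContents().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of any library.
class AttachmentSaver {

    // Copies the whole stream to 'target', replacing any existing file,
    // closes the stream, and returns the number of bytes written.
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; a real caller would pass the attachment's content stream.
        byte[] fake = "stand-in attachment bytes".getBytes(StandardCharsets.UTF_8);
        Path target = Files.createTempFile("attachment", ".bin");
        long written = save(new ByteArrayInputStream(fake), target);
        System.out.println(written + " bytes written to " + target);
    }
}
```

With the names from the snippet above, the call would presumably look like AttachmentSaver.save(file.getContents(), Paths.get(myDir, file.getName())).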

Please feel free to contact us for any further assistance.

Best Regards,