Hello,
I am using Aspose.Pdf 6.8 and the older Aspose.Pdf.Kit 5.5 to extract file attachments from PDFs (both embedded files and file attachment annotations on pages).
Certain PDFs cause problems, mostly very large PDF Portfolios (over 100 MB).
The old Pdf.Kit version runs into an out-of-memory error when calling PdfExtractor.ExtractAttachment(). If I ignore the error and continue, it does at least retrieve the attachments it extracted before the error, and these can be saved using PdfExtractor.GetAttachment().
In the new version, I have tried both the Aspose.Pdf.Facades.PdfExtractor and Aspose.Pdf.Document classes to extract the attachments. There are two major issues:
1 - Numerous PDF files work with Pdf.Kit but not with the new version. I can provide samples, but they are large, so I will need an FTP site to upload them to. All of these files are PDF Portfolios (the PdfFileInfo class correctly identifies them as such through its HasCollection property), yet no attachments are found: the EmbeddedFiles collection of the Document class is empty, and PdfExtractor returns no files either.
2 - When a PDF contains a single very large attachment (the sample file I can send has a 600 MB file embedded in it), there appears to be no way to extract the file without reading the whole thing into memory. Instead of throwing an out-of-memory exception, extraction keeps going at an exceptionally slow speed, and if I let it finish (which took well over an hour, though I did not time it exactly), the output file does not contain the whole attachment. I can see this in Task Manager as soon as I access the Contents property of the PdfFileSpecification class: Memory Usage and I/O Read Bytes first climb very quickly to about 500 MB and 280 MB respectively, then hit a threshold. After that, memory usage drops to about 300 MB and stays there, while I/O Read Bytes slows to a crawl of about 10-15 seconds per MB.
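For reference, the bounded-memory behaviour I am hoping for is an ordinary chunked copy: read a fixed-size piece of the attachment, write it out, repeat. Below is a minimal, library-independent sketch in Python. It assumes the attachment could be exposed as an incrementally readable binary stream. The function name `copy_in_chunks` and the 1 MB chunk size are hypothetical illustrations, not part of the Aspose API.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB per read; an illustrative choice, not an Aspose setting


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy a readable binary stream to a writable one, one chunk at a time.

    Only one chunk is held in memory at any moment, so peak memory stays
    near chunk_size no matter how large the source is. Returns the number
    of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # In-memory stand-in for an attachment stream: 5 MB plus 123 bytes.
    data = b"x" * (5 * CHUNK_SIZE + 123)
    out = io.BytesIO()
    print(copy_in_chunks(io.BytesIO(data), out))
```

With a real file the destination would be opened with `open(path, "wb")`. The point is that memory use depends on the chunk size rather than on the 600 MB attachment.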
I can provide an FTP link to the sample files or upload them to a site of your choosing.
Any help would be appreciated.
Thanks
Doug