I am using Aspose.PDF to extract images from a PDF file. The PDF file contains a bunch of images in JPG format (that is, they are encoded in the PDF file with the /Filter /DCTDecode option, after which comes the actual JPG data).
Hi Avi,
ashmid_a: Here’s a directory with a single JPG file, and a PDF file containing that JPG file (which I created with Aspose.PDF):
As you can see if you look at the hex data for “newfile.pdf”, the JPG is included within the PDF file as is, byte for byte (from location 0x010F to location 0x6D545). Indeed, this is the great thing about the /DCTDecode filter in PDF files: it allows the PDF file to contain a complete JPG file, without doing any sort of recompression or transcoding.
However, when I run extractor.GetNextImage() to extract the image (as detailed in my previous message), the resulting JPG is significantly smaller than the original. Clearly, Aspose.PDF is not giving me access to the original JPG data within the PDF file; it is re-encoding and recompressing it. I'd like to be able to use Aspose.PDF to extract JPG images from PDF files without any loss of quality. The JPG data is fully there, so it should be accessible without a problem.
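To illustrate what I mean by "accessible", here is a minimal Python sketch (not Aspose code) that copies /DCTDecode stream bytes straight out of a PDF without decoding them. It rests on several assumptions: the file is unencrypted, each image uses a single /Filter /DCTDecode entry rather than a filter array, and the image objects are not packed inside object streams. The function name `extract_dct_streams` is my own, and the scan is a naive byte search rather than a real PDF parser.

```python
import re

# The "stream" keyword plus its end-of-line, but not the tail of "endstream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n")
# A single DCTDecode filter (filter arrays are not handled in this sketch).
DCT_RE = re.compile(rb"/Filter\s*/DCTDecode")
# A direct integer /Length, i.e. not an indirect reference such as "12 0 R".
LENGTH_RE = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R\b)")


def extract_dct_streams(pdf_bytes):
    """Return the raw bytes of every /DCTDecode stream found in pdf_bytes."""
    images = []
    pos = 0
    while True:
        match = STREAM_RE.search(pdf_bytes, pos)
        if match is None:
            break
        pos = match.end()
        # Treat everything from the nearest preceding "obj" as the dictionary.
        obj_start = pdf_bytes.rfind(b"obj", 0, match.start())
        if obj_start < 0:
            continue
        dictionary = pdf_bytes[obj_start:match.start()]
        if not DCT_RE.search(dictionary):
            continue
        start = match.end()
        length = LENGTH_RE.search(dictionary)
        if length:
            end = start + int(length.group(1))
        else:
            # Fallback: cut at "endstream" and drop the end-of-line before it.
            end = pdf_bytes.find(b"endstream", start)
            if end < 0:
                continue
            if pdf_bytes[end - 2:end] == b"\r\n":
                end -= 2
            elif pdf_bytes[end - 1:end] in (b"\n", b"\r"):
                end -= 1
        images.append(pdf_bytes[start:end])
        pos = end  # skip past the image data before searching again
    return images
```

Reading "newfile.pdf" in binary mode, passing its contents to this function, and writing each returned item to a .jpg file should give back the embedded bytes unchanged, provided the assumptions above hold. That is the behavior I am hoping to get from the extractor.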
Hi Avi,
Thanks for sharing the resource files.
I have tested the scenario and was able to reproduce the problem. I have logged it in our issue tracking system as PDFNEWNET-37075 so that it can be corrected. We will investigate the issue in detail and keep you updated on the status of a fix.
We apologize for the inconvenience.