I am using Aspose.PDF to extract images from a PDF file. The PDF file contains a bunch of images in JPG format (that is, they are encoded in the PDF file with the /Filter /DCTDecode option, after which comes the actual JPG data).
I’ve tried the Save method of the XImage object, and I’ve tried the GetNextImage method of the PdfExtractor object (see below). However, in both cases, the resulting JPG file is somewhat smaller than the original data encoded in the file, so Aspose appears to be recompressing the data before it saves it as a JPG.
Instead, I’d like to access the actual JPG data, byte for byte as it appears within the PDF file. How can I get the actual JPG data for a given XImage object?
Here are the methods that I tried (unsuccessfully):
1]
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(args[1]);
// traverse the individual pages of the PDF file
for (int pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++) {
    int imageCount = 1;
    // traverse each image in the page's resources
    foreach (XImage xImage in pdfDocument.Pages[pageCount].Resources.Images) {
        // include the image index so several images on one page do not collide
        string savefilename = "image-" + pageCount + "-" + imageCount++ + ".jpg";
        // save the output image, closing the stream when done
        using (FileStream fs = new FileStream(savefilename, FileMode.CreateNew)) {
            xImage.Save(fs);
        }
    }
}
2]
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(args[1]);
extractor.ExtractImage();
int i = 1;
while (extractor.HasNextImage()) {
    Console.WriteLine("Getting image number " + i);
    extractor.GetNextImage("image-" + i + ".jpg");
    i++;
}
Note: I also tried specifying ImageFormat.Jpeg, but to no avail; I still receive a smaller, recompressed image. Instead, I’d like to be able to access the actual image data as stored in the PDF file, and then write that out as a file.
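To illustrate what I mean by "the actual image data", here is a rough sketch that bypasses Aspose entirely and copies the bytes between stream and endstream for every object whose dictionary mentions /DCTDecode. RawJpegScanner, ExtractJpegStreams and SaveAll are names I made up for this example; they are not Aspose APIs. The sketch assumes an unencrypted PDF with the image objects stored directly in the file (not inside object streams) and with /DCTDecode as the only filter, and it ignores /Length, so it is a naive scan and not a real PDF parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not part of Aspose.PDF): copies DCTDecode stream
// bodies out of the raw file bytes without decoding or re-encoding them.
public static class RawJpegScanner
{
    // Index of the first occurrence of pattern in data at or after start, or -1.
    private static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = Math.Max(start, 0); i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    public static List<byte[]> ExtractJpegStreams(byte[] pdf)
    {
        byte[] filterKey = Encoding.ASCII.GetBytes("/DCTDecode");
        byte[] streamKey = Encoding.ASCII.GetBytes("stream");
        byte[] endKey = Encoding.ASCII.GetBytes("endstream");
        var results = new List<byte[]>();
        int pos = 0;
        while ((pos = IndexOf(pdf, filterKey, pos)) >= 0)
        {
            // Assumption: the next "stream" keyword belongs to the same object.
            int s = IndexOf(pdf, streamKey, pos);
            if (s < 0) break;
            int begin = s + streamKey.Length;
            // The keyword is followed by CRLF or LF before the data starts.
            if (begin < pdf.Length && pdf[begin] == (byte)'\r') begin++;
            if (begin < pdf.Length && pdf[begin] == (byte)'\n') begin++;
            int end = IndexOf(pdf, endKey, begin);
            if (end < 0) break;
            // Drop the end-of-line marker that precedes "endstream".
            int dataEnd = end;
            if (dataEnd > begin && pdf[dataEnd - 1] == (byte)'\n') dataEnd--;
            if (dataEnd > begin && pdf[dataEnd - 1] == (byte)'\r') dataEnd--;
            // Keep only bodies that start with the JPEG SOI marker (FF D8).
            if (dataEnd - begin >= 2 && pdf[begin] == 0xFF && pdf[begin + 1] == 0xD8)
            {
                var jpeg = new byte[dataEnd - begin];
                Array.Copy(pdf, begin, jpeg, 0, jpeg.Length);
                results.Add(jpeg);
            }
            pos = end + endKey.Length;
        }
        return results;
    }

    // Writes each raw stream to image-raw-N.jpg in the current directory.
    public static void SaveAll(string pdfPath)
    {
        List<byte[]> images = ExtractJpegStreams(File.ReadAllBytes(pdfPath));
        for (int i = 0; i < images.Count; i++)
        {
            File.WriteAllBytes("image-raw-" + (i + 1) + ".jpg", images[i]);
        }
    }
}
```

Calling RawJpegScanner.SaveAll(args[1]) on my test file is the kind of output I am after: files whose bytes match what is stored in the PDF. What I am hoping for is a supported way to get the same bytes through the XImage object, since a scan like this breaks as soon as the file uses object streams, encryption, or chained filters.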