Extract images from PDF document using Aspose.PDF for .NET - Extraction fails on the 2nd and subsequent pages

bartroozendaal · May 10, 2020, 1:48pm

Hi,

Enclosed you will find a PDF created with InDesign. The images on the first page can be extracted via the page's Resources property, but subsequent pages don't report any images. If the document is converted to DOCX, the images do appear in the Word document as image objects.

Did I find a bug?

Attachment: Rijksdienst voor Oudheidkundig Bodemonderzoek; Kerkstraat 1, Amersfoort - hires.pdf (8.5 MB)

Thanks for letting me know…

Adnan.Ahmad · May 11, 2020, 8:28am

@bartroozendaal,

Thanks for contacting support.

Can you please share complete environment details along with a working sample project so that we can investigate further and help you out?

bartroozendaal · May 11, 2020, 1:13pm

Please find enclosed a console app demonstrating the problem. As you can see, the tool reports 0 images on page 2 and later pages, even though those pages do contain images.

Attachment: BugAspose.zip (8.4 MB)

Adnan.Ahmad · May 11, 2020, 7:32pm

@bartroozendaal,

Thanks for contacting support.

I have observed your issue and would like to inform you that I have created an investigation ticket with ID PDFNET-48166 in our issue tracking system to investigate and resolve this issue as soon as possible.

bartroozendaal · May 26, 2020, 6:44pm

We're 14 days in. Is there any update on this issue?

Adnan.Ahmad · May 27, 2020, 11:11am

@bartroozendaal,

I regret to inform you that the issue is still unresolved. As per our company policy, investigation priority goes first to paid support (i.e., Enterprise and Priority Support) on a first-come, first-served basis. After that, issues from the normal support forum are scheduled for investigation, also on a first-come, first-served basis. We request your patience and will share good news with you soon.

bartroozendaal · June 6, 2020, 8:58am

Is there any news? Any idea when this will be looked at?

I paid $999 for this tool. To others that may not be a lot, but for me this is a big, big deal.

asad.ali · June 8, 2020, 2:08pm

@bartroozendaal

Would you please try the following code snippet to extract images from the PDF document? We tested it, and all images were extracted:

using System.IO;
using Aspose.Pdf;

// dataDir is the folder containing the PDF (defined elsewhere)
Document pdf = new Document(dataDir + "Rijksdienst voor Oudheidkundig Bodemonderzoek; Kerkstraat 1, Amersfoort - hires.pdf");
int index = 0;
foreach (Page page in pdf.Pages)
{
    // Scan the page content for every image actually placed on the page
    ImagePlacementAbsorber imagePlacementAbsorber = new ImagePlacementAbsorber();
    page.Accept(imagePlacementAbsorber);
    foreach (ImagePlacement imagePlacement in imagePlacementAbsorber.ImagePlacements)
    {
        // Get the image using the ImagePlacement object
        XImage image = imagePlacement.Image;
        string outputFileName = dataDir + "img_" + index + ".jpg";
        // FileMode.Create truncates an existing file; OpenOrCreate could leave stale trailing bytes
        using (FileStream fs = new FileStream(outputFileName, FileMode.Create))
        {
            // Save the image at 300 DPI
            image.Save(fs, 300);
        }
        index += 1;
    }
}

bartroozendaal · June 8, 2020, 4:04pm

Thank you for this. This seems to work much better. However, at least one file I just tried gets into an infinite loop at page 14. The file is pretty big (450 MB). I will implement this workaround and run the logic over my complete set of files again. If I find a smaller file that still triggers the error, I can send that; otherwise I will ask my client for permission to send the big file.

I’ll let you know…

asad.ali · June 8, 2020, 9:49pm

@bartroozendaal

Thanks for your feedback.

You can certainly share your problematic PDF document with us regardless of its size. Our forum supports uploads of up to 10 MB. However, you can upload a larger file to a public file-sharing service, e.g. Dropbox or Google Drive, and share the link with us. We will test the scenario in our environment and address it accordingly.

bartroozendaal · June 10, 2020, 8:11pm

Hi, I managed to extract all images from the files based on the code you provided. This method seems to be a lot slower than what I had before; many files take much longer to process.

Nevertheless, it looks like my problem is solved in this case, with an additional dose of patience, that is. Thanks for your help with this.

asad.ali · June 11, 2020, 12:15am

@bartroozendaal

Thanks for sharing your feedback.

It is good to know that things have started working on your side.

An issue (PDFNET-48166) has already been logged in our issue tracking system for the approach you were using previously, and we will share updates with you as soon as it is resolved. Please bear with us.