Extract BufferedImage from image in a PDF page (JAVA)

The link below describes how to extract metadata of images in a PDF.

I modified this snippet to get the BufferedImage from an ImagePlacement like this:

// Open document
Document document = new Document(dataDir + "Test.pdf");

// Create ImagePlacementAbsorber object to perform image placement search
ImagePlacementAbsorber abs = new ImagePlacementAbsorber();

// Accept the absorber for all pages
document.getPages().accept(abs);

// Get a BufferedImage for each image placement
for (Object imagePlacement : abs.getImagePlacements())
{
    ImagePlacement placement = (ImagePlacement) imagePlacement;
    XImage xImage = placement.getImage();
    BufferedImage bi = xImage.lf(); // doesn't look right
    System.out.println("BufferedImage: " + bi.toString());
}

My questions:

  1. How can I get a BufferedImage from the ImagePlacement?
    Although xImage.lf() works, I don’t think this is the correct way to do it.

  2. What’s the difference between an Aspose XImage and Aspose Image?
    Is there some sort of conversion?

@martin.duerig

Could you please share the sample source PDF with us so that we can test the scenario in our environment and address it accordingly?

You can find a sample PDF here
https://jeroen.github.io/images/ocrscan.pdf

@martin.duerig

We have logged an investigation ticket as PDFJAVA-40416 in our issue tracking system for your requirements. We will look further into the ticket details and let you know as soon as it is resolved. Please be patient and spare us some time.

We are sorry for the inconvenience.

@martin.duerig

Regarding PDFJAVA-40416,

ImagePlacement holds its image data in an internal format that is not related to BufferedImage. To get a BufferedImage instance, save the internal image data to a stream as an image with the required options, then load that stream as a BufferedImage:

Example:

//get image as Jpeg
ByteArrayOutputStream baos = new ByteArrayOutputStream();
xImage.save(baos, ImageType.getJpeg(), 150);
BufferedImage imBuff = ImageIO.read(new ByteArrayInputStream(baos.toByteArray()));
System.out.println("BufferedImage1: " + imBuff.toString());

//get image as PNG
baos = new ByteArrayOutputStream();
xImage.save(baos, ImageType.getPng(), 300);
imBuff = ImageIO.read(new ByteArrayInputStream(baos.toByteArray()));
System.out.println("BufferedImage2: " + imBuff.toString());
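
One thing to watch with this approach: ImageIO.read returns null (it does not throw) when no registered reader recognizes the data, so the println calls above would fail with a NullPointerException in that case. Below is a minimal sketch of the same stream round trip with that check added. It uses only the JDK; StreamRoundTrip, decode and encodePng are hypothetical names for illustration, and encodePng is merely a stand-in for the xImage.save(...) call, not part of the Aspose API.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class StreamRoundTrip {

    // Decodes encoded image bytes, e.g. the data written by xImage.save(baos, ...).
    static BufferedImage decode(byte[] encoded) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(encoded));
        if (image == null) {
            // ImageIO.read returns null when no registered reader recognizes the data
            throw new IOException("No ImageIO reader could decode the image data");
        }
        return image;
    }

    // Stand-in for xImage.save(...): writes an image to an in-memory stream as PNG.
    static byte[] encodePng(BufferedImage image) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(image, "png", baos);
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Small synthetic image in place of one extracted from a PDF
        BufferedImage source = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        source.setRGB(1, 2, 0xFF0000);

        BufferedImage copy = decode(encodePng(source));
        System.out.println(copy.getWidth() + "x" + copy.getHeight());
    }
}
```

In the loop from the question, the bytes passed to decode would be baos.toByteArray() after the xImage.save(...) call.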

com.aspose.Image is used only during the PDF generation process. This object should be added to Paragraphs, and it is processed when the document is saved or when its paragraphs are processed.