HOW TO: loop through a PDF and find text boxes, images, etc.

Hi,


I have a project where I need to identify each element in a PDF.

The elements will be a mixture of images and text boxes.

I want to open the document, loop through it, and identify what each element is. For an image, I want to get its x, y, width, height, and colorspace.

For a text box, I want to get its x, y, width, height, and font details.

I am having trouble getting started. Can anyone point me in the right direction, please?

Thanks!

Hi Joshua,


Thanks for contacting support.

Please check the following documentation pages for details.
Moreover, you can use the following code snippet to extract the images and get their properties.

C#

using System;
using System.Drawing;
using System.IO;
using Aspose.Pdf;

// dataDir is assumed to hold the path of the folder containing the PDF
Aspose.Pdf.Document doc = new Aspose.Pdf.Document(dataDir + "ImagePlacement.pdf");
ImagePlacementAbsorber abs = new ImagePlacementAbsorber();
// Load the contents of the first page (page numbering starts at 1)
doc.Pages[1].Accept(abs);
foreach (ImagePlacement imagePlacement in abs.ImagePlacements)
{
    // Get image properties
    Console.Out.WriteLine("image width:" + imagePlacement.Rectangle.Width);
    Console.Out.WriteLine("image height:" + imagePlacement.Rectangle.Height);
    Console.Out.WriteLine("image LLX:" + imagePlacement.Rectangle.LLX);
    Console.Out.WriteLine("image LLY:" + imagePlacement.Rectangle.LLY);
    Console.Out.WriteLine("image horizontal resolution:" + imagePlacement.Resolution.X);
    Console.Out.WriteLine("image vertical resolution:" + imagePlacement.Resolution.Y);

    // Retrieve image with visible dimensions
    Bitmap scaledImage;
    using (MemoryStream imageStream = new MemoryStream())
    {
        // Retrieve image from resources
        imagePlacement.Image.Save(imageStream, System.Drawing.Imaging.ImageFormat.Png);
        // Rewind the stream before reading the image back from it
        imageStream.Position = 0;
        using (Bitmap resourceImage = (Bitmap)Bitmap.FromStream(imageStream))
        {
            // Create bitmap with actual dimensions
            scaledImage = new Bitmap(resourceImage, (int)imagePlacement.Rectangle.Width, (int)imagePlacement.Rectangle.Height);
        }
    }
}
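
For the text side of the question, and to walk every page rather than only the first, the same absorber pattern can be extended. The following is a minimal sketch, not a tested solution: it assumes that TextFragmentAbsorber (namespace Aspose.Pdf.Text) and XImage.GetColorType() are available in your version of Aspose.PDF for .NET, and "input.pdf" is only a placeholder path. Please note that a text fragment is a run of text and may not correspond one-to-one to a visual text box, and that GetColorType() reports a coarse color type rather than the full PDF color space.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ListPageElements
{
    static void Main(string[] args)
    {
        // Placeholder path (assumption): pass your own file as the first argument.
        string path = args.Length > 0 ? args[0] : "input.pdf";
        Document doc = new Document(path);

        // Page numbering in Aspose.Pdf starts at 1.
        for (int pageNumber = 1; pageNumber <= doc.Pages.Count; pageNumber++)
        {
            Page page = doc.Pages[pageNumber];

            // Collect the images placed on this page.
            ImagePlacementAbsorber imageAbsorber = new ImagePlacementAbsorber();
            page.Accept(imageAbsorber);
            foreach (ImagePlacement image in imageAbsorber.ImagePlacements)
            {
                Console.WriteLine(
                    "page {0} image: x={1} y={2} width={3} height={4} colorType={5}",
                    pageNumber,
                    image.Rectangle.LLX,
                    image.Rectangle.LLY,
                    image.Rectangle.Width,
                    image.Rectangle.Height,
                    image.Image.GetColorType()); // assumed available on XImage
            }

            // Collect the text fragments on this page.
            TextFragmentAbsorber textAbsorber = new TextFragmentAbsorber();
            page.Accept(textAbsorber);
            foreach (TextFragment fragment in textAbsorber.TextFragments)
            {
                Console.WriteLine(
                    "page {0} text: x={1} y={2} width={3} height={4} font={5} size={6} text=\"{7}\"",
                    pageNumber,
                    fragment.Rectangle.LLX,
                    fragment.Rectangle.LLY,
                    fragment.Rectangle.Width,
                    fragment.Rectangle.Height,
                    fragment.TextState.Font.FontName,
                    fragment.TextState.FontSize,
                    fragment.Text);
            }
        }
    }
}
```

Coordinates are reported in PDF points with the origin at the lower-left corner of the page, so LLX and LLY give the x and y of each element's lower-left corner.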

If you need further assistance, please feel free to contact us.

Best Regards,