Image Only PDF

caustin · February 17, 2009, 4:58am

Hi,

Is there some way to determine whether a page of a PDF contains a single image and nothing else?

I’m sure I saw something once, but can’t remember where.

Thanks,

Chris

shahzadlatif · February 17, 2009, 6:14am

Hi Chris,

Aspose.Pdf.Kit can help with this requirement; however, there is no single straightforward method that tells you whether a page contains only images. Instead, you can use the Aspose.Pdf.Kit.Pdf.PdfExtractor class and its HasNextPageText and HasNextImage methods. To restrict the search for text or images to a particular page, set the StartPage and EndPage properties.

Moreover, you can read the following topics for a better understanding:

Extract Text from PDF Document

Extract Image from PDF Document

I hope this helps. If you need any further help please do let us know.

Regards,

codewarior · February 17, 2009, 11:55am

Hello Chris,

Adding to Shahzad's comments: if you need to check whether a PDF page contains an image or text, you can use the following code snippet.

[C#]

// Instantiate a MemoryStream object to hold the text extracted from the document
MemoryStream ms = new MemoryStream();
// Instantiate a PdfExtractor object
PdfExtractor extractor = new PdfExtractor();
// Specify the start page
extractor.StartPage = 1;
// Specify the end page
extractor.EndPage = 1;
// Bind the input PDF document to the extractor
extractor.BindPdf(@"d:\pdftest\tiffsize_issue.pdf");
// Extract text from the input PDF document
extractor.ExtractText();

// Write the extracted text to the MemoryStream
extractor.GetText(ms);
// If the stream length is 1 or less, no text was extracted from the page
if (ms.Length <= 1)
    MessageBox.Show("Page contains an Image");
else
    MessageBox.Show("Pdf contains text");

If you need to check whether the PDF page contains one image or more than one, you can use the following code snippet.

[C#]

// Instantiate a PdfExtractor object
PdfExtractor extractor = new PdfExtractor();
// Bind the input PDF document to the extractor
extractor.BindPdf(@"d:\pdftest\tiffsize_issue.pdf");

// Set the page number at which image extraction starts
extractor.StartPage = 1;
// Set the page number at which image extraction ends
extractor.EndPage = 1;
// Extract images from the input PDF document
extractor.ExtractImage();

// Prefix (first part, here the output folder) of each image file name
String prefix = @"d:\pdftest\Image_Storage\";
// Suffix (last part, usually the file extension) of each image file name
String suffix = ".jpg";
// Counter for the number of extracted images
int imageCount = 0;
// Call HasNextImage in a while loop; the loop exits when no images remain
while (extractor.HasNextImage())
{
    // Call GetNextImage to store the image as a file
    extractor.GetNextImage(prefix + imageCount + suffix);
    // Increment the image counter
    imageCount++;
}

if (imageCount == 1)
    MessageBox.Show("Page contains 1 Image file");
else if (imageCount > 1)
    MessageBox.Show("Page contains more than 1 Image file");
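
To answer the original question (a single image and nothing else), the two checks above can be combined. The following is only a minimal sketch: PageKind and PageClassifier are hypothetical names, not part of Aspose.Pdf.Kit. The helper only interprets the two numbers the snippets above already produce, namely ms.Length from the text check and imageCount from the image loop. It assumes, as the first snippet does, that a length of 1 or less means no text was extracted.

```csharp
using System;

// Hypothetical result type (not part of Aspose.Pdf.Kit).
public enum PageKind
{
    SingleImageOnly,    // no text, exactly one image
    MultipleImagesOnly, // no text, more than one image
    ContainsText,       // some text was extracted
    NoTextNoImages      // neither text nor images were extracted
}

// Hypothetical helper (not part of Aspose.Pdf.Kit).
public static class PageClassifier
{
    // textLength: ms.Length after extractor.GetText(ms) for the page.
    // imageCount: number of images counted by the HasNextImage loop for the page.
    public static PageKind Classify(long textLength, int imageCount)
    {
        // Same threshold as the first snippet: a length of 0 or 1 is treated as "no text".
        if (textLength > 1)
            return PageKind.ContainsText;
        if (imageCount == 1)
            return PageKind.SingleImageOnly;
        if (imageCount > 1)
            return PageKind.MultipleImagesOnly;
        return PageKind.NoTextNoImages;
    }
}
```

For example, after running both snippets with StartPage and EndPage set to the same page, a result of PageKind.SingleImageOnly from PageClassifier.Classify(ms.Length, imageCount) would suggest that the page holds one image and no text. Other page content, such as vector drawings, may not show up in either check.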

If this does not satisfy your requirement, please feel free to let us know.

caustin · February 18, 2009, 12:50am

Thanks, the first sample was precisely what I needed to do.