Detecting an "Image only" PDF

Hi,


What is the simplest way to detect that a PDF is built only from images (for example, when the PDF has been created by an MFD or a scanner)? I tried using
doc.Pages[pageCurrent].Contents and
doc.Pages[pageCurrent].Resources.Images
but in some cases a PDF that is not built only from plain page images
still contains one image per page (e.g. an invoice with a logo…).

Thanks for your help

Hi Daniel,


Thanks for your inquiry. I'm afraid Aspose.Pdf currently has no built-in functionality to confirm whether a PDF document is built only from images. However, I've logged an investigation issue in our issue tracking system as PDFNEWNET-35144. We will notify you via this forum thread as soon as it is resolved.

Please feel free to contact us for any further assistance.

Best Regards,

Hi Daniel,


Thanks for your patience. Please check the following code snippet, which detects whether a PDF document contains only images. Please note that this is the simplest way of identifying image-only PDFs: the snippet looks for the show text operator on every page and treats a document in which no page contains that operator as image only. In general, other rules for detecting image-only PDFs are possible, and they can be implemented using the DOM (i.e. by analyzing the page contents).

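using Aspose.Pdf;

// Returns true when no page of the document contains a ShowText operator,
// i.e. the document is treated as "image only".
bool HasOnlyImages(string filename)
{
    Document document = new Document(filename);

    foreach (Page page in document.Pages)
    {
        // Collect every ShowText operator in this page's content stream.
        OperatorSelector os = new OperatorSelector(new Operator.ShowText());
        page.Contents.Accept(os);

        // A single text-showing operator means the page is not image only.
        if (os.Selected.Count != 0)
            return false;
    }

    return true;
}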
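For readers wondering what the "show text operator" is at the PDF level: a page's content stream draws text with one of four operators, Tj, TJ, ' and ". The snippet above selects Operator.ShowText, which as far as we understand corresponds to Tj only, so documents that draw their text exclusively with TJ might be reported as image only unless those operators are checked as well. The following library-free C# sketch is not part of Aspose.Pdf; the class ContentStreamScanner and its method HasShowTextOperator are hypothetical names used for illustration. It assumes you already have a page's content stream decompressed into a string (for example decoded as Latin-1) and simply tokenizes it, skipping strings, names, comments and inline image data, to look for any of the four operators.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Aspose.Pdf API): checks a decompressed PDF
// content stream for any text-showing operator: Tj, TJ, ' or ".
public static class ContentStreamScanner
{
    static readonly HashSet<string> ShowTextOperators =
        new HashSet<string> { "Tj", "TJ", "'", "\"" };

    static bool IsWhite(char c) =>
        c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f' || c == '\0';

    static bool IsDelimiter(char c) => "()<>[]{}/%".IndexOf(c) >= 0;

    public static bool HasShowTextOperator(string content)
    {
        int i = 0, n = content.Length;
        while (i < n)
        {
            char c = content[i];
            if (IsWhite(c)) { i++; continue; }

            if (c == '%') // comment: skip to end of line
            {
                while (i < n && content[i] != '\n' && content[i] != '\r') i++;
                continue;
            }
            if (c == '(') // literal string: honour escapes and nested parentheses
            {
                int depth = 0;
                for (; i < n; i++)
                {
                    char d = content[i];
                    if (d == '\\') { i++; continue; } // skip the escaped character
                    if (d == '(') depth++;
                    else if (d == ')' && --depth == 0) { i++; break; }
                }
                continue;
            }
            if (c == '<')
            {
                if (i + 1 < n && content[i + 1] == '<') { i += 2; continue; } // dictionary start
                int end = content.IndexOf('>', i); // hex string <...>
                i = end < 0 ? n : end + 1;
                continue;
            }
            if (c == '>' || c == ')' || c == '[' || c == ']' || c == '{' || c == '}') { i++; continue; }
            if (c == '/') // name object, e.g. /F1 or /Im0
            {
                i++;
                while (i < n && !IsWhite(content[i]) && !IsDelimiter(content[i])) i++;
                continue;
            }

            // Regular token: a number or an operator.
            int start = i;
            while (i < n && !IsWhite(content[i]) && !IsDelimiter(content[i])) i++;
            string token = content.Substring(start, i - start);

            if (ShowTextOperators.Contains(token)) return true;
            if (token == "ID") i = SkipInlineImageData(content, i);
        }
        return false;
    }

    // Inline image data (BI ... ID <bytes> EI) is raw binary; skip to the first
    // "EI" surrounded by whitespace. This is a heuristic: the data itself could
    // contain that byte sequence.
    static int SkipInlineImageData(string s, int i)
    {
        for (int j = i + 1; j + 1 < s.Length; j++)
        {
            if (s[j] == 'E' && s[j + 1] == 'I' && IsWhite(s[j - 1]) &&
                (j + 2 == s.Length || IsWhite(s[j + 2])))
                return j + 2;
        }
        return s.Length;
    }
}
```

For example, HasShowTextOperator("q 612 0 0 792 0 0 cm /Im0 Do Q"), a content stream that only paints an image XObject, returns false, while HasShowTextOperator("BT /F1 12 Tf (Hello) Tj ET") returns true. Like the Aspose-based snippet, this heuristic treats a page with neither text nor images as image only, and it reports scans that have been OCR'd (which usually carry an invisible text layer) as containing text.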

Hopefully this helps you achieve your requirements.


Best Regards,