Hi,
What is the simplest way to detect that a PDF is built only from images (for example, when the PDF has been created by an MFD or a scanner)? I tried to use
doc.Pages[pageCurrent].Contents and
doc.Pages[pageCurrent].Resources.Images
but in some cases you can have a PDF which is not built only from plain page images
and still contains one image per PDF page (e.g. an invoice with a logo…)
Thanks for your help
Hi Daniel,
Thanks for your inquiry. I’m afraid Aspose.Pdf currently has no functionality to confirm whether a PDF document is built only from images. However, I’ve logged an investigation issue in our issue tracking system as PDFNEWNET-35144. We will notify you via this forum thread as soon as it is resolved.
Please feel free to contact us for any further assistance.
Best Regards,
Hi Daniel,
Thanks for your patience. Please check the following code snippet for detecting whether a PDF document contains only images. Please note that it implements the simplest possible definition of an image-only PDF: it looks for the show text operator in each page’s contents and treats the document as image-only if no page contains one. In general, there can be other rules for detecting image-only PDFs, and these can be defined using the DOM (i.e., by analyzing the page’s content).
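// Returns true if no page of the document contains a show text operator.
bool HasOnlyImages(string filename)
{
    Document document = new Document(filename);
    foreach (Page page in document.Pages)
    {
        // Select every show text operator in this page's content stream.
        OperatorSelector os = new OperatorSelector(new Operator.ShowText());
        page.Contents.Accept(os);

        // Any selected operator means the page draws text, so it is not image-only.
        if (os.Selected.Count != 0)
            return false;
    }
    return true;
}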
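As an illustration of the "other rules" mentioned above, here is a minimal sketch of a slightly stricter check. It is only an assumption about what a useful rule might look like, not an official Aspose.Pdf recommendation: a page counts as image-only when it has no show text operator and its resources contain at least one image, so a page with neither text nor images is no longer reported as image-only. The names ImageOnlyCheck, IsImageOnlyPage and HasOnlyImagePages are hypothetical. The sketch reuses the calls from the snippet above and from the original question (page.Resources.Images), and it assumes the same Aspose.Pdf version, since the operator class may be named differently in other releases.

```csharp
using Aspose.Pdf;

static class ImageOnlyCheck
{
    // Hypothetical rule: a page is image-only when it draws no text
    // (no show text operators) and has at least one image resource.
    public static bool IsImageOnlyPage(int showTextCount, int imageCount)
    {
        return showTextCount == 0 && imageCount > 0;
    }

    // Applies the rule to every page; stops at the first page that fails it.
    public static bool HasOnlyImagePages(string filename)
    {
        Document document = new Document(filename);
        foreach (Page page in document.Pages)
        {
            // Same operator selection as in the snippet above.
            OperatorSelector os = new OperatorSelector(new Operator.ShowText());
            page.Contents.Accept(os);

            if (!IsImageOnlyPage(os.Selected.Count, page.Resources.Images.Count))
                return false;
        }
        return true;
    }
}
```

Keeping the rule in its own method makes it easy to swap in a different definition later. Like the original snippet, this only looks at the show text operator in each page's contents, so it remains a heuristic.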
Hopefully this helps you meet your requirements.
Best Regards,