Hello Aspose.Pdf support,
I have scanned PDF files that contain some blank pages. I am trying to detect them with the isBlank method of the com.aspose.pdf.Page class in Aspose.PDF for Java, but the results are not good.
The effect of the threshold factor is not clear to me. Can you give me more information?
For the attached document with blank pages, blanks.pdf (164.9 KB),
I tried different values of fillThresholdFactor (0.01; 0.1; 1; 100), and the maximum number of pages detected as blank was 5 (out of 10 pages in total).
Is this a bug, or is it the expected behavior? If it is expected, why are no more pages deleted when the value is higher?
What aspects does the algorithm take into account? Does it count the number of colored pixels on a page? Is the border of the page taken into account? Etc.
Are there any other factors, such as black-and-white conversion or cropping, as in other applications?
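To make clear what I mean by these factors, here is a rough sketch of the kind of check I have in mind. It is only my own assumption of how such a test could work on a rendered page image, not Aspose's actual algorithm; the class BlankPageSketch and the parameters marginFraction, darkThreshold and maxInkRatio are hypothetical names that I made up for illustration.

```java
import java.awt.image.BufferedImage;

public class BlankPageSketch {

    // Returns the fraction of dark ("ink") pixels inside the area that
    // remains after cropping marginFraction of the width/height on each side.
    // Hypothetical helper: this is NOT how Aspose.PDF is known to work.
    static double inkRatio(BufferedImage img, double marginFraction, int darkThreshold) {
        int mx = (int) (img.getWidth() * marginFraction);   // cropped columns per side
        int my = (int) (img.getHeight() * marginFraction);  // cropped rows per side
        long dark = 0;
        long total = 0;
        for (int y = my; y < img.getHeight() - my; y++) {
            for (int x = mx; x < img.getWidth() - mx; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Convert to a gray level, then binarize with darkThreshold
                int gray = (r * 299 + g * 587 + b * 114) / 1000;
                if (gray < darkThreshold) {
                    dark++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) dark / total;
    }

    // A page counts as blank when the ink ratio does not exceed maxInkRatio.
    static boolean looksBlank(BufferedImage img, double marginFraction,
                              int darkThreshold, double maxInkRatio) {
        return inkRatio(img, marginFraction, darkThreshold) <= maxInkRatio;
    }
}
```

With a check like this, a dark scanner border is ignored once the margin is cropped, and a page with a few thin handwritten strokes is reported as blank whenever maxInkRatio is larger than the small ink ratio those strokes produce. I would like to know whether fillThresholdFactor behaves in a comparable way.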
P.S. On another document, I also noticed that the algorithm often marks a page as empty if it contains only handwritten text. Is that correct?
P.P.S. My test code is very simple:
PageCollection pages = pdfDoc.getPages();
// Page numbers are 1-based, so iterate from 1 to size() inclusive
for (int i = 1; i <= pages.size(); i++) {
    Page p = pages.get_Item(i);
    if (p.isBlank(fillThresholdFactor)) {
        System.out.println("Empty page:" + i);
    }
}