page.isBlank does not detect all blank pages

Hello Aspose.Pdf support,

I have scanned PDF files that contain some blank pages. I am trying to detect them with the isBlank method of the com.aspose.pdf.Page class in Aspose.PDF for Java, but the results are not good.
The effect of the threshold factor is not clear to me. Can you give me more information?

I attached a document with blank pages: blanks.pdf (164.9 KB)
I tried different values of fillThresholdFactor (0.01, 0.1, 1, 100), and the maximum number of pages detected was 5 (out of 10 pages in total).
Is this a bug, or is it correct? If it is correct, why are no more pages detected when the value is higher?
What aspects does the algorithm take into account? Does it count the number of colored pixels on a page? Is the border of the page taken into account? Etc.
Are there any other factors, such as black-and-white conversion or cropping, as in other applications?

P.S. On another document, I also noticed that the algorithm often marks a page as empty if it contains only handwritten text. Is that correct?

P.P.S. My test code is very simple:
    PageCollection pages = pdfDoc.getPages();
    // Page indexes in the collection start at 1.
    for (int i = 1; i <= pages.size(); i++) {
        Page p = pages.get_Item(i);
        if (p.isBlank(fillThresholdFactor)) {
            System.out.println("Empty page:" + i);
        }
    }

@aleksand

Thank you for contacting support.

In most cases, the pages in the PDF document you shared are not entirely blank. They contain noise rather than a plain white background. Setting a larger threshold can ignore smaller noise, but problems can occur if a document also contains small text. Technically, the pages are not blank, so we are afraid this scenario may not be detected reliably.
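To illustrate the trade-off between noise and small text, here is a minimal sketch of a fill-ratio check on a rasterized page. It is not the algorithm used by Page.isBlank, which is not described in this thread. The class FillRatioSketch, its methods, and the parameters darkCutoff and maxFillRatio are hypothetical names chosen for this example, and it assumes the page has already been rendered to a BufferedImage.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper, not part of Aspose.PDF: fill-ratio test on a rasterized page. */
final class FillRatioSketch {

    /** Creates an all-white RGB image that stands in for a rendered page. */
    static BufferedImage whitePage(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        return img;
    }

    /** Returns the fraction of pixels whose luma is below darkCutoff (0 to 255). */
    static double fillRatio(BufferedImage img, int darkCutoff) {
        long dark = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer approximation of Rec. 601 luma.
                int luma = (299 * r + 587 * g + 114 * b) / 1000;
                if (luma < darkCutoff) {
                    dark++;
                }
            }
        }
        return (double) dark / ((long) img.getWidth() * img.getHeight());
    }

    /** Treats the page as blank when at most maxFillRatio of its pixels are dark. */
    static boolean looksBlank(BufferedImage img, int darkCutoff, double maxFillRatio) {
        return fillRatio(img, darkCutoff) <= maxFillRatio;
    }

    public static void main(String[] args) {
        BufferedImage page = whitePage(100, 100);
        // Simulate scanner noise: 50 black pixels out of 10,000 (0.5% fill).
        for (int i = 0; i < 50; i++) {
            page.setRGB(i, 10, 0x000000);
        }
        System.out.println(looksBlank(page, 128, 0.01));  // true: 0.5% is within a 1% limit
        System.out.println(looksBlank(page, 128, 0.001)); // false: 0.5% exceeds a 0.1% limit
    }
}
```

In this sketch, a limit loose enough to absorb the specks would also absorb a few thin strokes or a short line of small text, which is the kind of trade-off described above.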

Regarding the problem with handwritten content, please share a sample document and mention which value of fillThresholdFactor you are using with that file.

Yes, I understand that, technically, the pages are not blank, but they contain no text. I was hoping you use algorithms that could detect such pages (a simple algorithm that detects single-color pages would not require any additional methods in the library).
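The kind of simple single-color check meant here could look like the sketch below. It is only an illustration under stated assumptions and not something Aspose.PDF provides: the class UniformPageSketch, its methods, and the parameters margin and tolerance are hypothetical, and the page is assumed to be already rendered to a BufferedImage. The margin crops the border so that edge shadows from the scanner are ignored.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper, not part of Aspose.PDF: single-color page check with a border crop. */
final class UniformPageSketch {

    /** Creates an image filled with one RGB color, standing in for a rendered page. */
    static BufferedImage solid(int width, int height, int rgb) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, rgb);
            }
        }
        return img;
    }

    /**
     * Returns true when, inside the cropped area, every color channel varies
     * by at most tolerance (0 to 255) between its darkest and lightest pixel.
     */
    static boolean isUniform(BufferedImage img, int margin, int tolerance) {
        int x0 = margin;
        int y0 = margin;
        int x1 = img.getWidth() - margin;
        int y1 = img.getHeight() - margin;
        if (margin < 0 || x0 >= x1 || y0 >= y1) {
            throw new IllegalArgumentException("margin does not fit the image");
        }
        int[] min = {255, 255, 255};
        int[] max = {0, 0, 0};
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                int rgb = img.getRGB(x, y);
                for (int c = 0; c < 3; c++) {
                    // c = 0, 1, 2 selects red, green, blue.
                    int v = (rgb >> (16 - 8 * c)) & 0xFF;
                    min[c] = Math.min(min[c], v);
                    max[c] = Math.max(max[c], v);
                }
            }
        }
        for (int c = 0; c < 3; c++) {
            if (max[c] - min[c] > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BufferedImage page = solid(50, 50, 0xF0F0F0);
        page.setRGB(0, 0, 0x000000); // dark pixel on the border
        System.out.println(isUniform(page, 0, 8)); // false: the border pixel is included
        System.out.println(isUniform(page, 5, 8)); // true: the border is cropped away
    }
}
```

A strict check like this fails as soon as a single dark speck falls inside the cropped area, so on noisy scans it would still need some allowance for outliers.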

I am sorry, but I still do not understand the threshold. I understand that a larger threshold can ignore smaller noise, but when I set it to 100, the result is no different from 0.1. The description says that the minimum value is 0.01 but gives no maximum value. Can you tell me the maximum value of this parameter and what result it will give (will all pages be detected as blank)?

I have attached the problem document with handwritten content. If I set fillThresholdFactor to 0.02, both pages are reported as blank.
hand.pdf (63.0 KB)

@aleksand

Thank you for elaborating further.

We have been able to observe the problem with the threshold. A ticket with ID PDFJAVA-38906 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

The issues you found earlier (filed as PDFJAVA-38906) have been fixed in Aspose.PDF for Java 19.11.