Detect blank page in scanned PDF documents | Aspose.PDF for .NET | Page.IsBlank is false

Hello Support

Please review the attached scanned image. Page 3 is a blank page, but the scanner is adding an image along the lengthwise edge of the page. When this page is passed to Page.IsBlank, it is flagged as not blank. We have tried various threshold factor settings, such as .01 and .001. Are there any cleanup functions that remove anything along the margins of the page, or a way to ignore anything along the margins?

Unfortunately, we can’t force clients to change settings on their end to determine what is causing this, so the fix needs to be on the back end.

@cfavNQ,

Can you please specify which API you are using on your end? Also, please note that no source file is attached to this thread, although you mentioned that you had attached an image.

Our release version is using 18.11, but in development I have updated to 18.12.

8b385dd9-54a3-48a5-8b30-f34843c7cf6e_0.PDF (1.6 MB)

@cfavNQ,

Can you please share sample code along with the source file so that we can investigate further and help you out?

    // coverScan, ThresholdFactor, childJob and _logger are defined elsewhere in our code.
    var pdfDoc = new Aspose.Pdf.Document(coverScan);

    // Walk the pages in page-number order.
    var collection = pdfDoc.Pages.OrderBy(f => f.Number);
    foreach (var page in collection)
    {
        if (page.IsBlank(ThresholdFactor))
        {
            // Flag the matching page record in our job as blank.
            var docPage = childJob.DocumentPages.FirstOrDefault(f => f.OriginalPageNumber == page.Number);
            if (docPage != null)
            {
                _logger.DebugFormat($"Page# {docPage.OriginalPageNumber} is blank in document {childJob.Id}");
                docPage.Blank = true;
            }
        }
        else
        {
            _logger.DebugFormat($"Page# {page.Number} is not blank in document {childJob.Id}");
        }
    }

@cfavNQ

We are checking the details and will get back to you soon.

@cfavNQ

We have tested the scenario in our environment using Aspose.PDF for .NET 20.6. Our initial investigation showed that the third page of the PDF is not actually blank: it contains a blank image. That is why the API does not treat it as a blank page. Nevertheless, we have logged a ticket, PDFNET-48399, in our issue tracking system to investigate this scenario further.

We will look into the details of the logged ticket and keep you informed about its resolution status. Please be patient and allow us some time.

We are sorry for the inconvenience.
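
Until the ticket is resolved, one possible back-end workaround is to render the page to a grayscale bitmap yourself and test it for blankness while skipping a band along each edge, so that scanner artifacts in the margins are ignored. The following is only a minimal sketch of that idea and not an Aspose.PDF feature: the helper name IsBlankIgnoringMargins, its parameters and their default values are all hypothetical, and it assumes you have already obtained the page pixels as a byte[,] array (0 = black, 255 = white) by whatever rendering route you use.

```csharp
using System;

public static class BlankPageCheck
{
    // Hypothetical helper, not part of Aspose.PDF.
    // pixels:         grayscale values indexed [row, column]; 0 = black, 255 = white.
    // marginFraction: fraction of the height/width ignored along each edge.
    // darkThreshold:  pixel values below this count as "ink".
    // maxInkRatio:    largest share of ink pixels in the interior that still counts as blank.
    public static bool IsBlankIgnoringMargins(
        byte[,] pixels,
        double marginFraction = 0.05,
        byte darkThreshold = 200,
        double maxInkRatio = 0.001)
    {
        if (pixels == null)
            throw new ArgumentNullException(nameof(pixels));
        if (marginFraction < 0 || marginFraction >= 0.5)
            throw new ArgumentOutOfRangeException(nameof(marginFraction));

        int rows = pixels.GetLength(0);
        int cols = pixels.GetLength(1);
        int marginRows = (int)(rows * marginFraction);
        int marginCols = (int)(cols * marginFraction);

        long total = 0;
        long ink = 0;

        // Only the interior region, inside the margin band, is inspected.
        for (int r = marginRows; r < rows - marginRows; r++)
        {
            for (int c = marginCols; c < cols - marginCols; c++)
            {
                total++;
                if (pixels[r, c] < darkThreshold)
                    ink++;
            }
        }

        // An empty interior has nothing on it, so treat it as blank.
        if (total == 0)
            return true;

        return (double)ink / total <= maxInkRatio;
    }
}
```

The margin fraction and ink ratio would need tuning against real scans; a strip that is wider than the ignored band will still cause the page to be reported as not blank.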