Blank Page Detection

Hello,

I need to detect blank pages in a PDF document. Is there an API call that I can use to detect a blank page?

Thanks

Soujanya Kumar

Hello Soujanya,

Thanks for using our products.

To accomplish your requirement, you can traverse all the pages of the PDF document and try to extract text, images, annotations, attachments, and watermarks. If a page is empty, nothing will be returned. Please visit the following links for further details on

In case you need any further information, please feel free to contact us.
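
The traversal described above could be sketched roughly as follows. This is a minimal sketch, assuming the Aspose.Pdf for .NET types Document, Page and TextAbsorber; the helper name LooksEmpty and the path "input.pdf" are hypothetical placeholders, and attachments and watermarks are not checked here.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

static class EmptyPageScan
{
    // Hypothetical helper: true when no text, images, or annotations
    // are found on the page.
    static bool LooksEmpty(Page page)
    {
        TextAbsorber absorber = new TextAbsorber();
        page.Accept(absorber); // collect the text of this page only

        bool noText = string.IsNullOrEmpty(absorber.Text);
        bool noImages = page.Resources.Images.Count == 0;
        bool noAnnotations = page.Annotations.Count == 0;
        return noText && noImages && noAnnotations;
    }

    static void Main()
    {
        // Placeholder path; substitute your own document.
        Document doc = new Document("input.pdf");
        foreach (Page page in doc.Pages)
        {
            Console.WriteLine("Page {0}: empty = {1}", page.Number, LooksEmpty(page));
        }
    }
}
```

A page that holds only a white image or whitespace text would not be reported as empty by this check.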

And what if the page contains an empty (= white) image?

I consider this as a blank page but your detection method will not.

Hi Corne,

Thank you for using our product.

Well, technically, if a PDF page contains an image (even a blank one), it is not a blank page, because it has image content. However, for this specific scenario, I have asked our development team whether it is possible to handle such a case. Once I get a response, I will update you. I have registered an investigation issue in our issue tracking system with issue ID PDFNEWNET-34418.

Thank You & Best Regards,

Hi Corne,


Thanks for your patience.

The requirement to check whether a PDF file contains blank or empty pages (including pages that hold only a blank image) can be accomplished with the following code snippet.

[C#]

using System;
using System.IO;
using Aspose.Pdf;

// Call from Main or any other method.
Document doc = new Document("d:/pdftest/34418.pdf");
foreach (Page page in doc.Pages)
{
    Console.WriteLine("Page {0} is blank: {1}", page.Number, IsBlankPage(page));
}

// Returns true if every color set in the page's content stream is pure white
// (also true when the content stream has no color operator at all).
static private bool HasOnlyWhiteColor(Page page)
{
    foreach (Operator op in page.Contents)
    {
        Operator.SetColorOperator opSC = op as Operator.SetColorOperator;
        if (opSC == null)
            continue;
        System.Drawing.Color color = opSC.getColor();
        if (color.R != 255 || color.G != 255 || color.B != 255)
            return false;
    }
    return true;
}

// Returns true if every pixel of the image is pure white.
static private bool IsWhiteImage(XImage image)
{
    using (MemoryStream ms = new MemoryStream())
    {
        image.Save(ms);
        ms.Position = 0; // rewind so the bitmap is decoded from the start
        using (System.Drawing.Bitmap bmp = new System.Drawing.Bitmap(ms))
        {
            for (int j = 0; j < bmp.Height; j++)
            {
                for (int i = 0; i < bmp.Width; i++)
                {
                    System.Drawing.Color color = bmp.GetPixel(i, j);
                    if (color.R != 255 || color.G != 255 || color.B != 255)
                        return false;
                }
            }
        }
    }
    return true;
}

// Returns true if no images exist or all images are white.
static private bool HasOnlyWhiteImages(Page page)
{
    if (page.Resources.Images.Count == 0)
        return true;
    foreach (XImage image in page.Resources.Images)
    {
        if (!IsWhiteImage(image))
            return false;
    }
    return true;
}

// A page counts as blank if it has no content and no annotations,
// or if it uses only white colors and only white images.
static private bool IsBlankPage(Page page)
{
    return (page.Contents.Count == 0 && page.Annotations.Count == 0) ||
           (HasOnlyWhiteColor(page) && HasOnlyWhiteImages(page));
}


Please try it, and if you face any problem or have any further query, please feel free to contact us.

The issues you have found earlier (filed as PDFNEWNET-34418) have been fixed in Aspose.Pdf for .NET 7.5.0.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

This was an automated message and is very misleading as I spent a lot of time trying to find the new feature that doesn't exist. The fix was not in Aspose.PDF for .NET 7.5.0. Aspose hopes that users find the solution above to be acceptable.

The above solution won't work most of the time, because the extra blank page is most frequently caused by white space (for example, a user left a few extra returns or trailing spaces in their Word document or HTML).
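
One way to cover the whitespace case described here might be to treat a page as blank when the text extracted from it consists only of whitespace. The following is a minimal sketch, assuming that Aspose.Pdf's TextAbsorber returns such spaces and line breaks as ordinary page text; the helper name HasOnlyWhitespaceText is hypothetical, and the check would still need to be combined with the image and annotation checks from the snippet above.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

static class WhitespaceCheck
{
    // Hypothetical helper: true when the page has no text at all,
    // or only spaces, tabs, and line breaks.
    public static bool HasOnlyWhitespaceText(Page page)
    {
        TextAbsorber absorber = new TextAbsorber();
        page.Accept(absorber); // collect the text of this page only
        return string.IsNullOrWhiteSpace(absorber.Text);
    }
}
```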

Hi Rick,


Please accept our apologies for the inconvenience. We will try our best to assign the appropriate fix category in future release notes to avoid any such misunderstanding. We also truly appreciate your feedback and your sharing of an enhanced solution for this issue.

Best Regards,

Hi

Has any new feature been added to the latest PDF API to detect blank pages, without using TextAbsorber?



Regards
Aravind

Hi Aravind,

Thanks for contacting support.

Are you facing any issue with the approach shared earlier in this thread (post 426998)? It describes how to identify whether the input document contains any blank pages. If you are facing an issue, please share the input document so that we can test the scenario in our environment.

Does the file need to be OCRed first?



Regards
Aravind

Hi Aravind,


Thanks for contacting support.

The code snippet shared above searches for text and image objects inside the PDF file and also handles white images in the document, so you do not need to perform OCR on the document first.

In case you encounter any issue, please share the input document.

I am getting this error
Error CS0426 The type name 'SetColorOperator' does not exist in the type 'Operator'

@senthilspi

In the latest versions of the API, the operator classes and methods have been moved to the Aspose.Pdf.Operators namespace. Please use the class as Aspose.Pdf.Operators.SetColorOperator.
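
For reference, the HasOnlyWhiteColor helper from the earlier snippet might be adapted along these lines. This is only a sketch: it assumes that SetColorOperator in Aspose.Pdf.Operators still exposes the getColor() method used in the original code and that it returns a System.Drawing.Color, which may differ in your version of the library.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Operators;

static class ColorCheck
{
    // Same logic as the earlier HasOnlyWhiteColor, with the operator type
    // taken from the Aspose.Pdf.Operators namespace.
    public static bool HasOnlyWhiteColor(Page page)
    {
        foreach (Operator op in page.Contents)
        {
            SetColorOperator opSC = op as SetColorOperator;
            if (opSC == null)
                continue; // not a color-setting operator

            // Assumption: getColor() is available as in the original snippet.
            System.Drawing.Color color = opSC.getColor();
            if (color.R != 255 || color.G != 255 || color.B != 255)
                return false;
        }
        return true;
    }
}
```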