Detect and Exclude empty/blank pages from PDF file using Aspose.PDF for .NET

How can I exclude blank pages in a PDF?

Hi,

Thank you for being patient.

Our development team is currently investigating this feature under issue id: PDFNEWNET-34152 in our issue tracking system. We will update you via this forum regarding their feedback.

Sorry for the inconvenience,

Hi,

You can use Aspose.Pdf.DOM to explore the page contents and check whether a page is blank.
In the simplest case, a page is blank if its contents collection is empty (it does not contain any operators).
Additionally, you can check that the page does not contain any annotations. So you may check whether a page is blank with the following simple function:

private bool IsPageBlank(Page page)
{
    return page.Contents.Count == 0 && page.Annotations.Count == 0;
}

You can iterate through document pages and check every page if it is blank or not. For example:

Document doc = new Document("input.pdf");
for (int i = 1; i <= doc.Pages.Count; i++)
{
    if (IsPageBlank(doc.Pages[i]))
        Console.WriteLine("Page " + i + " is blank");
}

Also, to delete blank pages, you may use the following code:

Document doc = new Document("input.pdf");
// Walk backwards so that deleting a page does not shift the indexes still to be visited.
int index = doc.Pages.Count;
while (index > 0)
{
    Page page = doc.Pages[index];
    if (IsPageBlank(page))
    {
        doc.Pages.Delete(index);
        Console.WriteLine("Page " + index + " is blank, deleted");
    }
    index--;
}
doc.Save("Output.pdf");

Please note that in some more complex cases a page may look blank even if it is not really empty, for example if the page contains only operators to save/restore the graphics state and does not contain any output/draw operators. In that case, the IsPageBlank function will need to be enhanced to suit the page content.
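The save/restore case mentioned above could be handled by ignoring graphics-state operators when inspecting the content stream. The following is only a sketch under assumptions: it assumes the `GSave` and `GRestore` operator classes are found in the `Aspose.Pdf.Operators` namespace (their location has differed between Aspose.PDF versions), and the helper name `IsPageEffectivelyBlank` is hypothetical. Other non-drawing operators would need further cases.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Operators; // assumption: namespace used by recent Aspose.PDF for .NET releases

public static class BlankPageHelper
{
    // Hypothetical helper: treats a page as blank when it has no annotations
    // and its content stream holds nothing except q/Q (save/restore graphics state).
    public static bool IsPageEffectivelyBlank(Page page)
    {
        if (page.Annotations.Count != 0)
            return false;

        foreach (Operator op in page.Contents)
        {
            bool isStateOnly = op is GSave || op is GRestore;
            if (!isStateOnly)
                return false; // any other operator counts as real content
        }
        return true; // empty stream, or save/restore operators only
    }
}
```

A page with an empty contents collection still returns true here, so this could stand in for IsPageBlank in the loops above.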

Please try the above suggested code and in case you still face any issue, please share your generated PDF file with us to check it further.

Sorry for the inconvenience,

And what if the page contains an empty (= white) image?

This is a blank page but your IsPageBlank method will not detect this.

Hi Corne,

Thank you for using our product.

Technically, if a PDF page contains an image (even a blank one), it is not a blank page because it has image content. However, for this specific scenario, I have asked our development team whether it is possible to handle such a case. As soon as I get a response, I will update you. I have registered an investigation issue in our issue tracking system with issue id: PDFNEWNET-34418.

Thank You & Best Regards,

The issues you have found earlier (filed as PDFNEWNET-34418) have been fixed in Aspose.Pdf for .NET 7.5.0.



How do we use this new feature? I see nothing in the documentation or searches.

The only thing that I can find is this on the 7.5.0 new features list:
Check if PDF file contains Blank / Empty Pages (If page has a blank Image on it)


I have the following routine to check for blank pages in the document. It checks for whitespace (typically black text, such as a space, that is not visible). It checks whether all the images are white. It checks whether all of the operators are white (as suggested by Aspose). I assume that the operators include the background.

Function IsBlankPdfPage(page As Aspose.Pdf.Page) As Boolean
    If (page.Contents.Count = 0 AndAlso page.Annotations.Count = 0) OrElse ((PdfPageHasNoText(page) OrElse PdfPageHasOnlyWhiteColor(page)) AndAlso PdfPageHasOnlyWhiteImages(page)) Then
        Return True
    End If
    Return False
End Function
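The PdfPageHasOnlyWhiteImages helper is not shown in this thread. A rough C# sketch of what such a check might look like follows. It is not the poster's implementation. It assumes `XImage.Save(Stream)` is used to export each image from `page.Resources.Images`, that System.Drawing is available to decode it, and that a small tolerance is acceptable to absorb compression noise. The `tolerance` parameter and the `IsNearWhite` helper are hypothetical names.

```csharp
using System.Drawing;   // assumption: System.Drawing (System.Drawing.Common) is available
using System.IO;
using Aspose.Pdf;

public static class WhiteImageCheck
{
    // True when every colour channel is within `tolerance` of full white (alpha is ignored).
    public static bool IsNearWhite(Color c, int tolerance)
    {
        return c.R >= 255 - tolerance && c.G >= 255 - tolerance && c.B >= 255 - tolerance;
    }

    // Returns false as soon as any image resource on the page has a non-white pixel.
    public static bool PdfPageHasOnlyWhiteImages(Page page, int tolerance = 8)
    {
        foreach (XImage image in page.Resources.Images)
        {
            using (var stream = new MemoryStream())
            {
                image.Save(stream);      // export the image data
                stream.Position = 0;
                using (var bitmap = new Bitmap(stream))
                {
                    for (int y = 0; y < bitmap.Height; y++)
                        for (int x = 0; x < bitmap.Width; x++)
                            if (!IsNearWhite(bitmap.GetPixel(x, y), tolerance))
                                return false;
                }
            }
        }
        return true; // no images, or only (near-)white images
    }
}
```

GetPixel is slow on large scans, so sampling every n-th pixel may be a reasonable trade-off if speed matters.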


Hi Rick,


Thanks for your inquiry. Sorry for any confusion. In this fix we actually recommended that the customer use Aspose.Pdf.DOM to explore the page and determine whether it is blank; no new methods or properties were added to the API for this purpose. If this approach doesn't help you, please share your source document so that we can investigate and suggest a solution accordingly.

Sorry for the inconvenience faced.

Best Regards,

I don't think you gave my response adequate consideration.

Your code won't work most of the time because the extra blank page is most frequently caused by white space (a user had a few extra returns in their document or HTML).

Your code is flawed because it doesn't consider carriage returns and spaces. Such invisible characters are called "white space". They can have color attributes, and most of the time do: a black space is still invisible.

The code snippet that I provided takes that into account. It includes another function PdfPageHasNoText() which checks to see if the page has any text other than whitespace.

I used "content.Trim().Length() = 0" to determine whether a string was blank or not.

Private Function PdfPageHasNoText(page As Aspose.Pdf.Page) As Boolean
    Dim tfa As New Aspose.Pdf.Text.TextAbsorber()
    page.Accept(tfa)
    Dim content As String = tfa.Text
    If content.Trim().Length() = 0 Then
        Return True
    End If
    Return False
End Function

Hi Rick,


Please accept my apology for the misunderstanding, and thanks for sharing the solution. It will be useful for others with a similar requirement, and we will definitely consider this enhancement/workaround while resolving this issue, as it helps us understand the possible causes of the problem.

Best Regards,

@rickpaul, @cornelos

Aspose.PDF for .NET now offers an improved way to detect blank pages in a PDF. You can use the Page.IsBlank() method to determine whether a page is blank. The following code snippet checks the first page:

Determine whether a PDF page is blank

Document pdfDocument = new Document(dataDir + "Test.pdf");
bool isBlank = pdfDocument.Pages[1].IsBlank(0.01d);
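The snippet above checks only the first page. To check every page, the same call could be wrapped in a loop, as in the following sketch. The `FindBlankPages` name and the `fillThreshold` parameter are hypothetical, and the 0.01 default is simply the value used in the snippet above.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;

public static class BlankPageReport
{
    // Hypothetical helper: returns the 1-based numbers of all pages that
    // Page.IsBlank reports as blank for the given threshold.
    public static List<int> FindBlankPages(string path, double fillThreshold = 0.01)
    {
        var blankPages = new List<int>();
        using (var document = new Document(path))
        {
            // Aspose.PDF page indexes start at 1.
            for (int i = 1; i <= document.Pages.Count; i++)
            {
                if (document.Pages[i].IsBlank(fillThreshold))
                    blankPages.Add(i);
            }
        }
        return blankPages;
    }
}
```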

Afterwards, you can delete the blank pages from the PDF document as well. Please check the following article in the API documentation: