How can I exclude blank pages in a PDF?
Hi,
Thank you for being patient.
Our development team is currently investigating this feature under issue ID PDFNEWNET-34152 in our issue tracking system. We will update you in this thread once we have their feedback.
Sorry for the inconvenience,
Hi,
You can use Aspose.Pdf.DOM to explore the page contents and check whether a page is blank.
In the simplest case, a page is blank if its Contents collection is empty (it does not contain any operators).
Additionally, you can check that the page does not contain any annotations. So you can check whether a page is blank with the following simple function:
// A page is treated as blank when it has no content-stream operators
// and no annotations.
private bool IsPageBlank(Page page)
{
    return page.Contents.Count == 0 && page.Annotations.Count == 0;
}
You can iterate through the document's pages and check whether each one is blank. For example:
Document doc = new Document("input.pdf");
// Aspose.Pdf page indexes start at 1
for (int i = 1; i <= doc.Pages.Count; i++)
{
    if (IsPageBlank(doc.Pages[i]))
        Console.WriteLine("Page " + i + " is blank");
}
To delete the blank pages, you can use the following code:
Document doc = new Document("input.pdf");
// Walk backwards so that deleting a page does not shift the indexes
// of pages that have not been checked yet (indexes are 1-based).
int index = doc.Pages.Count;
while (index > 0)
{
    Page page = doc.Pages[index];
    if (IsPageBlank(page))
    {
        doc.Pages.Delete(index);
        Console.WriteLine("Page " + index + " is blank, deleted");
    }
    index--;
}
doc.Save("Output.pdf");
Please note that in more complex cases a page may look blank even though its contents are not actually empty.
For example, the page may contain only operators that save/restore the graphics state (q/Q) and no operators that actually draw anything. In that case, the IsPageBlank function will need to be enhanced according to the page content.
Please try the code above, and if you still face any issues, please share your PDF file with us so we can check it further.
Sorry for the inconvenience,
And what if the page contains an empty (all-white) image?
Visually this is a blank page, but your IsPageBlank method will not detect it.
Hi Corne,
Thank you for using our product.
Technically, if a page contains an image, even a blank one, it is not a blank page, because it has image content. However, for this specific scenario, I have asked our development team whether it is possible to handle it. I have registered an investigation issue in our issue tracking system with ID PDFNEWNET-34418, and I will update you as soon as I get a response.
Thank You & Best Regards,
The issues you found earlier (logged as PDFNEWNET-34418) have been fixed in Aspose.Pdf for .NET 7.5.0.
This message was posted using Notification2Forum from Downloads module by aspose.notifier.
How do we use this new feature? I can't find anything about it in the documentation or by searching.
The only thing that I can find is this on the 7.5.0 new features list:
Check if PDF file contains Blank / Empty Pages (If page has a blank Image on it)
I have the following routine to check for blank pages in a document. It checks for whitespace-only text (typically black text, such as a space, that is not visible), whether all the images are white, and whether all the operators are white (as suggested by Aspose). I assume that the operators include the background.
Function IsBlankPdfPage(page As Aspose.Pdf.Page) As Boolean
    ' Blank if the page has no operators and no annotations, or if it has
    ' no visible text (or only white operators) and only white images.
    If (page.Contents.Count = 0 AndAlso page.Annotations.Count = 0) OrElse
       ((PdfPageHasNoText(page) OrElse PdfPageHasOnlyWhiteColor(page)) AndAlso PdfPageHasOnlyWhiteImages(page)) Then
        Return True
    End If
    Return False
End Function
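The routine above calls helper functions whose bodies are not shown. As a minimal sketch of the pixel test that a helper like PdfPageHasOnlyWhiteImages might perform, the C# function below (IsAllWhite is a hypothetical name, not an Aspose API) checks whether a decoded image is entirely white. It assumes you have already extracted each image from the page and decoded it into a byte array of 24-bit RGB pixels; that extraction step is not shown. The tolerance parameter lets near-white pixels, such as compression noise in a scanned blank sheet, count as white.

```csharp
using System;

public static class WhiteImageCheck
{
    // Hypothetical helper: 'rgb' holds decoded pixels as consecutive
    // R, G, B bytes (3 bytes per pixel). A pixel counts as white when
    // every channel is at least 255 - tolerance.
    public static bool IsAllWhite(byte[] rgb, int tolerance = 0)
    {
        if (rgb == null) throw new ArgumentNullException(nameof(rgb));
        if (rgb.Length % 3 != 0) throw new ArgumentException("Expected 3 bytes per pixel.", nameof(rgb));
        if (tolerance < 0 || tolerance > 255) throw new ArgumentOutOfRangeException(nameof(tolerance));

        int minValue = 255 - tolerance;
        // Every channel of every pixel must reach the threshold, so it is
        // enough to scan the channels one by one.
        foreach (byte channel in rgb)
        {
            if (channel < minValue) return false;
        }
        return true;
    }
}
```

With the default tolerance of 0 only pure white passes; a small value such as 5 accepts slightly off-white pixels.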
Hi Rick,
I used "content.Trim().Length() = 0" to determine whether a string was blank or not.
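On .NET Framework 4 and later, Trim() removes every character for which char.IsWhiteSpace is true (including the non-breaking space U+00A0), but it keeps invisible format characters such as the zero-width space U+200B, which can appear in extracted text. A hypothetical, slightly stricter check is sketched below; it assumes the page text has already been extracted into a string, and IsVisiblyEmpty is an illustrative name, not an Aspose API.

```csharp
using System;
using System.Globalization;

public static class TextBlankCheck
{
    // Hypothetical helper: true when the text contains nothing visible,
    // i.e. only white-space characters (char.IsWhiteSpace, which also
    // covers U+00A0) or invisible format/control characters such as
    // the zero-width space U+200B.
    public static bool IsVisiblyEmpty(string text)
    {
        if (string.IsNullOrEmpty(text)) return true;
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c)) continue;
            UnicodeCategory category = char.GetUnicodeCategory(c);
            if (category == UnicodeCategory.Format || category == UnicodeCategory.Control) continue;
            return false;
        }
        return true;
    }
}
```

For a string consisting of a single U+200B, content.Trim().Length is 1, whereas IsVisiblyEmpty returns true.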
Hi Rick,
Aspose.PDF for .NET now offers an improved way to detect blank pages in a PDF. You can use the Page.IsBlank() method to determine whether a page is blank. The following code snippet shows how to detect a blank page:
Determine whether a PDF page is blank
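// dataDir is the folder that contains Test.pdf
Document pdfDocument = new Document(dataDir + "Test.pdf");
// Check the first page (indexes are 1-based); 0.01 is the threshold argument passed to IsBlank
bool isBlank = pdfDocument.Pages[1].IsBlank(0.01d);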
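The exact rule behind the numeric argument of IsBlank is defined by Aspose and is not described in this thread. Purely as an illustration of how a threshold-based blank check can work, and not as Aspose's algorithm, the sketch below treats a page as blank when the fraction of non-white pixels in a grayscale rendering of the page does not exceed a threshold (in this sketch, 0.01 means 1% of the page area). It assumes you have rendered the page to an 8-bit grayscale buffer yourself; IsBlankByFillRatio and whiteLevel are hypothetical names.

```csharp
using System;

public static class FillRatioCheck
{
    // Illustrative only: 'gray' is an 8-bit grayscale raster of the page
    // (0 = black, 255 = white). A pixel darker than 'whiteLevel' counts
    // as ink. The page is treated as blank when the share of ink pixels
    // is at most 'maxFillRatio'.
    public static bool IsBlankByFillRatio(byte[] gray, double maxFillRatio, byte whiteLevel = 250)
    {
        if (gray == null) throw new ArgumentNullException(nameof(gray));
        if (maxFillRatio < 0.0 || maxFillRatio > 1.0) throw new ArgumentOutOfRangeException(nameof(maxFillRatio));
        if (gray.Length == 0) return true;

        int inkPixels = 0;
        foreach (byte value in gray)
        {
            if (value < whiteLevel) inkPixels++;
        }
        double fillRatio = (double)inkPixels / gray.Length;
        return fillRatio <= maxFillRatio;
    }
}
```

A threshold above zero lets a page with a few specks of scanner noise still count as blank, which is the same trade-off a tolerance argument addresses.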
You can then delete the blank pages from the document as well. Please check the following article in the API documentation: