VB.NET Removing empty pages from document

Hello,
We’re using form fields to process school-related documents.
In some cases the fields produce no output on certain pages, so we want to remove those pages.
We couldn’t find a built-in way to do this, so we tried the following:

(doc is the original document)

 Dim newDocument As Aspose.Words.Document = doc.Clone(False), anyChange As Boolean = False
 For pageIndex As Byte = 0 To doc.PageCount() - 1 Step 1
   Dim page As Aspose.Words.Document = doc.ExtractPages(pageIndex, 1)
   If New System.Text.RegularExpressions.Regex("[a-zA-Z0-9]").IsMatch(page.GetText()) Then
     newDocument.AppendDocument(page, Aspose.Words.ImportFormatMode.KeepSourceFormatting)
     newDocument.UpdatePageLayout()
     anyChange = True
   End If
 Next pageIndex

The problem is that the layout and background are distorted, form fields are no longer in their original positions, and so on.
Any suggestions? We’re using v22.7.0.0 (licensed; the latest version our license covers).

@YosiShamir Could you please attach your input, output and expected output documents here for testing? We will check the issue and provide you more information.

Link removed

This is the original file. Once processed, the output is:

Link Removed

It seems that because the content is in flow layout, the pagination isn’t preserved properly and the formatting is lost.

@YosiShamir The problem occurs because the first section of each extracted page has a continuous section start. In your case you should reset it to start on a new page. Please try modifying your code like this:

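' Requires "Imports Aspose.Words" at the top of the file.
' A page counts as non-empty if its text contains at least one Latin letter or digit.
Dim notEmptyPageRegex As New System.Text.RegularExpressions.Regex("[a-zA-Z0-9]")

Dim doc As New Document("C:\Temp\in.docx")

' Clone(False) copies the document node itself without its content.
Dim newDocument As Aspose.Words.Document = doc.Clone(False), anyChange As Boolean = False
' Integer rather than Byte: a Byte counter overflows past 255 pages, or when PageCount is 0.
For pageIndex As Integer = 0 To doc.PageCount() - 1 Step 1
    Dim page As Aspose.Words.Document = doc.ExtractPages(pageIndex, 1)
    ' Extracted pages start with a continuous section; force each one onto a new page.
    page.FirstSection.PageSetup.SectionStart = SectionStart.NewPage
    If notEmptyPageRegex.IsMatch(page.GetText()) Then
        newDocument.AppendDocument(page, ImportFormatMode.UseDestinationStyles)
        anyChange = True
    End If
Next pageIndex

newDocument.Save("C:\Temp\out.docx")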

Also, in a future version of Aspose.Words we plan to provide a built-in method for removing empty pages from a document. This feature request is logged as WORDSNET-24707. We will let you know once the method is available.
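
One caveat about the emptiness test used in this thread: the pattern "[a-zA-Z0-9]" only recognizes Latin letters and ASCII digits, so a page containing only text in another script would be treated as empty. If that matters for your documents, a Unicode-aware check could be sketched as below. The module name and the HasVisibleText helper are hypothetical names used for illustration, not part of Aspose.Words.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PageTextCheck
    ' \p{L} matches a letter in any script; \p{Nd} matches any decimal digit.
    Private ReadOnly HasContentRegex As New Regex("[\p{L}\p{Nd}]")

    ' Hypothetical helper: True if the page text contains at least one letter or digit.
    Function HasVisibleText(pageText As String) As Boolean
        Return Not String.IsNullOrEmpty(pageText) AndAlso HasContentRegex.IsMatch(pageText)
    End Function

    Sub Main()
        Console.WriteLine(HasVisibleText("Grade 7"))           ' True
        Console.WriteLine(HasVisibleText(vbFormFeed & vbCr))   ' False: control characters only
    End Sub
End Module
```

In the loop above, this would mean calling HasVisibleText(page.GetText()) in place of notEmptyPageRegex.IsMatch(page.GetText()).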

Looks good and works well, thank you very much!


The issue you reported earlier (filed as WORDSNET-24707) has been fixed in the Aspose.Words for .NET 24.4 update, which is also available on NuGet.
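
For readers on 24.4 or later, a minimal sketch of the built-in approach follows. It assumes the feature is exposed as Document.RemoveBlankPages(), as described in the API reference for recent releases; please verify this against the version you have installed. The file paths are the same placeholders used earlier in this thread, and the module name is hypothetical.

```vbnet
Imports Aspose.Words

Module RemoveBlankPagesSketch
    Sub Main()
        ' Placeholder paths, reused from the example earlier in the thread.
        Dim doc As New Document("C:\Temp\in.docx")

        ' Assumption: RemoveBlankPages() is available in Aspose.Words for .NET 24.4 and later.
        doc.RemoveBlankPages()

        doc.Save("C:\Temp\out.docx")
    End Sub
End Module
```

Note that the library's notion of a blank page may not match the regex test used above, which also discards pages that contain content but no letters or digits, so the two approaches can give different results on the same document.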