Avoid Extra Blank Pages while Converting HTML to DOCX or PDF using C# .NET over Ubuntu (Linux) | Process Last Paragraph

Hi,

After reading a lot of posts about the “Extra Blank page” issue, I tried almost all of the suggested solutions, but none of them helped and the issue still persists.

After converting HTML to DOCX/PDF using Aspose.Words, there is always an extra blank page at the end of the document.

I am using the latest version, i.e. 20.7 for .NET.

The following is the code:

        // Load the template, then insert the HTML at the builder's current position.
        Aspose.Words.Document document = new Aspose.Words.Document(commonTemplate);

        DocumentBuilder documentBuilder = new DocumentBuilder(document);
        documentBuilder.InsertHtml(html);

        // Render to PDF in memory and keep the bytes.
        byte[] pdfStream = null;
        using (MemoryStream smgFileStream = new MemoryStream())
        {
            document.UpdatePageLayout();
            document.Save(smgFileStream, Aspose.Words.SaveFormat.Pdf);
            // ToArray() copies the whole buffer regardless of the stream position.
            pdfStream = smgFileStream.ToArray();
        }

@anupkasatttl,

To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:

  • Your simplified input Word document
  • The HTML file or string you are inserting into the Word document
  • Aspose.Words 20.8 generated DOCX and PDF files showing the undesired behavior

As soon as you have these resources ready, we will start investigating your issue and provide you with more information.

Hi,

Please find the requested documents attached. FYI, we are using version 20.7.

Html to PDF ASPOSE.words.zip (89.8 KB)

Also, if the website is deployed on an Ubuntu server, do we need to add any other NuGet packages to the project?

Thanks

@anupkasatttl,

Please note that HTML is a non-paginated format, whereas MS Word has to lay out the content of a DOCX into pages; that is why you will sometimes see an empty page at the end, typically when a trailing empty paragraph no longer fits on the last page. In this case, you can work around the empty page by, for example, reducing the size of the last paragraph. Please check the following C# code:

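// Load the HTML file directly as a document.
Document doc = new Document(@"C:\Temp\Html to PDF ASPOSE.words\pdfhtml.html");
Paragraph lastPara = doc.LastSection.Body.LastParagraph;
// If the document ends with a paragraph that contains no text, empty it and
// shrink its paragraph mark so it is less likely to spill onto a new page.
if (lastPara.IsEndOfDocument && string.IsNullOrEmpty(lastPara.ToString(SaveFormat.Text).Trim()))
{
    lastPara.RemoveAllChildren();
    lastPara.ParagraphBreakFont.Size = 1; // font size in points
}
doc.Save(@"C:\Temp\Html to PDF ASPOSE.words\20.8.docx");
doc.Save(@"C:\Temp\Html to PDF ASPOSE.words\20.8.pdf");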
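
The same check could also be applied to the scenario from the original question, where the HTML is inserted into a template with DocumentBuilder.InsertHtml and the PDF is returned as a byte array. The following is only a minimal sketch under assumptions: the class HtmlToPdfSketch and the method ConvertToPdf are hypothetical names used for illustration, and templatePath is assumed to be a file path (the original post does not say what commonTemplate is).

```csharp
using System.IO;
using Aspose.Words;

public static class HtmlToPdfSketch
{
    // Hypothetical helper, not part of the Aspose.Words API.
    // Assumes templatePath points to an existing template document.
    public static byte[] ConvertToPdf(string templatePath, string html)
    {
        Document doc = new Document(templatePath);

        // InsertHtml places the content at the builder's current position,
        // which is the start of the document by default.
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.InsertHtml(html);

        // Same workaround as above: if the document ends with a paragraph
        // that has no text, empty it and shrink its paragraph mark.
        Paragraph lastPara = doc.LastSection.Body.LastParagraph;
        if (lastPara != null
            && lastPara.IsEndOfDocument
            && string.IsNullOrWhiteSpace(lastPara.ToString(SaveFormat.Text)))
        {
            lastPara.RemoveAllChildren();
            lastPara.ParagraphBreakFont.Size = 1;
        }

        using (MemoryStream stream = new MemoryStream())
        {
            doc.Save(stream, SaveFormat.Pdf);
            return stream.ToArray();
        }
    }
}
```

Note that RemoveAllChildren also removes any non-text children (for example, images) of that last paragraph, so this sketch assumes the trailing paragraph carries nothing that needs to be kept.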

If Aspose.Words for .NET Standard is to be used in a Linux environment, an additional NuGet package should be referenced so that it works correctly with graphics: SkiaSharp.NativeAssets.Linux for Ubuntu (it should also work on most Debian-based Linux distributions), or Goelze.SkiaSharp.NativeAssets.AlpineLinux for Alpine Linux.
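
As an illustration, the Ubuntu package could be referenced from the project file roughly as follows. This is a sketch only: the floating version `*` is a placeholder assumption, and in practice it is safer to pin the version that matches the SkiaSharp version your Aspose.Words release depends on.

```xml
<!-- Fragment of a .csproj file; goes inside the <Project> element. -->
<ItemGroup>
  <!-- Placeholder version: pin this to match the SkiaSharp version used by Aspose.Words. -->
  <PackageReference Include="SkiaSharp.NativeAssets.Linux" Version="*" />
</ItemGroup>
```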