Free Support Forum - aspose.com

Additional blank page appears between 2 pages

We convert HTML to a DOCX file for printing. When the file is printed, an extra blank page appears between two pages. While debugging, we found that PageSplitter.GetDocumentOfPage(1) returns a document with PageCount = 2, although it should contain only the first page (see the NOTE in the code comment).

Aspose.Words version 17.3. The PageSplitter file was downloaded from https://github.com/aspose-words/Aspose.Words-for-.NET

The code snippet, input, and output are attached.

Can you please let us know why an additional page is generated? Thanks.

Hi Erin,

Thanks for your inquiry. I have tested the scenario and reproduced the reported issue. I am investigating it further and will update you soon.

Best Regards,

Thanks for confirming that the issue can be reproduced. Are there any updates on the investigation?

Hi Erin,

Thanks for your patience. We have investigated the issue and found that some empty table rows are causing it. You can remove the empty rows before splitting the document, as follows:

Document doc = new Document(loStream);

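// Remove empty rows. ToArray() takes a snapshot of the rows so that
// removing a row does not disturb the enumeration.
Node[] rows = doc.GetChildNodes(NodeType.Row, true).ToArray();
foreach (Row row in rows)
{
    bool removeRow = true;
    foreach (Cell cell in row.Cells)
    {
        // removeRow is reassigned for each cell, so the last cell that
        // has a first paragraph decides whether the row is removed.
        if (cell.FirstParagraph != null)
            removeRow = !cell.FirstParagraph.HasChildNodes;
    }
    if (removeRow)
        row.Remove();
}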
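
The check above looks only at the first paragraph of each cell. A stricter variant, which removes a row only when every one of its cells is empty, might look like the sketch below. The class name EmptyRowCleaner, the method RemoveEmptyRows, and the definition of "empty" (no visible text and no shapes) are assumptions made for illustration; they do not come from the thread or from Aspose.Words.

```csharp
using Aspose.Words;
using Aspose.Words.Tables;

// Hypothetical helper, not part of Aspose.Words.
public static class EmptyRowCleaner
{
    // Removes rows in which every cell is empty and returns how many were removed.
    public static int RemoveEmptyRows(Document doc)
    {
        int removed = 0;

        // Snapshot the rows first so that removal does not disturb enumeration.
        Node[] rows = doc.GetChildNodes(NodeType.Row, true).ToArray();
        foreach (Row row in rows)
        {
            bool rowIsEmpty = true;
            foreach (Cell cell in row.Cells)
            {
                // Assumption: a cell is "empty" when it has no visible text
                // and contains no shapes (for example, images).
                bool hasText = cell.ToString(SaveFormat.Text).Trim().Length > 0;
                bool hasShapes = cell.GetChildNodes(NodeType.Shape, true).Count > 0;
                if (hasText || hasShapes)
                {
                    rowIsEmpty = false;
                    break;
                }
            }

            if (rowIsEmpty)
            {
                row.Remove();
                removed++;
            }
        }

        return removed;
    }
}
```

Calling EmptyRowCleaner.RemoveEmptyRows(doc) before splitting would replace the loop above. The sketch does not remove a table that ends up with no rows, so that case may need separate handling.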

Best Regards,

Hi Tilal,

I added the code, but I am still getting the empty page. Here are my output files. Before the change, the second page of 000.docx contained a line of text (but it printed as a blank page). Now the page is completely blank, and it still prints as a blank page.

Hi Erin,

Thanks for your feedback. Please check the following updated code and sample output documents; hopefully they will help you accomplish the task.

public static void Splitpage()
{
    Document doc = new Document(@"inputHtml.html", new Aspose.Words.HtmlLoadOptions());
    // Remove empty rows
    Node[] rows = doc.GetChildNodes(NodeType.Row, true).ToArray();
    foreach (Row row in rows)
    {
        bool removeRow = true;
        foreach (Cell cell in row.Cells)
        {
            if (cell.FirstParagraph != null)
                removeRow = !cell.FirstParagraph.HasChildNodes;
        }
        if (removeRow)
            row.Remove();
    }

    RenderedDocument layoutDoc = new RenderedDocument(doc);
    foreach (RenderedPage page in layoutDoc.Pages)
    {
        LayoutCollection<LayoutEntity> lines = page.GetChildEntities(LayoutEntityType.Line, true);
        Paragraph paragraph = (Paragraph)lines.Last.ParentNode;
        SplitRuns(paragraph);
    }

    LayoutCollector collector = new LayoutCollector(doc);
    DocumentPageSplitter pageSplitter = new DocumentPageSplitter(collector);
    for (int i = 1; i <= doc.PageCount; i++)
    {
        Document dstDoc = pageSplitter.GetDocumentOfPage(i);
        dstDoc.Save(@"Out" + i + ".docx");
    }
}

private static Run SplitRunNode(Run run, int position)
{
    Run afterRun = (Run)run.Clone(true);
    afterRun.Text = run.Text.Substring(position);
    run.Text = run.Text.Substring(0, position);
    run.ParentNode.InsertAfter(afterRun, run);
    return afterRun;
}

private static void SplitRuns(Paragraph paragraph)
{
    foreach (Run run in paragraph.GetChildNodes(NodeType.Run, true).ToArray())
    {
        int position = run.Text.IndexOf(' ');
        Run runnode = run;
        while (position >= 0 && runnode.Text.Length >= position)
        {
            Run newRun = SplitRunNode(runnode, position);
            position = newRun.Text.IndexOf(' ');
            if (position == -1)
                break;
            position++;
            runnode = newRun;
        }
    }
}

Best Regards,

Thanks for the update! I used the code you provided, but the page count of the first page after splitting is still 2, and the output file 000.docx has 2 pages. Then I realized that I was loading the document from a stream. When I changed it to load from an .html file, as in your code, the page count is still 2 but the output file has only 1 page, which is also strange. In both cases, the final printout still has the additional blank page.

But either way, I’m getting pageCount = 2 from Splitter.GetDocumentOfPage(1), which I think should be 1, right?

public void CreatePrintFiles(string asHTMLText, string asXpsPath)
{
    // Save inbound HTML to file
    string lsXpsFileNoExt = asXpsPath.Substring(0, asXpsPath.Length - 4);
    loWriter = new System.IO.StreamWriter(lsXpsFileNoExt + ".html", true, Encoding.UTF8);
    loWriter.Write(asHTMLText);
    loWriter.Close();
    // Create document from HTML text
    byte[] laHTML = Encoding.UTF8.GetBytes(asHTMLText);
    MemoryStream loStream = new MemoryStream(laHTML);
    loStream.Position = 0;
    //Aspose.Words.Document loDoc = new Aspose.Words.Document(loStream, new Aspose.Words.HtmlLoadOptions()); // load from stream
    Aspose.Words.Document loDoc = new Document(lsXpsFileNoExt + ".html", new Aspose.Words.HtmlLoadOptions()); //load from saved file
…
}
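
One way to narrow down the PageCount = 2 observation is to print the page count of every split document, so that the source pages that overflow onto a second page are easy to spot. The following is a minimal sketch under stated assumptions: the class SplitDiagnostics and the method ReportPageCounts are hypothetical names, and DocumentPageSplitter is assumed to be the class from the PageSplitter example file mentioned earlier in the thread, compiled into the same project.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Layout;

// Hypothetical diagnostic helper, not part of Aspose.Words.
public static class SplitDiagnostics
{
    public static void ReportPageCounts(Document doc)
    {
        // DocumentPageSplitter is assumed to come from the PageSplitter
        // example file in the Aspose.Words-for-.NET GitHub repository.
        LayoutCollector collector = new LayoutCollector(doc);
        DocumentPageSplitter splitter = new DocumentPageSplitter(collector);

        int sourcePages = doc.PageCount;
        for (int page = 1; page <= sourcePages; page++)
        {
            Document part = splitter.GetDocumentOfPage(page);

            // Rebuild the layout of the extracted document before reading PageCount.
            part.UpdatePageLayout();

            Console.WriteLine(
                "Source page {0}: split document has {1} page(s)",
                page, part.PageCount);
        }
    }
}
```

Any line that reports more than one page points to a split document whose content no longer fits on a single page, which is a reasonable place to start looking for the source of the blank page.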

Hi Erin,

Thanks for your feedback. We would appreciate it if you could share a sample console application here. It will help us reproduce the issue exactly on our end and address it accordingly.

We are sorry for the inconvenience.

Best Regards,