Problems with HTML to PDF/Word conversion and Page Breaks

I am working on a complex application that generates HTML documents, and I am evaluating Aspose.Words for .NET to convert those HTML documents to PDF/Word format. It does a great job; however, when my HTML documents contain page breaks in table cells, I get the exception "Cannot insert the requested break inside a table." The page breaks are simple HTML tags, shown below.
Has anyone run into this? If so, is there a solution?
Thanks for any help!


Hi John,

Thanks for your inquiry. Please share the following details so we can investigate.

  • Please attach your input Word and HTML documents.
  • Please create a standalone, runnable application (for example, a Console Application project) that demonstrates the Aspose.Words code you used to generate your output document.

Unfortunately, it is difficult to say what the problem is without the documents and a simplified application. We need your documents and a simple project to reproduce the problem. As soon as you send us these, we will start investigating your issue.

Sure, it is very easy to reproduce. I have included a .NET 4.5 project with a simple sample HTML document. The problem seems to be a page break inside a table cell.

See attached.

Hi John,

Thanks for sharing the details. I have tested the scenario and managed to reproduce the same issue on my side. To get it fixed, I have logged this problem in our issue tracking system as WORDSNET-12158. I have linked this forum thread to the issue, and you will be notified via this thread once it is resolved.

We apologize for the inconvenience.

Thanks for the quick responses! Any idea of the timeline for fixes like this? We plan to go into production and buy the component within a couple of months.

Hi John,

Thanks for your inquiry. Issues are addressed and resolved on a first-come, first-served basis. Currently, this issue is pending analysis and is in the queue. We will update you via this forum thread as soon as there is any news on it.

Thank you for your patience and understanding.

The issues you found earlier (filed as WORDSNET-12158) have been fixed in this .NET update and this Java update.


Sorry for the late reply on this thread; I have been away.

Thanks again for taking the time to help me solve this issue! Paging is very important to our product and customers.

I downloaded the version referenced in this thread and tested it. The exception on a page break in a table cell is now solved; however, no page breaks seem to work at all in this version, either inside or outside a table cell.

I have tried both tags:

and neither works.

I have attached a sample project that demonstrates the problem and references the new version listed in the thread.

Please let me know if there is anything further that you need from me or if I can help in any way.

We are looking to buy an enterprise license by the end of this month to use with our new Cloud solution; however, my manager first wants to make sure we can support pagination, since it is so important to our product.

Thanks again for all your help!!

John Lewis

jlewis@comprose.com

COMPROSE, Inc.

425-281-4910

OK, a slight update: I downloaded the latest version of Aspose.Words for .NET from NuGet, and I can get a page break to work with the BR tag below, as long as it is not inside any other tag. As soon as the page break is inside a table cell or another tag such as P or DIV, it no longer works.

Hi John,

Thanks for your inquiry. You can apply the page-break style to P and BR tags as shown below.

PAGE BREAK


Please read the 'Keeps and Breaks' section of the following documentation page. Hope this helps.
https://docs.aspose.com/words/net/paragraph-features-supported-on-html-import/
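For reference, the import side of this can be checked with a minimal C# sketch. It assumes Aspose.Words for .NET is referenced and that, as the Keeps and Breaks documentation describes, a `page-break-before: always` style on a `<p>` element is imported as `ParagraphFormat.PageBreakBefore`. The class name, the sample HTML, and the output file name are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Words;

public static class CssPageBreakDemo
{
    // Hypothetical sample: the second paragraph carries the CSS page break.
    public const string SampleHtml =
        "<html><body>" +
        "<p>FIRST PAGE</p>" +
        "<p style=\"page-break-before: always\">SECOND PAGE</p>" +
        "</body></html>";

    // Loads an HTML string; the Document(Stream) constructor detects the input format.
    public static Document LoadHtml(string html)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            return new Document(stream);
        }
    }

    // Counts paragraphs that were imported with PageBreakBefore set.
    public static int CountPageBreakParagraphs(Document doc)
    {
        int count = 0;
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            if (para.ParagraphFormat.PageBreakBefore)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Document doc = LoadHtml(SampleHtml);
        Console.WriteLine("Paragraphs with PageBreakBefore: " + CountPageBreakParagraphs(doc));
        // The output format is chosen from the .pdf extension.
        doc.Save("Out.pdf");
    }
}
```

Passing `LoadHtml` some HTML whose break sits inside a `<td>` shows whether the flag is present after import, which helps separate import behavior from page layout behavior.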

Please let us know if you have any more queries.

Thanks for the information. We have tried this successfully, as long as the break is not in a table. How do we get the breaks to work in a table cell?

Example code (does not work with the latest version of Aspose):

1:

This is text and then BREAK
Now text on next page.

2:

This is text and then BREAK
Now text on next page.

3:

This is text and then BREAK


PAGE BREAK

Now text on next page.

Thanks again for your help!

Hi John,

Thanks for your inquiry.

Please use the FirstCell.FirstParagraph.ParagraphFormat.PageBreakBefore property, as shown below, to meet your requirement. Hope this helps. Please let us know if you have any more queries.

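using Aspose.Words;
using Aspose.Words.Tables;

// MyDir is the folder that contains in.html
Document doc = new Document(MyDir + "in.html");
foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
{
    foreach (Paragraph para in table.GetChildNodes(NodeType.Paragraph, true))
    {
        // Paragraphs imported with a CSS page-break-before have PageBreakBefore set
        if (para.ParagraphFormat.PageBreakBefore)
        {
            // Also set the break on the first paragraph of the row's first cell
            ((Row)para.GetAncestor(NodeType.Row)).FirstCell.FirstParagraph.ParagraphFormat.PageBreakBefore = true;
        }
    }
}
// Save output document
doc.Save(MyDir + "Out.docx");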

Thanks for the quick response!

We are converting HTML documents to PDF. So is there no way for the HTML-to-PDF conversion to honor the page breaks without the code you provided? Just curious, because the documentation says you support CSS page breaks.

I tried this code with the HTML example below and attached the result (SingleRowAllBreak.pdf). The page break does not happen in the correct place: it happens at the top of the table cell rather than inside it. In this example, the page should break after "FIRST PAGE", but instead the whole cell moves to the next page, with the break before "FIRST PAGE". Our table cells can contain multiple page breaks.

<table>
    <tr>
        <td>
            <p>FIRST PAGE <p style="page-break-before: always;clear:both">PAGE BREAK</p>SECOND PAGE</p>
        </td>
    </tr>
</table>  

Also, for multiple rows I changed the code, but it didn't work at all: everything stayed on the same page without any break (see attached MultipleRowExample.pdf).

HTML:

<table>
    <tr>
        <td>
            <table>
                <tr>
                    <td>
                        <p>This goes on FIRST PAGE</p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p>
                            FIRST PAGE
                            <p style="page-break-before: always;clear:both">
                                PAGE BREAK
                            </p>
                            SECOND PAGE
                        </p>
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>

CODE:

foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
{
    foreach (Paragraph para in table.GetChildNodes(NodeType.Paragraph, true))
    {
        if (para.ParagraphFormat.PageBreakBefore)
        {
            ((Cell)para.GetAncestor(NodeType.Cell)).FirstParagraph.ParagraphFormat.PageBreakBefore = true;
        }
    }
}

I also tried this code to make it break on the "Break" paragraph node itself, but that didn't work either.

foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
{
    foreach (Paragraph para in table.GetChildNodes(NodeType.Paragraph, true))
    {
        if (para.ParagraphFormat.PageBreakBefore)
        {
            var cell = ((Cell) para.GetAncestor(NodeType.Cell));
            foreach (Paragraph innerPara in cell.Paragraphs)
            {
                // Note: this sets PageBreakBefore only where it is already true,
                // so the loop leaves the document unchanged.
                if (innerPara.ParagraphFormat.PageBreakBefore)
                {
                    innerPara.ParagraphFormat.PageBreakBefore = true;
                }
            }
        }
    }
}

Thanks again for your help!

Hi John,

Thanks for your inquiry. Please note that Aspose.Words mimics the behavior of MS Word. If you load the same HTML in MS Word, there will be no page break. If you insert a page break inside a table row in MS Word, the table is split into two tables.
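The table split that Word performs can be approximated in code when a break between rows is acceptable. This is a minimal sketch, assuming Aspose.Words for .NET; `TableSplitter` and `SplitTableBeforeRow` are hypothetical names, not library methods, and the file names are placeholders. The helper moves the break row and every row after it into an empty clone of the table, and puts a body paragraph with `PageBreakBefore` between the two tables. It only breaks between rows, not inside a cell, and it assumes the table sits directly in the document body: in a nested table the separator paragraph would itself be inside an outer cell.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Tables;

public static class TableSplitter
{
    // Hypothetical helper: splitRow and the rows after it move to a new table on a new page.
    public static Table SplitTableBeforeRow(Row splitRow)
    {
        Table original = splitRow.ParentTable;
        if (splitRow == original.FirstRow)
        {
            throw new ArgumentException("Cannot split before the first row.", nameof(splitRow));
        }

        // Empty clone: keeps table-level formatting but has no rows.
        Table tail = (Table)original.Clone(false);
        original.ParentNode.InsertAfter(tail, original);

        // A paragraph between the tables keeps them from merging and carries the page break.
        Paragraph separator = new Paragraph(original.Document);
        separator.ParagraphFormat.PageBreakBefore = true;
        original.ParentNode.InsertAfter(separator, original);

        // Move rows from the end of the original table until splitRow has been moved.
        Row current;
        do
        {
            current = original.LastRow;
            tail.PrependChild(current);
        } while (current != splitRow);

        return tail;
    }

    public static void Main()
    {
        Document doc = new Document("in.html");
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            Row row = (Row)para.GetAncestor(NodeType.Row);
            if (para.ParagraphFormat.PageBreakBefore && row != null && row != row.ParentTable.FirstRow)
            {
                SplitTableBeforeRow(row);
                break; // the tree changed, so stop after the first split
            }
        }
        doc.Save("Out.pdf");
    }
}
```

The empty separator paragraph adds one blank line at the top of the new page; that is the trade-off for using a plain body paragraph to carry the break.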

Your input HTML may have multiple columns. If a paragraph inside a TD tag has page-break CSS (e.g. see the following HTML), what is your desired output? If the page-break CSS is applied to a paragraph in the second or third cell of the table, what should the output be?

Please create your expected Word document manually in Microsoft Word and attach it here for our reference. We will investigate how you want your final Word output to be generated and then provide more information, along with code.

FIRST PAGE
PAGE BREAK
SECOND PAGE
First
Second
Third
Second row
Second
Third
Second row
Second
Third

Thanks for the response!

Yes, you are correct: Word does not support page breaks inside tables, at least not the way we would like. I have attached the format we are trying to achieve. It is from our existing product, and it has a gutter on the side. The only way we have found to create the gutter is with tables. Is there a better way, supported by Aspose, to lay out side-by-side paragraphs?

Thanks again for your help!

Hi John,

Thanks for your inquiry. Please use the PageSetup.Gutter property to get or set the amount of extra space added to the margin for document binding.
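A minimal C# sketch of setting the gutter on every section, assuming Aspose.Words for .NET; the class name, the 0.5-inch value, and the file names are hypothetical. `PageSetup.Gutter` is measured in points and only adds binding space to the margin; it does not place any text in that space.

```csharp
using Aspose.Words;

public static class GutterDemo
{
    // Applies the same binding gutter to every section of the document.
    public static void ApplyGutter(Document doc, double gutterInches)
    {
        foreach (Section section in doc.Sections)
        {
            // PageSetup.Gutter is in points, so convert from inches first.
            section.PageSetup.Gutter = ConvertUtil.InchToPoint(gutterInches);
        }
    }

    public static void Main()
    {
        Document doc = new Document("in.html");
        ApplyGutter(doc, 0.5);
        doc.Save("Out.pdf");
    }
}
```

Looping over `doc.Sections` matters because page setup is stored per section, so setting it only on `FirstSection` would leave later sections unchanged.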

If you still face the problem, please share your input HTML and the expected Word and PDF output documents here for our reference. We will then provide more information, along with code.