HTML export to PDF/DOC/RTF differences

Hello,
I am currently testing Aspose.Words for converting an HTML report into DOC, RTF and PDF.
The table styles are a bit of an issue; maybe you could address this in a future release (soon, I hope).
A real issue is that the tables in the PDF are not full width, whereas in the DOC they are.
In the DOC, on the other hand, the table header displays a white line at the top, which is not shown at all in the PDF. The table in the DOC also displays an empty row, which again is not displayed in the PDF.
The first page contains a table within a table to display a centered logo image and some text. This works fine in the PDF, but the DOC shows some of the table gridlines.
How can I get rid of the gridlines in the DOC (and RTF)?
Is it possible to get rid of the white line in the table headers?
How can I get the table width to 100% in the PDF, as it is in the DOC and RTF?
Regards,
Kees

Hello,
I was able to remove the last row if it is empty by going through the tables.
However, sometimes an empty row appears within a table, indicating that a new collection of items starts. I need each of these collections to be shown in its own table, although they come from a single physical dataset/DataTable object.
So the remaining question is how I can locate this empty row and split the table into two tables.
And of course the other two questions (table width in the PDF and the white line in the table header of the DOC) still remain.
Regards,
Kees
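
One possible way to split a table at an empty row is sketched below. It is only an illustration, not an official Aspose answer: `TableSplitter` and `SplitAtBlankRows` are hypothetical names, and the `Aspose.Words.Tables` namespace is an assumption that holds for recent releases (older releases expose `Table` and `Row` directly in `Aspose.Words`). The idea is to clone the table without its rows, insert the clone after the original, and move every row that follows an empty row into the clone.

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Tables; // Assumption: recent releases; older ones keep Table and Row in Aspose.Words.

public static class TableSplitter // Hypothetical helper, not part of Aspose.Words.
{
    // GetText() terminates cells and rows with "\a", so strip it before testing for emptiness.
    private static bool IsBlank(Row row)
    {
        return row.GetText().Replace("\a", "").Trim().Length == 0;
    }

    // Removes every blank interior row and moves the rows that follow it into a new table.
    public static void SplitAtBlankRows(Table table)
    {
        // Copy the rows into a list first, because rows are moved while we walk over them.
        List<Row> rows = new List<Row>();
        foreach (Row row in table.Rows)
            rows.Add(row);

        Table current = table;
        bool startNewTable = false;

        for (int i = 0; i < rows.Count; i++)
        {
            Row row = rows[i];
            bool isInterior = i > 0 && i < rows.Count - 1;

            if (isInterior && IsBlank(row))
            {
                // Drop the separator row and remember to open a new table for the next row.
                row.Remove();
                startNewTable = true;
                continue;
            }

            if (startNewTable)
            {
                // Clone(false) copies the table node without its rows.
                Table next = (Table)table.Clone(false);
                // An empty paragraph keeps the two tables from merging back into one.
                Paragraph separator = new Paragraph(table.Document);
                current.ParentNode.InsertAfter(separator, current);
                separator.ParentNode.InsertAfter(next, separator);
                current = next;
                startNewTable = false;
            }

            // AppendChild detaches the row from its old table before attaching it to the new one.
            if (current != table)
                current.AppendChild(row);
        }
    }
}
```

When calling this for every table in a document, it is probably safer to iterate over a snapshot such as `doc.GetChildNodes(NodeType.Table, true).ToArray()`, because the method adds tables while it runs. Note that the new tables do not receive a copy of the header row, and a row that contains only an image would also count as blank under this text-based test.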

Hi

Thanks for your inquiry. Could you please attach your document and provide sample code that will allow me to reproduce the problem on my side? I will check the issue and provide you with more information.
Best regards,

Hello,
As requested, I have attached the DOC and PDF output and the input HTML source.
My code for the doc:

public byte[] GenerateDoc(string Html, ReportOptions options)
{
    Document doc = new Document(HtmlToStream(Html));
    UpdateTableFormatting(doc);
    UpdateCellFormatting(doc);
    MemoryStream msdoc = new MemoryStream();
    BuildDoc(doc, options, Html);
    doc.SaveOptions.ExportPrettyFormat = true;
    doc.Save(msdoc, SaveFormat.Doc);
    return msdoc.ToArray();
}
private void UpdateTableFormatting(Document doc)
{
    List<Node> nodes = new List<Node>();
    NodeCollection tables = doc.GetChildNodes(NodeType.Table, true);
    int tablesfound = tables.Count;
    if (tablesfound > 0)
    {
        int c = 0;
        foreach(Table table in tables)
        {
            if (c < 2)
            {
                foreach(Row row in table.Rows)
                {
                    row.RowFormat.Borders.ClearFormatting();
                    foreach(Cell cell in row.Cells)
                    {
                        cell.CellFormat.Borders.ClearFormatting();
                    }
                }
            }
            Row lastrow = table.LastRow;
            string lasttext = lastrow.GetText();
            lasttext = lasttext.Replace("\a", "");
            lasttext = lasttext.Trim();
            if (lasttext == null || lasttext == string.Empty)
            {
                table.LastRow.Remove();
            }
            else
            {
                table.LastRow.RowFormat.AllowBreakAcrossPages = false;
            }
            foreach(Row row in table.Rows)
            {
                if (!row.IsFirstRow && !row.IsLastRow)
                {
                    string rowtext = row.GetText();
                    rowtext = rowtext.Replace("\a", "");
                    rowtext = rowtext.Trim();
                    if (rowtext == null || rowtext == string.Empty)
                    {
                        nodes.Add(row);
                        row.RowFormat.AllowBreakAcrossPages = true;
                        row.RowFormat.Borders.LineStyle = LineStyle.None;
                        row.RowFormat.ClearCellPadding();
                        foreach(Cell cell in row.Cells)
                        {
                            cell.CellFormat.Borders.ClearFormatting();
                        }
                        // table.Rows[i].Cells.Clear();
                    }
                    else
                    {
                        row.RowFormat.AllowBreakAcrossPages = false;
                    }
                }
            }
            Row firstrow = table.FirstRow;
            firstrow.RowFormat.AllowBreakAcrossPages = false;
            c++;
        }
    }
    foreach(Node node in nodes)
    {
        // TODO: split the containing table at this empty row.
    }
}
private void UpdateCellFormatting(Document doc)
{
    NodeCollection cells = doc.GetChildNodes(NodeType.Cell, true);
    foreach(Cell cell in cells)
    {
        if (cell.FirstParagraph.ParagraphFormat.Shading.ForegroundPatternColor != Color.Empty)
        {
            cell.CellFormat.Shading.BackgroundPatternColor = cell.FirstParagraph.ParagraphFormat.Shading.ForegroundPatternColor;
        }
    }
}
private ReplaceAction ReplaceWithFieldEvaluator(object sender, ReplaceEvaluatorArgs e)
{
    DocumentBuilder builder = new DocumentBuilder((Document) e.MatchNode.Document);
    Run matchrun = (Run) e.MatchNode;
    string runText = matchrun.Text;
    Run run = (Run) e.MatchNode.Clone(true);
    run.Text = runText.Substring(runText.IndexOf(e.Match.Value) + e.Match.Value.Length);
    matchrun.Text = runText.Substring(0, runText.IndexOf(e.Match.Value));
    e.MatchNode.ParentNode.InsertAfter(run, e.MatchNode);
    builder.MoveTo(run);
    if (e.Match.Value == "[PAGE]")
        builder.InsertField("PAGE", null);
    else if (e.Match.Value == "[NUMPAGES]")
        builder.InsertField("NUMPAGES", null);
    e.Replacement = "";
    return ReplaceAction.Replace;
}
protected void BuildDoc(Document doc, ReportOptions options, string html)
{
    DocumentBuilder builder = new DocumentBuilder(doc);
    PageLayout(builder.PageSetup, options.Layout);
    AddHeader(builder, options, GetPageHeader(html, true));
    builder.MoveToDocumentStart();
    AddFooter(builder, options, GetPageFooter(html, true));
    doc.Range.Replace(new Regex(@"\[PAGE\]"), new ReplaceEvaluator(ReplaceWithFieldEvaluator), true);
    doc.Range.Replace(new Regex(@"\[NUMPAGES\]"), new ReplaceEvaluator(ReplaceWithFieldEvaluator), true);
    builder.MoveToDocumentStart();
}
private void PageLayout(PageSetup pagesetup, ReportLayout layout)
{
    switch (layout.PaperSize)
    {
        case ReportPaperSizeType.A4:
            pagesetup.PaperSize = PaperSize.A4;
            break;
        case ReportPaperSizeType.Letter:
            pagesetup.PaperSize = PaperSize.Letter;
            break;
        default:
            pagesetup.PaperSize = PaperSize.A4;
            break;
    }
    pagesetup.DifferentFirstPageHeaderFooter = false;
    pagesetup.HeaderDistance = layout.Margins.Top;
    pagesetup.FooterDistance = layout.Margins.Bottom;
    pagesetup.LeftMargin = layout.Margins.Left + layout.Offsets.Left;
    pagesetup.RightMargin = layout.Margins.Right + layout.Offsets.Right;
    pagesetup.TopMargin = layout.Margins.Top + layout.Header.Height;
    pagesetup.BottomMargin = layout.Margins.Bottom + layout.Footer.Height;
}
private void AddHeader(DocumentBuilder builder, ReportOptions options, string html)
{
    builder.MoveToHeaderFooter(HeaderFooterType.HeaderPrimary);
    builder.ParagraphFormat.LeftIndent = -options.Layout.Offsets.Left;
    builder.ParagraphFormat.RightIndent = -options.Layout.Offsets.Right;
    builder.InsertHtml(html);
}
private void AddFooter(DocumentBuilder builder, ReportOptions options, string html)
{
    builder.MoveToHeaderFooter(HeaderFooterType.FooterPrimary);
    builder.ParagraphFormat.LeftIndent = -options.Layout.Offsets.Left;
    builder.ParagraphFormat.RightIndent = -options.Layout.Offsets.Right;
    html = html.Replace("%pn%/%pt%", "[PAGE]/[NUMPAGES]");
    builder.InsertHtml(html);
}

My code for the PDF:

public byte[] GeneratePDF(string html, ReportOptions options)
{
    Document doc = new Document(HtmlToStream(html));
    MemoryStream pdf = new MemoryStream();
    BuildDoc(doc, options, html);
    doc.SaveOptions.ExportPrettyFormat = true;
    doc.Save(pdf, SaveFormat.Pdf);
    return pdf.ToArray();
}

The BuildDoc call in the PDF code is the same BuildDoc method used for the DOC; it is shared by all exports made through Aspose.Words.

Best regards,
Kees

Hi

Thank you for additional information.

  1. MS Word draws gridlines in place of invisible borders unless the “Hide Gridlines” option is enabled. When you open the generated document in MS Word, select “Hide Gridlines” from the “Table” menu. You cannot hide gridlines using Aspose.Words because “Table / Hide Gridlines” is an option of MS Word.
  2. White lines are displayed in the table header because “spacing before” is specified for the paragraphs. To remove these white lines, you can use code like the following:
private void RemoveWhiteLine(Document doc)
{
    // Get a snapshot of all paragraphs in the document.
    Node[] paragraphs = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();
    // Loop through all paragraphs and reset the spacing before each one.
    foreach(Paragraph par in paragraphs)
    {
        if (par.ParagraphFormat.SpaceBefore != 0)
            par.ParagraphFormat.SpaceBefore = 0;
    }
}
  3. The table width is not 100% because you changed the page setup after loading the HTML. If you change the page setup before loading the HTML, the table width will be correct. For example, see the following code:
DocumentBuilder builder = new DocumentBuilder();
// Change pagesetup
builder.PageSetup.HeaderDistance = 5;
builder.PageSetup.FooterDistance = 5;
builder.PageSetup.LeftMargin = 25;
builder.PageSetup.RightMargin = 10;
builder.PageSetup.TopMargin = 65;
builder.PageSetup.BottomMargin = 45;
// Insert HTML.
builder.InsertHtml(File.ReadAllText(@"Test001\source.htm"));
// Save as DOC and PDF.
builder.Document.Save(@"Test001\out.doc");
builder.Document.Save(@"Test001\out.pdf");
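
Applied to the GeneratePDF method posted earlier in this thread, the same reordering might look like the sketch below. It is a hedged illustration rather than tested code: it is meant as a drop-in replacement inside the poster's own class, it relies on the PageLayout and BuildDoc helpers and the ReportOptions type shown above, and it assumes that InsertHtml handles the report HTML as well as the Document constructor did.

```csharp
// Requires: using System.IO; using Aspose.Words;
public byte[] GeneratePDF(string html, ReportOptions options)
{
    // Start from an empty document so the page setup is in place before any HTML is parsed.
    DocumentBuilder builder = new DocumentBuilder();
    PageLayout(builder.PageSetup, options.Layout);

    // Insert the report only after the margins are final, as suggested in point 3.
    builder.InsertHtml(html);

    // BuildDoc adds the header, footer and page fields; it applies the same page setup a second time.
    Document doc = builder.Document;
    BuildDoc(doc, options, html);

    using (MemoryStream pdf = new MemoryStream())
    {
        doc.Save(pdf, SaveFormat.Pdf);
        return pdf.ToArray();
    }
}
```

GenerateDoc could be reordered in the same way, with UpdateTableFormatting and UpdateCellFormatting called after InsertHtml.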

I hope this information helps you resolve your problems.
Best regards,

Hello,
Thanks very much, works like a charm.
Best regards,
Kees