InsertHTML 'HTML Fragment' in a Table Cell

Hi,
I am switching from inserting plain text into a table cell to using the builder to insert an HTML fragment. The fragment might look like this (in VB.NET):

pstrDataForRow = "<p>This is a fragment.</p>"

(A simple wrapper around the current text).
This is the current code in VB.NET:
.Write(pstrDataForRow)
This is the replacement:
.InsertHtml(pstrDataForRow)
The normal write into the cell with text only has been working fine for a long time. With the .InsertHTML, I’m getting an extra blank line in the Cell following the insertion of the HTML Fragment.
Is the extra blank line normal or expected when using .InsertHTML? Is there any way to set the builder properties so that I do not get the blank line? Is there any way to remove the blank line from the cell after the insert?
(NOTE: I’ve been able to add bullets, and other HTML that works fine. The only problem is the extra blank line).
(NOTE: I’ve only tested using output to PDF or XPS).
Any thoughts or pointers appreciated!!
Regards, Bruce

Hi Bruce,
Thanks for your inquiry.
This is happening most likely because the cell already contains an empty paragraph. You can use code like below to remove it.

builder.StartTable();
Cell cell = builder.InsertCell();
builder.InsertHtml(content);
cell.Paragraphs[1].Remove();
builder.EndRow();
builder.EndTable();

Thanks,

Many thanks!!! I’m using .RemoveAt(-1) to remove the last paragraph … this works!!

I would like to be sure that the last paragraph is empty before removing it. Is there a quick way to check that the last paragraph is empty before deleting it?
Should I also use .IsEndOfCell to be sure it is the last?
I’m also getting a blank line showing up after a bulleted list. I’m thinking it might be necessary to scan the cell nodes to see if there are any blank nodes. Some might actually be useful for white space … so I’ll investigate further.
My feeling is that the only blank rows should be due to HTML formatting so I’ll keep investigating on my end to see why they might be showing up in the cell.
I really appreciate the pointers and help!!
Regards, Bruce

Hi Bruce,
Thanks for your inquiry. The problem occurs because text enclosed in a <p> tag is interpreted as a paragraph, but a table cell by default already contains one empty paragraph. That is why, after inserting HTML like <p>text</p> into a table cell, you see two paragraphs in the cell: one with text and one empty. You can easily resolve this by removing the empty paragraph from the table cell. For instance, please see the following code:

DocumentBuilder builder = new DocumentBuilder();
builder.CellFormat.Borders.LineStyle = LineStyle.Single;
Cell cell = builder.InsertCell();
builder.InsertHtml("<p>This is a fragment.</p>");
// Now we can remove an empty paragraph at the end of the cell.
if (!cell.LastParagraph.HasChildNodes)
    cell.LastParagraph.Remove();
builder.EndRow();
builder.EndTable();
builder.Document.Save(@"Test001\out.doc");

Hope this helps.
Best regards,

Hi,
Just what I was looking for … how to test lastparagraph can be removed.
Re: Bulleted Lists
I’m still getting an extra line after a bulleted list. I need to remove the lastparagraph - however, it seems there is something else causing the extra line and I cannot see it.
I’ve attached some examples of the HTML in the attachment. Could you try these in your example and see if you see the same extra line in the cell.
NOTE: everything else around this .InsertHTML statement appears to work fine as I’m just replacing a .write statement. The HTML is currently being generated from a ‘MarkDown’ converter which seems to be doing a very good job.
Many thanks for your help!!
Regards, Bruce

Hi Bruce,

Thank you for the additional information. The gap after the list is not an empty paragraph; it is the "space after" setting of the last list paragraph. You can reset this spacing to resolve the issue. For instance, you can use the following code:

DocumentBuilder builder = new DocumentBuilder();
builder.CellFormat.Borders.LineStyle = LineStyle.Single;
 
Cell cell = builder.InsertCell();
builder.InsertHtml("<ul><li>First item  </li><li>Second item in the list  </li><li>Third item in the list for testing the InsertHTML  </li><li>This is a really long line.  This is a really long line.  This is also a really long line.  </li><li>This is the final item in the list and may generate an extra row.</li></ul>");
// Now we can remove an empty paragraph at the end of the cell.
if (!cell.LastParagraph.HasChildNodes)
    cell.LastParagraph.Remove();
// If the last paragraph in the cell is a list item, we can reset its SpaceAfter.
if (cell.LastParagraph.IsListItem)
{
    cell.LastParagraph.ParagraphFormat.SpaceAfterAuto = false;
    cell.LastParagraph.ParagraphFormat.SpaceAfter = 0;
}
 
builder.EndRow();
builder.EndTable();
 
builder.Document.Save(@"Test001\out.doc");

Hope this helps.

Best regards,

Hi,
Excellent!!! I now understand why I did not see a blank paragraph in the nodes; however, I did not think about ‘space after’ causing the space. All is working fine after applying your suggested code!!
I also understand how to tidy up any other problems with the HTML. The Cell Object provides access to all of the nodes and their interpretation of the HTML.
I really appreciated the help!!
Regards, Bruce

Hi Bruce,
It is great that you managed to resolve the problems. Please feel free to ask if you run into any issues; my colleagues and I will be glad to assist you.
Best regards,

Hi,
Extending this problem just a bit. I now need to InsertHTML at the current point in a document using the builder. Because the previous code was in a cell, there were some nice functions to get the lastparagraph and check if empty.
Is there a way to InsertHTML into a paragraph so that I know all of the nodes are in one place and can find the last paragraph, as when using a cell in a table?
I know I can get the current paragraph, and I believe this is the last empty paragraph that InsertHTML added … is this correct? Are there some easy ways to remove the line, or any space after bullets?
Thanks in advance …
Regards, Bruce

Hi Bruce,
Thanks for your inquiry.
I think you can use the code found in this post here to achieve what you want. In your case you can implement it like this:

// This will hold the start and end nodes of the inserted HTML
NodeRange nodeRange = new NodeRange();
// Set the node changing handler to catch inserted nodes, and pass in the node range object
// used to store the nodes we are looking for.
doc.NodeChangingCallback = new FindNodeRangeHtml(nodeRange);
// Insert the HTML
builder.InsertHtml(text);
// Remove the node changing callback
doc.NodeChangingCallback = null;
// Remove the last node inserted from HTML
nodeRange.EndNode.Remove();

If this does not help, could you please provide a sample template which demonstrates the issue?
Thanks,

Hi,
Many thanks for the suggestion. I understand the approach and like the NodeChangingCallBack etc.
However, I think my questions are a bit simpler than that:

  1. InsertHTML basic operations…
    I’m assuming that if the HTML being inserted consists of a number of paragraphs, these will be added at the same level as the current paragraph (and not inserted into one node).
    The CurrentParagraph before the InsertHTML = starting paragraph.
    The CurrentParagraph after the InsertHTML = last paragraph entered using InsertHTML.
    (Or is all of the HTML inserted before the current paragraph, or after it?)

  2. How do I move to the previous paragraph?

If I find that the current paragraph is actually empty (as in the cell), how do I delete this paragraph and select the previous paragraph so I can check if this needs to be deleted or the space after removed. (see the initial problem with being inside a cell). It was easy in a cell as I could always find the ‘lastparagraph’. The previoussibling function for the paragraph … seems to be node related not paragraph related.
Basically I want to do the same operations as in the cell on the last paragraph inserted by the InsertHTML.
Any pointers or suggestions appreciated!!
Regards, Bruce

Hi Bruce,

Thank you for additional information.

  1. Yes, CurrentParagraph after inserting HTML is the last paragraph inserted by InsertHtml. So you can use code like the following to remove an empty paragraph:
builder.InsertHtml("<p>this is paragraph</p>");
// Remove current paragraph if it is empty.
if (!builder.CurrentParagraph.HasChildNodes)
    builder.CurrentParagraph.Remove();
  2. However, if you remove the current paragraph, DocumentBuilder will lose its position, so you need to move to the end of the previous paragraph. You can achieve this using code like the following:
builder.InsertHtml("<p>this is paragraph</p>");
// Remove current paragraph if it is empty.
if (!builder.CurrentParagraph.HasChildNodes)
{
    // Get the previous node.
    Node prevNode = builder.CurrentParagraph.PreviousSibling;
    // We do not need to remove the current paragraph if the previous node is, for example, a table.
    if (prevNode != null && prevNode.NodeType == NodeType.Paragraph)
    {
        builder.CurrentParagraph.Remove();
        builder.MoveTo(prevNode);
    }
}

However, there may be several empty paragraphs. If you need to remove all of them, use a while loop instead of a single if statement:

builder.InsertHtml("<p>this is paragraph</p>");
// Remove trailing empty paragraphs one at a time.
while (!builder.CurrentParagraph.HasChildNodes)
{
    // Get the previous node.
    Node prevNode = builder.CurrentParagraph.PreviousSibling;
    // Stop if there is no previous node or it is not a paragraph (a table, for example).
    if (prevNode == null || prevNode.NodeType != NodeType.Paragraph)
        break;
    builder.CurrentParagraph.Remove();
    builder.MoveTo(prevNode);
}

Hope this helps.
Best regards,

Hi,
Very Helpful!!! This is what I was looking for!!
For some reason, the currentparagraph after InsertHTML has a single node with “” as text. I will check for both NoChildren and one child with NoText.
Many thanks for your help!!!
Regards, Bruce

Hi Bruce,

It is great that the solution works for you. To check whether the paragraph is empty, you can also compare its text content instead of its child nodes, using code like the following:

builder.InsertHtml("<p>this is paragraph</p>");
// Remove trailing paragraphs whose text is empty or whitespace only.
while (string.IsNullOrEmpty(builder.CurrentParagraph.ToTxt().Trim()))
{
    // Get the previous node.
    Node prevNode = builder.CurrentParagraph.PreviousSibling;
    // Stop if there is no previous node or it is not a paragraph (a table, for example).
    if (prevNode == null || prevNode.NodeType != NodeType.Paragraph)
        break;
    builder.CurrentParagraph.Remove();
    builder.MoveTo(prevNode);
}

Best regards,

Hi,
Your last example provided excellent clues … I’ve consolidated the two solutions.
I do have one further question about InsertHTML…
I am now noticing that InsertHTML is creating some new styles and changing formatting (especially space after) for any lists etc. When I remove the last blank line, could I also be removing any reset of the formatting?
Does InsertHTML save all of the settings before the insert and restore them afterwards? I now need to reset some of the settings to keep the remainder of the document consistent. If it does restore some of the settings in the last paragraph, it would be nice if it just left the builder in the original state without an extra blank line (just a guess on my part).
I know there is a push and pop to save some of the settings … is there a way to save the whole context before InsertHTML and restore the whole context afterwards?
Any thoughts appreciated!!
Regards, Bruce

Hi Bruce,

Thanks for your request. Aspose.Words restores formatting after inserting HTML. However, when you move the DocumentBuilder cursor to the previous paragraph (a paragraph inserted by InsertHtml), DocumentBuilder inherits the formatting of that paragraph.
Maybe in your case you can use PushFont and PopFont methods of DocumentBuilder:
https://reference.aspose.com/words/net/aspose.words/documentbuilder/pushfont/
Best regards,