Free Support Forum - aspose.com

Ordered Lists displaying with "1." for each item in the list

In our product, we collect data using TinyMCE. When users create an ordered list, it displays correctly in the browser. However, when the user chooses to view the document, the merge process shows every list item preceded by "1.". For example (“<ol><li>Item 1</li>\r<li>Item 2</li>\r<li>Item 3</li>\r</ol>”):

  1. Item 1
  2. Item 2
  3. Item 3

looks like

1. Item 1
1. Item 2
1. Item 3

I have tracked it down to what appears to be each node in the list HTML being added as a separate paragraph. We use the following function to get a list of nodes from the HTML text in order to build our document. The calling function is also included:

private Node ReplaceHtml(Run run)
{
    string text = run.Text;

    if (htmlFormatRegex.IsMatch(text))
    {
        run.Text = "";

        // get the paragraph in the target document where the run is located
        Paragraph refPara = (Paragraph)run.GetAncestor(NodeType.Paragraph);

        // split the document at the run that is being removed
        Paragraph insertAfter = null;
        for (Node extractNode = run; extractNode != refPara; extractNode = extractNode.ParentNode)
        {
            // find the first node in the run's ancestry that is not the last child
            if (extractNode.NextSibling != null)
            {
                insertAfter = refPara.ExtractRight(extractNode);
                break;
            }
        }

        Node refNode = refPara;
        foreach (Node node in GetNodesFromHtml(text, run.Document.Styles).Where(n => n.Range.Text != "\f"))
        {
            if (node is Paragraph)
            {
                if (this.HtmlContentMergedEventHandler != null)
                {
                    HtmlContentMergedArgs args = new HtmlContentMergedArgs();
                    args.TemplateParagraph = refPara;
                    args.MergedParagraph = (Paragraph)node;
                    this.HtmlContentMergedEventHandler(this, args);
                }
            }

            // import the new node into the target document
            Node inserted = Document.Document.ImportNode(node, true);
            refNode.ParentNode.InsertAfter(inserted, refNode);

            ConvertToSpecialQuotes(inserted, out inserted);

            // if the merge field was the only text on the line, remove the trailing line break
            // to prevent extra spacing in the rendered document
            if (refNode == refPara && refNode.Range.Text == "\r")
                refNode.Remove();

            refNode = inserted;
        }

        // insert the right portion of the original paragraph if needed
        if (insertAfter != null)
        {
            refNode.ParentNode.InsertAfter(insertAfter, refNode);
            return insertAfter;
        }
        else
            return refNode;
    }

    return null;
}

private IEnumerable<Node> GetNodesFromHtml(string htmlText, StyleCollection styles)
{
    // ensure empty paragraphs display
    htmlText = htmlLineBreakExpression.Replace(htmlText, "<p>&nbsp;</p>");

    Document doc = new Document();

    // ensure all styles exist in the new document being created
    foreach (Style style in styles)
    {
        if (style.Type == StyleType.Paragraph || style.Type == StyleType.List)
        {
            if (doc.Styles[style.Name] == null)
                doc.Styles.Add(style.Type, style.Name);
        }
    }

    DocumentBuilder builder = new DocumentBuilder(doc);
    builder.InsertHtml(htmlText);

    return ((Body)doc.Document.SelectSingleNode("//Body")).ChildNodes.Cast<Node>().ToList();
}

Do you see anything that stands out as incorrect for what we are trying to do?

I have also attached images of what we are seeing.

Thanks for your assistance.

Hi Scott,


Thanks for your inquiry. Please note that the HTML string you supplied is invalid, as the text ‘\r’ is not allowed between a closing ‘li’ tag and the next opening ‘li’ tag; however, Aspose.Words silently corrects this. Please try running the following lines of code with the latest version of Aspose.Words:

DocumentBuilder builder = new DocumentBuilder();

builder.InsertHtml("<ol><li>Item 1</li>\r<li>Item 2</li>\r<li>Item 3</li>\r</ol>");

builder.Document.Save(@"C:\Temp\out.docx");


In this case, the numbering is as expected. Please let me know if I can be of any further assistance.

PS: To avoid this, you may pre-process the incoming HTML and remove the text '\r' from between the </li> and <li> tags.
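
The pre-processing suggested above could be sketched as follows. This is a minimal sketch, assuming the stray separator is either the literal two-character text \r (or \n) or a real carriage return or line feed that directly follows a closing </li> tag; the class and method names are hypothetical and not part of Aspose.Words.

```csharp
using System.Text.RegularExpressions;

public static class HtmlListCleaner
{
    // Matches one or more separators (literal "\r" / "\n" text, or real CR / LF
    // characters) that sit directly between a closing </li> tag and the next tag.
    private static readonly Regex StraySeparator =
        new Regex(@"(</li>)(?:\\r|\\n|\r|\n)+(?=<)", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the HTML with those separators removed,
    // keeping the </li> tag itself (capture group 1).
    public static string StripListSeparators(string html)
    {
        return StraySeparator.Replace(html, "$1");
    }
}
```

For the HTML from the first post, StripListSeparators returns <ol><li>Item 1</li><li>Item 2</li><li>Item 3</li></ol>, which can then be passed to InsertHtml.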

Best regards,

    Thanks for the quick response. I did something similar to what you suggested, and the list was indeed numbered as expected in the Word document. However, the “ReplaceHtml” function above now seems to be the issue.


    What I am wondering is how to ensure that each of the nodes that come back from the GetNodesFromHtml function (updated code below) get inserted into the same list. Currently, each node is added, but they get a new ListId, which is what is causing the output I am seeing.

    I have added some code based on another forum post concerning lists and have verified that each item in the IEnumerable list has a ListId of 1.

    (As a side note, the resulting document in our software is not saved to the file system; it is converted to PDF, output to a byte array, saved in a database, and then rendered in a PDF viewer, which is what you see in the second attachment above.)

    private IEnumerable<Node> GetNodesFromHtml(string htmlText, StyleCollection styles)
    {
        // ensure empty paragraphs display
        // ~ added only for purposes of splitting string
        htmlText = htmlLineBreakExpression.Replace(htmlText, "<p>&nbsp;</p>").Replace(">\r<", ">~<");

        Document doc = new Document();

        // ensure all styles exist in the new document being created
        foreach (Style style in styles)
        {
            if (style.Type == StyleType.Paragraph || style.Type == StyleType.List)
            {
                if (doc.Styles[style.Name] == null)
                    doc.Styles.Add(style.Type, style.Name);
            }
        }

        DocumentBuilder builder = new DocumentBuilder(doc);
        if (htmlText.Contains("<ol>"))
        {
            Aspose.Words.Lists.List list = doc.Lists.Add(Aspose.Words.Lists.ListTemplate.NumberDefault);
            builder.CurrentParagraph.ListFormat.List = list;
            string[] listSegments = htmlText.Split('~');
            for (int i = 0; i < listSegments.Length; i++)
            {
                if (listSegments[i].IndexOf("<li>") == 0)
                {
                    string listText = listSegments[i].Substring(4, listSegments[i].IndexOf("</li>") - 4);
                    if (i < (listSegments.Length - 1))
                        builder.Writeln(listText);
                    else
                        builder.Write(listText);
                }
            }
        }
        else
        {
            builder.InsertHtml(htmlText);
        }

        return ((Body)doc.Document.SelectSingleNode("//Body")).ChildNodes.Cast<Node>().ToList();
    }

    My coworker and I figured out what we were doing wrong and have corrected our code. The main problem was in the GetNodesFromHtml function. To resolve the issue, we are now passing in the document and the current paragraph from the ReplaceHtml function:

    GetNodesFromHtml(text, run.Document.Styles, Document, (Paragraph)refNode)

    With this, we are creating the DocumentBuilder with the current document as opposed to a brand new document. We move the builder to the current paragraph and then call InsertHtml.

    For anyone who may be looking at this, here are the two functions after our changes:

    private Node ReplaceHtml(Run run)
    {
        string text = run.Text;

        if (htmlFormatRegex.IsMatch(text))
        {
            run.Text = "";

            // get the paragraph in the target document where the run is located
            Paragraph refPara = (Paragraph)run.GetAncestor(NodeType.Paragraph);

            // split the document at the run that is being removed
            Paragraph insertAfter = null;
            for (Node extractNode = run; extractNode != refPara; extractNode = extractNode.ParentNode)
            {
                // find the first node in the run's ancestry that is not the last child
                if (extractNode.NextSibling != null)
                {
                    insertAfter = refPara.ExtractRight(extractNode);
                    break;
                }
            }

            Node refNode = refPara;
            foreach (Node node in GetNodesFromHtml(text, run.Document.Styles, Document, (Paragraph)refNode).Where(n => n.Range.Text != "\f"))
            {
                if (node is Paragraph)
                {
                    if (this.HtmlContentMergedEventHandler != null)
                    {
                        HtmlContentMergedArgs args = new HtmlContentMergedArgs();
                        args.TemplateParagraph = refPara;
                        args.MergedParagraph = (Paragraph)node;
                        this.HtmlContentMergedEventHandler(this, args);
                    }
                }

                // the node was created in the target document, so no import is needed
                Node inserted = node;
                refNode.ParentNode.InsertAfter(inserted, refNode);

                ConvertToSpecialQuotes(inserted, out inserted);

                // if the merge field was the only text on the line, remove the trailing line break
                // to prevent extra spacing in the rendered document
                if (refNode == refPara && refNode.Range.Text == "\r")
                    refNode.Remove();

                refNode = inserted;
            }

            // insert the right portion of the original paragraph if needed
            if (insertAfter != null)
            {
                refNode.ParentNode.InsertAfter(insertAfter, refNode);
                return insertAfter;
            }
            else
                return refNode;
        }

        return null;
    }

    private IEnumerable<Node> GetNodesFromHtml(string htmlText, StyleCollection styles, Document doc, Paragraph insertHere)
    {
        // ensure empty paragraphs display
        htmlText = htmlLineBreakExpression.Replace(htmlText, "<p>&nbsp;</p>");

        // ensure all styles exist in the document being built
        foreach (Style style in styles)
        {
            if (style.Type == StyleType.Paragraph || style.Type == StyleType.List)
            {
                if (doc.Styles[style.Name] == null)
                    doc.Styles.Add(style.Type, style.Name);
            }
        }

        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.MoveTo(insertHere);

        // remember what followed the paragraph before the HTML was inserted
        Node nextSibling = insertHere.NextSibling;
        builder.InsertHtml(htmlText);

        var insertedNodes = insertHere.FollowingSiblings();
        if (nextSibling != null)
            // the result must be assigned; calling TakeUntilItem alone has no effect
            insertedNodes = insertedNodes.TakeUntilItem(nextSibling);

        return insertedNodes.ToList();
    }

    Hi Scott,


    It’s great you were able to find what you were looking for. Please let us know any time you have any further queries.

    Best regards,

    Hello again. Well, we thought we had figured it out, but our solution caused issues with some of our more complex documents: we started getting an error message stating that the document type was invalid. So I have removed all of the changes we made, and we are back to square one. I have modified our GetNodesFromHtml function. Here is the body of the function:

    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);
    Paragraph insertHere = (Paragraph)((Body)((Section)doc.FirstChild).ChildNodes[0]).ChildNodes[0];
    builder.MoveTo(insertHere);
    builder.InsertHtml(htmlText);
    return insertHere.FollowingSiblings();

    I have verified that this is indeed returning a list of nodes, and in this list of nodes is my ordered list, and for each node representing a list item, the node.ListFormat.List.ListId = 1. So, it looks like all list items should be part of a single list. Now, when I am trying to bring these nodes from this document into our existing document, it appears that they are being converted into their own separate lists containing 1 list item each. Here is the foreach statement that does this:

    foreach (Node node in nodes)
    {
        Node inserted = refNode.Document.ImportNode(node, true);
        refNode.ParentNode.InsertAfter(inserted, refNode);
        if (refNode == refPara && refNode.Range.Text == "\r")
            refNode.Remove();
        refNode = inserted;
    }
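
To see which list each paragraph belongs to on either side of the import, a small diagnostic can help. This is a minimal sketch: the class and method names are hypothetical, and it only reads ListFormat.IsListItem, ListFormat.ListLevelNumber and the ListFormat.List.ListId value mentioned in this thread.

```csharp
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;

public static class ListDiagnostics
{
    // Hypothetical helper: returns one line per list paragraph in the document,
    // showing the ListId it is attached to, its level, and its text.
    public static List<string> DescribeListItems(Document doc)
    {
        var lines = new List<string>();
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true).OfType<Paragraph>())
        {
            // Skip ordinary paragraphs; only list items carry a List reference.
            if (!para.ListFormat.IsListItem)
                continue;

            lines.Add("ListId=" + para.ListFormat.List.ListId
                + " Level=" + para.ListFormat.ListLevelNumber
                + " Text=" + para.GetText().Trim());
        }
        return lines;
    }
}
```

Calling it on the scratch document before the loop and on the destination document afterwards should show whether the ListId values diverge during the import.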

    Something inside of ImportNode or InsertAfter is causing each item to be inserted into the resulting document as an entirely new list. How can I ensure that each list item from the source document ends up in the same list in the destination document?

    Thanks

    Hi Scott,


    Thanks for your inquiry. Could you please try the following overload of the ImportNode method and let us know how it goes on your side?

    DocumentBase.ImportNode Method (Node, Boolean, ImportFormatMode)

    It imports a node from another document into the current document, with an option to control formatting. You can try both the ImportFormatMode.KeepSourceFormatting and ImportFormatMode.UseDestinationStyles options. I hope this helps.
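
The overload suggested above could be tried along these lines. This is a minimal sketch, assuming the HTML is first built in a scratch document as in the earlier posts; the class and method names are hypothetical, only top-level paragraphs are copied, and nothing in this thread confirms that either mode keeps the items in a single list.

```csharp
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;

public static class ImportSketch
{
    // Hypothetical helper: builds the HTML in a scratch document, then imports
    // each top-level paragraph into dst after refNode using the given mode.
    public static List<Node> ImportHtmlAfter(Document dst, Node refNode, string html, ImportFormatMode mode)
    {
        Document scratch = new Document();
        DocumentBuilder builder = new DocumentBuilder(scratch);
        builder.InsertHtml(html);

        // Snapshot the source paragraphs before importing anything.
        List<Paragraph> source = scratch.FirstSection.Body.Paragraphs.OfType<Paragraph>().ToList();

        var inserted = new List<Node>();
        foreach (Paragraph para in source)
        {
            // The three-argument overload: node, import children, format mode.
            Node imported = dst.ImportNode(para, true, mode);
            refNode.ParentNode.InsertAfter(imported, refNode);
            refNode = imported;
            inserted.Add(imported);
        }
        return inserted;
    }
}
```

Aspose.Words also provides a NodeImporter class intended for importing many nodes between the same pair of documents; it may be worth trying if the per-node overload still produces one list per item, although this thread does not test it.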

    Best regards,

    Hi Awais,

    Thanks for your response. After posting yesterday, I continued working with the code and came up with a solution that appears to work. As I have mentioned in my previous posts, we were using two separate functions when dealing with HTML that needed to be inserted into a merge field. We had one approach that seemed to work initially, but found out that with more complicated merge documents, it caused an invalid document type error. So, I took a closer look at the code that got us to that point and decided to not even call our helper function GetNodesFromHtml. Instead, I pulled the logic from the solution that almost worked and put it into the ReplaceHtml function. Below is the full implementation of the function:

    private Node ReplaceHtml(Run run)
    {
        string text = run.Text;

        if (htmlFormatRegex.IsMatch(text))
        {
            run.Text = "";

            // get the paragraph in the target document where the run is located
            Paragraph refPara = (Paragraph)run.GetAncestor(NodeType.Paragraph);

            // split the document at the run that is being removed
            Paragraph insertAfter = null;
            for (Node extractNode = run; extractNode != refPara; extractNode = extractNode.ParentNode)
            {
                // find the first node in the run's ancestry that is not the last child
                if (extractNode.NextSibling != null)
                {
                    insertAfter = refPara.ExtractRight(extractNode);
                    break;
                }
            }

            Node refNode = refPara;
            Node refNodeNextSibling = refNode.NextSibling;

            DocumentBuilder builder = new DocumentBuilder(Document);
            builder.MoveTo(refNode);
            builder.InsertHtml(htmlLineBreakExpression.Replace(text, "<p>&nbsp;</p>"));

            IEnumerable<Node> insertedNodes = refNode.FollowingSiblings();
            if (refNodeNextSibling != null)
                insertedNodes = insertedNodes.TakeUntilItem(refNodeNextSibling);

            foreach (Node node in insertedNodes)
            {
                if (node is Paragraph)
                {
                    if (this.HtmlContentMergedEventHandler != null)
                    {
                        HtmlContentMergedArgs args = new HtmlContentMergedArgs();
                        args.TemplateParagraph = refPara;
                        args.MergedParagraph = (Paragraph)node;
                        this.HtmlContentMergedEventHandler(this, args);
                    }
                }

                Node inserted = null;
                ConvertToSpecialQuotes(node, out inserted);

                // if the merge field was the only text on the line, remove the trailing line break
                // to prevent extra spacing in the rendered document
                if (refNode == refPara && refNode.Range.Text == "\r")
                    refNode.Remove();

                refNode = inserted;
            }

            // insert the right portion of the original paragraph if needed
            if (insertAfter != null)
            {
                refNode.ParentNode.InsertAfter(insertAfter, refNode);
                return insertAfter;
            }
            else
                return refNode;
        }

        return null;
    }

    As you can see, we are no longer building a separate document and trying to insert the nodes from the new document into our final document. I pulled the DocumentBuilder logic from our first attempted solution out of the helper function and put it directly into this function and based it on the current document. I then get the list of nodes that were just inserted into the current document and do our usual processing of those nodes. This appears to work as we wanted and does not cause any invalid document type errors.
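
The thread never shows FollowingSiblings or TakeUntilItem, so they are presumably the poster's own extension methods. A minimal sketch of what such helpers might look like follows; DemoNode is a hypothetical stand-in so the example compiles without Aspose.Words, where Node.NextSibling would play the same role.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a document node; only the sibling link matters here.
public sealed class DemoNode
{
    public string Name { get; }
    public DemoNode NextSibling { get; set; }

    public DemoNode(string name)
    {
        Name = name;
    }
}

public static class SiblingExtensions
{
    // Lazily yields every sibling that comes after the given node, in order.
    public static IEnumerable<DemoNode> FollowingSiblings(this DemoNode node)
    {
        for (DemoNode n = node.NextSibling; n != null; n = n.NextSibling)
            yield return n;
    }

    // Yields items until the stop item is reached; the stop item is excluded.
    public static IEnumerable<T> TakeUntilItem<T>(this IEnumerable<T> source, T stop) where T : class
    {
        foreach (T item in source)
        {
            if (ReferenceEquals(item, stop))
                yield break;
            yield return item;
        }
    }
}
```

Because both helpers are lazy iterators, TakeUntilItem returns a new sequence rather than trimming the one it is called on, which is why the final ReplaceHtml assigns its result back to insertedNodes.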

    Thanks for all of your help and the information you have provided. I am sure that it will prove to be useful to us in the future.

    Thanks,

    Scott

    Hi Scott,


    Thanks for this additional information. Please let us know if you have any trouble, and we will be glad to look into this further for you.

    Best regards,