Inconsistent paragraph formatting when using DocumentBuilder.InsertHtml()

When using DocumentBuilder.InsertHtml with HTML that contains multiple block elements (for example, something like <p>line 1<br />line2</p><ul><li>one</li><li>two</li></ul><p>line three</p>), the first paragraph generated matches the formatting of the current paragraph (e.g. text alignment), but subsequent ones do not.

This seems like a bug, and is certainly unexpected behaviour.

For now we can work around it by traversing over the paragraphs we have inserted and setting their formatting to match the first.
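
Roughly, the workaround looks like the sketch below. It is a simplified illustration rather than our production code: InsertHtmlKeepingAlignment is just an illustrative name, only the alignment is copied, and it overwrites any alignment the HTML itself asked for.

```csharp
using System.Collections.Generic;
using Aspose.Words;

// Illustrative helper (not part of Aspose.Words).
public static class InsertHtmlWorkaround
{
    public static void InsertHtmlKeepingAlignment(DocumentBuilder builder, string html)
    {
        Document doc = builder.Document;

        // Formatting of the paragraph the cursor is in before the insert.
        ParagraphAlignment alignment = builder.CurrentParagraph.ParagraphFormat.Alignment;

        // Remember which paragraphs already exist so new ones can be spotted later.
        HashSet<Node> existing = new HashSet<Node>();
        foreach (Node node in doc.GetChildNodes(NodeType.Paragraph, true))
            existing.Add(node);

        builder.InsertHtml(html);

        // Patch every paragraph that InsertHtml created.
        foreach (Node node in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            if (!existing.Contains(node))
                ((Paragraph) node).ParagraphFormat.Alignment = alignment;
        }
    }
}
```

The same loop can copy other ParagraphFormat properties (spacing, indents) if those need to match as well.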

We are using the latest version of Aspose.Words (9.5.0).

Thanks,

Will

Hi Will,

Thanks for your inquiry. You are probably using an old version of Aspose.Words. In the latest version, the InsertHtml method uses the formatting specified in the HTML; if no formatting is specified, the default formatting is applied.
You can download the latest version from here:
https://releases.aspose.com/words/net
Best regards,

Thanks - we are using the latest version of Aspose.Words (9.5.0). I would say that this behaviour is unexpected - I would expect any text without formatting to use the formatting as it is at the position of the merge field in the source document instead of as it is at the document start.

This behaviour causes a number of problems for us, because we frequently use the merge template to specify paragraph formatting, but then merge in HTML which contains relative font sizes (using ems). As we now have to go over every run inserted by the InsertHtml call and patch up its formatting, we essentially have to parse the HTML ourselves to figure out what the font sizes should be. This seems like the job of InsertHtml!

For example, if we insert

<p>first thing</p>
<p style="font-size: 0.8em;">second thing</p>
<p>third thing</p>

and the merge field has formatting of “Arial, bold, 10pt” but the document has formatting of “Times New Roman, normal, 11pt”, then we have to loop over the runs and compare each run's font to the document default. If it is the same, we set it to the merge field font. Otherwise, if the size is different, we have to work out the size relative to the document default and apply that ratio to the merge field font size.
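
The size part of that logic boils down to something like the sketch below. FontSizePatch and ResolveSize are illustrative names, not part of Aspose.Words, and the sketch assumes the relative size was resolved against the document default size when the HTML was inserted.

```csharp
using System;

// Illustrative helper (not part of Aspose.Words).
public static class FontSizePatch
{
    // runSize:        size found on the inserted run, in points
    // docDefaultSize: document default size that InsertHtml applied (e.g. 11)
    // fieldSize:      size of the merge field formatting we actually want (e.g. 10)
    public static double ResolveSize(double runSize, double docDefaultSize, double fieldSize)
    {
        // Run still has the document default: it had no size of its own.
        if (Math.Abs(runSize - docDefaultSize) < 0.001)
            return fieldSize;

        // Otherwise keep the same ratio, but relative to the merge field size.
        double scaled = fieldSize * (runSize / docDefaultSize);

        // Word stores font sizes in half-points, so round to the nearest 0.5.
        return Math.Round(scaled * 2.0, MidpointRounding.AwayFromZero) / 2.0;
    }
}
```

With the numbers above, a run left at 11pt becomes 10pt, and a run at 8.8pt (0.8em of 11pt) becomes 8pt.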

All complicated logic that I would expect to be the default behaviour anyway!

Hi

Thank you for the additional information. HTML snippets inserted by InsertHtml can be quite complex and frequently include their own CSS style sheets. Mixing in the formatting of the DocumentBuilder as well would make the behavior of InsertHtml very complex and surprising. We should deliver an easy and predictable solution.
So, after investigation we’ve come to the following:

  1. InsertHtml should treat HTML snippets self-sufficiently. Don’t use any formatting applied to DocumentBuilder when creating content nodes from snippets.
  2. If some additional formatting is needed on the inserted contents then other possibilities should be used. The most universal way is handling node insertion. If you need to insert simple paragraphs with formatting then WriteLine should be used.
Here is code that you can use to change formatting of the inserted HTML on the fly:

public void TestInsertHTMLWithEvent()
{
    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);
    string htmlText = File.ReadAllText("in.html");
    // Attach the handler only while the HTML is being inserted.
    builder.Document.NodeChangingCallback = new HandleNodeFont("Arial", 11.0);
    builder.InsertHtml(htmlText);
    builder.Document.NodeChangingCallback = null;
    doc.Save("out.doc");
}

private class HandleNodeFont : INodeChangingCallback
{
    private string Name;
    private double Size;

    public HandleNodeFont(string name, double size)
    {
        this.Name = name;
        this.Size = size;
    }

    void INodeChangingCallback.NodeInserting(NodeChangingArgs e)
    { }

    // Called for every node added to the document; only runs carry font formatting.
    void INodeChangingCallback.NodeInserted(NodeChangingArgs e)
    {
        if (e.Node.NodeType == NodeType.Run)
        {
            ((Run) e.Node).Font.Name = Name;
            ((Run) e.Node).Font.Size = Size;
        }
    }

    void INodeChangingCallback.NodeRemoving(NodeChangingArgs e)
    { }

    void INodeChangingCallback.NodeRemoved(NodeChangingArgs e)
    { }
}

Best regards,

Thanks Alexey - that code is a bit neater than our previous fix for this so will try that.

Will

Hi,

If one needs to have several HandleNodeFont nested classes is that possible? I need one to do basically what is done here in this thread. Then I need another to adjust the paragraph spacing settings. I can not do both at the same time since I need to apply HandleNodeFont to several different bits of HTML code.

I’m hoping that I can have several of these nested classes, but looking at how the name space seems to be working with them, I may have to use conditionals in the HandleNodeFont class to decide what I need adjusted?

Thanks.

-Rob

Hi Rob,
Thanks for your inquiry.
It used to be possible to “nest” these handlers when we were using events; however, to facilitate autoporting we needed to change these to interfaces. With interfaces there is no easy way to stack the handlers. I think the easiest method is to use one HandleNodeFont class and put the logic in there, e.g. have different constructors which accept different levels of formatting.
Then, depending on the HTML you are inserting, you can create a new instance using the appropriate constructor. If you have any trouble with this we will be glad to help.
Also, if you are looking to combine all formatting from the DocumentBuilder or another node automatically (as InsertHtml did in older versions), then you may want to look into the code workaround here.
Thanks,

Or, instead of having two or more handlers, can I modify the HTML code? What I would need to insert into the HTML would be a tag that sets the paragraph space before and after to 0 (i.e. when InsertHtml() is called, the paragraph spacing before and after is set to zero). Any ideas on this?

Could you please give an example of different constructors which accept different levels of formatting? My problem is that one set of HTML code will need x formatting, and another snippet will need y formatting. Right now there is no way to tell the difference between the HTML snippets. The difference comes at the actual time when I insert them. The stuff needing x formatting is inserted first, and the snippet needing y formatting is inserted much later in the project.

Thanks!

PS: I never got the typical “Aspose.com - Automated Email” for this one. Is this something that can be changed in my forum properties? Maybe my settings only send the “Aspose.com - Automated Email” when I originate the thread?

Hi Rob,
Thanks for the additional information.
You should be able to use HTML like this to specify no spacing before or after:
<p style="margin-top: 0pt; margin-bottom: 0pt;">No Spacing Before or After</p>
Regarding the request for multiple node handlers, please see the example code below. Hopefully this is the sort of thing you were looking for. You can use the code like this:

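// First snippet: override font name and size only.
doc.NodeChangingCallback = new HandleNodeFont("Arial", 11.0);
builder.InsertHtml(html);
// Later snippet: override paragraph spacing only.
doc.NodeChangingCallback = new HandleNodeFont(12.0, 12.0);
builder.InsertHtml(html);
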
public class HandleNodeFont: INodeChangingCallback
{
    private string mName;
    private double mSize;
    private double mSpaceBefore;
    private double mSpaceAfter;
    public HandleNodeFont(string name, double size)
    {
        // Set font name and size
        mName = name;
        mSize = size;
        // Make sure spacing is cleared.
        mSpaceBefore = -1;
        mSpaceAfter = -1;
    }
    public HandleNodeFont(double spaceBefore, double spaceAfter)
    {
        // Make sure other formatting is cleared
        mName = null;
        mSize = -1;
        // Set spacing
        mSpaceBefore = spaceBefore;
        mSpaceAfter = spaceAfter;
    }
    public void NodeInserted(NodeChangingArgs args)
    {
        if (args.Node.NodeType == NodeType.Paragraph)
        {
            if (mSpaceBefore > -1)
                ((Paragraph) args.Node).ParagraphFormat.SpaceBefore = mSpaceBefore;
            if (mSpaceAfter > -1)
                ((Paragraph) args.Node).ParagraphFormat.SpaceAfter = mSpaceAfter;
        }
        if (args.Node.NodeType == NodeType.Run)
        {
            if (!string.IsNullOrEmpty(mName))
                ((Run) args.Node).Font.Name = mName;
            if (mSize > -1)
                ((Run) args.Node).Font.Size = mSize;
        }
    }
    public void NodeInserting(NodeChangingArgs args)
    {}
    public void NodeRemoved(NodeChangingArgs args)
    {}
    public void NodeRemoving(NodeChangingArgs args)
    {}
}
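
If you do later need two handlers active at the same time, stacking is not built in, but one option is a small wrapper that forwards each notification to a list of handlers. This is only a sketch: CompositeNodeCallback is a made-up name, not an Aspose.Words class.

```csharp
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper (not part of Aspose.Words): forwards every
// node-changing notification to each wrapped handler, in order.
public class CompositeNodeCallback : INodeChangingCallback
{
    private readonly List<INodeChangingCallback> mHandlers;

    public CompositeNodeCallback(params INodeChangingCallback[] handlers)
    {
        mHandlers = new List<INodeChangingCallback>(handlers);
    }

    public void NodeInserting(NodeChangingArgs args)
    {
        foreach (INodeChangingCallback handler in mHandlers)
            handler.NodeInserting(args);
    }

    public void NodeInserted(NodeChangingArgs args)
    {
        foreach (INodeChangingCallback handler in mHandlers)
            handler.NodeInserted(args);
    }

    public void NodeRemoving(NodeChangingArgs args)
    {
        foreach (INodeChangingCallback handler in mHandlers)
            handler.NodeRemoving(args);
    }

    public void NodeRemoved(NodeChangingArgs args)
    {
        foreach (INodeChangingCallback handler in mHandlers)
            handler.NodeRemoved(args);
    }
}
```

It would be used like doc.NodeChangingCallback = new CompositeNodeCallback(new HandleNodeFont("Arial", 11.0), new HandleNodeFont(0.0, 0.0)); so that both the font and the spacing overrides apply to the same InsertHtml call.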

I’m not quite sure why you didn’t receive the notification. Can you check that the button at the top of this thread says “Disable Email Subscription” (meaning that it is currently enabled)?
Perhaps you clicked on it by mistake and disabled it.
Thanks,

I have a similar problem.
Here is a short example of code that demonstrates the problem:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
string html = "<style>p{margin-bottom:100px;}</style><p>paragraph1</p><p>paragraph2</p>";
builder.InsertHtml(html);

This produces a document with several paragraphs, but only the first paragraph gets the margin from the style sheet. The remaining paragraphs all have their SpaceAfter property set to the default value of 14.

This sounds like a bug.

Hi Allan,
Thank you for reporting this problem to us. I managed to reproduce the problem on my side. Your request has been linked to the appropriate issue. You will be notified as soon as it is resolved.
Best regards,

The issues you have found earlier (filed as WORDSNET-6074) have been fixed in this .NET update and this Java update.