Free Support Forum - aspose.com

Preserving original ParagraphFormat when using DocumentBuilder.InsertHTML()

I'm currently trying to find a solution for a problem we've encountered within our implementation of Aspose.Words (Version: 13.1.0.0).

The situation is as follows:

A user can create a Word document that is used as a template for a report within our application. In the template, the user can insert variables that are replaced with content collected from the application that matches the variable declaration (see attachment "Audit report Template.doc" for an example of such a template). When a report is generated from such a template, the application collects all user-generated content and inserts it at the designated positions in the document. The content to be inserted into the report can contain HTML code. To insert this type of content, we use the "DocumentBuilder.InsertHtml(strHTMLStringToInsert)" method.

The problem we're facing is that, once the HTML has been inserted with the DocumentBuilder, the paragraph format of the inserted content does not match the paragraph format defined in the template document. In addition, if an Ordered List (OL) or Unordered List (UL) HTML tag is used in the HTML code, each individual List Item (LI) becomes its own Paragraph node. The result is a list with paragraph spacing after every list item (see attachment "Result.doc" for an example of a report after inserting the HTML code at the designated positions).

To remove the spacing from each individual list item (except the last item in the list), we would have to traverse the document and check whether the NextSibling of the current list item is also a list item belonging to the same (un)ordered list. If so, we would manually set the "SpaceAfter" property of its ParagraphFormat.
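The traversal described above can be sketched roughly as follows. This assumes Aspose.Words for .NET. The class and method names (ListSpacing, IsFollowedBySameListItem, TightenListSpacing) are hypothetical. Two list items are treated as belonging to the same list when their ListFormat.List.ListId values match. Like the approach it models, this pass updates every list in the document, not only the lists produced by InsertHtml.

```csharp
using Aspose.Words;

// Hypothetical helper (not part of Aspose.Words); assumes Aspose.Words for .NET.
static class ListSpacing
{
    // True when the paragraph is a list item and its next sibling is a
    // paragraph from the same list, i.e. it is not the last item of that list.
    public static bool IsFollowedBySameListItem(Paragraph para)
    {
        if (!para.IsListItem)
            return false;

        Paragraph next = para.NextSibling as Paragraph;
        return next != null
            && next.IsListItem
            && next.ListFormat.List.ListId == para.ListFormat.List.ListId;
    }

    // Whole-document pass: every list item except the last item of its list
    // gets SpaceAfter = 0. This touches every list in the document,
    // including lists that were already part of the template.
    public static void TightenListSpacing(Document doc)
    {
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            if (IsFollowedBySameListItem(para))
                para.ParagraphFormat.SpaceAfter = 0.0;
        }
    }
}
```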

However, since we cannot detect which lists were added by the InsertHtml method, we would end up updating every list in the document, including the lists that should remain unaltered.

What we are trying to achieve is to insert HTML code while preserving the paragraph format of the original template, without having to traverse the document and manually set the ParagraphFormat of each list item (see attachment "Desired Result.doc").

What would you suggest as an alternative means for solving this problem?

See attachment "FormattedMetafieldContents_HTML.txt" for an example of the HTML code we want to insert into the report document using Aspose.Words. The code we use to insert the HTML is listed below.

    Public Function Replacing(ByVal e As Aspose.Words.ReplacingArgs) As Aspose.Words.ReplaceAction Implements Aspose.Words.IReplacingCallback.Replacing
        'create DocumentBuilder object
        Dim objDocumentBuilder As New Aspose.Words.DocumentBuilder(CType(e.MatchNode.Document, Aspose.Words.Document))

        'get the matched node
        Dim objCurrentNode As Node = e.MatchNode

        'the first (and maybe the only) run can contain text before the match; in this case it is necessary to split the run
        If e.MatchOffset > 0 Then
            objCurrentNode = AH.AsposeHelper.SplitRun(CType(objCurrentNode, Run), e.MatchOffset)
        End If

        'if there is some other text after the current run (match), split that too
        If objCurrentNode.GetText.Length > e.Match.Value.Length Then
            objCurrentNode = AH.AsposeHelper.SplitRun(CType(objCurrentNode, Run), e.Match.Value.Length, objCurrentNode.GetText.Length - e.Match.Value.Length)
        End If

        'the node that contains the text should be a Run
        Dim objRun As Run = DirectCast(objCurrentNode, Run)

        'move to the matching node
        objDocumentBuilder.MoveTo(objRun)

        'clear the text of the Run
        objRun.Text = ""

        'insert the value (= HTML)
        objDocumentBuilder.InsertHtml(_strReplacementValue)

        'return "Skip", because we replaced the value manually
        _blnFound = True
        Return Aspose.Words.ReplaceAction.Skip

    End Function

With kind regards,
Tom Pouwelse
Software Engineer
Infoland BV.
Hi Tom,

Thanks for your inquiry. Please note that content inserted by the DocumentBuilder.InsertHtml method does not inherit the formatting specified in the DocumentBuilder options. All formatting is taken from the HTML snippet; if the HTML specifies no formatting, default formatting is applied to the inserted content.

In your case, I suggest using InsertHtmlWithBuilderFormatting instead of the InsertHtml method, as shown in the following code snippet. Insert a bookmark at the position where you insert the HTML content. Once the HTML has been inserted, iterate through all nodes between the BookmarkStart and the BookmarkEnd and set ParagraphFormat.SpaceAfter to zero for each list item paragraph.

I have attached the code of the InsertHtmlWithBuilderFormatting helper to this post.

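    // builder is a DocumentBuilder for doc; MyDir is the folder containing in.html.
    // Move to the node where the HTML should be inserted (placeholder for your specified node…).
    builder.MoveTo(yourSpecifiedNode);

    // Mark the start of the inserted content with a bookmark.
    BookmarkStart bookmarkStart = builder.StartBookmark("bm");

    // Insert the HTML with the attached helper instead of builder.InsertHtml.
    InsertHtmlWithBuilderFormatting(builder, File.ReadAllText(MyDir + "in.html"));
    //builder.InsertHtml(File.ReadAllText(MyDir + "in.html"));

    // Mark the end of the inserted content.
    BookmarkEnd bookmarkEnd = builder.EndBookmark("bm");

    // Visit every node between the two bookmark markers in document order
    // and remove the spacing after each list item paragraph.
    Node currentNode = bookmarkStart;
    while (currentNode != null && currentNode != bookmarkEnd)
    {
        if (currentNode.NodeType == NodeType.Paragraph && ((Paragraph)currentNode).IsListItem)
        {
            ((Paragraph)currentNode).ParagraphFormat.SpaceAfter = 0.0;
        }
        currentNode = currentNode.NextPreOrder(doc);
    }

    doc.Save(MyDir + "out.doc");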


Hope this helps you. Please let us know if you have any more queries.

Hi Tahir,

I have taken the time to implement your suggestion to use InsertHtmlWithBuilderFormatting and to investigate the results. After trying several variations of that code, I came to the conclusion that simply copying the format of the original/previous paragraph does not fully solve the issue at hand.

I've enclosed a generated document that shows the result of the InsertHtmlWithBuilderFormatting implementation.

In addition, I found that temporarily replacing Document.NodeChangingCallback with the INodeChangingCallback implementation from DocumentBuilderHelper produces a document that Microsoft Word 2010 opens in (red) "Protected View" mode, because the file fails File Validation (see attached image).

After discussing the results internally, we've decided to go for the bookmark option you mentioned in your previous post: use a bookmark to identify the sections of the document whose paragraph/list format needs to be verified and adjusted, and then remove the bookmark (to prevent the bookmarks from being saved in the document itself).
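The bookmark approach described above could be sketched roughly as follows, again assuming Aspose.Words for .NET. The class HtmlListFix, the method InsertHtmlWithTightLists and its bookmarkName parameter are hypothetical, and bookmarkName is assumed to be unused elsewhere in the document. The sketch uses the plain InsertHtml call rather than the attached InsertHtmlWithBuilderFormatting helper. It applies the same-list check only inside the temporary bookmark, so template lists outside it are left alone. It then calls Bookmark.Remove(), which deletes the bookmark markers but keeps the content they enclosed.

```csharp
using Aspose.Words;

// Hypothetical helper (not part of Aspose.Words); assumes Aspose.Words for .NET.
static class HtmlListFix
{
    public static void InsertHtmlWithTightLists(DocumentBuilder builder, string html, string bookmarkName)
    {
        Document doc = builder.Document;

        // Wrap the inserted HTML in a temporary bookmark.
        BookmarkStart start = builder.StartBookmark(bookmarkName);
        builder.InsertHtml(html);
        BookmarkEnd end = builder.EndBookmark(bookmarkName);

        // Begin at the paragraph holding the BookmarkStart, in case the first
        // inserted block ended up in that paragraph, then walk in document order.
        Node node = (Node)start.GetAncestor(NodeType.Paragraph) ?? start;
        for (; node != null && node != end; node = node.NextPreOrder(doc))
        {
            Paragraph para = node as Paragraph;
            if (para == null || !para.IsListItem)
                continue;

            // Zero the spacing only when the next sibling is an item of the same list,
            // so the last item of each inserted list keeps its spacing.
            Paragraph next = para.NextSibling as Paragraph;
            if (next != null && next.IsListItem
                && next.ListFormat.List.ListId == para.ListFormat.List.ListId)
            {
                para.ParagraphFormat.SpaceAfter = 0.0;
            }
        }

        // Remove the bookmark markers; the inserted content stays in place.
        doc.Range.Bookmarks[bookmarkName].Remove();
    }
}
```

In the Replacing callback from the first post, a call like this would take the place of objDocumentBuilder.InsertHtml(_strReplacementValue), with a distinct bookmark name per replacement.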

Thank you for the assistance.

With kind regards,
Tom Pouwelse
Software Engineer
Infoland BV.
Hi Tom,

Thanks for your feedback. Yes, the approach of using InsertHtmlWithBuilderFormatting will help you meet your requirements. After calling InsertHtmlWithBuilderFormatting, you may need to iterate through the Paragraph nodes and update them as required.

Please let us know if you have any more queries.