How to add overlapping comments from HTML <span data-comment> and preserve all formatting in Aspose.Words for Java?

Hello!

I am working with Aspose.Words for Java.
I need to insert comments into a Word document based on HTML fragments with custom attributes.
However, I want to keep the original text formatting (bold, italics, lists, tables, etc.) from the HTML.
I receive HTML like this (an example with overlapping comments):

<span data-comment="discussion-uuid-1">1111</span>
<span data-comment="discussion-uuid-2">
    <span data-comment="discussion-uuid-1">2222</span>
</span>
<span data-comment="discussion-uuid-2">3333</span>

What I’ve tried:
If I use builder.insertHtml(html, true) all formatting is preserved, but the <span data-comment=...> tags are lost, so I can’t add Word comments to the fragments after that.
If I manually parse the HTML and add comments by traversing the plain text, I lose all the HTML formatting.
I want to preserve original formatting and still add comments to the correct fragments.

It’s important that comments are created on arbitrary text fragments (not just whole paragraphs), and that the solution works with overlapping ranges.

Is there a way to process the HTML and insert comments to overlapping fragments while keeping all formatting?

Does Aspose.Words provide a recommended way to map formatted HTML fragment positions to Word document ranges, so I can programmatically add comments for each fragment?

Thank you very much!

@Liubei

To add overlapping comments from HTML <span data-comment> elements while preserving all formatting in Aspose.Words for Java, you can follow a structured approach. Here’s a step-by-step guide to achieve this:

  1. Parse the HTML: You need to parse the HTML to extract the text and the associated comment identifiers. This can be done using a library like Jsoup or a custom parser.

  2. Insert HTML with Formatting: Use the DocumentBuilder to insert the formatted text into the Word document. However, since builder.insertHtml(html, true) will lose the custom attributes, you will need to handle the insertion manually.

  3. Create Comments: For each text fragment that corresponds to a <span data-comment>, create a comment in the Word document. You can use the Comment class to create comments and associate them with specific text ranges.

  4. Handle Overlapping Ranges: To manage overlapping comments, you will need to keep track of the start and end positions of each comment. Use CommentRangeStart and CommentRangeEnd to define the range for each comment.

Here’s a code example to illustrate this process:

import com.aspose.words.*;
import java.util.Date;

public class AddCommentsFromHtml {
    public static void main(String[] args) throws Exception {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Example HTML input
        String html = "<span data-comment=\"discussion-uuid-1\">1111</span>" +
                      "<span data-comment=\"discussion-uuid-2\">" +
                      "<span data-comment=\"discussion-uuid-1\">2222</span>" +
                      "</span>" +
                      "<span data-comment=\"discussion-uuid-2\">3333</span>";

        // Parse the HTML and insert text with comments
        // This is a simplified example; you would need to implement a proper HTML parser
        String[] parts = html.split("(?=<span data-comment=)|(?<=</span>)");
        for (String part : parts) {
            if (part.contains("data-comment")) {
                String commentId = part.substring(part.indexOf("data-comment=\"") + 14, part.indexOf("\"", part.indexOf("data-comment=\"") + 14));
                String text = part.replaceAll("<span data-comment=\"[^\"]*\">", "").replaceAll("</span>", "");

                // Create comment
                Comment comment = new Comment(doc, "Author Name", "AN", new Date());
                comment.getParagraphs().add(new Paragraph(doc));
                comment.getFirstParagraph().getRuns().add(new Run(doc, "Comment for " + commentId));

                // Define comment range
                CommentRangeStart start = new CommentRangeStart(doc, comment.getId());
                CommentRangeEnd end = new CommentRangeEnd(doc, comment.getId());

                // Insert the start marker, then the text, then the end marker and the
                // comment, so that the range actually encloses the inserted text
                builder.insertNode(start);
                builder.write(text);
                builder.insertNode(end);
                builder.insertNode(comment);
            }
        }

        // Save the document
        doc.save("OutputWithComments.docx");
    }
}

Key Points:

  • This example demonstrates how to parse HTML and attach a comment to each text fragment. It inserts the text with builder.write, so the HTML formatting is not carried over.
  • You will need to implement a more robust HTML parsing mechanism to handle various cases and nested spans.
  • Ensure that you track the start and end position of each comment so that overlapping ranges are marked correctly.

This approach allows you to programmatically add comments to specific text fragments. If you have further questions or need additional examples, feel free to ask!

This approach will not work for overlapping comments.
In your code, the logic is essentially linear and does not support nested or intersecting comment ranges.
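
To make the problem concrete, here is a minimal sketch of the bookkeeping that nested spans require. It is plain Java with no Aspose.Words calls, and the class and method names (CommentRangeSketch, computeRanges) are made up for illustration. It keeps a stack of the currently open spans and records, for each data-comment id, the character offsets it covers in the plain text. It assumes well-formed markup and no HTML entities, and it does not solve the remaining step of mapping those offsets to nodes in the Word document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class CommentRangeSketch {

    // An opening <span ...>, a closing </span>, any other tag, or a run of text.
    private static final Pattern TOKEN =
            Pattern.compile("<span\\b([^>]*)>|</span>|<[^>]*>|[^<]+");
    private static final Pattern COMMENT_ATTR =
            Pattern.compile("data-comment=\"([^\"]*)\"");

    /** Returns, per comment id, the merged [start, end) offsets into the plain text. */
    static Map<String, List<int[]>> computeRanges(String html) {
        Map<String, List<int[]>> ranges = new LinkedHashMap<>();
        // Ids of the spans that are currently open; "" marks a span without data-comment.
        Deque<String> open = new ArrayDeque<>();
        int offset = 0;
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String token = m.group();
            if (token.equals("</span>")) {
                if (!open.isEmpty()) {
                    open.pop();
                }
            } else if (m.group(1) != null) {
                Matcher attr = COMMENT_ATTR.matcher(m.group(1));
                open.push(attr.find() ? attr.group(1) : "");
            } else if (token.charAt(0) == '<') {
                // Formatting tags such as <b> or <i> do not affect comment ranges.
                continue;
            } else {
                int end = offset + token.length();
                Set<String> seen = new HashSet<>();
                for (String id : open) {
                    if (id.isEmpty() || !seen.add(id)) {
                        continue;
                    }
                    List<int[]> list = ranges.computeIfAbsent(id, k -> new ArrayList<>());
                    if (!list.isEmpty() && list.get(list.size() - 1)[1] == offset) {
                        list.get(list.size() - 1)[1] = end; // contiguous text: extend the range
                    } else {
                        list.add(new int[] {offset, end});
                    }
                }
                offset = end;
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        String html = "<span data-comment=\"discussion-uuid-1\">1111</span>"
                + "<span data-comment=\"discussion-uuid-2\">"
                + "<span data-comment=\"discussion-uuid-1\">2222</span>"
                + "</span>"
                + "<span data-comment=\"discussion-uuid-2\">3333</span>";
        computeRanges(html).forEach((id, list) -> {
            for (int[] r : list) {
                System.out.println(id + " -> [" + r[0] + ", " + r[1] + ")");
            }
        });
    }
}
```

For the HTML from my question (written without whitespace between the tags), the plain text is 111122223333, discussion-uuid-1 covers offsets 0 to 8 and discussion-uuid-2 covers offsets 4 to 12. The two ranges intersect, and a single linear pass over split fragments cannot represent that.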

@Liubei I am afraid there is no way to preserve custom HTML attributes when importing an HTML fragment into the Aspose.Words DOM.
Please note that the Aspose.Words Document Object Model is designed to work with MS Word documents. The HTML document object model is quite different, and it is not always possible to provide 100% fidelity after importing or exporting an HTML document. Usually Aspose.Words mimics MS Word behavior when working with HTML documents.

However, when exporting to HTML, Aspose.Words writes special tags to preserve comments and their ranges. You can use the same approach. For example, see the following documents:
html_with_overlapping_comments.zip (725 Bytes)
out.docx (10.8 KB)