DATA_LOSS warnings when importing HTML with comments using DocumentBuilder.insertHtml()

Hello,

We recently noticed Bookmark DATA_LOSS warnings when converting HTML documents that contain comments into DOCX files.
Despite the warnings, the output DOCX still looks fine (comments are present and visible in Word).

We would like to confirm:

  1. Whether this is expected behavior or not.
  2. If the comment markup syntax has changed (e.g., -aw-comment-* attributes).
  3. How we should adjust our HTML to avoid these warnings.

Additional note: when exporting a Word document with a comment to HTML using Aspose.Words, we found the same markup syntax, and no warning occurred.

We weren’t able to find any documentation on the HTML comment format. Is it documented somewhere?

Reproduction Code (Java):

@Test
void import_doc_from_html() throws Exception {
    final String html = """
        <html style="font-family: helvetica; font-size: 11pt;">
            <body>
              <p style="margin: 0; word-break: break-word;">te
                <a data-cl-comment name="_cmntref1"></a>test
                <span data-cl-comment style="-aw-comment-end:_cmntref1">&nbsp;</span>
                <a data-cl-comment href="#_cmnt1">[1]</a>
              </p>
              <p style="margin: 0; word-break: break-word;">teetete</p>
              <div id="_cmnt1"
                   style="-aw-comment-author:'Concord Tech';
                          -aw-comment-datetime:'2025-10-24T09:28:38.171';
                          -aw-comment-initial:'CT'">
                <p>
                  <a data-cl-comment href="#_cmntref1">[1]</a>
                  <span>test comment</span>
                </p>
              </div>
            </body>
        </html>
    """;

    final Document doc = new Document();
    final DocumentBuilder builder = new DocumentBuilder(doc);
    builder.insertHtml(html);

    final var warningCollector = new WarningInfoCollection();
    doc.setWarningCallback(warningCollector);

    doc.save("html/Aspose_ImportHTMLWithComments.docx");

    for (final var warning : warningCollector) {
        System.err.println("Warning: " + WarningType.getName(warning.getWarningType())
                           + " - " + warning.getDescription());
    }
}

Observed output:

Warning: DATA_LOSS - Bookmark '_cmnt1' without corresponding BookmarkStart was removed.
Warning: DATA_LOSS - Bookmark '_cmnt1' without corresponding BookmarkEnd was removed.

Environment:
Aspose.Words for Java version: 25.9

Thanks in advance!

@concord_tech

Hello,

Regarding your questions about the DATA_LOSS warnings when importing HTML with comments using DocumentBuilder.insertHtml(), here are some insights:

  1. Expected Behavior: The warnings you are encountering are expected when the HTML contains comment markup that does not translate directly into the Word document structure. The Aspose.Words library may not fully support certain HTML comment syntaxes, which can lead to the removal of bookmarks without corresponding start and end tags.

  2. Comment Markup Syntax: The comment markup syntax you are using (e.g., -aw-comment-* attributes) is specific to Aspose. If the expected structure is not met, the library’s handling of this syntax can result in warnings.

  3. Adjusting HTML: To avoid these warnings, ensure that your HTML structure matches what Aspose.Words expects, for example by simplifying the comment structure. Standard HTML comments (<!-- comment -->) are an alternative to the Aspose-specific attributes, but be aware that they may not be preserved in the same way when converting to DOCX.

Unfortunately, there isn’t extensive documentation specifically addressing the HTML comments format used by Aspose.Words. For more details on supported features, you may want to refer to the general documentation on HTML import/export.

If you have further questions, feel free to reach out!

@concord_tech You can safely ignore these warnings. Please note, however, that the Aspose.Words Document Object Model is designed to work with MS Word documents. The HTML document object model is quite different, so it is not always possible to achieve 100% fidelity when importing or exporting HTML documents. Aspose.Words usually mimics MS Word behavior when working with HTML documents. The -aw-* CSS properties are Aspose.Words' attempt to preserve MS Word round-trip information in HTML.
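
If the warnings are to be ignored but you still want to see any other warnings in your logs, one option is to filter them by description text. The following is only a minimal sketch: the class name BenignWarningFilter and its methods are hypothetical helpers, not part of Aspose.Words, and the pattern assumes the message wording stays exactly as in the observed output above, which may change between versions. It works on plain strings, so it runs without the library.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Aspose.Words.
final class BenignWarningFilter {
    // Assumption: message wording copied from the observed output in this thread.
    private static final Pattern ORPHAN_COMMENT_BOOKMARK = Pattern.compile(
            "Bookmark '_cmnt\\d+' without corresponding Bookmark(Start|End) was removed\\.");

    // True when the description is one of the comment-bookmark warnings.
    static boolean isBenign(String description) {
        return description != null
                && ORPHAN_COMMENT_BOOKMARK.matcher(description).matches();
    }

    // Returns only the descriptions that do not match the known-benign pattern.
    static List<String> keepActionable(List<String> descriptions) {
        return descriptions.stream()
                .filter(d -> !isBenign(d))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> collected = List.of(
                "Bookmark '_cmnt1' without corresponding BookmarkStart was removed.",
                "Bookmark '_cmnt1' without corresponding BookmarkEnd was removed.",
                "Some other warning (made-up sample text).");
        // Only the third entry is printed.
        keepActionable(collected).forEach(d -> System.err.println("Warning: " + d));
    }
}
```

In the reproduction code, the loop over warningCollector could pass warning.getDescription() to isBenign and skip the entries for which it returns true. Matching on message text is brittle, so treat the pattern as something to re-check after each library upgrade.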