Insert Invisible Markers in Converted HTML using Java | Bookmark Paragraph Content in Word DOCX document

Hello.
I want to highlight a paragraph in the HTML converted from a DOCX file.

I have two types of data:

  1. paragraph text data.
  2. Split/converted HTML pages.

I process the paragraph text data and select one paragraph. Then I want to highlight that paragraph in the converted HTML page.
A simple text match is not feasible, because a single paragraph can consist of many runs and the same text can occur more than once on the same page.
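
To make both problems concrete, here is a minimal sketch in plain Java (no Aspose.Words involved). The class name `TextMatchAmbiguity`, its helper methods, and the sample HTML are hypothetical and only serve as illustration. Searching the raw HTML misses a paragraph whose text is split across spans, while searching the tag-stripped text finds the same sentence twice, so the match cannot identify one paragraph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; nothing here is part of Aspose.Words.
class TextMatchAmbiguity {

    // Strips tags so that only the visible text remains.
    // A crude regex is enough for this demo; real HTML needs a proper parser.
    static String visibleText(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Returns every start offset at which needle occurs in haystack.
    static List<Integer> findAll(String haystack, String needle) {
        List<Integer> hits = new ArrayList<>();
        int from = 0;
        while (true) {
            int at = haystack.indexOf(needle, from);
            if (at < 0) {
                break;
            }
            hits.add(at);
            from = at + 1;
        }
        return hits;
    }

    public static void main(String[] args) {
        // First and third paragraph carry the same text; the first is split into two runs.
        String html = "<div><span>how are </span><span>you?</span></div>"
                    + "<div><span>hello</span></div>"
                    + "<div><span>how are you?</span></div>";

        // The raw markup contains the sentence only once, because of the run split.
        System.out.println("raw HTML matches: " + findAll(html, "how are you?").size());

        // The visible text contains it twice, so the hit is ambiguous.
        System.out.println("visible text matches: " + findAll(visibleText(html), "how are you?").size());
    }
}
```

Either way the text alone does not say which paragraph was selected, which is why an explicit marker per paragraph is needed.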

Example:
docx:
hello, allganize.
how are you?

html:

<div ..><span ..>hello,</span><span .. >allganize</span></div>
<div ..><span ..>how are you?</span></div>

paragraph index, text:
1, hello, allganize
2, how are you?

I tried inserting the paragraph index encoded as invisible characters (such as \u2060 to \u2064), but that broke the layout.
Example HTML:

<div ..><span ..>\u2061</span><span ..>hello,</span><span .. >allganize</span><span ..>\u2061</span></div>
<div ..><span ..>\u2062</span><span ..>how are you?</span><span ..>\u2062</span></div>

I found that the zero-width space (\u200b) does not break the layout.
But it is only a single character, so it cannot describe a paragraph index by itself. So I will try to insert a hyperlink whose display text is a zero-width space and whose link target carries the paragraph index.
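
For comparison, one possible alternative to the hyperlink idea is to spell the index in binary with two zero-width characters. The sketch below is plain Java; the class `ZeroWidthIndexCodec`, the 16-bit width, and the choice of \u200C as the second symbol are all assumptions. In particular, the thread only reports \u200B as layout-safe, so whether \u200C is equally harmless, and whether the HTML export preserves such characters, would have to be tested.

```java
// Hypothetical codec; not part of Aspose.Words.
class ZeroWidthIndexCodec {
    static final char ZERO = '\u200B'; // zero-width space, reported as layout-safe in this thread
    static final char ONE = '\u200C';  // zero-width non-joiner; ASSUMPTION: also layout-safe
    static final int BITS = 16;        // fixed width, so a marker needs no terminator

    // Encodes index as BITS zero-width characters, most significant bit first.
    static String encode(int index) {
        if (index < 0 || index >= (1 << BITS)) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        StringBuilder sb = new StringBuilder(BITS);
        for (int bit = BITS - 1; bit >= 0; bit--) {
            sb.append(((index >> bit) & 1) == 1 ? ONE : ZERO);
        }
        return sb.toString();
    }

    // Decodes a marker produced by encode back into the paragraph index.
    static int decode(String marker) {
        if (marker.length() != BITS) {
            throw new IllegalArgumentException("marker must have " + BITS + " characters");
        }
        int value = 0;
        for (char c : marker.toCharArray()) {
            if (c != ZERO && c != ONE) {
                throw new IllegalArgumentException("unexpected character in marker");
            }
            value = (value << 1) | (c == ONE ? 1 : 0);
        }
        return value;
    }

    public static void main(String[] args) {
        String marker = encode(42);
        System.out.println(marker.length() + " invisible characters decode to " + decode(marker));
    }
}
```

The fixed width keeps decoding trivial, at the cost of 16 extra characters at each marker position.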

Question 1.
I create the hyperlink like this.

DocumentBuilder builder = new DocumentBuilder(doc);
String marker = "\u200B";
builder.moveTo(paragraph);
builder.insertHyperlink(marker, "http://foo.com", false);

But this hyperlink is not rendered in the HTML.

How can I produce HTML like this?
Example HTML:

<div ..><a href="1">\u200b</a><span ..>hello,</span><span .. >allganize</span><a href="1">\u200b</a></div>
<div ..><a href="2">\u200b</a><span ..>how are you?</span><a href="2">\u200b</a></div>

Question 2.
Can I put a hyperlink around a specific paragraph? That would also meet my needs.
Example HTML:

<div ..><a href="1"><span ..>hello,</span><span .. >allganize</span></a></div>
<div ..><a href="2"><span ..>how are you?</span></a></div>

Question 3.
Can I insert a marker while converting to HTML? In this case the marker can be anything that is not displayed.
Example HTML:

<div ..><!-- 1 --><span ..>hello,</span><span .. >allganize</span><!-- 1 --></div>
<div ..><!-- 2 --><span ..>how are you?</span><!-- 2 --></div>

@allganize,

One simple way is to bookmark the entire content of each paragraph in the Word document and then convert it to HTML. Please check whether the following solution is acceptable to you:

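// Load the source document and attach a builder to it.
Document doc = new Document("E:\\Temp\\input.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

// i is the 1-based paragraph index. It also advances for empty paragraphs,
// so the bookmark name "bm" + i matches the paragraph's position in this traversal.
int i = 1;
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.getChildNodes().getCount() > 0) {
        // moveTo(para) places the cursor at the end of the paragraph,
        // so both bookmark nodes are first appended after the existing content.
        builder.moveTo(para);

        BookmarkStart bookmarkStart = builder.startBookmark("bm" + i);
        builder.endBookmark("bm" + i);

        // Move the start node to the front so the bookmark encloses the whole paragraph.
        para.insertBefore(bookmarkStart, para.getFirstChild());
    }
    i++;
}

// Save as HTML with an embedded style sheet and indented markup.
HtmlSaveOptions opts = new HtmlSaveOptions();
opts.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);
opts.setPrettyFormat(true);
doc.save("E:\\Temp\\awjava-20.3.html", opts);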

Thank you for the response.

But it is not working as I expected.
The rendered HTML looks like this:

<div><a name="bm4"></a><span ...> ... </span></div>

But I expected this:

<div><a name="bm4"><span ..>...</span></a></div>

I confirmed that the paragraphs start with BookmarkStart and end with BookmarkEnd.
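
Since the export emits the bookmark as an empty anchor at the paragraph start, one option that leaves the document model untouched is to post-process the saved HTML. The following is a minimal sketch in plain Java. The class `BookmarkAnchorWrapper` is hypothetical, and it assumes the output shape shown above: each paragraph is a single, non-nested <div> whose first child is an empty <a name="bmN"></a>. Markup that deviates from this shape is left unchanged, and a real HTML parser would be more robust than a regular expression.

```java
import java.util.regex.Pattern;

// Hypothetical post-processor; not part of Aspose.Words.
class BookmarkAnchorWrapper {

    // Group 1: opening <div ...> plus optional whitespace.
    // Group 2: bookmark name such as bm4.
    // Group 3: paragraph content up to the first closing </div> (lazy match).
    // Group 4: optional whitespace plus the closing </div>.
    private static final Pattern EMPTY_ANCHOR = Pattern.compile(
            "(<div[^>]*>\\s*)<a name=\"(bm\\d+)\">\\s*</a>(.*?)(\\s*</div>)",
            Pattern.DOTALL);

    // Rewrites <div><a name="bmN"></a>CONTENT</div>
    // into     <div><a name="bmN">CONTENT</a></div>.
    static String wrap(String html) {
        return EMPTY_ANCHOR.matcher(html).replaceAll("$1<a name=\"$2\">$3</a>$4");
    }

    public static void main(String[] args) {
        String html = "<div><a name=\"bm1\"></a><span>hello,</span><span>allganize</span></div>"
                    + "<div><a name=\"bm2\"></a><span>how are you?</span></div>";
        System.out.println(wrap(html));
    }
}
```

Because the anchor name still carries the paragraph index, a highlighter could also skip the rewrite and simply style the parent element of each a[name^="bm"] anchor.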

I achieved it like this. Is this a safe method?

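// Append a hyperlink field at the end of the paragraph.
// This code relies on insertHyperlink adding exactly five nodes there:
// FieldStart, field code run, FieldSeparator, display text run, FieldEnd.
builder.moveTo(paragraph);
builder.insertHyperlink("\u200B", marker, true);
NodeCollection<Node> nc = paragraph.getChildNodes();
int cnt = paragraph.getChildNodes().getCount();
Node fieldStart = nc.get(cnt - 5);
Node hyperLink = nc.get(cnt - 4);      // the run holding the HYPERLINK field code
Node fieldSeparator = nc.get(cnt - 3);
Node run = nc.get(cnt - 2);            // the zero-width display text
// Move the field's opening nodes to the front so the original content becomes the field result.
paragraph.insertBefore(fieldStart, paragraph.getFirstChild());
paragraph.insertAfter(hyperLink, fieldStart);
paragraph.insertAfter(fieldSeparator, hyperLink);
// The placeholder text is no longer needed.
run.remove();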

@allganize,

It would be great if you could ZIP and upload your sample input Word document (covering all known scenarios) together with your expected HTML file showing the desired output, for our reference. We will then investigate the issue further on our end and provide code to achieve this using Aspose.Words. Thanks for your cooperation.