Soft hyphens used as bullet points are converted to `&shy;` in the output HTML and so are not visible

Hi, I’ve come across a rendering issue when converting Word documents to XHTML.

If the Word document contains a list that uses a “soft hyphen” as its custom bullet character, the soft hyphens are converted to the entity `&shy;` in the HTML document. This is semantically correct, but whereas Word displays that character, the browser doesn’t, so the list appears to have no bullet points.

Here’s a minimal reproduction document:
test bullet points (1).docx (13.6 KB)

Changing the bullet point character to a “minus hyphen” provides a consistent rendering between Word and HTML. Nevertheless, there is a rendering bug with “soft hyphen” as bullet point character.

Let me know if you require more details.
Thanks in advance.

@njlgad Thank you for reporting this problem to us. However, I think this is not a bug but a peculiarity of HTML. If you convert your document to HTML using MS Word, you will get the same result: the custom bullets are not visible.
Also, if you roundtrip your HTML document to DOCX, the custom bullet is preserved properly. For example, see the following code:

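using Aspose.Words;
using Aspose.Words.Saving;

// Save the DOCX as HTML.
Document doc = new Document(@"C:\Temp\in.docx");
HtmlSaveOptions opt = new HtmlSaveOptions();
opt.PrettyFormat = true;
doc.Save(@"C:\Temp\out.html", opt);

// Load the generated HTML and save it back to DOCX to check the roundtrip.
Document doc1 = new Document(@"C:\Temp\out.html");
doc1.Save(@"C:\Temp\out.docx");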

As a workaround, you can also set HtmlSaveOptions.ExportListLabels to ExportListLabels.ByHtmlTags. In this case the bullets in HTML will be displayed as standard dots, but at least they will be visible in HTML.
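
The workaround could look roughly like the following. This is a minimal sketch, assuming Aspose.Words for .NET and reusing the placeholder paths from the roundtrip snippet; the only change is the `ExportListLabels` setting.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// Placeholder input path; adjust to your environment.
Document doc = new Document(@"C:\Temp\in.docx");

HtmlSaveOptions opt = new HtmlSaveOptions();
opt.PrettyFormat = true;
// Export list labels through HTML list tags instead of inline text,
// so the browser draws its own bullet in place of the invisible soft hyphen.
opt.ExportListLabels = ExportListLabels.ByHtmlTags;

// Placeholder output path.
doc.Save(@"C:\Temp\out.html", opt);
```

Note that this trades fidelity for visibility: the custom bullet character from the Word document is not kept in the HTML.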

@alexey.noskov,

My reference point, in terms of what the HTML should look like, is the print version generated by Word.

As you can see in the following screenshot, the bullet points are present when printing the document:
Screenshot 2022-03-25 at 12.57.56.png (65.6 KB)

The fact that Word itself fails to generate HTML that renders the document as it should is, as far as I’m concerned, a bug in Word.

I appreciate the workaround, but we’re trying to produce HTML with maximum fidelity to the Word document; replacing hyphens (and potentially all other bullet styles) with dots is not really acceptable.

Let me know what you think.

@njlgad Thank you for the additional information. I have logged this issue as WORDSNET-23635. We will investigate it further and let you know once it is resolved or there is more information.