How to import HTML into a document and match its styles with the document's styles using Java

Hello,
I am trying to find a way to import HTML whose classes map to predefined styles in the Word template that I'm using to create the document. I've found several different behaviours, but none of them is what I'm looking for.

Given HTML such as this:

<div class="boldtext">Some text</div>

When I insert this into a DOCX that defines the style "boldtext", I would like the paragraph to have that style applied. Instead, the paragraph gets "boldtext + Times New Roman, Not Bold, Not Italic, Not Underlined". This is probably because my "boldtext" style in the DOCX has Bold, Italic and Underline enabled, while the HTML does not.

Do you have an example of how to set up classes and styles so that this mapping works? I want to be able to apply (empty) classes in the HTML and control the formatting entirely through the styles in the Word template. If the result uses the Word style but adds direct formatting on top of it based on the HTML defaults, it is not very useful.

Thanks a lot,
Tilman

@tilman

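Please use the DocumentBuilder.InsertHtml(String, Boolean) overload, as shown below, to get the desired output. The example is C#; in Aspose.Words for Java the corresponding method is DocumentBuilder.insertHtml(String, boolean).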

Document doc = new Document(MyDir + "input.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
builder.InsertHtml(File.ReadAllText(MyDir + "input.html"), true);

doc.Save(MyDir + "20.6.docx");
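Since the question asks for Java, here is a minimal sketch of the same steps with Aspose.Words for Java. It assumes the Aspose.Words for Java library is on the classpath; the class name, the helper `insertHtmlFile` and the file names are placeholders chosen for illustration. `Files.readAllBytes` plus a UTF-8 `String` stands in for C#'s `File.ReadAllText`.

```java
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class InsertHtmlExample {

    // Hypothetical helper: loads the DOCX template, reads the HTML file and
    // inserts it at the builder's cursor (a new DocumentBuilder starts at the
    // beginning of the document).
    static Document insertHtmlFile(Path docxPath, Path htmlPath) throws Exception {
        Document doc = new Document(docxPath.toString());
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Java counterpart of File.ReadAllText: read the whole file as UTF-8.
        String html = new String(Files.readAllBytes(htmlPath), StandardCharsets.UTF_8);

        // true = useBuilderFormatting: base the inserted text's formatting on
        // the DocumentBuilder rather than on default HTML formatting.
        builder.insertHtml(html, true);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder directory: first argument, or the working directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        Document doc = insertHtmlFile(dir.resolve("input.docx"), dir.resolve("input.html"));
        doc.save(dir.resolve("output.docx").toString());
    }
}
```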

When the second parameter is false, DocumentBuilder formatting is ignored and the inserted text is formatted according to default HTML formatting. As a result, the text looks the way a browser would render it.

When the second parameter is true, the inserted text is formatted according to the current DocumentBuilder formatting, and it looks as if it had been inserted with Write.
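Building on the behaviour described above, one way to get the template style onto the inserted paragraph is to put the builder on that style before calling insertHtml with true. This is a hedged sketch, not a confirmed recipe: it assumes Aspose.Words for Java is available and that the style (here "boldtext") already exists in the template. The helper `insertHtmlWithStyle` and the file names are hypothetical. Whether the result carries the style without any extra direct formatting is worth checking in the output document.

```java
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;

public class InsertHtmlWithTemplateStyle {

    // Hypothetical helper: sets the builder's paragraph style first, then
    // inserts the HTML with useBuilderFormatting = true so the inserted text
    // takes its formatting from the builder instead of HTML defaults.
    // Assumes styleName is already defined in the document (e.g. the template).
    static void insertHtmlWithStyle(DocumentBuilder builder, String styleName, String html)
            throws Exception {
        builder.getParagraphFormat().setStyleName(styleName);
        builder.insertHtml(html, true);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths; "template.docx" is assumed to define "boldtext".
        Document doc = new Document("template.docx");
        DocumentBuilder builder = new DocumentBuilder(doc);
        insertHtmlWithStyle(builder, "boldtext", "<div class=\"boldtext\">Some text</div>");
        doc.save("output.docx");
    }
}
```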