Convert HTML to DOCX while retaining fonts

Hi

How can I convert HTML to DOCX while retaining the formatting?

When the HTML below is converted using Aspose.Words, all paragraphs end up with Times New Roman as the font.

"<html><head><style> table, td, th { border: 1px solid; } table { width: 100%; border-collapse: collapse;  } td { height: 20px; vertical-align:top; padding: 0px 5px 0px; } </style></head><body><p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p><p><span style=\\"font-family:\\'Courier New\\', Courier, monospace;\\">Lorem Ipsum is simply dummy text of the printing and typesetting industry.</span></p><p><span style=\\"font-family:\\'Courier New\\', Courier, monospace;font-size:medium;\\">Lorem Ipsum is simply dummy text of the printing and typesetting industry.</span></p></body></html>"

The expected result is that the first paragraph uses the DocDefault font, the second uses “Courier New”, and the third uses “Courier New” with a medium font size.

Sample code:

var doc = new Document();
var builder = new DocumentBuilder(doc);
builder.InsertHtml(inputHTML);
using var docxStream = new MemoryStream();
doc.Save(docxStream, SaveFormat.Docx);
return docxStream.ToArray();

@bhargavgaglani07 Unfortunately, I cannot reproduce the problem on my side. I put your HTML string into a file and used the following two code examples for testing. In both cases the last two paragraphs have the Courier New font:

// Example 1: insert the HTML into an empty document using DocumentBuilder.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.InsertHtml(File.ReadAllText(@"C:\Temp\in.html"));
doc.Save(@"C:\temp\out.docx", SaveFormat.Docx);

// Example 2: load the HTML file directly as a document.
Document doc = new Document(@"C:\Temp\in.html");
doc.Save(@"C:\Temp\out.docx");

For convenience, here is your HTML in a more readable form:

<html>
<head>
    <style>
        table, td, th {
            border: 1px solid;
        }

        table {
            width: 100%;
            border-collapse: collapse;
        }

        td {
            height: 20px;
            vertical-align: top;
            padding: 0px 5px 0px;
        }
    </style>
</head>
<body>
    <p>Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
    <p><span style="font-family:'Courier New', Courier, monospace;">Lorem Ipsum is simply dummy text of the printing and typesetting industry.</span></p>
    <p><span style="font-family:'Courier New', Courier, monospace;font-size:medium;">Lorem Ipsum is simply dummy text of the printing and typesetting industry.</span></p>
</body>
</html>

out.docx (7.1 KB)

Hi @alexey.noskov,

Thank you for your response, and apologies for not being clear at first.

The first paragraph gets Times New Roman by default, but it is expected to have no font assigned so that it follows the document default. Also, the third paragraph does not have the proper font size.

Let me know if that clears things up or if more information is required.

Thanks,
Bhargav

@bhargavgaglani07 Times New Roman is the default font in a document created from scratch by Aspose.Words. The HTML you provided does not specify a default font.
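
If a different base font is wanted for text that has no font in the HTML, one option that may help is to set the formatting on the DocumentBuilder and call the InsertHtml overload that takes a useBuilderFormatting flag. The following is only a sketch: the class name HtmlToDocx, the font "Calibri" and the size 11 are placeholder assumptions, and inputHTML stands for the HTML string from the question. The result has not been verified against the expected output in this thread.

```csharp
using System.IO;
using Aspose.Words;

public static class HtmlToDocx
{
    // Hypothetical helper: converts an HTML string to DOCX bytes.
    public static byte[] Convert(string inputHTML)
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Placeholder base formatting (assumption, pick your own values).
        builder.Font.Name = "Calibri";
        builder.Font.Size = 11;

        // true = useBuilderFormatting: the builder's current formatting is
        // used as the base for the inserted HTML instead of HTML defaults.
        builder.InsertHtml(inputHTML, true);

        using (MemoryStream docxStream = new MemoryStream())
        {
            doc.Save(docxStream, SaveFormat.Docx);
            return docxStream.ToArray();
        }
    }
}
```

Spans that declare their own font-family in the HTML, such as the Courier New ones above, should still keep that font.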
Aspose.Words does not support “absolute-size” values such as small, medium, large, etc. When reading the font-size attribute, Aspose.Words expects either a number with units or a plain number.
In your case, however, the font size is correct, since medium means the default font size of the document, which is 12pt in a document created from scratch by Aspose.Words.
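
If the HTML may contain other absolute-size keywords, one possible workaround is to rewrite them to explicit point sizes before passing the HTML to Aspose.Words. This is a minimal sketch in plain C# and is not part of the Aspose.Words API. The class FontSizeKeywords is hypothetical, and the point values are an assumption based on common browser defaults (medium = 16px = 12pt).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FontSizeKeywords
{
    // Assumed keyword-to-point mapping (typical browser defaults).
    private static readonly Dictionary<string, string> PointSizes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["xx-small"] = "6.75pt",
            ["x-small"] = "7.5pt",
            ["small"] = "9.75pt",
            ["medium"] = "12pt",
            ["large"] = "13.5pt",
            ["x-large"] = "18pt",
            ["xx-large"] = "24pt",
        };

    // Matches "font-size: <keyword>"; the lookahead leaves relative
    // keywords such as "smaller" and "larger" untouched.
    private static readonly Regex Pattern = new Regex(
        @"(font-size\s*:\s*)(xx-small|x-small|small|medium|large|x-large|xx-large)(?![\w-])",
        RegexOptions.IgnoreCase);

    // Returns the HTML with absolute-size keywords replaced by point sizes.
    public static string Normalize(string html)
    {
        return Pattern.Replace(
            html,
            m => m.Groups[1].Value + PointSizes[m.Groups[2].Value]);
    }
}
```

With this helper, the call in the question would become builder.InsertHtml(FontSizeKeywords.Normalize(inputHTML)).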