Problem converting to HTML with embedded CSS option

We’re evaluating Aspose.Words for an application that needs to convert .doc and .rtf files to HTML. We’re using the embedded CSS option to minimize the size of the resulting HTML. The problem is that the resulting HTML renders a little differently than it does with the inline CSS option. I’ve attached the MS Word document that illustrates the problem, along with the HTML produced by the embedded option (aspose_embedded.html.txt). Is this a known issue, or is there a way to work around it?
Below is the code we’re using to convert to HTML. The inputContent parameter is the raw bytes of the MS Word document, and CHARSET is set to “UTF-8”.
Thanks.

InputStream is = new BufferedInputStream(new ByteArrayInputStream(inputContent));
Document doc = new Document(is, null, LoadFormat.AUTO, null);
// remove macros
doc.removeMacros();
// remove all images
Node currNode = doc;
while (currNode != null)
{
    Node nextNode = currNode.nextPreOrder(doc);
    if (currNode.getNodeType() == NodeType.SHAPE)
    {
        Shape shape = (Shape) currNode;
        if (shape.canHaveImage())
        {
            shape.remove();
        }
    }
    currNode = nextNode;
}
doc.joinRunsWithSameFormatting();
SaveOptions options = doc.getSaveOptions();
options.setHtmlExportEncoding(Charset.forName(CHARSET));
options.setHtmlExportCssStyleSheetType(CssStyleSheetType.EMBEDDED);
ByteArrayOutputStream os = new ByteArrayOutputStream();
// doc.save("C:/temp/aspose_embedded.html", SaveFormat.HTML);
doc.save(os, SaveFormat.HTML);
String html = os.toString(CHARSET);

Hi

Thanks for your request. I managed to reproduce the problem on my side. You will be notified as soon as it is resolved.
Best regards.

Hello!
I have investigated the issue more deeply. I see two major problems:

  1. The font size of text in tables comes out larger when viewed in a browser. It should be Times New Roman, 10pt, but it comes out as 12pt, a value taken from the browser defaults. To work around this, you can apply an explicit style to these paragraphs rather than leaving them as Normal. At a minimum, you can create an empty style in the document and apply it to the text. This makes no difference in Microsoft Word, since the new style derives from Normal and doesn’t override anything, but in the Aspose.Words HTML export module the font will then get the proper size. Alternatively, you can set the font size as direct font formatting.
  2. Line spacing is calculated as smaller than it should be. For the Name style it should be 24pt but comes out as 12pt. Probably the most suitable workaround is to add the line spacing as direct formatting on the first paragraph.
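
For readers who cannot edit the source documents, the first problem can also be approached from the HTML side. The following is only a minimal sketch of a different technique from the two workarounds above: it post-processes the exported string and does not use the Aspose.Words API at all. The class name, the method name, and the rule text (Times New Roman, 10pt, taken from the sample document discussed here) are assumptions for illustration. It inserts one low-priority CSS rule at the start of the embedded style sheet, so any rule the exporter writes later still takes precedence.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
class BaseFontRuleInjector {

    // Assumed defaults for the sample document; adjust per document.
    static final String BASE_RULE =
            "body, td, th { font-family: 'Times New Roman'; font-size: 10pt; }";

    // Matches the opening tag of the first <style> element.
    private static final Pattern STYLE_OPEN =
            Pattern.compile("<style\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Inserts the rule right after the opening <style> tag.
    // Returns the input unchanged when no embedded style sheet is found.
    static String injectBaseRule(String html, String rule) {
        Matcher m = STYLE_OPEN.matcher(html);
        if (!m.find()) {
            return html;
        }
        int insertAt = m.end();
        return html.substring(0, insertAt) + "\n" + rule + "\n" + html.substring(insertAt);
    }

    public static void main(String[] args) {
        // Stand-in for the string produced by os.toString(CHARSET).
        String html = "<html><head><style type=\"text/css\">p.Name { font-size:14pt; }</style></head>"
                + "<body><table><tr><td><p>cell text</p></td></tr></table></body></html>";
        System.out.println(injectBaseRule(html, BASE_RULE));
    }
}
```

Because the rule uses only element selectors and comes first in the style sheet, class-based rules written by the exporter win wherever both apply. This sketch does not address the line spacing problem in item 2.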

Of course, you can export HTML with inline CSS unless you really need the embedded kind. It’s difficult to predict which documents will need workarounds and which will convert exactly. Please let us know if you discover any other discrepancies.
Regards,

@erik.morsepeopleclic,
The issues you found earlier (filed as WORDSNET-2719) have been fixed in the Aspose.Words for .NET 18.2 and Aspose.Words for Java 18.2 updates.
Please also check the following articles: