We’re evaluating Aspose.Words for use in our application, where we need to convert .doc and .rtf files to HTML. We’re using the embedded CSS option to minimize the size of the resulting HTML. The problem is that the resulting HTML renders a little differently from the output we get with the inline CSS option. I’ve attached an MS Word document that illustrates the problem, along with the HTML produced by the embedded option (aspose_embedded.html.txt). Is this a known issue, or is there a way to work around it?
Below is the code we’re using to convert to HTML. The inputContent parameter holds the raw bytes of the MS Word document, and CHARSET is set to “UTF-8”.
Thanks.
InputStream is = new BufferedInputStream(new ByteArrayInputStream(inputContent));
Document doc = new Document(is, null, LoadFormat.AUTO, null);
// remove macros
doc.removeMacros();
// remove all images: walk the document tree in pre-order
Node currNode = doc;
while (currNode != null)
{
    // fetch the next node first, so the walk can continue after currNode is removed
    Node nextNode = currNode.nextPreOrder(doc);
    if (currNode.getNodeType() == NodeType.SHAPE)
    {
        Shape shape = (Shape) currNode;
        if (shape.canHaveImage())
        {
            shape.remove();
        }
    }
    currNode = nextNode;
}
doc.joinRunsWithSameFormatting();
SaveOptions options = doc.getSaveOptions();
options.setHtmlExportEncoding(Charset.forName(CHARSET));
options.setHtmlExportCssStyleSheetType(CssStyleSheetType.EMBEDDED);
ByteArrayOutputStream os = new ByteArrayOutputStream();
// doc.save("C:/temp/aspose_embedded.html", SaveFormat.HTML);
doc.save(os, SaveFormat.HTML);
String html = os.toString(CHARSET);
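
Since the point of the embedded option is a smaller file, here is a minimal sketch of how the two outputs could be compared by size. It uses only the standard library and assumes both HTML strings are already in hand. The class name, method names, and sample strings are hypothetical stand-ins, not part of Aspose.Words and not our real output.

```java
import java.nio.charset.StandardCharsets;

class HtmlSizeCheck {
    // Size in bytes of the HTML once encoded as UTF-8 (the CHARSET value used above).
    static int utf8Size(String html) {
        return html.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes saved by the embedded output relative to the inline output.
    // A negative result means the embedded output is actually larger.
    static int bytesSaved(String inlineHtml, String embeddedHtml) {
        return utf8Size(inlineHtml) - utf8Size(embeddedHtml);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for the two conversion results.
        String inlineHtml =
            "<p style=\"color:red\">a</p><p style=\"color:red\">b</p>";
        String embeddedHtml =
            "<style>.r{color:red}</style><p class=\"r\">a</p><p class=\"r\">b</p>";

        System.out.println("inline bytes:   " + utf8Size(inlineHtml));
        System.out.println("embedded bytes: " + utf8Size(embeddedHtml));
        System.out.println("bytes saved:    " + bytesSaved(inlineHtml, embeddedHtml));
    }
}
```

Measuring encoded bytes rather than String.length() matters here because non-ASCII characters take more than one byte in UTF-8. On very short documents the embedded style block can outweigh the savings, so the comparison is only meaningful on realistic input.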