HTML to DOC/DOCX - Conversion problem

Goldenking · April 24, 2008, 9:20am

When converting from HTML to DOC/DOCX, a few styles such as height, width, and text-align do not take effect when they are set inside the style attribute. But when they are moved to individual attributes of the node, they do take effect.

Example :

<img style="height:100px; width:100px" src="blablabla" /> does not set the size of the image in the converted DOC/DOCX
But <img height="100px" width="100px" src="blablabla" /> does.

<div style="text-align:center;">...</div> doesn’t work
But <div align="center">...</div> does.

Can you give me a list of the styles that are supported, or not supported, inside the ‘style’ attribute?

Is this an issue with the Aspose converter?
Are there any workarounds?

-Thangaraj

Klepus · April 24, 2008, 10:18am

Hello!
Thank you for your inquiry.
There are some restrictions in the current implementation of HTML import. On div and img nodes, most things are done with separate attributes, while style="xxx" works well for p and span.
We have no full list of the formatting supported in the style attribute, but as a starting point you can refer to this one (taken right from the code):

// paragraph attributes (p tag)
"margin-left",
"margin-right",
"margin-top",
"margin-bottom",
"text-indent",
"text-align",
"page-break-inside",
"page-break-after",
"page-break-before",
"line-height",
"widows",
"orphans",
"writing-mode",
"border-left-style",
"border-left-width",
"border-left-color",
"border-right-style",
"border-right-width",
"border-right-color",
"border-top-style",
"border-top-width",
"border-top-color",
"border-bottom-style",
"border-bottom-width",
"border-bottom-color",
"padding-left",
"padding-right",
"padding-top",
"padding-bottom",
// character attributes (span tag)
"font-family",
"font-size",
"font-weight",
"font-style",
"font-variant",
"text-decoration",
"text-transform",
"letter-spacing",
"vertical-align",
"color",
"background-color",
"display",
"border-style",
"border-width",
"border-color"

You can also do the reverse conversion, from DOC to HTML, using Aspose.Words and see what we output. Many things (but not all) are designed to round-trip. It’s a good idea to run tests on documents with rich formatting.
Some restrictions are also described in this table, where you can find the elements supported in HTML import:
https://releases.aspose.com/words/net
As a workaround, we suggest replacing what is not recognized with what is recognized. This can be done either manually or with regular expressions. Please let us know if you have further questions or need more help.
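To illustrate the regular-expression approach, here is a minimal sketch in Python that preprocesses the HTML before it is handed to the converter. It is not part of Aspose.Words. The function name `move_styles_to_attributes` and the `STYLE_TO_ATTR` mapping are made up for this example, and the mapping covers only the two cases from this thread (height/width on img, text-align on div). It also assumes the style attribute is double-quoted and that attribute values contain no `>` character; for messier markup a real HTML parser would be a safer choice.

```python
import re

# Hypothetical mapping: (tag, CSS property) -> plain HTML attribute.
# Only the cases reported in this thread are covered.
STYLE_TO_ATTR = {
    ("img", "height"): "height",
    ("img", "width"): "width",
    ("div", "text-align"): "align",
}

# Matches an opening <img ...> or <div ...> tag that has a double-quoted
# style attribute. Assumes no ">" appears inside attribute values.
TAG_WITH_STYLE = re.compile(
    r'<(?P<tag>img|div)\b(?P<before>[^>]*?)\sstyle="(?P<style>[^"]*)"(?P<after>[^>]*)>',
    re.IGNORECASE,
)


def _rewrite(match):
    tag = match.group("tag").lower()
    kept, attrs = [], []
    for decl in match.group("style").split(";"):
        if ":" not in decl:
            continue  # skip empty pieces such as the one after a trailing ";"
        prop, value = (part.strip() for part in decl.split(":", 1))
        attr = STYLE_TO_ATTR.get((tag, prop.lower()))
        if attr:
            attrs.append(f'{attr}="{value}"')   # move to a separate attribute
        else:
            kept.append(f"{prop}:{value}")      # leave in the style attribute

    parts = [f"<{match.group('tag')}", match.group("before")]
    if kept:
        parts.append(' style="' + "; ".join(kept) + '"')
    parts.extend(" " + a for a in attrs)
    parts.append(match.group("after") + ">")
    return "".join(parts)


def move_styles_to_attributes(html):
    """Rewrite mapped style declarations on img/div tags as attributes."""
    return TAG_WITH_STYLE.sub(_rewrite, html)


if __name__ == "__main__":
    print(move_styles_to_attributes(
        '<img style="height:100px; width:100px" src="blablabla" />'))
    print(move_styles_to_attributes(
        '<div style="text-align:center;">...</div>'))
```

Declarations that are not in the mapping stay inside the style attribute, and tags other than img and div are left untouched, so p and span styles keep working as before.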
Regards,