I suspect this is due to the messy way Microsoft stores Word document content internally. I have often noticed unnecessary spans in the generated HTML: a span frequently ends mid-word and an identically styled span begins immediately after it. For example:
The text ASPOSE.WORDS, set in red, italic, 12pt Garamond, would be exported as:
<span style="color:#ff0000;font-style:italic;font-size:12pt;font-family:Garamond;">AS</span><span style="color:#ff0000;font-style:italic;font-size:12pt;font-family:Garamond;">POSE.WO</span><span style="color:#ff0000;font-style:italic;font-size:12pt;font-family:Garamond;">RDS</span>
In the example above, two of the three spans are unnecessary, and each one repeats the same long style definition. I cleaned up some documents by hand, and doing so dramatically reduced the size of the generated HTML.
I suggest that Aspose.Words check every "</span><span" boundary and, when the two adjacent spans have exactly the same style, merge them into one. The HTML above would then be much cleaner:
<span style="color:#ff0000;font-style:italic;font-size:12pt;font-family:Garamond;">ASPOSE.WORDS</span>
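Until something like this is built in, the merge can be approximated as a post-processing step on the exported HTML string. The sketch below is a hypothetical workaround, not an Aspose.Words feature: the function name `merge_adjacent_spans` is made up for illustration, and it assumes each span carries only a double-quoted `style` attribute and contains plain text with no nested tags. It uses a regular expression rather than a full HTML parser, so anything outside that shape is simply left untouched.

```python
import re

# Matches a <span> holding plain text (no nested tags) that is immediately
# followed by another <span> whose style attribute is identical (backreference \2).
_ADJACENT_SAME_STYLE = re.compile(
    r'(<span style="([^"]*)">[^<]*)</span><span style="\2">'
)


def merge_adjacent_spans(html: str) -> str:
    """Collapse runs like <span s>AS</span><span s>POSE</span> into one span.

    Hypothetical helper: assumes spans have only a double-quoted style
    attribute and contain plain text; other markup is not modified.
    """
    while True:
        # Removing '</span><span style="...">' joins the two text runs.
        # Each pass merges non-overlapping pairs, so repeat until stable.
        html, count = _ADJACENT_SAME_STYLE.subn(r"\1", html)
        if count == 0:
            return html


if __name__ == "__main__":
    style = "color:#ff0000;font-style:italic;font-size:12pt;font-family:Garamond;"
    raw = (
        f'<span style="{style}">AS</span>'
        f'<span style="{style}">POSE.WO</span>'
        f'<span style="{style}">RDS</span>'
    )
    print(merge_adjacent_spans(raw))
```

Spans with different styles, or with whitespace between `</span>` and `<span`, are deliberately not merged, which keeps the rewrite conservative at the cost of missing some savings.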