Free Support Forum - aspose.com

HTML Output - Formatting and Length

Hi Folks,

I've been developing a solution for converting Word docs into HTML that in turn becomes a Knowledge Document (KD) in CA Service Desk Manager (SDM).

My challenge is the target system (SDM): it cannot support KDs longer than 32K.

I'm looking for a balance between format and size.

Could you suggest some save options for HTML that may be beneficial?

I've noticed that some of the documents I convert contain redundant tags that repeat the same text attributes over and over, even though it's unnecessary.

Thanks, Jeff

Hi Jeff,


Thanks for your inquiry. Please note that Aspose.Words tries to mimic the behavior of MS Word. Aspose.Words provides some additional options when saving a document to HTML. Please read the details of HtmlSaveOptions here:
http://www.aspose.com/docs/display/wordsnet/HtmlSaveOptions+Class

You can call the Document.JoinRunsWithSameFormatting method to reduce the number of runs in the document. Some documents contain adjacent runs with the same formatting. This usually occurs when a document has been heavily edited by hand. You can reduce the document size and speed up further processing by joining these runs.
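
To illustrate why joining runs shrinks the HTML, here is a minimal, self-contained Python sketch of the idea. It is not the Aspose.Words implementation and does not use the Aspose API: the `Run` class, `join_runs` and `to_html` are hypothetical names made up for this example, and it assumes each run is exported as one `<span>` with inline styles.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a Word run: text plus two formatting flags."""
    text: str
    bold: bool = False
    italic: bool = False

    def same_formatting(self, other: "Run") -> bool:
        return (self.bold, self.italic) == (other.bold, other.italic)


def join_runs(runs):
    """Merge adjacent runs whose formatting is identical."""
    merged = []
    for run in runs:
        if merged and merged[-1].same_formatting(run):
            last = merged[-1]
            merged[-1] = Run(last.text + run.text, last.bold, last.italic)
        else:
            merged.append(run)
    return merged


def to_html(runs):
    """Emit one <span> per run, so fewer runs means fewer tags (an assumption of this sketch)."""
    parts = []
    for run in runs:
        styles = []
        if run.bold:
            styles.append("font-weight:bold")
        if run.italic:
            styles.append("font-style:italic")
        css = ";".join(styles)
        attr = f' style="{css}"' if css else ""
        parts.append(f"<span{attr}>{escape(run.text)}</span>")
    return "".join(parts)


if __name__ == "__main__":
    # A word that was edited in two steps ends up split into two bold runs.
    runs = [Run("Hel", bold=True), Run("lo", bold=True), Run(" world")]
    print(len(to_html(runs)), "characters before joining")
    print(len(to_html(join_runs(runs))), "characters after joining")
```

Every pair of adjacent runs that gets merged saves one closing tag and one opening tag with its repeated style attribute, which is the kind of redundancy described in the question.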

Unfortunately, there are currently no other ways to reduce the output HTML size. However, if you need to control how the document is exported to HTML, you can try creating your own HTML converter. You can use the same approach as suggested here:

http://www.aspose.com/docs/display/wordsnet/How+to++extract+content+using+documentvisitor
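
As a rough illustration of the custom-converter idea, the Python sketch below walks a document with a visitor and emits only the bare minimum of markup, then checks the result against the size limit. It is a conceptual sketch only: the document model (a list of paragraphs, each a list of `(text, bold, italic)` tuples), the `MinimalHtmlVisitor` class and the `export` and `fits_limit` helpers are all hypothetical and are not part of Aspose.Words. It also assumes that "32K" means 32,768 bytes of UTF-8, which you should confirm for SDM.

```python
from html import escape

MAX_KD_BYTES = 32 * 1024  # assumption: "32K" means 32,768 bytes of UTF-8


class MinimalHtmlVisitor:
    """Collects a compact HTML string while a document tree is walked."""

    def __init__(self):
        self._parts = []

    def visit_paragraph_start(self):
        self._parts.append("<p>")

    def visit_paragraph_end(self):
        self._parts.append("</p>")

    def visit_run(self, text, bold=False, italic=False):
        # Only bold and italic are kept; every other attribute is dropped on purpose.
        html = escape(text)
        if italic:
            html = f"<i>{html}</i>"
        if bold:
            html = f"<b>{html}</b>"
        self._parts.append(html)

    def html(self):
        return "".join(self._parts)


def export(paragraphs, visitor):
    """Walk the hypothetical model: a list of paragraphs, each a list of (text, bold, italic)."""
    for runs in paragraphs:
        visitor.visit_paragraph_start()
        for text, bold, italic in runs:
            visitor.visit_run(text, bold, italic)
        visitor.visit_paragraph_end()
    return visitor.html()


def fits_limit(html, limit=MAX_KD_BYTES):
    """Return True if the encoded HTML is within the assumed size limit."""
    return len(html.encode("utf-8")) <= limit


if __name__ == "__main__":
    doc = [[("Reset ", False, False), ("now", True, False)], [("a < b", False, True)]]
    out = export(doc, MinimalHtmlVisitor())
    print(out)
    print("fits:", fits_limit(out))
```

The trade-off is the one raised in the question: the visitor decides which formatting is worth its bytes, so you lose fidelity but gain full control over the size of the output.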