Free Support Forum - aspose.com

Output from Word to HTML conversion is very large

Hi,


I am trying to use Aspose.Words to convert a series of Word documents into HTML. My code is very simple: it just creates a Document object and saves it to an HTML output file using the save method, which makes Aspose convert the document appropriately.

The problem I am having is that the output is extremely large: a one-page Word document produced 17,000 characters of HTML. This is mainly because styles are defined repeatedly on each element in the output, even when they are identical.

Is there any way to make Aspose generate smaller output files by referencing style definitions rather than repeating them?

Hi

Thanks for your request. Yes, you can try using embedded or external CSS to achieve this:

http://www.aspose.com/documentation/java-components/aspose.words-for-java/com/aspose/words/htmlsaveoptions.html#CssStyleSheetType
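As a rough illustration of why a style sheet helps, the plain-JDK sketch below (no Aspose classes) contrasts repeating the same inline `style` attribute on every element with declaring each distinct style once in a `<style>` block and referencing it by class name. The `StyledText` type and the generated class names `s0`, `s1`, ... are hypothetical; this is not how Aspose.Words builds its HTML.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: contrasts inline styles with shared CSS classes.
// It does not reproduce Aspose.Words' actual HTML output.
class StyleDedupSketch {

    // Hypothetical piece of text plus the CSS declarations it needs.
    static final class StyledText {
        final String text;
        final String css;

        StyledText(String text, String css) {
            this.text = text;
            this.css = css;
        }
    }

    // Minimal escaping of the characters that are special in HTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Repeats the full declaration on every element (the situation in the question).
    static String toHtmlInline(List<StyledText> runs) {
        StringBuilder sb = new StringBuilder();
        for (StyledText run : runs) {
            sb.append("<span style=\"").append(run.css).append("\">")
              .append(escape(run.text)).append("</span>");
        }
        return sb.toString();
    }

    // Declares each distinct style once and references it by class name.
    static String toHtmlWithClasses(List<StyledText> runs) {
        Map<String, String> classByCss = new LinkedHashMap<>(); // keeps first-seen order
        StringBuilder body = new StringBuilder();
        for (StyledText run : runs) {
            String cls = classByCss.get(run.css);
            if (cls == null) {
                cls = "s" + classByCss.size(); // s0, s1, s2, ...
                classByCss.put(run.css, cls);
            }
            body.append("<span class=\"").append(cls).append("\">")
                .append(escape(run.text)).append("</span>");
        }
        StringBuilder html = new StringBuilder("<style>");
        for (Map.Entry<String, String> e : classByCss.entrySet()) {
            html.append('.').append(e.getValue()).append('{').append(e.getKey()).append('}');
        }
        html.append("</style>").append(body);
        return html.toString();
    }
}
```

With only a few short styles the class-based form can come out longer because of the extra `<style>` block; the saving grows with the number of elements that share the same, longer declarations.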

Also, you can call the joinRunsWithSameFormatting method to reduce the number of runs:

http://www.aspose.com/documentation/java-components/aspose.words-for-java/com/aspose/words/document.html#joinRunsWithSameFormatting()
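To make the idea behind joining runs concrete, here is a plain-Java sketch that merges adjacent runs whose formatting is identical. The `Run` class, with formatting collapsed into a single string key, is a hypothetical stand-in; the real `joinRunsWithSameFormatting()` works on Aspose's own document model and may compare formatting differently. Fewer runs should mean fewer separately styled elements in the HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Conceptual sketch of joining adjacent runs that share formatting.
// Run is a hypothetical stand-in, not com.aspose.words.Run.
class RunJoinSketch {

    static final class Run {
        final String text;
        final String format; // formatting collapsed to one comparable key, e.g. "Arial|12pt|bold"

        Run(String text, String format) {
            this.text = text;
            this.format = format;
        }
    }

    // Appends each run's text to the previous run when their formats are equal;
    // runs with different formatting, even if identical to an earlier one, stay separate.
    static List<Run> joinAdjacentRuns(List<Run> runs) {
        List<Run> joined = new ArrayList<>();
        for (Run run : runs) {
            int last = joined.size() - 1;
            if (last >= 0 && Objects.equals(joined.get(last).format, run.format)) {
                Run prev = joined.get(last);
                joined.set(last, new Run(prev.text + run.text, prev.format));
            } else {
                joined.add(run);
            }
        }
        return joined;
    }
}
```

Only neighbouring runs are merged, so "Hel" + "lo " become one run, while a later run with the same format but separated by differently formatted text stays on its own.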

Hope this helps.

Best regards,

Hi Alexey,

Thanks for the quick reply. This does look useful.

I am using Aspose.Words jdk 15. Am I correct in thinking that the HtmlSaveOptions class, which contains the CSS options you described, is not part of this version? Would I have to upgrade to get it?

Thanks,
Scott

Hi Scott,

Thanks for your request. jdk 1.5 is not a version of the Aspose.Words library but the version of the JDK it is built for. To check the version of Aspose.Words for Java, unzip Aspose.Words.jdkXX.jar and open the META-INF\MANIFEST.MF file in Notepad. You will see something like the following:

Manifest-Version: 1.0
Specification-Title: Aspose.Words for Java
Implementation-Title: Aspose.Words for Java
Specification-Version: 4.0.0.0
Implementation-Version: 4.0.0.0
Specification-Vendor: Aspose Pty Ltd
Implementation-Vendor: Aspose Pty Ltd
Copyright: Copyright 2003-2009 Aspose Pty Ltd

HtmlSaveOptions was added in version 10.0.0 of Aspose.Words.
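If you would rather not unzip the JAR by hand, the JDK's `java.util.jar` API can read the same manifest attribute. The sketch below reads `Implementation-Version` and compares it numerically with 10.0.0, the release mentioned above for HtmlSaveOptions. The class and method names are hypothetical, and it assumes the version is a plain dotted list of integers such as `4.0.0.0`.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: reads the library version from a JAR's manifest.
class ManifestVersionCheck {

    // Returns Implementation-Version from the manifest's main section, or null if absent.
    static String implementationVersion(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest mf = jar.getManifest();
            if (mf == null) {
                return null;
            }
            return mf.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        }
    }

    // Compares dotted versions numerically, so "4.0.0.0" sorts before "10.0.0".
    // Missing trailing parts count as 0; non-numeric parts would throw NumberFormatException.
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Assumption taken from the reply above: HtmlSaveOptions exists from 10.0.0 on.
    static boolean supportsHtmlSaveOptions(String version) {
        return compareVersions(version, "10.0.0") >= 0;
    }
}
```

For a JAR whose manifest matches the listing above, `implementationVersion` would return `"4.0.0.0"`, and `supportsHtmlSaveOptions("4.0.0.0")` is false. A numeric comparison matters here: comparing the strings directly would put "10.0.0" before "4.0.0.0".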

Best regards,

I see, thanks Alexey.

I am using an earlier version of Aspose.Words, so I guess I will not have access to that feature.

I tried it with a trial version of the newest Aspose.Words release. Using the HtmlSaveOptions embedded CSS mode did reduce the size of the document, but not by much: there were still many repeated style definitions, which kept the document quite large (around 15,000 characters, compared with 17,000 before).

Hi Scott,

Thanks for your request. Unfortunately, there is no way to further reduce the size of the generated HTML.

However, if you need to control how the document is exported to HTML, you can try creating your own HTML converter. You can use the approach suggested here:

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/howto-extract-content-using-documentvisitor.html

This way, you can export just simple HTML.
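A custom converter along these lines walks the document tree with a visitor and writes only the tags it needs. The sketch below shows the pattern in plain Java; `Para`, `TextRun`, and `Visitor` are hypothetical stand-ins loosely modelled on the visitor approach in the linked article, not Aspose's `DocumentVisitor` API, and only bold formatting is carried over.

```java
import java.util.ArrayList;
import java.util.List;

// Visitor-pattern sketch of a "simple HTML" exporter over a hypothetical document model.
class SimpleHtmlVisitorSketch {

    interface Node {
        void accept(Visitor v);
    }

    // Hypothetical run: text plus a single formatting flag.
    static final class TextRun implements Node {
        final String text;
        final boolean bold;

        TextRun(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }

        public void accept(Visitor v) {
            v.visitRun(this);
        }
    }

    // Hypothetical paragraph: a list of runs, visited start, children, end.
    static final class Para implements Node {
        final List<TextRun> runs = new ArrayList<>();

        Para add(String text, boolean bold) {
            runs.add(new TextRun(text, bold));
            return this;
        }

        public void accept(Visitor v) {
            v.visitParagraphStart(this);
            for (TextRun r : runs) {
                r.accept(v);
            }
            v.visitParagraphEnd(this);
        }
    }

    interface Visitor {
        void visitParagraphStart(Para p);
        void visitRun(TextRun r);
        void visitParagraphEnd(Para p);
    }

    // Emits only <p> and <b> tags: no style attributes at all.
    static final class SimpleHtmlWriter implements Visitor {
        private final StringBuilder out = new StringBuilder();

        public void visitParagraphStart(Para p) { out.append("<p>"); }

        public void visitRun(TextRun r) {
            String text = escape(r.text);
            out.append(r.bold ? "<b>" + text + "</b>" : text);
        }

        public void visitParagraphEnd(Para p) { out.append("</p>\n"); }

        String html() { return out.toString(); }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String export(List<Para> paragraphs) {
        SimpleHtmlWriter writer = new SimpleHtmlWriter();
        for (Para p : paragraphs) {
            p.accept(writer);
        }
        return writer.html();
    }
}
```

Because the writer decides which formatting to keep, the trade-off is explicit: the output stays small, but anything the visitor ignores (fonts, colours, spacing) is lost.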

Best regards,