DOCX to HTML conversion does not look good

We have a document that doesn’t look good after conversion to HTML. aspose-evaluation.zip (123.7 KB)

Is there a way to handle this case?


This Topic is created by imran.rafique using the Email to Topic plugin.

@guo.maleo

Thanks for your inquiry. Please note that Aspose.Words mimics MS Word behavior: when we convert your source document to HTML using MS Word, it produces the same result. However, you may try the HtmlFixed file format, which is HTML built from absolutely positioned elements. Please check the following sample code snippet.

AW_179_fixed.zip (104.1 KB)

Document document = new Document("EN - LAMP STACK.docx");
document.save("AW_179_fixed.html", SaveFormat.HTML_FIXED);

@tilal.ahmad

I just tried the HTML_FIXED format, but the problem is that we convert a Word stream to an HTML stream in memory, so we had some other options specified:

    WORD2HTML_OPTS.setExportRoundtripInformation(false);
    WORD2HTML_OPTS.setExportImagesAsBase64(true);
    WORD2HTML_OPTS.setPrettyFormat(false);
    WORD2HTML_OPTS.setAllowNegativeIndent(true);
    //Use embedded CSS 
    WORD2HTML_OPTS.setCssStyleSheetType(1);
    WORD2HTML_OPTS.setUseAntiAliasing(false);
    
    WORD2HTML_OPTS.setMemoryOptimization(true);

    //When set to true, exports drop-down form fields as normal text. When false, exports drop-down form fields as SELECT element in HTML.
    WORD2HTML_OPTS.setExportDropDownFormFieldAsText(true);
    WORD2HTML_OPTS.setExportTextInputFormFieldAsText(true);
    
    //Use 1 for HTML 5
    WORD2HTML_OPTS.setHtmlVersion(1);

It looks like it doesn’t allow me to specify the format.

@guo.maleo

Thanks for your feedback. Please note that HtmlFixed is a different file format from HTML: it is a fixed-page format that uses absolutely positioned elements, so it has its own set of properties, distinct from those of HtmlSaveOptions. Please check the API reference for the complete list of HtmlFixedSaveOptions properties.
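
For the in-memory case described above, a minimal sketch might look like the following. It assumes Aspose.Words for Java is on the classpath; the class name InMemoryHtmlFixed and the method convert are made up for illustration. Every resource is embedded so that the whole result fits into a single output stream. As far as I can tell, several of the HtmlSaveOptions settings listed earlier (round-trip information, CSS style sheet type, HTML version, form field export) have no counterpart in HtmlFixedSaveOptions, so they are simply left out here.

```java
import com.aspose.words.Document;
import com.aspose.words.HtmlFixedSaveOptions;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;

// Hypothetical helper, not part of the Aspose API.
final class InMemoryHtmlFixed {

    // Converts a Word stream to one self-contained fixed-layout HTML byte array.
    static byte[] convert(InputStream wordStream) throws Exception {
        Document document = new Document(wordStream);

        // Creating HtmlFixedSaveOptions selects the HtmlFixed format;
        // no separate "format" setter is needed.
        HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();

        // Embed all resources, since there is no output folder when saving to a stream.
        options.setExportEmbeddedCss(true);
        options.setExportEmbeddedFonts(true);
        options.setExportEmbeddedImages(true);
        options.setExportEmbeddedSvg(true);

        // Settings carried over from the original HtmlSaveOptions configuration.
        options.setPrettyFormat(false);
        options.setMemoryOptimization(true);

        ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
        document.save(htmlStream, options);
        return htmlStream.toByteArray();
    }
}
```

Embedding everything keeps the stream self-contained, at the cost of a larger output.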

Thanks for your answer.
I see the output looks better, but the file size increased quite a lot because SVG is being used, so we need to find a balance. Is there a way to check whether a document needs to be converted to HTML_FIXED or not?

We observed that performance is good on Windows but not good enough on Linux. Is there any performance difference between Windows and Linux, and if so, how can we optimize the performance on Linux?

@guo.maleo

Thanks for your feedback. To save images as raster images, set HtmlFixedSaveOptions.ExportEmbeddedSvg to false and use ResourceSavingCallback for image processing. Furthermore, I am afraid there is no option available to decide whether a document needs to be rendered as HtmlFixed or HTML.
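
A rough sketch of how these two members could be wired together when saving to a stream, assuming Aspose.Words for Java is on the classpath. The class names ExternalResourceCollector and HtmlFixedWithCallback, and the "resources/" URI prefix, are invented for illustration. The callback here only collects each external resource in memory; whatever image processing is needed would go inside it and is not shown.

```java
import com.aspose.words.Document;
import com.aspose.words.HtmlFixedSaveOptions;
import com.aspose.words.IResourceSavingCallback;
import com.aspose.words.ResourceSavingArgs;

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical callback: keeps every external resource in memory, keyed by file name.
final class ExternalResourceCollector implements IResourceSavingCallback {
    final Map<String, ByteArrayOutputStream> resources = new LinkedHashMap<>();

    @Override
    public void resourceSaving(ResourceSavingArgs args) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        resources.put(args.getResourceFileName(), buffer);

        // Redirect the resource into our buffer instead of a file on disk.
        args.setResourceStream(buffer);
        args.setKeepResourceStreamOpen(true);

        // Assumed URI scheme: the HTML will reference the resource under this path.
        args.setResourceFileUri("resources/" + args.getResourceFileName());
    }
}

// Hypothetical helper, not part of the Aspose API.
final class HtmlFixedWithCallback {

    static byte[] convert(InputStream wordStream, ExternalResourceCollector collector)
            throws Exception {
        Document document = new Document(wordStream);

        HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
        // Do not embed SVG in the HTML; external resources go through the callback.
        options.setExportEmbeddedSvg(false);
        options.setResourceSavingCallback(collector);

        ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
        document.save(htmlStream, options);
        return htmlStream.toByteArray();
    }
}
```

After convert returns, collector.resources holds the bytes of each external resource, which the caller would need to serve under the same URIs the HTML refers to.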