Convert .docx to .html using Java Code | Preserve Alignment of Tables | Alternate 'HTML Fixed' File Format

@tahir.manzoor

I am trying to convert a .docx file to .html. The conversion works, but the table alignment in the output is wrong.
The input file is “Sample 0011 (4).docx”.
The output file is “example.html”.
The code is in “Test.java”.
My expected output is HTML that looks the same as the source document.

Test.zip (138.6 KB)
image.png (48.1 KB)

@rabinintig,

Aspose.Words tries to mimic the behavior of MS Word; we have converted your ‘Sample 0011 (4).docx’ document to HTML format by using Aspose.Words 20.2 and MS Word 2019 and attached them here for your reference:

You can observe that problematic tables are exported in HTML in the same way by both Aspose.Words and MS Word 2019.

Alternatively, you can try saving to HTML FIXED format by using the following code:

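import com.aspose.words.Document;
import com.aspose.words.HtmlFixedSaveOptions;

// Load the DOCX and save it as fixed-page HTML instead of flow HTML.
Document doc = new Document("E:\\Temp\\TEST\\Sample 0011 (4).docx");
HtmlFixedSaveOptions opts = new HtmlFixedSaveOptions();
doc.save("E:\\temp\\TEST\\awjava-20.1.html", opts);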

Converted HTML FIXED file is also attached here for your reference:

@awais.hafeez
Thanks.
However, I am having trouble setting some properties. Previously I used HtmlSaveOptions, but now I am trying to set the properties below on HtmlFixedSaveOptions and I am not able to.
Can you please help me set the following properties on HtmlFixedSaveOptions?
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.HTML);
options.setExportTextInputFormFieldAsText(true);
options.setImagesFolder(imagesDir.getPath());
options.setExportXhtmlTransitional(true);

@rabinintig,

Instead of ImagesFolder, you can use the ResourcesFolder property of HtmlFixedSaveOptions. We have also logged the following two issues in our issue tracking system:

WORDSNET-20003: Add ExportXhtmlTransitional to HtmlFixedSaveOptions
WORDSNET-20004: Add property ExportTextInputFormFieldAsText to HtmlFixedSaveOptions

Your thread has also been linked to these issues, and you will be notified as soon as the required options become available in the HtmlFixedSaveOptions class. Sorry for the inconvenience.
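
As a rough sketch of the ResourcesFolder suggestion above, the snippet below saves to HTML Fixed and sends external resources to a folder, in place of the earlier setImagesFolder call. The file names, folder name, and class name are placeholders chosen for illustration, not taken from Test.java.

```java
import com.aspose.words.Document;
import com.aspose.words.HtmlFixedSaveOptions;

public class FixedHtmlWithResources {
    public static void main(String[] args) throws Exception {
        // Placeholder input path; replace with the real document.
        Document doc = new Document("Sample 0011 (4).docx");

        HtmlFixedSaveOptions opts = new HtmlFixedSaveOptions();
        // Takes the place of HtmlSaveOptions.setImagesFolder(...):
        // external resources are written to this folder (placeholder name).
        opts.setResourcesFolder("example_resources");

        // Placeholder output path.
        doc.save("example.html", opts);
    }
}
```

The other two properties from the HtmlSaveOptions code are left out on purpose, since HtmlFixedSaveOptions does not offer them at this point.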

@rabinintig,

Please note that ExportXhtmlTransitional=true writes the XHTML Transitional DOCTYPE to generated HTML documents. However, fixed-page HTML documents use the HTML 5 syntax, and the only valid DOCTYPE declaration for them is <!doctype html>, which Aspose.Words always generates. We cannot write the XHTML Transitional DOCTYPE to HTML 5 documents.
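
If it helps to confirm which DOCTYPE a generated file actually starts with, a small standalone check along these lines could be used. It is only a sketch in plain Java with no Aspose.Words calls; the class and method names (DoctypeCheck, classify) are hypothetical, and it recognizes only the two declarations discussed here.

```java
import java.util.Locale;

public class DoctypeCheck {
    /** Returns "html5", "xhtml-transitional", or "other" for the given markup. */
    public static String classify(String html) {
        String head = html.trim().toLowerCase(Locale.ROOT);

        // XHTML files may begin with an XML prolog; skip it if present.
        if (head.startsWith("<?xml")) {
            int prologEnd = head.indexOf("?>");
            if (prologEnd < 0) {
                return "other";
            }
            head = head.substring(prologEnd + 2).trim();
        }

        if (!head.startsWith("<!doctype")) {
            return "other";
        }
        int end = head.indexOf('>');
        if (end < 0) {
            return "other";
        }

        // Collapse whitespace so line breaks inside the declaration do not matter.
        String decl = head.substring(0, end + 1).replaceAll("\\s+", " ");
        if (decl.equals("<!doctype html>")) {
            return "html5";
        }
        if (decl.contains("-//w3c//dtd xhtml 1.0 transitional//en")) {
            return "xhtml-transitional";
        }
        return "other";
    }

    public static void main(String[] args) {
        // Sample inputs written inline; in practice, pass the text of the saved file.
        System.out.println(classify("<!doctype html>\n<html></html>"));
        System.out.println(classify(
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\"><html></html>"));
    }
}
```

Passing the contents of the flow HTML output and the HTML Fixed output to classify would show whether the two files declare different DOCTYPEs.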

Please tell us, why do you need this property and what should be its effect on fixed-page HTML documents?

@awais.hafeez
Hi,
Can you please help me solve my problem using the existing code I shared in “Test.java”?

@rabinintig,

As mentioned in my previous post, Aspose.Words tries to mimic the behavior of MS Word. It is not always guaranteed that the Aspose.Words and MS Word generated output HTML files will look exactly the same in web browsers as the input Word documents. This is because of file format differences between Word and HTML (and some other limitations).

Can you please provide your expected HTML file showing the desired output here for our reference?

@rabinintig,

Regarding WORDSNET-20004, we have completed our analysis of this issue and decided to close it as “Won’t fix”. When saving in the HtmlFixed format, text input form fields are written as text by default (this is controlled by HtmlFixedSaveOptions.ExportFormFields, which is set to false by default).

The issues you have found earlier (filed as WORDSNET-20004) have been fixed in this Aspose.Words for Java 24.4 update.