Cell Width is Changed after HTML to DOCX Conversion using Java

Hello,

We are seeing an issue where HTML containing two tables with the same width settings renders incorrectly when imported into a Word document. Although both tables specify identical column widths, the columns become misaligned in the Word document.

This behavior can be reproduced with the latest Aspose.Words for Java version 20.12, the attached MisalignedTables.html file, and the following Java code:

final String htmlSrc = [PATH] + "MisalignedTables.html";

// create a new Document and DocumentBuilder
Document wdDoc = new Document();
DocumentBuilder builder = new DocumentBuilder(wdDoc);
builder.moveToDocumentEnd();

// load the HTML source and insert it into the doc
String html = new String(Files.readAllBytes(Paths.get(htmlSrc)), "UTF-8");
builder.insertHtml(html, false);

final String newDoc = htmlSrc.replace(".html", ".docx");
Files.deleteIfExists(Paths.get(newDoc));
wdDoc.save(newDoc);
System.out.println("Saved Word Document:  " + newDoc);

Running the above code should produce a new MisalignedTables.docx file similar to the one attached.

Key Observations:

  • Opening the MisalignedTables.html file in a browser (such as Firefox or Edge) renders the two tables as expected, with the columns perfectly aligned.
  • Opening the generated DOCX in MS Word shows the two tables having the same overall width (as expected), but the columns are misaligned. This becomes more obvious when the “show gridlines” option is turned on in MS Word.

Environment Details:

  • Aspose.Words for Java 20.12
  • Java version 1.8.0_211
  • Windows 10 OS (but also reproducible under Linux).

File description from the MisalignedTables.zip (13.4 KB) attachment:

  • MisalignedTables.html: Source HTML file used by the code above.
  • MisalignedTables.docx: DOCX file generated by the code above on our environment.
  • MisalignedTables.png: PNG screenshot of the HTML rendered in Firefox. This image represents the expected rendition in Word after the HTML is imported.

Thank you!

@oraspose

The input HTML has seven columns instead of two. The output Word document looks correct. Please note that Aspose.Words mimics the behavior of MS Word. If you convert your HTML to DOCX using MS Word, the output will not be correct. However, output generated by Aspose.Words for Java looks good.

Hi Tahir,

Thanks for looking at this issue. There may be a misunderstanding. To be clear, there are two tables and each table contains seven columns. If you examine the HTML, the column widths of the two tables are identical (e.g. column 1 has the same width in both tables, and so on). Even though there is a paragraph separating the two tables, we expect the columns to align when rendered in Word. However, in the generated DOCX the column widths of the two tables differ slightly.
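For anyone who wants to confirm the width difference without relying on gridlines in MS Word, here is a minimal sketch that reads the table grids straight out of the DOCX package using only the JDK. It assumes the file stores its main part at word/document.xml and that every w:gridCol carries an integer w:w attribute (a width in twentieths of a point), which is the usual layout for transitional DOCX files. The class and method names are made up for illustration and are not part of any Aspose API.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Aspose.Words.
class TableGridWidths {
    // WordprocessingML main namespace used by w:tblGrid and w:gridCol.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns one list of w:gridCol widths per w:tblGrid found in the XML stream. */
    static List<List<Integer>> gridWidths(InputStream documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the *NS lookups below
        Document dom = factory.newDocumentBuilder().parse(documentXml);

        List<List<Integer>> result = new ArrayList<>();
        NodeList grids = dom.getElementsByTagNameNS(W_NS, "tblGrid");
        for (int i = 0; i < grids.getLength(); i++) {
            Element grid = (Element) grids.item(i);
            NodeList cols = grid.getElementsByTagNameNS(W_NS, "gridCol");
            List<Integer> widths = new ArrayList<>();
            for (int j = 0; j < cols.getLength(); j++) {
                Element col = (Element) cols.item(j);
                // Assumption: w:w is present and is a plain integer.
                widths.add(Integer.parseInt(col.getAttributeNS(W_NS, "w")));
            }
            result.add(widths);
        }
        return result;
    }

    /** Convenience wrapper: opens the DOCX as a zip and reads word/document.xml. */
    static List<List<Integer>> gridWidthsOfDocx(String docxPath) throws Exception {
        // Assumption: the entry exists; a missing entry fails with an exception.
        try (ZipFile zip = new ZipFile(docxPath);
             InputStream in = zip.getInputStream(zip.getEntry("word/document.xml"))) {
            return gridWidths(in);
        }
    }
}
```

Calling TableGridWidths.gridWidthsOfDocx on the generated MisalignedTables.docx and printing the result should give one list per table; if the two lists are not equal, the column grids really do differ in the file and not just on screen.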

We certainly agree that output generated by Aspose Words looks good, but we are hoping the column widths are true to what is specified within the HTML.

Thanks again.

@oraspose

We have logged this problem in our issue tracking system as WORDSNET-21563. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

@oraspose

We have closed the WORDSNET-21563 issue as ‘Not a Bug’. Please use the following code example to get the desired output.

Aspose.Words.Document doc = new Aspose.Words.Document(MyDir + "MisalignedTables.html");
foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
{
    table.AllowAutoFit = false;
}
doc.Save(MyDir + "21.5.0.docx");
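Since the original question was about Java and the snippet above is C#, here is a rough Java translation of the same idea: load the HTML directly into a Document, turn off auto-fit on every table, and save as DOCX. It is only a sketch and has not been run against the attached file. It requires the Aspose.Words for Java library on the classpath; the class name and the use of a command-line argument for the input path are assumptions made for illustration.

```java
import com.aspose.words.Document;
import com.aspose.words.NodeType;
import com.aspose.words.Table;

// Hypothetical class name; requires the Aspose.Words for Java jar.
public class DisableTableAutoFit {
    public static void main(String[] args) throws Exception {
        // Assumption: the first argument is the path to MisalignedTables.html.
        String htmlSrc = args[0];

        // Load the HTML file directly instead of going through insertHtml.
        Document doc = new Document(htmlSrc);

        // Switch off auto-fit on every table, including nested ones (isDeep = true),
        // so that Word keeps the column widths stored in the document.
        for (Table table : (Iterable<Table>) doc.getChildNodes(NodeType.TABLE, true)) {
            table.setAllowAutoFit(false);
        }

        doc.save(htmlSrc.replace(".html", ".docx"));
    }
}
```

If the document has to be built with DocumentBuilder.insertHtml as in the original post, the same loop could presumably be run on wdDoc after the insertHtml call and before save.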