HTML to DOCX Conversion | Avoid Table Contents from Splitting across Multiple Tables | C# .NET

Hi,

I am using Aspose.Words 20.11.0 (licensed version) to convert HTML to Word. In the resulting Word (.docx) file, the table is split into multiple nested tables.

For your reference, I have attached both files (HTML and DOCX).

Please let me know how to fix this.

Thanks

Output.docx (10.6 KB)
Input.zip (2.5 KB)

@HassanNorthbay,

We tested the scenario and reproduced the same problem on our end. We have logged it in our issue tracking system with ID WORDSNET-22740. We will look further into the details of this problem and keep you updated here on the status of the fix. We apologize for the inconvenience.

@awais.hafeez

Any updates on the WORDSNET-22740 issue?

It has been a while since this issue was reported.

@HassanNorthbay

Your issue has been scheduled for the December 2021 release. Please note that this ETA is not final at the moment. We will inform you via this forum thread once the issue is resolved.

Hi,
Is there any update on this? Has the issue been resolved in the December ’21 release?

@kainat123 Unfortunately the issue is not resolved yet. The estimate was shifted and currently the issue is planned to be resolved in the 22.2 February release of Aspose.Words.

Hi,
Has this issue been resolved in the Feb release?

@kainat123 Unfortunately, the fix was not included in the most recent 22.2 version of Aspose.Words. The estimate was shifted again, and the issue is currently planned to be resolved in the 22.3 March release of Aspose.Words.
The responsible developer assured me that we will be able to deliver the fix before the next release if everything goes smoothly. Once again, please accept our apologies for the inconvenience.

Hi,
Has this issue been resolved? If not, please share the estimated date.

@kainat123 Unfortunately, the issue has not been fixed in the 22.3 version of Aspose.Words. I have asked the responsible developer to take a look at it shortly.

Hi,
Do you have any updates regarding the above-mentioned issue?

@kainat123 The issue is already resolved in the current 22.4 version of Aspose.Words. You should add the following code to preserve DIVs in the document, like MS Word does:

HtmlLoadOptions loadOptions = new HtmlLoadOptions();
// Enable the new import mode.
loadOptions.BlockImportMode = BlockImportMode.Preserve;
Document doc = new Document("Input.html", loadOptions);
doc.Save("Output.docx");

Hi,
I have upgraded the version and added the above-mentioned code, but the issue is still not resolved. Would you please check?

@kainat123 I have checked, and the output document looks correct on my side. Could you please attach your output and expected output documents? We will check and provide you with more information.

Hi, I have shared the HTML file along with screenshots of the outputs. Please take a look.

Html File With Output.zip (329.3 KB)

@kainat123 Unfortunately, I still cannot reproduce the problem.

HtmlLoadOptions loadOptions = new HtmlLoadOptions();
// Enable the new import mode.
loadOptions.BlockImportMode = BlockImportMode.Preserve;
Document doc = new Document(@"C:\Temp\in.html", loadOptions);
doc.Save(@"C:\Temp\out.docx");

Please make sure load options are passed into the Document constructor.

I am setting the encoding in code as shown below:
htmlLoadOptions.Encoding = Encoding.UTF8;

The provided solution doesn’t work properly with the UTF-8 encoding. Can you please check?

@kainat123 Still cannot reproduce the problem on my side:

HtmlLoadOptions loadOptions = new HtmlLoadOptions();
// Enable the new import mode.
loadOptions.BlockImportMode = BlockImportMode.Preserve;
loadOptions.Encoding = Encoding.UTF8;
Document doc = new Document(@"C:\Temp\in.html", loadOptions);
doc.Save(@"C:\Temp\out.docx");

Borders are applied properly.

The borders are fine, but the bullets don’t appear correctly after applying the above-mentioned code. A screenshot is attached.

bullets issue.PNG (26.9 KB)

@kainat123 The problem occurs because the encoding is specified incorrectly. If you let Aspose.Words detect the encoding instead of setting it explicitly, the bullets are converted properly. Aspose.Words detects the encoding of your file as 1252 Western European (Windows).
You can check which encoding Aspose.Words detects using FileFormatUtil:

FileFormatInfo info = FileFormatUtil.DetectFileFormat(@"C:\Temp\in.html");
Console.WriteLine(info.Encoding.CodePage);
Console.WriteLine(info.Encoding.EncodingName);
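
To illustrate why forcing UTF-8 on a Windows-1252 file can break bullets, here is a minimal sketch that uses only the .NET base class library (no Aspose.Words calls). It assumes .NET 5 or later, and it assumes the bullet is stored as the single byte 0x95 (the Windows-1252 bullet character); the sample bytes are hypothetical and are not taken from the attached file.

```csharp
using System;
using System.Text;

class EncodingMismatchDemo
{
    static void Main()
    {
        // On .NET Core / .NET 5+, code page 1252 is only available
        // after registering the code pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Hypothetical sample: "• Item" as it might be stored in a Windows-1252 file.
        byte[] raw = { 0x95, 0x20, 0x49, 0x74, 0x65, 0x6D };

        // Decoding with the correct code page yields the bullet U+2022.
        string as1252 = Encoding.GetEncoding(1252).GetString(raw);

        // Decoding the same bytes as UTF-8 fails for 0x95 (a stray continuation byte),
        // so the decoder substitutes the replacement character U+FFFD.
        string asUtf8 = Encoding.UTF8.GetString(raw);

        Console.WriteLine(as1252[0] == '\u2022'); // True
        Console.WriteLine(asUtf8[0] == '\uFFFD'); // True
    }
}
```

If your file behaves like this sample, removing the explicit Encoding assignment, or setting it to the encoding reported by FileFormatUtil, should avoid the mismatch.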