HTML to Docx error on tables

Hi

I was trying to convert an HTML file to DOCX with Aspose.Words 22.5 and received the following error:

Aspose.Words.FileCorruptedException -> The document appears to be corrupted and cannot be loaded.
    inner exception -> Object reference not set to an instance of an object.
        at  ​ .(Paragraph ,     , Boolean )
        at  ​ .(Table )
        at    . ( ​  )
        at    .( ​  , Boolean ,     )
        at    . ( ​  , Boolean )
        at    .( ​  , Boolean )
        at    .(Stream , Encoding , DocumentBuilder )
        at  ​ .     ()
        at Aspose.Words.Document.(Stream , LoadOptions )

After some testing, I found that the issue occurs with consecutive tables that have the following structure (the second table is inside a <div></div>):

<body style="font: 10pt Times New Roman, Times, Serif">
    <div>
        <table>
            <tr>
                <td>a</td>
            </tr>
        </table>
        <div>
            <table>
                <tr>
                    <td>b</td>
                </tr>
            </table>
        </div>
    </div>
</body>

I tested the following scenarios:

  • If I remove the inner div (body/div/div), it works.
  • If I remove the content of the second table, it works.
  • If I separate the tables with an empty div (<div></div>), it works.
  • If I separate the tables with an empty paragraph (<span>&#xa0;</span>), it works.

Thanks,
Lisandro

@Lisandro.Ronconi I was unable to reproduce the issue. Please ZIP and attach the source code and source document here, and we will check the issue and provide you with more information.

Thanks @Vadim.Saltykov for your response.
Yes, sure. I have attached a sample project here:

Test.Aspose.Words.zip (5.1 KB)

Thank you

@Lisandro.Ronconi Thank you for additional information. I have managed to reproduce the problem on my side and logged it as WORDSNET-24029. We will keep you informed and let you know once it is resolved.
The problem is caused by the following option set in your code:

htmlLoadOptions.BlockImportMode = BlockImportMode.Preserve;
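For anyone trying to reproduce this, a minimal sketch combining the HTML above with that load option might look like the following. It is an assumption that the attached project does something similar: the class name `Repro`, the helper `LoadHtml` and the output path `output.docx` are hypothetical, and the HTML is fed through an in-memory stream to match the `Document(Stream, LoadOptions)` frame in the stack trace. On 22.5 the `Document` constructor would be expected to throw the `FileCorruptedException` shown at the top of the thread; on 22.11 or later it should load normally.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;

public static class Repro
{
    // HTML from the original report: a table followed by a second table
    // that is wrapped in its own nested <div>.
    public const string SampleHtml =
        "<body style=\"font: 10pt Times New Roman, Times, Serif\">" +
        "<div><table><tr><td>a</td></tr></table>" +
        "<div><table><tr><td>b</td></tr></table></div></div>" +
        "</body>";

    // Hypothetical helper: loads the HTML with DIV preservation enabled.
    public static Document LoadHtml(string html)
    {
        var loadOptions = new HtmlLoadOptions
        {
            // Keeps <div> elements as separate blocks; this is the setting
            // that triggered WORDSNET-24029 in Aspose.Words 22.5.
            BlockImportMode = BlockImportMode.Preserve
        };

        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(html));
        // On 22.5 this call threw FileCorruptedException for the HTML above.
        return new Document(stream, loadOptions);
    }

    public static void Main()
    {
        Document doc = LoadHtml(SampleHtml);
        doc.Save("output.docx"); // hypothetical output path
    }
}
```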

@alexey.noskov Thanks for your response.
Unfortunately, we cannot remove this line because we want to preserve the DIVs in the documents.
Please let me know if you have any news about the fix.

Thank you

@Lisandro.Ronconi Thank you for the additional information. The issue is currently in the queue for analysis. We will be sure to update you once the issue is fixed or we have more information for you.

Hi

I saw that the issue was moved to “Status: Planned”. Were you able to find a solution to the problem?
Do you have an estimated release date for this fix?

Thanks for your answers

@Lisandro.Ronconi Currently the issue is scheduled to be addressed in the 22.10 (October 2022) version of Aspose.Words. Please note that this is a rough estimate; it can shift, so you cannot rely on it 100%.

Hi, is there any update on the issue mentioned above? Could you please let us know the estimated date for its resolution?

@kainat123 We have already implemented a draft of the fix, and the code is currently under review. If everything goes smoothly, the fix will be included in the next version of Aspose.Words.

Hi,
Has the issue been fixed in the latest release?

@kainat123 The issue is already resolved in the current codebase. The fix will be included in the next version, Aspose.Words 22.11, which is going to be released in a week or two. We will be sure to let you know once it is available.

The issues you found earlier (filed as WORDSNET-24029) have been fixed in this Aspose.Words for .NET 22.11 update, which is also available on NuGet.