Free Support Forum - aspose.com

HTML to Docx error on tables

Hi

I was trying to convert an HTML file to DOCX with Aspose.Words 22.5 and received the following error:

Aspose.Words.FileCorruptedException -> The document appears to be corrupted and cannot be loaded.
    inner exception -> Object reference not set to an instance of an object.
        at  ​ .(Paragraph ,     , Boolean )
        at  ​ .(Table )
        at    . ( ​  )
        at    .( ​  , Boolean ,     )
        at    . ( ​  , Boolean )
        at    .( ​  , Boolean )
        at    .(Stream , Encoding , DocumentBuilder )
        at  ​ .     ()
        at Aspose.Words.Document.(Stream , LoadOptions )

After some testing, I found that the issue occurs with two consecutive tables in the following structure (the second one is inside a <div></div>):

<body style="font: 10pt Times New Roman, Times, Serif">
    <div>
        <table>
            <tr>
                <td>a</td>
            </tr>
        </table>
        <div>
            <table>
                <tr>
                    <td>b</td>
                </tr>
            </table>
        </div>
    </div>
</body>

I tested the following scenarios:

  • If I remove the inner div (body/div/div), it works
  • If I remove the content of the second table, it works
  • If I separate the tables with an empty div, <div></div>, it works
  • If I separate the tables with an empty paragraph (<span>&#xa0;</span>), it works
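
Based on the third scenario above, a possible stopgap is to pre-process the HTML before loading it. The following is only a sketch and has not been confirmed by Aspose staff. The class and method names are hypothetical, and it assumes the empty <div></div> is placed directly after the first table's closing tag. A regular expression is not a full HTML parser, so this only covers the exact pattern from this thread: a </table> followed by a <div> whose first child is a <table>.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.Words.
public static class TableSeparatorWorkaround
{
    // Matches </table> when it is followed (ignoring whitespace) by an opening
    // <div ...> that immediately contains an opening <table ...>.
    // The lookahead leaves the <div> and <table> text untouched.
    private static readonly Regex TableThenDivTable = new Regex(
        @"(</table\s*>)(?=\s*<div\b[^>]*>\s*<table\b)",
        RegexOptions.IgnoreCase);

    // Inserts an empty <div></div> right after the first table's closing tag.
    public static string InsertEmptyDivBetweenTables(string html)
    {
        return TableThenDivTable.Replace(html, "$1<div></div>");
    }

    public static void Main()
    {
        string html =
            "<div><table><tr><td>a</td></tr></table>\n" +
            "<div><table><tr><td>b</td></tr></table></div></div>";
        Console.WriteLine(InsertEmptyDivBetweenTables(html));
    }
}
```

The replacement is idempotent: after the empty div is inserted, the first table is no longer followed by a div that starts with a table, so running the helper twice does not add a second separator. Whether the extra empty div changes the resulting DOCX layout would need to be checked against the real documents.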

Thanks,
Lisandro

@Lisandro.Ronconi I was unable to reproduce the issue. Please ZIP and attach your source code and source document here. We will check the issue and provide you with more information.

Thanks @Vadim.Saltykov for your response
Yes, sure… I have attached an example project here:

Test.Aspose.Words.zip (5.1 KB)

Thank you

@Lisandro.Ronconi Thank you for the additional information. I have managed to reproduce the problem on my side and logged it as WORDSNET-24029. We will keep you informed and let you know once it is resolved.
The problem is caused by the following option set in your code:

htmlLoadOptions.BlockImportMode = BlockImportMode.Preserve;
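
For readers without the attached project, here is a minimal sketch of how this option is typically set when loading HTML from a stream. The attachment's actual code is not shown in this thread, so the class name, method names, and output path are hypothetical. Only `HtmlLoadOptions.BlockImportMode`, `BlockImportMode.Preserve`, and the `Document(Stream, LoadOptions)` constructor are taken from the thread and its stack trace.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;

// Hypothetical wrapper; requires the Aspose.Words for .NET package.
public static class HtmlToDocxRepro
{
    // Builds load options that keep block-level elements such as <div>
    // instead of merging their properties into the child elements.
    public static HtmlLoadOptions CreateLoadOptions()
    {
        var htmlLoadOptions = new HtmlLoadOptions();
        htmlLoadOptions.BlockImportMode = BlockImportMode.Preserve;
        return htmlLoadOptions;
    }

    // Loads the HTML string and saves it; the output format is inferred
    // from the file extension of outputPath (for example ".docx").
    public static void Convert(string html, string outputPath)
    {
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            // With Aspose.Words 22.5 and the HTML from this thread, this
            // constructor is where the FileCorruptedException was reported.
            var doc = new Document(stream, CreateLoadOptions());
            doc.Save(outputPath);
        }
    }
}
```

Leaving `BlockImportMode` at its default instead of `Preserve` avoids the option identified as the cause here, at the cost of not preserving the DIV structure in the output.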

@alexey.noskov Thanks for your response
Yes, but we cannot remove this line because we want to preserve the DIVs in the documents.
Let me know if you have any news about the fix.

Thank you

@Lisandro.Ronconi Thank you for the additional information. The issue is currently in the queue for analysis. We will be sure to update you once the issue is fixed or we have more information for you.