Aspose.Words.Document throws FileCorruptedException while loading HTML using C#

Hi, I am facing an issue when converting documents from HTML to Word (DOCX) with the Aspose.Words for .NET NuGet package (version 19.11.0). The following error is logged for hundreds of documents:

    Message: The document appears to be corrupted and cannot be loaded.,
    InnerException: Specified argument was out of the range of valid values.
    Parameter name: distanceFromText,
    StackTrace: at Aspose.Words.Document.(Stream , LoadOptions )
    at Aspose.Words.Document.(Stream , LoadOptions )

I am using the following code to convert documents:

    var htmlLoadOptions = new HtmlLoadOptions();
    htmlLoadOptions.PreferredControlType = HtmlControlType.StructuredDocumentTag;

    // Set the encoding used to read the input HTML
    htmlLoadOptions.Encoding = Encoding.UTF8;

    // Create a new class implementing IWarningCallback, which collects any warnings produced while the document is loaded.
    var callback = new HandleDocumentWarnings();

    // Assign the callback to the load options, because the warnings we want to capture
    // are raised while the HTML is being loaded.
    htmlLoadOptions.WarningCallback = callback;

    // Load the HTML document from the input stream into memory
    var document = new Document(info.File, htmlLoadOptions);

    // Prevent table rows from breaking across pages
    foreach (Table table in document.GetChildNodes(NodeType.Table, true))
    {
        foreach (Row row in table.Rows)
        {
            row.RowFormat.AllowBreakAcrossPages = false;
        }
    }

    // The input stream is no longer needed once the document has been loaded
    info.File.Close();

    // Save the document as DOCX to a memory stream
    var streamResult = new MemoryStream();

    var options = SaveOptions.CreateSaveOptions(SaveFormat.Docx);
    options.TempFolder = WorkingDirectory;
    document.Save(streamResult, options);

    // Rewind so that callers read the result from the beginning
    streamResult.Position = 0;

I have reason to believe that the HTML files being loaded are not corrupt, and with hundreds or thousands of files to convert it is hard for us to keep track of the validity of each one.

Can you please check whether this is a bug in your component or a problem with the data? A sample is attached for your reference.
Thanks in advance.

aspose_error_details.zip (89.4 KB)
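With hundreds or thousands of files in a batch, one way to keep track of which inputs fail is to isolate each conversion and record the failure instead of letting it stop the run. The following is a minimal sketch, not part of Aspose.Words: `ConversionFailure` and `BatchConverter.ConvertAll` are hypothetical names, and the `convert` delegate stands in for the load/save code above. It also records the inner exception message, since that is where the root cause (`distanceFromText` in the log above) appears.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of one failed conversion: the input path plus the
// outer and inner exception messages (the inner one carries the root cause).
public sealed class ConversionFailure
{
    public string Path { get; }
    public string Message { get; }
    public string InnerMessage { get; }

    public ConversionFailure(string path, Exception ex)
    {
        Path = path;
        Message = ex.Message;
        InnerMessage = ex.InnerException?.Message;
    }
}

// Hypothetical helper: runs convert(path) for every input. Any exception
// from one file is recorded and the loop continues with the next file.
public static class BatchConverter
{
    public static List<ConversionFailure> ConvertAll(
        IEnumerable<string> paths, Action<string> convert)
    {
        var failures = new List<ConversionFailure>();
        foreach (var path in paths)
        {
            try
            {
                convert(path);
            }
            catch (Exception ex) // catches everything; narrow this in real code
            {
                failures.Add(new ConversionFailure(path, ex));
            }
        }
        return failures;
    }
}
```

In real code you could narrow the `catch` to `Aspose.Words.FileCorruptedException` so that unrelated errors still surface; catching `Exception` here only keeps the sketch independent of the library.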

@zallauddin

We have tested the scenario and reproduced the same issue on our side. We have logged this problem in our issue tracking system as WORDSNET-19888 so that it can be corrected. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

Hi @tahir.manzoor,

Do you have a rough ETA for the resolution of this issue?

@zallauddin

We try our best to deal with every customer request in a timely fashion, but unfortunately we cannot guarantee a delivery date for every customer issue. We work on issues on a first come, first served basis. We feel this is the fairest and most appropriate way to satisfy the needs of the majority of our customers.

Currently, your issue is pending analysis and is in the queue. Once we complete the analysis of your issue, we will be able to provide you with an estimate.

@zallauddin

Please use the latest version, Aspose.Words for .NET 20.3, to avoid the exception you shared. Hope this helps you.

The issues you found earlier (filed as WORDSNET-19888) have been fixed in the Aspose.Words for .NET 20.7 and Aspose.Words for Java 20.7 updates.
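Since the fix shipped in 20.7, a deployment could fail fast if an older Aspose.Words assembly is still being loaded, for example because of a stale package reference. This is a hedged sketch using only standard .NET reflection; `VersionGuard` is a hypothetical helper, and it assumes the Aspose.Words assembly version tracks the release number (e.g. 20.7.x).

```csharp
using System;
using System.Reflection;

// Hypothetical helper: throws if the given assembly is older than a required version.
// Assumes the library's assembly version follows its release number.
public static class VersionGuard
{
    public static void RequireAtLeast(Assembly assembly, Version required)
    {
        AssemblyName name = assembly.GetName();
        Version actual = name.Version;

        // Check for null first: Version comparison with null differs across .NET runtimes.
        if (actual == null || actual < required)
        {
            throw new InvalidOperationException(
                $"{name.Name} {actual} is older than the required version {required}.");
        }
    }
}
```

A call such as `VersionGuard.RequireAtLeast(typeof(Aspose.Words.Document).Assembly, new Version(20, 7));` at application startup would then stop before any conversion runs against a pre-fix build.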