HTML to DOCX conversion hangs using C#

Hi,

I am facing an issue when converting HTML to Word (DOCX) documents using the Aspose.Words NuGet package (version 19.11.0) in a .NET application. The conversion process hangs and does not resume; the calling code has to either abandon it or wait for a timeout.

No error is thrown. The conversion simply halts and does not recover.

I am using the following code to convert document(s):

        var htmlLoadOptions = new HtmlLoadOptions();
        htmlLoadOptions.PreferredControlType = HtmlControlType.StructuredDocumentTag;

        // Set the encoding
        htmlLoadOptions.Encoding = Encoding.UTF8;

        // Create an instance of a class implementing IWarningCallback, which collects any warnings produced while the document is loaded.
        var callback = new HandleDocumentWarnings();

        // Assign the callback to the appropriate options class. In this case the input is HTML,
        // so the callback goes on the HtmlLoadOptions instance created above.
        htmlLoadOptions.WarningCallback = callback;

        // Load the Html document into memory
        var document = new Document(info.File, htmlLoadOptions);
        
        foreach (Table table in document.GetChildNodes(NodeType.Table, true))
        {
            foreach (Row row in table.Rows)
            {
                row.RowFormat.AllowBreakAcrossPages = false;
            }
        }

        info.File.Close();

        // Convert the document to a different format and save to stream
        var streamResult = new MemoryStream();

        var options = SaveOptions.CreateSaveOptions(SaveFormat.Docx);
        options.TempFolder = WorkingDirectory;
        document.Save(streamResult, options);
            
        streamResult.Position = 0;

        return new ConvertedDoc() { Content = streamResult, CountPages = document.PageCount };

Could you please examine the attached file and check why this happens? These conversions add a lot of delay because they keep the calling process busy until it times out (usually after an hour of waiting). A sample file is attached so that you can reproduce the issue.

Thanks.
sample1.zip (35.7 KB)

@zallauddin

We have tested the scenario using the latest version, Aspose.Words for .NET 20.2, and could not reproduce the issue you reported. Please upgrade to Aspose.Words for .NET 20.2.

Hi @tahir.manzoor,
I tried your recommendation, but the latest version (20.2) of Aspose.Words behaves the same way and reproduces the hang.
I have traced the issue to the “PageCount” property of the Document class, which is read in the last line of my code, in the return statement. Could you please check why the process hangs when this property is read after the document has been saved?
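
For anyone isolating a hang like this one, a rough sketch of time-boxing a single call may help confirm which statement blocks. It uses only the standard Task API; the `TimeBoxed` class and `TryRun` method are hypothetical names introduced for illustration and are not part of Aspose.Words.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Aspose.Words: runs a delegate on a
// thread-pool thread and gives up waiting after the supplied timeout.
public static class TimeBoxed
{
    // Returns true and sets result if work finished within the timeout.
    // Returns false if the timeout elapsed first. The work itself is NOT
    // cancelled and keeps running in the background.
    // If work throws, Wait rethrows it wrapped in an AggregateException.
    public static bool TryRun<T>(Func<T> work, TimeSpan timeout, out T result)
    {
        Task<T> task = Task.Run(work);
        if (task.Wait(timeout))
        {
            result = task.Result;
            return true;
        }

        result = default(T);
        return false;
    }
}
```

With the code from the first post, the suspect read could be wrapped as `TimeBoxed.TryRun(() => document.PageCount, TimeSpan.FromSeconds(30), out int pages)`; the 30-second limit is an arbitrary assumption. A `false` result only shows that the call did not finish in time. Because the background thread may still be working on the document at that point, it is probably safest not to touch that `Document` instance again afterwards.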

@zallauddin

Please make sure that you are using the same code and document on your side. We tested the scenario on Windows 10 and could not reproduce the issue. Could you please share which .NET Framework version you are using?

Hi @tahir.manzoor,

I ran the conversion on Windows 10 with .NET Framework 4.7.2, but I don’t think the issue is related to the environment or the framework. For the document provided, the process gets stuck every time I read the “PageCount” property after saving the document.

@zallauddin

Thanks for sharing the details. We have tested the scenario again and have managed to reproduce the issue on our side. We have logged this problem in our issue tracking system as WORDSNET-20010. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

The issue you reported earlier (filed as WORDSNET-20010) has been fixed in the Aspose.Words for .NET 21.1 and Aspose.Words for Java 21.1 updates.