Convert HTML to DocX- File Corrupt. Convert HTML to Doc- file doesn't fit

We are currently attempting to generate a Word document from one of our HTML-generated reports in .NET 3.5 using Aspose.Words 9.4.0.0. I have attached a .txt file containing the HTML that is generated through RenderControl.

Case 1:
When we attempt to convert to Docx through our own response stream, we can generate the docx file but we receive the error that the file is corrupt upon opening the file in Word.

The exact error is:
*The file “filename” cannot be opened because there are problems with the contents.
Details - The file is corrupt and cannot be opened.

Word found unreadable content in “filename”. Do you want to recover the contents of this document? If you trust the source of this document, click Yes.*

The code we use for this case is:

Dim strWriter As New System.IO.StringWriter
Dim htmWriter As New System.Web.UI.HtmlTextWriter(strWriter)
Dim fileName As String = "Report" & lblContractNumber.Text
Me.RenderControl(htmWriter)

ExportToDocX(fileName, strWriter.ToString(), Response, 1)
'in another file/class
Public Sub ExportToDocX(ByRef fileName As String, ByRef htmlString As String, ByRef response As System.Web.HttpResponse, ByVal type As Integer)
    Select Case type
        Case 1
            ExportDocXTest(fileName, response, htmlString)
        Case 2
            Dim outStream As New System.IO.MemoryStream
            ExportToDocX(htmlString, outStream)
            OutputDocument(response, outStream, fileName, "docx")
    End Select
End Sub

Private Sub ExportDocXTest(ByRef fileName As String, ByRef response As System.Web.HttpResponse, ByRef htmlString As String)
    Dim docx As Document = New Document()
    Dim options As New Aspose.Words.Saving.OoxmlSaveOptions(Aspose.Words.SaveFormat.Docx)
    options.PrettyFormat = True

    Dim builder As DocumentBuilder = New DocumentBuilder(docx)
    builder.ParagraphFormat.Borders(Aspose.Words.BorderType.Top).LineStyle = Aspose.Words.LineStyle.Single
    builder.ParagraphFormat.Alignment = ParagraphAlignment.Center
    builder.InsertHtml(htmlString)

    docx.Save(response, fileName & ".docx", Words.ContentDisposition.Attachment, options)
End Sub

Case 2:
When we convert to Docx through passing the response into the docx.Save method from Aspose we get the server error:
Server Error in ‘/’ Application.
A page can have only one server-side Form tag.

The code we use for this case is:

Dim strWriter As New System.IO.StringWriter
Dim htmWriter As New System.Web.UI.HtmlTextWriter(strWriter)
Dim fileName As String = "Report" & lblContractNumber.Text
Me.RenderControl(htmWriter)

ExportToDocX(fileName, strWriter.ToString(), Response, 2)
'in another file/class
Public Sub ExportToDocX(ByRef fileName As String, ByRef htmlString As String, ByRef response As System.Web.HttpResponse, ByVal type As Integer)
    Select Case type
        Case 1
            ExportDocXTest(fileName, response, htmlString)
        Case 2
            Dim outStream As New System.IO.MemoryStream
            ExportToDocX(htmlString, outStream)
            OutputDocument(response, outStream, fileName, "docx")
    End Select
End Sub

Private Sub ExportToDocX(ByRef htmlString As String, ByRef outStream As System.IO.MemoryStream)
    Dim docx As Document = New Document()
    Dim builder As DocumentBuilder = New DocumentBuilder(docx)
    builder.ParagraphFormat.Borders(Aspose.Words.BorderType.Top).LineStyle = Aspose.Words.LineStyle.Single
    builder.ParagraphFormat.Alignment = ParagraphAlignment.Center
    builder.InsertHtml(htmlString)
    docx.UpdateTableLayout()
    docx.Save(outStream, Words.SaveFormat.Docx)
End Sub

Private Sub OutputDocument(ByRef response As System.Web.HttpResponse, ByRef outStream As System.IO.MemoryStream, ByRef fileName As String, ByVal OutputType As String)
    'ToArray returns only the bytes written; GetBuffer would also include unused buffer capacity
    Dim outBuf() As Byte = outStream.ToArray()
    outStream.Close()

    response.Expires = 0
    response.Buffer = True
    response.Cache.SetCacheability(HttpCacheability.NoCache)
    response.ClearHeaders()
    response.AddHeader("content-disposition", "attachment; filename=" & fileName & "." & OutputType)
    response.ClearContent()
    Select Case OutputType
        Case "doc"
            response.ContentType = "application/msword" 'Word 97-2003
        Case "docx"
            response.ContentType = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
        Case "pdf"
            response.ContentType = "application/x-pdf" 'Adobe PDF (download)
    End Select
    'Download Output
    response.BinaryWrite(outBuf)
    'Response.End stops page processing, so it must be the last call
    response.End()
End Sub

What is causing these errors and/or file corruptions? And is there a way we can tell the converted html to fit inside the resulting document?
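
One possible contributor when a MemoryStream is copied to the response by hand: MemoryStream.GetBuffer() returns the whole internal buffer, including capacity that was never written, while ToArray() returns only the bytes written. Extra trailing bytes after the DOCX package could plausibly make Word report the file as corrupt. The following is a minimal console sketch (the module name BufferDemo is made up for illustration) that shows the difference in lengths:

```vbnet
Imports System
Imports System.IO

Module BufferDemo
    Sub Main()
        Using ms As New MemoryStream()
            ' Write three bytes of payload.
            ms.Write(New Byte() {1, 2, 3}, 0, 3)

            ' ToArray copies only the bytes actually written.
            Dim exact As Byte() = ms.ToArray()
            ' GetBuffer exposes the whole internal buffer, including unused capacity.
            Dim raw As Byte() = ms.GetBuffer()

            Console.WriteLine("Written: " & ms.Length)
            Console.WriteLine("ToArray length: " & exact.Length)
            Console.WriteLine("GetBuffer length: " & raw.Length)
        End Using
    End Sub
End Module
```

If the GetBuffer length comes out larger than the number of bytes written, the difference is padding that would be sent to the browser along with the document.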

Hi
Thanks for your request. I cannot reproduce the problem on my side, using the latest version of Aspose.Words (9.5.0) and the following code:

' Get your HTML string
Dim html As String = File.ReadAllText("C:\Temp\htmlForDoc.htm")
Dim docBytes As Byte() = System.Text.Encoding.UTF8.GetBytes(html)
'Create MemoryStream from byte array
Dim docStream As New MemoryStream(docBytes)
'Now we can open Word document from stream
Dim doc As New Document(docStream)
doc.Save("C:\Temp\out.docx")

Best regards,

We have upgraded all Aspose DLLs to their most recent releases as of today.

After trying your sample code we do get an output file at the C:\Temp location. Here is our code from your example:

Private Sub AsposeVersion()
    HideButtons()
    Dim doc As New IMSGAspose

    Dim strWriter As New System.IO.StringWriter
    Dim htmWriter As New System.Web.UI.HtmlTextWriter(strWriter)
    Dim fileName As String = "Report" & lblContractNumber.Text
    Me.RenderControl(htmWriter)

    doc.ExportToDocXNew(fileName & ".docx", strWriter.ToString())
End Sub

Public Sub ExportToDocXNew(ByRef fileName As String, ByRef htmlString As String)
    Dim docBytes As Byte() = System.Text.Encoding.UTF8.GetBytes(htmlString)

    'Create MemoryStream from byte array
    Dim docStream As New MemoryStream(docBytes)

    'Open Word Doc From Stream
    Dim docx As New Document(docStream)
    docx.Save("C:\Temp\" & fileName)
End Sub

However the server still gives the error:

Server Error in ‘/’ Application.
A page can have only one server-side Form tag.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.Web.HttpException: A page can have only one server-side Form tag.

Source Error: An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.

Stack Trace:

[HttpException (0x80004005): A page can have only one server-side Form tag.]
   System.Web.UI.Page.OnFormRender() +8700412
   System.Web.UI.HtmlControls.HtmlForm.RenderChildren(HtmlTextWriter writer) +33
   System.Web.UI.HtmlControls.HtmlContainerControl.Render(HtmlTextWriter writer) +32
   System.Web.UI.HtmlControls.HtmlForm.Render(HtmlTextWriter output) +51
   System.Web.UI.Control.RenderControlInternal(HtmlTextWriter writer, ControlAdapter adapter) +27
   System.Web.UI.Control.RenderControl(HtmlTextWriter writer, ControlAdapter adapter) +99
   System.Web.UI.HtmlControls.HtmlForm.RenderControl(HtmlTextWriter writer) +40
   System.Web.UI.Control.RenderChildrenInternal(HtmlTextWriter writer, ICollection children) +134
   System.Web.UI.Control.RenderChildren(HtmlTextWriter writer) +19
   System.Web.UI.Page.Render(HtmlTextWriter writer) +29
   System.Web.UI.Control.RenderControlInternal(HtmlTextWriter writer, ControlAdapter adapter) +27
   System.Web.UI.Control.RenderControl(HtmlTextWriter writer, ControlAdapter adapter) +99
   System.Web.UI.Control.RenderControl(HtmlTextWriter writer) +25
   System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +1266

Version Information: Microsoft .NET Framework Version:2.0.50727.4206; ASP.NET Version:2.0.50727.4205

Also, we do not need the file written to a specific location; we need the browser to pop up a download dialog so the user can choose to Open or Save the file.

When we opened the test file that was output, it didn’t produce any errors and the output fit into the Word document nicely, so that is an improvement.

We have also attached a newer, more current HTML source file to this post.

Hi
Thanks for your inquiry. I cannot reproduce the problem on my side. Could you please try creating a simple application which will allow me to reproduce the problem on my side?
Please see the following link to learn how to send the document to the client browser:
https://reference.aspose.com/words/net/aspose.words/document/save/
Best regards,

Hi there,
Thanks for this additional information.
Does this error occur even with the Aspose code removed? This appears to be an error with the webpage design itself. Please do a quick google of the error for details.
Regarding sending the file to be opened in a dialog, you can use the Save method here to achieve this.
Thanks,

The server error still occurs with the Aspose code removed. We are adding this feature to rather old code that was recently converted to .NET 3.5.
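
One plausible reading of the stack trace, offered as an assumption rather than a confirmed diagnosis: the exception is raised from Page.ProcessRequestMain during the page's normal render pass, after Me.RenderControl has already rendered the server-side form once in the export handler. A sketch of a common way around this is to render only the control that holds the report instead of the whole page. Here pnlReport and btnExport_Click are hypothetical names, not taken from the code in this thread:

```vbnet
' Code-behind sketch; pnlReport is a hypothetical container holding the report markup.
Protected Sub btnExport_Click(ByVal sender As Object, ByVal e As EventArgs)
    Dim strWriter As New System.IO.StringWriter()
    Dim htmWriter As New System.Web.UI.HtmlTextWriter(strWriter)

    ' Render only the report container, not the whole page (Me), so the
    ' server-side <form> is not rendered here.
    pnlReport.RenderControl(htmWriter)

    Dim html As String = strWriter.ToString()
    ' Convert html and send the document here, then end the response.
End Sub

' Some server controls verify that they are rendered inside a server form;
' overriding this method with an empty body skips that check.
Public Overrides Sub VerifyRenderingInServerForm(ByVal control As System.Web.UI.Control)
End Sub
```

Ending the response after the document has been sent should also keep the page from rendering a second time.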

However, we are still getting a corrupt file when we use the latest htmlForDocNewfile.txt file as the starting point and simply output it via the method you provided (with the ContentDisposition set to Attachment and the options set). When we simply save the output to a file instead of a dialog box, the file is not corrupted.

Here is an example of us saving to a file:

Public Sub ExportToDocXNew(ByRef fileName As String, ByRef htmlString As String)
    Dim docx As New Document("C:\TEMP\htmlForDocxNewfile.txt")
    docx.Save("C:\Temp\" & fileName)
End Sub

This generates a docx file that is not corrupted.

However when we do this:

Public Sub ExportToDocXNew2(ByRef fileName As String, ByRef response As System.Web.HttpResponse)
    Dim outStream As New System.IO.MemoryStream
    Dim docx As New Document("C:\TEMP\htmlForDocxNewfile.txt")
    Dim options As New Aspose.Words.Saving.OoxmlSaveOptions(Aspose.Words.SaveFormat.Docx)
    options.PrettyFormat = True
    docx.Save(response, fileName & ".docx", Words.ContentDisposition.Attachment, options)
End Sub

The dialog box pops up, and whether we choose to Open in Word or Save to disk, the resulting file is corrupted as in our previous examples.

Is this something you can replicate? As the same file will be used for the initial document to convert.

Hi there,
Thanks for your inquiry.
Please try adding “Response.End()” directly after you save the document and see if that fixes it.
Thanks,

Adding Response.End() did eliminate the corrupt file issue.
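
A likely explanation, stated as an assumption: without Response.End(), ASP.NET continues processing the page and appends the page's own HTML after the DOCX bytes, which Word then reports as unreadable content. A minimal sketch of the pattern follows; SendDocx is a hypothetical helper name, and the Save overload is the one already used earlier in this thread:

```vbnet
Public Sub SendDocx(ByVal doc As Aspose.Words.Document, ByVal fileName As String, ByVal response As System.Web.HttpResponse)
    Dim options As New Aspose.Words.Saving.OoxmlSaveOptions(Aspose.Words.SaveFormat.Docx)

    ' Stream the document to the browser as an attachment.
    doc.Save(response, fileName & ".docx", Aspose.Words.ContentDisposition.Attachment, options)

    ' Stop processing so the page's own HTML is not appended after the DOCX bytes.
    response.End()
End Sub
```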

One thing we have noticed: when we build the file by calling builder.InsertHtml(html), the resulting document doesn’t always fit within the normal page boundaries of a Word document.

When we build the file by declaring a new document using an existing text file with the html in it the resulting Word file has everything fitting neatly inside the document as expected.

Is there a way to get the two to behave similarly? Or is it best just to go through a temporary file in order to achieve the result we need (which is the properly indented and printable Word document)?

Hi there,
Thanks for this additional information.
You are correct, the behaviour of these two methods is different. I have logged this as an issue. We will inform you when a fix is available.
In the meantime you can still use the constructor to load your HTML document by first loading your HTML string into a MemoryStream and then passing this to the Document constructor. Please see the code below.

// The string containing the HTML.
string html = "<html><body><p>Sample</p></body></html>"; // placeholder; use your own HTML
// Get the bytes from this string and load them into a new MemoryStream object.
MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(html));
// Pass this MemoryStream object to be loaded into a new Document object.
Document doc = new Document(stream);

Thanks,

Hi there.

I’m having a very similar problem to this one.

We use Aspose to generate docx documents. One of the requirements of the document generator is that it must be able to retrieve documents from the database and attach them to the document being built.

The trouble is that the documents in the database are in doc format.

I can save a doc copy of the database document using ASPOSE, and open it without any problems. However, I need to convert it to docx before saving. Using the advice from https://docs.aspose.com/words/net/convert-a-document/, I’ve written the following code:

#region AttachExistingDocument
public override void AttachExistingDocument(Dictionary<string, byte[]> documentNameAndContent)
{
    Document _internalDoc = new Document();
    string fileName;
    foreach (KeyValuePair<string, byte[]> doc in documentNameAndContent)
    {
        MemoryStream docStream = new MemoryStream(doc.Value);
        _internalDoc = new Document(docStream);
        fileName = GlobalSettings.TemporaryFolder + Guid.NewGuid().ToString() + doc.Key;
        _internalDoc.Save(fileName, Aspose.Words.SaveFormat.Docx);
        _internalDoc = new Document(fileName);
        if (_internalDoc != null)
        {
            foreach (Section section in _internalDoc.Sections)
            {
                Section newSection = (Section)_document.ImportNode(section, true, ImportFormatMode.KeepSourceFormatting);
                _document.Sections.Add(newSection);
                _docBuilder.MoveToDocumentEnd();
                // AddSection(appendNode, false);
            }
        }

    }
}
#endregion

_document and _docBuilder are defined elsewhere and represent the document being constructed.

The trouble is, when I try to open the document in docx format, I get the following errors:

The file XXX.doc cannot be opened because there are problems with the contents. Details: The file is corrupt and cannot be opened. Location: Part: /word/header3.xml, Line1, Column 3005.

When I click ok, I get the following message:

Word found unreadable content in XXX.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes.

When I click yes, the document opens, appearing exactly as it should. I think the conversion is not functioning properly. When I try to open the document created on this line:

_internalDoc.Save(fileName,Aspose.Words.SaveFormat.Docx);

I get the same error.

The file version of Aspose.Words.dll is 9.3.0.0.

Regards

Hanno

Hello.

Thanks for your request. Could you please attach your input and output documents here for testing? I will check them and provide some feedback.

Best regards,

Hi Vladimir.

Unfortunately, I cannot provide you with the input document, as it only exists as a byte array in our database.

However, when I render it as a .doc file using ASPOSE, the file opens without any issues, so I have to assume that it is a reasonably exact copy of what I have in the database. (Attached as test.doc)

Also attached is test.docx, which has been converted from doc by ASPOSE, and which gives the error.

Regards

Hello
Thank you for reporting this problem to us and for the additional information. I managed to reproduce the problem on my side. Your request has been linked to the appropriate issue. You will be notified as soon as it is fixed.
Best regards,

The issues you have found earlier (filed as WORDSNET-5218) have been fixed in this .NET update and in this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

Thank you. It seems to have resolved the issue.

Regards

The issues you have found earlier (filed as WORDSNET-2057) have been fixed in this .NET update and this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.