Issue after converting HTML to DOCX

ms.docx is cut off on the right side, and “Page 1 of 1” appears on the left, while in the HTML it is on the right.

@Sudrashya ms.docx was produced by MS Word. In most cases, Aspose.Words mimics MS Word when converting HTML to Word.

@alexey.noskov: I tried your solution, but it does not work for HTML elements positioned on the right side. All the “float:right” elements are moved to the left side of the document.
I have one image and “Page 1 of 1” on the right side, but after conversion those items moved to the left. Please find the input file with the image attached. Please try it and let me know.

Test.zip (4.7 KB)

@alexey.noskov: Please refer to out.docx as well. The last text, “Page 1 of 1”, appears on the left side, whereas in the HTML it is on the right side.

@Sudrashya This is the expected behavior of Aspose.Words. Note that Aspose.Words is designed to work with MS Word documents. MS Word documents have no analog of DIV elements, so DIVs are converted to paragraphs in the Aspose.Words DOM. In this case Aspose.Words behaves the same way MS Word does.
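
Because floated DIVs arrive in the DOM as ordinary paragraphs, one possible post-processing step is to right-align selected paragraphs after loading. This was not suggested in the thread and has not been tested against Test.zip; it is only a minimal sketch, assuming the paragraph can be identified by its text. The class and helper names (`FloatRightFixup`, `RightAlignParagraphsContaining`) are hypothetical.

```csharp
using Aspose.Words;

public static class FloatRightFixup
{
    // Hypothetical helper (not part of Aspose.Words): right-aligns every
    // paragraph whose plain text contains `marker`.
    // Returns the number of paragraphs that were changed.
    public static int RightAlignParagraphsContaining(Document doc, string marker)
    {
        int changed = 0;
        foreach (Paragraph p in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            // ToString(SaveFormat.Text) exports the paragraph as plain text.
            if (p.ToString(SaveFormat.Text).Contains(marker))
            {
                p.ParagraphFormat.Alignment = ParagraphAlignment.Right;
                changed++;
            }
        }
        return changed;
    }
}
```

Calling `FloatRightFixup.RightAlignParagraphsContaining(doc, "Page 1 of 1")` between loading the HTML and saving the DOCX would be the intended use. It changes paragraph alignment only and does not reproduce CSS float behavior.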

@alexey.noskov: So we can’t convert the HTML document to Word with the same styling?
Is there any other product that can achieve this? We have an Aspose.Total license.

@Sudrashya Originally, you used Aspose.HTML for the HTML to Word conversion. My colleagues from the Aspose.HTML team will answer you shortly. I am from the Aspose.Words team.

@alexey.noskov: I can use Aspose.Words as well, if you can provide a solution, since I have an Aspose.Total license. So please just confirm: with Aspose.Words, is it not possible to convert an HTML document to Word with the same styling as we see in the HTML?
I am attaching the HTML input file again for your reference.

Test.zip (4.7 KB)

@Sudrashya As I mentioned, Aspose.Words is designed to work with MS Word documents, and when an HTML document is loaded it is converted to the Aspose.Words DOM. The significant differences between the HTML and MS Word document models make 100% fidelity after conversion impossible. So, I am afraid, it is not possible with Aspose.Words to get an MS Word document that looks exactly the same as the HTML opened in a browser.

@Sudrashya

We used the code snippet below with Aspose.PDF for .NET to convert your HTML into DOCX. Please check the attached output file and let us know if you still find issues in it:

Document doc = new Document(dataDir + "input.html", new HtmlLoadOptions());

DocSaveOptions saveOptions = new DocSaveOptions()
{
 Format = DocSaveOptions.DocFormat.DocX,
 Mode = DocSaveOptions.RecognitionMode.EnhancedFlow
};

doc.Save(dataDir + "output.docx", saveOptions);

output.docx (63.8 KB)

@asad.ali: No, it is not as expected. As you can see, the text alignment does not match the HTML document. Please try with the attached HTML; you will also see bad alignment when the document contains an image.
Test.zip (4.7 KB)

@Sudrashya

We have now tested the case with Aspose.HTML for .NET 23.1 and noticed an issue similar to the one you mentioned about image alignment. We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): **HTMLNET-4246**

We will let you know once the ticket is resolved. Please be patient and spare us some time.

We are sorry for the inconvenience.

@alexey.noskov: We are now using Aspose.Words to convert HTML to a Word document.
When Aspose.Words converts HTML to a Word document, it leaves some space at the top of the document for the header section.
Is it possible to start the content at the very top of the page, or is there any way to remove the header section so that the content starts at the top of the document?
We are using the code below:

Aspose.Words.Document doc = new Aspose.Words.Document(@"C:\Temp\in.html");

// Tables in the source HTML are too wide to fit the page.
// Fix this by autofitting each table to the window.
foreach (Aspose.Words.Tables.Table t in doc.GetChildNodes(NodeType.Table, true))
    t.AutoFit(AutoFitBehavior.AutoFitToWindow);

doc.Save(@"C:\Temp\out.docx");

@Sudrashya You can reset page margins to zero:

Aspose.Words.Document doc = new Aspose.Words.Document(@"C:\Temp\in.html");

// Reset section margins.
foreach (Aspose.Words.Section s in doc.Sections)
{
    s.PageSetup.TopMargin = 0;
    s.PageSetup.BottomMargin = 0;
    s.PageSetup.LeftMargin = 0;
    s.PageSetup.RightMargin = 0;
}

// Tables in the source HTML are too wide to fit the page.
// Fix this by autofitting each table to the window.
foreach (Aspose.Words.Tables.Table t in doc.GetChildNodes(NodeType.Table, true))
    t.AutoFit(AutoFitBehavior.AutoFitToWindow);

doc.Save(@"C:\Temp\out.docx");
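
If space still remains at the top after the margins are zeroed, it may come from header or footer content or from the header and footer distances. The following is a hedged sketch, assuming the imported document needs no headers or footers at all. `PageSetup.HeaderDistance`, `PageSetup.FooterDistance`, and `Section.HeadersFooters` are Aspose.Words members; the class and helper names (`HeaderSpaceFixup`, `StripHeaderSpace`) are hypothetical, and the thread does not confirm that this step is needed for in.html.

```csharp
using Aspose.Words;

public static class HeaderSpaceFixup
{
    // Hypothetical helper (not part of Aspose.Words): removes all headers and
    // footers and zeroes the distances that reserve space for them.
    public static void StripHeaderSpace(Document doc)
    {
        foreach (Section s in doc.Sections)
        {
            // Drop any header/footer stories created during import.
            s.HeadersFooters.Clear();

            // Distances are measured in points from the page edge.
            s.PageSetup.HeaderDistance = 0;
            s.PageSetup.FooterDistance = 0;

            // Body text starts at the top margin, so zero it as well.
            s.PageSetup.TopMargin = 0;
        }
    }
}
```

This would be called after loading the document and before `doc.Save(...)`. Note that most printers cannot print to the physical edge of the page, so a small nonzero margin may be preferable for printed output.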

@alexey.noskov Thanks for the response. I have one issue with table conversion from HTML to Word using Aspose.Words: the table header is not aligned with the data in the table. We can’t change AutoFitBehavior.FixedColumnWidths. We are using the code below for the conversion:

Aspose.Words.Document document = new Aspose.Words.Document(new MemoryStream(Encoding.UTF8.GetBytes(html)));
foreach (Aspose.Words.Section s in document.Sections)
{
    s.PageSetup.TopMargin = 10;
    s.PageSetup.BottomMargin = 10;
    s.PageSetup.LeftMargin = 50;
    s.PageSetup.RightMargin = 0;
}
// Tables in the source HTML are too wide to fit the page.
// Apply fixed column widths to each table.
foreach (Aspose.Words.Tables.Table t in document.GetChildNodes(NodeType.Table, true))
{
    t.AutoFit(AutoFitBehavior.FixedColumnWidths);
}
document.Save(outputFilePath);

Please let me know what to do. The HTML is attached.

template.zip (597 Bytes)

@Sudrashya In this particular case the document is imported properly without table postprocessing:
out_without_table_postprocessing.docx (7.8 KB)

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-24918

You can obtain Paid Support services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@alexey.noskov: Is this issue caused by “colspan”? When an HTML table uses the colspan attribute, the table header and data become misaligned.

What code did you use to generate the DOCX file? It looks correct to me.

@Sudrashya I have used the following code:

Aspose.Words.Document document = new Aspose.Words.Document(new MemoryStream(Encoding.UTF8.GetBytes(html)));

foreach (Aspose.Words.Section s in document.Sections)
{
    s.PageSetup.TopMargin = 10;
    s.PageSetup.BottomMargin = 10;
    s.PageSetup.LeftMargin = 50;
    s.PageSetup.RightMargin = 0;
}

document.Save(outputFilePath);

@alexey.noskov But we cannot remove the following code, because other data on the page requires it:

foreach (Aspose.Words.Tables.Table t in document.GetChildNodes(NodeType.Table, true))
{
    t.AutoFit(AutoFitBehavior.FixedColumnWidths);
}

Do you have any solution for this, either by manipulating the HTML or by handling colspan differently?

@Sudrashya Unfortunately, I cannot suggest a workaround for this issue at the moment. Our development team will analyze the problem, and then we will be able to provide more information or a workaround.
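
Since the table was reported above to import correctly without table post-processing, one untested idea is to keep the AutoFit loop but skip tables that appear to contain spanned cells. The sketch below assumes that colspan cells show up after import either as horizontally merged cells or as rows with differing cell counts; that is an assumption about the import result, not confirmed behavior. The class and helper names (`SelectiveAutoFit`, `LooksSpanned`, `AutoFitUnspannedTables`) are hypothetical.

```csharp
using Aspose.Words;
using Aspose.Words.Tables;

public static class SelectiveAutoFit
{
    // Hypothetical heuristic: treats a table as "spanned" when its rows have
    // differing cell counts or any cell is marked as horizontally merged.
    public static bool LooksSpanned(Table table)
    {
        int firstRowCells = table.FirstRow != null ? table.FirstRow.Cells.Count : 0;
        foreach (Row row in table.Rows)
        {
            if (row.Cells.Count != firstRowCells)
                return true;

            foreach (Cell cell in row.Cells)
            {
                if (cell.CellFormat.HorizontalMerge != CellMerge.None)
                    return true;
            }
        }
        return false;
    }

    // Applies fixed column widths only to tables the heuristic does not flag,
    // leaving flagged tables exactly as they were imported.
    public static void AutoFitUnspannedTables(Document doc)
    {
        foreach (Table t in doc.GetChildNodes(NodeType.Table, true))
        {
            if (!LooksSpanned(t))
                t.AutoFit(AutoFitBehavior.FixedColumnWidths);
        }
    }
}
```

`SelectiveAutoFit.AutoFitUnspannedTables(document)` would replace the unconditional AutoFit loop. If the heuristic does not flag the affected table in template.zip, the result is the same as with the original loop.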