DOCX/DOC to HTML string conversion is not working correctly with Aspose.Words for .NET

Hello,

As per my client's requirement, I have to convert a Word document to an HTML string, and I am using the Aspose.Words library (trial version) for that.
I am using the following code for the conversion. It converts about 95% of the document content correctly. I have attached a sample document; please check it at your end as well.
After conversion, I am losing some of the Word document's formatting.
Is this a limitation of the Aspose.Words library, or is it happening because I am using the trial version?

string htmlString=string.Empty;
Aspose.Words.Saving.HtmlSaveOptions htmlSaveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
htmlSaveOptions.ExportImagesAsBase64 = true;
htmlSaveOptions.ExportHeadersFooters = true;
htmlSaveOptions.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.PerSection;
htmlSaveOptions.SaveFormat = Aspose.Words.SaveFormat.Html;
htmlSaveOptions.PrettyFormat = true;
Aspose.Words.Document doc = new Aspose.Words.Document(sourceFilePath);
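// Replace only the extension; splitting on '.' breaks for paths that contain other dots.
string destinationFileName = System.IO.Path.ChangeExtension(sourceFilePath, ".html");
// Save the document in HTML format.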
doc.Save(destinationFileName, htmlSaveOptions);
htmlString = System.IO.File.ReadAllText(destinationFileName);

I need confirmation on this so that I can tell my manager whether or not to purchase the library.

I will be waiting for your reply.

Thank you.

Hi there,

Thanks for your inquiry.

Raju Arge:
After conversion, I am losing some of the Word document's formatting.

Could you please share some details about the formatting issues you are facing in the output HTML? We will investigate the issue on our side and provide you with more information.

Thanks for the reply. I have attached the template file to the question. You can also check it from your side.

I got the following text after converting the document to an HTML string; you can compare it with “Article 8 - Claims” in the attached original Word document. In the following text, point 8.2 is displayed twice and the formatting is not correct. Moreover, on the left side of the original there is a rectangle containing some information, and that rectangle is also not displayed correctly.
======================================
Article 8 - Claims
8.1 The claims can be submitted through our form on our website (submit link). This form will be sent to the financial department of BI.
8.2
8
9
10
11
6.2
7
8
9
10
11
8.2
8.2 Payment of Speaker claims will take place within 30 days after receipt of your claims.
8.3 BI is not responsible for any VAT payments owed. The Speaker must ensure that VAT is handled correctly and paid.
Article 9 - Other provisions
9.1 If there are regulations within the Speaker’s institution or the organisation to which he/she is affiliated that require the explicit permission of the organisation or institution for the provision of service, then the Speaker shall obtain the statement enclosed to this agreement as Appendix 2, duly signed by a representative of his/her institution/organisation. If no such regulation exists within the institution or the organisation, then it will suffice to mention this on the form.
9.2 BI and the HCP guarantee that the Service provided under this agreement is lawful and the terms and conditions hereby agreed on comply with all applicable laws and regulations. This includes, but is not limited to:

=============================================

I request you to provide a solution for this problem, if you can.

Please let me know whether it is a limitation of the Aspose.Words library or an issue with the evaluation version.

I will be waiting for your reply.

Hi there,

Thanks for sharing the details. Please note that Aspose.Words mimics the behavior of MS Word. If you convert your document to HTML using MS Word, you will get the same output.

To work around this issue, please remove the hidden paragraphs and the list items without text from the document. Please see the following code example for your reference. Regarding the shape position issue, we suggest changing its position from left to right. Hope this helps.

Document doc = new Document(MyDir + "in.doc");
// ToArray() takes a snapshot so that paragraphs can be removed safely while iterating.
foreach (Paragraph par in doc.GetChildNodes(NodeType.Paragraph, true).ToArray())
{
    par.ParagraphFormat.LeftIndent = par.ParagraphFormat.LeftIndent + 40;
    if ((par.ParagraphBreakFont.Hidden && par.ToString(SaveFormat.Text).Trim() == "")
    || (par.IsListItem && par.ToString(SaveFormat.Text).Trim() == ""))
    {
        par.Remove();
        continue;
    }
    foreach (Run run in par.GetChildNodes(NodeType.Run, true).ToArray())
    {
        if (run.Font.Hidden)
            run.Remove();
    }
}
doc.Save(MyDir + "Out.html");

Thanks, Tahir.
I tried the posted code. It works to some extent, but it does not convert Word to an HTML string with 100% accuracy. If this problem remains after purchasing, it will be a problem for me.

I was previously doing the Word to HTML string conversion with OpenXML and faced the same issue. For that reason we planned to purchase a third-party tool for this purpose, but third-party tools also have limitations, like Aspose.Words :).

I have to discuss this with my manager. If they accept it, we will purchase it.

Do you have any other solution for this issue?

If you have any other solution that converts Word to an HTML string with 100% accuracy, please let me know.

Hi there,

Thanks for your inquiry. Please note that this is not an issue; Aspose.Words mimics the behavior of MS Word.

It would be great if you could share your expected output HTML here for our reference. We will investigate how you want your final output to be generated and then provide you with more information, along with code.

Thanks for the reply.

I have attached a Notepad file that contains the expected output HTML for the Word document to HTML conversion.

Please let me know if you have any queries. I will be waiting for your positive response.

Thank you.

Hi there,

Thanks for sharing the expected output HTML. Please open this HTML in a browser and check the output. Aspose.Words generates the same output.

Thanks for your reply.

The attached document contains HTML that I modified. Aspose.Words did not generate exactly the same output. There are some formatting issues in the HTML generated by Aspose.Words.

Please try to understand the issue. I am expecting the same HTML content that I attached earlier when I convert Word to an HTML string.

If you can do something about this, please let me know.

Thank you.

Hi there,

Thanks for your inquiry.

Raju Arge:
Issue 1: Point 8.2 is displayed twice
Issue 2: The rectangle is also not displayed correctly

For these two issues, please see my earlier post.

Raju Arge:
The formatting is also not correct

We have noticed that the highlight color shown in the attached images is not the same in the input and output documents. We have logged this problem in our issue tracking system as WORDSNET-12734. You will be notified via this forum thread once this issue is resolved. We apologize for the inconvenience.

If you are facing any other formatting issues, please share screenshots of the problematic sections of the output document. We will investigate the issue on our side and provide you with more information.

Hi there,

Thanks for your patience. Our product team has completed work on this issue (WORDSNET-12734) and concluded that the behavior you are observing is not a bug in Aspose.Words. We have therefore closed this issue as ‘Not a Bug’. We quote the developer's comments here for your reference.

In the customer's document, the paragraph is a list item and has a highlight color in its rPr properties. When exporting to HTML with HtmlSaveOptions.ExportListLabels = ExportListLabels.ByHtmlTags, we apply this highlight color as the li's background-color, so the background color occupies all the space to the right edge of the page. When exporting to HTML with HtmlSaveOptions.ExportListLabels = ExportListLabels.AsInlineText, we apply the highlight color as background-color only to the span that represents the list bullet.

Please use ExportListLabels.AsInlineText, as shown in the following code example, to get the desired output.

Document doc = new Document(MyDir + "in.doc");
Aspose.Words.Saving.HtmlSaveOptions options = new Aspose.Words.Saving.HtmlSaveOptions();
options.ExportListLabels = ExportListLabels.AsInlineText;
doc.Save(MyDir + "Out.html", options);

Please let us know if you have any more queries.

Hi, I am still facing issues with Aspose.Words.
=============================================
Original content :
Introduction …3
Purpose. 3 …3
Overview/Process. 3 …3
After conversion into html string:
Introduction
Purpose
Overview/Process

=======================================================

The Aspose.Words .NET library is not converting the table of contents exactly into the HTML string.

If you have a solution for this problem, please let me know.

Hi there,

Thanks for your inquiry. The HtmlSaveOptions.ExportTocPageNumbers property specifies whether to write page numbers to the table of contents when saving to HTML, MHTML and EPUB. The default value is false. Please set this property to true to get the required output.

Aspose.Words.Saving.HtmlSaveOptions options = new Aspose.Words.Saving.HtmlSaveOptions();
options.ExportListLabels = ExportListLabels.AsInlineText;
options.ExportTocPageNumbers = true;
doc.Save(MyDir + "Out.html", options);
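
Since the original question asks for an HTML string rather than a file, the same options can also be applied when saving to a stream, which avoids the temporary .html file. The following is only a minimal sketch: the class and method names (WordToHtml, ConvertToHtmlString) are hypothetical and not part of Aspose.Words, and it assumes the Document.Save(Stream, SaveOptions) overload and the default output encoding of HtmlSaveOptions.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

public static class WordToHtml
{
    // Hypothetical helper (not an Aspose.Words API): loads a Word file
    // and returns the exported HTML as a string without a temp file.
    public static string ConvertToHtmlString(string sourceFilePath)
    {
        Document doc = new Document(sourceFilePath);

        HtmlSaveOptions options = new HtmlSaveOptions();
        options.ExportImagesAsBase64 = true;                       // keep images inline in the HTML
        options.ExportListLabels = ExportListLabels.AsInlineText;  // option suggested earlier in this thread
        options.ExportTocPageNumbers = true;                       // keep page numbers in the TOC

        using (MemoryStream stream = new MemoryStream())
        {
            doc.Save(stream, options);
            stream.Position = 0;

            // StreamReader detects a byte order mark if one was written.
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

It could then be called as string html = WordToHtml.ConvertToHtmlString(sourceFilePath);. The option values are the ones discussed in this thread; adjust them to your document.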

Thanks for the reply. It is working, but I have other issues.
==================================

Original text:

1.1
(“Services”)
as requested by the Company from time to time.

1.2 Term. This Agreement will be in effect for a period of six (6) months or until terminated in accordance with this Agreement (the “Term”).
1.3 Performance and Time Commitment.
===================================================================
After converting the Word document to an HTML string:
===================================================================

1.1Scope. Consultant agrees to provide [DESCRIBE NATURE OF SERVICES] (“Services”) as requested by the Company from time to time.
1.2Term. This Agreement will be in effect for a period of six (6) months or until terminated in accordance with this Agreement (the “Term”).
1.3Performance and Time Commitment

=============================================

The Aspose.Words library is converting the numbered list to normal text.

Can you please help me with this issue?

Thank you.

Hi there,

Thanks for your inquiry. The HtmlSaveOptions.ExportListLabels property controls how list labels are output to HTML, MHTML or EPUB.

AsInlineText : Outputs all list labels as inline text.
ByHtmlTags : Outputs all list labels as HTML native elements.

If you want the list labels as native HTML elements, please use ExportListLabels.ByHtmlTags. However, with this option the highlight color issue appears. Please check this post for details.
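
To see this trade-off on a particular document, both modes can be exported side by side and the two results compared in a browser. This is a rough sketch only: the helper name, its parameters and the output file names are made up for illustration, while the two ExportListLabels values are the ones described above.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

public static class ListLabelComparison
{
    // Hypothetical helper: saves one HTML file per ExportListLabels mode
    // so that the list rendering can be compared visually.
    public static void ExportBothModes(string inputPath, string outputDir)
    {
        Document doc = new Document(inputPath);

        ExportListLabels[] modes =
        {
            ExportListLabels.AsInlineText, // labels written as plain inline text
            ExportListLabels.ByHtmlTags    // labels written as native HTML list elements
        };

        foreach (ExportListLabels mode in modes)
        {
            HtmlSaveOptions options = new HtmlSaveOptions();
            options.ExportListLabels = mode;

            // Illustrative file names, e.g. Out_AsInlineText.html and Out_ByHtmlTags.html.
            string outputPath = Path.Combine(outputDir, "Out_" + mode + ".html");
            doc.Save(outputPath, options);
        }
    }
}
```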

Please note that Aspose.Words mimics the behavior of MS Word. When processing HTML, some features might be lost. You can find a list of limitations on HTML export/import here:
Load in the HTML (.HTML, .XHTML, .MHTML) Format
Save in the HTML (.HTML, .XHTML, .MHTML) Format