Importing Word document and converting to HTML

Hi,

I am using Aspose.Words for .NET to import a Word document and save it as HTML.

I have specified the following settings for save options:

HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
options.SaveFormat = SaveFormat.Html;
options.CssStyleSheetType = CssStyleSheetType.Embedded;

However, when the document is converted, the CSS styles are exported, but the HTML tags are written without their attributes.

For example, in the Word document, an insertion is

<span class=msoIns><ins cite="mailto:User1" datetime="2014-01-22T15:34">added in ms word</ins></span>

which appears in the saved HTML as:

<ins><span class="msoIns">added in ms word</span></ins>

Note that the attributes of the “ins” tag are not exported.

Is there an option setting that exports all the metadata (including attributes, styles, etc.) when saving as HTML?

Hi,

Thanks for your inquiry. Could you please attach your input Word document and output HTML file here for testing? We will investigate the issue on our end and provide you with more information.

Best regards,

Hi,

Thanks for the response.

I have attached a zip file with the following documents:

  1. Sample content.docx - The Word docx file which is imported using Aspose.Words for .NET. This document has some content added and deleted, and a comment is attached to one piece of text.

  2. Aspose Conversion.html - The converted HTML

  3. Code snippet for import.txt - The code used for import

  4. revisions.jpg - This highlights the class within Aspose.Words.Document which contains the revisions information (i.e. which user inserted / deleted the content). However, this information is not being exported to HTML.

  5. comments.jpg - This highlights the class within Aspose.Words.Document which contains the comments information (i.e. a separate comment entered by highlighting some content and entering comment against it).

So both the revisions and the comments metadata are present in Aspose.Words.Document.

How can this metadata be linked with its corresponding content?

Is there any method which would get all metadata information when importing as HTML?

Hello,

Can you please give me an update on this?

Thank you.

Hi,

Thanks for the additional information. First, I suggest you read the following article on how to manage tracked changes:
https://docs.aspose.com/words/net/track-changes-in-a-document/

In your case, to mimic the Microsoft Word 2013 behavior, you can use the following code:

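// Requires: using Aspose.Words; using Aspose.Words.Saving;
Document doc = new Document(@"C:\Temp\Sample content.docx");
// Accept all tracked changes so that only the final text is exported.
doc.AcceptAllRevisions();
HtmlSaveOptions so = new HtmlSaveOptions(SaveFormat.Html);

doc.Save(@"C:\Temp\out.html", so);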

Regarding the Word comments not being exported to HTML, I have logged this issue in our bug tracking system. The ID of this issue is WORDSNET-9585. Your thread has also been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

Hi Awais,

Thanks for the reply.

I tried the code as suggested. However, when calling the following code:

doc.AcceptAllRevisions();

the revisions in the document are accepted.

This means I now cannot see which user made what changes to the content.

So calling this method does not solve the problem of linking each change to the user who made it.

I want to display the changes made by all users when the document is imported.

Hi,

Thanks for your inquiry. First, please refer to the attached “out-msw-2013.htm” file, which I generated using Microsoft Word 2013. You can see that revision data (and metadata) is not exported to HTML even by Microsoft Word. The HTML that Aspose.Words produces after calling the AcceptAllRevisions method is similar to what Microsoft Word generates. However, saving to HTML without calling AcceptAllRevisions exports only the text of the revisions.

Please refer to the Annotation Features Supported on HTML Export section.
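
If the goal is to show which user made each change, one possible approach is to read that information from the document model rather than from the saved HTML. The following is only a sketch: it assumes your Aspose.Words version exposes the Document.Revisions collection and Comment nodes, the class name is purely illustrative, and it has not been tested against the attached document.

```csharp
using System;
using Aspose.Words;

// Hypothetical helper class; the name is not part of Aspose.Words.
class ListRevisionsAndComments
{
    static void Main()
    {
        // Path taken from the earlier snippet in this thread.
        Document doc = new Document(@"C:\Temp\Sample content.docx");

        // Each tracked change carries its type, author, and timestamp.
        foreach (Revision revision in doc.Revisions)
        {
            // Style-definition revisions are attached to a style, not to a
            // node, so they have no parent node to read text from.
            if (revision.RevisionType == RevisionType.StyleDefinitionChange)
                continue;

            Console.WriteLine("{0} by {1} at {2}: {3}",
                revision.RevisionType, revision.Author, revision.DateTime,
                revision.ParentNode.GetText().Trim());
        }

        // Comments are ordinary nodes in the document tree.
        foreach (Comment comment in doc.GetChildNodes(NodeType.Comment, true))
        {
            Console.WriteLine("Comment by {0} at {1}: {2}",
                comment.Author, comment.DateTime, comment.GetText().Trim());
        }
    }
}
```

With the author and date in hand, you could write your own markup around the exported text, but how to match each revision to its position in the HTML is left open here.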

Please let me know if I can be of any further assistance.

Best regards,