Additional information when converting Word to HTML

Hi,

We are using Aspose.Words.dll version 10.0.0.0 to convert a Word document to an HTML string using the code below.

// Requires: using System.IO; using System.Text;
string htmlText = string.Empty;
Aspose.Words.Document doc = new Aspose.Words.Document(filepath);
Aspose.Words.Saving.HtmlSaveOptions saveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
saveOptions.ImagesFolder = tempDir;
saveOptions.CssStyleSheetType = Aspose.Words.Saving.CssStyleSheetType.Embedded;
saveOptions.SaveFormat = Aspose.Words.SaveFormat.Html;

using (MemoryStream htmlStream = new MemoryStream())
{
    doc.Save(htmlStream, saveOptions);
    // ToArray() returns only the bytes written; GetBuffer() also includes unused capacity.
    htmlText = Encoding.UTF8.GetString(htmlStream.ToArray());
}

When we get the HTML, it does not contain the following additional information:
1. A ProgID or any other information in the image tag showing that the image is an OLE object or embedded object in the Word document.
2. The height and width in the image tag do not include a unit such as px, pt, or inches.
3. The table tag does not contain its layout information, such as the table layout property.
Is there any way in Aspose to include this additional information in the HTML, or am I missing something when converting Word to HTML?
We need to fix this issue ASAP, so your quick help would be appreciated.

Thanks
Samanvay

Hi,
Thank you for your interest in Aspose.Words. First of all, please note that Aspose.Words was originally designed to work with MS Word documents, which is why some HTML features might be lost when processing HTML. You can find a list of limitations when exporting to HTML here:
https://docs.aspose.com/words/net/save-in-html-xhtml-mhtml-formats/
Regarding the issues you mentioned:

  1. Currently, there is no way to export information about OLE objects when exporting to HTML. However, you can try to work around this problem yourself using the following approach:
    a) Get all shapes from your document.
    b) Loop through all shapes and find the OLE objects.
    c) Save information about the OLE objects in a collection that will be used later, and replace the OLE objects (shapes) with placeholders that you will be able to find in the output HTML.
    d) Save the document as HTML.
    e) Read the HTML as a string and replace the placeholders with tags that will represent the OLE objects in your HTML.
  2. Aspose.Words outputs the image size to HTML using the width and height attributes of the img tag. According to the HTML specification, values in these attributes are either pixels or percentages. If no unit is specified, the value is in pixels:
    http://www.w3schools.com/tags/tag_img.asp
  3. I am not quite sure what you mean. Could you please clarify what layout information should be there? An example would be great.
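
The workaround in item 1 (steps a to e) could be sketched roughly as follows. This is only an illustration under some assumptions, not a tested solution for version 10.0.0.0: the class and method names, the [[OLE_PLACEHOLDER_n]] text, and the output tag (a span with a data-progid attribute) are made up for the example, and the Aspose.Words members used (GetChildNodes, Shape.OleFormat, OleFormat.ProgId) should be checked against the API of your version. Note that the sketch removes the OLE shape, so its preview image is not exported.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;
using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Saving;

public static class OleHtmlExportSketch // hypothetical helper class
{
    public static string ConvertWithOleInfo(string filePath, string imagesFolder)
    {
        Document doc = new Document(filePath);

        // Maps placeholder text to the ProgId of the OLE object it stands for.
        Dictionary<string, string> progIds = new Dictionary<string, string>();

        // a) Take a snapshot of all shapes, because the loop below removes nodes.
        Node[] shapes = doc.GetChildNodes(NodeType.Shape, true).ToArray();

        foreach (Node node in shapes)
        {
            Shape shape = (Shape)node;

            // b) Only OLE objects have an OleFormat; skip everything else.
            //    Shapes nested in a group shape are skipped to keep the sketch simple.
            if (shape.OleFormat == null || !(shape.ParentNode is Paragraph))
                continue;

            // c) Remember the ProgId and put a text placeholder where the shape was.
            //    The delimiters keep "..._1" from matching inside "..._10".
            string placeholder = "[[OLE_PLACEHOLDER_" + progIds.Count + "]]";
            progIds[placeholder] = shape.OleFormat.ProgId;
            shape.ParentNode.InsertBefore(new Run(doc, placeholder), shape);
            shape.Remove();
        }

        // d) Save the document as HTML, using the same options as in the question.
        HtmlSaveOptions saveOptions = new HtmlSaveOptions();
        saveOptions.ImagesFolder = imagesFolder;
        saveOptions.CssStyleSheetType = CssStyleSheetType.Embedded;
        saveOptions.SaveFormat = SaveFormat.Html;

        string html;
        using (MemoryStream stream = new MemoryStream())
        {
            doc.Save(stream, saveOptions);
            html = Encoding.UTF8.GetString(stream.ToArray());
        }

        // e) Swap the placeholders for tags that carry the OLE information.
        return ReplacePlaceholders(html, progIds);
    }

    // Pure string step: replaces each placeholder with a hypothetical marker tag.
    public static string ReplacePlaceholders(string html, IDictionary<string, string> progIds)
    {
        foreach (KeyValuePair<string, string> entry in progIds)
        {
            string tag = "<span class=\"ole-object\" data-progid=\""
                         + WebUtility.HtmlEncode(entry.Value) + "\"></span>";
            html = html.Replace(entry.Key, tag);
        }
        return html;
    }
}
```

It would be called as string htmlText = OleHtmlExportSketch.ConvertWithOleInfo(filepath, tempDir); and you can change the tag built in ReplacePlaceholders to whatever representation your application expects.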
Best regards,