Document Conversion from DOC to HTML and Vice Versa

Dear Mates,


When I convert a DOC file to HTML, some of its properties are removed automatically, and when I convert the HTML back to DOC it loses even more properties, such as headers, footers, hyperlinks to notes, etc.

Could somebody provide the code, or the properties that need to be set, for a correct conversion?

The code I currently use is given below:

string path = CommonPath + ".docx";


Document docObj = new Document(path);
List<string> imgUris = new List<string>();
NodeCollection shapes = docObj.GetChildNodes(NodeType.Shape, true);
foreach (Shape shape in shapes)
{
    if ((shape.HasImage || shape.CanHaveImage) && (!shape.Name.ToString().Contains("WaterMark") && !shape.Name.ToString().Contains("Text Box")))
    {
    }
}
string newURL = path.Remove(path.Length - 4) + "html";

/****** Save to normal HTML with resolved shape and diagram issue but regular expression problem will remain **********/
HtmlSaveOptions so = new HtmlSaveOptions();
so.SaveFormat = SaveFormat.Html;
so.ExportHeadersFootersMode = ExportHeadersFootersMode.FirstPageHeaderFooterPerSection;
so.ExportImagesAsBase64 = true;
so.ExportRoundtripInformation = true;
so.ExportDocumentProperties = true;
so.Encoding = System.Text.Encoding.Unicode;
so.ExportXhtmlTransitional = true;
so.ExportPageSetup = true;
so.ExportTocPageNumbers = true;
so.ExportHeadersFooters = true;
so.ImageSavingCallback = new HandleImageSaving();

docObj.Save(newURL, so);

Hi there,


Thanks for your inquiry. Please share the following details so that we can investigate:

  • Please attach your input Word document.
  • Please attach the output Word/Html file that shows the undesired behavior.
  • Please point out the problematic sections of the output document.

Unfortunately, it is difficult to say what the problem is without the documents. We need your documents to reproduce the problem. As soon as you get these pieces of information to us, we’ll start our investigation into your issue.