Dear Mates,
When I convert a DOC file to HTML, some of its properties are removed automatically, and when I convert the HTML back to DOC it loses even more of them, such as headers and footers, hyperlinks to notes, etc.
Could somebody provide the code, or the correct properties to set, for a faithful conversion?
The code I currently use is given below:
string path = CommonPath + ".docx";
Document docObj = new Document(path);
List<string> imgUris = new List<string>();
NodeCollection shapes = docObj.GetChildNodes(NodeType.Shape, true);
foreach (Shape shape in shapes)
{
    // Consider only image-capable shapes that are neither watermarks nor text boxes.
    if ((shape.HasImage || shape.CanHaveImage) && !shape.Name.Contains("WaterMark") && !shape.Name.Contains("Text Box"))
    {
    }
}
string newURL = path.Remove(path.Length - 4) + "html";
// Save as plain HTML. This resolves the shape and diagram issue, but the regular expression problem remains.
HtmlSaveOptions so = new HtmlSaveOptions();
so.SaveFormat = SaveFormat.Html;
so.ExportHeadersFootersMode = ExportHeadersFootersMode.FirstPageHeaderFooterPerSection;
so.ExportImagesAsBase64 = true;
so.ExportRoundtripInformation = true;
so.ExportDocumentProperties = true;
so.Encoding = System.Text.Encoding.Unicode;
so.ExportXhtmlTransitional = true;
so.ExportPageSetup = true;
so.ExportTocPageNumbers = true;
so.ExportHeadersFooters = true;
so.ImageSavingCallback = new HandleImageSaving();
docObj.Save(newURL, so);