Embed Images as Base64 Strings during Word DOCX to HTML Conversion and Save HTML File Back to DOCX Roundtrip C# .NET

Hi there,
I need to build a Word-to-HTML-to-Word editor, and I use CKEditor and Aspose.Words to do this. I found that the formatting of the resulting document differs from the original. My users accept that the two documents won't be identical, but one difference is unacceptable: some paragraphs are taller than in the source.
The code that exports to HTML is as follows:
if (!IsPostBack)
{
    string fileName = Request["srcFile"];
    hidFileName.Value = fileName;
    Document doc = new Document("D:\\Temp\\" + fileName);

    // Export to HTML with images embedded as Base64 and round-trip information included.
    Aspose.Words.Saving.HtmlSaveOptions hso = new Aspose.Words.Saving.HtmlSaveOptions();
    hso.ExportImagesAsBase64 = true;
    hso.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.None;
    hso.SaveFormat = Aspose.Words.SaveFormat.Html;
    hso.PrettyFormat = true;
    hso.MetafileFormat = Aspose.Words.Saving.HtmlMetafileFormat.EmfOrWmf;
    hso.ExportListLabels = Aspose.Words.Saving.ExportListLabels.AsInlineText;
    hso.ExportRoundtripInformation = true;

    // Save to memory and read the HTML back as a string.
    MemoryStream ms = new MemoryStream();
    doc.Save(ms, hso);
    ms.Position = 0;
    StreamReader sr = new StreamReader(ms);
    string strContent = sr.ReadToEnd();

    // Cut one block out of the HTML before handing it to the editor.
    int removeStart = strContent.IndexOf("", StringComparison.OrdinalIgnoreCase);
    int removeEnd = strContent.IndexOf("", removeStart + 1, StringComparison.OrdinalIgnoreCase);
    myEditor.InnerHtml = strContent.Substring(0, removeStart - 1) + strContent.Substring(removeEnd + 8);
}
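
The trimming step at the end of the snippet above relies on hard-coded offsets (`- 1` and `+ 8`) and does not check whether `IndexOf` found anything. A minimal sketch of a more defensive version follows, using only `System`. The class name `HtmlTrimmer`, the method name `RemoveBetween`, and its parameters are hypothetical names made up for this illustration and are not part of Aspose.Words. The marker strings are left to the caller because the original ones did not survive in the post.

```csharp
using System;

public static class HtmlTrimmer
{
    // Hypothetical helper: removes the first region that begins with startMarker
    // and ends with endMarker (both markers included).
    // Returns the input unchanged if either marker is missing.
    public static string RemoveBetween(string html, string startMarker, string endMarker)
    {
        if (string.IsNullOrEmpty(html) ||
            string.IsNullOrEmpty(startMarker) ||
            string.IsNullOrEmpty(endMarker))
        {
            return html;
        }

        int start = html.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return html;
        }

        // Look for the end marker only after the start marker.
        int end = html.IndexOf(endMarker, start + startMarker.Length,
                               StringComparison.OrdinalIgnoreCase);
        if (end < 0)
        {
            return html;
        }

        // Use the marker's own length instead of a hard-coded offset.
        return html.Substring(0, start) + html.Substring(end + endMarker.Length);
    }
}
```

With placeholder markers, `HtmlTrimmer.RemoveBetween(strContent, "<x>", "</x>")` would drop the first `<x>...</x>` region. Unlike the original snippet, it keeps the character just before the start marker and leaves the string untouched when a marker is not found.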

The code that writes the output file is:

        Aspose.Words.Document doc = new Aspose.Words.Document();
        Aspose.Words.DocumentBuilder builder = new Aspose.Words.DocumentBuilder(doc);

        // Insert the edited HTML from the editor into the new document, wrapped in a bookmark.
        builder.StartBookmark("MyBookmark");
        builder.InsertHtml(myEditor.InnerText);
        builder.EndBookmark("MyBookmark");

        string fileName = hidFileName.Value.Replace("Temp", "");
        doc.Save("D:\\Temp\\"+fileName, SaveFormat.Docx);


Do you have any idea how to fix this? Temp.zip (36.9 KB)

In the attachment, TempXXX.docx is the source file and “991107XXX.doc” is the result file created by Aspose.Words.

@Tyan,

Have you tried the latest version of Aspose.Words for .NET, i.e. 18.6?

In case the problem still remains, please ZIP and attach the following resources here for testing:

  • Your simplified input Word document
  • Aspose.Words generated intermediate HTML output file
  • Aspose.Words generated final Word document showing the undesired behavior.

As soon as you have these resources ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

Hi there,
I used the latest version, Aspose.Words for .NET 18.6.0, and the problem still remains.
Temp.zip (36.9 KB)

@Tyan,

Both DOCX files in your Temp.zip were generated by using Aspose.Words for .NET 18.6.

But we need the following information to reproduce the issue with Aspose.Words on our end:

  • Your simplified input Word document
  • Aspose.Words generated intermediate HTML output file (please save the HTML string into an HTML file)
  • Aspose.Words generated final Word document showing the undesired behavior.
  • A comparison screenshot highlighting (encircling) the problematic areas in the final Aspose.Words generated DOCX file, attached here for our reference.

Thanks for your cooperation.

Hi There,
Thanks for your reply.

Temp.zip (1.5 MB)

@Tyan,

We converted your “Template.docx” to HTML by using the following code:

Document doc = new Document("D:\\temp\\Template.docx");

Aspose.Words.Saving.HtmlSaveOptions hso = new Aspose.Words.Saving.HtmlSaveOptions();
hso.ExportImagesAsBase64 = true;
hso.ExportHeadersFootersMode = Aspose.Words.Saving.ExportHeadersFootersMode.None;
hso.SaveFormat = Aspose.Words.SaveFormat.Html;
hso.PrettyFormat = true;
hso.MetafileFormat = Aspose.Words.Saving.HtmlMetafileFormat.EmfOrWmf;
hso.ExportListLabels = Aspose.Words.Saving.ExportListLabels.AsInlineText;
hso.ExportRoundtripInformation = true;

doc.Save("D:\\Temp\\18.6.html", hso);

After that we again converted the intermediate HTML file to DOCX by using the following code:

Document doc = new Document("D:\\temp\\18.6.html");            
doc.Save("D:\\Temp\\18.6-final.docx");

See Word-HTML-Word.zip (48.4 KB)

We have spotted a few problems and logged an issue in our issue tracking system. The ID of this issue is WORDSNET-17031.

Please also create a comparison screenshot highlighting (encircling) the problematic areas between the source document (Template.docx) and the final Aspose.Words generated DOCX file (18.6-final.docx), and attach it here for our reference. Thanks for your cooperation.

Hi There,
Thanks for your help. The attachment contains a screenshot of the differences between the two Word documents.

BRs,
Tyan
difference.zip (576.4 KB)

@Tyan,

Thanks for the additional information. We will inform you via this thread as soon as this issue is resolved.