Conversion of Microsoft Word .doc to .mhtml

Hello,

We were experimenting with converting a Word document to HTML. We have been successfully converting .doc to .pdf for years now using Aspose, but may have a need for HTML as well. We have been doing this from C++, as follows:

LPDISPATCH disp = m_ComHelper.Open(strDocFileName);
if (!disp)
    throw pExc;

_AsposeDocument wordDoc(disp);

wordDoc.Save(strFileName); // this is a .pdf file name

For conversion to HTML, I changed strFileName to a .mhtml file name.

The conversion succeeded, but there were problems with the header. I have attached the original .doc file and the resulting .mhtml file. (I had to rename the .mhtml file to .txt to be able to attach it.)
Are there additional settings I need to configure to get it to convert correctly?

We are using Aspose.Words version 11.7.0.

Thank you.
Tom Edwards
DR Systems, Inc.

Hi Tom,

Thanks for your inquiry.

It is hard to output headers and footers meaningfully to HTML because HTML-based formats are not paginated. By default, Aspose.Words exports only the primary headers and footers, at the beginning and end of each section. However, you can achieve the required result by exporting only the primary header of the first section at the beginning of the MHTML and the primary footer of the last section at the end. Please see the following code:

Document doc = new Document(@"C:\Temp\2013113LVDPQJZL.doc");
HtmlSaveOptions so = new HtmlSaveOptions(SaveFormat.Mhtml);
so.ExportHeadersFootersMode = ExportHeadersFootersMode.FirstSectionHeaderLastSectionFooter;
doc.Save(@"C:\Temp\out.mhtml", so);
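To illustrate the difference between the default behavior and FirstSectionHeaderLastSectionFooter, here is a toy C++ model. It is not Aspose code and does not call the Aspose API. A document is reduced to a list of sections, each with a primary header, body and footer, and flattening it into one unpaginated stream either repeats the header and footer around every section (the default described above) or keeps only the first section's header and the last section's footer. The `Section` struct, `HeaderFooterMode` enum and `flatten` function are hypothetical names used only for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified section: one primary header, body and footer each.
struct Section {
    std::string header;
    std::string body;
    std::string footer;
};

// Names mirror Aspose.Words' ExportHeadersFootersMode members for illustration only;
// the numeric values here are NOT the library's values.
enum class HeaderFooterMode { None, PerSection, FirstSectionHeaderLastSectionFooter };

// Flattens paginated sections into one continuous (unpaginated) sequence of blocks.
std::vector<std::string> flatten(const std::vector<Section>& sections, HeaderFooterMode mode) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < sections.size(); ++i) {
        const Section& s = sections[i];
        const bool first = (i == 0);
        const bool last = (i + 1 == sections.size());

        // PerSection: header before every section; otherwise only before the first one.
        if (mode == HeaderFooterMode::PerSection ||
            (mode == HeaderFooterMode::FirstSectionHeaderLastSectionFooter && first))
            out.push_back(s.header);

        out.push_back(s.body);

        // PerSection: footer after every section; otherwise only after the last one.
        if (mode == HeaderFooterMode::PerSection ||
            (mode == HeaderFooterMode::FirstSectionHeaderLastSectionFooter && last))
            out.push_back(s.footer);
    }
    return out;
}
```

For a two-section document, PerSection yields header 1, body 1, footer 1, header 2, body 2, footer 2, while FirstSectionHeaderLastSectionFooter yields header 1, body 1, body 2, footer 2, which is the single header/footer pair the option above aims for.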

I hope this helps.

Best regards,

Hello Awais,

I got pulled onto other projects, so I did not get to try this until now.

We use the C++ API, not C#, so I had to modify the code you provided. Unfortunately, the results were the same. Here is my code:

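// Backslashes in C++ string literals must be escaped ("c:\2013..." would be read as an octal escape).
LPDISPATCH disp = m_ComHelper.Open("c:\\2013113LVDPQJZL.doc");
_Document wordDoc(disp);
CString strMhtmlFileName = "c:\\2013113LVDPQJZL.mhtml";
_HtmlSaveOptions *pso = new _HtmlSaveOptions();
pso->SetSaveFormat(12);
pso->SetExportHeadersFootersMode(2);
wordDoc.Save_3(strMhtmlFileName, *pso);
delete pso; // release the options object allocated with new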

I looked in your online documentation and found that 12 is the SaveFormat enumeration value for HTML (none was listed for MHTML). I also found that 2 is the ExportHeadersFootersMode enumeration value for FirstSectionHeaderLastSectionFooter.

I did this using Aspose.Words version 13.12.0.

I have attached the .doc and .mhtml files for your reference. (I had to change the .mhtml extension to .txt so that it could be attached; just rename it back to .mhtml.)

Any ideas?

Thank you.
Tom Edwards
DR Systems, Inc.

Hi Tom,

Thanks for your inquiry. The SaveFormat enumeration value for HTML is 50 and for MHTML is 51, so passing 12 does not select HTML:
https://reference.aspose.com/words/net/aspose.words/saveformat/

You can also find the ExportHeadersFootersMode enumeration values at the following link. In this case, I think you should use the ExportHeadersFootersMode.None option:
https://reference.aspose.com/words/net/aspose.words.saving/exportheadersfootersmode/
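Since the COM wrappers take plain integers, one way to avoid mixing up these values is to give them names in your C++ code. The sketch below is an assumption-laden illustration, not part of the Aspose API: the namespace `aspose_values` and the helper `saveFormatForPath` are hypothetical, the values 50, 51 and 2 are the ones quoted in this thread, and `ExportHeadersFootersMode_None = 0` is an assumption to verify against the reference page linked above. The helper derives the format from the output file extension so the value passed to SetSaveFormat matches the file actually being written.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>

namespace aspose_values {
// Values quoted in this thread: HTML = 50, MHTML = 51.
enum SaveFormat : long { SaveFormat_Html = 50, SaveFormat_Mhtml = 51 };
// FirstSectionHeaderLastSectionFooter = 2 is quoted in this thread;
// None = 0 is an ASSUMPTION, check it against the ExportHeadersFootersMode reference.
enum ExportHeadersFootersMode : long {
    ExportHeadersFootersMode_None = 0,
    ExportHeadersFootersMode_FirstSectionHeaderLastSectionFooter = 2
};
}  // namespace aspose_values

// Hypothetical helper: maps an output path's extension to the matching save format.
aspose_values::SaveFormat saveFormatForPath(const std::string& path) {
    const std::size_t dot = path.find_last_of('.');
    const std::size_t slash = path.find_last_of("\\/");
    // Reject paths with no extension, or whose last dot belongs to a directory name.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        throw std::invalid_argument("no file extension: " + path);

    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (ext == "html" || ext == "htm") return aspose_values::SaveFormat_Html;
    if (ext == "mhtml" || ext == "mht") return aspose_values::SaveFormat_Mhtml;
    throw std::invalid_argument("unsupported extension: " + ext);
}
```

The result (for example `saveFormatForPath("c:\\2013113LVDPQJZL.mhtml")`, which yields 51) would then be passed to SetSaveFormat in place of the literal 12, and `aspose_values::ExportHeadersFootersMode_None` to SetExportHeadersFootersMode, once its value has been confirmed.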

I hope this helps.

Best regards,