Converting .doc to mhtml email

We are building an application that will convert an uploaded Word document into an email in MHTML format. We are planning on using your product to handle the server-side conversion.

However, in our testing, converting a .doc to email gives a significantly different result than MS Word does on the same task.

I will upload the original doc and two screen captures to show the difference. We are wondering…

  1. Are we doing something wrong?

  2. Can we configure the application to give a better result?

  3. Can anything be done on your end to improve the conversion to MHTML?

The attached files are zipped and show the difference.

If you think your product will not be capable of giving the conversion quality we seek, could you be so kind as to say so, so that we can move on to another solution?

Hello!

Thank you for your inquiry.

I’ll take a look and provide you with more information on this case shortly.

Regards,

Hello!

Please ensure you are using the latest version of Aspose.Words. I’m getting different results. MHTML conversion involves HTML, which has some restrictions. We can help you get better results with some document refactoring, but it is difficult to promise high fidelity for arbitrary input documents. Let’s look at particular issues to go further.

Regards,

Thanks for the reply. I will give you a good example with the attached document.

We find that when we use this one, the text font and the spacing of the text in the Aspose output differ from the original, yet Word keeps the font style in its conversion.

We find Aspose changes it to a font that looks like Times New Roman. Any suggestions?

Regards,
http://www.interactivewebs.com

Hello!

Thank you for the new materials. I was unable to reproduce the font problem that you are describing. I can send you my conversion results. The output looks very good. You can either give me an e-mail address or I can attach it here.

Regards,

My name is David.

Would you mind sending the result to us AT interactivewebs.com.au - (please interpolate the address)

Hello David.

Thank you for your cooperation.

I’m attaching (to an e-mail) an MHTML file and two raster pictures that are useful for visual comparison. I see that the Verdana font looks slightly different, but I got the same results with MS Word when I saved the DOC as HTML and opened it in the browser. I tried Internet Explorer 7.0 and Avant Browser 11.5 (they use the same engine). We cannot force fonts to look absolutely identical in MS Word and in other applications. That’s one of the major reasons why we cannot guarantee 100% fidelity between MS Word documents and what we get after conversion to HTML, MHTML and PDF. These formats are not “native” for MS Word and live by other rules.

I found that the bulleted list in our MHTML has less line spacing and less spacing after the bullets. The other difference is that our MHTML is shown at the left of the browser window, but the one produced by MS Word is centered. These differences can be considered, but they need some MS Word specific “magic” in the HTML code. Our design principle is to avoid it wherever possible because of complaints about the non-standard techniques in HTML produced by MS Word. It exports almost everything that can be in documents, but you can see how that is achieved if you open its HTML as text. In particular, there are no attributes or style modifiers in HTML to control the indent after list bullets.

Please let me know if we can help you further.

Regards,

Thanks for the email and the information. Our goal is to build a process for Word mail merge that runs server side without needing Word to be installed on the server. Naturally we want the server to produce a result as close as possible to what MS Word would give locally.

I understand that your goal is a little more refined, in that correct end code is more important than blind replication.

So I would like to check a few things.

  1. Is there any way to tweak the product so that its end result matches Word’s end result more exactly?

  2. If not, can you suggest any other items that are likely to cause issues, perhaps fonts and formatting that you know have a differing result?

  3. Any other tips or suggestions you have?

Hello David!

This is one of the most typical tasks for Aspose.Words: mail merge and saving to various formats on the server. We try to produce output as close as possible to what we get from MS Word. But some MS Word features don’t map directly to “non-native” formats. I have already given examples related to HTML/MHTML. For some features the HTML standard doesn’t even define mandatory behavior and leaves degrees of freedom to visual agents (browsers). As you know, different browsers can render the same page differently.

I can take a look at what happens with line spacing. But we cannot control spacing after bullets unless we use the MS Word approach to outputting lists, and that is what we don’t like. For these issues I cannot recommend any parameter tweaks. Perhaps you can get closer fidelity by simulating that list with non-list paragraphs and tabs. But I think that is unnecessarily complex and inconvenient.

We don’t have full information on which items potentially cause issues. Here is a spreadsheet showing the level of import and export support for HTML and PDF (link below). But it states only that something is supported or partially supported, without notes on rendering fidelity. We plan to improve this part of the documentation and will think about fidelity. The better way is to experiment with real documents and discuss particular differences here in the forum.

Regards,

Thanks for that info. It is helpful. - Now we are a developer license client, so the support has taken us that far.

We can see that most of the failures we get also fail in Word, or look other than expected in MS Word’s own MHTML output.

Test 5 is attached: a standard MS Word 2007 template saved by Word to MHTML (good result), and the result we get out of Aspose, attached as a msg file. Not what we expected on this one.

I also notice that messages converted from .docx files are including some strange text at the beginning of the message "

Any comment on that?

Hello!

Thank you for your inquiry.

I see that the msg file is not MHTML at all. You should probably specify SaveFormat.Mhtml. To be absolutely sure, you can share your code here for inspection.
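A quick way to check what kind of file you actually have is to look at its first bytes. The following is a minimal sketch in Python using only the standard library, not part of Aspose.Words. The function name `sniff_format` and the returned labels are hypothetical, chosen for illustration. It relies on three facts: OLE compound files (binary .doc and Outlook .msg) start with a fixed 8-byte signature, ZIP containers such as .docx start with `PK\x03\x04`, and an MHTML file is a plain-text MIME message that begins with headers.

```python
# Signature of an OLE compound file (used by binary .doc and Outlook .msg).
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP local file header (used by .docx).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_format(data: bytes) -> str:
    """Return a rough container type for the given file content.

    The labels are illustrative only: 'ole-compound', 'zip', 'mime', 'unknown'.
    """
    if data.startswith(OLE_MAGIC):
        return "ole-compound"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    # MHTML is a MIME message, so its headers appear as text near the top.
    head = data[:4096].decode("ascii", errors="replace").lower()
    if "mime-version:" in head and "content-type:" in head:
        return "mime"
    return "unknown"
```

For example, reading a file with `open(path, "rb").read()` and passing the bytes to `sniff_format` would report `ole-compound` for an Outlook .msg file and `mime` for MHTML output. It cannot tell a .doc from a .msg, since both use the same container.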

Floating objects are not supported in HTML/MHTML export. But your source document is designed this way: everything is put into floating boxes. This issue is known in our defect database as #1001. Currently it is considered as by-design behavior. As a workaround you can switch to using flow content.

Please clarify the other points. What do you mean by “also fail in Word”? We cannot promise you better functional coverage than MS Word has. It would be great if you could provide some samples.

I also cannot guess exactly what happens to documents when you are converting from DOCX. You wrote that some unexpected characters appear at the beginning. Would you explain how you reproduce this?

Regards,

Thanks for your reply. We are still testing.

Here is a strange one. This test was from an MS word document. When sent via our email module application it worked fine.

One email address received fine in outlook 2007.

The other as you will see has STRIKE THROUGH on about 50% of the email.

Any idea on the cause of that?

Hello!

Thank you for these materials. But I cannot guess what format it is: neither MHTML nor DOC. Please share the code you are using to obtain these documents and explain what you are getting next.

Regards,

The attachments were in MS email format. Any recent MS email application should open them.

In any case, here is a strange one.

The attached Word documents, in both .doc and .docx, add a strange



to the beginning of the converted mhtml messages. Can you suggest a fix for this?

Incidentally, we notice that the MS Word conversion to MHTML for the same documents does not produce this.

Hello!

Thank you for additional materials.

Here is what I get after conversion (beginning):

MIME-Version: 1.0
Content-Type: text/html;
charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Content-Location: document.html
=EF=BB=BF

<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dutf-8" =
/>
<p style=3D"margin-left:0pt; margin-right:0pt; margin-top:0pt; margin-bo=
ttom:10pt; line-height:115%; font-size:11pt; "><span style=3D"font-family:'=
Calibri'; font-size:11pt; ">Here is your profile information that might be =
useful

The file itself has no byte order mark (BOM) since it is always ASCII. But the encoded HTML is UTF-8 by default and is preceded by the UTF-8 BOM, which appears as =EF=BB=BF in the snippet. Is this what you mean? It is part of the standard and should be handled properly by consumer applications. Perhaps your application doesn’t treat the BOM as a BOM? You can remove this sequence after conversion.
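Removing the sequence after conversion can be sketched as follows. This is a minimal illustration in Python using only the standard library, not an Aspose.Words feature. The function name `strip_qp_bom` is hypothetical. It assumes the HTML part is quoted-printable encoded, as in the snippet above, so that the BOM shows up as the literal text `=EF=BB=BF` immediately after the blank line that separates the MIME headers from the body.

```python
import re

# Quoted-printable spelling of the UTF-8 BOM (bytes EF BB BF), matched only
# when it directly follows the blank line that ends a block of MIME headers.
_QP_BOM_AT_BODY_START = re.compile(r"(\r?\n\r?\n)=EF=BB=BF", re.IGNORECASE)


def strip_qp_bom(mhtml_text: str) -> str:
    """Remove the first quoted-printable UTF-8 BOM found at the start of a body.

    Text without such a BOM is returned unchanged.
    """
    return _QP_BOM_AT_BODY_START.sub(r"\1", mhtml_text, count=1)
```

Only the first match is removed, and only at the start of a body, so the same character sequence appearing later in the message content is left alone.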

Please let me know if I misunderstand your question.

Regards,

Thanks for your help so far.
We have a particular issue with .doc or .docx files that include inserted images that are rotating banner images in .gif format.
For example, http://www.sdia.org.au/ has some banners that rotate.

Copy and paste the URL of one of these banner images into a Word document. When we do a mail merge in Word and send, the images are preserved as rotating .gif files.

When we send via an Aspose conversion, the images are converted into static .gif files showing only the first image in the rotation.

Is there anything we can do to preserve the animation of .gif files?

Hi

Thanks for your inquiry. I managed to reproduce this issue on my side. I will consult with our developers and provide you more information.

Best regards.

Hi

I created a new issue, #5917, in our defect database. The problem occurs because GIFs are stored in the Aspose document model as PNG, so the animation is lost.
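Until this is resolved, it may help to know in advance which source images are animated, so that the affected documents can be identified before conversion. The following is a minimal sketch in Python using only the standard library, and it is not part of Aspose.Words. The function names `count_gif_frames` and `is_animated_gif` are hypothetical. It walks the block structure of a GIF file and counts the image descriptors; a GIF with more than one image is treated as animated. It assumes a well-formed file and will raise an error on truncated data.

```python
def count_gif_frames(data: bytes) -> int:
    """Count the image frames in a GIF by walking its block structure."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Logical screen descriptor: 7 bytes after the 6-byte header.
    packed = data[10]
    pos = 13
    if packed & 0x80:  # global color table present
        pos += 3 * (2 << (packed & 0x07))
    frames = 0
    while pos < len(data):
        block = data[pos]
        pos += 1
        if block == 0x3B:  # trailer: end of the GIF stream
            break
        if block == 0x21:  # extension block: skip its label byte
            pos += 1
        elif block == 0x2C:  # image descriptor: one frame
            frames += 1
            local = data[pos + 8]
            pos += 9
            if local & 0x80:  # local color table present
                pos += 3 * (2 << (local & 0x07))
            pos += 1  # LZW minimum code size
        else:
            raise ValueError("unexpected block type in GIF data")
        # Skip the data sub-blocks; a zero-length sub-block ends the chain.
        while data[pos] != 0:
            pos += data[pos] + 1
        pos += 1
    return frames


def is_animated_gif(data: bytes) -> bool:
    """Treat any GIF with more than one frame as animated."""
    return count_gif_frames(data) > 1
```

Passing the bytes of each downloaded banner image to `is_animated_gif` would flag the images whose animation is lost in the converted output.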

Best regards.

Any indication on a fix time or even if it will be fixed?

Any way as an enterprise client we can push this one through… it is kind of an important one for us?

oh… and thanks for the great help this far!

Hi

Thanks for your inquiry. Currently I can’t provide you with any estimate. But I can tell you that we can’t fix this in the short term. We must figure out how GIFs are stored in Word documents and do the same in Aspose.Words. This is not as easy to achieve as it sounds.

Best regards.