Free Support Forum - aspose.com

UTF-8 encoding in MHT is not detected when charset declaration is split across two lines

We are using Aspose to convert emails to PDFs, using intermediate MHT files.

With Aspose.Words for .NET 14.2, we have noticed that in some cases the encoding of an MHT file is not detected and the default Windows-1252 encoding is used instead, which results in garbled text in the output PDF.

This is the code we use to generate PDFs from MHTs:

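// Load the MHT file; the format and encoding are detected by Aspose.Words on load.
var doc = new Aspose.Words.Document(mhtInputPath);
// Render the loaded document to PDF using default save options.
doc.Save(pdfOutputPath, new Aspose.Words.Saving.PdfSaveOptions());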

I have attached two MHT files - one that is rendered to PDF correctly and one that is not rendered to PDF correctly. I have also attached the resulting PDF files.

The only difference between the two MHTs is where the charset declaration in the meta tag is split across two lines.

In the MHT file that is rendered successfully, the meta tag appears as follows:

<meta http-equiv=3D"Content-Type" content=3D"text/html; chars=
et=3Dutf-8">

In the MHT file that is not rendered successfully, the meta tag appears as follows:

<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Du=
tf-8">


I believe that they should both be considered valid. Internet Explorer can open both of the MHTs successfully.
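To illustrate why the two forms should be equivalent, here is a minimal sketch of quoted-printable decoding, assuming the MHT part uses quoted-printable transfer encoding (the `=3D` escapes and the trailing `=` at the end of each line suggest it does). The class name `QuotedPrintableSketch` and its `Decode` method are hypothetical helpers written for this post. They are not part of Aspose.Words and are not meant to show how Aspose.Words parses MHT internally.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedPrintableSketch
{
    // Hypothetical helper: drops soft line breaks ("=" followed by a line
    // ending) and turns "=XX" hex escapes into the byte they encode.
    public static string Decode(string input, Encoding encoding)
    {
        var bytes = new List<byte>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '=' && i + 1 < input.Length)
            {
                // Soft line break: "=\r\n" or "=\n" is removed entirely.
                if (input[i + 1] == '\r' && i + 2 < input.Length && input[i + 2] == '\n')
                {
                    i += 3;
                    continue;
                }
                if (input[i + 1] == '\n')
                {
                    i += 2;
                    continue;
                }
                // Hex escape such as "=3D", which encodes the '=' character.
                if (i + 2 < input.Length
                    && Uri.IsHexDigit(input[i + 1])
                    && Uri.IsHexDigit(input[i + 2]))
                {
                    bytes.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                    i += 3;
                    continue;
                }
            }
            // Assumes every other character is plain ASCII, as in the samples here.
            bytes.Add((byte)c);
            i++;
        }
        return encoding.GetString(bytes.ToArray());
    }

    public static void Main()
    {
        // The two meta tags from the attached MHT files, with CRLF line endings assumed.
        string working = "<meta http-equiv=3D\"Content-Type\" content=3D\"text/html; chars=\r\net=3Dutf-8\">";
        string failing = "<meta http-equiv=3D\"Content-Type\" content=3D\"text/html; charset=3Du=\r\ntf-8\">";

        string a = Decode(working, Encoding.ASCII);
        string b = Decode(failing, Encoding.ASCII);

        Console.WriteLine(a);
        Console.WriteLine(a == b); // both tags decode to the same text
    }
}
```

Once the soft line breaks are removed, both tags read `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">`, so the position of the break should not affect charset detection.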

Thanks,

Reuben

Hi Reuben,

Thanks for your inquiry.

I have tested the scenario and reproduced the issue on my side. I have logged it in our issue tracking system as WORDSNET-9898 and linked this forum thread to it, so you will be notified via this thread once the issue is resolved.

We apologize for the inconvenience.

The issues you have found earlier (filed as WORDSNET-9898) have been fixed in this .NET update and this Java update.