Convert MSG File to PDF & Preserve Formatting of Text Images using C# .NET | MSG to MHTML | MHTML to PDF

Hi,

We are trying to convert a .msg file into PDF.
We first extract the files from the .msg file, and then convert the MHTML file into PDF using the following code:

Aspose.Words.Document document = new Aspose.Words.Document(srcStream);
System.IO.MemoryStream mStream = new System.IO.MemoryStream();
document.Save(mStream, Aspose.Words.SaveFormat.Pdf);

It works OK most of the time but it seems this sample file has issues with formatting.
I am attaching the sample .msg file and the output PDF. Can you take a look? I had to change the .msg file extension to .zip because the website doesn’t allow uploading .msg files. Please rename it back to .msg after you download it.

testdoc9.zip (258.5 KB)
REF1234_20200624_160106.PDF (142.6 KB)

Thanks,
Ryan.

@ryanL,

We have converted your .msg file to .mhtml format by using the latest (20.5) version of Aspose.Email for .NET and attached it here for your reference:

The code we used to generate the above MHTML is as follows:

Aspose.Email.MailMessage mailMsg = Aspose.Email.MailMessage.Load("E:\\Temp\\testdoc9.msg");
mailMsg.Save("E:\\Temp\\testdoc9.mhtml", Aspose.Email.SaveOptions.DefaultMhtml);
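If you would rather not write the intermediate MHTML file to disk, the same two steps can be chained through a MemoryStream. The following is only a minimal sketch under some assumptions: the class name MsgToPdfSketch, the method name ConvertMsgToPdf, and its parameters are hypothetical, and the location of LoadOptions (Aspose.Words here, as in this thread) may differ in other Aspose.Words releases.

```csharp
using System.IO;

public static class MsgToPdfSketch
{
    // Hypothetical helper: msgPath and pdfPath are placeholders chosen for illustration.
    public static void ConvertMsgToPdf(string msgPath, string pdfPath)
    {
        // Step 1: load the .msg file with Aspose.Email.
        Aspose.Email.MailMessage mailMsg = Aspose.Email.MailMessage.Load(msgPath);

        using (MemoryStream mhtmlStream = new MemoryStream())
        {
            // Step 2: save the message as MHTML into memory instead of a file.
            mailMsg.Save(mhtmlStream, Aspose.Email.SaveOptions.DefaultMhtml);

            // Rewind the stream so Aspose.Words reads it from the beginning.
            mhtmlStream.Position = 0;

            // Step 3: tell Aspose.Words explicitly that the stream contains MHTML.
            // Assumption: LoadOptions is in the Aspose.Words namespace, as used in this thread.
            Aspose.Words.LoadOptions loadOptions = new Aspose.Words.LoadOptions();
            loadOptions.LoadFormat = Aspose.Words.LoadFormat.Mhtml;

            // Step 4: load the MHTML and render it to PDF.
            Aspose.Words.Document document = new Aspose.Words.Document(mhtmlStream, loadOptions);
            document.Save(pdfPath, Aspose.Words.SaveFormat.Pdf);
        }
    }
}
```

For example, MsgToPdfSketch.ConvertMsgToPdf("E:\\Temp\\testdoc9.msg", "E:\\Temp\\testdoc9.pdf") would run both conversions in one call.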

In this MHTML, the text of the last two paragraphs in the table is wider than the standard page width (8.5 inches) of MS Word documents. You can check this by opening the MHTML file in MS Word 2019. So, to fit the table within the page bounds, you need to increase the page width:

// Load the MHTML file generated above.
Aspose.Words.LoadOptions loadOptions = new Aspose.Words.LoadOptions();
loadOptions.LoadFormat = Aspose.Words.LoadFormat.Mhtml;
Aspose.Words.Document document = new Aspose.Words.Document("E:\\Temp\\testdoc9.mhtml", loadOptions);
// Use a wider page (A3) and narrow side margins (values are in points).
document.FirstSection.PageSetup.PaperSize = Aspose.Words.PaperSize.A3;
document.FirstSection.PageSetup.LeftMargin = 9;
document.FirstSection.PageSetup.RightMargin = 9;
document.Save("E:\\Temp\\testdoc9.pdf", Aspose.Words.SaveFormat.Pdf);

The PDF produced by using the above code on our end is attached here for your reference:

However, the images in this PDF overlap the text content. We have logged this problem in our issue tracking system as WORDSNET-20677. We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.

@ryanL,

Regarding WORDSNET-20677, we have completed our analysis of this issue and decided to close it with “Not a Bug” status. Please see the analysis details below:

The source document contains two versions of the images, VML and PNG, and the positions of the PNG versions are for some reason completely different from the positions of their VML counterparts. MS Office applications and Internet Explorer use the VML versions and load the images correctly. Aspose.Words by default loads the PNG versions and, as a result, produces a document with images at the wrong positions. By setting HtmlLoadOptions.SupportVml to true, you can instruct Aspose.Words to use the VML versions of the images and produce a better-looking document.

So, this is not a bug in Aspose.Words API but rather an issue with images in that particular source document.
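The SupportVml setting described above could be applied roughly as follows. This is a minimal sketch, not a verified fix for this particular file: the class name VmlAwareConversion, the method name ConvertMhtmlToPdf, and its parameters are hypothetical, and HtmlLoadOptions is assumed to be in the Aspose.Words namespace like the LoadOptions class used earlier in this thread (other releases may expose it under Aspose.Words.Loading).

```csharp
public static class VmlAwareConversion
{
    // Hypothetical helper: mhtmlPath and pdfPath are placeholders chosen for illustration.
    public static void ConvertMhtmlToPdf(string mhtmlPath, string pdfPath)
    {
        // Assumption: HtmlLoadOptions lives in Aspose.Words in this version;
        // adjust the namespace if your release places it in Aspose.Words.Loading.
        Aspose.Words.HtmlLoadOptions loadOptions = new Aspose.Words.HtmlLoadOptions();
        loadOptions.LoadFormat = Aspose.Words.LoadFormat.Mhtml;

        // Prefer the VML versions of the images over their PNG fallbacks.
        loadOptions.SupportVml = true;

        Aspose.Words.Document document = new Aspose.Words.Document(mhtmlPath, loadOptions);
        document.Save(pdfPath, Aspose.Words.SaveFormat.Pdf);
    }
}
```

The page size and margin settings from the earlier snippet can still be applied to the loaded document before saving.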

Thanks for the helpful information. I got it working with A3 size setting.
Just wondering, is it possible to set the document size as A4 and make the lower paragraph do word-wrap?

@ryanL,

When the page size is set to A4, Aspose.Words should produce output similar to what MS Word would produce. The following code shows how to change the page size to A4:

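// loadOptions is the Aspose.Words.LoadOptions instance (LoadFormat.Mhtml) created in the earlier snippet.
Aspose.Words.Document document = new Aspose.Words.Document("E:\\Temp\\testdoc9.mhtml", loadOptions);
// Apply A4 to every section, not only the first one.
foreach (Aspose.Words.Section sec in document.Sections)
{
    sec.PageSetup.PaperSize = Aspose.Words.PaperSize.A4;
}
document.Save("E:\\Temp\\20.6.pdf", Aspose.Words.SaveFormat.Pdf);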
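On the word-wrap part of the question, one thing that may be worth trying is asking each table to fit the page width before saving, so that long cell text can wrap inside the narrower A4 page. This is an untested suggestion rather than a confirmed fix for this file: the class name TableFitSketch and the method name FitTablesToPage are hypothetical, and whether the text actually wraps depends on how the widths are defined in the source HTML.

```csharp
public static class TableFitSketch
{
    // Hypothetical helper: call it on the loaded document before document.Save(...).
    public static void FitTablesToPage(Aspose.Words.Document document)
    {
        // Collect every table in the document, including nested ones.
        Aspose.Words.NodeCollection tables =
            document.GetChildNodes(Aspose.Words.NodeType.Table, true);

        foreach (Aspose.Words.Tables.Table table in tables)
        {
            // Resize the table to the available width between the page margins.
            table.AutoFit(Aspose.Words.Tables.AutoFitBehavior.AutoFitToWindow);
        }
    }
}
```

If the result is not acceptable, the A3 setting shown earlier remains the approach confirmed to work in this thread.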