Missing body text on email body to PDF conversion (using html)

Hi, we are upgrading our software from Aspose.Total 18.8 to 19.8 and found some formatting issues: output that worked to our satisfaction in 18.8 is not acceptable to our customers in 19.8.

The attached files demonstrate this: the message in the attachment is missing a big part of the body text. This was already an issue in 18.8, but in 19.8 even more of the text is missing. Outlook also crops the text, but does it a little better than Aspose.

I included the PDF that 18.8 generated as well as the PDF that 19.8 generates, plus the original message and the PDF that Outlook would print.

The code we use is:

Aspose.Email.License license = new Aspose.Email.License();
license.SetLicense(@"Q:\Install\Ontwikkeling\Componenten\Aspose Total for .NET\Aspose.Total.lic");

MailMessage theMessage = MailMessage.Load(@"D:\Temp\Aspose\Original.msg");

foreach (Attachment attachment in theMessage.Attachments)
{
    string filename = string.Join("_", attachment.Name.Split(Path.GetInvalidFileNameChars()));

    if (attachment.ContentType.MediaType.Equals("message/rfc822", StringComparison.OrdinalIgnoreCase))
    {
        if (!Path.GetExtension(filename).Equals(".msg", StringComparison.OrdinalIgnoreCase))
        {
            filename += ".msg";
        }
    }

    filename = Path.Combine(@"D:\Temp\Aspose\Output", filename);

    if (attachment.ContentType.MediaType.Equals("message/rfc822", StringComparison.OrdinalIgnoreCase))
    {
        MemoryStream ms = new MemoryStream();
        attachment.Save(ms);
        ms.Position = 0; // rewind so the saved attachment is read from the start
        MailMessage attMsg = MailMessage.Load(ms);

        SaveOptions options = new MsgSaveOptions(MailMessageSaveType.OutlookMessageFormatUnicode);
        attMsg.Save(filename, options);
    }
    else
    {
        attachment.Save(filename);
    }
}

Bol.zip (293.9 KB)

@andreas4e47b,

I have read your comments. Could you please share the intermediate HTML file generated by Aspose.Email so that we can investigate further?

Here are the mhtml files generated by 18.8 and 19.8.

mhtmls.zip (32.9 KB)

I did some more tests, and this seems to be happening in Aspose.Words, not Aspose.Email. When I upgrade my project to the latest version of Aspose.Email but keep Aspose.Words at 19.7, the output is the same as in 18.8. That is still not correct, but it is better than in 19.8 or 19.9.

I also tested the MHTML output with MS Word and Internet Explorer, and it actually looks perfect when printed from IE. Is there any way to get the PDF to look the way IE would print it?

@andreas4e47b,

We have checked the output document generated by Aspose.Words 19.8 and it looks good. The only issue we noticed is that the image's position has changed on the first page. Could you please share a screenshot of the problematic sections of the output document? We will investigate the issue and provide more information.

See the comparison of the versions and Outlook, and how it should look in our opinion:

Comparison.png (987.2 KB)

@andreas4e47b,

Please call the Document.UpdateTableLayout method before saving the PDF, as shown below. We have attached the output PDF to this post for your reference. 19.9.pdf (128.7 KB)

Hope this helps you.

Document doc = new Document(MyDir + "Generated with Aspose 19.8.mhtml");
doc.UpdateTableLayout();
doc.Save(MyDir + "19.9.pdf");

Thank you, that does indeed fix the layout and generates the PDF as we would expect.

I have an unrelated question about the same files: at the bottom of the second page there is an image that is not displayed (just a red cross in the PDF). When I open the MHTML file in Internet Explorer or Word, the image is displayed correctly. The image source is: http://www.bol.com/nl/cms/images/navigation/mhp/footer.gif. Is there an option I need to enable to display this image in the resulting PDF?

@andreas4e47b,

We have not been able to reproduce this issue at our end. Perhaps the image is not accessible at your end. You may use the HtmlLoadOptions.WebRequestTimeout property to set the web request timeout. The default value of this property is 100000 milliseconds (100 seconds).
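For illustration, setting that property might look like the sketch below. The class name, method name, paths, and the 30-second value are placeholders chosen for this example, not values from the thread, and depending on the Aspose.Words version HtmlLoadOptions may live in a different namespace (an assumption to check against the installed release).

```csharp
using Aspose.Words;

// Hypothetical helper for illustration only.
public static class TimeoutSketch
{
    public static void Convert(string mhtmlPath, string pdfPath)
    {
        HtmlLoadOptions loadOptions = new HtmlLoadOptions();
        loadOptions.LoadFormat = LoadFormat.Mhtml;

        // Timeout for fetching external resources, in milliseconds.
        // 30000 is an arbitrary example value; the default is 100000.
        loadOptions.WebRequestTimeout = 30000;

        Document doc = new Document(mhtmlPath, loadOptions);
        doc.Save(pdfPath);
    }
}
```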

The image is also accessible at our end: when I open the message or the MHTML file, the image shows correctly. The image is missing only in the generated PDF. All the other external images in the message are displayed correctly in the PDF.

@andreas4e47b,

Have you tried the latest version of Aspose.Words for .NET 19.9?

If you still face problem, please share the problematic output PDF file generated by the latest version of Aspose.Words for .NET 19.9.

Please also share your working environment e.g. operating system, .NET Framework etc. along with code example that you are using. We will investigate the issue and provide you more information about your query.

Yes, I also tested this in Aspose.Words for .NET 19.9 with the same result.

My environment is Windows 10 Pro (with the latest updates) with the following .NET Framework versions:

Microsoft (R) .NET CLR Version Tool  Version 4.6.1055.0
Copyright (c) Microsoft Corporation.  All rights reserved.

Versions installed on the machine:
v2.0.50727
v4.0.30319

The code is:

MemoryStream stream = new MemoryStream();

MhtSaveOptions opt = Aspose.Email.SaveOptions.DefaultMhtml;
opt.SaveAttachments = false;
theMessage.Save(stream, opt);

string htmlFileName = Path.Combine(outputPath, Path.ChangeExtension(Path.GetFileName(inputFileName), ".mhtml"));
theMessage.Save(htmlFileName, opt);

stream.Seek(0, SeekOrigin.Begin);

Aspose.Words.LoadOptions loadOptions = new Aspose.Words.LoadOptions();
loadOptions.LoadFormat = Aspose.Words.LoadFormat.Mhtml;

Aspose.Words.Document doc = new Aspose.Words.Document(stream, loadOptions);

doc.UpdateTableLayout();

PageSetup ps = doc.FirstSection.PageSetup;
double effectiveWidth = ps.PageWidth - (ps.LeftMargin + ps.RightMargin);

NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
foreach (Shape shape in shapes)
{
    // Clamp shapes that are wider than the printable area to the page width.
    shape.Width = Math.Min(shape.Width, effectiveWidth);
}

Aspose.Words.Saving.PdfSaveOptions options = new Aspose.Words.Saving.PdfSaveOptions();
options.Compliance = PdfCompliance.Pdf15;

string outputFileName = Path.Combine(outputPath, Path.ChangeExtension(Path.GetFileName(inputFileName), ".pdf"));

doc.Save(outputFileName, options);

@andreas4e47b,

We have tested the scenario and have not found the shared issue. Please check the attached output PDF. out.pdf (128.7 KB)

You may implement the IResourceLoadingCallback interface and use the LoadOptions.ResourceLoadingCallback property to control how external resources (images, style sheets) are loaded when a document is imported from HTML or MHTML.

Please check the following code snippet. Hope this helps you.

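public class HandleResourceLoading : IResourceLoadingCallback
{
    public ResourceLoadingAction ResourceLoading(ResourceLoadingArgs args)
    {
        String url = args.OriginalUri;
        Console.WriteLine(url);
        if (args.ResourceType == ResourceType.Image)
        {
            // Download the image ourselves and hand the bytes to Aspose.Words.
            using (System.Net.WebClient wc = new System.Net.WebClient())
            {
                args.SetData(wc.DownloadData(url));
            }

            // Data passed to SetData is only used when UserProvided is returned.
            return ResourceLoadingAction.UserProvided;
        }

        // Let Aspose.Words load every other resource in its usual way.
        return ResourceLoadingAction.Default;
    }
}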
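To take effect, the callback has to be assigned to the load options before the document is opened. A minimal sketch of that wiring follows, assuming the HandleResourceLoading class above is in scope. The CallbackWiringSketch class, the Convert method, and both path parameters are hypothetical names for this example, and the namespace of LoadOptions may differ between Aspose.Words versions.

```csharp
using Aspose.Words;

// Hypothetical wrapper for illustration only.
public static class CallbackWiringSketch
{
    public static void Convert(string mhtmlPath, string pdfPath)
    {
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.LoadFormat = LoadFormat.Mhtml;

        // Route external resource requests through the callback class shown above.
        loadOptions.ResourceLoadingCallback = new HandleResourceLoading();

        Document doc = new Document(mhtmlPath, loadOptions);
        doc.UpdateTableLayout();
        doc.Save(pdfPath);
    }
}
```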

Thanks for the info. I tested this and found out a little bit more. When I use your code, I get an exception when loading the MHTML file into Aspose.Words.Document.

http://www.bol.com/nl/cms/images/navigation/mhp/footer.gif.

System.Net.WebException: The request was aborted: **Could not create SSL/TLS secure channel**.
   at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
   at System.Net.WebClient.DownloadData(Uri address)
   at System.Net.WebClient.DownloadData(String address)

That showed me that the URL of this image actually refers to an HTTPS resource but uses http instead of https. Browsers and Word handle this correctly. Is there a way that I can handle this?

All the inline images also pass through this handler, which will result in unnecessary internet requests.

cid:imagee15eb3.PNG@fc61f744.4db32c39.
System.NotSupportedException: The URI prefix is not recognized.
   at System.Net.WebRequest.Create(Uri requestUri, Boolean useUriBase)
   at System.Net.WebRequest.Create(Uri requestUri)
   at System.Net.WebClient.GetWebRequest(Uri address)
   at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)

Is there a way to detect inline images, apart from checking for the leading cid:?

@andreas4e47b,

We have tested the scenario on Windows 10 and have not been able to reproduce this issue at our end.

You can use the following code snippet to avoid this issue.

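if (args.ResourceType == ResourceType.Image && url.StartsWith("http", StringComparison.OrdinalIgnoreCase))
{
    // Only remote images are downloaded here; cid: (inline) images fall through to the default loader.
    Console.WriteLine(url);
    System.Net.WebClient wc = new System.Net.WebClient();
    args.SetData(wc.DownloadData(url));
    return ResourceLoadingAction.UserProvided;
}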
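A prefix test also matches any string that merely begins with "http". A somewhat stricter variant could parse the URI and compare its scheme, so that cid:, data:, file: and relative references are all left to the default loader. The sketch below uses only System.Uri. The ResourceUriSketch class and IsRemoteHttpResource method are hypothetical names for this example and are not part of Aspose.Words.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class ResourceUriSketch
{
    // Returns true only for absolute http or https URIs.
    public static bool IsRemoteHttpResource(string originalUri)
    {
        Uri uri;
        if (!Uri.TryCreate(originalUri, UriKind.Absolute, out uri))
        {
            return false; // relative or malformed reference
        }

        // Uri.Scheme is normalized to lower case, so a direct comparison is enough.
        return uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
    }
}
```

Inside the callback, the condition would then read `args.ResourceType == ResourceType.Image && ResourceUriSketch.IsRemoteHttpResource(url)`.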

I fixed the problem on our side. Our project uses .NET Framework 2.0, and the Tls12 SecurityProtocol value is not known to .NET 2.0. After setting the default SecurityProtocol to the correct version, the image is downloaded without any extra code.
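The exact change is not shown in the thread. One common way to do this on a framework version that lacks the named Tls12 member is to cast its numeric value (3072), as sketched below. The TlsSetupSketch class and EnableTls12 method are hypothetical names, and the approach assumes the installed runtime and operating system patches actually support TLS 1.2; otherwise the assignment may throw.

```csharp
using System.Net;

// Hypothetical helper for illustration only.
public static class TlsSetupSketch
{
    // Call once at application start-up, before any web request is made.
    public static void EnableTls12()
    {
        // 3072 is the numeric value of SecurityProtocolType.Tls12.
        // The cast lets the code compile on frameworks without the named member.
        ServicePointManager.SecurityProtocol |= (SecurityProtocolType)3072;
    }
}
```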

The issue is solved, and we will migrate to a higher .NET Framework version anyway.

Thanks for the help.

@andreas4e47b,

It is good to hear that your problem has been resolved. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help.