Hi,
We’re using several Aspose libraries to convert different file formats to PDF. Recently we’ve noticed that converting from .msg takes significantly longer than other formats. Please see the attached project: on startup, it converts a sample msg file into mhtml, docx, and xlsx to allow for an apples-to-apples comparison between converting a similar file in different formats. It then converts each of the four formats (msg, mhtml, xlsx, and docx) to .pdf, displaying the time elapsed for each, in a loop to account for warmup in managed code.
Converting from .msg (as per instructions in Saving an Email as PDF|Documentation, first to an mhtml stream and then to pdf with Aspose.Words) consistently takes the longest among the four formats. It is closely followed by Aspose.Words loading directly from the mhtml file created in the preparation step, so it seems that the main difficulty is loading the mhtml generated by Aspose.Email into Aspose.Words.
Are there any ways we could change our code to speed the conversion up? Or better yet, would it be possible to add a direct conversion to pdf into Aspose.Email and avoid the in-between step with mhtml and Aspose.Words entirely?
ConvertSpeedComparison.zip (9.9 KB)
@acturisaspose
It sounds like you’re experiencing slower conversion times when processing .msg files compared to other formats. This issue is often due to the additional steps involved in converting .msg to MHTML and then to PDF, particularly the loading of the MHTML content into Aspose.Words.
To address this, I recommend the following solutions:
- Optimize MHTML Generation: Ensure that the MHTML generation from the .msg file is as efficient as possible. Review the settings and options used during this conversion to identify any potential optimizations.
- Direct Conversion Suggestion: Currently, Aspose.Email does not support direct conversion from .msg to PDF. However, you can suggest this feature to the Aspose team, as they often consider user feedback for future updates.
- Profile Your Code: Utilize profiling tools to identify bottlenecks in your conversion process. This will help you determine whether the delay is occurring during MHTML generation or PDF conversion.
- Implement Parallel Processing: If you are converting multiple files, consider implementing parallel processing to handle multiple conversions simultaneously. This can significantly reduce overall processing time.
- Manage Memory Efficiently: Ensure that your application is managing memory effectively. Slow performance can sometimes be attributed to memory issues, especially when dealing with large files.
Here’s a minimal code snippet to illustrate how you might structure your conversion process:
// Load the .msg file
var mailMessage = MailMessage.Load("path/to/message.msg");

// Convert to MHTML
using (var mhtmlStream = new MemoryStream())
{
    mailMessage.Save(mhtmlStream, SaveOptions.DefaultMhtml);
    mhtmlStream.Position = 0; // Reset stream position for reading

    // Load MHTML into Aspose.Words
    var document = new Document(mhtmlStream);

    // Save as PDF
    document.Save("output.pdf", SaveFormat.Pdf);
}
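For the parallel-processing suggestion above, a minimal sketch of a batch runner might look like the following. It uses only the .NET `Parallel.ForEach` API; `BatchConverter`, `ConvertAll`, and the `convertOne` delegate are hypothetical names introduced for illustration, and `convertOne` is assumed to wrap the single-file conversion shown above. It also assumes each conversion creates its own message and document objects, so no state is shared between threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: not part of any Aspose library.
public static class BatchConverter
{
    // Runs convertOne for every input path, with at most maxParallel
    // conversions in flight. Returns the paths that failed, each mapped
    // to the exception it threw, so one bad file does not stop the batch.
    public static IReadOnlyDictionary<string, Exception> ConvertAll(
        IEnumerable<string> inputPaths,
        Action<string> convertOne,
        int maxParallel)
    {
        var failures = new ConcurrentDictionary<string, Exception>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(inputPaths, options, path =>
        {
            try
            {
                convertOne(path);
            }
            catch (Exception ex)
            {
                // Record the failure and continue with the remaining files.
                failures[path] = ex;
            }
        });

        return failures;
    }
}

// Possible usage (ConvertMsgToPdf is a hypothetical method wrapping the snippet above):
// var failed = BatchConverter.ConvertAll(
//     Directory.GetFiles("input", "*.msg"), ConvertMsgToPdf, Environment.ProcessorCount);
```

Note that this only improves throughput across many files; the time taken by each individual conversion stays the same, and memory use grows with the number of conversions running at once.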
Please implement these suggestions and let me know if they improve your conversion times or if you have any further questions!
Hi,
Apologies for returning to this after such a long time. As you mentioned that I should optimize MHTML generation, could you please let me know which settings can affect the performance of the subsequent loading of the mhtml into Aspose.Words? I’ve had a go at finding the ones that might make the file easier for Aspose.Words to read, but no luck so far: it still remains the slowest of all the file loading operations.
Thanks
Hello @acturisaspose,
Thank you for the detailed analysis and the sample project.
Your observations are correct.
What you are seeing is expected behavior. The .msg format is a complex MAPI-based binary format, and converting it to PDF inherently requires several expensive steps. In the recommended workflow (MSG → MHTML → PDF), most of the processing time is spent on parsing and laying out the MHTML content inside Aspose.Words rather than on the MSG parsing itself.
This is why you see similar timings when loading the generated MHTML directly into Aspose.Words.
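To confirm where the time goes in a particular environment, each stage can be timed separately. The following is a minimal sketch using `System.Diagnostics.Stopwatch`; `StageTimer` and `Measure` are hypothetical names introduced here, not part of any Aspose API, and the commented usage assumes the same MSG → MHTML → PDF calls shown earlier in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: records how long each named stage takes.
public sealed class StageTimer
{
    private readonly List<KeyValuePair<string, TimeSpan>> _stages =
        new List<KeyValuePair<string, TimeSpan>>();

    // Stages in the order they were measured.
    public IReadOnlyList<KeyValuePair<string, TimeSpan>> Stages => _stages;

    // Runs the stage, records its elapsed time (even if it throws),
    // and returns the stage's result.
    public T Measure<T>(string name, Func<T> stage)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return stage();
        }
        finally
        {
            stopwatch.Stop();
            _stages.Add(new KeyValuePair<string, TimeSpan>(name, stopwatch.Elapsed));
        }
    }

    // Prints one line per recorded stage.
    public void Report()
    {
        foreach (var stage in _stages)
        {
            Console.WriteLine($"{stage.Key}: {stage.Value.TotalMilliseconds:F1} ms");
        }
    }
}

// Possible usage with the workflow from this thread:
// var timer = new StageTimer();
// var mhtml = timer.Measure("msg -> mhtml", () =>
// {
//     var stream = new MemoryStream();
//     MailMessage.Load("path/to/message.msg").Save(stream, SaveOptions.DefaultMhtml);
//     stream.Position = 0;
//     return stream;
// });
// var document = timer.Measure("mhtml -> Document", () => new Document(mhtml));
// timer.Measure("Document -> pdf", () => { document.Save("output.pdf", SaveFormat.Pdf); return true; });
// timer.Report();
```

Running the measurement several times and discarding the first iteration helps separate JIT warmup from the steady-state cost of each stage.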
At the moment, Aspose.Email does not provide a direct .msg → PDF conversion API. Internally, such a conversion would still require transforming the message into an HTML/MHTML-like representation before rendering it to PDF, so the intermediate step cannot be fully avoided.
That said, there is one alternative you may want to try, which could slightly reduce the portion of the processing handled by Aspose.Email. Instead of converting the MSG file to MHTML, you can load it directly as a MapiMessage and save it as HTML:
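// orig is the source .msg file
var htmlStream = new MemoryStream();
using (var mailMessage = MapiMessage.Load(orig))
{
    // Save the message as HTML instead of MHTML
    mailMessage.Save(htmlStream, SaveOptions.DefaultHtml);
}
htmlStream.Position = 0; // Reset stream position for reading

// Load the HTML into Aspose.Words
var document = new Document(htmlStream, new Aspose.Words.Loading.HtmlLoadOptions { LoadFormat = LoadFormat.Html, WebRequestTimeout = 0 });

// Save as PDF
var saveOptions = new Aspose.Words.Saving.PdfSaveOptions();
document.Save("Test.pdf", saveOptions);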
In some cases, using HTML instead of MHTML can be a bit faster and lighter for Aspose.Words to parse during the PDF conversion stage.