Samples.zip (1.8 MB)
Aspose Team,
We use the Aspose Email Java package to extract the body text of msg files, and we found that extraction is extremely slow for some msg files containing images that point to external URLs. Some client files between 5 and 9 MB in size take from 45 minutes to 2.5 hours to finish extraction.
The sample code below reproduces the problem, and two non-client sample files (Sample1.msg and Sample2.msg) are attached. Sample2.msg was created by merging multiple copies of Sample1.msg, so its content is repeated. Body text extraction of Sample2.msg takes about 4 seconds, which is much shorter than with the client files, but we hope the sample files can still help you analyze the problem.
Because the sample files have an HTML body, we use MailMessage:getHtmlBodyText() to extract the body text. It is this method that takes a long time. Can you tell us what happens inside the method and why it takes so long?
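To check whether the slow files really contain many external image references, a rough count along the following lines could be run on the raw HTML body. This is only a sketch: the class name `ExternalImageCounter` and the regular expression are our own, the HTML is assumed to be already available as a string, and the pattern is a heuristic rather than a real HTML parser (for example, it would also count a `data-src` attribute).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Email.
final class ExternalImageCounter {

    // Matches an <img ...> tag whose src attribute starts with http:// or https://.
    // [^>]*? keeps the match inside a single tag.
    private static final Pattern EXTERNAL_IMG = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']?https?://",
            Pattern.CASE_INSENSITIVE);

    // Counts image tags that reference an external http(s) URL.
    static int countExternalImages(String html) {
        Matcher matcher = EXTERNAL_IMG.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up HTML for illustration: two external images and one embedded (cid:) image.
        String html = "<p>Hi</p><img src=\"https://example.com/a.png\">"
                + "<img src='cid:logo'><IMG alt=x SRC=http://example.com/b.gif>";
        System.out.println(countExternalImages(html)); // prints 2
    }
}
```

If the count is high for the slow client files and low for the fast ones, that would support the suspicion that external image links are involved. It would not prove it.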
We also tried the MailMessage:getBody() method, and it is very fast (less than one second). However, the extracted text is slightly different from the text returned by MailMessage:getHtmlBodyText(). Which method is better for msg files with an HTML body?
In addition, we tried the MapiMessage:getBody() method. It is also fast and returns the same result as MailMessage:getBody().
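To see whether the two results differ only in whitespace, the extracted strings can be compared after normalization. The following is a minimal sketch using only the Java standard library. The class `BodyTextDiff` and its methods are hypothetical names of ours, and the two strings in `main` are made-up stand-ins for the real extraction results.

```java
// Hypothetical helper, not part of Aspose.Email.
final class BodyTextDiff {

    // Replaces non-breaking spaces, collapses whitespace runs to one space, and trims.
    static String normalize(String text) {
        return text.replace('\u00A0', ' ').replaceAll("\\s+", " ").trim();
    }

    // Returns the index of the first differing character after normalization,
    // or -1 if the normalized texts are identical.
    static int firstDifference(String a, String b) {
        String na = normalize(a);
        String nb = normalize(b);
        int common = Math.min(na.length(), nb.length());
        for (int i = 0; i < common; i++) {
            if (na.charAt(i) != nb.charAt(i)) {
                return i;
            }
        }
        return na.length() == nb.length() ? -1 : common;
    }

    public static void main(String[] args) {
        // Stand-ins for the results of getBody() and getHtmlBodyText().
        String fromGetBody = "Hello,\r\n\r\nplease see the report.";
        String fromHtmlText = "Hello,  please see\u00A0the report.";
        System.out.println(firstDifference(fromGetBody, fromHtmlText)); // prints -1
    }
}
```

A result of -1 means the texts differ only in whitespace. Any other value is the position in the normalized text where they first diverge.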
We noticed that MailMessage:getHtmlBodyText() is deprecated. What is the recommended method to replace it?
The operating system is Ubuntu 18.04. Java version is 1.8. Aspose PDF java package is 21.3.
import com.aspose.email.*;
import java.time.LocalDateTime;

public class GetMsgBodyText {
    public static void main(String[] args) {
        System.out.println(LocalDateTime.now() + " --- Start");
        try {
            String filepath = "/home/ubuntu/testdirs/testdir_msg_with_links/Sample1.msg";
            System.out.println(LocalDateTime.now() + " --- load mapiMessage");
            MapiMessage mapiMessage = MapiMessage.load(filepath, new MsgLoadOptions());
            System.out.println(LocalDateTime.now() + " --- load mailMessage");
            MailMessage mailMessage = MailMessage.load(filepath, new MsgLoadOptions());
            String bodyText;
            if (mailMessage.isBodyHtml()) {
                System.out.println(LocalDateTime.now() + " --- begin getHtmlBodyText");
                bodyText = mailMessage.getHtmlBodyText();
                System.out.println(LocalDateTime.now() + " --- finished getHtmlBodyText");
            } else {
                System.out.println(LocalDateTime.now() + " --- begin getBody");
                bodyText = mailMessage.getBody();
                System.out.println(LocalDateTime.now() + " --- finished getBody");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        System.out.println(LocalDateTime.now() + " --- Done");
    }
}
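For more precise numbers than the timestamps printed above, each call could be wrapped in a small timing helper based on System.nanoTime(). This is only a sketch: `ExtractionTimer` and `Timed` are hypothetical names of ours, and the lambda in `main` is a stand-in for the real extraction call.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Aspose.Email.
final class ExtractionTimer {

    // Holds the result of a call together with its elapsed wall-clock time.
    static final class Timed<T> {
        final T value;
        final long elapsedMillis;

        Timed(T value, long elapsedMillis) {
            this.value = value;
            this.elapsedMillis = elapsedMillis;
        }
    }

    // Runs the supplied call once, prints how long it took, and returns the result with the timing.
    static <T> Timed<T> time(String label, Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + " took " + elapsedMillis + " ms");
        return new Timed<>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Stand-in for a real extraction call.
        Timed<String> result = time("stand-in extraction", () -> "body text");
        System.out.println(result.value.length() + " characters extracted");
    }
}
```

In the sample program, the stand-in lambda would be replaced by something like `() -> mailMessage.getHtmlBodyText()` or `() -> mailMessage.getBody()`, so that both methods are measured the same way on the same file.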