How to get the HTMLBodyCharSet or PlainBodyCharSet in MapiMessage object

anshulmehta · February 14, 2023, 1:04am

Use Case:

We are using the code below to read emails from a PST file and get a MapiMessage object for each one. The code is in Java.

PersonalStorage asposePstFile = PersonalStorage.fromFile(path_to_pst);
FolderInfo folderInfo = asposePstFile.getRootFolder();
FolderInfoCollection folderInfoCollection = folderInfo.getSubFolders(FolderKind.Normal);
// Walk every normal subfolder of the root folder
for (FolderInfo childFolder : folderInfoCollection) {
    MessageInfoCollection messageInfoCollection = childFolder.getContents();
    for (MessageInfo messageInfo : messageInfoCollection) {
        MapiMessage message = asposePstFile.extractMessage(messageInfo);
        // QUERY - NEED TO FETCH THE CHARSET HERE
    }
}

Once the MapiMessage object is obtained, how can we get the HTML body charset and the plain-text body charset? We need the charset for further processing.
In the java pstlib API, we used to have methods like message.getBodyHTMLCharset() and message.getBodyCharset(). What are the equivalent methods on the MapiMessage object?

Could you please help with this? It is fairly urgent.

Thanks & Regards

dmitry.samodurov · February 14, 2023, 10:27am

Hello @anshulmehta,

Thank you for contacting us. We’ll look into your issue in more detail and get back to you soon.

sergey.vivsiuk · February 14, 2023, 2:09pm

Hello @anshulmehta,

You can use the PidTagInternetCodepage MapiMessage property.
(PidTagInternetCodepage Canonical Property | Microsoft Learn)

Indicates the code page used for PR_BODY (PidTagBody) or PR_BODY_HTML (PidTagBodyHtml) properties.

Code sample:

MapiMessage msg = pst.extractMessage(message);
System.out.println(msg.getProperties().get_Item(KnownPropertyList.INTERNET_CODEPAGE).toString());

anshulmehta · February 14, 2023, 2:45pm

Hello Sergey,

Thanks for the update. But I am afraid this gives me a long value, e.g. 50220 or 936.
Could you please let me know what this value indicates?
I want the charset in a format like “utf-8” or “iso-2022-jp”.

Thanks & Regards

dmitry.samodurov · February 14, 2023, 2:58pm

@anshulmehta,

Please refer to this table: Code Page Identifiers - Win32 apps | Microsoft Learn
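
As a rough illustration of how that table can be applied, here is a minimal sketch that maps a code page identifier to the charset name listed for it. The class name CodePageNames and the method toCharsetName are hypothetical helpers, not part of the Aspose API, and only a handful of rows from the table are included; extend the map with whatever code pages your PST files actually contain.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of Aspose.Email): translates a Windows
// code page identifier into the name listed for it in Microsoft's
// "Code Page Identifiers" table. Only a few common entries are included.
final class CodePageNames {
    private static final Map<Long, String> NAMES = new HashMap<>();

    static {
        NAMES.put(932L, "shift_jis");
        NAMES.put(936L, "gb2312");
        NAMES.put(1251L, "windows-1251");
        NAMES.put(1252L, "windows-1252");
        NAMES.put(20127L, "us-ascii");
        NAMES.put(28591L, "iso-8859-1");
        NAMES.put(50220L, "iso-2022-jp");
        NAMES.put(65001L, "utf-8");
    }

    private CodePageNames() {
    }

    // Returns the charset name for the code page, or an empty Optional
    // when the code page is not in this (deliberately partial) map.
    static Optional<String> toCharsetName(long codePage) {
        return Optional.ofNullable(NAMES.get(codePage));
    }

    public static void main(String[] args) {
        // The two values mentioned earlier in this thread
        System.out.println(toCharsetName(50220).orElse("unknown"));
        System.out.println(toCharsetName(936).orElse("unknown"));
    }
}
```

The value read from INTERNET_CODEPAGE would be passed in as codePage. Returning an Optional keeps unknown code pages explicit, so the caller can decide on a fallback rather than silently getting a wrong charset.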

anshulmehta · February 14, 2023, 4:00pm

Hi Dmitry,

Apologies, I had already done a quick Google search and found this link.
Thanks for your time all the same. You are a life saviour.
I have one more query, if you could please answer it:

A MapiMessage object (effectively an email) can contain several types of body, such as a plain-text body, an HTML body and an RTF body, and MapiMessage has a method to fetch each of them: message.getBody() for plain text, message.getBodyHtml() for HTML and message.getBodyRtf() for RTF. Does the INTERNET_CODEPAGE property above return the charset for the HTML body only? If so, how should we fetch the charset of the plain-text and RTF bodies?
Or, how does the Aspose Java library handle this internally?

Thanks & Regards,
Anshul Mehta

sergey.vivsiuk · February 14, 2023, 5:29pm

@anshulmehta,

A standard RTF can only consist of 7-bit ASCII characters, but can use escape sequences to encode other characters.
(Rich Text Format - Wikipedia)
The INTERNET_CODEPAGE property indicates the code page used for the PR_BODY (PidTagBody) or PR_BODY_HTML (PidTagBodyHtml) properties.

anshulmehta · February 15, 2023, 5:50am

Hello Sergey,

Based on your explanation, can we safely assume that the INTERNET_CODEPAGE property always indicates the charset of both the text/plain and text/html bodies?

Thanks & Regards

sergey.vivsiuk · February 15, 2023, 10:43am

@anshulmehta,

According to the Microsoft documentation, the INTERNET_CODEPAGE property indicates the charset of both the text/plain and text/html bodies.
Do you have encoding issues when getting TEXT or HTML body content using the Aspose API?