Issues Parsing Email With Korean Autodetect Codepage


#1

There seems to be a problem processing emails with code page 50949. Several APIs do not
work correctly on several emails.

Here is an example:

Running the following code:

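    import com.aspose.email.MapiMessage;
    import com.aspose.email.MapiPropertyTag;
    import com.aspose.email.MapiRecipient;

    String filename = "/tmp/autodetect_korean_codepage.msg";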

    MapiMessage mapiMessage = MapiMessage.fromFile(filename);

    System.out.println("This email has code page: "+mapiMessage.getProperties().get_Item(MapiPropertyTag.PR_INTERNET_CPID).getLong());

    for (MapiRecipient mapiRecipient : mapiMessage.getRecipients()) {
        System.out.println("This recipient display name is incorrect : "+mapiRecipient.getDisplayName());
    }

Running it on the attached file, I get the following output:

    This email has code page: 50949
    This recipient display name is incorrect : 8?훑?
    This recipient display name is incorrect : t푭컈?
    This recipient display name is incorrect : \濫??

The recipients' display names are decoded incorrectly.
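A quick way to see this class of failure (an assumption about the cause, not a diagnosis of Aspose.Email internals): the same Korean bytes only give back the right name when they are decoded with the matching charset. The JDK-only sketch below encodes one of the names from this thread as MS949 (Java's x-windows-949, the charset behind Windows code page 949) and decodes the bytes twice, once with that charset and once with ISO-8859-1. The class name KoreanDecodeDemo is made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class KoreanDecodeDemo {
    // Windows Unified Hangul Code (code page 949), shipped in the JDK's extended charsets.
    static final Charset MS949 = Charset.forName("x-windows-949");

    // Decode raw 8-bit string bytes with a caller-chosen charset.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        String name = "문정훈";
        byte[] raw = name.getBytes(MS949); // two bytes per Hangul syllable

        String right = decode(raw, MS949);
        String wrong = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println("bytes:   " + raw.length);
        System.out.println("correct: " + right + " -> " + right.equals(name));
        System.out.println("wrong:   " + wrong + " -> " + wrong.equals(name));
    }
}
```

Decoding with the wrong charset does not throw; it silently produces garbage, which is the kind of symptom shown above.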


#2

@russ.nichols,

I have worked with the MSG file shared by you and have been able to observe the issue specified. An issue with ID EMAILJAVA-34546 has been created in our issue tracking system to investigate and resolve the issue. This thread has been linked with the issue so that you will be notified once the issue is fixed.


#3

I noticed some changes in version 19.5.

The following code:

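    import com.aspose.email.MapiMessage;
    import com.aspose.email.MapiPropertyTag;
    import com.aspose.email.MapiRecipient;

    String filename = "/tmp/autodetect_korean_codepage.msg";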

    MapiMessage mapiMessage = MapiMessage.fromFile(filename);

    System.out.println("This email has code page: "+mapiMessage.getProperties().get_Item(MapiPropertyTag.PR_INTERNET_CPID).getLong());

    for (MapiRecipient mapiRecipient : mapiMessage.getRecipients()) {
        System.out.println("This recipient display name is : "+mapiRecipient.getDisplayName());
    }

    System.out.println("Body Type is : "+mapiMessage.getBodyType());
    System.out.println("Body is : "+mapiMessage.getBody());
    System.out.println("Body Html is : "+mapiMessage.getBodyHtml());
    System.out.println("Body Rtf is : "+mapiMessage.getBodyRtf());

Output:

    This email has code page: 50949
    This recipient display name is : 문정훈
    This recipient display name is : 이남국
    This recipient display name is : 한경민

    com.aspose.email.system.exceptions.NotSupportedException: No data is available for encoding 50949.
        at com.aspose.email.internal.ae.zl.c(Unknown Source)
        at com.aspose.email.zlm.a(SourceFile:66)
        at com.aspose.email.MapiMessageItemBase.d(SourceFile:1761)
        at com.aspose.email.MapiMessageItemBase.getBodyType(SourceFile:554)

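getBodyHtml() and getBodyRtf() throw the same exception.

So the problem seems to have moved to other APIs: the recipient display names are now decoded correctly, but the body accessors fail.

P.S. mm.toMailMessage(new MailConversionOptions()) also throws an exception.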
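The message "No data is available for encoding 50949" fits the thread title: as far as I know, Windows uses 50949 as the "Korean (Auto-Select)" identifier, a detection hint rather than an encoding with its own byte table. Until the fix lands, one application-side workaround is to map the PR_INTERNET_CPID value to a concrete Java charset before decoding raw property bytes yourself. This is a hypothetical sketch: the class CodePageCharsets, the method forCodePage, and the choice of x-windows-949 as the stand-in for 50949 are my assumptions, not Aspose.Email behavior.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps Windows code page IDs (e.g. PR_INTERNET_CPID values) to Java charsets.
public final class CodePageCharsets {
    private CodePageCharsets() {}

    // Java charset names for a few Windows code page IDs.
    // ASSUMPTION: 50949 (Korean auto-select) is treated as x-windows-949, a superset of
    // EUC-KR, because the auto-select ID has no byte table of its own.
    private static final Map<Long, String> NAMES = Map.of(
            949L, "x-windows-949",
            50949L, "x-windows-949",
            51949L, "EUC-KR",
            65001L, "UTF-8",
            1252L, "windows-1252");

    // Returns the charset for a code page, or empty if it is unmapped or unsupported by this JVM.
    public static Optional<Charset> forCodePage(long cpid) {
        String name = NAMES.get(cpid);
        if (name == null || !Charset.isSupported(name)) {
            return Optional.empty();
        }
        return Optional.of(Charset.forName(name));
    }

    public static void main(String[] args) {
        for (long cpid : new long[] {50949L, 51949L, 932L}) {
            String mapped = forCodePage(cpid).map(Charset::name).orElse("(no mapping)");
            System.out.println(cpid + " -> " + mapped);
        }
    }
}
```

Returning Optional.empty() for unmapped or unsupported code pages leaves the fallback decision (for example, trying UTF-8) to the caller instead of throwing, which is the failure mode reported above.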


#4

@russ.nichols,

I have worked with the sample code shared by you and have been able to observe the issue specified. An issue with ID EMAILJAVA-34547 has been created in our issue tracking system to further investigate and resolve the issue. This thread has been linked with the issue so that you will be notified once the issue is fixed.


#5

Here is another example where an email with a Korean code page does not get the correct
sender name:

    import com.aspose.email.MapiMessage;
    import com.aspose.email.MapiPropertyTag;

    String filename = "/tmp/unicode2-just-email.msg";

    MapiMessage mapiMessage = MapiMessage.fromFile(filename);
    System.out.println("This email has code page: "+mapiMessage.getProperties().get_Item(MapiPropertyTag.PR_INTERNET_CPID).getLong());
    System.out.println("Sender Name is : "+mapiMessage.getSenderName());

Output:

    This email has code page: 51949
    Sender Name Type is : \濫??

This output is for the attached file.


#6

@russ.nichols,

Thank you for sharing the details. I have associated the MSG file with EMAILJAVA-34546 and will share feedback with you as soon as the issue is fixed.