I can't extract messages from some mbox files

mbox.zip (18.2 KB)

Can you figure out the problem?
I used the same sample code that your website provides.

@HM_Company,

Could you please share the source code you used, along with the details of the issue? You have also reported an issue with OST file extraction. Before sharing, please verify whether the MBOX issue is of the same nature as the OST one.

Here is my code.

private void button5_Click(object sender, EventArgs e)
    {
        OpenFileDialog docBrowse1 = new OpenFileDialog();
        if (docBrowse1.ShowDialog() == DialogResult.OK)
        {
            string path = docBrowse1.FileName;
            string folderpath = "\\\\?\\" + Path.GetDirectoryName(path) + "\\Extracted→" + Path.GetFileName(path);
            Directory.CreateDirectory(folderpath);

            MboxrdStorageReader reader = new MboxrdStorageReader(path, true);
            // This file does contain messages, but the reported count is 0
            MessageBox.Show("Total number of messages in Mbox file: " + reader.GetTotalItemsCount(), "dgSearch");

            // Start reading messages
            Aspose.Email.MailMessage message = reader.ReadNextMessage();

            // Read all messages in a loop
            while (message != null)
            {

                // Save this message in EML or MSG format
                message.Save(folderpath + "\\" + GetFileName(message.Subject, message.Date) + ".eml", Aspose.Email.SaveOptions.DefaultEml);

                // Get the next message
                message = reader.ReadNextMessage();
            }
            // Close the streams
            reader.Dispose();



        }
        MessageBox.Show("Complete", "dgSearch");

    }
    private static string GetFileName(string subject, DateTime time)
    {
        Random r = new Random();
        string fileName = "";

        if (subject == null || subject.Length == 0)
        {
             fileName = "NoSubject";
            return fileName + "_" + r.Next(1, 1000); 
        }
        else
        {
            if(time != null)
                fileName = time.ToString("yyyy-MM-dd HHmmss") + "_";

            for (int i = 0; i < subject.Length; i++)
            {
                if (subject[i] > 31 && subject[i] < 127)
                {
                    fileName += subject[i];
                }
            }
            
            fileName = fileName.Replace("\\", "_");
            fileName = fileName.Replace("/", "_");
            fileName = fileName.Replace(":", "_");
            fileName = fileName.Replace("*", "_");
            fileName = fileName.Replace("?", "_");
            fileName = fileName.Replace("\"", "_");
            fileName = fileName.Replace("<", "_");
            fileName = fileName.Replace(">", "_");
            fileName = fileName.Replace("|", "_");
            fileName = fileName.Replace("\n", "");
            fileName = fileName.Replace("\r", "");
            fileName = fileName.Replace("\t", "");
            fileName = fileName.Replace("\u000e", "");
            fileName = Regex.Replace(fileName, "[ďż˝*?<>/:@,\\.\";'\\\\đź”´]", "_"); //Your code added

            return fileName + "_" + r.Next(1, 100000);//; 
        }
    }

The point is that reader.GetTotalItemsCount() returns 0, so the message box says “0 messages in Mbox”.

I think the mbox file could not be read properly.

I will wait for your answer. Thank you.

@HM_Company,

I have worked with the sample files you shared, and there does seem to be an issue with reading the MBOX file contents. A ticket with ID EMAILNET-39858 has been created in our issue tracking system to investigate and resolve it. This thread has been linked to the ticket so that you will be notified once the issue is fixed.

The issues you have found earlier (filed as EMAILNET-39858) have been fixed in this update.

I tested the same file in version 20.6.0, but the problem was not fixed at all. Are you sure it’s been fixed in this update?

@HM_Company,

Actually, the sample mbox files that you provided were created by the Eudora mail client.
This format, called MBOXO, is a variant of the MBOX format and was not previously supported by Aspose.Email.

So, we have added support for the MBOXO file format used by the Eudora email client.
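
For context, mbox variants are commonly described as differing in how body lines that begin with "From " are escaped, since an unescaped "From " line would be mistaken for a message separator. The sketch below illustrates that difference only; the class and method names are hypothetical and it is not how Aspose.Email implements its readers.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative helpers with hypothetical names; not part of Aspose.Email.
public static class MboxQuoting
{
    private static readonly Regex RdQuotable = new Regex("^>*From ");
    private static readonly Regex RdQuoted = new Regex("^>+From ");

    // mboxo: only body lines that start with "From " are escaped with one ">".
    // A body line that already reads ">From " is left alone, so the escaping
    // cannot be undone unambiguously.
    public static string EscapeMboxo(string line)
    {
        return line.StartsWith("From ", StringComparison.Ordinal) ? ">" + line : line;
    }

    // mboxrd: lines made of zero or more ">" followed by "From " get one more ">".
    public static string EscapeMboxrd(string line)
    {
        return RdQuotable.IsMatch(line) ? ">" + line : line;
    }

    // mboxrd reading: strip exactly one ">" from lines matching "^>+From ".
    public static string UnescapeMboxrd(string line)
    {
        return RdQuoted.IsMatch(line) ? line.Substring(1) : line;
    }
}
```

Because mboxrd escaping is reversible and mboxo escaping is not, a reader has to know which variant it is dealing with, which is presumably why separate reader classes exist.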

An MboxoStorageReader class has been added to the API:

MboxoStorageReader reader = new MboxoStorageReader(fileName, true);

An MboxStorageReader.CreateReader factory method has also been added for convenience.
It automatically detects the mbox variant and creates the corresponding reader instance.

So, to solve the issue described, the code sample should look as follows:

//Use the factory method to get the right instance of the reader.
MboxStorageReader reader = MboxStorageReader.CreateReader(path, true);

Console.WriteLine("Total number of messages in Mbox file: " + reader.GetTotalItemsCount());

// Start reading messages
Aspose.Email.MailMessage message = reader.ReadNextMessage();

// Read all messages in a loop
while (message != null)
{
    // Save this message in EML or MSG format
    message.Save(folderpath + "\\" + GetFileName(message.Subject, message.Date) + ".eml", Aspose.Email.SaveOptions.DefaultEml);

    // Get the next message
    message = reader.ReadNextMessage();
}

// Close the streams
reader.Dispose();

Hello,

The CreateReader() method doesn’t take “true” as a second argument, but an MboxLoadOptions object (Aspose.Email 23.11.0).

What is the correct code to enable reading of all MBOX types and subtypes (mboxo etc.)?

Best regards.

Hello @anthonypr ,

In order to work with different kinds of MBOX files, use the CreateReader method:

var loadOptions = new MboxLoadOptions()
{
   // set the properties you need here, for example
   LeaveOpen = true,
   PreferredTextEncoding = Encoding.UTF8
};

var mboxReader = MboxStorageReader.CreateReader(path, loadOptions);
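
For readers putting the pieces of this thread together, here is a minimal sketch of a complete export loop built only from calls that appear in this thread (CreateReader with MboxLoadOptions, ReadNextMessage, Save with SaveOptions.DefaultEml). The MboxExport class, the ExportAll method, the numbered file names, and the Aspose.Email.Storage.Mbox namespace are assumptions; check them against the version you have installed.

```csharp
using System.IO;
using System.Text;
using Aspose.Email;
using Aspose.Email.Storage.Mbox; // assumed namespace of the mbox reader types

// Hypothetical helper, not part of the Aspose.Email API.
public static class MboxExport
{
    // Saves every message of an mbox file as a numbered .eml file
    // and returns the number of messages written.
    public static int ExportAll(string mboxPath, string outputDir)
    {
        Directory.CreateDirectory(outputDir);

        var loadOptions = new MboxLoadOptions
        {
            PreferredTextEncoding = Encoding.UTF8
        };

        int saved = 0;

        // The factory method returns a reader for the detected mbox variant;
        // the using block disposes it even if saving a message throws.
        using (MboxStorageReader reader = MboxStorageReader.CreateReader(mboxPath, loadOptions))
        {
            MailMessage message;
            while ((message = reader.ReadNextMessage()) != null)
            {
                saved++;
                string target = Path.Combine(outputDir, "message_" + saved.ToString("D5") + ".eml");
                message.Save(target, SaveOptions.DefaultEml);
            }
        }

        return saved;
    }
}
```

Numbering the output files avoids the name collisions that a random suffix can still produce when many messages share a subject.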

Thank you for your answer.

The CreateReader() method doesn’t throw an exception when it is given a file that is not an mbox (or mbox-derivative) file.

After some testing, the only way I found to confirm that the file really is an mbox file is to try reading a message with Aspose.Email.MailMessage message = reader.ReadNextMessage(); and then check whether message == null.

Is there another way?

Best regards.

Another question: when extracting an EML (as a file or stream) from an mbox-derivative file such as a Eudora mboxo, for example by using MboxStorageReader and then message.Save(path, Aspose.Email.SaveOptions.DefaultEml), is the result a “standard” EML (as if it had been extracted from a genuine mbox file), or a different kind of EML?

Best regards.

Hello @anthonypr,

You can use the FileFormatUtil.DetectFileFormat method to check the type of a file or stream:

var fileInfo = FileFormatUtil.DetectFileFormat(filePath);

if (fileInfo.FileFormatType == FileFormatType.Mbox)
{
    // use MboxStorageReader.CreateReader...
}
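
Independently of the library, a quick pre-check is also possible, because mbox files conventionally begin with a "From " separator line. The following is a heuristic sketch under that assumption; MboxSniffer and LooksLikeMbox are hypothetical names, and a positive result is only a hint, since any text file that starts with "From " passes and the check says nothing about the mbox variant.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Aspose.Email.
public static class MboxSniffer
{
    // Returns true when the first non-blank bytes of the stream are "From ".
    public static bool LooksLikeMbox(Stream stream)
    {
        byte[] expected = Encoding.ASCII.GetBytes("From ");

        int current = stream.ReadByte();

        // Tolerate leading blank lines before the first separator.
        while (current == '\r' || current == '\n')
        {
            current = stream.ReadByte();
        }

        // Compare byte by byte; ReadByte returns -1 at end of stream,
        // which never equals an expected byte.
        foreach (byte e in expected)
        {
            if (current != e)
            {
                return false;
            }
            current = stream.ReadByte();
        }

        return true;
    }

    public static bool LooksLikeMbox(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return LooksLikeMbox(file);
        }
    }
}
```

A check like this reads only a few bytes, so it is cheap to run before handing the file to a reader.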

Thanks.

Yes, the saved file will be a standard EML.

Are mbox-derivative files like mboxo also recognized when using if (fileInfo.FileFormatType == FileFormatType.Mbox)?

No, just check the object type:

if (mboxReader is MboxoStorageReader)
{
    // mboxo
}

if (mboxReader is MboxrdStorageReader)
{
    // mboxrd
}

Thank you very much for all your answers, which are really important for properly integrating the Aspose.Email API into our project.

Does the MboxStorageReader class (or another class) also handle the Apple Mac OS X Mail format?

Please note that Apple uses “.mbox” as the package suffix for the folders that hold email message files, but those message files are not standardized mbox files.

Unfortunately, I don’t have a sample at hand to test this myself.

Best regards.

Hello @anthonypr,

If I understand correctly, Apple Mail stores each email as a separate .emlx file in a folder with an .mbox extension?
So, this is different from the standard mbox format, which combines all emails into a single file.
In this case, you can use MailMessage to read each email file in the folder:

var eml = MailMessage.Load(fileName, new EmlxLoadOptions());
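
To process a whole Apple Mail folder this way, the .emlx files first have to be collected. A minimal sketch, assuming the messages sit somewhere below the ".mbox" package folder; EmlxFinder and FindEmlxFiles are hypothetical names, and the helper itself does not depend on Aspose.Email.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of Aspose.Email.
public static class EmlxFinder
{
    // Returns every file below packageFolder whose name ends in ".emlx",
    // compared case-insensitively and sorted for a stable processing order.
    public static IEnumerable<string> FindEmlxFiles(string packageFolder)
    {
        return Directory
            .EnumerateFiles(packageFolder, "*", SearchOption.AllDirectories)
            .Where(p => p.EndsWith(".emlx", StringComparison.OrdinalIgnoreCase))
            .OrderBy(p => p, StringComparer.Ordinal);
    }
}
```

Each returned path can then be passed to MailMessage.Load with EmlxLoadOptions, as in the line above.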

If you do encounter issues reading such files, you can always contact us, but it is preferred that you provide a sample file so that we can investigate it.

Thank you.

Hello,

Apple Mail lets users export emails in the regular mbox format from the interface, but in practice users often copy the source email files directly from the “installation” folder. In that case the emails are stored in what seems to be an Apple proprietary email database format.

My question was whether the Aspose.Email API can parse those non-mbox files.

I am sorry I don’t have any sample at hand.

Best regards.

Hello,

In any case, if you get a sample you can share it for investigation.
As for Mac OS email formats, besides MBOX, Aspose.Email can handle EMLX as well as OLM, which is Outlook for Mac storage.

Thank you for your clarification about the Aspose.Email supported formats.