Free Support Forum - aspose.com

Corrupted PST file

We are using the latest Aspose.Email 5.7.0 and have run into two serious issues with it.
1. We ended up with corrupted PST files that could not be loaded by Outlook, and scanpst.exe could not fix them.
We added 100K messages to the PST file under ONE sub-folder. Outlook loaded the file and showed 100K messages in the folder. But when we clicked on the folder, Outlook complained that the file was corrupted and suggested using scanpst.exe to repair it.
scanpst.exe could not repair the file.

2. The PST file is far bigger than the total size of the original messages.
We had 100K messages with a total size of 365MB. After they were added to the PST, the PST size was over 3GB.
The size also grew faster after 90K messages had been added: from 2GB to 3.4GB over the last 10K messages.
When we were using Aspose.Email 5.2 and 5.3, we didn’t have this issue. The PST file’s size was close to the total size of the original messages.
Also, if we compress the PST file, we get a file of less than 400MB. In my previous experience, the PST file size was usually about the same as the compressed file, with less than 10% difference.

I think that the PST missing file patch went wrong and introduced a bug.
Could you please investigate that part of the code and check whether something is wrong there?

I will also try to create an application that can reproduce the issue. But it’s very hard because it takes TOO LONG to create a large PST file now, especially when there are lots of small message files.
Aspose.Email’s performance used to be much better than it is in the new 5.6.1 and 5.7 versions.

Please investigate the issue ASAP. We just applied for the premium support service. I will open a new topic using our company account tomorrow.

Thanks,

Ying

Hi Ying,


Thank you for posting your inquiry.

1. Since creating such a large PST file is a time-consuming activity, I request you to please share your sample files with us so that we can run the test with the original data and observe the issue before reporting it to our product team. This will save your time, as we can investigate the issue at once and forward it to our product team.

2. When messages are added to a PST, memory is allocated in blocks as required. Once a block is nearly full, the next block is allocated in memory for the following messages. If only one message is then added to the newly allocated block, the remaining space still counts towards the PST size, so the PST can be larger than the total size of the original messages. We can still test the issue with our sample messages, but if you could share the original sample files, we shall start investigating the issue at once at our end.
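
The block-allocation explanation above can be illustrated with a toy model. The following Java sketch is only an illustration under stated assumptions and is not Aspose.Email's actual allocator: the class name `BlockPackingModel`, the method `allocatedBytes`, and the 8192-byte block size are all hypothetical. It packs messages in order into fixed-size blocks and starts a new block whenever a message does not fit into the space left in the current one.

```java
final class BlockPackingModel {

    /**
     * Returns the total bytes allocated when messages are packed, in order,
     * into fixed-size blocks. A message that does not fit into the free space
     * of the most recent block starts on a fresh block (hypothetical model).
     */
    static long allocatedBytes(long[] messageSizes, long blockSize) {
        long blocks = 0;      // number of blocks allocated so far
        long freeInLast = 0;  // unused bytes left in the most recent block
        for (long size : messageSizes) {
            if (size <= freeInLast) {
                // The message fits into the slack of the current block.
                freeInLast -= size;
            } else {
                // Allocate as many new blocks as the message needs (ceiling division).
                long needed = (size + blockSize - 1) / blockSize;
                blocks += needed;
                freeInLast = needed * blockSize - size;
            }
        }
        return blocks * blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 8192;           // assumed block size, for illustration only
        long[] sizes = new long[10_000];
        java.util.Arrays.fill(sizes, 5_000L);
        long raw = 5_000L * sizes.length;
        long allocated = allocatedBytes(sizes, blockSize);
        System.out.printf("raw=%d bytes, allocated=%d bytes%n", raw, allocated);
    }
}
```

In this model the unused tail of each block is counted in the file size, which is the effect described in the reply. How much slack appears depends entirely on the assumed block size and on the order and sizes of the messages.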

We created an application which can reproduce the bug 100% of the time.
It uses fake emails generated from random strings, so you don’t need any sample email to reproduce it.
Actually, we sent similar code to you earlier this month to reproduce the missing message bug. We just made a small change, which reproduced the corrupted file bug.

Please include org.apache.commons.lang in your pom.xml.
Also, add a license file Aspose.Email.lic to the project.


<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.5</version>
</dependency>


Usage: java -jar runablejarfile pathtooutputpstfile 100000
Please make sure to add 100K messages to the PST folder. The PST file won’t be corrupted with fewer messages.

Keep in mind that we are in the process of signing up for Aspose priority support.
I will create a new ticket in the priority forum once we have the access.

Please look into the issue ASAP because I am sure it’s not just us who have this issue.
Any customer who tries to add 100K+ messages to one PST file under one sub-folder will have the same issue. It’s a serious bug which should get the highest attention and should be fixed ASAP.

Thanks,

Ying

Hi Ying,


Thank you for providing the sample code. We are running tests with different versions for comparison and it may take a little time to complete the process. Please spare us a little time and I will write back here as soon as the analysis is completed.

Hi Ying,


The sample code was executed and the generated PST file was opened in Outlook. We observed that this PST is corrupted and Outlook fails to open it. Similarly, if we try to repair the file, the repair fails. This issue has been logged in our issue tracking system under ID EMAILJAVA-33542 for further investigation by the product team.

Regarding the size issue, we observed that each message generated by the sample code is (on average) 100 KB, so the size of 100000 messages is about 9.53 GB. I tried this code with Aspose.Email for Java 5.3 and Aspose.Email for Java 5.7. The PST size with AE 5.3 is 8.8 GB, whereas with AE 5.7 it is 9.7 GB. This comparison is quite different from yours. Could you please let us know whether MapiConversionOptions.setUseBodyCompression(true) was used during the testing when the 400MB PST file was generated?

Hi Muhammad, we didn’t set body compression. We only have the size problem with real messages. Basically, the original raw messages were 450MB, but after they were added to the PST, it was 2GB. That is 100K messages in total.

Hi Ying,


We have tested the PST file size issue with our sample 100K messages. The total size of these messages is 9.8 GB on disc. The sample messages were added to a new PST and the resultant file size is almost the same as the size on disc. Thus, it seems that the PST size issue is specific to your sample messages. I request you to please share these sample messages with us so that we can investigate the issue at our end and assist you further.

Hi Muhammad,

Most of our real messages are small, around 1kb.

I will check whether I can reproduce the issue using fake messages.
We cannot send the real messages to you because of legacy policy.

So I just updated the test application I sent you to create 1kb files.
I added 10K messages. Each original message is 1kb, and the total size is 10.9MB.
The PST file’s size is 104MB, 10x the size of the original messages.

So each message takes 10K of space in the PST file. If that is how it is supposed to be, then please ignore the file size issue.

Thanks,

Ying

Hi Ying,



We are sorry for the delayed response. The behaviour seems to be the expected one, as mentioned earlier, but we shall still investigate it further and share our feedback with you soon.

Hi Ying,

I have tested the following code using a 6K sample message. The resultant PST file is about 17 MB, whereas the total size of the 10K messages is 59.8 MB. Could you please test this code and let us know your feedback?

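import com.aspose.email.*;

// Create a new Unicode PST with a predefined Inbox folder named "TestFolder".
PersonalStorage pst = PersonalStorage.create("6K.pst", FileFormatVersion.Unicode);
FolderInfo folder = pst.createPredefinedFolder("TestFolder", StandardIpmFolder.Inbox);
// Load the 6K sample message once, then add the same message 10,000 times.
MapiMessage message = MapiMessage.fromFile("6K.msg");
for (int i = 0; i < 10000; i++)
{
    System.out.println(i); // progress output
    folder.addMessage(message);
}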

It’s not a good idea to use the SAME message for testing.
I modified the code I sent you to create random 1k messages for testing and reproduced the issue.
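
For readers without the attached test application, generating distinct random messages can be sketched with the Java standard library alone. This is not the code from the attachment: the class `RandomEmlGenerator`, its method names, the header values, and the 500-character body are hypothetical choices made for illustration. The sketch writes small plain-text .eml files whose subject and body differ per message, so that no two messages are identical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

final class RandomEmlGenerator {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 ";

    /** Builds a random string of the given length from a small ASCII alphabet. */
    static String randomText(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Builds a minimal plain-text message with a random subject and body. */
    static String buildEml(Random rnd, int index, int bodyChars) {
        return "From: sender" + index + "@example.com\r\n"
             + "To: recipient@example.com\r\n"
             + "Subject: " + randomText(rnd, 20) + "\r\n"
             + "MIME-Version: 1.0\r\n"
             + "Content-Type: text/plain; charset=us-ascii\r\n"
             + "\r\n"
             + randomText(rnd, bodyChars) + "\r\n";
    }

    /** Writes count messages into dir and returns the total bytes written. */
    static long writeMessages(Path dir, int count, int bodyChars, long seed) throws IOException {
        Files.createDirectories(dir);
        Random rnd = new Random(seed);
        long total = 0;
        for (int i = 0; i < count; i++) {
            byte[] bytes = buildEml(rnd, i, bodyChars).getBytes(StandardCharsets.US_ASCII);
            Files.write(dir.resolve("msg-" + i + ".eml"), bytes);
            total += bytes.length;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "outdir");
        long total = writeMessages(dir, 10_000, 500, 42L);
        System.out.println("Wrote " + total + " bytes of .eml files to " + dir);
    }
}
```

Each generated file could then be loaded and added to the PST folder in place of the single repeated message, which keeps the raw .eml total on disk available for comparison with the PST size.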

Created PST file:
10/06/2015 10:05 AM 80,266,240 1k.pst

The total size of raw files:
10000 File(s) 6,181,382 bytes

So for 10K messages totalling 6MB, the generated PST file is 80MB, which is more than 10 times the original size.
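
The arithmetic behind that comparison can be checked directly from the two byte counts listed above. The Java sketch below is only a worked calculation; the class and method names are hypothetical and nothing in it touches the Aspose.Email API.

```java
final class PstSizeRatio {

    /** Ratio of PST size to the total size of the raw message files. */
    static double ratio(long pstBytes, long rawBytes) {
        return (double) pstBytes / rawBytes;
    }

    /** Average bytes per message, rounded down. */
    static long bytesPerMessage(long totalBytes, int messageCount) {
        return totalBytes / messageCount;
    }

    public static void main(String[] args) {
        long pstBytes = 80_266_240L; // size of 1k.pst from the directory listing
        long rawBytes = 6_181_382L;  // total size of the 10,000 raw files
        int count = 10_000;

        System.out.printf("PST / raw ratio: %.2f%n", ratio(pstBytes, rawBytes));
        System.out.println("Raw bytes per message: " + bytesPerMessage(rawBytes, count));
        System.out.println("PST bytes per message: " + bytesPerMessage(pstBytes, count));
    }
}
```

With these figures the ratio comes out just under 13, and each message averages roughly 618 bytes as a raw file against roughly 8,026 bytes inside the PST.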

Attached is the test code:
parameter:
--pstFile C:\Users\ying\Desktop\lv\1k.pst --count 10000 --outputdir C:\Users\ying\Desktop\lv\outdir

Hi Ying,


Thank you for providing the sample file.

We have run your sample application at our end. Somehow, the total file size on disc of the messages (for 1K messages) is 3.9MB and the generated PST has a size of 5.7 MB, which is not much of a difference and is the expected behavior as per the details shared above.

Are you sure the PST file size is 7.8M, not 78M?

I just created another PST file: 10K messages
pst file: 76.5 MB (80,266,240 bytes)
original messages: 5.88 MB (6,174,400 bytes) 10,000 Files


I also tried:
MapiConversionOptions options = new MapiConversionOptions();
options.setUseBodyCompression(true);
options.setFormat(MapiConversionOptions.getUnicodeFormat().getFormat());

The result is similar.

Hi Ying,


This is really strange behavior, as I have generated the PST using your sample code again. In parallel with adding the messages to the PST, I saved each MapiMessage on disc as well. The total size of the uncompressed messages (10K) on disc is 302 MB, whereas the size of the PST is about 78 MB. This can be seen in the attached image as well.

I am attaching the modified code again for your reference. I have used following parameters.

File pstFile = new File("output.pst");
File outputDir = null;
int maxMessages = 10000;
int baseSize = 1;
boolean verifyMessage = false;


P.S. I have also compressed the 10K messages, and the size of the compressed file is 30 MB on disc.

Oh, I see.
You are comparing the PST with the Outlook message format.
I was comparing the PST with email files (.eml).
When you run my test application, there is an option to write the original message (.eml) files to disk.
Each file was less than 1kb.
The total size of the 10K messages is 5.93 MB (6,220,158 bytes).

Anyway, based on testing, the PST is 10 times larger than the original .eml files only when the .eml files are small (<5kb). When the .eml files are bigger, the difference is small.
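
One way to picture why the ratio shrinks as messages grow is a fixed per-message overhead. The Java sketch below is a simplified model and an assumption, not a description of the PST format or of Aspose.Email internals: the class `FixedOverheadModel` is hypothetical, and the overhead of about 7,400 bytes is simply the figure implied by the 1k test in this thread (roughly 80MB of PST for roughly 6MB of .eml across 10,000 messages).

```java
final class FixedOverheadModel {

    /**
     * Expected PST-to-raw size ratio if every message costs its own size
     * plus a constant overhead inside the PST (assumed model).
     */
    static double expectedRatio(long messageBytes, long overheadBytes) {
        return (double) (messageBytes + overheadBytes) / messageBytes;
    }

    public static void main(String[] args) {
        long overhead = 7_400L; // assumed per-message overhead, derived from the 1k test
        long[] sizes = {600L, 5_000L, 100_000L};
        for (long size : sizes) {
            System.out.printf("message=%d bytes -> expected ratio %.2f%n",
                    size, expectedRatio(size, overhead));
        }
    }
}
```

Under this assumption a message of a few hundred bytes is dominated by the overhead, while a 100 KB message is barely affected, which matches the pattern reported here for small versus large .eml files.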

Hi Ying,


Thank you for sharing your feedback with us and feel free to write to us if you have any other query related to the API.