Mail Merge file size is exploding because of background image

Hi,

Using Aspose 6.2.0.0, I’m doing a mail merge with a background image in the template document. The merged document’s file size is exploding to many megabytes. It seems the background image is not being reused but copied again on every page. Is there any way to have the background image reused to save disk space? The background image is around 300 KB.

Update:
My objective is to save the document to PDF with SaveToPdf(), so reusing the same background image only in the final .pdf and not the .doc is fine by me…

Thanks,

dstj.

Aspose.Words already reuses the same image wherever that is possible. What you are seeing could be a defect that I need to reproduce. Please attach your template document and tell me what sort of mail merge you are doing.

It is safe to attach because only you and Aspose staff can access it.

Hi,

Here is the .dot template I use and the code:

DataTable dt = GetData();

Aspose.Words.License license = new Aspose.Words.License();
license.SetLicense(Constants.AsposeLicenseFile);

// Load template
string path;
path = Constants.AppBinFolder + @"\Reports\Settings\" + "TM_LettreCarte.dot";
Aspose.Words.Document doc = new Aspose.Words.Document(path);

doc.MailMerge.Execute(dt);

response.Clear();
doc.Save(filename, Aspose.Words.SaveFormat.Doc, Aspose.Words.SaveType.OpenInWord, response);
response.End();

You’ll notice that it is very close to this other thread.

dstj.

If the file does not get twice as big when you execute the mail merge for 2 records instead of 1, then yes, it is the JPEG->PNG issue. A release due toward the end of March will export JPEG as JPEG to PDF, and the size will then be what you want.

I’m unsure about your last comment.

The file should not be twice as big for 2 records than it is for 1 record if the background image is being reused. The only difference is some more text. The document should be a few KB bigger maybe, but definitely not twice the size… no?

Yes, the file should not become twice as big for 2 records. We have tests for our files for this.

I have not tried this on your file; I am just asking you instead. If the file does not become twice as big, then there is no problem to fix. Just wait for the release that fixes the JPEG->PNG size blowout.

I understand now. Here are the results:

Saving as .doc:
1 record: 213 KB
2 records: 403 KB
3 records: 783 KB
4 records: 974 KB
50 records: 12.6 MB
100 records: 26 MB

Saving as .pdf :
response.Clear();
doc.Save(filename, Aspose.Words.SaveFormat.Pdf, Aspose.Words.SaveType.OpenInWord, response);
response.End();
1 record: 614 KB
2 records: 617 KB
3 records: 623 KB
4 records: 626 KB
50 records: 804 KB
100 records: 980 KB

Saving as .pdf:

MemoryStream memStream = new MemoryStream();
doc.SaveToPdf(0, doc.PageCount, memStream, null);
byte[] bytes = memStream.GetBuffer();

response.Clear();
response.ContentType = MimeType;
response.AddHeader("content-disposition", "attachment; filename=" + filename);
response.BinaryWrite(bytes);
response.End();
1 record: 662 KB
2 records: 662 KB
3 records: 662 KB
4 records: 662 KB
50 records: 1.29 MB
100 records: 1.29 MB

So the JPG->PNG problem is there with PDF. I’ll wait for the fix. But with DOC, there seems to be another problem…

Also, I don’t understand why the two methods for saving PDF do not give the same file size results. See this thread for our previous discussion.

dstj.

You should not use MemoryStream.GetBuffer() like this. It returns the buffer that the stream is currently using, and that buffer can be larger than the actual length of the stream. You must take only MemoryStream.Length bytes from the buffer, not the whole buffer. Otherwise you get zeroes or garbage at the end.
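To illustrate the point about GetBuffer(), here is a minimal sketch that uses only System.IO and no Aspose calls. The class name GetBufferDemo and the helper WrittenBytes are made up for this example. It writes 400 bytes in two chunks, then compares the stream length with the size of the internal buffer. The exact buffer size depends on the runtime's growth policy, so the sketch only checks that it is at least as large as the stream length.

```csharp
using System;
using System.IO;

// Hypothetical demo class, not part of Aspose.Words.
public static class GetBufferDemo
{
    // Returns a copy of only the bytes that were actually written.
    public static byte[] WrittenBytes(MemoryStream stream)
    {
        return stream.ToArray();
    }

    public static void Main()
    {
        MemoryStream memStream = new MemoryStream();
        memStream.Write(new byte[300], 0, 300);
        memStream.Write(new byte[100], 0, 100);

        // Number of bytes written to the stream.
        Console.WriteLine(memStream.Length);                                  // 400

        // The internal buffer is never smaller than Length and is often larger.
        Console.WriteLine(memStream.GetBuffer().Length >= memStream.Length);  // True

        // ToArray() trims the result to Length, so there is no padding.
        Console.WriteLine(WrittenBytes(memStream).Length);                    // 400
    }
}
```

Sending the whole GetBuffer() result to the browser would include any unused tail of the buffer, which may explain why the SaveToPdf sizes above stay flat and then jump.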

And I also found why the size of the DOC file grows that much.

As it turns out, the DOC format supports reusing image bytes only for floating images. Your image is inline, so it has to be written every time. That's a limitation of the DOC format.

In fact, you can try that in Microsoft Word. Open your template in MS Word, copy/paste it 10 times, and save. You will get a file that is 1.89 MB in size. When you do a mail merge with Aspose.Words on this document for 10 records, you will get a 1.88 MB file. So, again, there is nothing to fix.

If you save as DOCX, that will be a completely different story, both in MS Word and in Aspose.Words.

Thanks for the clarification. I hadn’t thought of that.

For the sake of completeness, here’s the working code:

MemoryStream memStream = new MemoryStream();
doc.SaveToPdf(0, doc.PageCount, memStream, null);
byte[] bytes = new byte[memStream.Length];
memStream.Seek(0, SeekOrigin.Begin);
memStream.Read(bytes, 0, (int)memStream.Length);


response.Clear();
response.ContentType = MimeType;
response.AddHeader("content-disposition", "attachment; filename=" + filename);
response.BinaryWrite(bytes);
response.End();

1. Why don't you just use

Save(string fileName, SaveFormat fileFormat, SaveType saveType, HttpResponse response)

It will save in the PDF format just fine, add that header, and do even more.

2. An optimization for your code: to avoid copying into a byte array, do this:

response.OutputStream.Write(memStream.GetBuffer(), 0, (int)memStream.Length);
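As a rough sketch of the same idea outside ASP.NET, the example below copies a MemoryStream into another Stream without an intermediate byte array. The destination stream stands in for response.OutputStream, and the names StreamCopyDemo and CopyWritten are hypothetical. It assumes the MemoryStream was created with the default constructor, since GetBuffer() can throw for streams wrapped around a caller-supplied array.

```csharp
using System;
using System.IO;

// Hypothetical demo class; 'destination' stands in for response.OutputStream.
public static class StreamCopyDemo
{
    // Writes exactly source.Length bytes from the internal buffer,
    // skipping any unused tail of that buffer.
    public static void CopyWritten(MemoryStream source, Stream destination)
    {
        destination.Write(source.GetBuffer(), 0, (int)source.Length);
    }

    public static void Main()
    {
        MemoryStream source = new MemoryStream();
        source.Write(new byte[300], 0, 300);
        source.Write(new byte[100], 0, 100);

        MemoryStream destination = new MemoryStream();
        CopyWritten(source, destination);

        Console.WriteLine(destination.Length);  // 400
    }
}
```

MemoryStream.WriteTo(destination) has the same effect in one call and avoids the cast to int.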

You’re right, I just thought about that after writing it… Thanks.