PdfFileEditor.Append is adding an exponentially increasing amount of extra bytes with each call

We are appending files to an existing PDF and seeing very strange memory usage. Here is sample code that reproduces the issue:


using System;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Facades;

namespace Aspose.Test
{
    public static class Append
    {
        public static void Main() { TestAppend(); }

        private static void TestAppend()
        {
            const int N = 16; // number of Append calls to make

            var license = new License();
            license.SetLicense("Aspose.Total.lic");

            // Start from an empty document saved to a stream.
            var baseStream = new MemoryStream();
            var baseDoc = new Document();
            baseDoc.Save(baseStream);

            // The document that is appended on every pass.
            var attachStream = new MemoryStream();
            var attachDoc = new Document("loop.pdf");
            attachDoc.Save(attachStream);

            Console.WriteLine("  Base: " + baseStream.Length);
            Console.WriteLine("Attach: " + attachStream.Length);

            long attachSize = attachStream.Length;
            long lastExtra = 0;

            for (int i = 0; i < N; i++)
            {
                var output = new MemoryStream();
                Console.WriteLine("----------------------------");
                Console.WriteLine("Before: " + baseStream.Length);

                // Append page 1 of the attachment to the current base document.
                var editor = new PdfFileEditor {AllowConcatenateExceptions = true};
                editor.Append(baseStream, attachStream, 1, 1, output);

                // "Extra" is how far the result deviates from base size + attachment size.
                long expected = baseStream.Length + attachSize;
                long extra = output.Length - expected;
                Console.WriteLine("Expect: " + expected);
                Console.WriteLine(" After: " + output.Length);
                Console.WriteLine(" Extra: " + extra);
                Console.WriteLine("Change: " + (extra - lastExtra));
                Console.WriteLine("----------------------------");
                Console.WriteLine();

                // The output of this pass becomes the base for the next one.
                lastExtra = extra;
                baseStream = output;
            }

            Console.WriteLine("Expected Final Size: " + attachSize * N);
            Console.WriteLine("  Actual Final Size: " + baseStream.Length);

            var outputDoc = new Document(baseStream);
            outputDoc.Save("Output.pdf");
        }
    }
}

Here is the output from the sample code:
Base: 1928
Attach: 3458
----------------------------
Before: 1928
Expect: 5386
After: 5207
Extra: -179
Change: -179
----------------------------
----------------------------
Before: 5207
Expect: 8665
After: 8517
Extra: -148
Change: 31
----------------------------
----------------------------
Before: 8517
Expect: 11975
After: 11892
Extra: -83
Change: 65
----------------------------
----------------------------
Before: 11892
Expect: 15350
After: 15394
Extra: 44
Change: 127
----------------------------
----------------------------
Before: 15394
Expect: 18852
After: 19152
Extra: 300
Change: 256
----------------------------
----------------------------
Before: 19152
Expect: 22610
After: 23422
Extra: 812
Change: 512
----------------------------
----------------------------
Before: 23422
Expect: 26880
After: 28716
Extra: 1836
Change: 1024
----------------------------
----------------------------
Before: 28716
Expect: 32174
After: 36058
Extra: 3884
Change: 2048
----------------------------
----------------------------
Before: 36058
Expect: 39516
After: 47496
Extra: 7980
Change: 4096
----------------------------
----------------------------
Before: 47496
Expect: 50954
After: 67128
Extra: 16174
Change: 8194
----------------------------
----------------------------
Before: 67128
Expect: 70586
After: 103159
Extra: 32573
Change: 16399
----------------------------
----------------------------
Before: 103159
Expect: 106617
After: 171956
Extra: 65339
Change: 32766
----------------------------
----------------------------
Before: 171956
Expect: 175414
After: 306289
Extra: 130875
Change: 65536
----------------------------
----------------------------
Before: 306289
Expect: 309747
After: 571694
Extra: 261947
Change: 131072
----------------------------
----------------------------
Before: 571694
Expect: 575152
After: 1099243
Extra: 524091
Change: 262144
----------------------------
----------------------------
Before: 1099243
Expect: 1102701
After: 2151080
Extra: 1048379
Change: 524288
----------------------------
Expected Final Size: 55328
Actual Final Size: 2151080

Each time the loop runs, the amount of “Extra” data added is roughly twice that of the previous pass: the “Change” values are almost all exact powers of two. Because this extra chunk doubles with each Append call, the size of the stream explodes out of control very quickly. For example, after 25 Append calls there is an extra 256MB of data being shoved around that, as far as we can tell, is useless and simply shouldn’t be there. Processing that many appends also takes an unreasonable amount of time. The behavior doesn’t change when the output of each loop iteration is saved to an actual PDF file and reloaded through Document(), either.
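The 256MB figure can be checked by extending the doubling seen in the “Change” column. The sketch below assumes the doubling simply continues past the 16 measured calls; ProjectChange is a hypothetical helper written for this post, not part of Aspose.Pdf.

```csharp
using System;

public static class GrowthEstimate
{
    // Extrapolates the per-call "Change" value, assuming it doubles on every call.
    // lastChange is the Change measured at call number lastCall (1-based).
    public static long ProjectChange(long lastChange, int lastCall, int targetCall)
    {
        if (targetCall < lastCall)
            throw new ArgumentOutOfRangeException(nameof(targetCall));
        return lastChange << (targetCall - lastCall);
    }

    public static void Main()
    {
        // From the output above: call 16 reported a Change of 524288 bytes.
        long projected = ProjectChange(524288, 16, 25);
        Console.WriteLine(projected / (1024 * 1024) + " MB"); // prints "256 MB"
    }
}
```

This only projects the per-call increase; the cumulative “Extra” at that point would be larger still, since it sums all earlier increases.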

Where is this data coming from and how can I get rid of it?

Solved:


This behavior no longer occurs in Aspose.PDF 7.0.0.0.

Hi Michael,


Thanks for contacting support and sorry for the delayed response.

I am glad to hear that your problem seems to be resolved with a later version. However, I would still recommend trying the latest release, Aspose.Pdf for .NET 8.1.0; up to this release we have made a number of improvements in memory utilization and processing optimization. If you still encounter a similar issue or have any further queries, please feel free to contact us.
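For anyone hitting the same growth, it may help to confirm which version of the library is actually loaded before retesting. The following is a minimal sketch using standard .NET reflection; IsAtLeast is a hypothetical helper, and it assumes the assembly version of the loaded DLL matches the release number quoted in this thread (7.0.0.0), which may not hold for every build.

```csharp
using System;

public static class VersionCheck
{
    // Returns true if the assembly that defines typeFromLibrary reports
    // an assembly version greater than or equal to minimum.
    public static bool IsAtLeast(Type typeFromLibrary, Version minimum)
    {
        Version loaded = typeFromLibrary.Assembly.GetName().Version;
        return loaded != null && loaded >= minimum;
    }

    public static void Main()
    {
        // In the reproduction project, pass typeof(Aspose.Pdf.Document) instead.
        // typeof(string) is used here only so the sketch compiles without the library.
        Console.WriteLine(IsAtLeast(typeof(string), new Version(7, 0, 0, 0)));
    }
}
```

If the check returns false for typeof(Aspose.Pdf.Document), the project is probably still referencing an older DLL than the one in which the behavior was reported fixed.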