PDF grows in size after iterating annotations

If I iterate over the pages and then the annotations in the attached PDF and then save the file, the file increases in size (by 34 KB in this particular example).


However, if I use the Optimize method I do not see the size increase. I see this in the 10.x and 11.0 versions.

Sample code:

Aspose.Pdf.Document doc = new Aspose.Pdf.Document([File]);

int numPages = doc.Pages.Count;
for (int i = 1; i <= numPages; i++)
{
    Aspose.Pdf.Page pg = doc.Pages[i];
    int numAnnotations = pg.Annotations.Count;
    for (int j = 1; j <= numAnnotations; j++)
    {
        // don't do anything
    }
}
doc.Optimize();
doc.Save(@"[NewFile]");

Hi,


Thanks for contacting support.

I have tested the scenario using Aspose.Pdf for .NET 11.0.0 with a valid license file and was unable to reproduce the issue. In my tests, the resultant file is the same size as the input PDF (264 KB). Can you please share some further details regarding your working environment? We are sorry for the inconvenience.

Did you remark out the Optimize method? If it is in place, then the size of the file does not increase. I had left it in there to show my full process.


Thanks

Hi,


When the doc.Optimize(); code line is commented out, the size increases by only 5 KB. Can you please try using the latest release and share your findings?

I can confirm that it grows by 5k with the latest version of the assembly. Why does it grow at all if I haven’t done anything with the file? If I simply Load and Save the PDF, I actually see it get a little smaller.

Hi,


Thanks for contacting support.

The size increases slightly because Aspose.Pdf for .NET adds some meta information while saving the document. Furthermore, please note that the Optimize(…) method enables the fast web view option for the PDF file. In order to reduce the file size, please try using the following code snippet.

For your reference, I have also attached the output file generated on my end.

[C#]

Aspose.Pdf.Document doc = new Aspose.Pdf.Document("c:/pdftest/Document+Mastery+You+Can+Rely+On+_+Microsystems_source.pdf");

int numPages = doc.Pages.Count;
for (int i = 1; i <= numPages; i++)
{
    Aspose.Pdf.Page pg = doc.Pages[i];
    int numAnnotations = pg.Annotations.Count;
    for (int j = 1; j <= numAnnotations; j++)
    {
        // don't do anything
    }
}

doc.OptimizeResources(new Document.OptimizationOptions()
{
    AllowReusePageContent = true,
    CompressImages = true,
    LinkDuplcateStreams = true,
    RemoveUnusedObjects = true,
    RemoveUnusedStreams = true
});

doc.OptimizeSize = true;

doc.Save(@"c:/pdftest/Document+Mastery+You+Can+Rely+On+_+Microsystems.pdf");

I can confirm that setting the Document.OptimizationOptions does prevent the PDF from growing in size. Can you provide any documentation on the following properties?

  • AllowReusePageContent
  • LinkDuplicateStreams
  • RemoveUnusedObjects
  • RemoveUnusedStreams
Also, we continue to research this issue. We see that the more times the code accesses properties on an annotation, the larger the file gets on save. Is this expected/intended?
Freedom Solutions:
I can confirm that setting the Document.OptimizationOptions does prevent the PDF from growing in size. Can you provide any documentation on the following properties?
  • AllowReusePageContent
  • LinkDuplicateStreams
  • RemoveUnusedObjects
  • RemoveUnusedStreams
Hi,

Thanks for your patience.

  • AllowReusePageContent - If this flag is set to true, page contents are reused for identical pages when the document is optimized.
  • LinkDuplicateStreams - If this flag is set to true, resource streams are analyzed. If duplicate streams are found (i.e. if their contents are equal), they are stored as one object. This can decrease the document size in some cases (for example, when the same document was concatenated multiple times).
  • RemoveUnusedObjects - If this flag is set to true, all document objects are checked and unused objects (i.e. objects which do not have any reference) are removed from the document.
  • RemoveUnusedStreams - If this flag is set to true, every resource is checked for its usage. If a resource is never used, it is removed. This may decrease the document size, for example when pages were extracted from a document.
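To make the LinkDuplicateStreams description more concrete, here is a minimal sketch of the general idea: equal byte streams are detected by hashing their contents and stored only once. This is an illustration only, not Aspose.Pdf's implementation; the class and method names (StreamDeduplication, LinkDuplicates) are hypothetical, and using SHA-256 as the equality key is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

static class StreamDeduplication
{
    // For each input stream, returns the index of the stored copy it should reference.
    // Distinct stream contents are appended to `stored` in first-seen order.
    public static int[] LinkDuplicates(IList<byte[]> streams, List<byte[]> stored)
    {
        var indexByHash = new Dictionary<string, int>();
        var links = new int[streams.Count];
        using (SHA256 sha = SHA256.Create())
        {
            for (int i = 0; i < streams.Count; i++)
            {
                // Streams with equal contents produce the same key.
                string key = Convert.ToBase64String(sha.ComputeHash(streams[i]));
                int existing;
                if (!indexByHash.TryGetValue(key, out existing))
                {
                    existing = stored.Count;
                    stored.Add(streams[i]);
                    indexByHash[key] = existing;
                }
                links[i] = existing;
            }
        }
        return links;
    }

    static void Main()
    {
        var streams = new List<byte[]>
        {
            new byte[] { 1, 2, 3 },
            new byte[] { 9 },
            new byte[] { 1, 2, 3 } // duplicate of the first stream
        };
        var stored = new List<byte[]>();
        int[] links = LinkDuplicates(streams, stored);
        Console.WriteLine("stored {0} of {1} streams; links: {2}",
            stored.Count, streams.Count, string.Join(",", links));
    }
}
```

With the three sample streams above, only two are stored and the third references the first, which is why concatenating the same document several times is the case that benefits most.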
Freedom Solutions:
Also, we continue to research this issue. We see that the more times the code accesses properties on an annotation, the larger the file gets on save. Is this expected/intended?
Do you mean executing the same code multiple times? Can you please share some further details, so that we can investigate further in our environment? We are sorry for the delay and inconvenience.

Again, I can confirm that the settings prevent the file from growing in size. However, the settings do seem to have an impact on the processing time. We are seeing additional seconds added to the file save.


What I meant by accessing multiple times is that if we have a for loop that goes over the comments 2 or more times, we see the file grow for each iteration.

Freedom Solutions:
Again, I can confirm that the settings prevent the file from growing in size. However, the settings do seem to have an impact on the processing time. We are seeing additional seconds added to the file save.
Hi,

Thanks for contacting support.

The extra time is consumed because the API performs additional steps to compress the PDF file contents. However, if you are facing a significant increase in time, please share the resource files.
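To tell how much of the extra time comes from the optimization step and how much from the save itself, each step can be timed separately. A minimal sketch using System.Diagnostics.Stopwatch follows; the helper name Measure is hypothetical, and the Thread.Sleep call is only a stand-in workload for the real calls shown in the comment.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SaveTiming
{
    // Runs the given step once and returns how long it took.
    public static TimeSpan Measure(Action step)
    {
        Stopwatch sw = Stopwatch.StartNew();
        step();
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        // Stand-in workload. In the scenario from this thread, the steps to time
        // separately would be the doc.OptimizeResources(...) call and the doc.Save(...) call.
        TimeSpan elapsed = Measure(() => Thread.Sleep(50));
        Console.WriteLine("step took {0:F0} ms", elapsed.TotalMilliseconds);
    }
}
```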

Freedom Solutions:
What I meant by accessing multiple times is that if we have a for loop that goes over the comments 2 or more times, we see the file grow for each iteration.
Are you using the file shared in the first post in a loop, or performing the operation on different files? Please share the details and a code snippet, so that we can test the scenario in our environment. We are sorry for the inconvenience.
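A reproduction along the following lines would show whether the saved size depends on the number of passes over the annotations. It is a sketch under assumptions, not the customer's actual code: the class and method names are hypothetical, the input path is taken from the command line, and reading the annotation's Rect property is only a guess at the kind of property access described above. It needs a reference to the Aspose.Pdf assembly.

```csharp
using System;
using System.IO;

class AnnotationGrowthRepro
{
    // Loads the PDF, walks every annotation `passes` times, saves without
    // optimization, and returns the size of the saved file in bytes.
    static long SaveAfterPasses(string inputPath, string outputPath, int passes)
    {
        Aspose.Pdf.Document doc = new Aspose.Pdf.Document(inputPath);
        for (int pass = 0; pass < passes; pass++)
        {
            for (int i = 1; i <= doc.Pages.Count; i++)
            {
                Aspose.Pdf.Page pg = doc.Pages[i];
                for (int j = 1; j <= pg.Annotations.Count; j++)
                {
                    // Assumption: read one property so the annotation is actually touched.
                    Aspose.Pdf.Rectangle rect = pg.Annotations[j].Rect;
                }
            }
        }
        doc.Save(outputPath);
        return new FileInfo(outputPath).Length;
    }

    static void Main(string[] args)
    {
        string input = args[0];
        long original = new FileInfo(input).Length;
        for (int passes = 0; passes <= 3; passes++)
        {
            string output = Path.Combine(Path.GetTempPath(), "passes_" + passes + ".pdf");
            long saved = SaveAfterPasses(input, output, passes);
            Console.WriteLine("{0} pass(es): {1} bytes ({2:+#;-#;0} vs. original)",
                passes, saved, saved - original);
        }
    }
}
```

The zero-pass run gives the plain load-and-save baseline, so any growth in the later runs can be attributed to the annotation loop rather than to the save alone.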