.NET Aspose PDF Optimizations not Working

Hello, I am trying to reduce the size of the PDFs my organization generates with Aspose.PDF for .NET.

We are currently licensed for 19.9.0, but we were testing 20.7.0 to see whether its optimizations would make the PDFs smaller.

Based on a forum post I found, I am currently applying these optimizations to every PDF document we convert.

            var optimizationOptions = new OptimizationOptions
            {
                RemoveUnusedObjects = true,
                RemoveUnusedStreams = true,
                ImageCompressionOptions =
                {
                    CompressImages = true,
                    ImageQuality = 50,
                    ResizeImages = true,
                    MaxResolution = 100,
                    Version = ImageCompressionVersion.Standard
                },
                ImageEncoding = ImageEncoding.Jpeg
            };
            doc.OptimizeResources(optimizationOptions);

We expected this to dramatically decrease the size of our PDF documents, but it is actually making them much larger. Here is an example: a sample PPTX document that we convert to PDF with the optimizations applied.

Original PowerPoint: bigPowerPoint.pptx (Dropbox)
Aspose Optimized PDF: optimized.pdf (Dropbox)
Microsoft Optimized PDF: bigPowerPoint.pdf (Dropbox)

You can see that the PowerPoint is ~13MB and the optimized PDF is ~15MB. This is not isolated to PowerPoint: taking a regular PDF, applying the optimizations, and saving it also seems to make the file larger. I have also attached the Microsoft-optimized PDF, which was exported from within PowerPoint; it comes in at around 3MB after optimization.

If there is any step that we are missing, I would appreciate any help you can offer.

Thank you for your time.

@callen97

Would you kindly share how you are converting your PPTX files into PDF? We will then proceed to assist you accordingly.

Of course, thank you for getting back to me.

var file = "path/to//bigPowerPoint.pptx";
var fileBytes = File.ReadAllBytes(file);
byte[] pdfBytes; // declared here so it is still in scope for WriteAllBytes below
using (var memoryStream = new MemoryStream(fileBytes))
{
    using (var document = new Presentation(memoryStream))
    {
        using (var outStream = new MemoryStream())
        {
            document.Save(outStream, SaveFormat.Pdf);
            var pdfDocument = new Document(outStream);
            pdfDocument.ApplyOptimizations(); //This method contains the exact logic I sent to you above
            pdfDocument.Save(outStream);
            pdfBytes = outStream.ToArray();
        }
    }
}
File.WriteAllBytes("C:\\temp\\optimized.pdf", pdfBytes);

Here is another test I ran: I took a PDF on my machine, ran it through the API, and checked whether the output was smaller.

using Document = Aspose.Pdf.Document;

var blob = File.ReadAllBytes("path\\to\\pdfwithimages.pdf");
var doc = new Document(new MemoryStream(blob));
doc.ApplyOptimizations();
doc.Save("path\\to\\smallerpdfwithimages.pdf");

@callen97

Would you please try saving the output PDF first and then re-initializing the Aspose.Pdf.Document object from the saved PDF file before optimizing it? We tested the scenario in our environment using this approach, and the output PDF size was reduced to 8MB (the original PDF was 14MB).

output.pdf (7.8 MB)

Aspose.Slides.Presentation pres = new Aspose.Slides.Presentation(dataDir + "bigPowerPoint.pptx");
pres.Save(dataDir + "saved.pdf", Aspose.Slides.Export.SaveFormat.Pdf);
// re-initialize document object
Document doc = new Document(dataDir + "saved.pdf");
// optimize it (used code similar to what you shared in your first post)
// save the document
doc.Save(dataDir + "output.pdf");

Saving the Presentation to disk and then re-reading it would not work for my organization. We convert millions of documents every day, and doing this double work would be prohibitive for the system we have in place.

As you saw in my post above, we pass bytes between the different file formats, which lets us avoid duplicate work as well as reading from and writing to disk.

@callen97

You can also implement the suggested approach with a MemoryStream, so that you do not have to save the file to disk:

MemoryStream ms = new MemoryStream();
pres.Save(ms, Aspose.Slides.Export.SaveFormat.Pdf);
ms.Seek(0, SeekOrigin.Begin);
Document doc = new Document(ms);
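
The Seek call above matters because a MemoryStream's Position is left at the end after a write, so a reader that starts from the current position finds nothing. A minimal sketch using only System.IO (the class name StreamRewindDemo, the helper ReadRemaining, and the 4-byte payload are illustrative stand-ins, not part of any Aspose API):

```csharp
using System;
using System.IO;

public static class StreamRewindDemo
{
    // Reads everything from the stream's current position to its end.
    public static byte[] ReadRemaining(Stream stream)
    {
        using (var copy = new MemoryStream())
        {
            stream.CopyTo(copy);
            return copy.ToArray();
        }
    }

    public static void Main()
    {
        var ms = new MemoryStream();
        var payload = new byte[] { 1, 2, 3, 4 }; // stands in for the saved PDF bytes
        ms.Write(payload, 0, payload.Length);

        // Position is at the end after the write, so nothing is left to read.
        Console.WriteLine(ReadRemaining(ms).Length); // prints 0

        // Rewinding makes the full content readable again.
        ms.Seek(0, SeekOrigin.Begin);
        Console.WriteLine(ReadRemaining(ms).Length); // prints 4
    }
}
```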

Please let us know if this does not suit your needs either, and we will assist you further.

@asad.ali

We are now achieving the optimizations you were seeing. Using the MemoryStream path, we just needed to add a second memory stream to hold the saved PDF. Thank you for all of your time and responses!
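
For readers following along, the two-stream arrangement might look roughly like the sketch below. It assumes the Aspose.Slides and Aspose.PDF packages are referenced; the class and method names (PptxToOptimizedPdf, ConvertToOptimizedPdf) are hypothetical, and the option values are simply the ones from the first post, not a recommendation.

```csharp
using System.IO;
using Aspose.Pdf.Optimization;

public static class PptxToOptimizedPdf
{
    // Hypothetical helper; not part of either Aspose library.
    public static byte[] ConvertToOptimizedPdf(byte[] pptxBytes)
    {
        using (var pptxStream = new MemoryStream(pptxBytes))
        using (var presentation = new Aspose.Slides.Presentation(pptxStream))
        using (var rawPdf = new MemoryStream())       // first stream: unoptimized PDF
        using (var optimizedPdf = new MemoryStream()) // second stream: optimized result
        {
            presentation.Save(rawPdf, Aspose.Slides.Export.SaveFormat.Pdf);
            rawPdf.Seek(0, SeekOrigin.Begin); // rewind before loading the PDF

            using (var pdf = new Aspose.Pdf.Document(rawPdf))
            {
                var options = new OptimizationOptions
                {
                    RemoveUnusedObjects = true,
                    RemoveUnusedStreams = true
                };
                options.ImageCompressionOptions.CompressImages = true;
                options.ImageCompressionOptions.ImageQuality = 50;

                pdf.OptimizeResources(options);
                // Save into the separate stream rather than the one the document was loaded from.
                pdf.Save(optimizedPdf);
            }

            return optimizedPdf.ToArray();
        }
    }
}
```

Everything stays in memory, so this keeps the no-disk-I/O constraint mentioned earlier in the thread.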

@asad.ali

The level of compression we are seeing does not seem comparable to other forms of PDF compression. In our system we combine PDFs often, so we ran a test: here is one combined PDF that we compressed using Aspose and the same one compressed using Adobe Acrobat.

Aspose Optimized: combined.pdf (Dropbox) ~22MB
Acrobat Optimized: combined._acrobatpdf.pdf (Dropbox) ~700KB

As you can see in the example PDF, many documents are combined to make a bigger one. Are there any other optimizations we can apply to the PDF to get its size as low as the Acrobat one? Getting the smallest file size while maintaining PDF quality is currently our highest priority. Thank you.

@callen97

Would you please also provide the corresponding input file (the unoptimized PDF document generated at your end) so that we can test the scenario in our environment and compare the file sizes before and after optimization?

@asad.ali
Here is the original input file.

@callen97

Please check the complete code snippet below, which we used to optimize your original input file; the size was reduced to 623KB:

Document doc = new Document(dataDir + @"originalcombined.pdf");
var optimizationOptions = new Aspose.Pdf.Optimization.OptimizationOptions();
optimizationOptions.RemoveUnusedObjects = true;
optimizationOptions.RemoveUnusedStreams = true;
optimizationOptions.AllowReusePageContent = true;
optimizationOptions.LinkDuplcateStreams = true;
optimizationOptions.UnembedFonts = true;
optimizationOptions.ImageCompressionOptions.CompressImages = true;
optimizationOptions.ImageCompressionOptions.ImageQuality = 30;
optimizationOptions.ImageCompressionOptions.ResizeImages = true;
optimizationOptions.ImageCompressionOptions.Version = Aspose.Pdf.Optimization.ImageCompressionVersion.Fast;
optimizationOptions.ImageCompressionOptions.MaxResolution = 72;
doc.OptimizeResources(optimizationOptions);
doc.Save(dataDir + @"ExampleOptimized_20.7.pdf");