High Memory Utilization -- Maybe a Leak?

Hi again,

I am using Aspose.PDF 23.1.1 to compress thousands of documents. Compression runs on worker threads, and the number of threads is variable.

When I run the program, memory utilization goes through the roof. I have it tuned so it never quite runs out of memory, but it gets close.
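
(The throttle itself is simple: a new compression thread is only started while the process's private bytes are under a configured ceiling. Paraphrasing the check my monitor loop uses, shown only for context:)

    // Sketch of the memory throttle (same check as the monitor loop).
    // Process comes from System.Diagnostics; AppSettings.MaxMemory is the configured ceiling in GB.
    double memUsedGb = Process.GetCurrentProcess().PrivateMemorySize64 / 1024.0 / 1024.0 / 1024.0;
    bool tooMuchMem = memUsedGb >= AppSettings.MaxMemory;   // when true, no new compress thread is started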

Here’s the code that compresses a file:

    private static void CompressIt(DocumentToStitch docToCompress)
    {
        var totalPages = docToCompress.Images.Count;
        var maxPages = AppSettings.MaxPages;
        var isParts = (totalPages > maxPages);
        int[] pages = {0,0};
        var parts = (totalPages / maxPages);
        parts += ((totalPages - (parts * maxPages)) > 0) ? 1 : 0;

        for (int i = 0; i < parts; i++)
        {
            int low = i * maxPages;
            int number = Math.Min(maxPages, (totalPages - (i * maxPages)));
            pages[0] = low + 1;
            pages[1] = low + number;
            // Page images for this part (not referenced by the compression code below).
            var images = docToCompress.Images.GetRange(low, number);

            var optimizeOptions = new Aspose.Pdf.Optimization.OptimizationOptions();
            optimizeOptions.ImageCompressionOptions.CompressImages = true;
            optimizeOptions.ImageCompressionOptions.ImageQuality = AppSettings.ImageQuality;
            optimizeOptions.ImageCompressionOptions.MaxResolution = AppSettings.Resolution;
            optimizeOptions.ImageCompressionOptions.ResizeImages = true;
            optimizeOptions.ImageCompressionOptions.Version = Aspose.Pdf.Optimization.ImageCompressionVersion.Standard;
            optimizeOptions.RemoveUnusedObjects = true;
            optimizeOptions.RemoveUnusedStreams = true;
            var fileToCompress = docToCompress.OutputPDF;
            if (isParts)
            {
                fileToCompress = fileToCompress.Replace("_Complete", string.Format("_Pages {0:D4} to {1:D4}", pages[0], pages[1]));
            }
            if (File.Exists(fileToCompress))
            {
                var outputFile = fileToCompress + ".output.pdf";
                try
                {
                    using (var pdfDoc = new Document(fileToCompress))
                    {
                        pdfDoc.OptimizeResources(optimizeOptions);
                        pdfDoc.OptimizeSize = true;
                        pdfDoc.Save(outputFile);
                        docToCompress.Compressed = true;
                    }
                    File.Delete(fileToCompress);
                    File.Move(outputFile, fileToCompress, false);
                }
                catch (Exception e)
                {
                    Logger.LogWrite($"Compression File: {fileToCompress}; DocumentInfo_id: {docToCompress.DocumentInfoId}");
                    Logger.LogWrite(e.ToString());
                    return;
                }
            }
            GC.Collect();
        }
    }
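
For reference, the GC.Collect() at the end of each part only does so much: large PDFs put big buffers on the large object heap, which a plain collection does not compact by default. Here is a minimal sketch of a fuller cleanup between parts (ReleaseLargeBuffers is a hypothetical helper; it uses only standard .NET GC APIs, nothing Aspose-specific, and whether it actually helps here is just a guess):

    // Sketch only: request a one-time compaction of the large object heap,
    // collect, let finalizers run, then collect again to reclaim finalized objects.
    // GCSettings lives in System.Runtime; GC lives in System.
    private static void ReleaseLargeBuffers()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }

Calling something like this in place of the bare GC.Collect() would at least help rule out large-object-heap fragmentation as the reason the process keeps growing.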

Note that a document may have been broken into 50-page chunks. The documents are just sets of TIFs or JPGs stitched together on another thread. (Right now, I’m not stitching, just compressing.)
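
(For clarity, the two-line parts calculation in the code above is just a ceiling division of the page count by the chunk size; an equivalent one-liner, shown only for illustration:)

    // Ceiling division: e.g. 120 total pages with maxPages = 50 gives 3 parts.
    int parts = (totalPages + maxPages - 1) / maxPages;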

Do you see anything that might be a leak?

Thanks,
Bob

@BobFlanders

Is it possible for you to share a sample console application showing the exact routine of your program, along with some sample PDFs? It would help us replicate the issue in our environment and address it accordingly.

Here’s a stripped-down version of the code, using the latest Aspose.PDF (23.1.1) and .NET 7.0.

using System.Collections.Concurrent;

using Microsoft.Data.Sqlite;
using Aspose.Pdf;
using System.Diagnostics;

namespace Stitchem
{
    // Essentially, the program loads the files to compress and starts a number of threads based on a value in configuration (8 threads).
    // It monitors memory used and only starts new threads when the memory used is below the number of GB in configuration (36 GB).
    // The production machine has about 70 GB free when the program starts. There is a SQLite database that notes completed tasks, but
    // it has been removed from this code. This code won't compile; it is a subset of the full application.
    
    class Program
    {
        #region Statics
        public static readonly QueueStat CompressQueue = new QueueStat("CompressQ");
        public static Settings AppSettings = new Settings();
        public static Thread CompressThread = new Thread(Program.ServiceCompressQueue);
        #endregion

        #region Main
        public static void Main()
        {
            try
            {
                SetAsposeLicense();
                BuildWorkQueues();
                CompressThread.Start();
    
                while (CompressRunning)
                {
                    Thread.Sleep(500);
                }
            }
            finally
            {
                Console.WriteLine("Done.");
            }
        }
        #endregion

        #region Compression
        private static void ServiceCompressQueue(object? obj)
        {
            while (!CompressQueue.IsEmpty)
            {
                var maxThreads = Math.Min(CompressQueue.Count, AppSettings.CompressThreads);
                var memUsed = Process.GetCurrentProcess().PrivateMemorySize64 / 1024.0 / 1024.0 / 1024.0;
                var tooMuchMem = (memUsed >= AppSettings.MaxMemory);

                if ((threadList.Count < maxThreads) && !CompressQueue.IsEmpty && !tooMuchMem)
                {
                    var compressThread = new Thread(Program.CompressADocument);
                    compressThread.Start();
                    threadList.Add(compressThread);
                    CompressThreads = threadList.Count;
                }
                for (int i = threadList.Count; --i >= 0;)
                {
                    if (!threadList[i].IsAlive)
                    {
                        if (threadList[i].Join(0))
                        {
                            threadList.RemoveAt(i);
                            CompressCount.Add(1);
                        }
                    }
                }
                Thread.Sleep(0);
            }
            CompressRunning = false;
        }

        private static void CompressADocument()
        {
            DocumentToStitch? docToCompress;
            if (!CompressQueue.IsEmpty && CompressQueue.TryDequeue(out docToCompress))
            {
                if (AppSettings.Optimize && docToCompress is not null)
                {
                    CompressIt(docToCompress);
                }
            }
        }

        private static void CompressIt(DocumentToStitch docToCompress)
        {
            var totalPages = docToCompress.Images.Count;
            var maxPages = AppSettings.MaxPages;
            var isParts = (totalPages > maxPages);
            int[] pages = {0,0};
            var parts = (totalPages / maxPages);
            parts += ((totalPages - (parts * maxPages)) > 0) ? 1 : 0;

            for (int i = 0; i < parts; i++)
            {
                int low = i * maxPages;
                int number = Math.Min(maxPages, (totalPages - (i * maxPages)));
                pages[0] = low + 1;
                pages[1] = low + number;
                // Page images for this part (not referenced by the compression code below).
                var images = docToCompress.Images.GetRange(low, number);

                var optimizeOptions = new Aspose.Pdf.Optimization.OptimizationOptions();
                optimizeOptions.ImageCompressionOptions.CompressImages = true;
                optimizeOptions.ImageCompressionOptions.ImageQuality = AppSettings.ImageQuality;
                optimizeOptions.ImageCompressionOptions.MaxResolution = AppSettings.Resolution;
                optimizeOptions.ImageCompressionOptions.ResizeImages = true;
                optimizeOptions.ImageCompressionOptions.Version = Aspose.Pdf.Optimization.ImageCompressionVersion.Standard;
                optimizeOptions.RemoveUnusedObjects = true;
                optimizeOptions.RemoveUnusedStreams = true;
                var fileToCompress = docToCompress.OutputPDF;
                if (isParts)
                {
                    fileToCompress = fileToCompress.Replace("_Complete", string.Format("_Pages {0:D4} to {1:D4}", pages[0], pages[1]));
                }
                if (File.Exists(fileToCompress))
                {
                    var outputFile = fileToCompress + ".output.pdf";
                    try
                    {
                        using (var pdfDoc = new Document(fileToCompress))
                        {
                            pdfDoc.OptimizeResources(optimizeOptions);
                            pdfDoc.OptimizeSize = true;
                            pdfDoc.Save(outputFile);
                            docToCompress.Compressed = true;
                        }
                        File.Delete(fileToCompress);
                        File.Move(outputFile, fileToCompress, false);
                    }
                    catch (Exception e)
                    {
                        Logger.LogWrite($"Compression File: {fileToCompress}; DocumentInfo_id: {docToCompress.DocumentInfoId}");
                        Logger.LogWrite(e.ToString());
                        return;
                    }
                }
                GC.Collect();
            }
        }
        #endregion


        #region BuildWorkQueues
        private static void BuildWorkQueues()
        {
            // Loads CompressQueue with files to stitch and compress from a DB.
        }
        #endregion

        #region Helpers
        public static void SetAsposeLicense()
        {
            Aspose.Pdf.License license = new Aspose.Pdf.License();
            try
            {
                license.SetLicense("Aspose.Pdf.lic");
            }
            catch (Exception)
            {
                throw;
            }
        }
        #endregion
    }
}
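
As an aside, the monitor loop hand-manages Thread objects. The same cap on how many documents are open at once could be expressed with Parallel.ForEach and MaxDegreeOfParallelism. This is only a sketch (the CompressAll name and workItems list are hypothetical, and it assumes the queue contents can be handed over as a collection); it is not what the production code does:

    // Sketch only: bound concurrency so that at most maxThreads Aspose Documents
    // are open at the same time. "workItems" stands in for the CompressQueue contents.
    // Requires: using System.Threading.Tasks;
    private static void CompressAll(IEnumerable<DocumentToStitch> workItems, int maxThreads)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };
        Parallel.ForEach(workItems, options, doc =>
        {
            if (AppSettings.Optimize)
            {
                CompressIt(doc);
            }
        });
    }

It drops the memory throttle, but it would show whether the growth tracks the number of concurrent Document instances or something else.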

I cannot share any of the files; they are confidential. In general, they are 50 pages each, stitched together from single-page TIFFs. The PDFs are usually about 20-100 MB pre-compression and mostly look like legal documents, albeit on 8.5"x11" pages. Some of the original TIFFs are very large, with pictures, etc., but that is not typical.

[Attachment: image.png (17.8 KB)]

This is how memory “looks” during a run.

Program output during run:

Loading Stitch Queue…
101545
101545p/9718MB; 0.2% Done: 288p/(19MB) !SCV(6) M:34.4GB 0d 0:01:09 elapsed; Est. Done: 2/16/2023 8:03:55 PM

StitchQ: Docs:0/0; Pages:0/0; MB:0/0
DoneQ: Docs:0/0; Pages:0/0; MB:0/0
CompressQ: Docs:2071/12; Pages:101545/576; MB:9718/44
0 Stitches Complete; 6 Compresses Complete

101545p/9718MB; 30.8% Done: 30540p/(2997MB) !SCV(6) M:54.7GB^ 0d 2:01:55 elapsed; Est. Done: 2/16/2023 4:47:36 PM

The M:54.7GB shows the memory used at that time.
The V(6) shows 6 compress threads running.

Thanks,
Bob Flanders

@BobFlanders

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes/investigation results according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-53702

You can obtain Paid Support services if you need support on a priority basis, along with direct access to our Paid Support management team.

Thank you, Asad. Hope something turns up.

Regards,
Bob

A bit of additional information: memory goes way up with 4 or more threads, while 3 is very stable. Also, is there a way to follow the issue as it is being worked? Thanks!

@BobFlanders

The issue has recently been logged in our issue tracking system and will be investigated/resolved on a first-come, first-served basis. We are afraid that no progress has yet been made towards ticket resolution. However, we will keep you posted on the rectification status via this forum thread. Please spare us some time.

We are sorry for the inconvenience.