Converting a large PDF to TIFF in Docker (Linux containers) may have a memory leak

Hi,

I’m trying to run the following conversion in Docker Linux containers on a PDF with more than 600 pages:

private List<string> ConvertFromPdf(string filePath,
    TiffConversionSetting conversionSetting)
{
    // Create the return object
    var images = new List<string>();

    // Set the compression type
    TiffSettings tiffSettings = new TiffSettings();
    switch (conversionSetting.ImageFormat)
    {
        case ImageFormats.TiffFormat.None:
            tiffSettings.Compression = CompressionType.None;
            break;
        case ImageFormats.TiffFormat.Ccitt4:
            tiffSettings.Compression = CompressionType.CCITT4;
            break;
        case ImageFormats.TiffFormat.Lzw:
            tiffSettings.Compression = CompressionType.LZW;
            break;
        case ImageFormats.TiffFormat.Default:
            // Keep the library's default compression
            break;
    }

    // Set the color depth
    switch (conversionSetting.BitsPerPixel)
    {
        case 1:
            tiffSettings.Depth = ColorDepth.Format1bpp;
            break;
        case 4:
            tiffSettings.Depth = ColorDepth.Format4bpp;
            break;
        case 8:
            tiffSettings.Depth = ColorDepth.Format8bpp;
            break;
        case 24:
            tiffSettings.Depth = ColorDepth.Default;
            break;
    }

    // Fall back to a brightness of 0.33 when none is supplied
    if (conversionSetting.Brightness <= 0)
        tiffSettings.Brightness = 0.33F;
    else
        tiffSettings.Brightness = conversionSetting.Brightness;

    // Resolution (DPI)
    var dpi = new Resolution(conversionSetting.Resolution);

    // Create a PdfConverter object and bind the input PDF file
    using (PdfConverter pdfConverter = new PdfConverter())
    {
        pdfConverter.Resolution = dpi;
        pdfConverter.BindPdf(filePath);
        pdfConverter.DoConvert();
        string fileDestinationPath = Guid.NewGuid().ToString();
        // Convert all pages into a single multi-page TIFF image
        pdfConverter.SaveAsTIFF(fileDestinationPath, tiffSettings);
        pdfConverter.Close();
        // Add the output path to the list
        images.Add(fileDestinationPath);
    }

    return images;
}

Processing starts at this line:

pdfConverter.SaveAsTIFF(fileDestinationPath, tiffSettings);

but memory consumption keeps climbing until the container crashes. Running the same code locally on Windows without Docker, memory stays steady, so I suspect a memory leak somewhere.
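One way to narrow down where the memory goes is to log both the managed heap and the process working set around the SaveAsTIFF call. If the working set keeps climbing while the managed heap stays roughly flat, the growth is in native allocations (for example, in a native library such as libgdiplus) rather than in .NET objects. This is a minimal sketch using only base-class-library calls; the MemoryProbe class and its Snapshot method are hypothetical helpers, not part of Aspose.PDF.

```csharp
using System;
using System.Diagnostics;

// Hypothetical diagnostic helper (not part of Aspose.PDF): reports the managed
// heap size and the whole-process working set so the two can be compared.
public static class MemoryProbe
{
    private const long BytesPerMegabyte = 1024 * 1024;

    public static string Snapshot(string label)
    {
        // Managed heap only: memory held by .NET objects (no forced GC).
        long managedBytes = GC.GetTotalMemory(forceFullCollection: false);

        // Physical memory used by the whole process, including native allocations.
        using Process current = Process.GetCurrentProcess();
        long workingSetBytes = current.WorkingSet64;

        return $"{label}: managed={managedBytes / BytesPerMegabyte} MB, " +
               $"workingSet={workingSetBytes / BytesPerMegabyte} MB";
    }
}
```

For example, log MemoryProbe.Snapshot("before SaveAsTIFF") and MemoryProbe.Snapshot("after SaveAsTIFF") around the call, or from a timer while it runs, and compare the Windows run with the container run.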

@GMarcucciBruce

Would you please share your Dockerfile along with a sample PDF document? We will test the scenario in our environment and address it accordingly. Also, please try converting your PDF with the following code using the latest version of the API, i.e. 20.8, and let us know your feedback:

// dataDir is the folder that contains Input.pdf
Document document = new Document(dataDir + "Input.pdf");
// Create Resolution object
Resolution resolution = new Resolution(300);
// Create TiffSettings object
TiffSettings tiffSettings = new TiffSettings()
{
    Compression = CompressionType.CCITT4,
    Depth = Aspose.Pdf.Devices.ColorDepth.Format8bpp,
    Shape = ShapeType.Portrait
};

// Convert all pages of the document into a single TIFF file
TiffDevice tiffDevice = new TiffDevice(resolution, tiffSettings);
tiffDevice.Process(document, dataDir + "Input_pdf.tiff");

I am currently testing that code and still see irregular memory usage compared to Windows.

This is my Dockerfile:

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1 AS base
WORKDIR /app

RUN echo "deb http://ftp.us.debian.org/debian stretch main contrib" >> /etc/apt/sources.list

# Generate the APT cache
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update \
    && apt-get install -y apt-utils wget

RUN apt-get install -y \
    uuid-dev uuid-runtime gcc g++ libc-dev-bin \
    linux-libc-dev libx11-6 libx11-dev libxt6 libc6-dev \
    libgdiplus \
    libxt-dev sqlite3 libsqlite3-dev libfreetype6 libfontconfig-dev

COPY FileConversionService.RestApi/Fonts /usr/share/fonts/truetype/

# Install Microsoft fonts (http://askubuntu.com/a/25614)
RUN echo "ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true" | debconf-set-selections
RUN apt-get install -y \
    fontconfig ttf-mscorefonts-installer

RUN fc-cache -fv

# Clean up APT cache
RUN rm -rf /var/lib/apt/lists/*
EXPOSE 80

## Installing tools for dotnet core
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS pre-build
RUN dotnet tool install --global dotnet-sonarscanner
RUN dotnet tool install --global coverlet.console
RUN apt-get update && apt-get install -y openjdk-11-jre

# Add package source
RUN echo "deb http://ftp.us.debian.org/debian stretch main contrib" >> /etc/apt/sources.list

# Generate the APT cache
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update \
    && apt-get install -y apt-utils wget

RUN apt-get install -y \
    uuid-dev uuid-runtime gcc g++ libc-dev-bin \
    linux-libc-dev libx11-6 libx11-dev libxt6 libc6-dev \
    libgdiplus \
    libxt-dev sqlite3 libsqlite3-dev libfreetype6 libfontconfig-dev

# Install Microsoft fonts (http://askubuntu.com/a/25614)
RUN echo "ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true" | debconf-set-selections
RUN apt-get install -y \
    fontconfig ttf-mscorefonts-installer

# Clean up APT cache
RUN rm -rf /var/lib/apt/lists/*

I’m using this PDF for testing: https://www.hq.nasa.gov/alsj/a17/A17_FlightPlan.pdf

For the Dockerfile, I only copied the parts that install dependencies so you can see them.

@GMarcucciBruce

Thanks for providing requested details.

We have logged an investigation ticket as PDFNET-48713 in our issue tracking system for this case. We will look further into its details and keep you informed about the status of the ticket resolution. Please be patient and give us some time.

We are sorry for the inconvenience.

Were you able to replicate this issue? I tried raising my container's memory limit to 3 GB, but just running the following code goes over 3 GB and the container goes down:

var dpi = new Resolution(conversionSetting.Resolution);
using (Aspose.Pdf.Document pdfDoc = new Aspose.Pdf.Document(filePath))
{
    var device = new TiffDevice(dpi);
    var outputFile = Guid.NewGuid().ToString();
    device.Process(pdfDoc, outputFile);
}

The memory growth happens specifically on the device.Process line.
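To confirm which memory limit the process actually sees inside the container, a small helper can read the cgroup limit that Docker applies. This is a minimal sketch, assuming the standard cgroup mount at /sys/fs/cgroup (cgroup v2 exposes memory.max, cgroup v1 exposes memory/memory.limit_in_bytes); the ContainerMemoryLimit class is hypothetical and not part of Aspose.PDF or .NET.

```csharp
using System.IO;

// Hypothetical helper: reads the memory limit Docker applies through cgroups.
public static class ContainerMemoryLimit
{
    // Checked in order: cgroup v2 file first, then the cgroup v1 file.
    private static readonly string[] CandidatePaths =
    {
        "/sys/fs/cgroup/memory.max",
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",
    };

    // Returns the limit in bytes, or null when no cgroup file exists
    // (e.g. not running on Linux) or cgroup v2 reports "max" (no limit).
    // Note: an unlimited cgroup v1 reports a very large number instead of "max".
    public static long? TryReadBytes()
    {
        foreach (string path in CandidatePaths)
        {
            if (!File.Exists(path))
                continue;

            string text = File.ReadAllText(path).Trim();
            return long.TryParse(text, out long bytes) ? bytes : (long?)null;
        }
        return null;
    }
}
```

Logging this value at startup shows whether the 3 GB setting actually reached the process.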

@GMarcucciBruce

We were able to reproduce the issue in our environment, and the details have been logged under the ticket ID PDFNET-48713. We will inform you as soon as it is resolved. We have also updated the ticket with the information you recently provided. Please spare us some time.

OK. Additionally, while testing with smaller PDF samples I saw another issue: the container just stops, and this is shown in the log:

ERROR:region.c:1155:GdipCombineRegionPath: assertion failed: (region->bitmap)

I found some sample PDFs that replicate this issue as well. Should I create a separate post for it?

@GMarcucciBruce

Would you kindly make sure that the libgdiplus package is properly installed in the environment? If the issue still persists, please let us know.

It is installed: I can convert other PDFs, and I can convert this PDF into multiple TIFF files with a single image each. The issue only appears when generating one TIFF with multiple images, which is my actual use case.

@GMarcucciBruce

Would you please share a sample PDF for this use case? Also, please share how you are trying to generate a TIFF with multiple images so that we can assist you further.

Sorry, please disregard my post about ERROR:region.c:1155:GdipCombineRegionPath: assertion failed: (region->bitmap)

I fixed that over the weekend and forgot I had also posted it here 🙂. My only remaining issue is the memory consumption.

@GMarcucciBruce

It is nice to hear that your issue was resolved.

We will let you know as soon as we have some updates regarding the issue of memory consumption.

@asad.ali Just checking: has there been any new development on this issue?

@GMarcucciBruce

We are afraid that the earlier logged ticket about memory consumption is not yet resolved. It will be investigated and fixed on a first-come, first-served basis. As soon as we make definite progress towards its resolution, we will inform you. Please give us some time.

We apologize for the inconvenience.

Has there been any progress on this issue? I have tested with the latest library and I’m still seeing memory usage skyrocket.

@GMarcucciBruce

We are afraid that the earlier logged issue is not yet resolved. Please note that performance-related issues are complex in nature and require a significant amount of time to resolve. We will inform you as soon as we have definite news about the ticket resolution.

We apologize for the inconvenience caused.