Image compression filter on document optimization

Hi,

Is it possible to select a compression filter other than DCTDecode for images when I call .OptimizeResources(new Aspose.Pdf.Document.OptimizationOptions()) on an instance of Aspose.Pdf.Document?

Whenever I call doc.Convert(logfile, PdfFormat.v_1_4, ConvertErrorAction.Delete) and then doc.OptimizeResources(myOptimizationOptions), all images in the resulting document are recompressed with DCTDecode. My original PDF file used /Filter/FlateDecode for the images, and the images were about 10 times smaller in the original file.

Thanks for your help.

@JohannesKrackowizer,

Kindly send us your source PDF and code. Please also let us know which Aspose.PDF for .NET API version you are using. We will investigate and share our findings with you. Your response is awaited.

Hi Imran,

I’m sorry, but I cannot send you our original files for security reasons. Instead I created a sample PDF, Example.pdf. After conversion, its images are compressed only with /Filter/DCTDecode and the file is much larger; with stronger compression settings the picture quality degrades more and more.

I tried Aspose.PDF.dll versions 18.1.0 and 18.3.0 with the same result. My sample code to convert the file is the following:

using System;
using System.IO;
using Aspose.Pdf;

namespace PdfTools
{
  public class AsposeTests
  {
    public static void ConvertPdf()
    {
      var sourceFile = @"C:\Temp\PDFModify\Example.pdf";
      var outputFile = @"C:\Temp\PDFModify\Example_Output.pdf";

      var tempFile = Path.GetTempFileName ();
      try
      {
        using (var outStream = File.Open (tempFile, FileMode.Open, FileAccess.ReadWrite, FileShare.Read))
        using (var inStream = File.Open (sourceFile, FileMode.Open, FileAccess.ReadWrite, FileShare.Read))
        {
          using (var doc = new Document (inStream))
          {
            Convert (doc, PdfFormat.v_1_4);
            doc.Save (outStream);
          }
        }

        using (var inStream = File.Open (tempFile, FileMode.Open, FileAccess.ReadWrite, FileShare.Read))
        using (var doc = new Document (inStream))
        {
          Optimize (doc);
          Convert (doc, PdfFormat.PDF_A_1B);
          using (var outStream = new MemoryStream ())
          {
            doc.Save (outStream);
            File.WriteAllBytes (outputFile, outStream.ToArray ());
          }
        }
      }
      finally
      {
        File.Delete (tempFile);
      }
    }

    private static void Convert(Document doc, PdfFormat targetFormat)
    {
      var logFile = Path.GetTempFileName ();
      try
      {
        var success = doc.Convert (logFile, targetFormat, ConvertErrorAction.Delete);

        if (!success)
          throw new Exception ($"Cannot convert to {targetFormat}. " + File.ReadAllText (logFile));
      }
      finally
      {
        File.Delete (logFile);
      }
    }

    private static void Optimize(Document doc)
    {
      var optimizationOptions = new Document.OptimizationOptions ()
      {
        AllowReusePageContent = true,
        LinkDuplcateStreams = true,
        RemoveUnusedObjects = false,
        RemoveUnusedStreams = false,
        CompressImages = true,
        ImageQuality = 80
      };

      doc.OptimizeResources (optimizationOptions);
    }
  }
}

I played around with the Document.OptimizationOptions and tried different values for MaxResolution in combination with ResizeImages = true, but the resulting files stay larger than the originals or the image quality becomes much worse. We even have examples where the original is about 1 MB and the converted file is around 12 MB.

@JohannesKrackowizer,

In the code example, you convert the source PDF to version 1.4, optimize it, and then convert it again to PDF/A-1B before saving. If we skip the conversion from PDF 1.4 to PDF/A-1B, the output PDF document grows by only 1 KB. Please review this and elaborate on your requirement.

Hi Imran,

Thanks for the reply. PDF/A-1B is mandatory in my scenario.

@JohannesKrackowizer,

We have logged an investigation under the ticket ID PDFNET-44369 in our issue tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates.

Hi @imran.rafique,

Can you give me an update on this issue?

@JohannesKrackowizer,

The linked ticket ID PDFNET-44369 is pending analysis and is not resolved yet. We will investigate it as per the development schedule and let you know once significant progress has been made.

Hi @imran.rafique,

It’s a big problem for us when an image with /Filter/FlateDecode compression is recompressed with the much less effective /Filter/DCTDecode. It does not happen when we convert to PDF 1.4, but it does happen when we switch to PDF/A-1B. As far as I understand, /Filter/FlateDecode is a valid compression filter for PDF/A-1B. If I create a PDF/A-1B file with embedded /FlateDecode images, for example with pdftron, it passes all validation checks with http://verapdf.org/. So I’m sure this has to be a bug in Aspose.Pdf.

Is there a timeline when this bug will be resolved?
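
For anyone who wants to check which filters an output file ends up with, here is a rough sketch that counts how often the names /DCTDecode and /FlateDecode appear in the raw bytes of a PDF. It does not use Aspose at all; the class name PdfFilterCount and the method CountName are made up for this illustration. It is only a heuristic: /FlateDecode is also used for page content and other non-image streams, so the /DCTDecode count before and after conversion is the more telling number.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of Aspose.PDF.
public static class PdfFilterCount
{
    // Counts raw occurrences of a PDF name such as "/DCTDecode" in the file bytes.
    public static int CountName(byte[] pdfBytes, string name)
    {
        // ISO-8859-1 maps every byte to exactly one char, so binary stream data
        // cannot break the decoding or hide a name that is present in the file.
        var text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        var count = 0;
        var index = 0;
        while ((index = text.IndexOf(name, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += name.Length;
        }
        return count;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: PdfFilterCount <file.pdf>");
            return;
        }

        var bytes = File.ReadAllBytes(args[0]);
        Console.WriteLine("/DCTDecode:   " + CountName(bytes, "/DCTDecode"));
        Console.WriteLine("/FlateDecode: " + CountName(bytes, "/FlateDecode"));
    }
}
```

Running it once on Example.pdf and once on Example_Output.pdf should show whether images that were Flate-compressed in the source come out as DCTDecode after the PDF/A-1B conversion.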

@JohannesKrackowizer,

The linked ticket ID PDFNET-44369 is not resolved. It could take time because there are other high priority tickets in the queue and it is difficult for us to share an estimate before the completion of the analysis phase. We have recorded this information under the same ticket ID PDFNET-44369 in our issue tracking system.

Besides this, we recommend that our clients post their critical issues (or already logged ticket IDs) in the paid support forum. Please refer to this link: Aspose support options

Hi @imran.rafique,

How long would it take to fix PDFNET-44369 if we purchased paid support?

@JohannesKrackowizer,

We are gathering details on the linked ticket ID PDFNET-44369 and will let you know about the estimate as soon as possible.

@JohannesKrackowizer,

We can see that the linked ticket ID PDFNET-44369 has been escalated to paid support priority. We plan to investigate this ticket in August 2018, and we will notify you once it is resolved.

The issues you have found earlier (filed as PDFNET-44369) have been fixed in Aspose.PDF for .NET 18.12.