Aspose.PDF set compression algorithm

Hi,


I’m testing several PDF libraries and I have the following question/doubt:

Is it possible to specify the compression algorithm? I’m comparing some PDF libraries and the algorithms used are different. Aspose seems to be using “DCT” and PDFNet seems to be using “LZW” and “Flate”. Is there a setting for handling this?

Basically, I extract a range of pages into a new Document, optimize that document by calling OptimizeResources, and as a last step re-attach the images to decrease the file size. This last step is needed because without it the output is the same size as the original file (26 MB). With it, the file size is 2.5 MB. If I use another library, such as PDFNet, the same range of pages comes to 800 KB. I experimented with the XImage.Save method, changing the ImageFormat and resolution (this is used when replacing an image), but the file size is always 2.5 MB.

Am I doing something wrong, or do I need to use a different approach?


Regards,

Luis.

Hi Luis,


Thanks for your inquiry. I’m afraid I don’t fully understand your workflow and requirements. Could you please share sample code and your input/output documents, so we can look into the details and advise you accordingly?

We are sorry for the inconvenience caused.

Best Regards,

Hi Tilal,


OK, let me explain. We have a test PDF file whose original size is 26 MB. It has 8 pages. Our process consists of splitting the original file into smaller two-page PDF files. When I extract a two-page PDF and call the OptimizeResources method, the final file size is 26 MB, the same as the original file. I created another post asking about this. In that post I pasted the code I’m using and a “workaround” I implemented to decrease the file size, which consists of replacing the images. With this process, the final PDF is 2.5 MB. As you can see in that code, when the app replaces an image I use the Save method of the XImage class, which saves the image to a stream and lets you specify a resolution. I tried different ImageFormat values and several resolutions, but the final size is always 2.5 MB, so changing the ImageFormat and resolution makes no difference.

Then I switched to another library, PDFNet. It has a class where you can configure optimization settings for images (color, gray, and mono), similar to the Acrobat Pro optimizer:
* CompressionMode (similar to ImageFormat)
* Downscale (DPI): if the image resolution is greater than some value, downscale it to another value
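
To make the downscale rule above concrete, here is a minimal sketch of the arithmetic only. It is not PDFNet's or Aspose's implementation; the class `DownscaleRule`, the method `TargetSize`, and the parameters `effectiveDpi`, `thresholdDpi`, and `targetDpi` are hypothetical names chosen for illustration, and the 225 dpi threshold is an arbitrary example value.

```csharp
using System;

// Hypothetical helper: illustrates the "if above X dpi, downscale to Y dpi" rule.
// It is not part of Aspose.PDF or PDFNet.
public static class DownscaleRule
{
    // Returns the pixel dimensions an image should have after applying the rule.
    // effectiveDpi is the resolution at which the image is currently placed on the page.
    public static (int Width, int Height) TargetSize(
        int width, int height, double effectiveDpi, double thresholdDpi, double targetDpi)
    {
        // At or below the threshold: leave the image untouched.
        if (effectiveDpi <= thresholdDpi)
            return (width, height);

        // Above the threshold: scale both dimensions by the same factor.
        double scale = targetDpi / effectiveDpi;
        return (Math.Max(1, (int)Math.Round(width * scale)),
                Math.Max(1, (int)Math.Round(height * scale)));
    }

    public static void Main()
    {
        // A 1050 x 24 image placed at 300 dpi, downscaled to 150 dpi because it exceeds 225 dpi.
        var size = TargetSize(1050, 24, 300, 225, 150);
        Console.WriteLine($"{size.Width} x {size.Height}"); // prints "525 x 12"
    }
}
```

Halving the resolution in both dimensions leaves a quarter of the pixels, so this step reduces the data before any compression filter is applied.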

With the optimizer settings configured in PDFNet, the final file size is around 800 KB, a big difference compared to 2.5 MB.

After this, I opened the PDFs in a text editor and saw that, for example, the PDFNet output contains this:
<</BitsPerComponent 8/ColorSpace 44 0 R/DecodeParms [<</Colors 3/Columns 1050/Predictor 2>>]/Filter [/LZWDecode]/Height 24/Length 3475/Subtype /Image/Width 1050>>

And the Aspose output contains this:
<</ColorSpace/DeviceRGB/Subtype/Image/Length 1252/BitsPerComponent 8/Type/XObject/Width 1050/Filter/DCTDecode/Height 24>>stream
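
For comparing outputs without reading them by eye, the /Filter entry can be pulled out of such a dictionary with a small text-level check. This is only a sketch under stated assumptions: `FilterInspector` and `FilterNames` are hypothetical names, the regular expression is a heuristic rather than a PDF parser, and it only works on dictionaries that are visible as plain text in the file. In the PDF specification, DCTDecode is JPEG (lossy), while LZWDecode and FlateDecode are lossless filters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts filter names from the text of an image dictionary.
public static class FilterInspector
{
    // "/Filter" followed by either an array of names or a single name.
    private static readonly Regex FilterEntry =
        new Regex(@"/Filter\s*(?:\[(?<array>[^\]]*)\]|(?<single>/[A-Za-z0-9]+))");

    // A single PDF name such as "/DCTDecode"; the capture excludes the slash.
    private static readonly Regex Name = new Regex(@"/([A-Za-z0-9]+)");

    public static List<string> FilterNames(string dictionaryText)
    {
        var names = new List<string>();
        Match entry = FilterEntry.Match(dictionaryText);
        if (!entry.Success)
            return names; // no /Filter entry found

        string value = entry.Groups["array"].Success
            ? entry.Groups["array"].Value
            : entry.Groups["single"].Value;

        foreach (Match m in Name.Matches(value))
            names.Add(m.Groups[1].Value);
        return names;
    }

    public static void Main()
    {
        string pdfNet = "<</BitsPerComponent 8/ColorSpace 44 0 R/DecodeParms [<</Colors 3/Columns 1050/Predictor 2>>]/Filter [/LZWDecode]/Height 24/Length 3475/Subtype /Image/Width 1050>>";
        string aspose = "<</ColorSpace/DeviceRGB/Subtype/Image/Length 1252/BitsPerComponent 8/Type/XObject/Width 1050/Filter/DCTDecode/Height 24>>";

        Console.WriteLine(string.Join(",", FilterNames(pdfNet))); // prints "LZWDecode"
        Console.WriteLine(string.Join(",", FilterNames(aspose))); // prints "DCTDecode"
    }
}
```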

Is there a way to configure some settings in order to downscale the image? Or other approach to get better results in the final PDF file size?

Let me know if you have any questions.


Regards,

Luis.

Hi Luis,


Thanks for providing additional information. We are looking into the issue and will get back to you soon.

Best Regards,

Any updates on this? We have a customer who wants to send us 2500 page color PDFs and have us split them into two-page PDFs. The original file they send us is 35MB, and it blows up to 3 GB of files when we split it. Each file includes about 30 images at about 300 dpi. Reducing image resolution down to 150 dpi is only a partial solution, and going down any further would visibly degrade the images.

The compression algorithm that luis.lara mentioned above (Flate) is supported by the PDF standard.

Does or will Aspose natively support using different compression algorithms to improve file size optimization?

Hi Luis,


Please accept my apologies for the delayed response. I’m afraid it is currently not possible to specify the image compression algorithm during optimization. We’ve logged a feature request for this as PDFNEWNET-36018. We will keep you updated on its progress via this forum thread.

Meanwhile, as a workaround, could you please convert the images to JPEG while re-attaching them? Hopefully that will make some difference.


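// Namespaces needed at the top of the file:
using System.Drawing.Imaging; // ImageFormat
using System.IO;              // MemoryStream, SeekOrigin
using Aspose.Pdf;             // Document, Page, XImage

// myDir is the folder that contains the input file.
Document pdfDocument = new Aspose.Pdf.Document(myDir + "input.pdf");

foreach (Page page in pdfDocument.Pages)
{
    // Image indexes in a page's resources start at 1.
    int idx = 1;
    foreach (Aspose.Pdf.XImage image in page.Resources.Images)
    {
        using (var imageStream = new MemoryStream())
        {
            // Re-encode the image as JPEG, then replace the original at the same index.
            image.Save(imageStream, ImageFormat.Jpeg);
            imageStream.Seek(0, SeekOrigin.Begin);
            page.Resources.Images.Replace(idx, imageStream);
        }
        idx++;
    }
}
pdfDocument.Save(myDir + "Images_Compressed_output.pdf");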

We are sorry for the inconvenience caused.

Best Regards,

Hi Jay,

Thanks for your inquiry.

hancockjt:
Any updates on this? We have a customer who wants to send us 2500 page color PDFs and have us split them into two-page PDFs. The original file they send us is 35MB, and it blows up to 3 GB of files when we split it. Each file includes about 30 images at about 300 dpi. Reducing image resolution down to 150 dpi is only a partial solution, and going down any further would visibly degrade the images.

The compression algorithm that luis.lara mentioned above (Flate) is supported by the PDF standard.

Does or will Aspose natively support using different compression algorithms to improve file size optimization?

As mentioned above, we’ve logged a request to allow specifying the compression algorithm during optimization. We will update you as soon as we make progress on it. In the meantime, you can try replacing the images with JPEG versions and also calling the OptimizeResources() method to optimize the document.

We are sorry for the inconvenience caused.

Best Regards,