Convert PDF to "Black and White"


#1

Aspose.Pdf Version 18.4

Hi,

Is it possible to convert an existing PDF to black and white, and not to greyscale?

If so, then how?

Please let me know, thank you!
Thomas Eszterwitsch


#2

@ThomasEsz,

There is currently no way to convert a PDF document to black and white. We have already logged a feature request under the ticket ID PDFNET-35728 in our issue tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates.


#3

Hi, thanks for the information.

I then at least tried to convert the existing images into monochrome and replace the images in the resources:

        private void ConvertToBlackWhite(Document pdfDocument)
        {
            int index = 1;
            foreach (Page p in pdfDocument.Pages)
            {
                Console.WriteLine($"Page {index}, IsBlank(0.1) (10%): {p.IsBlank(0.1)}");

                int pdfImageCount = p.Resources.Images.Count;
                Console.WriteLine($"Page {index}, ImageCount: {pdfImageCount}");
                for (int imgCount = 1; imgCount < p.Resources.Images.Count + 1; imgCount++)
                {
                    using (Stream str = new MemoryStream())
                    {
                        p.Resources.Images[imgCount].Save(str);

                        using (Stream strc = new MemoryStream())
                        {
                            using (Bitmap img = this.SaveGIFWithNewColorTable(new Bitmap(str), 2, false))
                            //using (Bitmap img = new Bitmap(p.Resources.Images[imgCount].Grayscaled))
                            {
                                img.Save(strc, ImageFormat.Gif);
                                /*Aspose.Pdf.Image pdfImg = new Aspose.Pdf.Image() { ImageStream = str };
                                pdfImg.IsBlackWhite = true;
                                pdfImg.
                                p.Resources.Images.Replace(imgCount, pdfImg.ImageStream);*/
                                p.Resources.Images.Replace(imgCount, strc);
                            }
                        }
                    }
                }

                if (pdfImageCount > 0) Console.WriteLine($"Page {index}, All images set to 'black/white'...");
                else Console.WriteLine($"Page {index}, No images present on this page...");
                index++; // advance the page counter used in the log messages
            }
        }

However, the resulting PDF is larger than before. How can this be explained?

Here is the link to the bitmap conversion: https://support.microsoft.com/en-us/help/319061/how-to-save-a-gif-file-with-a-new-color-table-by-using-visual-c

grafik.png (10.4 KB)
As you can see, the GIF shrank from 13 KB (PdfImage.gif) to 3 KB (ConvertedImage.gif), but the PDF grew from 223 KB (Test1.pdf) to 259 KB (Test_Conv.pdf).


#4

@ThomasEsz,

You can optimize the size of the PDF document by calling the OptimizeResources method on the Document instance. Please refer to this help article: Optimize PDF File Size. If this does not help, then kindly send us your source PDF and image files. We will investigate your scenario in our environment and share our findings with you.


#5

Thank you for your help!
I have extended the code with this block, but only with the following result:

Without specifying 'ImageQuality', the result is the same.
Setting this property to 1% makes the PDF only 2 KB smaller than the original with the colored image.

I would have expected a reduction of about 10 KB, the amount by which the image itself shrank when its quality was reduced.

I’m sorry, I can’t share this PDF with you because it contains important information.

Here is the technical information about this image:
grafik.png (8.4 KB)


#6

@ThomasEsz,

We are working on your query and will get back to you soon.


#7

@ThomasEsz,

Thank you for the details. We use the documents we receive for testing purposes only, do not share them publicly, and handle sensitive private documents with care. Alternatively, you can replace any sensitive information with dummy data and send a ZIP of the files through a private message, or attach it in this forum, since we have marked this thread as private.

The definition of the SaveGIFWithNewColorTable method is not available at the Microsoft help topic you linked; the link leads to the error "Sorry, page not found." Kindly send the complete code as well. We await your response.


#8

Hello, ok, here is the rest of the code:

    private Bitmap SaveGIFWithNewColorTable(System.Drawing.Image image, uint nColors, bool fTransparent)
    {
        // GIF codec supports 256 colors maximum, monochrome minimum.
        if (nColors > 256) nColors = 256;
        if (nColors < 2) nColors = 2;

        // Make a new 8-BPP indexed bitmap that is the same size as the source image.
        int width = image.Width;
        int height = image.Height;

        // Always use PixelFormat8bppIndexed because that is the color
        // table-based interface to the GIF codec.
        Bitmap bitmap = new Bitmap(width, height, PixelFormat.Format8bppIndexed);

        // Create a color palette big enough to hold the colors you want.
        ColorPalette pal = this.GetColorPalette(nColors);

        // Initialize a new color table with entries that are determined
        // by some optimal palette-finding algorithm; for demonstration 
        // purposes, use a grayscale.
        for (uint i = 0; i < nColors; i++)
        {
            uint alpha = 0xFF;                      // Colors are opaque.
            uint intensity = i * 0xFF / (nColors - 1);    // Even distribution. 

            // The GIF encoder makes the first entry in the palette
            // that has a ZERO alpha the transparent color in the GIF.
            // Pick the first one arbitrarily, for demonstration purposes.

            // Make this color index Transparent
            if (i == 0 && fTransparent) alpha = 0;

            // Create a gray scale for demonstration purposes.
            // Otherwise, use your favorite color reduction algorithm
            // and an optimum palette for that algorithm generated here.
            // For example, a color histogram, or a median cut palette.
            pal.Entries[i] = System.Drawing.Color.FromArgb((int)alpha, (int)intensity, (int)intensity, (int)intensity);
        }

        // Set the palette into the new Bitmap object.
        bitmap.Palette = pal;


        // Use GetPixel below to pull out the color data of Image.
        // Because GetPixel isn't defined on an Image, make a copy 
        // in a Bitmap instead. Make a new Bitmap that is the same size as the
        // image that you want to export. Or, try to
        // interpret the native pixel format of the image by using a LockBits
        // call. Use PixelFormat32BppARGB so you can wrap a Graphics  
        // around it.
        Bitmap bmpCopy = new Bitmap(width, height, PixelFormat.Format32bppArgb);
        {
            Graphics g = Graphics.FromImage(bmpCopy);
            g.PageUnit = GraphicsUnit.Pixel;
            // Transfer the Image to the Bitmap
            g.DrawImage(image, 0, 0, width, height);
            // g goes out of scope and is marked for garbage collection.
            // Force it, just to keep things clean.
            g.Dispose();
        }

        // Lock a rectangular portion of the bitmap for writing.
        BitmapData bitmapData;
        System.Drawing.Rectangle rect = new System.Drawing.Rectangle(0, 0, width, height);

        bitmapData = bitmap.LockBits(rect, ImageLockMode.WriteOnly, PixelFormat.Format8bppIndexed);

        // Write to the temporary buffer that is provided by LockBits.
        // Copy the pixels from the source image in this loop.
        // Because you want an index, convert RGB to the appropriate
        // palette index here.
        IntPtr pixels = bitmapData.Scan0;

        unsafe
        {
            // Get the pointer to the image bits.
            // This is the unsafe operation.
            byte* pBits;
            if (bitmapData.Stride > 0) pBits = (byte*)pixels.ToPointer();
            // If the Stride is negative, Scan0 points to the last 
            // scanline in the buffer. To normalize the loop, obtain
            // a pointer to the front of the buffer that is located 
            // (Height-1) scanlines previous.
            else pBits = (byte*)pixels.ToPointer() + bitmapData.Stride * (height - 1);

            uint stride = (uint)Math.Abs(bitmapData.Stride);

            for (uint row = 0; row < height; ++row)
            {
                for (uint col = 0; col < width; ++col)
                {
                    // Map palette indexes for a gray scale.
                    // If you use some other technique to color convert,
                    // put your favorite color reduction algorithm here.
                    System.Drawing.Color pixel;    // The source pixel.

                    // The destination pixel.
                    // The pointer to the color index byte of the
                    // destination; this real pointer causes this
                    // code to be considered unsafe.
                    byte* p8bppPixel = pBits + row * stride + col;

                    pixel = bmpCopy.GetPixel((int)col, (int)row);

                    // Use luminance/chrominance conversion to get grayscale.
                    // Basically, turn the image into black and white TV.
                    // Do not calculate Cr or Cb because you 
                    // discard the color anyway.
                    // Y = Red * 0.299 + Green * 0.587 + Blue * 0.114

                    // This expression is best as integer math for performance,
                    // however, because GetPixel listed earlier is the slowest 
                    // part of this loop, the expression is left as 
                    // floating point for clarity.

                    double luminance = (pixel.R * 0.299) + (pixel.G * 0.587) + (pixel.B * 0.114);

                    // Gray scale is an intensity map from black to white.
                    // Compute the index to the grayscale entry that
                    // approximates the luminance, and then round the index.
                    // Also, constrain the index choices by the number of
                    // colors to do, and then set that pixel's index to the 
                    // byte value.
                    *p8bppPixel = (byte)(luminance * (nColors - 1) / 255 + 0.5);

                } /* end loop for col */
            } /* end loop for row */
        } /* end unsafe */

        // To commit the changes, unlock the portion of the bitmap.  
        bitmap.UnlockBits(bitmapData);

        //bitmap.Save(filename, ImageFormat.Gif);

        // Bitmap goes out of scope here and is also marked for
        // garbage collection.
        // Pal is referenced by bitmap and goes away.
        // BmpCopy goes out of scope here and is marked for garbage
        // collection. Force it, because it is probably quite large.
        // The same applies to bitmap.
        bmpCopy.Dispose();
        //bitmap.Dispose();
        return bitmap;
    }

    private ColorPalette GetColorPalette(uint nColors)
    {
        // Assume monochrome image.
        PixelFormat bitscolordepth = PixelFormat.Format1bppIndexed;
        ColorPalette palette;    // The Palette we are stealing
        Bitmap bitmap;     // The source of the stolen palette

        // Determine number of colors.
        if (nColors > 2) bitscolordepth = PixelFormat.Format4bppIndexed;
        if (nColors > 16) bitscolordepth = PixelFormat.Format8bppIndexed;

        // Make a new Bitmap object to get its Palette.
        bitmap = new Bitmap(1, 1, bitscolordepth);
        palette = bitmap.Palette;   // Grab the palette
        bitmap.Dispose();           // cleanup the source Bitmap
        return palette;             // Send the palette back
    }
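As a side note on the index expression used in the loop above: with nColors = 2 it reduces to a threshold at roughly half brightness, so every pixel ends up as either palette entry 0 (black) or 1 (white). Here is a minimal standalone sketch of just that mapping; the class LuminanceDemo and the helper ToPaletteIndex are only illustrative names and are not part of the code above:

```csharp
using System;

static class LuminanceDemo
{
    // Illustrative helper: same luminance weights and rounding as the
    // expression written to *p8bppPixel in SaveGIFWithNewColorTable.
    public static byte ToPaletteIndex(byte r, byte g, byte b, uint nColors)
    {
        double luminance = (r * 0.299) + (g * 0.587) + (b * 0.114);
        return (byte)(luminance * (nColors - 1) / 255 + 0.5);
    }

    static void Main()
    {
        Console.WriteLine(ToPaletteIndex(0, 0, 0, 2));       // 0 (black)
        Console.WriteLine(ToPaletteIndex(255, 255, 255, 2)); // 1 (white)
        Console.WriteLine(ToPaletteIndex(255, 0, 0, 2));     // 0 (pure red maps to black)
        Console.WriteLine(ToPaletteIndex(0, 255, 0, 2));     // 1 (pure green maps to white)
    }
}
```

So the image data really is two-colored, even though the bitmap itself is still created as 8 bits per pixel indexed.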

And here is the latest version of the first method:

    private void ConvertToBlackWhite(Document pdfDocument)
    {
        int index = 1;
        foreach (Page p in pdfDocument.Pages)
        {
            Console.WriteLine($"Page {index}, IsBlank(0.1) (10%): {p.IsBlank(0.1)}");

            int pdfImageCount = p.Resources.Images.Count;
            Console.WriteLine($"Page {index}, ImageCount: {pdfImageCount}");
            for (int imgCount = 1; imgCount < pdfImageCount + 1; imgCount++)
            {
                using (Stream str = new MemoryStream())
                {
                    p.Resources.Images[imgCount].Save(str);
                    Bitmap sourceBm = new Bitmap(str);
                    sourceBm.Save("PdfImage.gif", ImageFormat.Gif);

                    using (Stream strc = new MemoryStream())
                    {
                        using (Bitmap img = this.SaveGIFWithNewColorTable(sourceBm, 2, false))
                        {
                            img.Save("ConvertedImage.gif", ImageFormat.Gif);
                            img.Save(strc, ImageFormat.Gif);

                            p.Resources.Images.Replace(imgCount, strc);
                        }
                    }
                }
            }

            if (pdfImageCount > 0) Console.WriteLine($"Page {index}, All images set to 'black/white'...");
            else Console.WriteLine($"Page {index}, No images present on this page...");
            index++; // advance the page counter used in the log messages
        }

        pdfDocument.OptimizeResources(new Document.OptimizationOptions()
        {
            LinkDuplcateStreams = true,
            RemoveUnusedObjects = true,
            RemoveUnusedStreams = true,
            CompressImages = true,
            ImageQuality = 1
        });
    }

And finally, the PDF:
Test1.zip (211.9 KB)

The images and the converted PDF are saved by the code.


#9

@ThomasEsz,

We have tested your scenario with the latest version 18.5 of Aspose.PDF for .NET API, and the output PDF size is reduced. This is the output PDF: Output.pdf (220.2 KB)


#10

Hello, that is what I had already described:

Now we’re back where we were before your test :wink:

grafik.png (3.1 KB)


#11

@ThomasEsz,

An investigation has been logged under the ticket ID PDFNET-44713 in our issue tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates.


#12

Good day,
I wanted to ask whether there is any news on this issue.

My latest finding about this problem is that JPEG is always used internally as the image format. This means that if I replace an existing image with a GIF, it is converted back to JPEG and the image grows larger.

We have a customer who requires this, and we have not yet made any progress on it.

Please let me know, thanks in advance!


#13

@ThomasEsz

Thank you for sharing your findings.

We are afraid PDFNET-44713 is still pending analysis owing to previously logged tickets. We have recorded your comments and will update you as soon as any significant updates are available. We appreciate your patience and understanding in this regard.


#14

Hello,

I have tested the latest Aspose.PDF for .NET version. Is there a mistake in the error description?

grafik.png (23.9 KB)


#15

@ThomasEsz

Thank you for highlighting it.

We will update the description to (0..100] or [1..100] to cover all possible values in the range. A ticket with ID PDFNET-45738 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.


#16

Hello,

With the latest version, Aspose.PDF 18.11.0, ‘OptimizeResources’ seems to have been improved. In our latest tests, we achieved excellent results, even in color!
This is quite an acceptable solution that will also please the customer.

grafik.png (15.2 KB)

If, for example, low-resolution GIF images could be used internally, the PDF size could be reduced even further, but with this version that is no longer a high priority.

However, I still have one question, about the correct procedure:

grafik.png (16.7 KB)

We convert a wide variety of files to PDF, which means that the PDF is first created in memory. However, in order to reduce the quality (as above), this file must first be saved and then reopened. Only then do the ‘OptimizationOptions’ take effect.

Is that correct?

Many thanks in advance, Thomas Eszterwitsch


#17

@ThomasEsz

Thank you for your kind feedback.

Regarding optimization, you do not necessarily need to save the file to disk. However, you do need to call the Save method, which you can use with a MemoryStream, and then perform the optimization on the saved stream as required.
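A minimal sketch of that flow, assuming the same OptimizationOptions members used elsewhere in this thread. The names InMemoryOptimization and OptimizeInMemory are purely illustrative, and the chosen options are only an example, not a recommended configuration:

```csharp
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Optimization;

public static class InMemoryOptimization
{
    // Illustrative helper: saves the generated document to a MemoryStream,
    // reopens it from that stream, optimizes the reopened copy, and returns
    // the optimized PDF as a byte array. Nothing is written to disk.
    public static byte[] OptimizeInMemory(Document generated, int imageQuality)
    {
        using (var saved = new MemoryStream())
        {
            generated.Save(saved);   // persist the in-memory document to a stream
            saved.Position = 0;      // rewind before reopening

            using (var reopened = new Document(saved))
            using (var output = new MemoryStream())
            {
                var options = new OptimizationOptions
                {
                    RemoveUnusedObjects = true,
                    RemoveUnusedStreams = true
                };
                options.ImageCompressionOptions.CompressImages = true;
                options.ImageCompressionOptions.ImageQuality = imageQuality;

                reopened.OptimizeResources(options);
                reopened.Save(output);
                return output.ToArray();
            }
        }
    }
}
```

The returned bytes can then be written to a file or passed on to the next processing step.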

We hope this will be helpful. Please feel free to contact us if you need any further assistance.


#18

Hi,
thank you for the feedback.

As you can see, that is how I did it in the end. Saving it to disk was only for checking the result.

Unfortunately, I have now discovered another problem:
The basic task is to generate a single PDF from images, PDF, Excel and Word files. As far as we can tell, everything works correctly except for merging a PDF into Merged.pdf.

I uploaded the package with the data. Merged.pdf is the finished PDF assembled from all the content. However, AB8637430000.PDF does not appear correctly in this new PDF, and Adobe Reader even reports an error:
grafik.png (5.9 KB)
d02b4d3b-b595-408c-9edd-b38b9e794e3f.zip (952.3 KB)

The error occurs as soon as this code is executed after the save:

    public static void ReducePdfQuality(Aspose.Pdf.Document pdfDocument, int imageQuality = 20, int dpi = 75)
    {
        Aspose.Pdf.Optimization.OptimizationOptions optimizationOptions = new Aspose.Pdf.Optimization.OptimizationOptions()
        {
            LinkDuplcateStreams = true,
            RemoveUnusedObjects = true,
            RemoveUnusedStreams = true,
        };
        optimizationOptions.ImageCompressionOptions.CompressImages = true;
        optimizationOptions.ImageCompressionOptions.ImageQuality = imageQuality;
        optimizationOptions.ImageCompressionOptions.MaxResolution = dpi;
        optimizationOptions.ImageCompressionOptions.ResizeImages = true;
        // Best resolution: Standard
        optimizationOptions.ImageCompressionOptions.Version = Aspose.Pdf.Optimization.ImageCompressionVersion.Standard;
        pdfDocument.OptimizeResources(optimizationOptions);
    }

The values passed are imageQuality = 33 and dpi = 200.

This is how the PDF files are combined:

    public class PdfConverterPdf : PdfConverterBase
    {
        public override bool GetPdfPages(string fileName, string fileExtension, Stream attachmentStream, PageCollection newPages)
        {
            using (Document attachmentPdf = new Document(attachmentStream))
            {
                if (attachmentPdf.Pages.Count > 0) newPages.Add(attachmentPdf.Pages);
            }
            return true;
        }
    }

I tried omitting the options of OptimizationOptions one at a time in order to narrow the problem down to a single option, but unfortunately without success.

Merged.pdf is created correctly only if the optimization is not executed at all.

However, the whole scenario works if the source PDF contains only images.

Please help, thank you in advance,
Thomas Eszterwitsch


#19

@ThomasEsz

Please always create separate topics for separate issues.

We have been able to reproduce the issue with optimization in our environment. A ticket with ID PDFNET-45792 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive notification as soon as the ticket is resolved.

We are sorry for the inconvenience.