Access PDF EXIF Data and Remove All Metadata from JPG

jamiekitson · August 10, 2021, 12:41pm

Hi,

I want to quickly read the EXIF data of JPGs embedded in a PDF. I can do this by saving each image to a stream with pdfExtractor.GetNextImage, but it’s quite slow. Is there any way to access the EXIF data more quickly, and to remove all metadata from the JPG, e.g., by accessing the stream data directly without saving it?

Thanks, Jamie

asad.ali · August 10, 2021, 8:28pm

@jamiekitson

We need to investigate the requirement of accessing PDF EXIF data and removing JPG metadata. Could you please share a sample PDF along with the code snippet that you are currently using and that is slow? We will log an investigation ticket and share the ID with you.

jamiekitson · August 11, 2021, 8:59am

Thanks

    using System;
    using System.IO;
    using System.Linq;          // for PropertyIdList.Contains in ExifRotate
    using Aspose.Pdf.Facades;   // PdfExtractor

    // Returns the EXIF rotation of the first image found in the PDF,
    // or RotateNoneFlipNone if the PDF contains no images.
    private static System.Drawing.RotateFlipType CheckPdf(string filepath)
    {
        // Open input PDF
        using (PdfExtractor pdfExtractor = new PdfExtractor())
        {
            pdfExtractor.BindPdf(filepath);

            // Extract images
            pdfExtractor.ExtractImage();
            // Only the first extracted image is inspected
            while (pdfExtractor.HasNextImage())
            {
                // Read image into memory stream
                using (MemoryStream memoryStream = new MemoryStream())
                {
                    pdfExtractor.GetNextImage(memoryStream);
                    // Rewind so decoding starts at the beginning of the stream
                    memoryStream.Position = 0;
                    using (var img = System.Drawing.Image.FromStream(memoryStream))
                        return ExifRotate(img);
                }
            }
        }
        return System.Drawing.RotateFlipType.RotateNoneFlipNone;
    }

    private static System.Drawing.RotateFlipType ExifRotate(System.Drawing.Image img)
    {
        const int exifOrientationID = 0x112; //274

        var rot = System.Drawing.RotateFlipType.RotateNoneFlipNone;

        if (img.PropertyIdList.Contains(exifOrientationID))
        {
            var prop = img.GetPropertyItem(exifOrientationID);
            int val = BitConverter.ToUInt16(prop.Value, 0);

            if (val == 3 || val == 4)
                rot = System.Drawing.RotateFlipType.Rotate180FlipNone;
            else if (val == 5 || val == 6)
                rot = System.Drawing.RotateFlipType.Rotate90FlipNone;
            else if (val == 7 || val == 8)
                rot = System.Drawing.RotateFlipType.Rotate270FlipNone;

            if (val == 2 || val == 4 || val == 5 || val == 7)
                rot |= System.Drawing.RotateFlipType.RotateNoneFlipX;
        }

        return rot;
    }
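
One part of the cost in the code above is that System.Drawing.Image.FromStream processes the whole image just to read one tag. A minimal sketch of an alternative, assuming the stream filled by GetNextImage still holds the original JPEG bytes: walk the JPEG segments and read the orientation tag (0x0112) straight from the Exif APP1 block. The class and method names (JpegExif, ReadOrientation) are hypothetical, this is not an Aspose API, and it does not speed up the extraction step itself.

```csharp
public static class JpegExif
{
    // Returns the EXIF orientation (1..8), or 0 if the data is not a JPEG
    // or no orientation tag is found in IFD0.
    public static int ReadOrientation(byte[] jpeg)
    {
        if (jpeg == null || jpeg.Length < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
            return 0; // missing SOI marker

        int pos = 2;
        while (pos + 4 <= jpeg.Length && jpeg[pos] == 0xFF)
        {
            byte marker = jpeg[pos + 1];
            if (marker == 0xDA || marker == 0xD9) break; // SOS or EOI: metadata is over
            int segLen = (jpeg[pos + 2] << 8) | jpeg[pos + 3]; // big-endian, includes itself
            if (segLen < 2 || pos + 2 + segLen > jpeg.Length) break;
            if (marker == 0xE1 && segLen >= 16 && HasExifHeader(jpeg, pos + 4))
                return ParseTiff(jpeg, pos + 10, pos + 2 + segLen);
            pos += 2 + segLen;
        }
        return 0;
    }

    // APP1 payloads carrying EXIF start with the ASCII bytes "Exif\0\0".
    private static bool HasExifHeader(byte[] d, int o)
    {
        return d[o] == 0x45 && d[o + 1] == 0x78 && d[o + 2] == 0x69 &&
               d[o + 3] == 0x66 && d[o + 4] == 0 && d[o + 5] == 0;
    }

    // tiff = offset of the TIFF header; end = first byte after the segment.
    private static int ParseTiff(byte[] d, int tiff, int end)
    {
        if (tiff + 8 > end) return 0;
        bool le;
        if (d[tiff] == 0x49 && d[tiff + 1] == 0x49) le = true;        // "II"
        else if (d[tiff] == 0x4D && d[tiff + 1] == 0x4D) le = false;  // "MM"
        else return 0;
        if (U16(d, tiff + 2, le) != 42) return 0;

        long ifd = tiff + (long)U32(d, tiff + 4, le); // IFD0 offset is relative to the TIFF header
        if (ifd + 2 > end) return 0;
        int count = U16(d, (int)ifd, le);
        for (int i = 0; i < count; i++)
        {
            long entry = ifd + 2 + 12L * i; // each IFD entry is 12 bytes
            if (entry + 12 > end) return 0;
            if (U16(d, (int)entry, le) == 0x0112)
            {
                int value = U16(d, (int)entry + 8, le); // SHORT stored inline
                return (value >= 1 && value <= 8) ? value : 0;
            }
        }
        return 0;
    }

    private static int U16(byte[] d, int o, bool le)
    {
        return le ? d[o] | (d[o + 1] << 8) : (d[o] << 8) | d[o + 1];
    }

    private static uint U32(byte[] d, int o, bool le)
    {
        return unchecked((uint)(le
            ? d[o] | (d[o + 1] << 8) | (d[o + 2] << 16) | (d[o + 3] << 24)
            : (d[o] << 24) | (d[o + 1] << 16) | (d[o + 2] << 8) | d[o + 3]));
    }
}
```

In CheckPdf this could replace the Image.FromStream call with JpegExif.ReadOrientation(memoryStream.ToArray()); the returned value is the same number that ExifRotate reads from the property item. The sketch ignores fill bytes between segments and any metadata stored outside IFD0.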

jamiekitson · August 11, 2021, 9:00am

RotatedGuitar.pdf (1.7 MB)
RotatedWord.pdf (140.6 KB)

jamiekitson · August 11, 2021, 2:27pm

My next question is going to be whether we can remove this EXIF JPG metadata with Aspose, either in situ or otherwise.
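
Independently of Aspose, the EXIF block can be dropped from raw JPEG bytes by copying every segment except the Exif APP1 segment. The following is a minimal sketch under that assumption; JpegExifStripper and StripExif are hypothetical names, and writing the cleaned bytes back into the PDF is not covered here.

```csharp
using System;
using System.IO;

public static class JpegExifStripper
{
    // Returns a copy of the JPEG with every Exif APP1 segment removed.
    // Other segments and the compressed image data are copied verbatim.
    public static byte[] StripExif(byte[] jpeg)
    {
        if (jpeg == null || jpeg.Length < 4 || jpeg[0] != 0xFF || jpeg[1] != 0xD8)
            throw new ArgumentException("Not a JPEG stream.", nameof(jpeg));

        using (var output = new MemoryStream(jpeg.Length))
        {
            output.Write(jpeg, 0, 2); // SOI marker
            int pos = 2;
            while (pos + 4 <= jpeg.Length && jpeg[pos] == 0xFF)
            {
                byte marker = jpeg[pos + 1];
                if (marker == 0xDA || marker == 0xD9) break; // SOS or EOI: stop scanning
                int segLen = (jpeg[pos + 2] << 8) | jpeg[pos + 3]; // big-endian, includes itself
                if (segLen < 2 || pos + 2 + segLen > jpeg.Length) break;

                bool isExif = marker == 0xE1 && segLen >= 8 && HasExifHeader(jpeg, pos + 4);
                if (!isExif)
                    output.Write(jpeg, pos, 2 + segLen); // keep non-EXIF segments
                pos += 2 + segLen;
            }
            // Everything from the scan data onward is copied unchanged.
            output.Write(jpeg, pos, jpeg.Length - pos);
            return output.ToArray();
        }
    }

    // APP1 payloads carrying EXIF start with the ASCII bytes "Exif\0\0".
    private static bool HasExifHeader(byte[] d, int o)
    {
        return d[o] == 0x45 && d[o + 1] == 0x78 && d[o + 2] == 0x69 &&
               d[o + 3] == 0x66 && d[o + 4] == 0 && d[o + 5] == 0;
    }
}
```

Two caveats: removing the orientation tag means viewers will show the pixels unrotated, so the rotation may need to be applied some other way first; and this removes only EXIF, so other metadata such as XMP (also in APP1, with a different header) or IPTC (APP13) would need the same treatment to remove all metadata.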

asad.ali · August 11, 2021, 7:28pm

@jamiekitson

We have logged the following two investigation tickets in our issue tracking system for your requirements:

PDFNET-50377 (Quickly access the EXIF data of embedded image)
PDFNET-50378 (Remove EXIF data of embedded image using Aspose.PDF)

We will look into the details of these tickets and keep you posted on the status of their resolution. Please be patient and give us some time.

We are sorry for the inconvenience.

edtsoftware · April 16, 2024, 2:13am

We are also interested in these two tickets. Have there been any developments since this message was posted?

Thanks

asad.ali · April 16, 2024, 4:25pm

@edtsoftware

We are afraid that no progress has been made yet toward resolving these tickets. However, your interest has been recorded and we will update you as soon as there is news. Please give us some time.