Aspose.Total: failed to extract images from a PDF file

amdiei · August 5, 2021, 11:49am

Hi

I’m using Aspose.Total, and when I run the code below, Aspose extracts corrupted images (it also extracts more than 50 images from the attached document).

Can you please help?

    using System.Collections.Generic;
    using System.IO;
    using Aspose.Pdf;

    private IEnumerable<Stream> GetImagesFromStream(Stream stream)
    {
        // Collects one stream per image placement found in the document
        List<Stream> imageStreams = new List<Stream>();

        using (Document pdfDocument = new Document(stream))
        {
            ImagePlacementAbsorber imageAbsorber = new ImagePlacementAbsorber();
            // Accept the absorber for all the pages
            pdfDocument.Pages.Accept(imageAbsorber);

            foreach (ImagePlacement image in imageAbsorber.ImagePlacements)
            {
                Stream imageStream = image.Image.ToStream();
                // Rewind so callers can read the image from the start
                imageStream.Seek(0, SeekOrigin.Begin);
                imageStreams.Add(imageStream);
            }
        }

        return imageStreams;
    }

asad.ali · August 5, 2021, 7:05pm

@amdiei

We could not find any document attached to your post. Can you please attach the sample PDF so that we can test the scenario in our environment and address it accordingly?

amdiei · August 8, 2021, 7:57am

mondial.pdf (1.3 MB)
Thanks for your response. I have attached the file.

asad.ali · August 9, 2021, 1:23pm

@amdiei

We tested the scenario in our environment using version 21.7 of the API and observed results similar to those you described. We also checked the PDF in Adobe Reader. It seems that your PDF also contains multiple inline images, and the API extracts them as corrupted images as well. We have logged an issue as PDFNET-50360 in our issue tracking system for further investigation.

Would you please also share the expected output that you want to achieve from your PDF? We will look further into the details of the logged ticket and let you know as soon as it is resolved. Please be patient and give us some time.

We are sorry for the inconvenience.