PdfExtractor not working with Illustrator PDFs

We are trying to use the PdfExtractor class from Pdf.Kit version 3.5 to pull the images out of a PDF file. This works fine for PDFs created with Adobe Photoshop (Save As PDF) and Adobe Acrobat, but it fails for PDFs created with Adobe Illustrator (Save As PDF).

It appears that Illustrator may embed images differently from Photoshop or Acrobat, so they are not exposed in the way PdfExtractor expects.

Please look into this and let me know if it can be fixed. The code we are using is listed below.

// Open the PDF and bind it to the extractor.
PdfExtractor extractor = new PdfExtractor();
FileStream inStream = File.Open(FileManager.GetBasePath() + document.documentLocation, FileMode.Open);
extractor.BindPdf(inStream);

// Extract the images, then read the first one into memory.
extractor.ExtractImage();
MemoryStream image = new MemoryStream();
extractor.GetNextImage(image);

Thanks!

Dan

Hi Dan,

Please share the PDF file you’re having a problem with. We’ll investigate the issue on our end and update you accordingly.

We’re sorry for the inconvenience.
Regards,

Here is a test PDF that we created.

Dan

Hi Dan,

I have logged this issue as PDFKITNET-10313 in our issue tracking system. Our development team will look into the matter, and you’ll be updated via this forum once the issue is resolved.

We’re sorry for the inconvenience.
Regards,

Please provide me with a timeline to fix this once you get one from your team.

Thanks,

Dan

Hi Dan,

Our development team is still estimating the effort and time required for this issue. I’ll let you know as soon as I receive an update from them.

We’re sorry for any inconvenience.
Regards,

Shahzad,

Do you have an update for me this week?

Thanks!

Dan

Hi Dan,

I would like to inform you that the issue logged as PDFKITNET-10313 will be fixed in our upcoming monthly release, due at the end of September.

However, please note that the objects you’re trying to extract as images are not actually images; they are rectangle objects. Our component doesn’t currently support extracting them, but we have logged this as PDFKITNET-10537 in our issue tracking system. Our team will be working on this feature; however, I’m afraid it will take some time before we can provide this functionality.

I’m sure that, in the meantime, the fix for PDFKITNET-10313 will prevent exceptions when the PDF contains no images (only rectangle objects).

We appreciate your patience.
Regards,

Shahzad,

Thank you for your reply. When you add support for the rectangle objects, will you convert them to images, or how will they be handled?

For our project, each PDF is supposed to contain a single image. I need to extract those images to merge with other PDF files. Since Illustrator saves the image as "rectangle objects", my hope is that those objects could be converted to images and then merged as if they had been images to begin with.

Please let me know what your plans are for handling these rectangle objects and what you believe your timeframe will be to provide a working fix.

Thank you!

Dan

Hi Dan,

We’ll save these rectangles as images.

As for the time frame for this issue, we’re not sure yet. Our team is looking into this requirement, and once we have an idea of the time and effort required, we’ll update you.

We appreciate your patience. If you have any other questions, please do let us know.
Regards,

The issues you found earlier (filed as PDFKITNET-10537 and PDFKITNET-10313) have been fixed in this update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

I downloaded the 3.9.0.0 update and used the attached test file, but it did not show that there were any images in the PDF.

Please look into this and respond.

Thanks,

Dan

P.S. The code I used was:

Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf("c:\test.pdf")

' Only extract from the first page.
extractor.StartPage = 1
extractor.EndPage = 1
extractor.ExtractImage()

Dim prefix As String = "c:\Image"
Dim suffix As String = ".jpg"
Dim imageCount As Integer = 1

' Save each extracted image as c:\Image1.jpg, c:\Image2.jpg, ...
While extractor.HasNextImage()
    ' Use & for concatenation; + would try to convert the string to a number.
    extractor.GetNextImage(prefix & imageCount & suffix)
    imageCount = imageCount + 1
End While

Hi Dan,

First of all, please note that the objects in this file, and in the file you shared earlier, are rectangles and not actual images. To extract these objects as images, we have introduced an overload of ExtractImage, which you can use as follows:


extractor.ExtractImage(ExtractImageMode.IncludeRectangles);
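
For context, here is a minimal sketch of how this overload might fit into a complete extraction loop. It uses only the calls already shown in this thread; the `Aspose.Pdf.Kit` namespace, the class name and the file paths are assumptions for illustration, so please adjust them to your installation.

```csharp
using Aspose.Pdf.Kit; // assumed namespace for Pdf.Kit; adjust to your installed version

class RectangleImageExport // hypothetical class name, for illustration only
{
    static void Main()
    {
        PdfExtractor extractor = new PdfExtractor();
        extractor.BindPdf(@"c:\test.pdf"); // path taken from the test code above

        // Limit extraction to the first page, as in the test code above.
        extractor.StartPage = 1;
        extractor.EndPage = 1;

        // Ask the extractor to include rectangle objects, not only embedded images.
        extractor.ExtractImage(ExtractImageMode.IncludeRectangles);

        int imageCount = 1;
        while (extractor.HasNextImage())
        {
            // Writes c:\Image1.jpg, c:\Image2.jpg, and so on.
            extractor.GetNextImage(@"c:\Image" + imageCount + ".jpg");
            imageCount++;
        }
    }
}
```

If the loop body never runs, HasNextImage() returned false, which means no images or rectangles were reported for that page range.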

Moreover, the issue was fixed for the file you shared earlier, and the fix should work in the scenarios covered by that sample file. However, I have noticed that the issue still exists with the file attached to your latest post. I'm afraid this might be a different scenario, and our team needs to look into it. I have logged this issue as PDFKITNET-12597 in our issue tracking system. You'll be updated via this forum thread once the issue is resolved.

We're sorry for the inconvenience.
Regards,

The issue you found earlier (filed as PDFKITNET-12597) has been fixed in this update.

