PdfExtractor text/images "Header is illegal" exception

I need a way to check if a PDF is text only or if it also contains images. I’m using the Aspose.Pdf.Kit API 3.8.0 for this. But the following code results in an exception: “java.lang.Exception: Header is illegal”.

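PdfExtractor extractor = new PdfExtractor();
extractor.bindPdf(…);

// Text check: extract the text, then see whether any words were found
extractor.extractText();
boolean hasText = (extractor.getWordCount() > 0);

// Image check: extract the images, then see whether at least one is available
extractor.extractImage();
boolean hasImage = (extractor.hasNextImage());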

What is wrong with my code? Or is there a better way to check if there are text and images in a PDF document?

Regards, Erwin.

Hi Erwin,

I tested this scenario on my end but could not reproduce the problem. Could you please confirm that the version you are using is 3.8.0? If it is, please provide the following information so that we can try to reproduce the issue in your particular scenario:

1. Java version
2. OS
3. Working environment (web/windows etc.)

Also, please tell us at which line the exception occurs. We will investigate the issue further and guide you accordingly.

We’re sorry for the inconvenience.
Regards,

Java version: 1.6.0_24
OS: Windows XP SP3
Working env: Standalone application
Aspose.Pdf.Kit version: 3.8.0

The exception occurs at the line extractor.extractImage();

java.lang.Exception: Header is illegal.
at com.aspose.pdf.kit.PdfViewer.openPdfFile(Unknown Source)
at com.aspose.pdf.kit.ro.for(Unknown Source)
at com.aspose.pdf.kit.PdfExtractor.extractImage(Unknown Source)


Hi Erwin,

Thank you very much for sharing further details. We’ll investigate the issue at our end and update you with the results accordingly.

We’re sorry for the inconvenience.
Regards,

Hi Erwin,

I have investigated this issue again in detail, using your particular scenario, but I am afraid I could not reproduce it. Could you please check whether the input PDF causing this problem is the same one you shared with us in your first post? Also, could you try running the same code on a different machine to see whether the issue still occurs? We have also released a new version of Aspose.Pdf.Kit for Java, so please test with that as well and share the results with us.
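
One quick way to check the input file itself, independent of Aspose.Pdf.Kit, is to confirm that it starts with the standard PDF signature "%PDF-". The sketch below is only a diagnostic aid and rests on an assumption: that "Header is illegal" refers to the PDF file header, which this thread does not confirm. The class name PdfHeaderCheck and its methods are hypothetical helpers written for illustration; they are not part of the Aspose API. The code uses only java.io, so it should also compile on Java 1.6.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Aspose.Pdf.Kit.
class PdfHeaderCheck {

    // A PDF file is expected to begin with the ASCII bytes "%PDF-".
    private static final byte[] MAGIC = {'%', 'P', 'D', 'F', '-'};

    // Returns true if the given bytes start with "%PDF-".
    static boolean hasPdfHeader(byte[] head) {
        if (head.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (head[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first five bytes of the file and checks them.
    static boolean hasPdfHeader(File file) throws IOException {
        byte[] head = new byte[MAGIC.length];
        InputStream in = new FileInputStream(file);
        try {
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
            return read == head.length && hasPdfHeader(head);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the PDF you want to inspect.
        System.out.println(hasPdfHeader(new File(args[0])));
    }
}
```

If this prints false for the file passed to bindPdf, the input is probably not the PDF you expect (for example, a truncated download or an HTML error page saved with a .pdf extension). Note that the check is strict about offset 0; some viewers tolerate a few leading bytes before the signature, so a false result is a hint rather than proof.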

We’re sorry for the inconvenience and look forward to helping you.
Regards,