Issue extracting text from PDF

Hi,

I tried the following code to extract text from a PDF, and it throws a NullPointerException. Please help me resolve this issue; it is an urgent requirement. Please find attached the source PDF file.

String content = "";

com.aspose.pdf.facades.PdfExtractor pdfExtractor = new com.aspose.pdf.facades.PdfExtractor();

pdfExtractor.bindPdf("C:\\Users\\kiran_babu01\\Desktop\\Desktop\\ArunResume.pdf");

//use parameterless ExtractText method

pdfExtractor.extractText(); // Error line

pdfExtractor.getText("D:\\Reports\\Search Folder\\Data\\1.txt");

Error:

Exception in thread "main" java.lang.NullPointerException

at com.aspose.pdf.b.c.g.c.p.FE(Unknown Source)

at com.aspose.pdf.b.c.g.c.p.ddd(Unknown Source)

at com.aspose.pdf.b.c.g.c.p.populateMaps(Unknown Source)

at com.aspose.pdf.b.c.g.c.p.ddb(Unknown Source)

at com.aspose.pdf.b.c.g.c.p.dda(Unknown Source)

at com.aspose.pdf.b.c.g.c.b.a.g(Unknown Source)

at com.aspose.pdf.b.c.g.c.ac.dei(Unknown Source)

at com.aspose.pdf.b.c.g.c.ac.isAccessible(Unknown Source)

at com.aspose.pdf.Font.a(Unknown Source)

at com.aspose.pdf.Font.(Unknown Source)

at com.aspose.pdf.FontCollection.cOL(Unknown Source)

at com.aspose.pdf.FontCollection.(Unknown Source)

at com.aspose.pdf.Resources.getFonts(Unknown Source)

at com.aspose.pdf.b.c.g.d.l.a(Unknown Source)

at com.aspose.pdf.b.c.g.d.l.parse(Unknown Source)

at com.aspose.pdf.b.c.g.d.n.a(Unknown Source)

at com.aspose.pdf.b.c.g.d.n.a(Unknown Source)

at com.aspose.pdf.b.c.g.d.n.dff(Unknown Source)

at com.aspose.pdf.b.c.g.d.n.(Unknown Source)

at com.aspose.pdf.TextAbsorber.visit(Unknown Source)

at com.aspose.pdf.facades.PdfExtractor.extractText(Unknown Source)

at com.aspose.pdf.facades.PdfExtractor.extractText(Unknown Source)

at com.infosys.finacleportal.batch.TestJNPI.main(TestJNPI.java:102)

Line 102, where the exception is raised, is:

pdfExtractor.extractText();

Thanks and regards,

Kiran

Hi Kiran,


Thanks for your inquiry. We have tested the scenario with Aspose.Pdf for Java 9.0.0 and are unable to replicate the problem. Please download and try the latest version of Aspose.Pdf for Java; it should resolve the issue.

Please feel free to contact us for any further assistance.

Best Regards,

Dear Tilal Ahmad,

I'm working with the latest version of Aspose (Aspose.Pdf for Java 9.0.0) and still get the same exception. My runtime details are as follows:

OS: Windows 7 Enterprise Service Pack 1 (32 Bit)

Java : JRE 7

Kindly help me resolve this issue at the earliest as it is critical.

Thanks,

Kiran.

Hi Kiran,


Thanks for sharing the details.

I have also tested the scenario using Aspose.Pdf for Java 9.0.0 in Eclipse Juno on Windows 7 Enterprise (x64) with Oracle JDK 1.7, and I am also unable to reproduce the issue. The text is extracted properly. Please note that my regional language settings are English (US).

Can you please share some further details about your working environment, i.e. regional language settings, any additional configuration on your system, and the JDK vendor and version (IBM JDK or Oracle JDK)? We are sorry for the inconvenience.
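If it helps to collect these details, here is a minimal sketch that prints them using only standard JDK calls. The class name `EnvironmentReport` and the two keys `default.locale` and `default.charset` are made up for illustration; the other keys are standard system property names. It is written without lambdas so that it also compiles on Java 7.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of Aspose.Pdf.
public class EnvironmentReport {

    // Collects the JVM, OS and locale details requested above.
    static Map<String, String> collect() {
        Map<String, String> info = new LinkedHashMap<String, String>();
        info.put("java.vendor", System.getProperty("java.vendor"));
        info.put("java.version", System.getProperty("java.version"));
        info.put("os.name", System.getProperty("os.name"));
        info.put("os.arch", System.getProperty("os.arch"));
        // Illustrative keys (not system properties) for the JVM defaults.
        info.put("default.locale", Locale.getDefault().toString());
        info.put("default.charset", Charset.defaultCharset().name());
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : collect().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Running it on the machine where the exception occurs and pasting the output into the thread should cover the JDK vendor, version and regional settings in one go.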


P.S. For your reference, I have attached the resultant file containing the text extracted from the PDF file.

Hi Kiran,


Thanks for your patience. As mentioned above, we have tried to replicate the issue on a couple of machines using the latest JAR of Aspose.Pdf for Java, and I am afraid we are unable to reproduce it. Can you please confirm the region and language settings of your system?

Moreover, we have recently noticed some issues when the default Locale is other than English. To confirm whether this is the cause at your end, please test your code after changing the Locale to English before instantiating any Aspose.Pdf object, as follows, and share the results.

Locale.setDefault(Locale.ENGLISH);
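If changing the default Locale for the whole application is a concern, one option is to switch it only for the duration of the extraction and restore it afterwards. The following is a minimal sketch under that assumption; the class `LocaleWorkaround` and the method `withEnglishLocale` are hypothetical names, not part of Aspose.Pdf, and the Aspose calls appear only in a comment so the example runs on a plain JDK.

```java
import java.util.Locale;

// Hypothetical helper class, not part of Aspose.Pdf.
public class LocaleWorkaround {

    // Runs the task with the JVM default Locale set to English,
    // then restores the previous default even if the task throws.
    static void withEnglishLocale(Runnable task) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(Locale.ENGLISH);
        try {
            task.run();
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        withEnglishLocale(new Runnable() {
            @Override
            public void run() {
                // In the real program, the PdfExtractor code from the
                // first post (bindPdf, extractText, getText) would go here.
                System.out.println("Locale during task: " + Locale.getDefault());
            }
        });
        System.out.println("Locale after task: " + Locale.getDefault());
    }
}
```

Note that `Locale.setDefault` affects the whole JVM, so in a multi-threaded application other threads will also see the English default while the task runs.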


We are truly sorry for the inconvenience caused.

Best Regards,