Problem extracting text from some PDF documents

ray_zhao · December 12, 2007, 8:03am

Hi,

I am using the evaluation version of the Java PDF development kit 1.9 to extract text from my PDF documents. Most of them are extracted correctly, but for some of them the text cannot be read and the extraction never stops.

Here is my sample code using the PDF development kit:

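import java.io.ByteArrayOutputStream;

// PdfExtractor comes from the Java PDF development kit; `source` (the input PDF)
// and `countnum` are defined elsewhere in my program.
PdfExtractor extractor = new PdfExtractor();
ByteArrayOutputStream out = new ByteArrayOutputStream();

// Extract the text of pages 1 to 6 into an in-memory stream.
extractor.bindPdf(source);
extractor.setStartPage(1);
extractor.setEndPage(6);
extractor.extractText();   // on some PDFs the program never returns from this call
extractor.getText(out);

// Keep characters 179 to countnum of the trimmed text and strip all line breaks.
String originalcontent = out.toString().trim();
String content = originalcontent.substring(179, countnum);
content = content.replace("\r\n", "").replace("\n", "");

System.out.println(content);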
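Separately from the hang, the post-processing step calls substring(179, countnum), which throws StringIndexOutOfBoundsException whenever a document yields less text than expected. A minimal defensive sketch follows; ExtractedTextCleaner and clean are hypothetical names, not part of the kit, and the helper also removes lone "\r" characters, which the original code does not.

```java
public final class ExtractedTextCleaner {
    private ExtractedTextCleaner() {}

    /**
     * Returns raw.trim() restricted to [start, end) with all line breaks removed.
     * Indices are clamped to the trimmed text's length, so a short document
     * yields a shorter (possibly empty) string instead of throwing.
     */
    public static String clean(String raw, int start, int end) {
        String text = raw.trim();
        int from = Math.min(Math.max(start, 0), text.length());
        int to = Math.min(Math.max(end, from), text.length());
        return text.substring(from, to)
                   .replace("\r\n", "")
                   .replace("\n", "")
                   .replace("\r", ""); // extra: also drop stray carriage returns
    }

    public static void main(String[] args) {
        System.out.println(clean("  Hello\r\nPDF\nworld  ", 0, 100)); // HelloPDFworld
        System.out.println(clean("short", 179, 500).isEmpty());       // true
    }
}
```

In the snippet above this would replace the substring and replace calls with something like content = ExtractedTextCleaner.clean(out.toString(), 179, countnum);.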

Most PDFs are extracted to text correctly, but with some PDFs the extraction runs for a very long time and the program never terminates. I set a breakpoint and found that the program stays on the line "extractor.extractText();" indefinitely. I have attached one of the PDF documents that has this problem.
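Until a fixed version is available, one way to keep a single problem document from blocking the whole program is to run the extraction on a worker thread and give up after a time limit. The sketch below is a general Java workaround, not something the kit provides; TimedExtraction and runWithTimeout are hypothetical names, and the lambdas in main only simulate a fast and a hanging extraction.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class TimedExtraction {
    private TimedExtraction() {}

    /**
     * Runs task on a daemon worker thread and waits at most the given timeout.
     * Returns the task's result, or null if it did not finish in time.
     */
    public static <T> T runWithTimeout(Callable<T> task, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "pdf-extraction");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                // Interrupts the worker; a call that ignores interrupts may keep running.
                future.cancel(true);
                return null;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            } catch (ExecutionException e) {
                throw new IllegalStateException("extraction failed", e.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the real extraction; the slow one simulates a hang.
        String fast = runWithTimeout(() -> "extracted text", 1, TimeUnit.SECONDS);
        String slow = runWithTimeout(() -> { Thread.sleep(10_000); return "never"; },
                                     200, TimeUnit.MILLISECONDS);
        System.out.println(fast); // extracted text
        System.out.println(slow); // null
    }
}
```

With the kit, the task would wrap the calls from the snippet above, for example () -> { extractor.extractText(); extractor.getText(out); return out.toString(); }. The worker is a daemon thread because a hung extractText() may never respond to the interrupt, and a daemon thread at least lets the JVM exit.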

Thank you!

AdeelTaseer · December 12, 2007, 9:51am

Hi,

I have checked this and was able to reproduce the error. I have logged this as PDFKITJAVA-4168. We will try our best to resolve this as soon as possible.

Thanks.

aspose.notifier · September 3, 2009, 5:25am

The issues you have found earlier (filed as 4168) have been fixed in this update.
