Extract Paragraphs in PDF using Aspose.PDF for Java - ArrayIndexOutOfBoundsException And Unknown Source

withthewind · May 30, 2019, 10:17am

The code below produces a “java.lang.ArrayIndexOutOfBoundsException: 1685 at com.aspose.pdf.ParagraphAbsorber.lf(Unknown Source)” error when extracting paragraphs from the attached PDF.

How can we solve this problem?

temp–(3d-pdf).pdf (448.8 KB)

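// Imports needed by this snippet
import com.aspose.pdf.Document;
import com.aspose.pdf.MarkupParagraph;
import com.aspose.pdf.MarkupSection;
import com.aspose.pdf.Page;
import com.aspose.pdf.PageCollection;
import com.aspose.pdf.PageMarkup;
import com.aspose.pdf.ParagraphAbsorber;
import com.aspose.pdf.TextFragment;
import java.util.List;

com.aspose.pdf.License license = new com.aspose.pdf.License();
license.setLicense("src/main/resources/license-office");

Document pdfDoc = new Document(filePath + "temp–(3d-pdf).pdf");
PageCollection pages = pdfDoc.getPages();
System.out.println(pages.size());

// Page indexes in Aspose.PDF are 1-based
for (int i = 1; i <= pages.size(); i++) {
    Page page = pages.get_Item(i);
    System.out.println(i);
    System.out.println(page);

    ParagraphAbsorber absorber = new ParagraphAbsorber();
    absorber.visit(page);

    // Walk markups -> sections -> paragraphs -> text fragments
    List<PageMarkup> sList = absorber.getPageMarkups();
    for (PageMarkup pMarkup : sList) {
        List<MarkupSection> secs = pMarkup.getSections();
        for (MarkupSection mse : secs) {
            List<MarkupParagraph> paList = mse.getParagraphs();
            for (MarkupParagraph mPa : paList) {
                List<TextFragment> fraList = mPa.getFragments();
                for (TextFragment tf : fraList) {
                    System.out.println(tf.getText());
                    System.out.println("######################");
                }
                System.out.println("*****************************");
            }
        }
    }
}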
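
A side note on the loop above: because it handles one page at a time, a failure on a single page can be isolated so that the remaining pages are still processed. The sketch below is not a fix for the exception and does not call Aspose.PDF at all; `PerPageGuard` and `runPerPage` are hypothetical names, and the failing page is simulated. It also assumes the exception is triggered by specific pages rather than by the whole document, which the thread does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class PerPageGuard {

    /**
     * Runs the task for pages 1..pageCount (1-based, like PageCollection)
     * and returns the numbers of the pages whose task threw.
     */
    public static List<Integer> runPerPage(int pageCount, IntConsumer task) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 1; i <= pageCount; i++) {
            try {
                task.accept(i);
            } catch (RuntimeException e) {
                // Record the failure and continue with the next page
                System.err.println("Page " + i + " failed: " + e);
                failed.add(i);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in task: page 2 throws a simulated exception
        List<Integer> failed = runPerPage(3, i -> {
            if (i == 2) {
                throw new ArrayIndexOutOfBoundsException(1685);
            }
            System.out.println("Processed page " + i);
        });
        System.out.println("Failed pages: " + failed); // Failed pages: [2]
    }
}
```

In the snippet from the post, the task body would be the `absorber.visit(pages.get_Item(i))` call followed by the markup loops, so a page that triggers the exception is logged and skipped instead of aborting the run.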

asad.ali · May 30, 2019, 7:20pm

@withthewind

Thanks for contacting support.

We were able to replicate this issue in our environment and have logged it as PDFJAVA-38595 in our issue tracking system. We will look further into the details of the issue and keep you posted on the status of its resolution. Please be patient and spare us a little time.

We are sorry for the inconvenience.

aspose.notifier · February 25, 2020, 8:02pm

The issues you have found earlier (filed as PDFJAVA-38595) have been fixed in Aspose.PDF for Java 20.2.
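
To tell whether a project already includes the fix, the declared dependency version can be compared against 20.2. The sketch below is one way to do that with a numeric, component-by-component comparison; `VersionCheck`, `compareVersions` and `hasFix` are hypothetical names, and the version string is assumed to be copied by hand from the build file (for example the Maven or Gradle dependency declaration) in plain dotted-number form.

```java
public class VersionCheck {

    /** Compares dotted numeric versions; missing components count as 0. */
    public static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** True if the given version is 20.2 or later, the release named in the thread. */
    public static boolean hasFix(String installedVersion) {
        return compareVersions(installedVersion, "20.2") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(hasFix("19.12")); // false
        System.out.println(hasFix("20.2"));  // true
        System.out.println(hasFix("20.10")); // true
    }
}
```

Comparing components as integers matters here: a plain string comparison would rank "20.10" below "20.2".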