Aspose PDF TextAbsorber - Java

Hello,

We are trying to read a PDF by line number. Can we achieve that using Aspose.PDF for Java? The sample code in your documentation returns all the text from the input PDF. Our requirement is to read text from specific line numbers, not the whole PDF. Please advise.

We are using Aspose Pdf for Java 7 version 22.8

Thanks,
B

@judiciary

Can you please share your sample PDF for our reference? We will test the scenario in our environment and address it accordingly.

I have attached a sample PDF. We would like to read the data on specific line numbers, for example 5, 10, 12, etc.

Thanks
B
Issue.pdf (2.0 MB)

@judiciary

We are checking it and will get back to you shortly.

Hello,

Any update on this?

Thanks,
B

@judiciary

We have tried different regular expressions to extract the text by line number; however, we were not successful. We have therefore logged an investigation ticket for a deeper analysis of your requirements. We will look into the details of the ticket and let you know once it is resolved. Please be patient and spare us some time.

We are sorry for the inconvenience.

@judiciary

We do not have that functionality, but you can use the code snippet below to detect lines by coordinates in this document. Please note that it may require some improvements for documents with complex structures. Additionally, you can filter out certain lines if needed.

Also, please note the call to flatten(): it is needed so that the results include values inside form fields; otherwise, these values will be ignored.

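import com.aspose.pdf.Document;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentCollection;
import java.util.Arrays;
import java.util.Comparator;

// Flatten first so that form field values are included in the extracted text.
Document pdfDocument = new Document("Issue.pdf");
pdfDocument.flatten();

// Collect every text fragment in the document.
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
pdfDocument.getPages().accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
TextFragment previousFragment = null;

int lineNumber = 0;

// Order the fragments from the top of the page to the bottom.
TextFragment[] fragments = sortByTop(textFragmentCollection);

for (TextFragment textFragment : fragments) {
  // Start a new line when this fragment is not level with the previous one.
  if (previousFragment == null || !isOnSameLine(previousFragment, textFragment)) {
     lineNumber++;
     System.out.println("\n ----- Line " + lineNumber + " ----- ");
  }

  System.out.print(textFragment.getText());
  previousFragment = textFragment;
}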

...

// Fragments whose top edges differ by less than this value are treated as one line.
static final double MIN_DISTANCE = 1.0;

private static boolean isOnSameLine(TextFragment fragment1, TextFragment fragment2) {
    return distanceBetween(fragment1, fragment2) < MIN_DISTANCE;
}

// Absolute vertical distance between the top edges (URY) of two fragments.
public static double distanceBetween(TextFragment fragment1, TextFragment fragment2) {
    return Math.abs(fragment1.getRectangle().getURY() - fragment2.getRectangle().getURY());
}

// Copies the collection into an array ordered top to bottom (descending LLY).
// Declared static so that it can be called from a static context such as main().
private static TextFragment[] sortByTop(TextFragmentCollection fragmentCollection) {
   TextFragment[] array = new TextFragment[fragmentCollection.size()];

   int i = 0;
   for (TextFragment textFragment : fragmentCollection) {
      array[i] = textFragment;
      i++;
   }

   Arrays.sort(array, new Comparator<TextFragment>() {
      @Override
      public int compare(TextFragment fragment1, TextFragment fragment2) {
         return Double.compare(fragment2.getRectangle().getLLY(), fragment1.getRectangle().getLLY());
      }
   });
   return array;
}

Result:

 ----- Line 1 ----- 
9A. There is due, unpaid and owing from tenant(s) to plaintiff/landlord rent as follows: 
 ----- Line 2 ----- 
………………………………… Rent for July 2023 (Balance)$ 644.08
 ----- Line 3 ----- 
………………………………… Rent for August 2023$ 1,881.36
 ----- Line 4 ----- 
………………………………… Late Fees Apr-Aug 2023 ($100.00x5)$ 500.00
 ----- Line 5 ----- 
………………………………… 
 ----- Line 6 ----- 
………………………………… 
 ----- Line 7 ----- 
………………………………… 
 ----- Line 8 ----- 
………………………………… 
 ----- Line 9 ----- 
………………………………… 
 ----- Line 10 ----- 
………………………………… 
 ----- Line 11 ----- 
$ 300.00………………………………… Attorney fees*
 ----- Line 12 ----- 
$ 2,161.50………………………………… Other* (specify): 
 ----- Line 13 ----- 
$57.00 Court Costs 
 ----- Line 14 ----- 
$2104.50 Misc. Legal 
 ----- Line 15 ----- 
$ 5,486.94
 ----- Line 16 ----- 
ArrearsTOTAL
 ----- Line 17 ----- 
*the late charges, attorney fees and other charges are permitted to be charged as rent for purposes of this
 ----- Line 18 ----- 
action by federal, state and local law (including rent control and rent leveling) and by the lease.
 ----- Line 19 ----- 
9B. The date that the next rent is due isSeptember 1, 2023 
 ----- Line 20 ----- 
If this case is scheduled for trial before that date, the total amount you must pay to have this 
 ----- Line 21 ----- 
complaint dismissed is  $ 5,486.94
 ----- Line 22 ----- 
 If this case is scheduled for trial on or after that date, the total amount you must pay to have this 
 ----- Line 23 ----- 
complaint dismissed is  $ 5,486.94
 ----- Line 24 ----- 
Please include an additional late fee if your balance is paid after the 5th of the month. $100.00
 ----- Line 25 ----- 
These amounts do not include late fees or attorney fees for Section 8 and public housing tenants. 
 ----- Line 26 ----- 
Additional late fees may also be applicable. Payment may be made to the landlord or the clerk of 
 ----- Line 27 ----- 
made by 4:30 the court at any time before the trial date, but on the trial date payment must be 
 ----- Line 28 ----- 
p.m. to get the case dismissed.
 ----- Line 29 ----- 
9C. Landlord has attempted to settle this matter and is filing this Complaint as a last resort, to bring the Tenant's 
 ----- Line 30 ----- 
account current.
 ----- Line 31 ----- 
Check paragraphs 10 and 11 if the complaint is for other than or in addition to Non-Payment of rent. 
 ----- Line 32 ----- 
Attach all Notices to Cease and Notices to Quit/Demands For Possession.  
 ----- Line 33 ----- 
10. ___ Landlord seeks a judgment for possession for the additional or alternative reason(s) stated in the
 ----- Line 34 ----- 
notices attached to this complaint. STATE REASONS:
 ----- Line 35 ----- 
11. ___ The tenant(s) has (have) not surrendered possession of the premises and tenant(s) hold(s) over and
 ----- Line 36 ----- 
continue(s) in possession without the consent of landlord.
 ----- Line 37 ----- 
WHEREFORE, plaintiff/landlord demands judgment for possession against the tenant(s) listed above, 
 ----- Line 38 ----- 
together with costs. 
 ----- Line 39 ----- 
SignOfFilingAttorneyORLandlordProSe
 ----- Line 40 ----- 
DATED:  ________________________________ August 18, 2023
 ----- Line 41 ----- 
Attorney or Landlord Pro Se)(Signature of Filing 
 ----- Line 42 ----- 
NameOfAttorneyOrLandlordProSe
 ----- Line 43 ----- 
(Printed or Typed Name of Attorney or Landlord Pro Se)
 ----- Line 44 ----- 
Lauren Perrella, Esq., Atty ID No. 118482014 
 ----- Line 45 ----- 
February 18, 2022
 ----- Line 46 ----- 
Revised [09/01/2016] 7/14/2020. CN 11252 (Appendix XI-X)
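
For the original requirement (reading only lines 5, 10, 12, and so on), the same grouping idea can be extended to collect each detected line into a string and then pick lines by number. The following is only a minimal, library-independent sketch: Fragment, toLines and pick are hypothetical names, and Fragment merely stands in for a TextFragment whose coordinates have been copied out (for example from getRectangle().getLLX() and getRectangle().getURY()). It also orders fragments left to right within a line, which the snippet above does not do, so the numbering and text may differ from the result shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LinePicker {

    // Hypothetical stand-in for a text fragment: its text, left edge and top edge.
    static final class Fragment {
        final String text;
        final double left;
        final double top;

        Fragment(String text, double left, double top) {
            this.text = text;
            this.left = left;
            this.top = top;
        }
    }

    // Same threshold as in the snippet above (an assumption that suits this sample).
    static final double MIN_DISTANCE = 1.0;

    // Groups fragments into lines (top to bottom) and joins each line left to right.
    static List<String> toLines(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Highest top edge first, because PDF y coordinates grow upwards.
        sorted.sort(Comparator.comparingDouble((Fragment f) -> f.top).reversed());

        List<List<Fragment>> groups = new ArrayList<>();
        Fragment previous = null;
        for (Fragment f : sorted) {
            // Open a new line when this fragment is not level with the previous one.
            if (previous == null || Math.abs(previous.top - f.top) >= MIN_DISTANCE) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(f);
            previous = f;
        }

        List<String> lines = new ArrayList<>();
        for (List<Fragment> group : groups) {
            group.sort(Comparator.comparingDouble((Fragment f) -> f.left));
            StringBuilder sb = new StringBuilder();
            for (Fragment f : group) {
                sb.append(f.text);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    // Returns the requested 1-based line numbers; numbers out of range are skipped.
    static Map<Integer, String> pick(List<String> lines, int... lineNumbers) {
        Map<Integer, String> picked = new LinkedHashMap<>();
        for (int n : lineNumbers) {
            if (n >= 1 && n <= lines.size()) {
                picked.put(n, lines.get(n - 1));
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // Made-up coordinates for illustration only.
        List<Fragment> fragments = Arrays.asList(
            new Fragment("$ 300.00", 400, 700.2),
            new Fragment("Attorney fees*", 50, 700.0),
            new Fragment("Heading", 50, 720),
            new Fragment("TOTAL", 50, 680));

        List<String> lines = toLines(fragments);
        for (Map.Entry<Integer, String> e : pick(lines, 2, 3, 10).entrySet()) {
            System.out.println("Line " + e.getKey() + ": " + e.getValue());
        }
    }
}
```

With the made-up coordinates in main, this prints "Line 2: Attorney fees*$ 300.00" and "Line 3: TOTAL"; line 10 does not exist and is skipped. For a multi-page document the grouping would presumably need to run once per page, since top coordinates repeat on every page.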