Free Support Forum - aspose.com

Aspose PDF - Read existing PDF Paragraph by Paragraph


#21

Unable to download the zip.
Error:
Sorry, this file is private. Only visible to topic owner and staff members.

But it's still not working for me. I am using JDK 9 and Ubuntu 18. Please confirm whether there is any issue with these environments.


#22

@yogesh30890

You can download the file from this link.

There is no known limitation for these environments. All Aspose.PDF for Java versions higher than 18.4 support JDK 9. Furthermore, as requested earlier, would you please confirm whether you are using a valid license? If you are still facing the issue even with a valid license, please share a sample console application with us. We will test the scenario in our environment and address it accordingly.


#23

Thanks @asad.ali. There was an issue with the license that I was using. It's working fine now. Can you please tell me whether it is possible to extract paragraph text per header?
For example, if I have a PDF structure like

  1. Heading 1 Text
    a. Sub heading 1 Text

    1. Text Text
    2. Text Text
  2. Heading 2 text

I would like to get the complete text of heading 1 together, then heading 2, and so on…


#24

@yogesh30890

Thanks for your feedback.

Would you please share your sample PDF document with us? We will test the scenario in our environment and address it accordingly.


#25

Downloads.zip (117.5 KB)

Please find the sample PDF and the expected output file attached.


#26

@yogesh30890

Thanks for sharing sample PDF.

Please note that headings and paragraphs are defined at PDF generation time. Once the PDF is generated, there are no separate headings and paragraphs, because all these elements become part of the single PDF content. Consequently, it is not feasible to extract text on the basis of headings, because they cannot be determined once the PDF is saved.

You may, however, extract text from the PDF using regular expressions. The TextFragmentAbsorber class offers various ways to extract text from a PDF document, which you can view at the shared link. We hope this helps you extract the desired text from your PDF. In case you face any issue, please feel free to let us know.
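
As a rough illustration of the regular-expression approach, here is a minimal sketch in plain Java that groups text under heading lines. It assumes the page text has already been extracted into a String (for instance with Aspose.PDF's text extraction) and that heading lines can be recognised by a pattern. The class name HeadingGrouper, the method groupByHeading, and the heading pattern are all hypothetical and only mirror the structure from the example earlier in this thread; a real document will most likely need a different pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.PDF: works on plain extracted text.
final class HeadingGrouper {

    // Collects every non-empty line under the most recent line that matches
    // headingPattern. Lines that appear before the first heading are dropped.
    // If the same heading text occurs twice, the later section replaces the earlier one.
    static Map<String, String> groupByHeading(String text, Pattern headingPattern) {
        Map<String, StringBuilder> sections = new LinkedHashMap<>();
        StringBuilder current = null;
        for (String rawLine : text.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (headingPattern.matcher(line).matches()) {
                // A new heading starts a new section.
                current = new StringBuilder();
                sections.put(line, current);
            } else if (current != null) {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        sections.forEach((heading, body) -> result.put(heading, body.toString()));
        return result;
    }

    public static void main(String[] args) {
        // Sample text shaped like the structure described in the question.
        String text = String.join("\n",
                "1. Heading 1 Text",
                "  a. Sub heading 1 Text",
                "  1. Text Text",
                "  2. Text Text",
                "2. Heading 2 text");
        // Assumed pattern: a number, a dot, then the literal word "Heading".
        Pattern heading = Pattern.compile("\\d+\\.\\s+Heading\\b.*");
        groupByHeading(text, heading).forEach((h, body) ->
                System.out.println(h + " -> " + body.replace("\n", " | ")));
    }
}
```

Because a saved PDF carries no heading information, the pattern is the only thing that decides what counts as a heading here, so it has to be tuned to the wording or numbering of the specific document.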