PDF extract paragraph

I am extracting paragraphs in Java with PDF 20.11. Why is every line returned as a separate paragraph?
Attached are the PDFs in question:
文档翻译测试文档_英到中.pdf (248.3 KB)
文档翻译测试文档_中到英语.pdf (218.7 KB)
文档翻译测试文档_中到英语_长文档.pdf (351.1 KB)

@dongsp

Could you please share the sample code snippet you used on your side? We will test the scenario accordingly and share our feedback with you.

code.png (50.8 KB)
Hi, attached is the code snippet.

@dongsp

The API extracts text from a PDF in the same way it was added to and stored in the file. We observed similar behavior in our environment while extracting paragraphs from your files, and have therefore logged the following tickets in our issue tracking system:

  • PDFJAVA-39977 (文档翻译测试文档_中到英语_长文档.pdf)
  • PDFJAVA-39978 (文档翻译测试文档_中到英语.pdf)
  • PDFJAVA-39979 (文档翻译测试文档_英到中.pdf)

We will look further into the details of the logged tickets and keep you posted on the status of their resolution. Please be patient and allow us some time.

We are sorry for the inconvenience.
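
While the tickets are open, one possible stopgap is to re-join the extracted lines in post-processing. The sketch below is only an illustration under stated assumptions: `ParagraphJoiner` and `joinLines` are hypothetical names, not part of the API, and the rule used (a paragraph ends at a blank line or at a line ending in sentence punctuation) is a heuristic that has not been verified against the attached files. It assumes each extracted "paragraph" has already been collected as one string per line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of any PDF library.
class ParagraphJoiner {

    // Assumed sentence terminators: ASCII . ! ? and the full-width
    // CJK forms (ideographic full stop, exclamation mark, question mark).
    private static final String SENTENCE_END = ".!?\u3002\uFF01\uFF1F";

    // Merges consecutive lines into paragraphs. A paragraph is closed by a
    // blank line or by a line whose last character is a sentence terminator.
    static List<String> joinLines(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                flush(current, paragraphs);
                continue;
            }
            // Insert a space only between two ASCII characters, so that
            // Chinese text is joined without spurious spaces.
            if (current.length() > 0
                    && current.charAt(current.length() - 1) < 128
                    && line.charAt(0) < 128) {
                current.append(' ');
            }
            current.append(line);
            if (SENTENCE_END.indexOf(line.charAt(line.length() - 1)) >= 0) {
                flush(current, paragraphs);
            }
        }
        flush(current, paragraphs); // keep any trailing unterminated text
        return paragraphs;
    }

    // Moves the buffered text into the result list, if there is any.
    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "This paragraph was split",
                "across two lines.",
                "Second paragraph.");
        for (String paragraph : joinLines(lines)) {
            System.out.println(paragraph);
        }
    }
}
```

Note that this heuristic will split a paragraph early whenever a wrapped line happens to end with a full stop, so it is a workaround rather than a replacement for a fix in the extraction itself.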