I am extracting paragraphs in Java with PDF 20.11. Why is every line returned as a separate paragraph?
Attached are the PDFs in question:
文档翻译测试文档_英到中.pdf (248.3 KB)
文档翻译测试文档_中到英语.pdf (218.7 KB)
文档翻译测试文档_中到英语_长文档.pdf (351.1 KB)
Could you please share the sample code snippet that you used on your end? We will test the scenario accordingly and share our feedback with you.
The API extracts text from a PDF in the same way it was added to and stored in the file. We observed similar behavior in our environment while extracting paragraphs from your files, and have therefore logged the following tickets in our issue tracking system:
- PDFJAVA-39977 (文档翻译测试文档_中到英语_长文档.pdf)
- PDFJAVA-39978 (文档翻译测试文档_中到英语.pdf)
- PDFJAVA-39979 (文档翻译测试文档_英到中.pdf)
We will look further into the details of the logged tickets and keep you posted on the status of their resolution. We appreciate your patience in the meantime.
We are sorry for the inconvenience.
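Until the logged tickets are resolved, one possible workaround is to post-process the extracted lines yourself. The sketch below is not part of the API: `LineMerger` and `mergeLines` are hypothetical names, and the rule it applies (a line closes a paragraph only when it ends with sentence-final punctuation or is followed by a blank line) is an assumption that may not suit every document. It uses only the Java standard library and expects the extracted lines as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any PDF library: merges extracted
// lines into paragraphs using a simple punctuation heuristic.
public class LineMerger {
    // Assumption: a line ending with one of these characters closes a paragraph.
    private static final String TERMINATORS = ".!?。！？";

    public static List<String> mergeLines(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                // A blank line always closes the current paragraph.
                flush(current, paragraphs);
                continue;
            }
            if (current.length() > 0
                    && needsSpace(current.charAt(current.length() - 1), line.charAt(0))) {
                current.append(' ');
            }
            current.append(line);
            if (TERMINATORS.indexOf(line.charAt(line.length() - 1)) >= 0) {
                flush(current, paragraphs);
            }
        }
        // Keep any trailing text that had no terminator.
        flush(current, paragraphs);
        return paragraphs;
    }

    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }

    // Join with a space only when neither boundary character is CJK,
    // so Chinese text is not split by artificial spaces.
    private static boolean needsSpace(char left, char right) {
        return !isCjk(left) && !isCjk(right);
    }

    private static boolean isCjk(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN
                || block == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                || block == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }
}
```

You would pass the text of each extracted line, in reading order, to `LineMerger.mergeLines(...)` and use the returned list as the paragraphs. Headings and list items that do not end with punctuation will be merged into the following text, so the rule may need adjusting for your documents.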