Hi, I am using Aspose.PDF for Java version 24.9 to extract text from PDFs. For certain PDFs, the extracted text contains two carriage returns where the original PDF has only one.
The attached document reproduces the issue. The extracted text is:
John Smith
John Doe
whereas I would expect:
John Smith
John Doe
The code used for the extraction is:
```java
import com.aspose.pdf.*;

// Load the sample document (dataDir is the folder containing the attachment)
Document doc = new Document(dataDir + "asposeSupport_3.pdf");
// Extract the text using the Pure formatting mode
TextAbsorber textAbsorber = new TextAbsorber();
TextExtractionOptions options = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
textAbsorber.setExtractionOptions(options);
doc.getPages().accept(textAbsorber);
String content = textAbsorber.getText();
```
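To see exactly which line-break characters end up in `content`, and as a possible stopgap until the extraction itself is fixed, a small helper like the one below could be run on the extracted string. This is my own sketch, not part of the Aspose.PDF API: the class `LineBreakUtil` and its methods are hypothetical, and the sample string assumes the duplicate break looks like `\r\n\r\n` (it could just as well be `\r\r\n`; the escaped output would show which).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.PDF: inspects and normalizes line breaks
// in text returned by TextAbsorber.getText().
public class LineBreakUtil {

    // Matches any run of consecutive \r and/or \n characters.
    private static final Pattern BREAK_RUN = Pattern.compile("[\\r\\n]+");

    // Makes \r and \n visible so the exact sequence can be pasted into a bug report.
    public static String escapeBreaks(String text) {
        return text.replace("\r", "\\r").replace("\n", "\\n");
    }

    // Collapses every run of line-break characters into a single "\r\n".
    // Assumption: blank lines in the source are never meaningful.
    public static String collapseBreaks(String text) {
        return BREAK_RUN.matcher(text).replaceAll("\r\n");
    }

    public static void main(String[] args) {
        // Sample input only, not the real extraction output
        String content = "John Smith\r\n\r\nJohn Doe";
        System.out.println(escapeBreaks(content));
        System.out.println(escapeBreaks(collapseBreaks(content)));
    }
}
```

Note that `collapseBreaks` also removes genuine blank lines between paragraphs, so it only suits documents where those do not matter.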
asposeSupport_3.pdf (135.5 KB)
Thanks