Hi,
I am trying to extract text from a PDF. When the PDF contains Arabic, the extracted text comes out in reversed order. I'm using version 21.7 with JDK 14.
Here is my code:
BufferedInputStream bis = new BufferedInputStream(new FileInputStream("C:\\Users\\xxx\\Desktop\\lucy-test-1.pdf"));
Document document = new Document(bis);
PdfExtractor ext = new PdfExtractor();
ext.setExtractTextMode(1); // 1 = raw ordering mode, 0 = pure text mode
ext.bindPdf(document);
ext.extractText(StandardCharsets.UTF_8);
ext.getText(new FileOutputStream("C:\\Users\\xxx\\Desktop\\lucy-test-1.txt"));
My PDF and the resulting text file are attached:
extract text from pdf.zip (158.3 KB)