I am converting a PDF to DOCX as shown below.
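import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
// Assumption: the class names match Aspose.PDF for Java; the post does not name the library.
import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;

public static void convertToDocx(File pdfFile, File docxFile) throws Exception {
    // Load the source PDF from a stream.
    Document doc;
    try (FileInputStream fis = new FileInputStream(pdfFile)) {
        doc = new Document(fis);
    }

    // Save as DOCX using the Flow recognition mode.
    DocSaveOptions saveOptions = new DocSaveOptions();
    saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
    saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
    try (FileOutputStream fos = new FileOutputStream(docxFile)) {
        doc.save(fos, saveOptions);
    }

    // Treat an empty output file as a failed conversion.
    if (docxFile.length() == 0) {
        throw new Exception("Conversion failed: output DOCX is empty");
    }
}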
The original PDF contains an image with text.
origin.pdf (4.4 MB)
But the converted DOCX contains only the image.
How can I keep the text from the PDF?