Hi Team,
I need to extract all the text from the attached PDF file. However, the data from the table is missing in the extracted output.
I have tried various examples from your GitHub repository, but unfortunately the result was the same. This is the code I am using:
public void extractTextBasedOnColumns() throws IOException {
    // String path = "PathToDir";
    // Instantiate a Document with the path of the input file
    Document pdfDocument = new Document("Do.pdf");
    // Create a TextFragmentAbsorber and collect every text fragment
    TextFragmentAbsorber tfa = new TextFragmentAbsorber();
    pdfDocument.getPages().accept(tfa);
    TextFragmentCollection tfc = tfa.getTextFragments();
    for (TextFragment tf : (Iterable<TextFragment>) tfc) {
        // Reduce the font size to 70% of its original value
        tf.getTextState().setFontSize(tf.getTextState().getFontSize() * 0.7f);
    }
    // Temporarily save the modified file, then reload it
    pdfDocument.save("TempOutput.pdf");
    pdfDocument = new Document("TempOutput.pdf");
    // Extract the text from the reloaded document
    TextAbsorber textAbsorber = new TextAbsorber();
    pdfDocument.getPages().accept(textAbsorber);
    String extractedText = textAbsorber.getText();
    // Write the extracted text to a file; try-with-resources closes the writer
    try (java.io.FileWriter writer = new java.io.FileWriter(new java.io.File("C:\\Newfolder\\Extracted_text.txt"))) {
        writer.write(extractedText);
    }
}
Thanks,
Shivam
Do.pdf (409.9 KB)