Hi, I am using the following code snippet to extract text:
Document pdfDocument = new Document("c:/documents/testdata/Entire_Proposal_1410329.pdf");
//Text Device
TextDevice textDevice = new TextDevice();
//Extraction Options
TextExtractionOptions extractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
textDevice.setExtractionOptions(extractionOptions);
textDevice.process(pdfDocument.getPages().get_Item(4), "c:/documents/testdata/Extracted.doc");
However, the above code does not extract all the text. In some cases, where the text uses the Calibri or Cambria font, the corresponding output in Extracted.doc is blank. It also does not extract the formulas in the PDF document. I have attached one of the formulas from the PDF document; the formula is not an image.
Can you please tell me what I am doing wrong?
I also tried the code below to change the font and save the result as a new PDF file, but the font does not change:
TextFragmentAbsorber absorber = new TextFragmentAbsorber(new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));
pdfDocument.getPages().get_Item(1).accept(absorber);
for (TextFragment textFragment : (Iterable<TextFragment>) absorber.getTextFragments())
{
    if (textFragment.getTextState().getFont().getFontName().equalsIgnoreCase("Cambria") ||
        textFragment.getTextState().getFont().getFontName().equalsIgnoreCase("Calibri") ||
        textFragment.getTextState().getFont().getFontName().equalsIgnoreCase("Calibri-Bold")) {
        textFragment.getTextState().setFontStyle(FontStyle.CourierBold);
        textFragment.getTextState().setFontSize(20);
    }
}
pdfDocument.save("c:/documents/testdata/fontschanged1.pdf");
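One thing I am unsure about is whether the exact name comparison above ever matches. Embedded subset fonts in a PDF usually carry a six-letter tag prefix (for example "ABCDEF+Calibri"), so equalsIgnoreCase("Calibri") could miss them. Below is a minimal sketch of a more tolerant comparison. FontNameMatcher and its methods are my own hypothetical helper, not part of Aspose.PDF, and I am assuming (not certain) that getFontName() can return names in this prefixed form.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not an Aspose.PDF class): decides whether a reported
// font name belongs to one of the font families I want to replace.
public class FontNameMatcher {

    // Families to match, in lower case. Assumption: only these two matter here.
    private static final Set<String> TARGET_FAMILIES =
            new HashSet<>(Arrays.asList("calibri", "cambria"));

    // Removes a PDF subset tag: six uppercase letters followed by '+'.
    static String stripSubsetTag(String fontName) {
        if (fontName.length() > 7 && fontName.charAt(6) == '+') {
            for (int i = 0; i < 6; i++) {
                char c = fontName.charAt(i);
                if (c < 'A' || c > 'Z') {
                    return fontName; // not a subset tag, leave the name alone
                }
            }
            return fontName.substring(7);
        }
        return fontName;
    }

    // Returns the lower-case family part, dropping a style suffix that
    // follows '-' or ',' (e.g. "Calibri-Bold" or "Calibri,Bold").
    static String family(String fontName) {
        String name = stripSubsetTag(fontName.trim());
        int cut = name.length();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '-' || c == ',') {
                cut = i;
                break;
            }
        }
        return name.substring(0, cut).toLowerCase(Locale.ROOT);
    }

    // True when the name (with or without subset tag and style suffix)
    // belongs to one of the target families.
    static boolean isTargetFont(String fontName) {
        return fontName != null && TARGET_FAMILIES.contains(family(fontName));
    }

    public static void main(String[] args) {
        System.out.println(isTargetFont("ABCDEF+Calibri-Bold"));
        System.out.println(isTargetFont("Cambria"));
        System.out.println(isTargetFont("Arial"));
    }
}
```

In the loop above, the three equalsIgnoreCase checks could then be replaced with a single call such as FontNameMatcher.isTargetFont(textFragment.getTextState().getFont().getFontName()).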
Regards,
Rajeev Mathur