Issue Extracting Text Using TextDevice

Hi, I am using the following code snippet to extract text:

Document pdfDocument = new Document("c:/documents/testdata/Entire_Proposal_1410329.pdf");
// Text device
TextDevice textDevice = new TextDevice();
// Extraction options
TextExtractionOptions extractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
textDevice.setExtractionOptions(extractionOptions);
// Extract the text of page 4 into a file
textDevice.process(pdfDocument.getPages().get_Item(4), "c:/documents/testdata/Extracted.doc");

However, the code above does not extract all of the text. Where the text uses the Calibri or Cambria font, it comes out blank in Extracted.doc. It also does not extract the formulas in the PDF document. I have attached one of the formulas from the PDF; the formula is text, not an image.
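To narrow down exactly which parts of the output come out blank, it can help to scan the extracted file for lines that contain nothing visible. The sketch below uses only the Java standard library; the class name `BlankLineReport` is hypothetical, and it assumes the file written by `TextDevice` is UTF-8 encoded, which may not match the encoding the device actually uses. It also treats U+FFFD replacement characters as invisible, since they can appear when bytes fail to decode.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class BlankLineReport {
    // Returns the 1-based numbers of lines that contain only whitespace
    // and/or U+FFFD replacement characters (i.e. nothing readable).
    static List<Integer> blankLines(String extractedText) {
        List<Integer> result = new ArrayList<>();
        // \R matches any line break (\n, \r\n, ...); -1 keeps trailing empty lines
        String[] lines = extractedText.split("\\R", -1);
        for (int i = 0; i < lines.length; i++) {
            boolean visible = false;
            for (int j = 0; j < lines[i].length(); j++) {
                char c = lines[i].charAt(j);
                if (!Character.isWhitespace(c) && c != '\uFFFD') {
                    visible = true;
                    break;
                }
            }
            if (!visible) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the extracted file is UTF-8; adjust the charset if needed
        byte[] bytes = Files.readAllBytes(Paths.get("c:/documents/testdata/Extracted.doc"));
        String text = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("Blank lines: " + blankLines(text));
    }
}
```

Comparing the reported line numbers against the same page in a PDF viewer shows whether the blanks line up with the Calibri/Cambria runs.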

Can you please tell me what I am doing wrong?

I also tried the code below to change the font and save the result as a new PDF file, but the font does not change.

TextFragmentAbsorber absorber = new TextFragmentAbsorber(new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));
// Collect all text fragments on page 1
pdfDocument.getPages().get_Item(1).accept(absorber);
// A typed cast is needed: iterating a raw Iterable yields Object, not TextFragment
for (TextFragment textFragment : (Iterable<TextFragment>) absorber.getTextFragments()) {
    String fontName = textFragment.getTextState().getFont().getFontName();
    if (fontName.equalsIgnoreCase("Cambria")
            || fontName.equalsIgnoreCase("Calibri")
            || fontName.equalsIgnoreCase("Calibri-Bold")) {
        textFragment.getTextState().setFontStyle(FontStyle.CourierBold);
        textFragment.getTextState().setFontSize(20);
    }
}
pdfDocument.save("c:/documents/testdata/fontschanged1.pdf");
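One thing worth checking in the loop above: embedded font subsets in a PDF are named with a tag of six uppercase letters and a plus sign (for example `ABCDEF+Calibri`), so an exact `equalsIgnoreCase("Calibri")` would not match them. Whether `getFontName()` returns the tagged name for this document is an assumption to verify. The sketch below is a hypothetical stdlib-only helper (`FontNameMatcher`, not part of any library) that strips such a tag before comparing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class FontNameMatcher {
    // Removes a PDF subset tag ("ABCDEF+") from the start of the name, if present.
    static String stripSubsetTag(String fontName) {
        if (fontName.length() > 7 && fontName.charAt(6) == '+'
                && fontName.substring(0, 6).chars().allMatch(c -> c >= 'A' && c <= 'Z')) {
            return fontName.substring(7);
        }
        return fontName;
    }

    // True if the untagged font name is in targetsLowerCase, ignoring case.
    static boolean matches(String fontName, Set<String> targetsLowerCase) {
        return targetsLowerCase.contains(stripSubsetTag(fontName).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> targets = new HashSet<>(Arrays.asList("cambria", "calibri", "calibri-bold"));
        System.out.println(matches("ABCDEF+Calibri", targets)); // true
        System.out.println(matches("Calibri-Bold", targets));   // true
        System.out.println(matches("Arial", targets));          // false
    }
}
```

In the loop, the three `equalsIgnoreCase` calls could then be replaced by a single `FontNameMatcher.matches(fontName, targets)` check.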

Regards,

Rajeev Mathur

Hi Rajeev,


Thanks for your inquiry. Please share your sample PDF document here so that we can test the scenario on our end and guide you accordingly.

We are sorry for the inconvenience caused.

Best Regards,