Free Support Forum - aspose.com

Not able to find text in PDF file

I am trying to find some text in a PDF, but the search does not find it.
The text I am searching for is “2c9f80f373a331b80173a338a0db0007_SIGNATURE”.
I can find “2c9f80f373a331b80173a338a0db0007_TITLE”, but when the text continues onto the next line, as “2c9f80f373a331b80173a338a0db0007_SIGNATURE” does, the code does not find it.
The “2c9f80f373a331b80173a338a0db0007_SIGNATURE” text is also hidden in my PDF file.
I have attached the code, the PDF, and the converted DOCX file for reference.
Test.zip (60.3 KB)

@asad.ali
Can you please help me with this?

@rabinintig

The text inside your PDF is split across two lines:

“2c9f80f373a331b80173a338a0db0007_SIGNATU
RE”

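Because of this line break, a literal search does not match. Please use the following code snippet, which searches with a regular expression that allows whitespace (including the line break) between "SIGNATU" and "RE":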

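import com.aspose.pdf.Document;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentCollection;
import com.aspose.pdf.TextSearchOptions;

Document pdfDocument = new Document(dataDir + "newtest.pdf"); // dataDir: folder containing the PDF
// "\\s*" lets the match span the line break between "SIGNATU" and "RE"
TextFragmentAbsorber absorber = new TextFragmentAbsorber("2c9f80f373a331b80173a338a0db0007_SIGNATU\\s*RE\\b");
// true = treat the search phrase as a regular expression
absorber.setTextSearchOptions(new TextSearchOptions(true));
pdfDocument.getPages().accept(absorber);
TextFragmentCollection textFragmentCollection = absorber.getTextFragments();
// Number of matches found; 0 means the text was not found
System.out.println(textFragmentCollection.size());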
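To see why the `\\s*` in the pattern matters, here is a minimal sketch using plain `java.util.regex` on a hard-coded string, not Aspose.PDF. It assumes the text extracted from the PDF contains whitespace (shown here as a newline) at the wrap point, which is what the pattern above implies. The class `WrapTolerantSearch` and its `buildPattern` helper are hypothetical names for illustration: the helper allows optional whitespace between every character, for cases where you do not know in advance where the text wraps. Whether Aspose.PDF accepts the generated `\Q...\E` syntax in `TextFragmentAbsorber` has not been verified.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch only: illustrates line-break-tolerant matching with java.util.regex,
// independent of Aspose.PDF. Names here are hypothetical.
public class WrapTolerantSearch {

    // Hypothetical helper: quote each character of the literal text and allow
    // optional whitespace (including line breaks) between consecutive characters.
    static Pattern buildPattern(String literal) {
        String regex = literal.codePoints()
                .mapToObj(cp -> Pattern.quote(new String(Character.toChars(cp))))
                .collect(Collectors.joining("\\s*"));
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        // Assumed extracted text: the token is split after "SIGNATU"
        String extracted = "2c9f80f373a331b80173a338a0db0007_SIGNATU\nRE";
        String target = "2c9f80f373a331b80173a338a0db0007_SIGNATURE";

        // A literal search fails because of the line break
        System.out.println(extracted.contains(target)); // false

        // The pattern from the reply tolerates whitespace at the known split point
        Pattern fixed = Pattern.compile("2c9f80f373a331b80173a338a0db0007_SIGNATU\\s*RE\\b");
        System.out.println(fixed.matcher(extracted).find()); // true

        // The generic pattern tolerates a break at any position
        System.out.println(buildPattern(target).matcher(extracted).find()); // true
    }
}
```

The per-character quoting keeps characters such as `.` or `_` literal, at the cost of a longer pattern; if the wrap position is always the same, the fixed pattern from the reply is simpler.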