Free Support Forum - aspose.com

Extract text from PDF file based on Matching Text

Hello team, we are using the Aspose DLL in our project.
I need to extract text from a PDF page based on matching text.
Example: if the matching text is “Random”, the TextFragmentAbsorber object should return fragments like “Randomised”.

Please help me to solve this issue.

Thanks,
Regards,
Vijaykumar L S

@vijiannabond

Thanks for contacting support.

You can use TextFragmentAbsorber with a suitable regular expression. For example, to extract all words containing ‘Random’, you may use the following code snippet:

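// Requires: using Aspose.Pdf; using Aspose.Pdf.Text;
// Match any whole word that contains "Random" (e.g. "Randomised")
var like = @"(?<TM>\w*Random\w*)";
var textFragmentAbsorber = new TextFragmentAbsorber(like);
// true = treat the search phrase as a regular expression
var textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// dataDir is the folder that holds the input PDF
Document pdfDocument = new Document(dataDir + "EmailAddress.pdf");
// Run the absorber over every page of the document
pdfDocument.Pages.Accept(textFragmentAbsorber);
var textFragmentCollection = textFragmentAbsorber.TextFragments;
foreach (TextFragment textFragment in textFragmentCollection)
{
    // process each match, e.g. read textFragment.Text
}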
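
The "word containing a term" idea can be tried out without a PDF, using only .NET's own Regex class. The sketch below is an illustration, not part of the Aspose API: the class name ContainsWordDemo and its two helpers are hypothetical, and it assumes the search term should be matched case-sensitively and treated literally (hence Regex.Escape).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not an Aspose type.
public static class ContainsWordDemo
{
    // Builds a pattern that matches any whole word containing `term`.
    // Regex.Escape keeps characters such as '.' or '+' in the term literal.
    public static string BuildPattern(string term)
    {
        return @"\w*" + Regex.Escape(term) + @"\w*";
    }

    // Returns every word in `text` that contains `term` (case-sensitive).
    public static List<string> FindWords(string text, string term)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, BuildPattern(term)))
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```

For the term "Random", BuildPattern returns \w*Random\w*, the kind of pattern that is passed to TextFragmentAbsorber together with TextSearchOptions(true). Checking the pattern against plain strings first makes it easier to tell a regex problem apart from a PDF text-extraction problem.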

In case you face any issue, please share your sample PDF document with us. We will test the scenario in our environment and address it accordingly.

Thanks for your reply. I achieved this.

I need one more bit of help.
If the search text spans two lines of the PDF page, I want to get the same text from the PDF page using TextFragmentAbsorber.

Thanks

@vijiannabond

Thanks for getting back to us.

To find a sentence or phrase on a PDF page, whether it sits on a single line or spans multiple lines, you can use TextFragmentAbsorber with the following regular expression:

var textFragmentAbsorber = new TextFragmentAbsorber(@"(?i)the\s*sentence\b");
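
As with the earlier snippet, a pattern like this presumably needs TextSearchOptions(true) set on the absorber so that it is treated as a regular expression. The pattern itself can be checked in isolation with .NET's Regex class. The sketch below is a hedged illustration: MultiLinePhraseDemo and its helpers are hypothetical names, and it only shows how the expression behaves on plain strings. How Aspose represents a line break inside extracted PDF text is not verified here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not an Aspose type.
public static class MultiLinePhraseDemo
{
    // Turns a phrase into a case-insensitive pattern that allows any run of
    // whitespace (spaces, tabs, line breaks, or none) between its words.
    public static string BuildPattern(string phrase)
    {
        string[] words = phrase.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < words.Length; i++)
        {
            // Escape each word so regex metacharacters are matched literally.
            words[i] = Regex.Escape(words[i]);
        }
        // The trailing \b stops the match from ending in the middle of a word.
        return "(?i)" + string.Join(@"\s*", words) + @"\b";
    }

    // True when `text` contains `phrase`, even if a line break splits it.
    public static bool Matches(string text, string phrase)
    {
        return Regex.IsMatch(text, BuildPattern(phrase));
    }
}
```

For the phrase "the sentence", BuildPattern produces (?i)the\s*sentence\b, the same expression as in the reply. Because \s* also accepts zero characters, the pattern would match "thesentence" as well; using \s+ instead is a stricter alternative if that is not wanted.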