Free Support Forum - aspose.com

How to search multiple keywords in a PDF

Is there a way to search for multiple keywords in a PDF or other document?
For example:
Paris AND Jackson AND Nepal AND (Trophy OR Award)
The query above should match a document that contains Paris, Jackson, and Nepal, plus at least one of Trophy or Award. The search should also be case-insensitive.

Thanks

@Jackson94

Yes, you can search for multiple keywords in a PDF using Aspose.PDF. Please read the following article about searching text in a PDF:
Search and Get Text from Pages of PDF

I visited the link but was unable to find a matching scenario.
Is there a specific example you could share that achieves this outcome?

@Jackson94

You can use a regular expression to find text that spans multiple lines. In Aspose.PDF, the expression “\s*” matches the spaces and line breaks between words. Please check the following code snippet, which extracts a particular phrase from the PDF:

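using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that contains sample.pdf
Document pdfDocument = new Document(dataDir + "sample.pdf");
foreach (Page page in pdfDocument.Pages)
{
    // \s* matches any spaces or line breaks between words; \. matches a literal period
    var textFragmentAbsorber = new TextFragmentAbsorber(@"just\s*for\s*use\s*in\s*the\s*Virtual\s*Mechanics\s*tutorials\.\s*More\s*text\.\s*And\s*more\s*text\b");
    // true tells the absorber to treat the search string as a regular expression
    var textSearchOptions = new TextSearchOptions(true);
    textFragmentAbsorber.TextSearchOptions = textSearchOptions;
    page.Accept(textFragmentAbsorber);
    var textFragmentCollection = textFragmentAbsorber.TextFragments;
    // Perform other stuff
}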
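
To illustrate how “\s*” bridges line breaks, here is a minimal sketch that uses only the standard .NET Regex class on a plain string, with no Aspose.PDF calls. The helper name BuildPhrasePattern and the sample text are assumptions made up for this illustration; the string stands in for text extracted from a PDF.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespaceDemo
{
    // Hypothetical helper (not part of Aspose.PDF): escapes each word and
    // joins the words with \s* so any spaces or line breaks between them match.
    public static string BuildPhrasePattern(string phrase)
    {
        string[] words = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(@"\s*", words.Select(w => Regex.Escape(w)));
    }

    public static void Main()
    {
        string pattern = BuildPhrasePattern("More text. And more text");

        // Stand-in for extracted PDF text where the phrase wraps across lines.
        string extracted = "More text.\nAnd more\r\ntext";

        Console.WriteLine(pattern);                           // More\s*text\.\s*And\s*more\s*text
        Console.WriteLine(Regex.IsMatch(extracted, pattern)); // True
    }
}
```

Note that “\s*” also matches zero whitespace characters, so the same pattern would accept the words run together; “\s+” is the stricter choice if at least one space or line break is required.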

Multi-line text implies a complete sentence whose words are separated by \s*.

What if I want to search for individual words instead, for example “text” and “sample”, and flag the document only when both of them appear in it?
The two words may appear anywhere in the same document.
Can that be achieved using \s*?

Please confirm.

@Jackson94

You can use the TextFragmentAbsorber constructor (Regex) with a regular expression that fits your requirement. This lets you find multiple keywords in the PDF. The following code example shows how to use it.

var regex = @"... regex for multiple keywords ";

Document pdfDocument = new Document(dataDir + "input.pdf");

// Pages are 1-based, so this searches only the first page
Page page = pdfDocument.Pages[1];
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(regex);
// true tells the absorber to treat the search string as a regular expression
var textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
page.Accept(textFragmentAbsorber);
var textFragmentCollection = textFragmentAbsorber.TextFragments;

foreach (var textFragment in textFragmentCollection)
{
    Console.WriteLine(textFragment.Text);
}
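
A single regex can report which keywords occur, but the AND/OR combination from the original question still has to be checked in code. The following is a minimal sketch of that check under stated assumptions: KeywordQuery, Pattern, and IsSatisfied are hypothetical names (not Aspose.PDF API), the alternation pattern is just one possible choice, and a plain string stands in for the PDF so the example runs with only the standard .NET libraries. In practice, the matched texts would likely be the textFragment.Text values collected from every page of the document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.PDF.
public static class KeywordQuery
{
    // One alternation covering every keyword; (?i) makes it case-insensitive.
    public const string Pattern = @"(?i)\b(Paris|Jackson|Nepal|Trophy|Award)\b";

    // Evaluates: Paris AND Jackson AND Nepal AND (Trophy OR Award)
    public static bool IsSatisfied(IEnumerable<string> matchedTexts)
    {
        // Case-insensitive set of the distinct keywords that were found.
        var found = new HashSet<string>(
            matchedTexts.Select(t => t.Trim()),
            StringComparer.OrdinalIgnoreCase);

        return found.Contains("Paris")
            && found.Contains("Jackson")
            && found.Contains("Nepal")
            && (found.Contains("Trophy") || found.Contains("Award"));
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in for the text of a whole document.
        string text = "JACKSON flew from paris to Nepal\nto collect the award.";

        IEnumerable<string> matches = Regex.Matches(text, KeywordQuery.Pattern)
            .Cast<Match>()
            .Select(m => m.Value);

        Console.WriteLine(KeywordQuery.IsSatisfied(matches)); // True
    }
}
```

Because the query applies to the document as a whole, the matches should be gathered across all pages before IsSatisfied is called; checking one page at a time would miss keywords that sit on different pages.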

If you still face problems, please attach your input PDF and the expected output here for our reference. We will then provide more information about your query.