How to find any searched string using the TextFragmentAbsorber class

Hi Support,
I am using the Aspose.PDF.Text.TextFragmentAbsorber class to search for text in a document. In my scenario, the client needs to find text enclosed in parentheses (), such as (Mipo Office), or text with a period (.) at the end of a word, such as Wong C. F.
Currently I am using the following statement to find the fragment collection:

TextFragmentAbsorber textFragment = new TextFragmentAbsorber(@"\b" + phrase + @"\b");

But this statement does not fulfill the scenario explained above.
I have also tried the following statement:

TextFragmentAbsorber textFragment = new TextFragmentAbsorber(@"\b." + phrase + @"\b.");

But this does not cover the scenario either.
Could you please guide me on this?
Regards,

@Wahaj_Khan

Could you please share your sample PDF document with us? We will test the scenario in our environment and address it accordingly.

Hi @asad.ali
Please find the attached document with the scenarios:
TextFragmentIssue.pdf (37.9 KB)

@asad.ali
Please note that I have also included a word with a line break in the attached document. Does the TextFragmentAbsorber class support searching for words that contain line breaks?

@Wahaj_Khan

We could not find these words in the PDF document you shared. Could you please share the relevant PDF document with us?

Also, to handle line breaks, you can use the ‘\s*’ expression in the search phrase to find the complete word or sentence. If you are not sure where the line break will occur, you can place it between every pair of words, as follows:

TextFragmentAbsorber tfa = new TextFragmentAbsorber(@"this\s*is\s*sentence\s*with\s*the\s*line\s*break", new TextSearchOptions(true));
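
For an arbitrary phrase, the same kind of pattern can be generated instead of typed by hand. The following is only a minimal sketch, not part of the Aspose API: the class and method names (SearchPatterns, ToWhitespaceTolerantPattern) are hypothetical, and it assumes the absorber interprets the pattern with .NET regular-expression syntax when TextSearchOptions(true) is passed. It uses only System.Text.RegularExpressions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchPatterns
{
    // Hypothetical helper (not an Aspose method): splits the phrase on whitespace,
    // escapes regex metacharacters in each word, and joins the words with \s*
    // so that a line break or any other whitespace between words still matches.
    public static string ToWhitespaceTolerantPattern(string phrase)
    {
        string[] words = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(@"\s*", words.Select(w => Regex.Escape(w)));
    }
}
```

For example, ToWhitespaceTolerantPattern("line break") returns line\s*break, which matches the two words whether they are separated by a space, a newline, or nothing at all. The returned string would then be passed as the first argument of the TextFragmentAbsorber constructor shown above.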

Hi @asad.ali
The shared PDF is a test PDF containing sample text for all the scenarios, such as (Sign Here), SignHere., and Test.
Regards

@Wahaj_Khan

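We used the following code snippet to check what text the API extracts from your PDF:

// Load the shared PDF; dataDir is the folder that contains it.
Document doc = new Document(dataDir + "TextFragmentIssue.pdf");
// TextSearchOptions(true) turns on regular-expression matching.
// Note: the unescaped parentheses form a regex group; they do not match literal ( and ) characters.
TextFragmentAbsorber absorber = new TextFragmentAbsorber(@"\b(Sign\s*Here)\b", new TextSearchOptions(true));
doc.Pages.Accept(absorber);
if (absorber.TextFragments.Count > 0)
{
    // Print every fragment that matched the pattern.
    foreach (var tf in absorber.TextFragments)
    {
        Console.WriteLine(tf.Text);
    }
}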

The code snippet above extracted 4 instances of “Sign Here” from your PDF document. However, could you please tell us your expected output? Do you want to extract all instances of “Sign Here” or “Test” from the PDF document in a single search operation, or do you want to extract only one instance of either of them?

Hi @asad.ali
The shared PDF has different text scenarios, such as words in parentheses, words with a period (.) in different places, and words with line breaks.
My expected output is to find the exact word being searched for, whether it occurs once or multiple times. For example, if I search for SignHere., it should extract only the instances of that exact word with the period (.) at the end.
Regards,

@Wahaj_Khan

You can get the desired results by using the correct regular expressions. Please check the following code snippet, which extracts one instance of “SignHere.” from the PDF:

Document doc = new Document(dataDir + "TextFragmentIssue.pdf");
TextFragmentAbsorber absorber = new TextFragmentAbsorber(@"SignHere[.]", new TextSearchOptions(true));
doc.Pages.Accept(absorber);
if(absorber.TextFragments.Count > 0)
{
 foreach(var tf in absorber.TextFragments)
 {
  Console.WriteLine(tf.Text);
 }
}

Hi @asad.ali
Thanks for providing the solution. I am confused about one point: can the expression SignHere[.] also be used to find words in parentheses, for example (SignHere)?
Regards

@Wahaj_Khan

The expression to find (SignHere) would be a different one, as follows:

[(]SignHere[)]

@asad.ali
Is there a generic expression to find whatever is searched for in the string, such as words in parentheses or words with a period (.) at the end, as in the cases discussed above?
Regards,

@Wahaj_Khan

The following regular expression will find all instances of “Sign Here” with parentheses around it or with a period (.) at the end:

([(]Sign\s*Here[)])|SignHere[.]
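
The alternation above is written for one specific phrase. For a pattern built from whatever the user types, one option is to escape the phrase and replace \b with lookarounds. The original \b + phrase + \b attempt fails for two reasons: \b only matches between a word character and a non-word character, so it cannot match after a trailing period or closing parenthesis that is followed by a space, and the unescaped ( ) and . are read as regex syntax rather than as literal characters. The sketch below is an illustration under stated assumptions, not an Aspose-provided method: the names ExactPhrase and BuildExactPhrasePattern are hypothetical, and it assumes (not verified here) that TextFragmentAbsorber accepts .NET lookaround syntax in regex mode.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExactPhrase
{
    // Hypothetical helper (not an Aspose method): builds a regex that matches the
    // phrase literally, tolerates any whitespace between its words, and requires
    // that the match is not directly preceded or followed by a word character.
    public static string BuildExactPhrasePattern(string phrase)
    {
        string[] words = phrase.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        // Regex.Escape turns characters such as ( ) and . into literals.
        string body = string.Join(@"\s*", words.Select(w => Regex.Escape(w)));
        // (?<!\w) and (?!\w) stand in for \b; unlike \b, they also work when the
        // phrase itself starts or ends with a non-word character.
        return @"(?<!\w)" + body + @"(?!\w)";
    }
}
```

With this helper, BuildExactPhrasePattern("SignHere.") returns (?<!\w)SignHere\.(?!\w), which requires the trailing period, and BuildExactPhrasePattern("(Mipo Office)") returns (?<!\w)\(Mipo\s*Office\)(?!\w). The result would be passed to the absorber the same way as in the earlier snippets, together with new TextSearchOptions(true).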