Problem using a regular expression to highlight text

I am trying to highlight the search text in a PDF file. I have had no trouble finding simple sentences, but I run into a problem when I use the regular expression operator * at the end of a word.

For example, if I search for "tr*" in:

"I am trying to highlight the search text in a PDF file. I have not HAD trouble"

I expect the result to be "trying" and "trouble", but in practice all the words containing "t" are highlighted as well, such as "highlight", "text", "not", and so on.

How is the "*" operator used in regular expressions with TextFragmentAbsorber?
How can I specify that the exact pattern before the "*" must be matched, as in my example "tr*"?

My code is:

Document docPdf = new Document(new MemoryStream(binaryFile));
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("(?i)tr*");
//set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
//accept the absorber for all the pages
docPdf.Pages.Accept(textFragmentAbsorber);
//get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
//loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    //highlight background text
    textFragment.TextState.BackgroundColor = System.Drawing.Color.Yellow;
}
docPdf.Save(stream);


I hope my question is clear. I look forward to your answer!
Thanks

Hi Silvana,


Thanks for contacting support.

Please share the source PDF file so that we can investigate the scenario further in our environment. We are sorry for the inconvenience.

Hi Silvana,

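Thanks for your inquiry. Please check the following code snippet, which uses a regular expression to search for text that starts with “tr” and continues through any following non-whitespace characters. In a regular expression, “*” means “zero or more of the preceding element”, so “tr*” matches a “t” followed by any number of “r” characters, including none. It should help you accomplish the task.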

Document docPdf = new Document();
MemoryStream ms = new MemoryStream();
Page page = docPdf.Pages.Add();
TextFragment text = new TextFragment("I am trying to highlight the search text in a PDF file. I have not HAD trouble");
page.Paragraphs.Add(text);
docPdf.Save(ms);
docPdf = new Document(ms);

TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"(?i)tr\S*");

//set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;

//accept the absorber for all the pages
docPdf.Pages.Accept(textFragmentAbsorber);

//get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

//loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    //highlight background text
    textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.Yellow;
}

docPdf.Save(myDir + "regularexpression.pdf");

Please feel free to contact us for any further assistance.

Best Regards,

Sorry, I did not mention that I wanted the pattern to match only at the beginning of a word. I solved it using:


… new TextFragmentAbsorber(@"(?i)\btr\w+\b") …

Thank you very much for your help!
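
For anyone comparing the three patterns from this thread, the following is a minimal sketch that runs them against the sample sentence with the standard .NET Regex class, without Aspose.Pdf. It assumes that TextFragmentAbsorber in regular expression mode follows the same .NET regex semantics, which this thread suggests but does not confirm. The names RegexDemo and FindMatches are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Hypothetical helper: returns every match of 'pattern' in 'input', in order.
    public static List<string> FindMatches(string pattern, string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        string sample = "I am trying to highlight the search text in a PDF file. I have not HAD trouble";

        // "tr*"       : "t" followed by zero or more "r", so every single "t" matches.
        // "tr\S*"     : "tr" followed by any run of non-whitespace characters.
        // "\btr\w+\b" : a whole word that begins with "tr".
        string[] patterns = { @"(?i)tr*", @"(?i)tr\S*", @"(?i)\btr\w+\b" };

        foreach (string pattern in patterns)
        {
            List<string> matches = FindMatches(pattern, sample);
            Console.WriteLine(pattern + " -> " + matches.Count + " match(es): " + string.Join(", ", matches));
        }

        // "tr\S*" is not anchored to a word start, so it also matches inside "string";
        // the "\b" version does not.
        Console.WriteLine(string.Join(", ", FindMatches(@"(?i)tr\S*", "a string")));
        Console.WriteLine(FindMatches(@"(?i)\btr\w+\b", "a string").Count);
    }
}
```

On the sample sentence, the first pattern matches eight times (every "t", plus "tr" in "trying" and "trouble"), while the other two match only "trying" and "trouble". The word boundary \b is what separates the last two patterns: without it, "tr" in the middle of a word also counts as a match.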

Hi Silvana,


Thanks for your feedback. It is good to know that you have managed to resolve the issue.

Please keep using Aspose.Pdf and feel free to raise any questions or concerns; we will be glad to help.

Best Regards,