Search / highlight text from scanned document

Hello,


I’m using the free trial version of Aspose.Pdf.
I’m trying to find and highlight words in a PDF document.
Everything works, but when I use it on a scanned document, the words are highlighted but the text is not visible; it is hidden under the yellow highlighted area…

I have attached the PDF file.

Another question: when I search for the word “win”, it also highlights the ends of words such as “growing”, and it doesn’t match “Win” with an uppercase letter. Is this normal?

Thank you.

nicolas.allerhand: I’m trying to find and highlight words in a pdf document.

I have tested basic search and highlighting, and I am not able to reproduce a problem with it.

nicolas.allerhand: All is OK, but when I use it on scanned documents, the words are highlighted but the text is not visible; it is hidden under the yellow highlighted area…

Hi Nicolas,

Thanks for contacting support.

I have tested the scenario and I am able to reproduce the same problem. So that it can be fixed, I have logged it in our issue-tracking system as PDFNEWNET-39278. We will investigate this issue in detail and will keep you updated on its status.

We apologize for the inconvenience.

nicolas.allerhand: Another question: when I search for the word “win”, it also highlights the ends of words such as “growing”, and it doesn’t match “Win” with an uppercase letter. Is this normal?

To match text in both uppercase and lowercase, please try using a regular expression. Depending on the version of your library, the method or class names may differ.

TextFragmentAbsorber absorber = new TextFragmentAbsorber("(?i)Win", new TextSearchOptions(true));

The (?i) inline option makes the match case-insensitive, so this should find both “win” and “Win”, in scanned as well as text PDFs.
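
For the other half of the question (matches inside words such as “growing”), one option is to add word boundaries (\b) to the pattern. Below is a minimal sketch using only the standard .NET Regex class. WholeWordDemo and FindWholeWord are hypothetical names for illustration and are not part of Aspose.Pdf. It assumes the pattern given to TextFragmentAbsorber behaves like a .NET regular expression, which this thread does not confirm.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // Hypothetical helper (not an Aspose.Pdf API): returns every
    // case-insensitive, whole-word occurrence of `word` in `text`.
    public static string[] FindWholeWord(string text, string word)
    {
        // (?i) turns on case-insensitive matching; \b anchors the match
        // at word boundaries so "win" inside "growing" is skipped.
        string pattern = @"(?i)\b" + Regex.Escape(word) + @"\b";
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string sample = "Win or lose, a growing team can win.";
        Console.WriteLine(string.Join(", ", FindWholeWord(sample, "win")));
        // prints: Win, win
    }
}
```

If that assumption holds, the same pattern string, "(?i)\\bwin\\b", could be passed to TextFragmentAbsorber together with new TextSearchOptions(true), as in the line above.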

Hi Nayyer,


Thanks for the reply,

I’ll test the regular expression this morning.

By the way, instead of highlighting the text, is it possible to draw a frame around it?

Thank you.

Hi Nicolas,

Thanks for your inquiry. Yes, you can use annotations for highlighting your desired text. Please use the Highlight annotation for this; the following code snippet shows how and will hopefully help you accomplish the task.

Document document = new Document(myDir + "20072015045240._1.pdf");

TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("(?i)Win");

//set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;

document.Pages.Accept(textFragmentAbsorber);

TextFragmentCollection textFragmentCollection1 = textFragmentAbsorber.TextFragments;

foreach (TextFragment textFragment in textFragmentCollection1)
{
    // Build a highlight annotation covering the matched fragment: the lower-left
    // corner is the fragment position, extended by the fragment's width and height.
    Aspose.Pdf.InteractiveFeatures.Annotations.HighlightAnnotation highlight =
        new Aspose.Pdf.InteractiveFeatures.Annotations.HighlightAnnotation(
            textFragment.Page,
            new Aspose.Pdf.Rectangle(
                (float)textFragment.Position.XIndent,
                (float)textFragment.Position.YIndent,
                (float)textFragment.Position.XIndent + (float)textFragment.Rectangle.Width,
                (float)textFragment.Position.YIndent + (float)textFragment.Rectangle.Height
            )
        );
    // Semi-transparent yellow, so the text underneath should remain visible
    highlight.Opacity = 0.5;
    highlight.Color = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow);
    textFragment.Page.Annotations.Add(highlight);
}

document.Save(myDir + "texthighlight_output.pdf");

Please feel free to contact us for any further assistance.

Best Regards,

Thanks a lot, this resolved all my problems.


Have a good day

Hi Nicolas,


Thanks for your feedback. It is good to know that you have managed to accomplish your requirements.

Please keep using Aspose and feel free to contact us for any further assistance. We will be more than happy to extend our support.

Best Regards,