How to annotate a letter or word within a particular paragraph in Aspose.PDF

manipriya · March 23, 2022, 12:27pm

I have two paragraphs in a PDF.

For example:

The PDF standard describes various tools that allow you to create an additional content on the page. In this topic we will talk about markup annotation tools. Markup annotations points in specific way to some places on the page and contains additional text info.

Most types of the markup annotations support two states: open and closed. Closed state means that annotation will appear on the page in some distinctive form such as an icon, box, stamp, etc. When the user activates the annotation by clicking it, it exhibits its associated object (annotation goes to the open state).

Let's take a word:

and

It is present in both paragraphs, but I would like to annotate it only in the 1st paragraph, by giving the 1st paragraph as input.

Please suggest.

asad.ali · March 23, 2022, 7:17pm

@manipriya

Could you please share a sample input PDF along with the expected output PDF? We will try to create a code sample that produces results similar to the expected output file and share it with you.

manipriya · March 24, 2022, 10:36am

In the PDF below I have highlighted “is” in the first paragraph.
The word “is” occurs throughout the document, but it should be highlighted only in the 1st paragraph.

The inputs are the paragraph and the text to mark up,
for example: markupPdf(paragraph, actualText)
1st input, paragraph:
Adobe® Portable Document Format (PDF) is a universal file format that preserves all
of the fonts, formatting, colours and graphics of any source document, regardless of
the application and platform used to create it.
2nd input, actualText:
is

If actualText (“is”) occurs in multiple places in the paragraph, is there something like a sentence input that pins down the exact match? That is, the inputs would be three: paragraph, sentence, and “is”.

MarkupPdf.pdf (11.3 KB)

Right now I'm using the following code, which takes the input text and searches for it in the whole document:

var textFragmentCollection = TextFragmentCollection(parameters.InputText, pdfDocument);
if (textFragmentCollection.Count == 0)
{
    return false;
}

foreach (var textFragment in textFragmentCollection)
{
    var freeText = new HighlightAnnotation(textFragment.Page, textFragment.Rectangle)
    {
        Color = Color.FromRgb(System.Drawing.Color.Yellow),
        Name = Properties.Resources.AnnotationName,
        Title = Properties.Resources.AnnotationTitle,
        Contents = parameters.InputText.Replace(Properties.Resources.Search, " "),
    };
    textFragment.Page.Annotations.Add(freeText);
}



private static TextFragmentCollection TextFragmentCollection(string inputText, Document document)
{
    var text = inputText.Replace(" ", Properties.Resources.Search);
    ParagraphAbsorber absorber = new ParagraphAbsorber();
    absorber.Visit(document);
    foreach (PageMarkup markup in absorber.PageMarkups)
    {
        TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(text, new TextSearchOptions(true));
        document.Pages.Accept(textFragmentAbsorber);

        //var textFragmentAbsorber = new TextFragmentAbsorber(text, new TextSearchOptions(true));
        //foreach (var page in document.Pages)
        //{
        //    textFragmentAbsorber.Visit(page);
        //}
        //document.Pages.Accept(textFragmentAbsorber);
        var count = textFragmentAbsorber.TextFragments.Count;

        if (count > 0)
        {
            return textFragmentAbsorber.TextFragments;
        }
    }

    var textFragmentAbsorberWithoutOptions = new TextFragmentAbsorber(inputText, new TextSearchOptions(false));
    foreach (var page in document.Pages)
    {
        textFragmentAbsorberWithoutOptions.Visit(page);
    }
    //document.Pages.Accept(textFragmentAbsorberWithoutOptions);
    return textFragmentAbsorberWithoutOptions.TextFragments;
}

asad.ali · March 24, 2022, 8:37pm

@manipriya

Please try using the below code example in order to achieve your requirement and feel free to let us know in case you face any issues:

using System.Collections.Generic;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

Document pdfDocument = new Document(dataDir + "MarkupPdf.pdf");
// Instantiate ParagraphAbsorber and collect the page markup
ParagraphAbsorber absorber = new ParagraphAbsorber();
absorber.Visit(pdfDocument);
int paragraphNumber = 2; // index compared against the counter i below
string word = "is";      // text to highlight

foreach (PageMarkup markup in absorber.PageMarkups)
{
    int i = 1; // section counter, restarts on every page
    foreach (MarkupSection section in markup.Sections)
    {
        int j = 1; // paragraph counter within the section (not used in the check below)

        foreach (MarkupParagraph paragraph in section.Paragraphs)
        {
            if (i == paragraphNumber)
            {
                foreach (List<TextFragment> line in paragraph.Lines)
                {
                    // Matches only a fragment whose entire text equals the word
                    TextFragment foundText = line.FirstOrDefault(x => x.Text.Equals(word));
                    if (foundText != null)
                    {
                        var highlight = new HighlightAnnotation(foundText.Page, foundText.Rectangle)
                        {
                            Color = Color.FromRgb(System.Drawing.Color.Yellow),
                            Name = Properties.Resources.AnnotationName,
                            Title = Properties.Resources.AnnotationTitle,
                            Contents = parameters.InputText.Replace(Properties.Resources.Search, " "),
                        };
                        foundText.Page.Annotations.Add(highlight);
                    }
                }
            }
            j++;
        }
        i++;
    }
}

manipriya · March 25, 2022, 8:32am

Here the paragraph is an int (a paragraph number).
Is it possible to pass the paragraph as text instead of a paragraph number?
And if the document has a number of pages, how can we get the paragraph number?

Please suggest.

manipriya · March 25, 2022, 10:08am

This may work when a line is given as input instead of a word.

The code above compares each line with the word, so the word will not be found.

Is there a way to search in terms of words instead of lines (sentences)?

The word “is” is not found with the code provided.

Thanks for the solution, but please check once again.

asad.ali · March 25, 2022, 4:41pm

@manipriya

foreach (MarkupParagraph paragraph in section.Paragraphs)
{
 string ptext = paragraph.Text;
}

The above code returns the text of each paragraph, which can be used to identify the paragraph in which a word needs to be highlighted.
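
Comparing paragraph.Text with the input paragraph by plain string equality may be brittle: the extracted text can carry line breaks where the lines wrap, while the input string may use single spaces (or different breaks). A minimal sketch of a whitespace-insensitive comparison is shown here. ParagraphMatcher, Normalize and IsSameParagraph are hypothetical helper names for illustration and are not part of Aspose.PDF.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.PDF type): compares two paragraph
// strings while ignoring differences in whitespace and line breaks.
public static class ParagraphMatcher
{
    // Collapse every run of whitespace (spaces, tabs, line breaks)
    // into a single space and trim both ends.
    public static string Normalize(string text) =>
        Regex.Replace(text ?? string.Empty, @"\s+", " ").Trim();

    // True when both strings contain the same words in the same order.
    public static bool IsSameParagraph(string extracted, string expected) =>
        string.Equals(Normalize(extracted), Normalize(expected), StringComparison.Ordinal);
}
```

Inside the loop above, the check could then read `if (ParagraphMatcher.IsSameParagraph(paragraph.Text, inputParagraph))`, where inputParagraph stands for the paragraph string supplied by the caller.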

Once the paragraph is identified, you can further search for the phrase within each of its fragments using TextFragmentAbsorber, like below:

foreach (TextFragment fragment in line)
{
    // Restrict the search to the rectangle occupied by this fragment
    TextFragmentAbsorber tabsorber = new TextFragmentAbsorber(word);
    tabsorber.TextSearchOptions.LimitToPageBounds = true;
    tabsorber.TextSearchOptions.Rectangle = fragment.Rectangle;
    fragment.Page.Accept(tabsorber);
    // First occurrence of the word inside the fragment, or null if none
    var textf = tabsorber.TextFragments.FirstOrDefault();
}
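
On the exact-match question raised earlier in the thread: a plain search for "is" is likely to also match the letters inside words such as "this" or "distinctive". One way to restrict matches to whole words is a word-boundary regular expression. The sketch below shows the idea on a plain string using .NET's Regex. WholeWordSearch and FindWholeWord are hypothetical names. Whether the same \b pattern behaves identically when passed to TextFragmentAbsorber together with new TextSearchOptions(true) is an assumption that should be checked against the Aspose.PDF documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.PDF type): locates whole-word matches.
public static class WholeWordSearch
{
    // Returns the start index of every whole-word occurrence of `word` in `text`.
    // Assumes `word` begins and ends with a letter, digit, or underscore.
    public static List<int> FindWholeWord(string text, string word)
    {
        // Regex.Escape keeps characters such as '.' or '(' literal;
        // \b matches only between a word character and a non-word character.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        var positions = new List<int>();
        foreach (Match match in Regex.Matches(text, pattern))
        {
            positions.Add(match.Index);
        }
        return positions;
    }
}
```

For example, `WholeWordSearch.FindWholeWord("This is his list, is it?", "is")` skips the "is" inside "This", "his" and "list" and reports only the two standalone occurrences.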