Please help me find text in a PDF file and get its page number and line number

Please help me find text in a PDF file and get its page number and line number.

@Amisha43

To find text in a PDF file and retrieve its page number and line number using Aspose.PDF for .NET, you can utilize the TextFragmentAbsorber class. This class allows you to search for specific text across all pages of a PDF document. Below is a step-by-step guide along with a code example to help you achieve this.

Steps to Find Text and Get Page and Line Numbers

  1. Set Up Your Project: Create a new C# project and add a reference to the Aspose.PDF for .NET library.

  2. Load the PDF Document: Use the Document class to load your PDF file.

  3. Create a TextFragmentAbsorber: Instantiate a TextFragmentAbsorber object with the text you want to search for.

  4. Accept the Absorber: Call the Accept method on the Pages collection of the document to search through all pages.

  5. Retrieve Text Fragments: Access the TextFragments property of the absorber to get the collection of matching text fragments.

  6. Loop Through the Fragments: Iterate through the fragments to read their properties, including the page number and the vertical position (YIndent), from which a line number can be estimated.

Code Example

Here’s a sample code snippet demonstrating how to find text and get its page number and vertical position on the page:

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace Aspose.Pdf.Examples
{
    public class SearchTextInPdf
    {
        public static void Run()
        {
            // Load the PDF document
            Document pdfDocument = new Document("path/to/your/document.pdf");

            // Create a TextFragmentAbsorber to find instances of the input search phrase
            TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("your search text");

            // Accept the absorber for all pages
            pdfDocument.Pages.Accept(textFragmentAbsorber);

            // Get the extracted text fragments
            TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

            // Loop through the fragments
            foreach (TextFragment textFragment in textFragmentCollection)
            {
                // Get the page number
                int pageNumber = textFragment.Page.Number;

                // Get the vertical position of the match; the fragment exposes a Y coordinate
                // rather than a line number, so the line must be derived from it
                double yIndent = textFragment.Position.YIndent;

                Console.WriteLine($"Text: {textFragment.Text}");
                Console.WriteLine($"Page Number: {pageNumber}");
                Console.WriteLine($"Line Y-Position: {yIndent}");
            }
        }
    }
}

Explanation of the Code

  • The TextFragmentAbsorber is initialized with the text you want to search for.
  • The Accept method is called on the Pages collection to search through all pages.
  • The TextFragments property provides access to the matching text fragments.
  • For each fragment, you can retrieve the page number and the Y position, which can help you determine the line number based on your layout.

This approach allows you to effectively search for text in a PDF and obtain relevant information about its location.
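
Since the snippet above reports a Y position rather than a line number, one possible way to turn that position into a line number is sketched below. It is only a sketch under stated assumptions: LineNumberEstimator and EstimateLineNumber are hypothetical names, not part of Aspose.PDF, and the helper assumes you have already collected the Y positions of all text on the same page (for example, the YIndent values of the fragments returned by a TextFragmentAbsorber created without a search phrase). It also assumes a simple single-column layout where lines are separated by more than the chosen tolerance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not an Aspose.PDF API.
public static class LineNumberEstimator
{
    // Returns the 1-based line number (counted from the top of the page) of the
    // text at targetY, or -1 if no line lies within the tolerance.
    // yPositionsOnPage: Y coordinates of all text on ONE page.
    // tolerance: Y values closer than this to a line's top-most Y are treated as the same line.
    public static int EstimateLineNumber(
        IEnumerable<double> yPositionsOnPage, double targetY, double tolerance = 2.0)
    {
        var lineBaselines = new List<double>();

        // PDF coordinates grow upward, so the largest Y is the top-most line.
        foreach (double y in yPositionsOnPage.OrderByDescending(v => v))
        {
            // Start a new line when the gap to the current line exceeds the tolerance.
            if (lineBaselines.Count == 0 ||
                lineBaselines[lineBaselines.Count - 1] - y > tolerance)
            {
                lineBaselines.Add(y);
            }
        }

        // The first line whose Y is within the tolerance of the target is the answer.
        for (int i = 0; i < lineBaselines.Count; i++)
        {
            if (Math.Abs(lineBaselines[i] - targetY) <= tolerance)
            {
                return i + 1;
            }
        }
        return -1;
    }
}
```

For example, calling LineNumberEstimator.EstimateLineNumber(allYOnPage, textFragment.Position.YIndent) inside the loop would give an estimated line number for each match. Multi-column pages, tables, or text with varying font sizes may need a different tolerance or a different grouping rule.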

Feel free to ask if you have any further questions!

Please share a PDF file on which this code works.

I have a file with the .pdf extension, but it shows a Chrome icon next to it. Will that file work?

textFragmentAbsorber.TextFragments comes back empty.

I am able to do it by passing a regex, but it returns only the first match.

@Amisha43
Please share the code and the document from which you’re trying to extract text fragments.

The same as you shared.

@Amisha43
absorber_test.pdf (35.4 KB)

Note that if you are using the product without a license, TextFragmentAbsorber will process only the first 4 pages,
so in the attached PDF you will get only two fragments in the unlicensed scenario.
The keyword I tested was “veli@dynamosoftware.com”.

The line number is not coming out right.

Please help me get the right line number.

@Amisha43
Could you describe for what purpose you need the line number?