Please help me find text in a PDF file and get its page number and line number.
To find text in a PDF file and retrieve its page number and line number using Aspose.PDF for .NET, you can use the TextFragmentAbsorber class. This class allows you to search for specific text across all pages of a PDF document. Below is a step-by-step guide along with a code example to help you achieve this.
Steps to Find Text and Get Page and Line Numbers
- Set Up Your Project: Create a new C# project and add a reference to the Aspose.PDF for .NET library.
- Load the PDF Document: Use the Document class to load your PDF file.
- Create a TextFragmentAbsorber: Instantiate a TextFragmentAbsorber object with the text you want to search for.
- Accept the Absorber: Call the Accept method on the Pages collection of the document to search through all pages.
- Retrieve Text Fragments: Access the TextFragments property of the absorber to get the collection of matching text fragments.
- Loop Through the Fragments: Iterate through the fragments to read their properties, including the page number and the Y position, from which a line number can be derived.
Code Example
Here’s a sample code snippet demonstrating how to find text and get its page number and vertical position:
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace Aspose.Pdf.Examples
{
    public class SearchTextInPdf
    {
        public static void Run()
        {
            // Load the PDF document
            Document pdfDocument = new Document("path/to/your/document.pdf");

            // Create a TextFragmentAbsorber to find instances of the input search phrase
            TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("your search text");

            // Accept the absorber for all pages
            pdfDocument.Pages.Accept(textFragmentAbsorber);

            // Get the extracted text fragments
            TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

            // Loop through the fragments
            foreach (TextFragment textFragment in textFragmentCollection)
            {
                // Get the page number
                int pageNumber = textFragment.Page.Number;

                // Get the vertical position. YIndent is a page coordinate (a double),
                // not a line index; a line number has to be derived from it.
                double yIndent = textFragment.Position.YIndent;

                Console.WriteLine($"Text: {textFragment.Text}");
                Console.WriteLine($"Page Number: {pageNumber}");
                Console.WriteLine($"Line Y-Position: {yIndent}");
            }
        }
    }
}
Explanation of the Code
- The TextFragmentAbsorber is initialized with the text you want to search for.
- The Accept method is called on the Pages collection to search through all pages.
- The TextFragments property provides access to the matching text fragments.
- For each fragment, you can retrieve the page number and the Y position, which can help you determine the line number based on your layout.
This approach allows you to effectively search for text in a PDF and obtain relevant information about its location.
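Since the code above prints a Y position rather than a line number, here is a minimal sketch of one way to turn Y positions into line numbers. It is plain C# and not part of the Aspose.PDF API: the names LineNumbering, AssignLineNumbers, and tolerance are hypothetical, and the default tolerance of 2.0 is only a guess that you would tune for your font sizes. It assumes that larger Y values are nearer the top of the page and that you pass in the Y positions of all text on the page (for example, the YIndent of every fragment on that page), not only those of the matches; otherwise the count starts at the first match instead of the first line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineNumbering
{
    // Assigns a 1-based line number to each distinct Y coordinate on one page.
    // A Y value within `tolerance` of the first Y seen on the current line is
    // treated as belonging to that same line.
    // Assumption: larger Y means nearer the top of the page.
    public static Dictionary<double, int> AssignLineNumbers(
        IEnumerable<double> yPositions, double tolerance = 2.0)
    {
        var result = new Dictionary<double, int>();
        int line = 0;
        double? currentLineY = null;

        // Walk the page from top to bottom.
        foreach (double y in yPositions.Distinct().OrderByDescending(v => v))
        {
            // Start a new line when the gap to the current line's Y is too large.
            if (currentLineY == null || currentLineY.Value - y > tolerance)
            {
                line++;
                currentLineY = y;
            }
            result[y] = line;
        }
        return result;
    }
}
```

To use it, you would collect the Y positions for one page, call LineNumbering.AssignLineNumbers on them, and then look up the YIndent of each matching fragment in the returned dictionary. Text in multiple columns, tables, or rotated blocks will not map cleanly to a single line sequence with this approach.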
Feel free to ask if you have any further questions!
Please share any PDF file where this code works.
I have a file with the .pdf extension, but it shows a Chrome icon in front of it. Will that file work?
textFragmentAbsorber.TextFragments is coming back empty.
I am able to do it by sending a regex, but it returns only the first match.
Same as you shared.
@Amisha43
absorber_test.pdf (35.4 KB)
Note that if you are using the product without a license, TextFragmentAbsorber will process only the first 4 pages.
Therefore, in the attached PDF you will get only two fragments in the non-licensed scenario.
The keyword I tested was “veli@dynamosoftware.com”
The line number is not coming out right.
Please help me get the right line number.