AsposePdf extract text coordinates

David_Matin · March 24, 2025, 6:47am

Is there an API that can get the coordinates of text in a PDF file?

David_Matin · March 24, 2025, 7:04am

Specifically, in the Python API.

Anastasia_Radtsevich · March 24, 2025, 6:04pm

@David_Matin
The TextFragmentAbsorber class finds text matching a particular phrase across all the pages of a PDF document. To search the whole document, call the accept method of the Pages collection and pass it the TextFragmentAbsorber object. The matches are then available as a collection of TextFragment objects through the absorber's text_fragments property. You can loop through the fragments and read their properties, such as text, position (x_indent, y_indent), font_name, font_size, is_accessible, is_embedded, is_subset, foreground_color, etc.
The following code snippet shows how to search for text on all the pages and print the coordinates of each match:

import aspose.pdf as ap


input_file = "Input.pdf"
pdf_document = ap.Document(input_file)
# Create a TextFragmentAbsorber object to find all instances of the input search phrase
text_fragment_absorber = ap.text.TextFragmentAbsorber("text")
# Accept the absorber for all the pages
pdf_document.pages.accept(text_fragment_absorber)
# Get the extracted text fragments
text_fragment_collection = text_fragment_absorber.text_fragments
# Loop through the fragments
for text_fragment in text_fragment_collection:
    print(f"Position : {text_fragment.position} ")
    print(f"XIndent : {text_fragment.position.x_indent} ")
    print(f"YIndent : {text_fragment.position.y_indent} ")
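
If you need the coordinates as data rather than printed lines, the loop above can be wrapped in a small helper. This is only a sketch: fragment_coordinates and points_to_mm are hypothetical names, not part of Aspose.PDF, and page_height is a value you would supply yourself. It also assumes the usual PDF convention that coordinates are in points (1/72 inch) measured from the bottom-left corner of the page, so please verify that against your own documents. The stand-in object at the bottom only imitates the text and position attributes used above so the sketch runs without a PDF.

```python
from types import SimpleNamespace

POINTS_PER_INCH = 72.0  # assumption: default PDF user-space unit is 1/72 inch


def fragment_coordinates(fragments, page_height=None):
    """Collect the text and position of each fragment as plain dicts.

    `fragments` is any iterable of objects exposing `.text` and
    `.position.x_indent` / `.position.y_indent`, as in the loop above.
    `page_height` (in points) is a caller-supplied value; when given,
    a y coordinate measured from the top of the page is added too.
    """
    rows = []
    for fragment in fragments:
        row = {
            "text": fragment.text,
            "x": fragment.position.x_indent,
            "y": fragment.position.y_indent,
        }
        if page_height is not None:
            # Assumes y_indent is measured upward from the bottom edge.
            row["y_from_top"] = page_height - fragment.position.y_indent
        rows.append(row)
    return rows


def points_to_mm(value):
    """Convert a length in points to millimetres (1 inch = 25.4 mm)."""
    return value / POINTS_PER_INCH * 25.4


# Stand-in for a real TextFragment, for illustration only.
fake_fragment = SimpleNamespace(
    text="text",
    position=SimpleNamespace(x_indent=72.0, y_indent=720.0),
)

for row in fragment_coordinates([fake_fragment], page_height=792.0):
    print(row)
```

With the real library you would pass text_fragment_absorber.text_fragments as the first argument. The 792.0 above is just the height of a US Letter page in points, so substitute the height of the page the fragment is actually on.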