Calculate Height and Coordinates of each word in PDF document

Hello Everyone,

We have a use case where we need to calculate the height of each letter or word, and the coordinates of each word's position, in a PDF document to make sure the text is readable to the user. If Aspose.PDF provides this functionality, please let us know how to use it, or point us to a reference that covers this requirement.

Your help is highly appreciated.

Thanks & Regards,
Raviteja Marasu

@raviteja0470

Please try the following code and share your feedback.

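// Required namespaces
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Open document (dataDir is the folder that contains the PDF)
Document pdfDocument = new Document(dataDir + "ExtractTextPage.pdf");

// Collect every text fragment from all pages
TextFragmentAbsorber tfa = new TextFragmentAbsorber();
pdfDocument.Pages.Accept(tfa);
TextFragmentCollection tfc = tfa.TextFragments;
foreach (TextFragment tf in tfc)
{
    // Print the position (x and y) of each fragment
    Console.WriteLine(tf.Position);
}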

Thank you for your response. I tried the code and I am getting the position coordinates (x and y values). I think I did not describe the requirement properly. The requirement is to calculate the height of a letter or a word. For text in a PDF, I need to get the rectangle that contains the text (word or letter); from that rectangle, I believe we can calculate the height.

Could you help me extract the rectangles of the text in a PDF document?

@raviteja0470

You can get the rectangle coordinates of the text using the tf.Rectangle property in the code sample above.
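
As a minimal sketch of how the rectangle can be turned into a height, the example below prints the lower-left corner and the height of each fragment. The file name is a placeholder, and it assumes the rectangle coordinates are in PDF points (1/72 inch), which is the default user-space unit; treat it as an illustration rather than the definitive approach.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class TextHeightSketch
{
    static void Main()
    {
        // Placeholder path (assumption): replace with your own PDF file.
        Document pdfDocument = new Document("ExtractTextPage.pdf");

        // Collect the text fragments from all pages.
        TextFragmentAbsorber tfa = new TextFragmentAbsorber();
        pdfDocument.Pages.Accept(tfa);

        foreach (TextFragment tf in tfa.TextFragments)
        {
            // Bounding rectangle of the fragment: LLX/LLY is the lower-left
            // corner, URX/URY is the upper-right corner.
            Aspose.Pdf.Rectangle rect = tf.Rectangle;

            // Height of the bounding box (upper edge minus lower edge).
            double height = rect.URY - rect.LLY;

            Console.WriteLine(
                $"\"{tf.Text}\" x={rect.LLX:F2} y={rect.LLY:F2} " +
                $"height={height:F2} fontSize={tf.TextState.FontSize}");
        }
    }
}
```

A fragment is not necessarily a single word. If finer granularity is needed, each fragment also exposes its segments through tf.Segments, and every segment has its own Rectangle property. The font size is printed only for comparison, since the bounding-box height and the font size are usually not identical.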

Thank you for your response. I can now get the height of the text from the tf.Rectangle property, but I have run into one issue. When I read one PDF document, it only reads the first two or three lines every time and then returns the remaining text fragments as nulls. Am I doing something wrong? Please help me.

The code snippet is as follows:
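// pdfPath holds the path to the PDF file
Document pdfDocument = new Document(pdfPath);

// Read the text fragments of the second page (Pages is 1-based)
TextFragmentAbsorber tfa = new TextFragmentAbsorber();
pdfDocument.Pages[2].Accept(tfa);
List<TextFragment> tfc = tfa.TextFragments.ToList(); // needs System.Linq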

Please try with the attached PDF:
sample.pdf (176.0 KB)

@raviteja0470

I cannot reproduce the issue with the latest version: the list contains 94 items, none with a null text value. Make sure to set the license and share your feedback. You can request a 30-day Temporary License. Please refer to How to get a Temporary License.
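
For reference, a minimal sketch of setting the license before reading the page is shown below. The license file name and the PDF path are placeholders, and the sketch assumes the null fragments come from running unlicensed (evaluation mode), which this thread does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class LicenseSketch
{
    static void Main()
    {
        // Placeholder file name (assumption): use the .lic file you received.
        Aspose.Pdf.License license = new Aspose.Pdf.License();
        license.SetLicense("Aspose.Pdf.lic");

        // Load the document only after the license has been set.
        Document pdfDocument = new Document("sample.pdf");

        // Read the text fragments of the second page (Pages is 1-based).
        TextFragmentAbsorber tfa = new TextFragmentAbsorber();
        pdfDocument.Pages[2].Accept(tfa);
        List<TextFragment> fragments = tfa.TextFragments.ToList();

        // Count how many fragments carry text, to compare against the
        // unlicensed run.
        int withText = fragments.Count(f => !string.IsNullOrEmpty(f.Text));
        Console.WriteLine($"{fragments.Count} fragments, {withText} with text");
    }
}
```

The license only needs to be set once per application run, before the first document is created.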

Hi,

I don’t see any option to apply for a temporary license. Could you guide me on how to get one, or do we have to pay for a temporary license as well? After adding the Aspose.PDF product, I don’t see a temporary license option in the License dropdown.

Appreciate your support

Thanks,
Raviteja Marasu

@raviteja0470

You do not need to pay for a temporary license. It is a time-restricted full license that lets you test every aspect of a product before buying it. You can request one after step 5 of the Get Pricing Information wizard. Please refer to the Get a temporary license button in the attached snapshot.

License.PNG (50.2 KB)