Free Support Forum - aspose.com

Extract Text With Coordinates

Hello,

We want to extract the text of a PDF document.
For each word, we also want the coordinates of the word's bounding rectangle.

This is possible with the following ‘absorber’:

TextFragmentAbsorber absorber = new TextFragmentAbsorber("\\S+", new TextSearchOptions(true));

The problem is the following: We also want to recognize when a new line starts.

The following absorber results in an exception:

TextFragmentAbsorber absorber = new TextFragmentAbsorber("(\\n|\\S+)", new TextSearchOptions(true));

Do you know a (proper) way to solve the problem?

Thanks

Hi Alexander,


Thanks for your inquiry. While searching the text, you can read the Rectangle property of each TextSegment to get its coordinates. As for new line information, I am afraid there is no direct option for it. However, when you search text in a PDF, you can compare the LLY value (the Y coordinate of the lower-left corner) of each word's Rectangle: words on the same line have (nearly) the same LLY, so a change in LLY indicates the start of a new line.
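As a rough illustration of the LLY comparison, here is a minimal sketch in plain Java. It does not call the Aspose API: the Word class, the groupIntoLines method, and the tolerance parameter are hypothetical names introduced for this example. It assumes you have already copied each word's text and the lower-left corner of its Rectangle into a Word object, and that the page uses the usual PDF coordinate system in which Y grows upward.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LineGrouper {
    // Hypothetical holder for one extracted word and the lower-left
    // corner of its rectangle (not an Aspose type).
    static final class Word {
        final String text;
        final double llx; // X coordinate of the lower-left corner
        final double lly; // Y coordinate of the lower-left corner

        Word(String text, double llx, double lly) {
            this.text = text;
            this.llx = llx;
            this.lly = lly;
        }
    }

    // Groups words into lines. A word joins the current line if its LLY is
    // within `tolerance` of the LLY of the first word placed on that line;
    // otherwise it starts a new line.
    static List<List<Word>> groupIntoLines(List<Word> words, double tolerance) {
        List<Word> sorted = new ArrayList<>(words);
        // Assumption: Y grows upward, so the largest LLY is the top line.
        sorted.sort(Comparator.comparingDouble((Word w) -> -w.lly));

        List<List<Word>> lines = new ArrayList<>();
        List<Word> current = new ArrayList<>();
        double lineY = 0;
        for (Word w : sorted) {
            if (!current.isEmpty() && Math.abs(w.lly - lineY) > tolerance) {
                lines.add(current);          // LLY changed: close the line
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                lineY = w.lly;               // first word anchors the line
            }
            current.add(w);
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }
        // Restore left-to-right reading order within each line.
        for (List<Word> line : lines) {
            line.sort(Comparator.comparingDouble((Word w) -> w.llx));
        }
        return lines;
    }
}
```

The tolerance is needed because words on one visual line can differ slightly in LLY. A suitable value depends on the document's font sizes, so treat it as something to tune rather than a fixed constant.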

Please feel free to contact us for any further assistance.

Best Regards,

Thanks for your answer