Extracting text with coordinates

Hello,

I want to extract content together with its coordinates from a PDF document. For example, if the document contains "Table of Content" on the first page, I want to get the text "Table" along with its x-indent and y-indent.

Using PdfExtractor I can get the content.

Using TextFragmentAbsorber I can get the coordinates.

Is there any way to get the content together with its coordinates?

Hi Chenna,


Thanks for your inquiry. You can get both the text and its coordinates from a PDF document using TextFragmentAbsorber. Please check the following documentation link for details; it should help you accomplish the task.


Please feel free to contact us for any further assistance.

Best Regards,

Thanks for the response. I tried to implement the code from the documentation above to extract content with coordinates, but the TextFragments count is 0 for all of my files.
Please let me know how to extract the content with coordinates.

I have attached the sample PDF document that I used to extract the content with coordinates.

Hi Chenna,


Thanks for your inquiry. Please check the following code snippet to get the text and its coordinates from the PDF document. Hopefully it will help you accomplish the task.

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// open the document (myDir is the folder that contains the PDF)
Document pdfDocument = new Document(myDir + "Table+of+content.pdf");

// create a TextFragmentAbsorber that matches every run of non-whitespace characters
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"[\S]+");

// set the text search option so the phrase is treated as a regular expression
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;

// accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);

// get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

// loop through the fragments and print each one's text and corner coordinates
foreach (TextFragment textFragment in textFragmentCollection)
{
    Console.WriteLine("Text : {0} ", textFragment.Text);
    Console.WriteLine("Position : {0} ", textFragment.Position);
    // lower-left corner
    Console.WriteLine("LLX : {0} ", textFragment.Position.XIndent);
    Console.WriteLine("LLY : {0} ", textFragment.Position.YIndent);
    // upper-right corner, derived from the fragment's width and height
    Console.WriteLine("URX : {0} ", textFragment.Position.XIndent + textFragment.Rectangle.Width);
    Console.WriteLine("URY : {0} ", textFragment.Position.YIndent + textFragment.Rectangle.Height);
}
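
Since the original question only asked about the first page, here is a minimal sketch of the same approach restricted to page 1. It reads the bounding box directly from the fragment's Rectangle instead of adding the width and height to the position. The myDir value is a hypothetical placeholder, and the file name is simply the one used in this thread; adjust both to your setup.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class FirstPageWords
{
    static void Main()
    {
        // Assumption: hypothetical folder holding the sample PDF
        string myDir = @"C:\Temp\";
        Document pdfDocument = new Document(myDir + "Table+of+content.pdf");

        // Match every run of non-whitespace characters (one fragment per word)
        TextFragmentAbsorber absorber = new TextFragmentAbsorber(@"[\S]+");
        absorber.TextSearchOptions = new TextSearchOptions(true);

        // Page numbering starts at 1, so Pages[1] is the first page
        pdfDocument.Pages[1].Accept(absorber);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // Bounding box of the fragment in page coordinates
            Aspose.Pdf.Rectangle box = fragment.Rectangle;
            Console.WriteLine("{0}: LLX={1} LLY={2} URX={3} URY={4}",
                fragment.Text, box.LLX, box.LLY, box.URX, box.URY);
        }
    }
}
```

If only one word is of interest (for example "Table"), the loop can filter on fragment.Text before printing.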

Please feel free to contact us for any further assistance.


Best Regards,

Thanks for the response. It’s working. I am able to get the text with coordinates.

Hi Chenna,


Thanks for your feedback. It is good to know that you were able to meet your requirement.

Please keep using our API and feel free to raise any questions or concerns; we will be more than happy to help.

Best Regards,