Hello,
I want to extract content together with its coordinates from a PDF document. For example, if the document contains "Table of Content" on the first page, I want to get the text "Table" along with its x-indent and y-indent.
Using PdfExtractor, I can get the content.
Using TextFragmentAbsorber, I can get the coordinates.
Is there any way to get the content together with its coordinates?
Hi Chenna,
Thanks for your inquiry. You can get both the text and its coordinates from a PDF document using TextFragmentAbsorber. Please check the following documentation link. It should help you accomplish the task.
Please feel free to contact us for any further assistance.
Best Regards,
Thanks for the response. I tried to implement content extraction with coordinates using the above code, but the TextFragments count is 0 for all the files.
Please let me know how to extract the content with coordinates.
I have attached the sample PDF document that I used for the extraction.
Hi Chenna,
Thanks for your inquiry. Please check the following code snippet to get text and its coordinates from the PDF document. Hopefully, it will help you to accomplish the task.
using Aspose.Pdf;
using Aspose.Pdf.Text;

// open the document (myDir is the folder that contains the PDF)
Document pdfDocument = new Document(myDir + "Table+of+content.pdf");
// create a TextFragmentAbsorber that matches every run of non-whitespace characters, i.e. each word
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"[\S]+");
// enable regular expression matching for the search phrase
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// run the absorber over all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// loop through the fragments and print each word with its coordinates
foreach (TextFragment textFragment in textFragmentCollection)
{
    Console.WriteLine("Text : {0} ", textFragment.Text);
    Console.WriteLine("Position : {0} ", textFragment.Position);
    // lower-left corner of the fragment
    Console.WriteLine("LLX : {0} ", textFragment.Position.XIndent);
    Console.WriteLine("LLY : {0} ", textFragment.Position.YIndent);
    // upper-right corner = lower-left corner plus the rectangle size
    Console.WriteLine("URX : {0} ", textFragment.Position.XIndent + textFragment.Rectangle.Width);
    Console.WriteLine("URY : {0} ", textFragment.Position.YIndent + textFragment.Rectangle.Height);
}
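As a side note on the last two lines of the loop: the position gives the lower-left corner of a word, and adding the rectangle's width and height gives its upper-right corner. Here is a minimal standalone sketch of that arithmetic. WordBox is a hypothetical helper type for illustration only (it is not part of Aspose.PDF), and the sample numbers are made up, not taken from your document.

```csharp
using System;

// Hypothetical stand-in for one extracted word; NOT an Aspose.PDF type.
public sealed class WordBox
{
    public WordBox(string text, double xIndent, double yIndent, double width, double height)
    {
        Text = text;
        XIndent = xIndent;
        YIndent = yIndent;
        Width = width;
        Height = height;
    }

    public string Text { get; }
    public double XIndent { get; }   // lower-left X (LLX)
    public double YIndent { get; }   // lower-left Y (LLY)
    public double Width { get; }
    public double Height { get; }

    // Upper-right corner = lower-left corner plus the box size.
    public double URX => XIndent + Width;
    public double URY => YIndent + Height;
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample values, for illustration only.
        var word = new WordBox("Table", 72.0, 700.0, 30.5, 12.0);
        Console.WriteLine("Text : {0}", word.Text);
        Console.WriteLine("LL   : ({0}, {1})", word.XIndent, word.YIndent);
        Console.WriteLine("UR   : ({0}, {1})", word.URX, word.URY);
    }
}
```

In the real loop you would fill such a record from textFragment.Text, textFragment.Position and textFragment.Rectangle, which keeps each word and its coordinates together for later processing.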
Please feel free to contact us for any further assistance.
Best Regards,
Thanks for the response. It’s working. I am able to get the text with coordinates.
Hi Chenna,
Thanks for your feedback. It is good to know that you have managed to meet your requirement.
Please keep using our API and feel free to raise any questions or concerns; we will be more than happy to help.
Best Regards,