Extract table data from PDF

n.b.vijayakumaraccen · March 14, 2016, 3:07am

Hi Team,

I have a requirement to extract table data as a table from a PDF. Can you please help us achieve this?

tilal.ahmad · March 15, 2016, 2:36am

Hi there,

Thanks for your inquiry. Please check the following documentation link, which shows how to access a table in an existing PDF document and extract its text. Hopefully it will help you accomplish the task.

Manipulate tables in existing PDF document.

However, if your requirement differs from my understanding, please share some more details so that we can guide you accordingly.

Best Regards,

ishan.mehta065 · June 1, 2017, 4:02am

Hi Tilal Ahmad,

I tried the link you gave, but it is not very helpful: it reads the table rows and cells but does not return their data.

Please help

asad.ali · June 1, 2017, 11:50am

Hi Ishan,

Thanks for contacting support.

Please check the following code snippet, which I used to test the scenario with one of my sample PDFs; I was unable to reproduce the issue. For your reference, I have attached an input document as well.

Document pdfDocument = new Document("document_with_table_out.pdf");

TableAbsorber absorber = new TableAbsorber();
absorber.Visit(pdfDocument.Pages[1]);

// Extract data from the first row, first cell
TextFragment fragment = absorber.TableList[0].RowList[0].CellList[0].TextFragments[1];
Console.WriteLine(fragment.Text);

For more information, you may visit the “Manipulate Table in Existing PDF” article, which contains updated information on table extraction. If you still face any issue, please share your input document so that we can test the scenario in our environment and address it accordingly.
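
If the goal is the text of every cell rather than only the first one, a loop along the following lines may help. This is a minimal sketch, not a definitive implementation: it assumes the Aspose.PDF for .NET package is referenced and that the AbsorbedTable, AbsorbedRow and AbsorbedCell types are available in your version. The file name is the sample from this thread, and the tab-separated console output is only an illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class TableDump
{
    static void Main()
    {
        // Sample file name taken from this thread; replace with your own PDF.
        Document pdfDocument = new Document("document_with_table_out.pdf");

        foreach (Page page in pdfDocument.Pages)
        {
            // Use a fresh absorber per page so TableList holds only that page's tables.
            TableAbsorber absorber = new TableAbsorber();
            absorber.Visit(page);

            foreach (AbsorbedTable table in absorber.TableList)
            {
                foreach (AbsorbedRow row in table.RowList)
                {
                    List<string> cells = new List<string>();

                    foreach (AbsorbedCell cell in row.CellList)
                    {
                        // A cell may contain several fragments, each with several segments.
                        StringBuilder cellText = new StringBuilder();
                        foreach (TextFragment fragment in cell.TextFragments)
                        {
                            foreach (TextSegment segment in fragment.Segments)
                            {
                                cellText.Append(segment.Text);
                            }
                        }
                        cells.Add(cellText.ToString());
                    }

                    // Assumption for illustration: print one row per line, cells separated by tabs.
                    Console.WriteLine(string.Join("\t", cells));
                }
            }
        }
    }
}
```

If a cell prints as empty with this loop, the table was detected but no text fragments were associated with that cell, which is the case where sharing the input document is most useful.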

Best Regards,

ishan.mehta065 · June 1, 2017, 11:44pm

Hi Asad,

Thanks for your response, but I have already tried this code:

Document pdfDocument = new Document("document_with_table_out.pdf");

TableAbsorber absorber = new TableAbsorber();
absorber.Visit(pdfDocument.Pages[1]);

// Extract Data From First Row, First Cell
TextFragment fragment = absorber.TableList[0].RowList[0].CellList[0].TextFragments.FirstOrDefault();

if (fragment != null)
{
    Console.WriteLine(fragment.Text);
}
else
{
    Console.WriteLine("No text found in the specified cell.");
}

I also visited the link you mentioned, but the problem is still the same.

Please help

Regards,

Ishan

asad.ali · June 2, 2017, 6:43am

Hi Ishan,

Thanks for writing back.

I believe you have a similar inquiry in the “Extract table from pdf” forum thread. We will provide feedback in that thread.

Best Regards,