How to extract table data from a PDF

j2k18 · December 4, 2018, 8:50pm

How can I extract the table from the given PDF as is and get all of its keys and values?
I tried some earlier examples from the Aspose forum, but I only get the keys of the table. When a key or value spans multiple lines, it comes back as 2-3 separate lines, whereas I want all the text inside a cell to be treated as a single String. The data is also not in a proper format, and I am not getting all of the table's data.
Please help with this.
Thanks

P.S. I am also applying the temporary license for Aspose.PDF.

Farhan.Raza · December 5, 2018, 7:05am

@j2k18

Thank you for contacting support.

You can extract text from a table with TableAbsorber, as explained in "Manipulate tables in existing PDF", and then iterate through each row and each cell as your requirements dictate.

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that contains the input file; set it for your environment.
Document pdfDocument = new Document(dataDir + "Test.pdf");

// Detect the tables on the first page (page indexes start at 1).
TableAbsorber absorber = new TableAbsorber();
absorber.Visit(pdfDocument.Pages[1]);

foreach (AbsorbedTable table in absorber.TableList)
{
    foreach (AbsorbedRow row in table.RowList)
    {
        foreach (AbsorbedCell cell in row.CellList)
        {
            // A cell can hold several text fragments, for example one per line of text.
            TextFragmentCollection textFragmentCollection = cell.TextFragments;
            foreach (TextFragment fragment in textFragmentCollection)
            {
                Console.WriteLine(fragment.Text);
            }
        }
    }
}
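
The loop above writes every fragment on its own line, so a cell whose text wraps over several lines still comes out as several lines. To get one string per cell, the fragment texts could be collected and joined. The following is only a minimal sketch of that joining step: `CellTextHelper` and `JoinFragments` are hypothetical names introduced here for illustration and are not part of Aspose.PDF, and joining with a single space is an assumption about the desired format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Aspose.PDF.
public static class CellTextHelper
{
    // Joins the text pieces of one cell into a single string, collapsing
    // line breaks and repeated whitespace into single spaces.
    public static string JoinFragments(IEnumerable<string> parts)
    {
        char[] whitespace = { ' ', '\t', '\r', '\n' };

        IEnumerable<string> words = parts
            // Skip null, empty, or whitespace-only pieces.
            .Where(p => !string.IsNullOrWhiteSpace(p))
            // Break each piece into words so stray line breaks disappear.
            .SelectMany(p => p.Split(whitespace, StringSplitOptions.RemoveEmptyEntries));

        return string.Join(" ", words);
    }
}
```

Inside the `foreach (AbsorbedCell cell in row.CellList)` loop, each `fragment.Text` could be added to a `List<string>` and the list passed to `JoinFragments` once per cell, which yields one string per cell instead of one console line per fragment. If the first cell of each row holds the key, the remaining cell strings of that row can then be treated as its values.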

We hope this helps. If you still face any problems, please share your code snippet and describe the issue in detail so that we can investigate further.