Parse table in existing PDF

knjeckil · January 22, 2020, 4:51pm

Hello,

We are trying to parse a table from a PDF file using the TableAbsorber class, but the contents of specific columns are missing in the absorber.

Here are the PDF files and the results of the parsing process:

Test.pdf
Result.pdf

We used the following code snippet:

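// Load the source PDF (dataDir is the folder that contains the file)
Document pdfDocument = new Document(dataDir + "Max.pdf");

TableAbsorber absorber = new TableAbsorber();

// Extract tables from the first page (page indices in Aspose.PDF start at 1)
absorber.Visit(pdfDocument.Pages[1]);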

foreach (AbsorbedTable table in absorber.TableList)
{
    foreach (AbsorbedRow row in table.RowList)
    {
        foreach (AbsorbedCell cell in row.CellList)
        {
            foreach (TextFragment text in cell.TextFragments)
            {
                Console.Write(text.Text + " ");
            }
            Console.Write("|");
        }
        Console.WriteLine("-------------------------------------------");
    }
    Console.WriteLine("===========================================");
}

Have you tested the PDF automation after installing the latest version?

Adnan.Ahmad · January 22, 2020, 9:29pm

@knjeckil,

Could you please share the desired result as a sample file so that we can investigate further and help you out?

knjeckil · January 23, 2020, 8:58am

Please find the expected result attached.
ExpectedResult.pdf (59.4 KB)

Adnan.Ahmad · January 23, 2020, 8:01pm

@knjeckil,

Thanks for sharing the requested file.

I have observed the issue and have created an investigation ticket with ID PDFNET-47610 in our issue tracking system so that it can be investigated and resolved as soon as possible.