Aspose.PDF Table Extraction - No TextFragments in Cells

austin.wilson427 · April 5, 2024, 4:44pm

I am doing a very basic table extraction from a PDF with well-defined tables. No matter what I try, I never get data back in cell.TextFragments for any cell. Is there anything I need to do differently to extract the actual text? I do get the correct number of tables, rows, and cells, just never any text.

using System;
using System.Text;
using Aspose.Pdf.Text;

// Declared as a plain bool method: the original "async Task" signature
// cannot return a value and contained no awaits.
public bool ExtractTableData()
{
    Aspose.Pdf.License license = new Aspose.Pdf.License();
    license.SetLicense("C:\\Users\\awilson\\Desktop\\Aspose.PDF.NET.lic");

    var filePath = "C:\\Users\\awilson\\Desktop\\86853449-11db-4aa7-8c3e-c278b11d6bbc_51.pdf";

    Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(filePath);
    foreach (var page in pdfDocument.Pages)
    {
        // Detect the tables on the current page.
        Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
        absorber.Visit(page);
        foreach (AbsorbedTable table in absorber.TableList)
        {
            Console.WriteLine("Table");
            foreach (AbsorbedRow row in table.RowList)
            {
                foreach (AbsorbedCell cell in row.CellList)
                {
                    // This collection is always empty for every cell.
                    foreach (TextFragment fragment in cell.TextFragments)
                    {
                        var sb = new StringBuilder();
                        foreach (TextSegment seg in fragment.Segments)
                            sb.Append(seg.Text);
                        Console.Write($"{sb}|");
                    }
                }
                Console.WriteLine();
            }
        }
    }

    return true;
}

asad.ali · April 5, 2024, 10:45pm

@austin.wilson427

Can you please share your sample PDF document with us so that we can test the scenario in our environment and address it accordingly?

austin.wilson427 · April 6, 2024, 2:53pm

Attached is a basic file example. Thanks for your help!
Blank2.pdf (104.9 KB)

asad.ali · April 6, 2024, 9:14pm

@austin.wilson427

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-56971

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.