Missing table content

yjsdfsdf · August 7, 2023, 10:23am

I am using Aspose.PDF to extract the contents of tables from a PDF with the code below. Some of the table content is not extracted; for example, the text of certain cells is missing.

// Required import for the Aspose.PDF for Java classes used below
import com.aspose.pdf.*;

private static String _dataDir = "/home/admin1/pdf-examples/Samples/";

    public static void Extract_Table()
    {
        // Load source PDF document
        Document pdfDocument = new Document(_dataDir + "上市保荐书.pdf");
        for (Page page : pdfDocument.getPages())
        {
            // Detect the tables on the current page
            TableAbsorber absorber = new TableAbsorber();
            absorber.visit(page);
            for (AbsorbedTable table : absorber.getTableList())
            {
                for (AbsorbedRow row : table.getRowList())
                {
                    for (AbsorbedCell cell : row.getCellList())
                    {
                        TextFragmentCollection textFragmentCollection = cell.getTextFragments();
                        for (TextFragment fragment : textFragmentCollection)
                        {
                            // Join the fragment's segments, then print one line per fragment
                            StringBuilder txt = new StringBuilder();
                            for (TextSegment seg : fragment.getSegments())
                                txt.append(seg.getText());
                            System.out.println(txt);
                        }
                    }
                }
            }
        }
    }

上市保荐书.pdf (723.8 KB)

asad.ali · August 7, 2023, 6:39pm

@yjsdfsdf

Can you please share a screenshot of the content that you are not able to extract? We will test the scenario in our environment and address it accordingly.

yjsdfsdf · August 7, 2023, 11:16pm

The contents of the table inside the red box are not extracted.
Dingtalk_20230808071537.jpg (129.5 KB)

asad.ali · August 8, 2023, 11:37am

@yjsdfsdf

We are checking it and will get back to you shortly.

asad.ali · August 8, 2023, 6:27pm

@yjsdfsdf

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFJAVA-43013

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

asad.ali · August 30, 2023, 7:29pm

@yjsdfsdf

Please use our new table recognition engine, available from version 23.8. It performs better in numerous scenarios and can recognize tables without borders. You can activate it with the code below:

TableAbsorber absorber = new TableAbsorber();
absorber.setUseFlowEngine(true);
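
To check whether the flow engine recovers the cells that were missing, it may help to print one line per row with empty cells marked, rather than one line per fragment as in the original post. The following is only a rough sketch under stated assumptions: it works on plain lists of strings standing in for the text collected from each AbsorbedCell, the class `TableDump`, its methods, and the sample data are hypothetical, and nothing in it is part of Aspose.PDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.PDF: formats already-extracted cell text.
class TableDump {
    // Marker shown in place of a cell that produced no text.
    static final String EMPTY_MARK = "<empty>";

    // Renders one row as a single line, cells separated by " | ".
    static String formatRow(List<String> cellTexts) {
        List<String> shown = new ArrayList<>();
        for (String text : cellTexts) {
            String trimmed = (text == null) ? "" : text.trim();
            shown.add(trimmed.isEmpty() ? EMPTY_MARK : trimmed);
        }
        return String.join(" | ", shown);
    }

    // Counts the cells across all rows that contain no text.
    static int countEmpty(List<List<String>> rows) {
        int empty = 0;
        for (List<String> row : rows) {
            for (String text : row) {
                if (text == null || text.trim().isEmpty()) {
                    empty++;
                }
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for text gathered from the cells.
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("Item", "Amount"),
            Arrays.asList("Revenue", ""));
        for (List<String> row : rows) {
            System.out.println(formatRow(row));
        }
        System.out.println("Empty cells: " + countEmpty(rows));
    }
}
```

In the extraction loop, each row's list would be filled by concatenating the segment text of every fragment in a cell. Comparing the empty-cell count with and without `setUseFlowEngine(true)` gives a quick indication of whether the new engine picks up the content inside the red box.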