I am using Aspose.PDF to extract the contents of tables from a PDF, with the code below. Some of the table contents are not extracted; for example, the text of certain cells is missing from the output.
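import com.aspose.pdf.AbsorbedCell;
import com.aspose.pdf.AbsorbedRow;
import com.aspose.pdf.AbsorbedTable;
import com.aspose.pdf.Document;
import com.aspose.pdf.Page;
import com.aspose.pdf.TableAbsorber;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentCollection;
import com.aspose.pdf.TextSegment;

// Directory that contains the sample PDF
private static String _dataDir = "/home/admin1/pdf-examples/Samples/";

public static void Extract_Table()
{
    // Load source PDF document
    Document pdfDocument = new Document(_dataDir + "上市保荐书.pdf");
    for (Page page : pdfDocument.getPages())
    {
        // Detect the tables on the current page
        TableAbsorber absorber = new TableAbsorber();
        absorber.visit(page);
        for (AbsorbedTable table : absorber.getTableList())
        {
            for (AbsorbedRow row : table.getRowList())
            {
                for (AbsorbedCell cell : row.getCellList())
                {
                    TextFragmentCollection textFragmentCollection = cell.getTextFragments();
                    for (TextFragment fragment : textFragmentCollection)
                    {
                        // Join the segments of one fragment and print them on one line
                        StringBuilder txt = new StringBuilder();
                        for (TextSegment seg : fragment.getSegments())
                        {
                            txt.append(seg.getText());
                        }
                        System.out.println(txt);
                    }
                }
            }
        }
    }
}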
上市保荐书.pdf (723.8 KB)