Hello,
we are facing issues with TableAbsorber: it can't find the table in a very simple PDF that contains one paragraph and a table with two columns and three rows.
The conversion to DOCX does not detect the table either; in the DOCX output it is reproduced only as lines and paragraphs.
Attached are the Java sources of the test program and the PDF used: testasposepdf.zip (179.2 KB)
Tested with Aspose.PDF 18.9.
Many thanks for checking.
@renato.mauro
Thanks for contacting support.
We were able to reproduce both issues (i.e. the API does not extract the table, and the table is not rendered correctly after conversion to DOCX format) using the following minimal code snippet:
// Convert the PDF to DOCX
Document doc = new Document(dataDir + "TestTable.pdf");
DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
doc.save(dataDir + "TestTable.docx", saveOptions);

// Load the same PDF again for table extraction
Document pdfDocument = new Document(dataDir + "TestTable.pdf");
// Create TableAbsorber object to find tables
TableAbsorber absorber = new TableAbsorber();
// Visit first page with absorber (page numbering starts at 1)
absorber.visit(pdfDocument.getPages().get_Item(1));
// Print the number of tables found on the page
System.out.println(absorber.getTableList().size());
We have logged both issues in our issue tracking system as follows:
PDFJAVA-38078 - API is unable to extract table from PDF
PDFJAVA-38079 - PDF to DOCX - Table is not rendered correctly
We will investigate both issues in detail and keep you posted on the status of their correction. Please be patient and give us a little time.
We are sorry for the inconvenience.
The issue you found earlier (filed as PDFJAVA-38079) has been fixed in Aspose.PDF for Java 21.12.
@gianfranco.dancelli
Regarding PDFJAVA-38078, please use the following option:
absorber.setUseFlowEngine(true); // with 22.6 version of the API
This option activates an alternative engine that is able to recognize complicated tables and tables without borders.
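For reference, here is a minimal sketch of how the option might be combined with the extraction snippet from earlier in this thread. It assumes Aspose.PDF for Java 22.6 or later is on the classpath. The class name ExtractTableWithFlowEngine and the default file path are placeholders, and the row and cell accessors (getRowList, getCellList, getTextFragments, getSegments) are used as I recall them from the Aspose.PDF for Java documentation, so please check them against the API reference for your version.

```java
import com.aspose.pdf.AbsorbedCell;
import com.aspose.pdf.AbsorbedRow;
import com.aspose.pdf.AbsorbedTable;
import com.aspose.pdf.Document;
import com.aspose.pdf.TableAbsorber;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextSegment;

// Hypothetical class name, used for illustration only.
public class ExtractTableWithFlowEngine {
    public static void main(String[] args) {
        // Placeholder path; pass the real PDF path as the first argument.
        String path = args.length > 0 ? args[0] : "TestTable.pdf";
        Document pdfDocument = new Document(path);
        try {
            TableAbsorber absorber = new TableAbsorber();
            // Switch to the alternative engine (available with the 22.6 version of the API).
            absorber.setUseFlowEngine(true);
            // Visit the first page (page numbering starts at 1).
            absorber.visit(pdfDocument.getPages().get_Item(1));
            System.out.println("Tables found: " + absorber.getTableList().size());

            // Print each row on one line, with cells separated by " | ".
            for (AbsorbedTable table : absorber.getTableList()) {
                for (AbsorbedRow row : table.getRowList()) {
                    StringBuilder line = new StringBuilder();
                    for (AbsorbedCell cell : row.getCellList()) {
                        for (TextFragment fragment : cell.getTextFragments()) {
                            for (TextSegment segment : fragment.getSegments()) {
                                line.append(segment.getText());
                            }
                        }
                        line.append(" | ");
                    }
                    System.out.println(line);
                }
            }
        } finally {
            // Release the resources held by the document.
            pdfDocument.close();
        }
    }
}
```

If the table count is still zero with the option enabled, it may help to run the same code with setUseFlowEngine(false) and compare the two results before reporting back.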