TableAbsorber doesn't recognize all table data

lorenzo.fadda · September 23, 2020, 8:55am

Good morning everybody,
I need to extract some table data from PDF documents and to do this I tried to use Aspose.PDF 20.9 for .NET with a temporary license.
Specifically, I tried the “TableAbsorber” object, and the first problem I’m having is that some information is not extracted: the right column of the first table on page 1, and the contents of the first row of the second table (only the number 1 is extracted).
PDF_FirstTable_Page _1.JPG (9.3 KB)
Extracted_Data_FirstTable_Page_1.JPG (4.8 KB)
PDF_SecondTable_Page5.JPG (18.7 KB)
Extracted_Data_SecondTable_Page_5.JPG (11.1 KB)

I also wanted to know if there is a way to extract these tables in DataTable format; this would help a lot with parsing.

I have prepared a Windows Forms application containing the logic I am using, along with 3 PDF files that all give the same result (the PDF documents are in the “bin\InputFiles” folder).
AsposeExtractTables.zip (194.6 KB)

Could you kindly help me understand how to handle these problems?
Thank you in advance.
Lorenzo

asad.ali · September 23, 2020, 8:47pm

@lorenzo.fadda

We have tested the scenario in our environment and were able to reproduce the issue: the API does not extract all of the table data. Therefore, we have logged it as PDFNET-48813 in our issue tracking system. We will look further into the details and keep you informed about the status of the fix. Please be patient and give us some time.

Regretfully, extracting tables in DataTable format is not yet supported by the API. A feature request has been logged in our issue tracking system as PDFNET-48814. We will let you know as soon as it has been fully investigated for feasibility and is available for use.

We apologize for the inconvenience.

lorenzo.fadda · October 15, 2020, 12:08pm

Hi Asad.Ali,
do you have any news about these requests?

Thank you for your work,
Lorenzo

asad.ali · October 15, 2020, 8:49pm

@lorenzo.fadda

The tickets were only recently logged in our issue management system, and we are afraid that they have not yet been investigated. They will be analyzed and resolved on a first-come, first-served basis, and we will inform you as soon as we make definite progress towards their resolution. Please have patience and give us some time.

We apologize for the inconvenience.