Extracting table data from a PDF document

Hi Support,

I have a requirement to identify the tables in a PDF and read the data from them. Is this possible through your API? To explain clearly:

Step 1: Find the number of tables present in the PDF.

Step 2: Read the data from each table, row by row and column by column.

Thanks

Hi Surendra,


Thanks for your interest in our products.

Currently, Aspose.Pdf.Kit for Java can extract text, images, attachments and annotations from a PDF document, but I am afraid it does not support reading or manipulating tables inside a PDF document. I have logged this requirement as PDFNEWJAVA-33240 in our issue tracking system. The development team will look into the details of this feature, and we will keep you posted on its status. Please bear with us while this is investigated. We are sorry for the inconvenience.

In the meantime, you can obtain the contents of tables by extracting the text of the PDF file. Please follow the instructions in the article "Extract Text from PDF Document".
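As a rough illustration of this workaround, the sketch below takes lines of text that have already been extracted from a PDF and groups them into tables, rows and columns. It does not call any Aspose API. The class and method names are hypothetical, and the heuristic is an assumption: columns are taken to be separated by a tab or by two or more spaces, and a table is taken to be two or more consecutive lines that each split into at least two cells. Real PDFs often break this assumption, so treat it as a starting point only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of any Aspose library.
class ExtractedTextTables {

    // Splits one line into cells on a tab or on a run of two or more
    // whitespace characters (assumed column separator).
    static List<String> splitCells(String line) {
        List<String> cells = new ArrayList<>();
        for (String cell : line.trim().split("[ \\t]{2,}|\\t")) {
            String trimmed = cell.trim();
            if (!trimmed.isEmpty()) {
                cells.add(trimmed);
            }
        }
        return cells;
    }

    // Groups consecutive multi-cell lines into tables.
    // Result shape: tables -> rows -> cells.
    static List<List<List<String>>> findTables(List<String> lines) {
        List<List<List<String>>> tables = new ArrayList<>();
        List<List<String>> current = new ArrayList<>();
        for (String line : lines) {
            List<String> cells = splitCells(line);
            if (cells.size() >= 2) {
                current.add(cells);            // looks like a table row
            } else {
                if (current.size() >= 2) {     // close the table in progress
                    tables.add(current);
                }
                current = new ArrayList<>();
            }
        }
        if (current.size() >= 2) {             // table that runs to the last line
            tables.add(current);
        }
        return tables;
    }

    public static void main(String[] args) {
        // Sample input standing in for text extracted from a PDF.
        List<String> lines = List.of(
            "Quarterly report",
            "",
            "Item    Qty    Price",
            "Pen     10     1.50",
            "Book    2      12.00",
            "",
            "Notes follow the table.");

        List<List<List<String>>> tables = findTables(lines);
        System.out.println("Tables found: " + tables.size());   // step 1
        for (int t = 0; t < tables.size(); t++) {                // step 2
            List<List<String>> rows = tables.get(t);
            for (int r = 0; r < rows.size(); r++) {
                for (int c = 0; c < rows.get(r).size(); c++) {
                    System.out.printf("table %d, row %d, col %d: %s%n",
                            t, r, c, rows.get(r).get(c));
                }
            }
        }
    }
}
```

Cells that contain double spaces, or tables whose columns are separated by a single space, will be split incorrectly, so the output should be checked against the source document.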

Hi,
I want to extract table data from a PDF document. Is there any update on this?

Hi Osama,


Thanks for your inquiry. I am afraid the requested feature is still not implemented, because most PDF documents do not contain any markup that identifies tables. Please share a sample PDF document here; if it is a tagged PDF, we can look into it and try to provide a solution.

We are sorry for the inconvenience caused.

Best Regards,

Hi Osama,

In addition to the above reply, you can convert the PDF to Excel using Aspose.Pdf for Java and then extract the row and column data from the Excel workbook using Aspose.Cells for Java. Please refer to the Aspose.Cells documentation for this purpose.

Please feel free to contact us for any further assistance.

Best Regards,