Convert Table in PDF to HTML Table

Raven_Hex · November 27, 2014, 3:59am

Hi,

If a PDF contains a table (the text within can be copied and pasted), will the table come out as a structured HTML table after the PDF is converted to HTML?

Regards,
Raven

tilal.ahmad · November 28, 2014, 12:55am

Hi Raven,

Thanks for your inquiry. Please note that in PDF to HTML conversion, table data is converted to text and table borders are rendered as images in the HTML output. Kindly check the PDF to HTML documentation for a code snippet and details. Moreover, could you please share some more details about your requirements so that we can look into them and guide you accordingly?

Best Regards,

Raven_Hex · November 30, 2014, 8:55am

Thank you for your response,

I have a requirement to convert PDF tables for data extraction.

Most of the tables follow a simple row-column layout.

HTML was chosen because we have an XSLT application that performs the data transformation quickly.

Depending on how much effort is required, extracting the information directly from the PDF is still under consideration.

tilal.ahmad · November 30, 2014, 10:24pm

Hi Raven,

Thanks for your feedback. In addition to your existing approach of converting PDF to HTML for table data extraction, you may convert the PDF to Excel using Aspose.Pdf for .NET and then extract the row/column data from the Excel workbook using Aspose.Cells for .NET. Please check the following Aspose.Cells documentation link for the purpose.
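Once the row/column data has been extracted, it could be written back out as a plain HTML table so that an existing XSLT application can still consume it. The following is only a minimal sketch of that last step, written in Python with the standard library rather than with any Aspose API. The function name rows_to_html_table and the sample rows are hypothetical, and reading the cells from the workbook is assumed to have happened already.

```python
import xml.etree.ElementTree as ET


def rows_to_html_table(rows):
    """Serialize a list of rows (each a list of cell values) as a <table> string.

    The output is well-formed XML, so an XSLT processor can read it directly.
    """
    table = ET.Element("table")
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for cell in row:
            td = ET.SubElement(tr, "td")
            # Empty cells (None) become empty <td> elements; other values are
            # converted to text, and ElementTree escapes characters like & and <.
            td.text = "" if cell is None else str(cell)
    return ET.tostring(table, encoding="unicode")


# Hypothetical rows standing in for data read from the converted workbook.
sample_rows = [["Item", "Qty"], ["Widget", 3], ["Nuts & bolts", None]]
print(rows_to_html_table(sample_rows))
```

Building the table through an XML API rather than string concatenation keeps the markup well-formed even when a cell contains characters such as & or <, which matters if the result is fed to XSLT.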

Please feel free to contact us for any further assistance.

Best Regards,