Free Support Forum - aspose.com

PDF Table Extraction from Bank Statement

Hi, when I try to extract the table from the bank statement, the data is extracted, but it is not aligned in rows the way it is in the attached PDF.
bankstatement_pg1.pdf (231.3 KB)
How do I align the extracted data based on the PDF Bank Statement?

@jssauva

Thanks for contacting support.

Would you kindly share the code snippet that you used to extract the table data from the shared PDF? Please also share the expected output (if possible) so that we can test the scenario in our environment and address it accordingly.

Here’s a snippet of my code:

// Load the uploaded PDF into an Aspose.PDF document.
Stream strm = new MemoryStream(fuPdfUpload.FileBytes);
var pdfdoc = new Document(strm);

string fragCombined = "";
TextFragment txtFrag;
var absrb = new TableAbsorber();
int rwIndex = 0;
var sb = new StringBuilder();

sb.Append("<table>");

// Page indexes start at 1.
for (int pgCount = 1, loopTo = pdfdoc.Pages.Count; pgCount <= loopTo; pgCount++)
{
    absrb.Visit(pdfdoc.Pages[pgCount]);

    if (pgCount == 1)
    {
        // Header row: number the columns after the first row of the first table.
        sb.Append("<tr>");
        for (int x = 0, loopTo1 = absrb.TableList[0].RowList[0].CellList.Count - 1; x <= loopTo1; x++)
            sb.Append("<th>" + (x + 1) + "</th>");
        sb.Append("</tr>");
    }
}

// Emit one <tr> per absorbed row and one <td> per absorbed cell.
for (int tbCount = 0, loopTo2 = absrb.TableList.Count - 1; tbCount <= loopTo2; tbCount++)
{
    for (int rwCount = 0, loopTo3 = absrb.TableList[tbCount].RowList.Count - 1; rwCount <= loopTo3; rwCount++)
    {
        sb.Append("<tr>");
        for (int clCount = 0, loopTo4 = absrb.TableList[tbCount].RowList[rwCount].CellList.Count - 1; clCount <= loopTo4; clCount++)
        {
            fragCombined = "";
            // Text fragment indexes start at 1; join a cell's fragments with line breaks.
            for (int FgCount = 1, loopTo5 = absrb.TableList[tbCount].RowList[rwCount].CellList[clCount].TextFragments.Count; FgCount <= loopTo5; FgCount++)
            {
                txtFrag = absrb.TableList[tbCount].RowList[rwCount].CellList[clCount].TextFragments[FgCount];
                fragCombined = fragCombined + txtFrag.Text + "<br/>";
            }
            sb.Append("<td>" + fragCombined + "</td>");
        }
        rwIndex += 1;
        sb.Append("</tr>");
    }
}

sb.Append("</table>");

dtGrid.Text = sb.ToString();

Result: Screen Shot 2020-02-01 at 11.34.10 AM.png (176.9 KB)

What I need: Screen Shot 2020-02-01 at 11.53.44 AM.png (53.1 KB)
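
One possible way to get from the first screenshot to the second is to regroup the extracted text by vertical position, so that everything printed on the same visual line ends up in the same row. The sketch below is only an illustration of that idea and makes several assumptions: `Fragment` is a hypothetical stand-in for an extracted text fragment (it is not an Aspose type), the Y coordinate follows the PDF convention of growing upward, and a tolerance of 2 units is a guess that would need tuning for a real statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an extracted text fragment: its text and the
// X/Y position of its baseline on the page (Y grows upward, as in PDF).
public sealed class Fragment
{
    public string Text { get; }
    public double X { get; }
    public double Y { get; }

    public Fragment(string text, double x, double y)
    {
        Text = text;
        X = x;
        Y = y;
    }
}

public static class RowGrouper
{
    // Groups fragments into visual rows. A fragment joins the current row when
    // its Y is within `tolerance` of that row's first fragment; otherwise it
    // starts a new row. Rows come out top to bottom, cells left to right.
    public static List<List<Fragment>> GroupIntoRows(
        IEnumerable<Fragment> fragments, double tolerance = 2.0)
    {
        var rows = new List<List<Fragment>>();

        // Larger Y is higher on the page, so descending Y means top first.
        foreach (var frag in fragments.OrderByDescending(f => f.Y))
        {
            var current = rows.LastOrDefault();
            if (current != null && Math.Abs(current[0].Y - frag.Y) <= tolerance)
                current.Add(frag);
            else
                rows.Add(new List<Fragment> { frag });
        }

        // Order the cells of each row from left to right.
        return rows.Select(r => r.OrderBy(f => f.X).ToList()).ToList();
    }
}
```

With made-up sample values, fragments at Y = 700.5, 700.0 and 699.8 would fall into one row and fragments at Y = 680.0 into the next. Each resulting row could then be written out as one `<tr>`, with one `<td>` per fragment.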

@jssauva

We are testing the scenario and will get back to you shortly.

I got this using an online PDF-to-Excel converter. I don’t know whether Aspose.PDF can do the same.

Screen Shot 2020-02-04 at 2.11.53 PM.png (179.2 KB)

@jssauva

Aspose.PDF for .NET can also convert PDF to XLS/XLSX. Please try this feature to see whether it meets your requirements, and share your feedback with us so that we can proceed accordingly.
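
For reference, a minimal sketch of such a conversion might look like the following. It assumes the Aspose.PDF for .NET package is referenced. The class name, method name and file paths are placeholders chosen for illustration, and how closely the resulting worksheet matches the statement layout would need to be checked against the actual file.

```csharp
using Aspose.Pdf;

public static class PdfToExcelSketch
{
    // Placeholder helper: converts the PDF at pdfPath to an XLSX file at xlsxPath.
    public static void Convert(string pdfPath, string xlsxPath)
    {
        // Load the source PDF, for example the bank statement page.
        using (var document = new Document(pdfPath))
        {
            // Ask for XLSX output explicitly instead of relying on the default format.
            var options = new ExcelSaveOptions
            {
                Format = ExcelSaveOptions.ExcelFormat.XLSX
            };

            document.Save(xlsxPath, options);
        }
    }
}
```

A call such as `PdfToExcelSketch.Convert("bankstatement_pg1.pdf", "bankstatement_pg1.xlsx")` would then write the spreadsheet next to the input file.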