Table Extraction from PDF

When a PDF is converted to Excel and there is a larger gap between two rows, we need that gap preserved, either as empty rows or as consistent cell padding, so that we can identify the rows correctly. Is there a way to do this in the Aspose.PDF PDF-to-Excel conversion, or is there an alternative?

@sathish.sundaresan

Are you doing a PDF to Excel conversion? If so, why are we talking about TableAbsorber?
It would help if you could attach the document and the code snippet you used.

Hi Sergei,

Please see the attachments for the documents and the C# code file we used with TableAbsorber. Please help me extract the table from the document.

05107_1 1.pdf (466.7 KB)
33333.pdf (106.7 KB)

ExtractMarkedtable.pdf (6.3 KB)

Thanks
M.S.Sathish
9176398138

@sathish.sundaresan
Thank you, I will study the information provided and write to you tomorrow.

@sathish.sundaresan
In document 05107_1 1.pdf, all characters are stored as paths (i.e., drawn as graphics rather than as text). As a result, the document as it stands cannot be converted to Excel and is not processed by TableAbsorber.

To extract tables from document 33333.pdf, I used the following code:

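using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that contains the sample file
var pdfDocument = new Document(dataDir + "33333.pdf");
var tableAbsorber = new TableAbsorber();

tableAbsorber.UseFlowEngine = true;

// Visit first page with absorber (page numbering starts at 1)
tableAbsorber.Visit(pdfDocument.Pages[1]);

foreach (AbsorbedTable table in tableAbsorber.TableList)
{
    Console.WriteLine("_______ Table __________");
    foreach (AbsorbedRow row in table.RowList)
    {
        // Print the row as |cell|cell|...|
        Console.Write("|");
        foreach (AbsorbedCell cell in row.CellList)
        {
            // A cell can hold several text fragments; concatenate them
            string text = "";
            foreach (TextFragment textFragment in cell.TextFragments)
            {
                text += textFragment.Text;
            }
            Console.Write(text + "|");
        }
        // End the row line before printing the separator
        Console.WriteLine();
        Console.WriteLine("---------------------------------------------------");
    }
}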

As you can see from the attached screenshot 1.png (6.7 KB), the output is correct.
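On the original question about keeping larger gaps between rows: the thread does not name a built-in option for this, so the following is only a rough sketch of one possible post-processing step, not an Aspose.PDF feature. It works on plain top/bottom coordinates and estimates how many blank rows fit into each vertical gap. `RowGapHelper` and `BlankRowsBefore` are hypothetical names. Feeding it from the row rectangles of the absorbed table (for example the URY/LLY values of `AbsorbedRow.Rectangle`) is an assumption that should be checked against the API reference for your version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Aspose.PDF.
public static class RowGapHelper
{
    // rows: top and bottom y-coordinates of each row, ordered top to bottom.
    // PDF coordinates grow upward, so Top > Bottom and later rows have smaller values.
    // Returns, for each row, the number of blank rows to insert before it.
    public static List<int> BlankRowsBefore(IReadOnlyList<(double Top, double Bottom)> rows)
    {
        var blanks = new List<int>(rows.Count);
        if (rows.Count == 0) return blanks;

        // Use the median row height as the size of one "typical" row.
        var heights = rows.Select(r => r.Top - r.Bottom).OrderBy(h => h).ToList();
        double typicalHeight = heights[heights.Count / 2];

        blanks.Add(0); // nothing is inserted before the first row
        for (int i = 1; i < rows.Count; i++)
        {
            // Vertical space between the previous row's bottom and this row's top.
            double gap = rows[i - 1].Bottom - rows[i].Top;
            int count = (typicalHeight > 0 && gap > 0)
                ? (int)Math.Floor(gap / typicalHeight)
                : 0;
            blanks.Add(count);
        }
        return blanks;
    }
}
```

For example, three rows of height 10 where the last one sits 20 units below the second would yield two blank rows before the third row. Using the median height is a design choice to keep one unusually tall row from skewing the estimate.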

Thanks, sergei.shibanov. Let me check this solution; if there are any issues, I will let you know.

@sathish.sundaresan
Yes, sure

Hi,

Thank you for the solution. It is now working properly, but I have another issue: when we use TableAbsorber, some cells are split into two because there is a space in the data. Is there a property to handle this space, or to merge cells depending on the gap or the column header value? Please check my attached document and let me know if anything else is needed.
MicrosoftTeams-image (4).png (6.9 KB)
04129_2.pdf (204.6 KB)

Thanks
M.S.Sathish

Hi sergei.shibanov,

Is there any update on my issue? Please let me know as soon as possible.

Thanks
M.S.Sathish

@sathish.sundaresan
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-56763

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
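For the split-cell problem reported above (PDFNET-56763), the thread does not mention a property that merges cells. Until that ticket is resolved, one possible workaround is to merge neighbouring cells in your own code when the horizontal gap between them is small. The sketch below is an illustration only: `CellMergeHelper`, `MergeByGap` and the `maxGap` threshold are hypothetical, and it works on plain left/right coordinates. Taking those coordinates from `AbsorbedCell.Rectangle` (LLX/URX) is an assumption to verify against the API reference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Aspose.PDF.
public static class CellMergeHelper
{
    // cells: left/right x-coordinates and text of each cell in one row.
    // maxGap: largest horizontal gap (in the same units) that still counts as one cell.
    public static List<(double Left, double Right, string Text)> MergeByGap(
        IEnumerable<(double Left, double Right, string Text)> cells, double maxGap)
    {
        var merged = new List<(double Left, double Right, string Text)>();

        // Process cells from left to right.
        foreach (var cell in cells.OrderBy(c => c.Left))
        {
            if (merged.Count > 0 && cell.Left - merged[merged.Count - 1].Right <= maxGap)
            {
                // Gap is small: treat this cell as a continuation of the previous one.
                var last = merged[merged.Count - 1];
                merged[merged.Count - 1] = (
                    last.Left,
                    Math.Max(last.Right, cell.Right),
                    (last.Text + " " + cell.Text).Trim());
            }
            else
            {
                // Gap is large (or this is the first cell): start a new cell.
                merged.Add(cell);
            }
        }
        return merged;
    }
}
```

A suitable `maxGap` depends on the font size and column spacing of the document, so it would need tuning per layout; a value that is too large will merge genuinely separate columns.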

Hi Sergey.

Is there a way to check whether an absorbed table has a border or not? If so, how can we use that function in our code? Please use the same code that I previously gave.

Thank you
M.S. Sathish.

@sathish.sundaresan
TableAbsorber may use information about existing borders during its analysis, but it does not expose any of that information.

Hi sergei,

I understand, but if you can indicate whether the table has a full border or a partial border, it will help us fine-tune our logic to extract the correct table from the absorber. Please verify and let me know which property tells whether the border is enabled or not.

thanks
M.S.Sathish

@sathish.sundaresan
I talked to the development team - there is no such option now.
I’ll create a task for them to provide table border information from AbsorbedCell. According to them, if the task is filed for a user with a purchased license, they will implement it fairly quickly for the UseFlowEngine mode.

@sathish.sundaresan
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-56829

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Hi,

We have purchased an Aspose.Total license, and I still need a solution to this problem. If needed, please check the login information below for the purchased license.

My Login name: parthiban.veerappan@ant.works

Thanks
M.S.Sathish

@sathish.sundaresan
For both tasks, I checked that the user had purchased a license. Regarding task PDFNET-56829 ("Add the ability to get information about borders when working with TableAbsorber"), I wrote to the development team: they will not have time for version 24.04, but it will most likely be in 24.05.

The issues you have found earlier (filed as PDFNET-56829) have been fixed in Aspose.PDF for .NET 24.5.