Table Absorber - Absorbed table rows are merged into one

Hi Team,

We recently purchased an Aspose.Total license and are experiencing an issue with the TableAbsorber functionality. Although there is enough space between the rows, all rows are merged into a single row. Please look into this issue and provide a solution as soon as possible.

11850_New.pdf (143.2 KB)

Thanks,
M.S.Sathish.

@parthiveera
Could you share the code you are using that produces the described result?

Hi ilya,

Please see the attached document and the C# code file we used with the PDF TableAbsorber. Please help me extract the table from the document with proper row separation.

ExtractMarkedtable.pdf (6.3 KB)

11850_New.pdf (143.2 KB)

Thanks
M.S.Sathish
9176398138

@parthiveera
Could you describe the result you expect from the TableAbsorber output?
It seems that TableAbsorber extracts the rows as multiple small tables, which doesn't look like all rows being merged into one.
The coordinates of the extracted TextFragments also seem correct.

Hi @ilyazhuykov,

Please see the attached image for the rows we expect to extract from the document.
result.PNG (15.0 KB)

Thanks
M.S.Sathish

@parthiveera
OK, I’ll investigate whether it is possible to extract the table as three rows, as in the picture,
and will write to you as soon as there is any news.

@parthiveera
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-56996

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

Hi ilya.zhuykov

When can I expect an update on this issue? I can only develop the logic in my application once I have that update, so please provide it as soon as possible.

Thanks
M.S.Sathish

@parthiveera
Unfortunately, for now I can’t say exactly when this issue will be resolved.
I added it with maximum priority and marked it as a paid customer issue, so it should be prioritized by the development team.
If I find a workaround or there is any news, I’ll contact you as soon as possible.

@parthiveera
I found a workaround that may resolve your problem.
Try the following code:

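// page: the Aspose.Pdf.Page to search, e.g. pdfDocument.Pages[1] (page indexes are 1-based)
Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
absorber.UseFlowEngine = true; // use the FlowEngine for table detection
absorber.Visit(page);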

The output seems close to what you expected.

Hi ilya.zhuykov,

We have already tried this workaround, but we are not getting the intended result: all rows are returned separately, one by one, instead of being grouped as expected. Please see the attached TableAbsorber output.

current_result_after enabling useflowengine.PNG (18.7 KB)

expected results.PNG (15.4 KB)

Thanks
M.S.Sathish

@parthiveera
Could you share which version of Aspose.PDF you are using?
I tried the following code with the latest version:

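using System;
using System.Linq;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// InputFolder: path to the folder containing the sample PDF (defined elsewhere)
var input = InputFolder + "table_sample.pdf";
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(input);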
foreach (Page page in pdfDocument.Pages)
{
    Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
    absorber.UseFlowEngine = true;
    absorber.Visit(page);
    foreach (AbsorbedTable table in absorber.TableList)
    {
        foreach (AbsorbedRow row in table.RowList)
        {
            Console.WriteLine("Row");
            Console.WriteLine();
            foreach (AbsorbedCell cell in row.CellList)
            {
                // Print the cell's fragments from left to right
                var fragments = cell.TextFragments.OrderBy(x => x.Position.XIndent);
                foreach (TextFragment fragment in fragments)
                {
                    string txt = "";
                    foreach (TextSegment seg in fragment.Segments)
                    {
                        txt += seg.Text;
                    }
                    Console.Write(txt);
                    Console.Write(" ");
                }
            }
            Console.Write(" ");
            Console.WriteLine();
        }
    }
}

and the result differs from yours:
output_24.4.png (8.7 KB)
Perhaps the issue can be resolved by upgrading to the latest version.
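If upgrading is not immediately possible, one client-side option (our own suggestion, not an Aspose.PDF feature) is to regroup the extracted text lines into logical rows, starting a new row wherever the vertical gap to the row above exceeds a threshold. This follows the original report that the rows are separated by enough space. The sketch below is self-contained: the `Line` record and the `RowGrouper.GroupByGap` helper are hypothetical names introduced here, and the coordinates are assumed to be in PDF user space, where Y grows upward.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input type (not part of Aspose.PDF): one extracted text line
// and its vertical extent. Assumes PDF user space, where Y grows upward,
// so Top > Bottom.
public record Line(string Text, double Top, double Bottom);

public static class RowGrouper
{
    // Groups lines into logical rows. Lines are visited from the top of the
    // page downward; a new row starts whenever the gap between the lowest
    // edge of the current row and the top of the next line exceeds minRowGap.
    public static List<List<Line>> GroupByGap(IEnumerable<Line> lines, double minRowGap)
    {
        var rows = new List<List<Line>>();
        List<Line>? current = null;
        double rowBottom = 0;

        foreach (var line in lines.OrderByDescending(l => l.Top))
        {
            // Positive gap: empty space between the current row and this line.
            // Negative gap: the line overlaps the row vertically (e.g. another column).
            double gap = rowBottom - line.Top;
            if (current == null || gap > minRowGap)
            {
                current = new List<Line>();
                rows.Add(current);
                rowBottom = line.Bottom;
            }
            else
            {
                // Extend the current row down to the lowest line it contains.
                rowBottom = Math.Min(rowBottom, line.Bottom);
            }
            current.Add(line);
        }
        return rows;
    }
}
```

To use it, a `Line` could be built from each extracted `TextFragment`, for example from its `Rectangle.URY` (top) and `Rectangle.LLY` (bottom) values; which coordinates work best for this document is an assumption to verify. `minRowGap` has to be tuned to the document's actual spacing: it must be larger than the gap between lines within one row and smaller than the gap between rows.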

The issue you found earlier (filed as PDFNET-56996) has been fixed in Aspose.PDF for .NET 24.7. This message was posted using the Bugs notification tool by asad.ali.

@parthiveera
The FlowEngine has been updated; please check the results using the following setting:

absorber.UseFlowEngine = true;