We’re continuing to evaluate the PDF product. We’re using it primarily to extract table data. One of the tables (see attached) has rows that contain split cells. When I use the TableAbsorber and loop through the resulting TextFragmentCollection, only the ‘top’ text is contained in the collection. Is it possible to get both the top and bottom of the split cell?
The pdf document is here: https://eaip.isavia.is/A_04-2024_2024_07_11/
(Select the PDF icon in the upper left. The table is on page 1 (BIAR AD 2.2).)
SplitTableCells.png (49.5 KB)
thanks,
Pat
@pmaneely
We are afraid that we could not find the PDF at the given link. Can you please share a direct download link for the PDF?
@pmaneely
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-57635
You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.
@pmaneely
The text of the “split” cells is actually extracted by the table absorber, but the default recognition engine is unable to properly position it, so all these cells are pushed to the end of the table.
Please use the TableAbsorber.UseFlowEngine property and see if the result satisfies your requirements:
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// "input.pdf" is a placeholder path (an assumption); point it at your own document.
Document pdfDocument = new Document("input.pdf");

TableAbsorber absorber = new TableAbsorber();
// Enable the Flow recognition engine
absorber.UseFlowEngine = true;

// Pages are 1-based. Index 5 is the page used in this reply;
// change it to the page that holds your table.
absorber.Visit(pdfDocument.Pages[5]);

foreach (AbsorbedTable table in absorber.TableList)
{
    foreach (AbsorbedRow row in table.RowList)
    {
        foreach (AbsorbedCell cell in row.CellList)
        {
            // Each cell exposes the text fragments recognized inside it.
            TextFragmentCollection textFragmentCollection = cell.TextFragments;
            foreach (TextFragment fragment in textFragmentCollection)
            {
                Console.WriteLine(fragment.Text);
            }
        }
    }
}