When reading a table with TableAbsorber in Aspose.PDF, I am facing an issue: when the table has merged cells, the merged part is detected as a separate table.
I have one table, but it is reported as more than one table. I have attached the PDF that contains the single table.
Aspose merged table red as multiple table.pdf (281.3 KB)
I tried this code using Aspose.PDF:
// Required namespaces (place at the top of the file):
// using System;
// using Aspose.Pdf.Text;

public static void Extract_Table()
{
    // Load source PDF document
    Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"c:\tmp\the_worlds_cities_in_2018_data_booklet 7.pdf");
    foreach (var page in pdfDocument.Pages)
    {
        // Detect the tables on the current page
        Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
        absorber.Visit(page);
        foreach (AbsorbedTable table in absorber.TableList)
        {
            foreach (AbsorbedRow row in table.RowList)
            {
                foreach (AbsorbedCell cell in row.CellList)
                {
                    TextFragmentCollection textFragmentCollection = cell.TextFragments;
                    foreach (TextFragment fragment in textFragmentCollection)
                    {
                        // Concatenate the segments of the fragment into one string
                        string txt = "";
                        foreach (TextSegment seg in fragment.Segments)
                        {
                            txt += seg.Text;
                        }
                        Console.WriteLine(txt);
                    }
                }
            }
        }
    }
}
I created this sample code:
// Required namespaces (place at the top of the file):
// using Aspose.Pdf;
// using Aspose.Pdf.Text;

private void Logic()
{
    Document doc = new Document($"{PartialPath}_input.pdf");
    foreach (var page in doc.Pages)
    {
        Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
        absorber.Visit(page);
        int countTable = 0;
        int countRow = 0;
        int countCell = 0;
        foreach (AbsorbedTable table in absorber.TableList)
        {
            countTable++;
            countRow = 0;
            foreach (AbsorbedRow row in table.RowList)
            {
                countRow++;
                countCell = 0;
                foreach (AbsorbedCell cell in row.CellList)
                {
                    // Advance the cell index (the original never incremented it)
                    countCell++;
                    TextFragmentCollection textFragmentCollection = cell.TextFragments;
                    foreach (TextFragment fragment in textFragmentCollection)
                    {
                        // Overwrite the cell text with its detected table/row/cell index
                        fragment.Text = $"T{countTable};R:{countRow};C:{countCell}";
                        fragment.TextState.FontSize = 12;
                        fragment.TextState.ForegroundColor = Color.Black;
                        //fragment.Segments.Clear();
                        //int countSegment = 0;
                        //foreach (TextSegment seg in fragment.Segments)
                        //{
                        //    seg.Text = $"S:{countSegment}";
                        //}
                    }
                }
            }
        }
    }
    doc.Save($"{PartialPath}_output.pdf");
}
The code is simple, but it is supposed to help visualize which tables are detected, and it did not work properly. So I will be creating a bug report for the dev team.
@vijayanathan
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-53877
You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.