Hello,
I have an issue when trying to extract text from a PDF document. The text is in the cells of a table.
I used the TableAbsorber to iterate through the tables, rows, and cells. I do get text, but in some cells the retrieved text is truncated.
My code:
// pdfDocument is an Aspose.Pdf.Document that was loaded earlier
Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
absorber.Visit(pdfDocument.Pages[1]);
foreach (AbsorbedTable table in absorber.TableList)
{
    foreach (AbsorbedRow row in table.RowList)
    {
        foreach (AbsorbedCell cell in row.CellList)
        {
            // Print every text fragment found in the cell
            TextFragmentCollection textFragmentCollection = cell.TextFragments;
            foreach (TextFragment fragment in textFragmentCollection)
            {
                Console.WriteLine(fragment.Text);
            }
        }
    }
}
I have also uploaded the PDF I need to extract: list-gis-non-eu-countries-protected-in-eu_en.pdf (1.1 MB)
For example, when extracting the first cell of the table, I get “zë” instead of the word “Bulqizë”.
And all the text from the cells seems shattered…
Do you have any idea why this happens?
Regards,
Jerome