A complex table.pdf (57.6 KB)
2023-03-28_215434.jpg (139.5 KB)
Hi,
I’m trying to read the cell text from the complex table in the attached PDF, but many of the results are incorrect or misplaced. Am I doing something wrong, or are there issues with…? I’m using the Aspose.Pdf v12.0 DLL for .NET in a .NET 3.5 project. This is my code:
// Requires: using System.Diagnostics; using System.Windows.Forms; using Aspose.Pdf.Text;
private void button1_Click(object sender, EventArgs e)
{
    Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"C:\Users\Administrator\Desktop\PdfToTxtDemo\Files\A complex table.pdf");
    var nCount = pdfDocument.Pages.Count + 1; // Pages is indexed from 1
    int nRow = 0, nCell = 0;
    string sCellText;
    bool bValidTable = true;
    for (int i = 1; i < nCount; i++)
    {
        Aspose.Pdf.Page page = pdfDocument.Pages[i];
        Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
        absorber.Visit(page);
        foreach (AbsorbedTable table in absorber.TableList)
        {
            nRow = 0;
            //Debug.Print("Table Rect: "+table.Rectangle.ToString());
            foreach (AbsorbedRow row in table.RowList)
            {
                /*if (row.CellList.Count < 5)
                {
                    bValidTable = false;
                    break;
                }*/
                bValidTable = true;
                nCell = 0;
                nRow++;
                Debug.Print("Row[" + nRow.ToString() + "]");
                foreach (AbsorbedCell cell in row.CellList)
                {
                    nCell++;
                    sCellText = "";
                    // Concatenate the text of every fragment found in this cell
                    foreach (TextFragment fragment in cell.TextFragments)
                    {
                        /*
                        var sb = new StringBuilder();
                        foreach (TextSegment seg in fragment.Segments)
                        {
                            sb.Append(seg.Text);
                        }
                        sCellText += sb.ToString();
                        */
                        sCellText += fragment.Text;
                    }
                    Debug.Print(" Cell[" + nCell.ToString() + "]:" + sCellText);// cell.Rectangle.ToString() + "," + sCellText);
                }
            }
            if (bValidTable)
            {
                Debug.Print("==============================================================================================");
            }
        }
    }
    pdfDocument.Dispose();
    pdfDocument = null;
    MessageBox.Show("OK");
}
Part of the output from running this code is shown below. The cell text is incorrect or misplaced:
Row[2]
Cell[1]:1 Project1 I’m trying to
Cell[2]:
Cell[3]:replace text in a pdf. It needs to keep the font used on the text, especially if it’s an embedded font.
Cell[4]: Kg 100 200 300
Cell[5]:
Cell[6]:
Cell[7]:
Cell[8]:
Cell[9]:
Thanks,
Shayoo