Can't read complex table cells with TableAbsorber

A complex table.pdf (57.6 KB)
2023-03-28_215434.jpg (139.5 KB)
Hi,
I’m trying to read the cell text from a complex table in a PDF, but many of the results are incorrect or misplaced. Am I doing something wrong, or are there issues with…? I’m using the Aspose.Pdf v12.0 DLL for .NET in a .NET 3.5 project. This is my code:

private void button1_Click(object sender, EventArgs e)
{
	Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"C:\Users\Administrator\Desktop\PdfToTxtDemo\Files\A complex table.pdf"); 

	var nCount = pdfDocument.Pages.Count + 1;
	int nRow = 0, nCell = 0;
	string sCellText;
	bool bValidTable= true;

	for (int i=1; i < nCount; i++)
	{
		Aspose.Pdf.Page page = pdfDocument.Pages[i];
		Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
		absorber.Visit(page);                

		foreach (AbsorbedTable table in absorber.TableList)
		{
			nRow = 0;
			//Debug.Print("Table Rect: "+table.Rectangle.ToString());
			foreach (AbsorbedRow row in table.RowList)
			{
				/*if (row.CellList.Count < 5)
				{
					bValidTable = false;
					break;
				}*/

				bValidTable = true;
				nCell = 0;
				nRow++;
				Debug.Print("Row[" + nRow.ToString() + "]");

				foreach (AbsorbedCell cell in row.CellList)
				{                            
					nCell++;                            
					sCellText = "";
					foreach (TextFragment fragment in cell.TextFragments)
					{
						/*
						var sb = new StringBuilder();
						foreach (TextSegment seg in fragment.Segments)
						{
							sb.Append(seg.Text);
						}
						sCellText += sb.ToString();
						*/
						sCellText += fragment.Text;
					}

					Debug.Print("    Cell[" + nCell.ToString() + "]:" +  sCellText);// cell.Rectangle.ToString() + "," + sCellText);
				}
			}

			if (bValidTable)
			{
				Debug.Print("==============================================================================================");
			}                    
		}
	}
	pdfDocument.Dispose();
	pdfDocument = null;
	MessageBox.Show("OK");
}

Part of the output from running the code is shown below; the cell contents are incorrect.

Row[2]
Cell[1]:1 Project1 I’m trying to
Cell[2]:
Cell[3]:replace text in a pdf. It needs to keep the font used on the text, especially if it’s an embedded font.
Cell[4]: Kg 100 200 300
Cell[5]:
Cell[6]:
Cell[7]:
Cell[8]:
Cell[9]:

Thanks,
Shayoo

@Shayoo,

Sadly, that old version of Aspose.PDF is no longer supported, so even if this is a bug, it won’t be fixed in that version.

The code seems fine. It is probably an issue with that specific version of the API.

Sadly, I cannot ask you to upgrade, since your framework version prevents you from using the latest release.

That version of the API is more than ten years old. If you are ever able to upgrade your framework, then I can provide support for you on our current version of the API, which is 23.3.

Thanks. I’ll try downloading the latest trial version of the Aspose.Pdf DLL and test reading this complex table with it.
