Unable to get table data

Hi,

I have attached a screenshot of a sample table. I am unable to get the content of the table.
I am using Aspose.PDF version 10.9.0.

Please share sample C# code.

image.png (7.8 KB)

@jitendra1

Would you kindly use the latest version of the API, i.e. Aspose.PDF for .NET 19.9, and in case you still face any issue, please share your sample PDF document along with the complete code snippet you are using. We will test the scenario in our environment and address it accordingly.

I have already tested the latest version of the API, i.e. Aspose.PDF for .NET 19.9, but we face the same issue. Please suggest a solution and share sample code in C#.NET.

@jitendra1

Kindly share the sample source PDF document along with sample code snippet with us. We will test the scenario in our environment and address it accordingly.

Here is my sample code along with the source PDF document: DataFile.pdf (28.5 KB)

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

public void readPDFtable()
{
    // Apply the license before loading any document
    License license = new License();
    license.SetLicense("XXXXXXXXXXXXXXX");

    Document doc = new Document("C:/Users/vimlesh/Downloads/DataFile.pdf");
    long cpt = 0;
    Console.WriteLine("Begin");

    foreach (Page page in doc.Pages)
    {
        try
        {
            // Fresh absorber per page, so TableList only holds this page's tables
            TableAbsorber absorber = new TableAbsorber();
            absorber.Visit(page);

            // Tables
            foreach (AbsorbedTable table in absorber.TableList)
            {
                // Rows
                foreach (AbsorbedRow row in table.RowList)
                {
                    // Cells
                    foreach (AbsorbedCell cell in row.CellList)
                    {
                        foreach (TextFragment text in cell.TextFragments)
                        {
                            cpt++;
                            Console.WriteLine(cpt + " - " + text.Text);
                        }
                    }
                }
            }
        }
        catch (Exception ex)
        {
            // Report the failure instead of silently skipping the page
            Console.WriteLine("Error while reading a page: " + ex.Message);
        }
    }
    Console.WriteLine("End");
    Console.Read();
}

@jitendra1

Could you please try following code snippet and see if you are able to get desired results. Please share your feedback with us so that we may proceed further to assist you accordingly.

// dataDir: folder that contains DataFile.pdf
Document pdfDocument = new Document(dataDir + "DataFile.pdf");
TableAbsorber absorber = new TableAbsorber();
// Pages is 1-based; only the first page is processed here
absorber.Visit(pdfDocument.Pages[1]);

foreach (AbsorbedTable table in absorber.TableList)
{
    foreach (AbsorbedRow row in table.RowList)
    {
        Console.WriteLine("----------------");
        for (int i = 0; i < row.CellList.Count; i++)
        {
            foreach (TextFragment text in row.CellList[i].TextFragments)
            {
                Console.Write(text.Text);
                Console.WriteLine("\n");
            }
        }
    }
}
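Both snippets above print every text fragment on its own line, which makes it hard to see which text belongs to which cell and row. As a minimal sketch, assuming the fragment texts of each cell have already been copied into plain strings (for example `text.Text` for each fragment in `cell.TextFragments`), the hypothetical helper below (`TableText` is not an Aspose.PDF type) prints one line per row with cells separated by ` | `:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Aspose.PDF: formats table text that was
// already extracted into plain strings as one line per row.
public static class TableText
{
    // Joins the fragment texts of one cell into a single string,
    // trimming each fragment and skipping empty ones.
    public static string CellText(IEnumerable<string> fragments) =>
        string.Join(" ", fragments.Select(f => f.Trim()).Where(f => f.Length > 0));

    // table[row][cell] holds the fragment texts of that cell.
    public static IEnumerable<string> RowLines(IEnumerable<IEnumerable<IEnumerable<string>>> table) =>
        table.Select(row => string.Join(" | ", row.Select(cell => CellText(cell))));
}
```

A cell that returns no fragments shows up as an empty slot between separators, which also makes the "no text in this cell" cases discussed later in the thread easy to spot.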

Hi,

I have tried your code, but I face the same issue. I have shared the output of the code; please help with reading the bordered table. DataFile.pdf (28.5 KB)
I have also shared the sample file.

Please help with this issue.

MicrosoftTeams-image (2).png (4.8 KB)
MicrosoftTeams-image (1).png (3.8 KB)

@jitendra1

Do you want to extract the table values which are written in blue? Would you please confirm that our understanding is correct so that we can proceed accordingly.

I want the complete table values, i.e. every cell, including those shown in blue. Please help with this.

@jitendra1

We also noticed that the API was unable to extract all data from the PDF and that the sequence of the extracted text was incorrect. Hence, we have logged an issue as PDFNET-47135 in our issue tracking system for correction. We will look further into its details and keep you posted on the status of its correction. Please be patient and spare us a little time.

We are sorry for the inconvenience.

I am able to read the table, but not the data in each of its cells. It always shows “Enumeration yielded no results”.

@Amol_Hekade

Could you please share your sample PDF document. We will test the case in our environment and address it accordingly.

AccumTermsheet_LGT(UBS).pdf (193.5 KB)
food.pdf (151.1 KB)

Please find the attached PDFs and help with a solution. Error.png (112.1 KB)

Is it possible to have a call to discuss this in detail? It would help resolve my problem on a priority basis, and it would also help fix the problem if it lies in the library.

I am able to read the table, but not the data in each of its cells. It always shows “Enumeration yielded no results” while reading data from a cell. Also, it recognizes only part of the data table, not the full table, which is split across two pages.

If a call is possible, please let me know what time suits you.

@Amol_Hekade

I’m sorry, we do not provide phone support under free support. Phone support is only provided under paid Enterprise Support or Business Support.

However, we’re going to investigate the issue you shared in detail and will update you soon. Please note that under free support, issues are handled on a first-come, first-served basis, but we’ll try to prioritize the investigation of this one.

Thanks for your reply. Please try to investigate as early as possible.

@Amol_Hekade

We have tested the scenario in our environment while using Aspose.PDF for .NET 20.10 and following code snippet:

Document pdfDocument = new Document(dataDir + "food.pdf");
foreach (Page page in pdfDocument.Pages)
{
    // A new absorber per page, so TableList contains only this page's tables
    Aspose.Pdf.Text.TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
    absorber.Visit(page);
    foreach (AbsorbedTable table in absorber.TableList)
    {
        foreach (AbsorbedRow row in table.RowList)
        {
            foreach (AbsorbedCell cell in row.CellList)
            {
                foreach (TextFragment fragment in cell.TextFragments)
                {
                    // Rebuild the fragment text from its segments
                    string txt = "";
                    foreach (TextSegment seg in fragment.Segments)
                    {
                        txt += seg.Text;
                    }
                    Console.WriteLine(txt);
                }
            }
        }
    }
}

The API was able to extract table data from one of the PDFs i.e. AccumTermsheet_LGT(UBS).pdf. Furthermore, it detected a table occurrence inside food.pdf but was unable to extract any text. Hence, an issue has been logged in our issue tracking system for this particular file as PDFNET-48939. We will further look into its details and keep you informed about its rectification status. Please give us some time.

We apologize for the inconvenience.

PS: We also tested the scenario with a valid license applied. Please make sure that you are using the API with a license. If you do not have one, please apply for a free 30-day temporary license to evaluate the API without any limitations.

Thanks for your prompt reply. Waiting for the PDFNET-48939 fix.

One more thing is needed: if a table is split across two pages, it must be read as a single table, not as two different tables. Otherwise it causes problems while parsing the PDF. This works when parsing Word documents but not PDFs. Please also look into this on priority.

Please try to set up a call; we need to discuss more scenarios like this.
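In the snippets in this thread, `TableAbsorber` is visited one page at a time, so a table that continues onto the next page comes back as two separate tables. A possible workaround for the request above is to post-process the extracted tables. The sketch below is a heuristic, not an Aspose.PDF feature: it assumes each table has been copied into a hypothetical `PageTable` object holding its 1-based page number and its rows as cell strings, listed in page order. A table is treated as a continuation when it sits on the page right after the previous table (so it is the first table on its page) and has the same number of columns; a header row repeated at the top of the continuation is dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type, not part of Aspose.PDF: one table extracted from a single page.
public sealed class PageTable
{
    public int Page { get; }                 // 1-based page number
    public List<List<string>> Rows { get; }  // Rows[row][cell] = cell text
    public PageTable(int page, List<List<string>> rows) { Page = page; Rows = rows; }
    public int ColumnCount => Rows.Count == 0 ? 0 : Rows[0].Count;
}

public static class TableMerger
{
    // Heuristic merge of tables that continue onto the next page.
    // Input must be ordered by page, then by position on the page.
    public static List<List<List<string>>> MergeAcrossPages(IReadOnlyList<PageTable> tables)
    {
        var merged = new List<List<List<string>>>();
        List<string> header = null; // first row of the table currently being built
        int prevPage = -1, prevColumns = -1;

        foreach (var t in tables)
        {
            // Because input is page-ordered, t.Page == prevPage + 1 means the previous
            // table was the last one on its page and t is the first one on the next page.
            bool continues = merged.Count > 0
                && t.Page == prevPage + 1
                && t.ColumnCount == prevColumns;

            if (continues)
            {
                var rows = t.Rows;
                // Drop a header row that the PDF repeats at the top of the new page.
                if (rows.Count > 0 && header != null && rows[0].SequenceEqual(header))
                    rows = rows.Skip(1).ToList();
                merged[merged.Count - 1].AddRange(rows);
            }
            else
            {
                // Copy the row list so later merges do not modify the input table.
                merged.Add(new List<List<string>>(t.Rows));
                header = t.Rows.Count > 0 ? t.Rows[0] : null;
            }
            prevPage = t.Page;
            prevColumns = t.ColumnCount;
        }
        return merged;
    }
}
```

The column-count check is only a simple guard against merging unrelated tables. If a page break cuts through a row, that row still comes out as two rows; joining those would need a further heuristic.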