Extract & Save tables in PDF document

Babu007 · January 20, 2023, 6:07am

Hi Team,

I am using the following code to extract tables from a PDF:
// Required namespaces
using System;
using Aspose.Pdf.Text;

public static void Extract_Table()
{
    // Load source PDF document
    Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"c:\tmp\the_worlds_cities_in_2018_data_booklet 7.pdf");
    foreach (Aspose.Pdf.Page page in pdfDocument.Pages)
    {
        // Detect the tables on the current page
        TableAbsorber absorber = new TableAbsorber();
        absorber.Visit(page);
        foreach (AbsorbedTable table in absorber.TableList)
        {
            foreach (AbsorbedRow row in table.RowList)
            {
                foreach (AbsorbedCell cell in row.CellList)
                {
                    // Print the text of every fragment found in the cell
                    TextFragmentCollection textFragmentCollection = cell.TextFragments;
                    foreach (TextFragment fragment in textFragmentCollection)
                    {
                        string txt = "";
                        foreach (TextSegment seg in fragment.Segments)
                        {
                            txt += seg.Text;
                        }
                        Console.WriteLine(txt);
                    }
                }
            }
        }
    }
}

However, I am unable to save an AbsorbedTable to a new file. I need to save only the tables from the source document (document.pdf, 216.4 KB)
to an output document (an empty one). Please help me.

asad.ali · January 20, 2023, 5:34pm

@Babu007

Do you want to convert the resulting document into other office formats later as well, or do you only need to extract the tables and save them as a separate PDF document?

Babu007 · January 23, 2023, 7:39am

Hi,
Thank you for the fast response. If it is possible to convert to various formats, I would be glad to know about it, but I prefer to save as a PDF document after extraction. Basically, one way or another, I want to separate/copy only the tables from a PDF document into another document.

asad.ali · January 23, 2023, 6:35pm

@Babu007

We are checking it and will get back to you shortly.

asad.ali · January 23, 2023, 7:06pm

@Babu007

Please check the supported formats section in the API documentation to convert the PDF document into other file formats. You will need to extract the non-table data using the ParagraphAbsorber class and remove it so that only the tables are left in the PDF. Once that is done, you can convert the file into any other supported file format.

Regarding moving the tables to a new PDF document, this requirement needs to be investigated for feasibility. We have opened the following ticket in our internal issue tracking system and will analyze its feasibility according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-53548

You can obtain Paid Support services if you need support on a priority basis, along with direct access to our Paid Support management team.
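
While the ticket is under investigation, one possible stopgap is sketched below. It is not the ParagraphAbsorber approach described above and not a confirmed solution: it copies each source page into a new document once per detected table and sets the copy's CropBox to the table's rectangle, so only the table area is visible. The non-table content is hidden rather than removed, and rotated pages or misdetected table bounds may give wrong results. The class name, method name, and the inputPath and outputPath parameters are hypothetical and chosen only for illustration.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static class TableCropSketch
{
    // inputPath and outputPath are hypothetical parameters for illustration.
    public static void CopyTablesByCropping(string inputPath, string outputPath)
    {
        using (Document source = new Document(inputPath))
        using (Document output = new Document())
        {
            foreach (Page page in source.Pages)
            {
                // Detect the tables on the current page
                TableAbsorber absorber = new TableAbsorber();
                absorber.Visit(page);

                foreach (AbsorbedTable table in absorber.TableList)
                {
                    // Copy the whole page, then restrict its visible area
                    // to the table. This hides the rest of the page; it does
                    // not delete it from the output file.
                    Page copy = output.Pages.Add(page);
                    copy.CropBox = table.Rectangle;
                }
            }

            // Save only if at least one table was found
            if (output.Pages.Count > 0)
            {
                output.Save(outputPath);
            }
        }
    }
}
```

Each table ends up on its own page in the output. Because the hidden content is still present in the file, this would not be suitable if the surrounding text must not be recoverable, or if the result is later converted to another format.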