Free Support Forum - aspose.com

Extract tables with merged cells from pdf using Aspose JAVA

I tried to extract tables that contain merged cells from a PDF file, but I couldn’t get the correct results. Please find my source code below.

package aspose;

import com.aspose.pdf.*;

public class App {
    public static void main(String[] args) {
        Document doc = new Document("RIVP000C8E3B.pdf");

        try {
            TableAbsorber absorber = new TableAbsorber();
            PageCollection pc = doc.getPages();
            for (Page pg : pc) {
                // Scan the current page for tables before reading the results.
                absorber.visit(pg);

                // Iterate directly over the returned lists so the Aspose-internal
                // collection types do not have to be named.
                for (AbsorbedTable table : absorber.getTableList()) {
                    for (AbsorbedRow row : table.getRowList()) {
                        for (AbsorbedCell cell : row.getCellList()) {
                            for (TextFragment tf : cell.getTextFragments()) {
                                for (TextSegment ts : tf.getSegments()) {
                                    // Assumed loop body: print the text of each segment.
                                    System.out.println(ts.getText());
                                }
                            }
                        }
                    }
                }
            }
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }
}

Thank you for your help.


Could you please share your sample PDF document with us? We will test the scenario in our environment and address it accordingly.

Thank you very much for your reply. Please find attached an example of my file format and the original source code.
I want to extract all the tables in the file, without the text between them, the footer, the heading, or the page number. However, I got unexpected results; they are in the file included in sourceCode.zip.
Thank you again for helping me.
Test.pdf (225.3 KB)
sourceCode.zip (218.7 KB)


Thanks for sharing the requested files.

The API extracts a table from a PDF document in the way it was added at the time of PDF generation. We have noticed that the PDF was created using MS Word, and the API was unable to extract the text correctly from the table cells: the sequence of the extracted cells and their text was not correct.

Therefore, we have logged an investigation ticket as PDFJAVA-38850 in our issue tracking system. We will look further into the details of this issue and keep you posted on the status of its resolution. Please be patient and allow us some time.
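While the ticket is open, one possible workaround for the wrong cell sequence is to re-order the extracted cells by their position on the page. The following is only a rough sketch under stated assumptions, not the library's own method: `ReadingOrder` and `Cell` are hypothetical names, and `Cell` is a stand-in for whatever text and bounding-box values you can read from each extracted cell (left edge and top edge, in PDF coordinates where y grows upwards). Cells whose top edges lie within a tolerance are treated as one row, so a merged cell that spans several rows is placed in the first row it covers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ReadingOrder {

    /** Hypothetical stand-in for an extracted cell: its text plus the left and top edges of its box. */
    public static final class Cell {
        public final String text;
        public final double left; // x of the left edge
        public final double top;  // y of the top edge (PDF origin is bottom-left)

        public Cell(String text, double left, double top) {
            this.text = text;
            this.left = left;
            this.top = top;
        }
    }

    /**
     * Returns the cells ordered top-to-bottom, then left-to-right.
     * Cells whose top edge is within rowTolerance of the first cell in a row belong to that row.
     */
    public static List<Cell> sort(List<Cell> cells, double rowTolerance) {
        List<Cell> byTop = new ArrayList<>(cells);
        // Highest top edge first, because y grows upwards on a PDF page.
        byTop.sort(Comparator.comparingDouble((Cell c) -> -c.top));

        List<Cell> result = new ArrayList<>();
        List<Cell> row = new ArrayList<>();
        for (Cell c : byTop) {
            // Start a new row once the top edge drops below the current row's band.
            if (!row.isEmpty() && Math.abs(row.get(0).top - c.top) > rowTolerance) {
                row.sort(Comparator.comparingDouble((Cell x) -> x.left));
                result.addAll(row);
                row.clear();
            }
            row.add(c);
        }
        // Flush the last row.
        row.sort(Comparator.comparingDouble((Cell x) -> x.left));
        result.addAll(row);
        return result;
    }

    public static void main(String[] args) {
        // Cells deliberately listed out of order, with a slightly uneven top edge in row 1.
        List<Cell> cells = Arrays.asList(
                new Cell("B2", 200, 680),
                new Cell("A1", 100, 700.4),
                new Cell("B1", 200, 700),
                new Cell("A2", 100, 680));

        StringBuilder sb = new StringBuilder();
        for (Cell c : sort(cells, 2.0)) {
            sb.append(c.text).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints: A1 B1 A2 B2
    }
}
```

The rows are grouped in a single pass instead of using a comparator with a tolerance, because a tolerance-based comparator is not transitive and can give inconsistent results when sorting. The tolerance value itself (2.0 here) is a guess and would need tuning for the actual document.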

We are sorry for the inconvenience.

Thank you very much for your support, I will wait for your new results because I really want to use your API.
Best regards.


The issue has just been logged in our issue tracking system and it has low priority. We will investigate it on a first-come, first-served basis and will let you know about the investigation result. Please allow us some time.

I can’t download the source code. Can you help me, please?
image.png (1.4 KB)


You are not the thread owner, which is why you are unable to download the source code. You can download it from here.

thank you very much!!! :blush::blush::blush::blush: