AbsorbedTable cannot extract the merged cells

Hi,
When I try to get merged cells that are positioned to the right of other cells, Aspose.PDF's TableAbsorber does not extract them. Is there a solution for this problem? Looking forward to your response.

In this screenshot, the cells circled in orange are the ones that Aspose cannot extract.

截圖 2025-04-29 下午5.08.10.jpg (310.4 KB)

Here is the sample PDF. Thanks a lot.
辦理壽險業務應行特別注意事項(113年2月修訂)-內網版-54-59辦理壽險業務應行特別注意事項(113年2月修訂)-內網版-54-59-1.pdf (354.3 KB)

@Rachel_Lee

It seems that you are experiencing issues with extracting merged cells using the Aspose PDF library’s TableAbsorber. This is a known challenge when dealing with tables that contain merged cells, as the TableAbsorber may not always correctly identify and extract these cells.

Here are a few suggestions that might help you resolve this issue:

  1. Use the Flow Engine: One potential solution is to enable the flow engine in the TableAbsorber. You can do this by setting the UseFlowEngine property to true. This can improve the recognition of tables and may help in extracting the merged cells more accurately. Here’s a code snippet to illustrate this:

    using Aspose.Pdf.Text;

    TableAbsorber absorber = new TableAbsorber();
    absorber.UseFlowEngine = true; // Enable the flow engine
    absorber.Visit(page); // 'page' is an Aspose.Pdf.Page taken from your document
    
  2. Check for Updates: Ensure that you are using the latest version of the Aspose PDF library. Sometimes, issues are resolved in newer releases, so updating to the latest version might provide better support for merged cells.

  3. Provide Feedback: If the problem persists, consider sharing a screenshot highlighting the merged cells along with the output you are currently receiving. This can help the support team understand the issue better and provide more targeted assistance.

  4. Explore Community Solutions: You may also want to check the Aspose community forums for similar issues. Other users may have encountered the same problem and could have shared their solutions or workarounds.

If you try these suggestions and still face difficulties, please feel free to reach out with more details, and I would be happy to assist further!

When I enable the flow engine, the row and column structure of the table becomes chaotic.
Is it possible to keep the table’s structure the same as when the flow engine is disabled, and still extract the cells?

@Rachel_Lee
I tried to reproduce this on the latest version with the following code:

using System.IO;
using System.Text;
using Aspose.Pdf.Text;

// InputFolder and OutputFolder are path prefixes defined elsewhere.
var input = InputFolder + "cell_extracting_issue.pdf";
var output = OutputFolder + "cell_extracting_issue.txt";

StringBuilder stringBuilder = new StringBuilder();
using (var document = new Aspose.Pdf.Document(input))
{
    foreach (var page in document.Pages)
    {
        // Detect tables on this page with the default (non-flow) engine.
        var absorber = new TableAbsorber();
        absorber.Visit(page);
        foreach (var table in absorber.TableList)
        {
            foreach (var row in table.RowList)
            {
                foreach (var cell in row.CellList)
                {
                    foreach (var fragment in cell.TextFragments)
                    {
                        // Join the fragment's segments into a single string.
                        var txt = new StringBuilder();
                        foreach (var seg in fragment.Segments)
                        {
                            txt.Append(seg.Text);
                        }
                        stringBuilder.AppendLine(txt.ToString());
                        stringBuilder.AppendLine("__________________________________");
                    }
                }
            }
        }
    }
}
// Write the output once, after all pages have been processed.
File.WriteAllText(output, stringBuilder.ToString());

Here is the result:
cell_extracting_issue.zip (1.5 KB)
The text output file contains the fragments from the cells you mentioned as missing:
Screenshot_1.png (5.7 KB)
Could you specify whether the fragments are missing completely on your side, or whether the problem is the way they are extracted?
I used Aspose.PDF version 25.2.
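
One way to tell the two cases apart is to print every absorbed cell with its table, row, and cell index and its bounding rectangle, so that text which was extracted but attached to an unexpected cell becomes visible. The following is only a minimal sketch, not a confirmed fix: the `CellDump` class and `DumpCells` helper are hypothetical names, the file name is the one used in the reproduction above, and it assumes the `AbsorbedCell.Rectangle` and `TextFragment.Text` properties of Aspose.PDF for .NET.

```csharp
using System;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class CellDump
{
    // Hypothetical helper: lists every absorbed cell with its indices and position.
    static string DumpCells(Document document)
    {
        var sb = new StringBuilder();
        foreach (Page page in document.Pages)
        {
            var absorber = new TableAbsorber(); // flow engine left disabled
            absorber.Visit(page);
            for (int t = 0; t < absorber.TableList.Count; t++)
            {
                var rows = absorber.TableList[t].RowList;
                for (int r = 0; r < rows.Count; r++)
                {
                    var cells = rows[r].CellList;
                    for (int c = 0; c < cells.Count; c++)
                    {
                        // Concatenate the text of all fragments found in this cell.
                        var text = new StringBuilder();
                        foreach (TextFragment fragment in cells[c].TextFragments)
                            text.Append(fragment.Text);

                        // Cell bounds in page coordinates (lower-left, upper-right).
                        Rectangle rect = cells[c].Rectangle;
                        sb.AppendLine($"page {page.Number} table {t} row {r} cell {c} " +
                            $"[{rect.LLX:F1}, {rect.LLY:F1}, {rect.URX:F1}, {rect.URY:F1}]: {text}");
                    }
                }
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        using (var document = new Document("cell_extracting_issue.pdf"))
            Console.Write(DumpCells(document));
    }
}
```

If the text of a circled cell appears in the dump under a different row or cell index than expected, the content is being extracted and the problem is how it is assigned to cells. If it does not appear at all, the cell is genuinely missing from the output.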

The license I purchased only supports versions up to 24.1. If I want to upgrade to version 25.2, can I exchange the license, or do I need to purchase a new one?

@Rachel_Lee
I haven't heard of a license exchange, so I suppose the only way is to purchase a new one.