Cannot read PDF table with row and column spans

Hi,
I am using Aspose.PDF to read a file containing a table (see the attachment) and convert the result into Markdown. However, I noticed that the ColSpan and RowSpan attributes are not accurately detected.

Here are my settings:

  1. Using AbsorbedTable with useFlowEngine = true.
  2. Aspose.PDF version: 23.4.

This is my expected result:

交易代號及名稱 變更項目 檢核對象
要保人
檢核對象
被保人
檢核對象
受益人 ( 法 人、團體名稱 )
檢核對象
法定 ( 監護 / 輔助 ) 代 理人、臨櫃代理人
檢核對象
負責人、實質受益 人、高階管理人員
3313/3312 要保書基本資料 建檔 / 要保書基本資料建檔 3366 新立暫收保費 ( 轉帳 )
3611 變更契約中文資訊 1. 要保人更名 7. 要保人變更 Ⅴ 檢核新 要保人 ( 姓 名 ) 及原要 保人 ( 姓名 )
3611 變更契約中文資訊 2. 被保人更名
3611 變更契約中文資訊 3.4.5.8.9.A.B. C.D.E.H.I 變更 生存 / 滿期 / 理賠 受益人變更 ( 更名 ) Ⅴ 檢核新受益 人 ( 姓名 ) 及原 受益人 ( 姓名 )
3321/3322 撤保 ( 補入 )
3323/3324 申請終止 ( 補入 ) Ⅴ (91.12.31 以前成立契 約 )
3530/3529 保單借款 ( 補入 )
3535/3536 契約還款 ( 補入 )
3620 保險單線上借款申請 申請
3325/3326 滿期給付 ( 補入 )

However, Aspose extracts the table from the PDF into three AbsorbedTables, as shown below.
When Aspose encounters merged cells, it converts them into a separate table.

交易代號及名稱 變更項目 檢核對象
3313/3312 要保書基本資料 建檔 / 要保書基本資料建檔 3366 新立暫收保費 ( 轉帳 )
3611 變更契約中文資訊 1. 要保人更名 7. 要保人變更 Ⅴ 檢核新 要保人 ( 姓 名 ) 及原要 保人 ( 姓名 )
3321/3322 撤保 ( 補入 )
3323/3324 申請終止 ( 補入 ) Ⅴ (91.12.31 以前成立契 約 )
3530/3529 保單借款 ( 補入 )
3535/3536 契約還款 ( 補入 )
3620 保險單線上借款申請 申請
3325/3326 滿期給付 ( 補入 )
2. 被保人更名
3.4.5.8.9.A.B. C.D.E.H.I 變更 生存 / 滿期 / 理賠 受益人變更 ( 更名 ) Ⅴ 檢核新受益 人 ( 姓名 ) 及原 受益人 ( 姓名 )
要保人 被保人 受益人 ( 法 人、團體名稱 ) 法定 ( 監護 / 輔助 ) 代 理人、臨櫃代理人 負責人、實質受益 人、高階管理人員

Attachment: 辦理壽險業務應行特別注意事項(113年2月修訂)-內網版-4.pdf (309.5 KB)

@Rachel_Lee

It seems that you are experiencing issues with Aspose.PDF not accurately detecting the ColSpan and RowSpan attributes when extracting tables from a PDF document. This can lead to the creation of multiple AbsorbedTables instead of a single cohesive table.

To address this issue, here are a few suggestions:

  1. Ensure Proper Use of AbsorbedTable: You mentioned using AbsorbedTable with useFlowEngine = true. This setting is generally recommended for better layout handling, but it may not always work perfectly with complex tables that include merged cells. You might want to experiment with useFlowEngine = false to see if it improves the detection of merged cells.

  2. Check for Updates: Since you are using Aspose.PDF version 23.4, consider checking if there are any updates or newer versions available. Sometimes, issues with table extraction and cell merging are addressed in later releases.

  3. Manual Handling of Merged Cells: If the automatic detection of ColSpan and RowSpan continues to fail, you may need to implement a manual approach. After extracting the tables, you can programmatically check for merged cells and adjust the resulting Markdown output accordingly. This would involve iterating through the cells and checking their properties to determine if they span multiple rows or columns.

  4. Refer to Documentation: For more detailed information on handling ColSpan and RowSpan, you can refer to the Aspose documentation on these properties. They provide insights on how to set and retrieve these attributes when creating or manipulating tables in PDF documents.
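
As a rough illustration of the manual approach in suggestion 3, the sketch below infers row and column spans from cell geometry alone. It assumes each extracted cell provides a bounding rectangle and that the cells of the split tables have already been collected into one list. The `Cell` class, `infer_spans`, `to_html`, and the 2-point tolerance are hypothetical names and values chosen for this example, not part of the Aspose API. HTML is used for the output because standard Markdown table syntax has no way to express merged cells.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Cell:
    """Hypothetical stand-in for an extracted cell: text plus bounding box."""
    text: str
    llx: float  # left edge
    lly: float  # bottom edge
    urx: float  # right edge
    ury: float  # top edge


def cluster(values, tol):
    """Collapse coordinates lying within `tol` of each other into one grid line."""
    lines = []
    for v in sorted(values):
        if not lines or v - lines[-1] > tol:
            lines.append(v)
    return lines


def nearest(lines, v):
    """Index of the grid line closest to coordinate `v`."""
    return min(range(len(lines)), key=lambda i: abs(lines[i] - v))


def infer_spans(cells, tol=2.0):
    """Place every cell on a shared grid and derive its row/column span."""
    xs = cluster([c.llx for c in cells] + [c.urx for c in cells], tol)
    ys = cluster([c.lly for c in cells] + [c.ury for c in cells], tol)
    ys.reverse()  # PDF y grows upward, so the largest y is the top row
    placed = []
    for c in cells:
        col, col_end = nearest(xs, c.llx), nearest(xs, c.urx)
        row, row_end = nearest(ys, c.ury), nearest(ys, c.lly)
        placed.append({
            "text": c.text,
            "row": row,
            "col": col,
            "row_span": row_end - row,
            "col_span": col_end - col,
        })
    return sorted(placed, key=lambda d: (d["row"], d["col"]))


def to_html(placed):
    """Render the placed cells as an HTML table with rowspan/colspan."""
    rows = {}
    for d in placed:
        attrs = ""
        if d["row_span"] > 1:
            attrs += ' rowspan="%d"' % d["row_span"]
        if d["col_span"] > 1:
            attrs += ' colspan="%d"' % d["col_span"]
        cell_html = "<td%s>%s</td>" % (attrs, escape(d["text"]))
        rows.setdefault(d["row"], []).append(cell_html)
    body = "".join("<tr>%s</tr>" % "".join(rows[r]) for r in sorted(rows))
    return "<table>%s</table>" % body


# Made-up coordinates for a two-row header with merged cells.
header = [
    Cell("Code", 0, 60, 100, 100),            # spans both header rows
    Cell("Check target", 100, 80, 300, 100),  # spans two columns
    Cell("Policyholder", 100, 60, 200, 80),
    Cell("Insured", 200, 60, 300, 80),
]
spans = infer_spans(header)
print(to_html(spans))
```

The tolerance absorbs small coordinate differences between cells that share a border; it would likely need tuning for a real document.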

If you continue to face challenges, consider reaching out to the Aspose support team or checking the community forums for similar issues and solutions shared by other users.

Feel free to ask if you have more questions or need additional assistance!

I cannot find RowSpan and ColSpan attributes in AbsorbedCell.

@Rachel_Lee

Can you please share the sample code snippet that you have tried for testing? We will test the scenario in our environment and address it accordingly.