Hi,
I am using Aspose.PDF to read a file containing a table (see the attachment) and convert the result into Markdown. However, I noticed that the ColSpan and RowSpan attributes are not accurately detected.
Here are my settings:
- Using AbsorbedTable with useFlowEngine = true.
- Aspose.PDF version: 23.4.
This is my expected result:
| 交易代號及名稱 | 變更項目 | 檢核對象 要保人 | 檢核對象 被保人 | 檢核對象 受益人 ( 法 人、團體名稱 ) | 檢核對象 法定 ( 監護 / 輔助 ) 代 理人、臨櫃代理人 | 檢核對象 負責人、實質受益 人、高階管理人員 |
| --- | --- | --- | --- | --- | --- | --- |
| 3313/3312 要保書基本資料 建檔 / 要保書基本資料建檔 3366 新立暫收保費 ( 轉帳 ) | | Ⅴ | Ⅴ | Ⅴ | Ⅴ | Ⅴ |
| 3611 變更契約中文資訊 | 1. 要保人更名 7. 要保人變更 | Ⅴ 檢核新 要保人 ( 姓 名 ) 及原要 保人 ( 姓名 ) | | | Ⅴ | |
| 3611 變更契約中文資訊 | 2. 被保人更名 | Ⅴ | | | Ⅴ | |
| 3611 變更契約中文資訊 | 3.4.5.8.9.A.B. C.D.E.H.I 變更 生存 / 滿期 / 理賠 受益人變更 ( 更名 ) | Ⅴ | | Ⅴ 檢核新受益 人 ( 姓名 ) 及原 受益人 ( 姓名 ) | Ⅴ | Ⅴ |
| 3321/3322 撤保 ( 補入 ) | | Ⅴ | | | Ⅴ | |
| 3323/3324 申請終止 ( 補入 ) | | Ⅴ | | Ⅴ (91.12.31 以前成立契 約 ) | Ⅴ | Ⅴ |
| 3530/3529 保單借款 ( 補入 ) | | Ⅴ | | | Ⅴ | |
| 3535/3536 契約還款 ( 補入 ) | | Ⅴ | | | Ⅴ | |
| 3620 保險單線上借款申請 | 申請 | Ⅴ | | | Ⅴ | |
| 3325/3326 滿期給付 ( 補入 ) | | | | Ⅴ | Ⅴ | Ⅴ |
However, Aspose extracts the table from the PDF into three AbsorbedTables, as shown below.
When Aspose encounters merged cells, it converts them into a separate table.
| 交易代號及名稱 | 變更項目 | 檢核對象 | | | | |
| --- | --- | --- | --- | --- | --- | --- |
| 3313/3312 要保書基本資料 建檔 / 要保書基本資料建檔 3366 新立暫收保費 ( 轉帳 ) | | Ⅴ | Ⅴ | Ⅴ | Ⅴ | Ⅴ |
| 3611 變更契約中文資訊 | 1. 要保人更名 7. 要保人變更 | Ⅴ 檢核新 要保人 ( 姓 名 ) 及原要 保人 ( 姓名 ) | | | Ⅴ | |
| 3321/3322 撤保 ( 補入 ) | | Ⅴ | | | Ⅴ | |
| 3323/3324 申請終止 ( 補入 ) | | Ⅴ | | Ⅴ (91.12.31 以前成立契 約 ) | Ⅴ | Ⅴ |
| 3530/3529 保單借款 ( 補入 ) | | Ⅴ | | | Ⅴ | |
| 3535/3536 契約還款 ( 補入 ) | | Ⅴ | | | Ⅴ | |
| 3620 保險單線上借款申請 | 申請 | Ⅴ | | | Ⅴ | |
| 3325/3326 滿期給付 ( 補入 ) | | | | Ⅴ | Ⅴ | Ⅴ |

| 2. 被保人更名 | Ⅴ | | | Ⅴ | |

| 3.4.5.8.9.A.B. C.D.E.H.I 變更 生存 / 滿期 / 理賠 受益人變更 ( 更名 ) | Ⅴ | | Ⅴ 檢核新受益 人 ( 姓名 ) 及原 受益人 ( 姓名 ) | Ⅴ | Ⅴ |
@Rachel_Lee
It seems that you are experiencing issues with Aspose.PDF not accurately detecting the ColSpan and RowSpan attributes when extracting tables from a PDF document. This can lead to the creation of multiple AbsorbedTables instead of a single cohesive table.
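As a rough illustration of how split fragments could be recombined after extraction, here is a minimal Python sketch. It does not call Aspose.PDF at all: it assumes you have already copied each fragment's rows into plain lists together with the vertical position of each row (for example from a row bounding box). The functions `stitch_tables` and `to_markdown`, the input shape, and the sample values are all hypothetical. It also assumes that a row shorter than the full width lost its leading cells to a vertical merge, so those cells are copied from the row above.

```python
def stitch_tables(fragments, width):
    """Merge table fragments into one list of rows, ordered top to bottom.

    fragments: list of tables; each table is a list of (top_y, [cell texts]).
    width: expected number of columns in the full table.
    """
    rows = [row for table in fragments for row in table]
    rows.sort(key=lambda r: -r[0])  # in PDF coordinates, larger y is higher on the page
    stitched = []
    for _, cells in rows:
        missing = width - len(cells)
        if missing > 0 and stitched:
            # Assumption: the short row lost its leading cells to a row-spanning
            # cell, so repeat the values from the row above.
            cells = stitched[-1][:missing] + cells
        elif missing > 0:
            cells = [""] * missing + cells
        stitched.append(list(cells))
    return stitched


def to_markdown(rows):
    """Render rows as a pipe table, treating the first row as the header."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


# Hypothetical, simplified data shaped like the three fragments in this thread.
main = [
    (700, ["Code", "Item", "Check"]),
    (680, ["3611", "1. rename holder", "V"]),
    (620, ["3321", "", "V"]),
]
frag2 = [(660, ["2. rename insured", "V"])]
frag3 = [(640, ["3. change beneficiary", "V"])]

stitched = stitch_tables([main, frag2, frag3], width=3)
markdown = to_markdown(stitched)
```

Repeating the merged value in every row is a deliberate choice here, since Markdown pipe tables have no way to express a span. If the missing cells are not always the leading ones, the fill rule would need to use horizontal positions instead.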
To address this issue, here are a few suggestions:
- Ensure Proper Use of AbsorbedTable: You mentioned using AbsorbedTable with useFlowEngine = true. This setting is generally recommended for better layout handling, but it may not always work perfectly with complex tables that include merged cells. You might want to experiment with useFlowEngine = false to see if it improves the detection of merged cells.
- Check for Updates: Since you are using Aspose.PDF version 23.4, consider checking if there are any updates or newer versions available. Sometimes, issues with table extraction and cell merging are addressed in later releases.
- Manual Handling of Merged Cells: If the automatic detection of ColSpan and RowSpan continues to fail, you may need to implement a manual approach. After extracting the tables, you can programmatically check for merged cells and adjust the resulting Markdown output accordingly. This would involve iterating through the cells and checking their properties to determine if they span multiple rows or columns.
- Refer to Documentation: For more detailed information on handling ColSpan and RowSpan, you can refer to the Aspose documentation on these properties. They provide insights on how to set and retrieve these attributes when creating or manipulating tables in PDF documents.
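The manual approach could look something like the following Python sketch. It is library-independent and rests on one assumption: that a bounding rectangle is available for each extracted cell and has been copied into a plain tuple. The function `infer_spans`, the tuple layout, and the sample coordinates are hypothetical and not part of the Aspose API. The idea is to collect the distinct cell edges into a grid and count how many grid intervals each cell covers.

```python
def infer_spans(cells, tol=1.0):
    """Estimate row/column spans from cell rectangles.

    cells: list of (text, left, bottom, right, top) in PDF points,
           with the origin at the bottom-left of the page.
    tol:   edges closer than this are treated as the same grid line.
    """
    def grid_lines(values):
        # Collapse nearly equal coordinates into one grid line.
        lines = []
        for v in sorted(values):
            if not lines or v - lines[-1] > tol:
                lines.append(v)
        return lines

    xs = grid_lines([c[1] for c in cells] + [c[3] for c in cells])
    ys = grid_lines([c[2] for c in cells] + [c[4] for c in cells])

    def nearest(lines, v):
        # Index of the grid line closest to coordinate v.
        return min(range(len(lines)), key=lambda i: abs(lines[i] - v))

    n_rows = len(ys) - 1
    result = []
    for text, left, bottom, right, top in cells:
        col_start, col_end = nearest(xs, left), nearest(xs, right)
        top_idx, bottom_idx = nearest(ys, top), nearest(ys, bottom)
        result.append({
            "text": text,
            "row": n_rows - top_idx,      # row 0 is the topmost row
            "col": col_start,
            "rowspan": top_idx - bottom_idx,
            "colspan": col_end - col_start,
        })
    return sorted(result, key=lambda d: (d["row"], d["col"]))


# Hypothetical example: "A" is merged vertically across two rows.
cells = [
    ("A", 0, 0, 50, 40),
    ("B", 50, 20, 100, 40),
    ("C", 50, 0, 100, 20),
]
spans = infer_spans(cells)
```

Note that grid lines are inferred only from cell edges, so a merged cell is detected only when some other row or column subdivides the area it covers.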
If you continue to face challenges, consider reaching out to the Aspose support team or checking the community forums for similar issues and solutions shared by other users.
Feel free to ask if you have more questions or need additional assistance!
I cannot find RowSpan and ColSpan attributes in AbsorbedCell.
@Rachel_Lee
Can you please share the sample code snippet that you have tried for testing? We will test the scenario in our environment and address it accordingly.