My PDF contains an element that represents a table, but when I extract the text as TextFragment objects, its two columns are merged into one.
PDF file: 测试术语.pdf (123.9 KB)
Extracted content after translation: image.jpg (147.1 KB)
Note the following: 1 10 mm THK. CLEAR TEMPERED GLASS, ALU FILLET
Here, "1" and "10 mm THK. CLEAR TEMPERED GLASS, ALU FILLET" are in two separate columns, but they are extracted as a single TextFragment: image.png (386.8 KB)
Can you please share a few more details, such as how you are creating the translated version of the PDF and how you are extracting the text? Please share a sample code snippet with us so that we can test the scenario in our environment and address it accordingly.
Instead of using TextFragmentAbsorber, can you please try extracting the table with the TableAbsorber class and let us know whether the results improve?
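As a rough illustration of this suggestion, the sketch below walks the tables on each page with TableAbsorber and keeps the text of every cell separate. It is written against Aspose.PDF for Python via .NET, on the assumption that its TableAbsorber mirrors the .NET class; the thread does not say which language the poster uses. The helper names row_to_line and extract_table_rows and the file name are hypothetical, and the output for the attached file has not been verified here.

```python
def row_to_line(cell_texts, separator=" | "):
    """Join the per-cell strings of one row so the columns stay visibly apart."""
    return separator.join(text.strip() for text in cell_texts)


def extract_table_rows(pdf_path):
    """Return every table row in the PDF as a list of per-cell strings.

    Assumption: the aspose-pdf package (Aspose.PDF for Python via .NET)
    is installed and licensed as needed.
    """
    # Imported here so that row_to_line can be used without the package.
    import aspose.pdf as ap

    document = ap.Document(pdf_path)
    rows = []
    for page in document.pages:
        # One absorber per page; visit() detects the tables on that page.
        absorber = ap.text.TableAbsorber()
        absorber.visit(page)
        for table in absorber.table_list:
            for row in table.row_list:
                cells = []
                for cell in row.cell_list:
                    # A cell may hold several fragments, each made of segments.
                    fragment_texts = [
                        "".join(segment.text for segment in fragment.segments)
                        for fragment in cell.text_fragments
                    ]
                    cells.append(" ".join(fragment_texts))
                rows.append(cells)
    return rows
```

Calling extract_table_rows("input.pdf") (a placeholder path) and passing each row to row_to_line prints one line per table row. If the item number and the description still arrive in the same cell, that points to table detection rather than to the way the text is read back.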
We were able to reproduce these issues in our environment. We have therefore opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in the Free Support Policies.
Can you please share which version of the API you are using, along with your environment details such as OS name and version, application type, etc.? Please also make sure to test with version 24.3 in case it helps.
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-56860
You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.