Pdf.dll unable to read multi-header grids

JohnDeerePage3.docx (19.5 KB)

John_Deere_Short.pdf (464.2 KB)

OK, we have an issue. Page 3 of the PDF file contains more than 5 grids, but only the first grid is being absorbed properly.

In the Word doc, you can see what your TextFragments and absorber are returning.

Notice Grid 4, labeled “WATER HEATER SCHEDULE - ELECTRIC”: there is a [Table Name] row, then 2.5 header rows, then one data row. There are 4 rows, but it only reads 3, as you can see from the row count for Grid 4 in the Word doc.

Look at the text in the Word doc: it stops reading the grid at Row 2, which is the third row (Row: 2 | MARK | STORAGE (GAL) | WATER IN (°F) | WATER OUT (°F) | ). Nothing after that row is returned.

What can we do about this?

It’s like this for all tables except the first one.

Only the first table is read completely. All the other tables have non-standard header rows, and those seem to be what breaks the reading.

If I can just get it to read the rows, I can make sense out of it. Please help.
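
To show what "making sense of it" would look like once the rows do come back, here is a minimal sketch in plain Python. It is not Aspose code and calls no Aspose API. It assumes each grid has already been extracted as a list of rows, each row being a list of cell strings. The function names, the way the header is split across two rows, and the data row values are all hypothetical, made up for illustration.

```python
from typing import Dict, List


def merge_header_rows(header_rows: List[List[str]]) -> List[str]:
    """Collapse stacked header rows into one label per column.

    Assumes at least one header row is passed in.
    """
    width = max(len(row) for row in header_rows)
    labels = []
    for col in range(width):
        parts: List[str] = []
        for row in header_rows:
            # Short rows are treated as having empty trailing cells.
            cell = row[col].strip() if col < len(row) else ""
            if cell and cell not in parts:
                parts.append(cell)
        labels.append(" ".join(parts))
    return labels


def rows_to_records(rows: List[List[str]], header_count: int) -> List[Dict[str, str]]:
    """Treat the first header_count rows as headers and the rest as data."""
    labels = merge_header_rows(rows[:header_count])
    return [dict(zip(labels, data_row)) for data_row in rows[header_count:]]


# Hypothetical layout: the header split and the data values are invented.
grid_rows = [
    ["", "STORAGE", "WATER IN", "WATER OUT"],
    ["MARK", "(GAL)", "(°F)", "(°F)"],
    ["WH-1", "50", "40", "120"],
]
records = rows_to_records(grid_rows, header_count=2)
```

With something like this, the only thing needed from the absorber is the complete list of rows per grid, including the partial header rows.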

@maseyo
We are looking into it and will be sharing our feedback with you shortly.

@maseyo

We have seen your other inquiries and topics about problems reading tables inside PDF files, and we are investigating them. We apologize for the inconvenience. If possible, could you please share the code snippet you used to produce the results in the .docx file?

Is there any progress here?

@maseyo

We believe you have reported a similar issue, with similar files, in one of your other threads, and we have already logged a ticket in our issue tracking system for this case.