Pdf.dll unable to read multi-header grids

JohnDeerePage3.docx (19.5 KB)

John_Deere_Short.pdf (464.2 KB)

OK, we have an issue. Page 3 of the PDF file contains more than five grids, but only the first grid is being absorbed properly.

In the Word doc, you can see what your TextFragments and absorber are returning.

Notice Grid 4, labeled “WATER HEATER SCHEDULE - ELECTRIC”. It has a [Table Name] row, then 2.5 header rows, then one data row. There are 4 rows but it only reads 3, as you can see from the row count for Grid 4 in the Word doc.

Look at the text in the Word doc: it stops reading the grid at Row 2, which is the third row (Row: 2 | MARK | STORAGE (GAL) | WATER IN (°F) | WATER OUT (°F) | )

What can we do about this?

It’s like this for every table except the first one, which is the only one read in full. All the other tables have non-standard header rows, and that seems to be what is breaking the read.

If I can just get it to read the rows, I can make sense out of it. Please help.