Dear Aspose Team,
We have a very big issue affecting hundreds of documents when using the Aspose.PDF CallBackGetHocr mechanism.
You can find a very simple sample document attached to this post.
What I can tell you about the faulty process is this:
• Although there are 4 images in the PDF file, the callback was triggered only 3 times.
• The hOCR result (HTML) was assigned to the wrong image: the text of the “tall text container” was attached to the lowest “more text” image.
You can see the effect best by opening the processed document in a PDF viewer and trying to select the text behind the lowest “more text” box.
The same issue was reported and logged as PDFJAVA-36669. We are using Aspose.PDF version 17.6, but the problem still exists with other documents.
That leads to the next question: do you need all of the affected documents to fix the problem, or is there a chance to get this fixed in a more general way?
Kind regards
example.pdf (86.2 KB)
result.pdf (89.2 KB)