The current Java mechanism for overlaying HOCR text on a document is to pass a Document.CallBackGetHocr implementation to the Document.convert method. This requires decoding each image, which can be quite CPU- and memory-intensive.
If we already have the HOCR for the document, is it possible just to overlay the text without using the callback/image decoding? I simply want to add the text at the required positions.
Those are for the .NET API, but I think I’ve found the Java equivalent. It seems there are more factors to consider here (transparent text, adjusting the text position relative to the image x/y, font scaling to fit the bounding box, etc.). I thought there might be a way to simply provide the HOCR without the image decoding, but maybe not?
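To illustrate the positioning and scaling factors mentioned above, here is a minimal sketch in plain Java (no Aspose calls) of mapping one HOCR bounding box to a PDF rectangle. It assumes the HOCR coordinates are pixels with the origin at the top-left of the page image, that the image resolution (DPI) and the page height in points are known, and that the image fills the page. The class and method names (HocrBoxMapper, toPdfRect, fitFontSize, horizontalScalePercent) are hypothetical, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts an HOCR "bbox x0 y0 x1 y1" into PDF user-space units.
final class HocrBoxMapper {
    private static final Pattern BBOX =
            Pattern.compile("bbox\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)");

    private HocrBoxMapper() {}

    // Returns {llx, lly, urx, ury} in points (1/72 inch).
    // Assumption: HOCR pixels have a top-left origin, PDF has a bottom-left
    // origin, so the y axis is flipped using the page height.
    static double[] toPdfRect(String hocrTitle, double dpi, double pageHeightPt) {
        Matcher m = BBOX.matcher(hocrTitle);
        if (!m.find()) {
            throw new IllegalArgumentException("No bbox in: " + hocrTitle);
        }
        double scale = 72.0 / dpi; // pixels -> points
        double x0 = Integer.parseInt(m.group(1)) * scale;
        double y0 = Integer.parseInt(m.group(2)) * scale;
        double x1 = Integer.parseInt(m.group(3)) * scale;
        double y1 = Integer.parseInt(m.group(4)) * scale;
        return new double[] { x0, pageHeightPt - y1, x1, pageHeightPt - y0 };
    }

    // Simplest possible choice: use the box height as the font size.
    static double fitFontSize(double[] rect) {
        return rect[3] - rect[1];
    }

    // Horizontal scaling (in percent) needed so that text whose natural width
    // at the chosen font size is naturalWidthPt fills the width of the box.
    // The natural width has to come from the font metrics of whatever PDF
    // library is doing the drawing.
    static double horizontalScalePercent(double[] rect, double naturalWidthPt) {
        return 100.0 * (rect[2] - rect[0]) / naturalWidthPt;
    }
}
```

For a word with title "bbox 300 600 900 660" on a 300 DPI scan of a 792 pt high page, toPdfRect gives a rectangle two inches wide starting one inch from the left edge. The text would then be drawn at that rectangle in invisible form (PDF text rendering mode 3) so that it is selectable and searchable without covering the scanned image.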
We have logged a feature request under ticket ID PDFJAVA-37343 in our issue tracking system for adding HOCR-formatted text to a PDF document. Please also share the HOCR samples that you need to add to the PDF document. We have linked your post to this ticket and will keep you informed of any updates.
The issues you have found earlier (filed as ) have been fixed in this update.