We are trying to embed hOCR content into a PDF file.
For most of our customer's files, the output PDF is generated correctly.
For one specific layout, however, the text is placed at the bottom right.
We are using Aspose.Pdf 20.12.
The code we use is:
using (var pdf = new Aspose.Pdf.Document(@"C:\Aspose\test.pdf"))
I’m attaching the input (test.pdf), the output (test_with_hocr.pdf), and the hOCR (hocr.html) generated by Tesseract.
To reproduce the issue without Tesseract, you can simply return the contents of the file "hocr.html".
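Something along these lines should reproduce it without Tesseract installed. This is a minimal sketch rather than our exact code. It assumes the Aspose.Pdf `Document.CallBackGetHocr` delegate (which receives the page image and returns the hOCR string) and `Document.Convert`. It also assumes that hocr.html and the output file sit next to test.pdf in C:\Aspose. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;

class HocrRepro
{
    // Assumed location of the attached hocr.html; only the path of test.pdf is given above.
    const string HocrPath = @"C:\Aspose\hocr.html";

    // Same shape as Aspose.Pdf.Document.CallBackGetHocr: takes the page image,
    // returns the hOCR markup for it.
    static string GetHocr(System.Drawing.Image img)
    {
        // Ignore the image and return the pre-generated hOCR instead of calling Tesseract.
        // The same markup is returned for every image, which is fine for the attached test file.
        return File.ReadAllText(HocrPath);
    }

    static void Main()
    {
        using (var pdf = new Aspose.Pdf.Document(@"C:\Aspose\test.pdf"))
        {
            // Wrap the method explicitly so the delegate overload of Convert is selected.
            bool ok = pdf.Convert(new Aspose.Pdf.Document.CallBackGetHocr(GetHocr));
            Console.WriteLine("Convert returned: " + ok);

            // Assumed output path, named after the attached test_with_hocr.pdf.
            pdf.Save(@"C:\Aspose\test_with_hocr.pdf");
        }
    }
}
```

With the callback stubbed out like this, any difference in text placement comes from how the hOCR is applied to the page, not from the OCR step itself.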
Aspose HOCR examples.zip (181.7 KB)
Thanks for looking into this.