Hi,
I am trying to convert a PDF (Aspose.Pdf 11.9.0) using hOCR generated by Tesseract 3.0.4.
- With HTML hOCR: nothing happens. The PDF is the same as before the conversion.
- With XHTML hOCR: the Convert method throws a FormatException.
You can reproduce the issue using the attached project.
Here is a sample of my code:
public void Save(Func<int, Stream> getStream)
{
    using (var s = getStream(0))
    {
        this.asposeDoc.Convert(hocrTesseract);
        this.asposeDoc.Save(s);
    }
}

private string hocrTesseract(System.Drawing.Image img)
{
    using (var ocr = new TesseractEngine(@"(...)", "fra", EngineMode.Default))
    using (var bitmap = new Bitmap(img))
    using (var page = ocr.Process(bitmap))
    {
        return page.GetHOCRText(0);
    }
}