Aspose.OCR HOCR format

crshekharam · October 14, 2019, 12:04pm

Dear Team
I am trying to convert a scanned PDF to a searchable PDF. I found a solution in the forums: convert the PDF to an image, convert the image to HOCR using Tesseract, and finally produce a searchable PDF using Aspose.PDF. Is there a direct way to convert an image to HOCR in Aspose.OCR itself, so that I need not run Tesseract in the background?

Regards
Raj
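
For readers following the workaround described above, the Tesseract step (image to HOCR) might look roughly like the sketch below. It assumes the `tesseract` command-line tool is installed and on the PATH, and that a page image has already been rendered from the PDF. The helper names and the file name `page-1.png` are hypothetical, chosen only for illustration; this is not an Aspose API.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_hocr_command(image_path: str, output_base: str) -> List[str]:
    """Build the Tesseract CLI call that writes HOCR output.

    Passing the `hocr` config name as the last argument asks Tesseract
    to emit HOCR markup instead of plain text.
    """
    return ["tesseract", image_path, output_base, "hocr"]


def image_to_hocr(image_path: str, output_base: str) -> Path:
    """Run Tesseract on one image and return the expected HOCR path.

    Assumption: recent Tesseract versions write `<output_base>.hocr`.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_hocr_command(image_path, output_base), check=True)
    return Path(output_base + ".hocr")


if __name__ == "__main__":
    # Hypothetical file name for a page image rendered from the scanned PDF.
    print(build_hocr_command("page-1.png", "page-1"))
```

The resulting HOCR file would then be passed to the final step that builds the searchable PDF; that part is omitted here because it depends on the PDF library in use.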

asad.ali · October 14, 2019, 9:35pm

@crshekharam

You can perform OCR on PDF files using Aspose.OCR for .NET without using external Tesseract services. Regarding your requirement of making searchable PDFs, would you kindly share a sample PDF document along with the expected output? We will look into it further and share our feedback with you.

crshekharam · October 15, 2019, 1:41am

Please find the input and output documents:

Input (263.6 KB)
Expected Output (195.8 KB)

asad.ali · October 15, 2019, 5:28pm

@crshekharam

We have logged a feature request as OCR-816 in our issue tracking system for the implementation of your requirements. We will investigate the feasibility of the feature and share our feedback with you as soon as updates are available. Please be patient and spare us a little time.

We are sorry for the inconvenience.