Poor image quality using Document.callBackGetHocr

mike1986 · February 15, 2018, 12:46pm

I’m using the example code from Manipulate PDF Document | Aspose.PDF for Java to make a searchable PDF from a non-searchable PDF, but the quality of the images saved during the process is very poor compared to the original.

As a consequence, Tesseract does not recognize the text very well.

Can you help me improve the image quality, please?

The disk space taken by these images does not matter, because they are deleted immediately after processing.

The quality of the final searchable PDF is OK.

imran.rafique · February 15, 2018, 10:07pm

@mike1986,

Kindly send us your source PDF and the complete code. We will investigate your scenario in our environment and share our findings with you.

imran.rafique · February 16, 2018, 9:44pm

@mike1986,

You can convert the PDF pages to images and then call the Tesseract executable. We converted your PDF pages to images with the latest version (18.1) of Aspose.PDF for Java, and the output images are fine. To convert PDF pages to JPEG images in Java, please refer to the code examples in the following article:
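Poor OCR results from rendered pages usually come down to resolution. PDF page sizes are given in points (1/72 inch), so the pixel size of a rendered page scales linearly with the DPI it is rendered at, and Tesseract's documentation says it works best on images of at least 300 DPI. The minimal sketch below is plain Java with no Aspose calls: it computes the pixel size of a page at a given DPI and builds a command line for the Tesseract executable. It assumes Tesseract 4.x or later is on the PATH (for the `--dpi` option); the class name and file names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class OcrRenderSettings {
    // PDF page sizes are expressed in points; 1 point = 1/72 inch.
    static final double POINTS_PER_INCH = 72.0;

    /** Pixel length of a rendered page edge, given its length in points and the target DPI. */
    static int pixelsFor(double points, int dpi) {
        return (int) Math.round(points / POINTS_PER_INCH * dpi);
    }

    /**
     * Builds a Tesseract command for one page image (hypothetical file names).
     * "pdf" is Tesseract's built-in config that writes outputBase + ".pdf" as a searchable PDF.
     * Assumes Tesseract 4.x or later, which accepts the --dpi option.
     */
    static List<String> tesseractCommand(String imagePath, String outputBase, int dpi) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(imagePath);
        cmd.add(outputBase);
        cmd.add("--dpi");
        cmd.add(Integer.toString(dpi));
        cmd.add("pdf");
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        int dpi = 300;
        // An A4 page is 595 x 842 points.
        System.out.println(pixelsFor(595, dpi) + " x " + pixelsFor(842, dpi)); // 2479 x 3508
        List<String> cmd = tesseractCommand("page_1.png", "page_1", dpi);
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires tesseract on the PATH):
        // Process p = new ProcessBuilder(cmd).inheritIO().start();
        // int exitCode = p.waitFor();
    }
}
```

If the pages are rendered at a low resolution, upscaling the saved images afterwards will not recover the lost detail, so the resolution has to be set in the conversion step itself.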

mike1986 · February 19, 2018, 8:47am

OK, thank you for your answer. But afterwards, how do I combine all the images into a single PDF with the Aspose API?

imran.rafique · February 19, 2018, 7:40pm

@mike1986,

You can create a PDF from scratch and then insert page and image elements. A Document instance represents a PDF document, and the Add method of the PageCollection class inserts an empty page. Please refer to this help topic: Add Image to Existing PDF File
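Whichever API adds the pages, the images have to be added in page order. If the page images were saved with names like page_1.jpg ... page_12.jpg (a hypothetical naming scheme, not something Aspose requires), a plain alphabetical sort puts page_10.jpg before page_2.jpg. A minimal stdlib-only Java sketch that orders the files by their numeric page number instead; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageImageOrder {
    // Hypothetical naming scheme: page_1.jpg, page_2.jpg, ..., page_12.jpg
    private static final Pattern PAGE_NUMBER = Pattern.compile("page_(\\d+)\\.\\w+$");

    /** Extracts the page number from a file name, or -1 if the name does not match. */
    static int pageNumber(String fileName) {
        Matcher m = PAGE_NUMBER.matcher(fileName);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns the matching names sorted numerically by page number; other files are skipped. */
    static List<String> inPageOrder(List<String> fileNames) {
        List<String> pages = new ArrayList<>();
        for (String name : fileNames) {
            if (pageNumber(name) >= 0) {
                pages.add(name);
            }
        }
        pages.sort(Comparator.comparingInt(PageImageOrder::pageNumber));
        return pages;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("page_10.jpg", "page_2.jpg", "page_1.jpg", "notes.txt");
        System.out.println(inPageOrder(names)); // [page_1.jpg, page_2.jpg, page_10.jpg]
    }
}
```

Each name in the returned list would then get one new page via the Add method of PageCollection, with the image placed on that page as described in the help topic.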