Converting a non-searchable PDF to a searchable PDF document

vmerz · March 24, 2016, 7:02am

Hi everyone,

I've tried to convert a non-searchable PDF to a searchable one, using the code mentioned [here](http://www.aspose.com/docs/x/I4B-GQ).
Tesseract returns the text, but unfortunately this exception is raised when the callback returns:

Exception in thread "main" class com.aspose.pdf.internal.p235.z76: Unknown char: ;
com.aspose.pdf.internal.p280.z6.m1(Unknown Source)
com.aspose.pdf.internal.p280.z6.m1(Unknown Source)
com.aspose.pdf.internal.p280.z6.m1(Unknown Source)
com.aspose.pdf.internal.p235.z61.m1(Unknown Source)
com.aspose.pdf.internal.p235.z43.m12(Unknown Source)
com.aspose.pdf.internal.p505.z19.m1(Unknown Source)
com.aspose.pdf.internal.p505.z19.m1(Unknown Source)
com.aspose.pdf.ADocument.convert(Unknown Source)
com.aspose.pdf.Document.convert(Unknown Source)
de.caspier.nova.test.TestOcr.convertTesseract(TestOcr.java:137)
de.caspier.nova.test.TestOcr.main(TestOcr.java:31)

Any ideas why this happens?

My second question is whether there is another way to reach my goal without Tesseract, perhaps with Aspose.OCR?

Looking forward to your answers!
Regards,
Vincent

P.S.: I tried this with several different PDF files.
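
For readers following along, the Tesseract step described above can be sketched roughly as follows. This is only a minimal sketch, not the code from the linked documentation: the class name `HocrRunner` and its methods are hypothetical, it assumes the `tesseract` executable is on the `PATH`, and it assumes the page image has already been written to disk. It runs `tesseract <image> <outputBase> hocr` and returns the hOCR markup as a string, which is the kind of value the callback passed to `Document.convert` would hand back. Wiring it into the Aspose callback is left out because the thread does not show that code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Aspose.Pdf or Tesseract.
class HocrRunner {

    // Builds the command line: tesseract <image> <outputBase> hocr
    static List<String> buildCommand(Path image, Path outputBase) {
        return Arrays.asList("tesseract", image.toString(), outputBase.toString(), "hocr");
    }

    // Runs Tesseract on one image file and returns the hOCR output as text.
    static String recognize(Path image) throws IOException, InterruptedException {
        // Tesseract appends the extension itself, so only a base name is passed.
        Path outputBase = Files.createTempFile("ocr-", "");
        Process process = new ProcessBuilder(buildCommand(image, outputBase))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("tesseract exited with code " + exitCode);
        }
        // Newer Tesseract versions write <base>.hocr, older ones <base>.html.
        Path hocr = Paths.get(outputBase + ".hocr");
        if (!Files.exists(hocr)) {
            hocr = Paths.get(outputBase + ".html");
        }
        return new String(Files.readAllBytes(hocr), StandardCharsets.UTF_8);
    }
}
```

Logging the string returned by `recognize` before handing it to the converter may help show which part of the hOCR output triggers the `Unknown char` exception.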

codewarior · March 25, 2016, 7:28am

Hi Vincent,

Thanks for contacting support.

Please share the source PDF document that causes this problem so that we can test the scenario in our environment. We are sorry for the inconvenience.

As for using Aspose.OCR: this API performs OCR on images, and once you have performed the OCR, you will have to create the PDF from scratch using Aspose.Pdf for Java.

vmerz · March 31, 2016, 2:32am

Hello Nayyer Shahbaz,

Thank you very much for your answer. The source PDF document is attached.

If I switch to your alternative with Aspose.OCR and Aspose.PDF, I will lose the layout of the source document, won't I?

Regards,
Vincent

codewarior · April 1, 2016, 7:16am

Hi Vincent,

Thanks for sharing the source file.

I have tested the scenario and managed to reproduce the same problem. I have logged it as PDFNEWJAVA-35697 in our issue tracking system so that it can be corrected. We will look further into the details of this problem and keep you posted on the status of the fix. Please be patient and allow us a little time. We are sorry for the inconvenience.

vmerz · April 14, 2016, 4:28am

Hi Nayyer Shahbaz,

Is there a way to view the status of the issue?

Regards,
Vincent

codewarior · April 15, 2016, 8:36am

Hi Vincent,

Thanks for your patience.

The issue was reported only recently, so it is still pending review and has not yet been resolved. However, the product team will consider investigating and fixing it as per the development schedule, and as soon as we have a definite update regarding its resolution, we will let you know. Please be patient and allow us a little time. We are sorry for the delay and inconvenience.