Aspose.OCR reads only one character from PDF image

Anil1995 · September 24, 2021, 11:42am

@asad.ali After testing with the temporary license, I am still getting the same result as before, and I am unable to trace what is wrong. See the attached text file. Thank you.
resultPdfOcr.zip (941 Bytes)

asad.ali · September 27, 2021, 3:54pm

@Anil1995

We have already tested the temporary license that you shared in a private message. As mentioned earlier, it has already expired. Please request a new temporary license by posting in the purchase forum. If you have already received one and are still facing the issue, please share the new license with us in a personal message so that we can proceed with the investigation.

PS: We already tested the same case in our environment using a valid license and could not replicate the issue.

Anil1995 · October 4, 2021, 4:59pm

Hi @asad.ali
I shared my temporary license with you in a personal message 7 days ago. Is there any update on this? I am still unable to get the desired output. I hope you will look into it. Thank you.

asad.ali · October 5, 2021, 7:20pm

@Anil1995

Yes, we tested the scenario again using the latest license file you shared and obtained the attached results. resultPdfOcr.zip (3.0 KB)

It looks like you are not setting the license properly in your application. Please share a sample console application that replicates the issue so that we can investigate further and assist you accordingly.

Anil1995 · October 6, 2021, 5:04pm

@asad.ali I have attached the console project. Please add the Aspose.OCR DLL file as well as the Aspose.PDF DLL file. Please check it and let me know. Thank you.
Aspose.Test.zip (4.9 MB)

asad.ali · October 6, 2021, 7:00pm

@Anil1995

We were able to determine the actual cause of the issue. It occurs because you are using Aspose.PDF without a license, so the trial-mode limitation gives you limited results. Please get a 30-day temporary license for Aspose.PDF as well, set it in the code just as you are doing for Aspose.OCR, and test again.
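As a minimal sketch of what setting both licenses might look like: the two license file names below are placeholders (assumptions, not taken from the attached project), and the snippet needs references to both the Aspose.OCR and Aspose.PDF assemblies to compile.

```csharp
using System;

class LicenseSetup
{
    static void Main()
    {
        // Assumption: both file names are placeholders; point them at your own license files.
        const string ocrLicensePath = "Aspose.OCR.lic";
        const string pdfLicensePath = "Aspose.PDF.lic";

        // Apply the Aspose.OCR license.
        var ocrLicense = new Aspose.OCR.License();
        ocrLicense.SetLicense(ocrLicensePath);

        // Apply the Aspose.PDF license separately; each product is licensed on its own,
        // so licensing only Aspose.OCR leaves Aspose.PDF in trial mode.
        var pdfLicense = new Aspose.Pdf.License();
        pdfLicense.SetLicense(pdfLicensePath);

        Console.WriteLine("Both licenses applied.");
    }
}
```

Both calls should run once at application startup, before any PDF or OCR processing takes place.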

Anil1995 · October 7, 2021, 9:59am

@asad.ali Thank you so much. After setting the temporary license for Aspose.PDF, it really works. You finally figured it out. Thank you again for your continuous help, it is really appreciated. While evaluating the output, I found some unnecessary characters, and the OCR is also unable to read the agenda section. I have attached screenshots. Please check and let me know, because the agenda section is important for us. Thanks.
pdfopt.PNG.jpg (104.6 KB)
resultPdfOcr.zip (2.9 KB)
Ucharacters.PNG (20.6 KB)

asad.ali · October 7, 2021, 6:20pm

@Anil1995

We have also noticed this behavior from the API in our environment. Hence, we have updated the information in the earlier logged ticket. We will look into the details and let you know as soon as it is fixed. Please be patient and give us some time.

We are sorry for the inconvenience.

Anil1995 · October 8, 2021, 11:55am

@asad.ali Thanks for your quick response. Please give me an update when it is ready, because we have to deploy it in our application as soon as possible. Thank you.

asad.ali · October 8, 2021, 8:09pm

@Anil1995

Sure, we will inform you as soon as we make definite progress towards resolving the issue. Please give us some time.

asad.ali · October 11, 2021, 8:45pm

@Anil1995

We have investigated the earlier logged ticket further. Unfortunately, the image extracted from the PDF has a complex structure, and we currently cannot handle such a structure without loss. We plan to introduce a new model in the future that will be able to recognize documents of any structure. We will let you know once the ticket is completely resolved.

Anil1995 · October 18, 2021, 5:36am

@asad.ali Thank you so much. Please keep me informed.

asad.ali · October 18, 2021, 8:59pm

@Anil1995

We will surely inform you once we make some progress towards ticket resolution.

asad.ali · July 30, 2023, 9:28pm

@Anil1995

We have improved our recognition results.

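// Create the OCR engine instance used for recognition.
AsposeOcr api = new AsposeOcr();

// Load the PDF directly as OCR input.
OcrInput input = new OcrInput(InputType.PDF);
input.Add("MP09.pdf");
var result = api.Recognize(input, new RecognitionSettings
{
    DetectAreasMode = DetectAreasMode.DOCUMENT // alternatively: DetectAreasMode.PHOTO
});

// Save the text of all recognized pages to a single file.
AsposeOcr.SaveMultipageDocument("D://document.txt", SaveFormat.Text, result);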

Please view the attached files: the recognition result is better. This result was produced without using Aspose.PDF and does not include the first page.

We are also now working on the ability to extract text from PDF pages that contain no image (such as the first page of your PDF file). We will share updates with you in this regard as well.
files.zip (5.8 KB)