Aspose.OCR reads only one character from PDF image

Anil1995 · August 9, 2021, 5:16am

@asad.ali The .png file you shared doesn't contain the date and time below the Agenda section on page 3. That is the area I most need to read, but OCR skips it; please refer to your output .png file. Some text at the bottom of page 4 is also missing. Can you please look into this? Thank you.

asad.ali · August 9, 2021, 6:55pm

@lion.brotzky

We would like to share with you that our Online Free App (Live Demo) uses the Cloud SDK. In case you want to generate similar results, you need to use OCR Cloud SDK. It uses the model that allows you to recognize tables and receipts. We plan to include this model in the downloadable version, but there is a lot of work to be done and we are not sure how soon we can provide it.

asad.ali · August 9, 2021, 7:00pm

@Anil1995

We have logged an issue as OCRNET-411 in our issue tracking system. We will surely look into its details and let you know as soon as it is rectified. Please be patient and spare us some time.

We are sorry for the inconvenience.

Anil1995 · August 17, 2021, 9:32am

@asad.ali Any update on my last request?

asad.ali · August 17, 2021, 6:26pm

@Anil1995

We are afraid that the earlier logged ticket has not been reviewed yet. We will investigate and resolve it on a first-come, first-served basis and let you know as soon as it is resolved. Please give us some time.

Anil1995 · August 18, 2021, 8:00am

@asad.ali Thanks for the information. Please look into it; we are eager to implement this in our project. Thank you.

Anil1995 · August 24, 2021, 4:45pm

@asad.ali When can we expect the issue (OCRNET-411) to be resolved? Our development work is paused until then. Thank you.

asad.ali · August 24, 2021, 9:13pm

@Anil1995

Could you please try the code snippet below and let us know if it meets your needs:

using System;
using System.IO;
using System.Text;
using Aspose.OCR;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class Program
{
    static void Main()
    {
        string file = @"MP09.pdf";
        StringBuilder totalResult = new StringBuilder();

        Document pdfDocument = new Document(file);

        // 1. Extract the PDF's own text layer (text that is not inside an image).
        TextAbsorber textAbsorber = new TextAbsorber();
        pdfDocument.Pages.Accept(textAbsorber);
        string extractedText = textAbsorber.Text;
        Console.WriteLine(extractedText);
        totalResult.AppendLine(extractedText);

        // 2. Collect every image placed on the pages and run OCR on each one.
        ImagePlacementAbsorber abs = new ImagePlacementAbsorber();
        pdfDocument.Pages.Accept(abs);
        AsposeOcr libOcr = new AsposeOcr(); // one OCR engine reused for all images
        // int i = 0;

        foreach (ImagePlacement imagePlacement in abs.ImagePlacements)
        {
            XImage ximage = imagePlacement.Image;
            Console.WriteLine("image width/height: " + imagePlacement.Rectangle.Width + "/" + imagePlacement.Rectangle.Height);

            using (MemoryStream ms = new MemoryStream())
            {
                ximage.Save(ms);
                ms.Position = 0;

                // Optional: uncomment (together with "int i" above) to dump each image for inspection.
                // File.WriteAllBytes("D://img" + (i++) + ".jpg", ms.ToArray());

                string slResult = libOcr.RecognizeImage(ms);
                Console.WriteLine(slResult);
                totalResult.AppendLine(slResult);
            }
        }

        File.WriteAllText("D://resultPdfOcr.txt", totalResult.ToString());
    }
}

Anil1995 · August 25, 2021, 2:04pm

@asad.ali Using the code above, I am still not getting the desired output. It reads only a single page and prints a few unnecessary characters. Attached is a screenshot of the generated text file. Can you please check once more? I believe you have the PDF file I want to run OCR on. Could you please check it together with the issue (OCRNET-411) for a more accurate result? Thank you.
Screenshot (89).png (59.2 KB)

asad.ali · August 25, 2021, 9:30pm

@Anil1995

We will investigate the ticket further based on your feedback and let you know once we have an update on the investigation results.

Anil1995 · September 11, 2021, 6:11pm

@asad.ali What is the status of OCRNET-411 regarding PDF OCR? Thank you.

asad.ali · September 13, 2021, 8:17pm

@Anil1995

We investigated the ticket again and found that you are using the API without a license. With a valid license, we got the correct output, which is attached. resultPdfOcr.zip (2.9 KB)

Please use a temporary license as suggested in one of our previous responses and let us know in case you still face any issues.

Anil1995 · September 15, 2021, 12:52pm

@asad.ali As you know, I had already obtained a temporary license, and during the license period I checked the PDF OCR output with the code you shared. It did not give me the desired output, and I pinged you about this earlier. Since a long time has passed, my temporary license has expired, so I cannot check it now. To check it, I need another temporary license. Could you please provide one again? Thank you.

asad.ali · September 15, 2021, 2:58pm

@Anil1995

Since we tested with a valid license and the output, which we shared with you, was correct, the issue appears to occur on your end because a valid license is not being used. Please post an inquiry in our Purchase forum to get a new temporary license or an extension of your existing one. If you still notice any issues, please feel free to let us know.

Anil1995 · September 16, 2021, 12:03pm

@asad.ali I tried the same code you gave earlier, this time using a temporary license from a friend's account, but the output is the same as before. Attached is a screenshot of the output text file.
Screenshot (105).png (60.5 KB)

Below is the code snippet I am using:

using System;
using System.IO;
using System.Text;
using Aspose.OCR;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class Program
{
    static void Main(string[] args)
    {
        // Apply the Aspose.OCR license before creating any AsposeOcr instance.
        // Note: Aspose.PDF (Document, TextAbsorber, ImagePlacementAbsorber) is a separate
        // product with its own Aspose.Pdf.License class; only the OCR license is set here.
        Aspose.OCR.License olicense = new Aspose.OCR.License();
        olicense.SetLicense(@"C:\Users\OPTLPTP217\Downloads\Aspose.OCR.NET.lic");

        string file = @"D:\ExtractText\pdf\MP09.pdf";
        StringBuilder totalResult = new StringBuilder();

        Document pdfDocument = new Document(file);

        // Text layer of the PDF
        TextAbsorber textAbsorber = new TextAbsorber();
        pdfDocument.Pages.Accept(textAbsorber);
        string extractedText = textAbsorber.Text;
        Console.WriteLine(extractedText);
        totalResult.AppendLine(extractedText);

        // OCR every image placed in the PDF
        ImagePlacementAbsorber abs = new ImagePlacementAbsorber();
        pdfDocument.Pages.Accept(abs);
        AsposeOcr libOcr = new AsposeOcr();

        foreach (ImagePlacement imagePlacement in abs.ImagePlacements)
        {
            XImage ximage = imagePlacement.Image;
            Console.WriteLine("image width/height: " + imagePlacement.Rectangle.Width + "/" + imagePlacement.Rectangle.Height);

            using (MemoryStream ms = new MemoryStream())
            {
                ximage.Save(ms);
                ms.Position = 0;
                string slResult = libOcr.RecognizeImage(ms);
                Console.WriteLine(slResult);
                totalResult.AppendLine(slResult);
            }
        }

        File.WriteAllText("D://resultPdfOcr.txt", totalResult.ToString());
        Console.ReadLine();
    }
}

Waiting for your response. Thank you.

Anil1995 · September 16, 2021, 12:40pm

@asad.ali I am using the following temporary license; please see the attachment. Screenshot (106).png (52.8 KB)

asad.ali · September 16, 2021, 10:25pm

@Anil1995

Please share your temporary license with us in a private message. Add the license to a .zip archive and attach it to your message. To send a private message, reply to the post and click the top-left button in the post editor, as shown in the screenshot.
privatemessage.png (9.6 KB)

asad.ali · September 21, 2021, 8:09pm

@Anil1995

The license file you shared has already expired. Please post a request in our Purchase forum to get an extension or a new temporary license, and test again using Aspose.OCR for .NET 21.8. Please feel free to let us know if you face any issues.
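Since an expired license kept coming up in this thread, it can help to check the expiry date in the .lic file (which is XML) before testing. The Java sketch below is a minimal, hypothetical helper using only the JDK's DOM parser and java.time. It ASSUMES the date is stored as yyyyMMdd text in an element named LicenseExpiry; that element name and date format are assumptions, so compare them with your own license file before relying on the result.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class LicenseExpiryCheck {

    // ASSUMPTION: the expiry date is stored as yyyyMMdd text (e.g. 20211015)
    // in an element named "LicenseExpiry"; adjust if your license file differs.
    static final String EXPIRY_ELEMENT = "LicenseExpiry";

    /** Returns the expiry date found in the license XML, or null if the element is absent. */
    static LocalDate readExpiry(String licenseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(licenseXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName(EXPIRY_ELEMENT);
            if (nodes.getLength() == 0) {
                return null;
            }
            String text = nodes.item(0).getTextContent().trim();
            // BASIC_ISO_DATE parses the compact yyyyMMdd form
            return LocalDate.parse(text, DateTimeFormatter.BASIC_ISO_DATE);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("License file is not readable XML", e);
        }
    }

    /** True when the expiry date is strictly before the given day. */
    static boolean isExpired(LocalDate expiry, LocalDate today) {
        return expiry != null && expiry.isBefore(today);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java LicenseExpiryCheck <path-to-.lic-file>");
            return;
        }
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        if (xml.startsWith("\uFEFF")) {
            xml = xml.substring(1); // drop a UTF-8 byte-order mark, if present
        }
        LocalDate expiry = readExpiry(xml);
        if (expiry == null) {
            System.out.println("No " + EXPIRY_ELEMENT + " element found");
        } else if (isExpired(expiry, LocalDate.now())) {
            System.out.println("License expired on " + expiry);
        } else {
            System.out.println("License valid until " + expiry);
        }
    }
}
```

Usage: `java LicenseExpiryCheck C:\path\to\Aspose.OCR.NET.lic`. The helper only reads the file; it does not validate the license the way the Aspose library does.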

davidnutt · September 22, 2021, 1:18pm

I am facing a similar issue with a paid license and Aspose OCR 21.8. I am using the following code:

// ASPOSE_LICENSE, fileName, pageText and LOGGER are defined elsewhere in the class.
License ocrLicense = new License();
ocrLicense.setLicense(ASPOSE_LICENSE);
AsposeOCRPdf pdf = new AsposeOCRPdf();
LOGGER.info("ENGAGING OCR");
DocumentRecognitionSettings set = new DocumentRecognitionSettings(0);
set.setLanguage(Language.Eng);
// Recognize images from PDF
ArrayList<RecognitionResult> results = pdf.RecognizePdf(fileName, set);
for (RecognitionResult result : results) {
    pageText.add(result.recognitionText);
    LOGGER.info(result.recognitionText);
    LOGGER.info("recognitionLinesResult");
    // i < size() visits every line; a bound of size() - 1 would skip the last one
    for (int i = 0; i < result.recognitionLinesResult.size(); i++) {
        LOGGER.info("LINE " + i + ": " + result.recognitionLinesResult.get(i).textInLine);
    }
}

It logs the following two lines for every page:

 I
 recognitionLinesResult

Any assistance would be greatly appreciated!

asad.ali · September 23, 2021, 3:15pm

@davidnutt

Could you please share the sample PDF file with us as well? We will test the scenario in our environment and address it accordingly.