OCR a PDF image to a readable PDF

We are testing Aspose OCR. We got example code from the site, but it is not working.
The code is:

Dim ocr = New AsposeOcr()
Dim settings = New DocumentRecognitionSettings()
settings.Language = Language.Eng
settings.StartPage = 1
settings.PagesNumber = 1
settings.DetectAreas = True
settings.DetectAreasMode = DetectAreasMode.COMBINE
settings.ThreadsCount = 1

If File.Exists("C:\Users\Russ\Documents\OCRTest_NoStamp.pdf") Then
    Dim res = ocr.RecognizePdf("C:\Users\Russ\Documents\OCRTest_NoStamp.pdf", settings)
    AsposeOcr.SaveMultipageDocument("C:\Users\Russ\Documents\OCRTest_NoStampOCR.pdf", SaveFormat.Pdf, res)
End If

@vaughnis

To perform OCR on a PDF image and convert it into a readable PDF using Aspose OCR, you can follow the example provided below. It appears that your current implementation is close, but there are some adjustments you might need to make.

Here’s a sample code snippet that demonstrates how to recognize text from a PDF document and save the output as a new PDF file:

Dim ocr As New AsposeOcr()
Dim settings As New RecognitionSettings()
settings.Language = Language.Eng

' Create an OcrInput object and add the PDF file
Dim input As New OcrInput(InputType.PDF)
input.Add("C:\Users\Russ\Documents\OCRTest_NoStamp.pdf", startPage:=1, pagesCount:=1)

' Perform OCR recognition
Dim results As List(Of RecognitionResult) = ocr.Recognize(input, settings)

' Save the recognized text to a new PDF file
AsposeOcr.SaveMultipageDocument("C:\Users\Russ\Documents\OCRTest_NoStampOCR.pdf", SaveFormat.Pdf, results)

Key Points:

  1. Initialization: Ensure that you initialize the AsposeOcr instance correctly.
  2. Input Setup: Use the OcrInput class to specify the PDF file and the pages you want to process.
  3. Recognition Settings: Set the language and any other recognition settings as needed.
  4. Saving the Output: Use the SaveMultipageDocument method to save the recognized text into a new PDF file.

Make sure that the file paths are correct and that the PDF file you are trying to process exists at the specified location. If you encounter any errors, check the exception messages for more details on what might be going wrong.
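As a rough diagnostic sketch of the checks described above, the snippet below reuses only the calls from the example and adds standard .NET file checks. The module name, the variable names, and the assumption that the recognition result behaves like a list with one entry per recognized page are illustrative and not confirmed by the Aspose documentation. It may also be worth confirming there whether the page index passed to Add starts at 0 or 1.

```vbnet
Imports System
Imports System.IO
Imports Aspose.OCR

Module OcrDiagnostics
    Sub Main()
        ' Paths taken from the example above; adjust them for your machine.
        Dim inputPath As String = "C:\Users\Russ\Documents\OCRTest_NoStamp.pdf"
        Dim outputPath As String = "C:\Users\Russ\Documents\OCRTest_NoStampOCR.pdf"

        ' Stop early if the input file is missing.
        If Not File.Exists(inputPath) Then
            Console.WriteLine("Input file not found: " & inputPath)
            Return
        End If

        Try
            Dim ocr As New AsposeOcr()
            Dim settings As New RecognitionSettings()
            settings.Language = Language.Eng

            Dim input As New OcrInput(InputType.PDF)
            ' Same page arguments as the example above, passed positionally.
            input.Add(inputPath, 1, 1)

            Dim results = ocr.Recognize(input, settings)
            ' Assumption: the result is a list with one entry per recognized page.
            Console.WriteLine("Pages recognized: " & results.Count)

            If results.Count = 0 Then
                Console.WriteLine("Nothing was recognized; check the page range passed to Add.")
                Return
            End If

            AsposeOcr.SaveMultipageDocument(outputPath, SaveFormat.Pdf, results)

            ' Report the size of the saved file as a quick sanity check.
            Dim outputInfo As New FileInfo(outputPath)
            Console.WriteLine("Output size in bytes: " & outputInfo.Length)
        Catch ex As Exception
            ' Print the full exception, including the stack trace.
            Console.WriteLine("OCR failed: " & ex.ToString())
        End Try
    End Sub
End Module
```

If the page count printed is zero, the saved PDF would be expected to contain no pages, which narrows the problem to the input step rather than the save step.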

If you continue to experience issues, consider checking the official documentation or examples provided by Aspose for further guidance.

I get errors on the following lines:
Dim input As New OcrInput(InputType.PDF)
Dim results As List(Of RecognitionResult) = ocr.Recognize(input, settings)

OCRInput - Type OCRInput is not defined
InputType - InputType is not declared
ocr.Recognize - Recognize is not a member of AsposeOCR

Downloading the latest version took care of the errors, but when I attempt to open the OCRed PDF I get a message that states:
There was an error opening this document. This file cannot be opened because it has no pages.

@vaughnis

Would you please share what error you are facing? Also, please share your sample PDF with us so that we can test the scenario in our environment and address it accordingly.

See my last post. The errors were resolved by downloading the latest version. Now it creates the output file, but when you attempt to open it, it states:
There was an error opening this document. This file cannot be opened because it has no pages.

@vaughnis

Would you please share the sample PDF document for our reference as well? We will test the scenario in our environment and address it accordingly.

OCRTest.pdf (309.6 KB)

OCRTestOCR.pdf (1.1 KB)

These are the 2 documents. OCRTest is the input and OCRTestOCR is the output.

@vaughnis

Have you installed the API from NuGet Package Manager? Also, did you use a valid license or a free 30-day temporary license? We have tested the scenario in our environment with version 25.1 and did not notice any issues. Please check the attached output PDF for your reference.
output.pdf (388.6 KB)

We have a license file. Our license allowed us to get version 23.5, which we can’t get to work. We will get a new license file and try version 25.1. If we still have issues, we will let you know.

@vaughnis

Sure, please take your time. In the meantime, you can also apply for a free 30-day temporary license to evaluate the latest version.

We got our new license file and downloaded version 25.1. But we still get the same result. The output file will not open. It states:
There was an error opening this document. This file cannot be opened because it has no pages.

Here is the code:
Dim ocr As New AsposeOcr()
Dim settings As New RecognitionSettings()
settings.Language = Language.Eng
Dim input As New OcrInput(InputType.PDF)

input.Add("C:\Users\Russ\Documents\OCRTest.pdf", startPage:=1, pagesCount:=1)
Dim resultsNoStamp As List(Of RecognitionResult) = ocr.Recognize(input, settings)
AsposeOcr.SaveMultipageDocument("C:\Users\Russ\Documents\OCRTestOCR.pdf", SaveFormat.Pdf, resultsNoStamp)

I am told we also have a new support license.

Aspose Developer Support - 250220145936

@vaughnis

Could you please share your sample console application with us as a .zip file? You can remove the DLLs and your license file to reduce its size, then upload it here for our reference. We can restore the NuGet packages on our end and use the same application to reproduce the issue in our environment.

Additionally, have you set the license before running the code snippet above?

I do set the license on page load.

How do I upload the project?

I have removed all the dlls and my zip shows that it is 26MB but when I attempt to upload I get a message that says:
Sorry, that file is too big (maximum size is 48.8 MB). Why not upload your large file to a cloud sharing service, then paste the link?

Please advise.

@vaughnis

Please upload the .zip to Google Drive and share the link with us.

@vaughnis

It looks like the link is private, and we are not able to access it. Could you please share a public link?