OCR a PDF image to a readable PDF

I made it public. You should be able to access it now

Got it zipped down enough to upload.

OCRTestApp.zip (7.9 MB)

@vaughnis

Thanks for providing the requested information and for your patience. We tested your application, and the culprit appears to be the following line of code:

input.Add("\Files\OCRTest.pdf", startPage:=1, pagesCount:=1)

The input document has only one page, and the page index starts from 0, so startPage:=1 points past the only page. The page should be added like this:

input.Add("\Files\OCRTest.pdf", startPage:=0, pagesCount:=1)

With just this change, we were able to generate the correct output.
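To illustrate the off-by-one described above, here is a minimal VB.NET sketch that converts a 1-based page number (as shown in a PDF viewer) to the zero-based startPage value. The helper name `ToStartPage` and the module name are hypothetical and not part of Aspose.OCR; only the `input.Add` call shown in the comment comes from this thread.

```vbnet
Imports System

Module PageIndexSketch
    ' Hypothetical helper (not an Aspose.OCR API): maps a 1-based page number
    ' to the zero-based startPage value and rejects out-of-range pages.
    Function ToStartPage(pageNumber As Integer, totalPages As Integer) As Integer
        If pageNumber < 1 OrElse pageNumber > totalPages Then
            Throw New ArgumentOutOfRangeException("pageNumber")
        End If
        Return pageNumber - 1
    End Function

    Sub Main()
        ' A one-page document: page 1 maps to startPage 0.
        Dim startPage As Integer = ToStartPage(1, 1)
        Console.WriteLine(startPage)

        ' The value would then be passed as in the thread:
        ' input.Add("\Files\OCRTest.pdf", startPage:=startPage, pagesCount:=1)
    End Sub
End Module
```

Validating the page number before calling the library makes an out-of-range request fail early with a clear message.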

OCRTestOCR_0.pdf (417.7 KB)

I started a new project and changed the StartPage to 0. When the app is run I get the error:

System.IO.FileNotFoundException: ‘Could not load file or assembly ‘Microsoft.ML.OnnxRuntime, Version=0.0.0.0, Culture=neutral, PublicKeyToken=f27f157f0a5b7bb6’ or one of its dependencies. The system cannot find the file specified.’

So I went to NuGet and added Microsoft.ML.OnnxRuntime version 1.20.1. Now when I run it, it fails inside NativeMethods.shared.cs. The error is:

System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=Microsoft.ML.OnnxRuntime
StackTrace:
at Microsoft.ML.OnnxRuntime.NativeMethods..cctor() in D:\a_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\NativeMethods.shared.cs:line 368

This module was loaded along with OnnxRuntime. Please advise.

@vaughnis

Please make sure that your project targets .NET Framework 4.6.2 or later. Also, please install Aspose.OCR for .NET from NuGet Package Manager, because this installs all dependencies with the correct versions. As shared earlier, we performed all these steps and did not notice any issues in our environment.

Ok. I changed the framework to 4.6.2 but it still errors. I even started a new project but get the same error. In both scenarios I added Aspose.OCR using NuGet manager. The error occurs on the code line:

Dim resultsNoStamp As List(Of RecognitionResult) = ocr.Recognize(input, settings)

Here are the details from the error:

System.NullReferenceException
HResult=0x80004003
Message=Object reference not set to an instance of an object.
Source=Microsoft.ML.OnnxRuntime
StackTrace:
at Microsoft.ML.OnnxRuntime.NativeMethods..cctor() in D:\a_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\NativeMethods.shared.cs:line 368

@vaughnis

Please download the sample project from the given link and test with it. You can change the paths to the .lic file and the other input/output files to match your environment. Let us know if you face any issues with it.

I downloaded the sample project and it errors on the same line of code. The problem is in NativeMethods.shared.cs at line 368.
The line is:
OrtCreateEnv = (DOrtCreateEnv)Marshal.GetDelegateForFunctionPointer(api_.CreateEnv, typeof(DOrtCreateEnv));

@vaughnis

This is strange, because we tested the same application in our environment and did not face any issues. Are you testing on Windows? Also, is your environment x86 only? Can you please make sure that the project is configured to target x64? Please share some screenshots of the issue and your full environment details so that we can proceed with the investigation.
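One way to collect the environment details requested above is to print the OS version and the bitness of the running process. The following is a minimal VB.NET sketch using only standard `System.Environment` members; the module name is hypothetical. In a web application the same values would need to be written to a page or a log, since the hosting process (for example IIS Express) decides the bitness.

```vbnet
Imports System

Module BitnessCheckSketch
    Sub Main()
        ' OS name/version string as reported by the runtime.
        Console.WriteLine("OS version:     {0}", Environment.OSVersion)
        ' True on 64-bit Windows, even if this process runs as 32-bit.
        Console.WriteLine("64-bit OS:      {0}", Environment.Is64BitOperatingSystem)
        ' True only if the current process is running as 64-bit.
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess)
        ' 4 in a 32-bit process, 8 in a 64-bit process.
        Console.WriteLine("Pointer size:   {0} bytes", IntPtr.Size)
    End Sub
End Module
```

With an Any CPU build, the process bitness reported here depends on the host and project settings, so it is worth checking at run time instead of assuming it from the build configuration.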

Sorry for the delay in responding. I have been out of the office.
We have the setting as Any CPU. When I try it with x64 I get the following error:

Could not load file or assembly ‘OCR1’ or one of its dependencies. An attempt was made to load a program with an incorrect format.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.BadImageFormatException: Could not load file or assembly ‘OCR1’ or one of its dependencies. An attempt was made to load a program with an incorrect format.

Source Error:

An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.

Assembly Load Trace: The following information can be helpful to determine why the assembly ‘OCR1’ could not be loaded.

`=== Pre-bind state information ===
LOG: DisplayName = OCR1
(Partial)
WRN: Partial binding information was supplied for an assembly:
WRN: Assembly Name: OCR1 | Domain ID: 2
WRN: A partial bind occurs when only part of the assembly display name is provided.
WRN: This might result in the binder loading an incorrect assembly.
WRN: It is recommended to provide a fully specified textual identity for the assembly,
WRN: that consists of the simple name, version, culture, and public key token.
WRN: See whitepaper Best Practices for Assembly Loading - .NET Framework | Microsoft Learn for more information and common solutions to this issue.
LOG: Appbase = file:///S:/Visual Studio 2013/Projects - Beta/OCR1/
LOG: Initial PrivatePath = S:\Visual Studio 2013\Projects - Beta\OCR1\bin
Calling assembly : (Unknown).

LOG: This bind starts in default load context.
LOG: Using application configuration file: S:\Visual Studio 2013\Projects - Beta\OCR1\web.config
LOG: Using host configuration file: C:\Users\Russ\Documents\IISExpress\config\aspnet.config
LOG: Using machine configuration file from C:\Windows\Microsoft.NET\Framework\v4.0.30319\config\machine.config.
LOG: Policy not being applied to reference at this time (private, custom, partial, or location-based assembly bind).
LOG: Attempting download of new URL file:///C:/Users/Russ/AppData/Local/Temp/Temporary ASP.NET Files/vs/933423e0/35293208/OCR1.DLL.
LOG: Attempting download of new URL file:///C:/Users/Russ/AppData/Local/Temp/Temporary ASP.NET Files/vs/933423e0/35293208/OCR1/OCR1.DLL.
LOG: Attempting download of new URL file:///S:/Visual Studio 2013/Projects - Beta/OCR1/bin/OCR1.DLL.
ERR: Failed to complete setup of assembly (hr = 0x8007000b). Probing terminated.`

Stack Trace:

`[BadImageFormatException: Could not load file or assembly ‘OCR1’ or one of its dependencies. An attempt was made to load a program with an incorrect format.]
System.Reflection.RuntimeAssembly._nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks) +0
System.Reflection.RuntimeAssembly.nLoad(AssemblyName fileName, String codeBase, Evidence assemblySecurity, RuntimeAssembly locationHint, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks) +37
System.Reflection.RuntimeAssembly.InternalLoadAssemblyName(AssemblyName assemblyRef, Evidence assemblySecurity, RuntimeAssembly reqAssembly, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean throwOnFileNotFound, Boolean forIntrospection, Boolean suppressSecurityChecks) +159
System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, IntPtr pPrivHostBinder, Boolean forIntrospection) +80
System.Reflection.RuntimeAssembly.InternalLoad(String assemblyString, Evidence assemblySecurity, StackCrawlMark& stackMark, Boolean forIntrospection) +22
System.Reflection.Assembly.Load(String assemblyString) +29
System.Web.Configuration.CompilationSection.LoadAssemblyHelper(String assemblyName, Boolean starDirective) +38

[ConfigurationErrorsException: Could not load file or assembly ‘OCR1’ or one of its dependencies. An attempt was made to load a program with an incorrect format.]
System.Web.Configuration.CompilationSection.LoadAssemblyHelper(String assemblyName, Boolean starDirective) +726
System.Web.Configuration.CompilationSection.LoadAllAssembliesFromAppDomainBinDirectory() +196
System.Web.Configuration.CompilationSection.LoadAssembly(AssemblyInfo ai) +45
System.Web.Compilation.BuildManager.GetReferencedAssemblies(CompilationSection compConfig) +172
System.Web.Compilation.BuildManager.GetPreStartInitMethodsFromReferencedAssemblies() +91
System.Web.Compilation.BuildManager.CallPreStartInitMethods(String preStartInitListPath, Boolean& isRefAssemblyLoaded) +111
System.Web.Compilation.BuildManager.ExecutePreAppStart() +156
System.Web.Hosting.HostingEnvironment.Initialize(ApplicationManager appManager, IApplicationHost appHost, IConfigMapPathFactory configMapPathFactory, HostingEnvironmentParameters hostingParameters, PolicyLevel policyLevel, Exception appDomainCreationException) +695

[HttpException (0x80004005): Could not load file or assembly ‘OCR1’ or one of its dependencies. An attempt was made to load a program with an incorrect format.]
System.Web.HttpRuntime.FirstRequestInit(HttpContext context) +660
System.Web.HttpRuntime.EnsureFirstRequestInit(HttpContext context) +89
System.Web.HttpRuntime.ProcessRequestNotificationPrivate(IIS7WorkerRequest wr, HttpContext context) +189`
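The bind log above reads machine.config from `Framework\v4.0.30319` and not from `Framework64`, and fails with hr = 0x8007000b. This suggests (an inference from the log, not something confirmed in this thread) that a 32-bit host process was trying to load the x64 build of OCR1. The following VB.NET sketch, assuming .NET Framework, lists the target architecture of each managed DLL in a folder so it can be compared with the bitness of the process. The module name and the command-line argument are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Reflection

Module AssemblyArchitectureSketch
    Sub Main(args As String())
        ' Folder to scan; pass the site's bin folder as the first argument.
        Dim binFolder As String = If(args.Length > 0, args(0), ".")
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess)

        For Each dllPath As String In Directory.GetFiles(binFolder, "*.dll")
            Try
                ' Reads assembly metadata without loading the assembly for execution.
                Dim name As AssemblyName = AssemblyName.GetAssemblyName(dllPath)
                Console.WriteLine("{0}: {1}", name.Name, name.ProcessorArchitecture)
            Catch ex As BadImageFormatException
                ' Native DLLs are not managed assemblies and land here.
                Console.WriteLine("{0}: not a managed assembly", Path.GetFileName(dllPath))
            End Try
        Next
    End Sub
End Module
```

An assembly reported as Amd64 cannot be loaded by a 32-bit process, while MSIL (Any CPU) assemblies can be loaded by either.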

@vaughnis

Have you tried running the same application on another machine with a different environment? We used and shared the same application with you, and it was created from scratch. We are still unable to replicate these errors in our environment. The issue appears to be related to system configuration. Please share your complete environment details, e.g. Windows OS name and version, so that we can investigate further and address the issue accordingly.

I thought the issue might be that I have been running the code from the development environment (on my development PC). I published to our test server and got the following message:

Server Error in ‘/’ Application.


Configuration Error

Description: An error occurred during the processing of a configuration file required to service this request. Please review the specific error details below and modify your configuration file appropriately.

Parser Error Message: The CodeDom provider type “Microsoft.CodeDom.Providers.DotNetCompilerPlatform.VBCodeProvider, Microsoft.CodeDom.Providers.DotNetCompilerPlatform, Version=2.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35” could not be located.

Source Error:

Line 49: <compilers>
Line 50: <compiler language="c#;cs;csharp" extension=".cs" type="Microsoft.CodeDom.Providers.DotNetCompilerPlatform.CSharpCodeProvider, Microsoft.CodeDom.Providers.DotNetCompilerPlatform, Version=2.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" warningLevel="4" compilerOptions="/langversion:default /nowarn:1659;1699;1701" />
Line 51: <compiler language="vb;vbs;visualbasic;vbscript" extension=".vb" type="Microsoft.CodeDom.Providers.DotNetCompilerPlatform.VBCodeProvider, Microsoft.CodeDom.Providers.DotNetCompilerPlatform, Version=2.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" warningLevel="4" compilerOptions="/langversion:default /nowarn:41008 /define:_MYTYPE=\&quot;Web\&quot; /optionInfer+" />
Line 52: </compilers>
Line 53: </system.codedom>

Source File: C:\inetpub\wwwroot\ALJ\ocr\web.config Line: 51

@vaughnis

The issue still seems related to missing DLLs and assembly references. It is very hard to replicate in our environment. We will certainly assist you in resolving the issue if we can replicate it on our end. If possible, please share any information that could help us reproduce the error so that we can address it accordingly.