Aspose.OCR for .NET sometimes does not recognize blank spaces

image.png (13,8 KB)

When recognizing document images, we usually get the text without any problems. In the address area, however, many of the blank spaces are not recognized. The attached screenshot shows a portion of the original image and the recognition result. Is there a filter or some other way to optimize the image so that these blank spaces are recognized more reliably? Everything else is recognized correctly, and the image is the correct size.

@antfor

To improve the recognition of blank spaces in the address area with Aspose.OCR, you can apply preprocessing filters to the image before running OCR. One option is the AutoSkew filter, which straightens tilted text; if the address block is slightly rotated, this may also improve how the spaces between words are detected.

Here’s a sample code snippet demonstrating how to set up the OCR process with preprocessing filters:

using System;
using System.Collections.Generic;
using Aspose.OCR.Models.PreprocessingFilters;

namespace Aspose.OCR.Examples
{
    public class PerformOCROnImage
    {
        public static void Run()
        {
            // Initialize an instance of AsposeOcr
            AsposeOcr api = new AsposeOcr();

            // Set preprocessing filters
            PreprocessingFilter filters = new PreprocessingFilter
            {
                PreprocessingFilter.AutoSkew()
            };

            // Create OcrInput object and add your image
            OcrInput input = new OcrInput(InputType.URL, filters);
            input.Add("your_image_url_here"); // Replace with your image URL

            // Recognize image
            List<RecognitionResult> result = api.Recognize(input, new RecognitionSettings
            {
                RecognizeSingleLine = false,
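                // Optional: restrict recognition to specific regions (x, y, width, height in pixels).
                // The rectangles below are placeholders; adjust or remove them for your image.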
                RecognitionAreas = new List<Aspose.Drawing.Rectangle>()
                {
                    new Aspose.Drawing.Rectangle(1, 3, 390, 70),
                    new Aspose.Drawing.Rectangle(1, 72, 390, 70)
                }
            });

            // Print result
            Console.WriteLine($"Text:\n {result[0].RecognitionText}");
            Console.WriteLine("Areas:");
            result[0].RecognitionAreasText.ForEach(a => Console.WriteLine($"{a}"));
            Console.WriteLine("Warnings:");
            result[0].Warnings.ForEach(w => Console.WriteLine($"{w}"));
            Console.WriteLine($"JSON: {result[0].GetJson()}");
        }
    }
}

In this example, replace "your_image_url_here" with the URL of your image, and adjust or remove the RecognitionAreas rectangles, which use placeholder coordinates. AutoSkew straightens tilted text, which can help the engine separate lines and words, but it does not change the spacing in the image itself, so treat it as a first step. The Aspose.OCR library provides other preprocessing filters that you can combine with it to improve image quality before recognition.
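As a rough illustration of combining filters, the sketch below chains AutoSkew with contrast correction and upscaling. It assumes that your installed Aspose.OCR release exposes the ContrastCorrectionFilter and Scale factory methods on PreprocessingFilter and that filters are applied in the order they are added; the file name "address.png" and the scale factor of 2 are placeholders, not recommended values. Please check the filter list for your version before relying on it.

```csharp
using System;
using Aspose.OCR;
using Aspose.OCR.Models.PreprocessingFilters;

public static class FilterChainSketch
{
    public static void Run()
    {
        AsposeOcr api = new AsposeOcr();

        // Build the chain step by step (assumed to run in the order added).
        PreprocessingFilter filters = new PreprocessingFilter();
        filters.Add(PreprocessingFilter.AutoSkew());                 // straighten tilted text
        filters.Add(PreprocessingFilter.ContrastCorrectionFilter()); // sharpen faint glyph edges
        filters.Add(PreprocessingFilter.Scale(2.0f));                // placeholder factor: enlarge small print

        // "address.png" is a hypothetical local file used only for illustration.
        OcrInput input = new OcrInput(InputType.SingleImage, filters);
        input.Add("address.png");

        // Print the recognized text of each processed image.
        foreach (RecognitionResult page in api.Recognize(input))
        {
            Console.WriteLine(page.RecognitionText);
        }
    }
}
```

Enlarging the image widens the gaps between words in pixel terms, which is why upscaling is worth trying when spaces are being dropped, but whether it helps depends on the source image.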

If blank spaces are still being dropped, try different combinations of preprocessing filters, or adjust the recognition areas so that they match the layout of the address block more closely.
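When comparing filter combinations, it can help to put a number on how many spaces survive. The helper below is a minimal sketch and is not part of Aspose.OCR: the class and method names are hypothetical, and it assumes you have typed in the correct address text by hand and that the OCR output contains the same non-space characters in the same order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SpaceRecall
{
    // Positions, counted in non-space characters, after which the text has a gap.
    private static HashSet<int> GapPositions(string text)
    {
        var gaps = new HashSet<int>();
        int letters = 0;
        foreach (char c in text.Trim())
        {
            if (char.IsWhiteSpace(c))
            {
                gaps.Add(letters);
            }
            else
            {
                letters++;
            }
        }
        return gaps;
    }

    // Fraction of the expected word gaps that also appear in the OCR output.
    public static double Measure(string expected, string recognized)
    {
        HashSet<int> want = GapPositions(expected);
        if (want.Count == 0)
        {
            return 1.0; // nothing to miss
        }
        HashSet<int> got = GapPositions(recognized);
        int kept = want.Count(p => got.Contains(p));
        return (double)kept / want.Count;
    }
}
```

For example, SpaceRecall.Measure("Calle Mayor 12 3B", "CalleMayor 12 3B") reports that two of the three expected gaps were kept. In practice you would pass result[0].RecognitionText as the second argument and rerun the comparison after each change to the filters.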