Extract Text by coordinates in a PDF but using OCR

Bart_Despontin · December 1, 2023, 10:59am

Is it possible to extract text by coordinates from a PDF using OCR? I am looking for something similar to "Extracting text inside a rectangle" in the documentation, but with a PDF instead of an image as the source.

asad.ali · December 1, 2023, 9:58pm

@Bart_Despontin

We need to investigate these requirements. Can you please share a sample PDF and the output you expect from it? We will log an investigation ticket and share the ID with you.

asad.ali · December 4, 2023, 11:03pm

@Bart_Despontin

Meanwhile, below is the code snippet that can be used to achieve your requirements:

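// Create the OCR engine (the original snippet used `api` without declaring it)
AsposeOcr api = new AsposeOcr();

// Declare the input as a PDF; imgPath holds the path to the PDF file
OcrInput input = new OcrInput(InputType.PDF);
input.Add(imgPath);

// Recognize only the listed rectangles, given as (x, y, width, height)
var result = api.Recognize(input, new RecognitionSettings
{
    RecognitionAreas = new List<Aspose.Drawing.Rectangle>
    {
        new Aspose.Drawing.Rectangle(10, 10, 200, 500)
    }
});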
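
For completeness, here is a minimal sketch of how the recognized text could be read back from the result. It assumes the Aspose.OCR for .NET package is referenced, that `Recognize` returns one `RecognitionResult` per PDF page, and that `RecognitionAreasText` lists the text of each area in the same order as `RecognitionAreas`. The file name `sample.pdf` is hypothetical, and the rectangle is the same illustrative one as above.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

class PdfAreaOcr
{
    static void Main()
    {
        // Hypothetical input path; replace it with a real PDF file.
        string pdfPath = "sample.pdf";

        AsposeOcr api = new AsposeOcr();

        // Tell the engine that the source is a PDF rather than a single image.
        OcrInput input = new OcrInput(InputType.PDF);
        input.Add(pdfPath);

        // Restrict recognition to one rectangle: (x, y, width, height).
        var settings = new RecognitionSettings
        {
            RecognitionAreas = new List<Aspose.Drawing.Rectangle>
            {
                new Aspose.Drawing.Rectangle(10, 10, 200, 500)
            }
        };

        // Assumption: the returned collection holds one result per page.
        int page = 1;
        foreach (RecognitionResult pageResult in api.Recognize(input, settings))
        {
            // Assumption: area texts follow the order of RecognitionAreas.
            for (int i = 0; i < pageResult.RecognitionAreasText.Count; i++)
            {
                Console.WriteLine(
                    $"Page {page}, area {i}: {pageResult.RecognitionAreasText[i]}");
            }
            page++;
        }
    }
}
```

The same rectangle is applied to every page of the PDF, so if only one page is of interest, the other results can simply be ignored.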

Bart_Despontin · December 5, 2023, 9:28am

Hi,

It is not about a particular PDF; I was asking in general whether it is possible to extract text from a PDF for a certain rectangular region. Your answer above is enough; we were able to extract text from the document.

Thanks.

asad.ali · December 5, 2023, 1:14pm

@Bart_Despontin

It is nice to know that you were able to extract the text. In case you need further assistance, please feel free to create a new topic.