Aspose.OCR results in skewed and stretched images in PDF

frimbingpickering · May 26, 2023, 10:09am

Hi there!

I have a problem when using Aspose.OCR to recognize images and create PDFs from them.

When I apply OCR to an image with Recognize and then use SaveMultipageDocument to save it as a PDF, I expect the PDF to contain the original image at its original size, skew and aspect ratio, not the (de)skewed and stretched image used for preprocessing.

In other words, from a data integrity perspective, the end result in the PDF should be the same as the input (plus any text found via OCR as a searchable text layer).

Unfortunately, that’s not the current behavior. Can you please let me know how to achieve this?

Attached you will find a .NET Framework console app (ReproduceAsposeOcrIssue.zip (1.2 MB)) that mimics our approach. Note that I did not include the license, so you need to add your license as ReproduceAsposeOcrIssue.Properties.Resources.Aspose_Total_NET.

To easily observe the difference, the sample input file (flower.JPG (332.6 KB)) and the incorrect resulting PDF (flower.pdf (3.9 MB)) are also attached. And I know this image does not contain text, but that’s beside the point.

Oh, one more thing: I tried playing around with Aspose.OCR.Models.PreprocessingFilters.PreprocessingFilter, namely Scale and Resize, but without any luck; see the comments in ReproduceAsposeOcrIssue\ReproduceAsposeOcrIssue\AsposeOcrEngine\AsposeOcrEngine.cs.

Looking forward to hearing from you!

asad.ali · May 26, 2023, 10:26pm

@frimbingpickering

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): OCRNET-682

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

asad.ali · August 14, 2023, 1:12pm

@frimbingpickering

We would like to share with you that the earlier logged ticket has been resolved and the version containing the fix will be released this week.

aspose.notifier · August 16, 2023, 11:13am

The issues you have found earlier (filed as OCRNET-682) have been fixed in this update. This message was posted using the Bugs notification tool by anna.pylaieva

frimbingpickering · August 22, 2023, 8:36am

Hi!

Thanks for the effort and letting us know.
We will check out the new version and see if our issues are resolved.

Cheers!

frimbingpickering · August 22, 2023, 12:26pm

Hi again!

We ran some tests; here are the conclusions:

- The image in the output PDF is no longer stretched.
- The canvas of the PDF page on which the image is displayed is still too large (see pdf.pdf (3.9 MB)).
  - How can we make the canvas match the image size? We want to produce a result like this: flower.pdf (17.0 KB)
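For reference, making the canvas match the image is simple arithmetic: PDF user space measures lengths in points, 1/72 inch by default, so a page that shows an image 1:1 needs a size of pixels × 72 / DPI on each axis. The sketch below is a hypothetical helper, not part of the Aspose.OCR API, and it assumes the image's horizontal and vertical DPI are known (for example from the file's metadata).

```python
from __future__ import annotations

# PDF user space measures lengths in points; by default 1 point = 1/72 inch.
POINTS_PER_INCH = 72.0

# A4 (210 mm x 297 mm) expressed in points, rounded to two decimals.
A4_POINTS = (595.28, 841.89)


def page_size_for_image(width_px: int, height_px: int,
                        dpi_x: float, dpi_y: float) -> tuple[float, float]:
    """Return the (width, height) in points of a page that shows the image 1:1.

    Hypothetical helper: assumes the image's horizontal and vertical DPI are known.
    """
    if dpi_x <= 0 or dpi_y <= 0:
        raise ValueError("DPI values must be positive")
    return (width_px * POINTS_PER_INCH / dpi_x,
            height_px * POINTS_PER_INCH / dpi_y)


def fits_page_exactly(width_px: int, height_px: int, dpi_x: float, dpi_y: float,
                      page_pt: tuple[float, float], tol: float = 0.01) -> bool:
    """True if page_pt matches the image's natural size within tol points."""
    w, h = page_size_for_image(width_px, height_px, dpi_x, dpi_y)
    return abs(w - page_pt[0]) <= tol and abs(h - page_pt[1]) <= tol


if __name__ == "__main__":
    # Illustrative numbers only, not the actual dimensions of flower.JPG.
    print(page_size_for_image(1200, 800, 96, 96))             # (900.0, 600.0)
    print(fits_page_exactly(1200, 800, 96, 96, A4_POINTS))    # False
```

Any page size other than the computed one, such as a fixed A4 canvas, either leaves extra margins around the image or forces it to be rescaled.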

Thanks in advance!

asad.ali · August 22, 2023, 8:16pm

@frimbingpickering

Would you please share some more details, such as the code snippet you used and the source/output PDF documents? We will then proceed to assist you accordingly.

frimbingpickering · September 6, 2023, 1:21pm

It should be reproducible with the same test program (ReproduceAsposeOcrIssue.zip) and file (flower.JPG (332.6 KB)).

The only thing I’ve changed is the Aspose.OCR NuGet package, which I’ve updated to 23.8.0.

asad.ali · September 6, 2023, 3:07pm

@frimbingpickering

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): OCRNET-725

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

frimbingpickering · November 2, 2023, 12:39pm

Are there any updates on this ticket? It would be great to use Aspose.OCR to directly create PDFs that faithfully represent the original images.

asad.ali · November 2, 2023, 5:14pm

@frimbingpickering

We are sorry that the earlier logged ticket has not yet been resolved. However, we have already recorded your concerns and will surely update you as soon as we make some progress towards ticket resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.

asad.ali · November 3, 2023, 10:14pm

@frimbingpickering

Please use the latest available version of the API and check whether the issue still persists. We believe it has been fixed in the latest version.

frimbingpickering · November 7, 2023, 9:25am

I have run a test on Aspose.OCR version 23.10.1 using a set of images, including the previously showcased flower image. This test was done using the same program used to reproduce the original issue, and produced the following result: flower.pdf (3.9 MB)

As you can see, the resulting PDF still shows a skewed result. I ran this test on various data and found that some automatic processing is still applied to the images: they are rotated, their aspect ratio is changed, or the canvas size of the resulting PDF is otherwise altered.

asad.ali · November 7, 2023, 6:31pm

@frimbingpickering

Thanks for your feedback. We have updated the ticket information and will let you know in case we have further updates.

anna.pylaieva · January 31, 2024, 4:05pm

Hi @frimbingpickering, I’m a developer from the Aspose.OCR team. Please try the latest release (24.1.0). With it, I no longer get distortions in the output PDF file.

frimbingpickering · February 6, 2024, 1:12pm

Hi @anna.pylaieva, thank you for the update. I have run my test set of images through a program using Aspose.OCR version 24.1.0 and have concluded that the issues do indeed seem to be resolved!

For example, the flower.pdf file, which previously contained a rotated, cropped version of the image rescaled onto an A4-sized canvas, now represents the image 1:1, as seen here:
flower.pdf (3.9 MB)

Thank you once again for the update and the work in resolving this issue.

asad.ali · February 6, 2024, 10:49pm

@frimbingpickering

It is nice to know that your issue has been resolved. Please keep using the API and feel free to create a new topic in case you need further assistance.