Convert PDF to TXT using Aspose

Hello Awais,

I need to convert a PDF to TXT, and I am trying to use either Aspose.Pdf (versions 20.11.0 through 21.2.0) or Aspose.OCR to do it.

With Aspose.Pdf, when I set the license to our Aspose Total license, I get “Exception of type ‘System.Exception’ was thrown.”

With Aspose.OCR (21.2.0), the license loads, but when I try to call the API, for example:

Dim recognitionEngine = New Aspose.OCR.AsposeOcr()
' Add image to the recognition batch
Dim source = New Aspose.OCR.OcrInput(Aspose.OCR.InputType.SingleImage)
source.Add(pdfFilePath)

I get “Aspose.OCR.OcrInput is not defined”. And this code:

' Perform OCR
Dim results As List(Of Aspose.OCR.RecognitionResult) = recognitionEngine.Recognize(source)
' Output recognized text
Console.WriteLine(results(0).RecognitionText)

displays “Recognize is not a member of ‘Aspose.OCR’”.

I know our license expired in 2022, but shouldn’t both of these still be working? I’m using Microsoft.NETCore.App ver. 6.0.25.

Thank you.


@rdaviessci You can use Aspose.Words for .NET to convert PDF to TXT. Please see the following code:

Document doc = new Document(@"C:\Temp\in.pdf");
doc.Save(@"C:\Temp\out.txt");

My colleagues from Aspose.PDF and Aspose.OCR teams will comment on the above shortly.

Hello, yes, I can confirm that your suggested method works for a “standard” PDF, thanks. The problem is that I often deal with PDFs that contain only a scanned document, that is, a PDF wrapping an image, and for those the output is blank. That’s why I was trying to use Aspose.OCR. Is the syntax I included above, which I got from your site, not the correct syntax for OCR version 21.2.0?

@rdaviessci

The Aspose.OCR code that you are using is correct. However, it is not compatible with the older version you have installed. Also, support for processing scanned PDF documents was either less mature or not yet available in older versions of the API. We always recommend using the latest version of the API because it includes the most complete feature support and fixes. You can obtain a free 30-day temporary license to test the latest version of Aspose.OCR. Please let us know if you still notice any issues, and we will assist you further.

Ok, thanks on OCR. But I would still like to know what to do about this PDF issue:
If I try to use Aspose.Pdf, when I set the license to our Aspose.Total.NET.lic dated 1/5/2021, I get “Exception of type ‘System.Exception’ was thrown.”

I should still be able to use version 21.2.0 with that license, shouldn’t I?

@rdaviessci

Would you please share your license file with us in a private message? You can add the license file to a .zip archive and attach it to the message. We will test the scenario in our environment and address it accordingly. Please click on the user name to find the message option.
image.png (15.2 KB)

Here it is.

@rdaviessci

We tested your license file and it worked fine in our environment with version 21.2 of the API. We are attaching a console application for your reference. Please restore/install the NuGet packages, place your license file in the Debug folder, and test. Please share your feedback if you still face any issues.
ConsoleApp1.zip (10.9 KB)
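
For readers without access to the attachment, here is a minimal sketch of a license check of this kind. It is not the contents of ConsoleApp1.zip. It assumes the Aspose.PDF NuGet package is installed and that the license file (named Aspose.Total.NET.lic, as in the post above) has been copied to the build output folder; the class name LicenseCheck is illustrative. Printing the whole exception chain may reveal more detail than the bare “Exception of type ‘System.Exception’ was thrown” message.

```csharp
using System;
using System.IO;

class LicenseCheck
{
    static void Main()
    {
        // Assumption: the license file sits in the build output folder
        // (for example bin\Debug\...), next to the executable.
        string licensePath = Path.Combine(AppContext.BaseDirectory, "Aspose.Total.NET.lic");
        Console.WriteLine($"License file found: {File.Exists(licensePath)}");

        try
        {
            // Apply the license to Aspose.PDF.
            var license = new Aspose.Pdf.License();
            license.SetLicense(licensePath);
            Console.WriteLine("License applied.");
        }
        catch (Exception ex)
        {
            // Walk the exception chain and print every level,
            // since the outer message alone may carry little detail.
            for (Exception e = ex; e != null; e = e.InnerException)
            {
                Console.WriteLine($"{e.GetType().FullName}: {e.Message}");
            }
        }
    }
}
```

If the first line prints False, the problem is likely the file location rather than the license itself.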