How to OCR a PDF file using the Aspose.OCR DLL in VB.NET


#1

Hi

I am currently evaluating Aspose.OCR.dll with a temporary license. I am trying to OCR a PDF file, but nothing happens. Please send sample code. Below is the code I tried:

'Create an instance of Document to load the PDF
Dim pdfDocument = New Aspose.Pdf.Document("C:\Users\Aravind\Desktop\New folder\NOOCR.pdf")

'Apply the Aspose.OCR license
Dim ocrLicense As Aspose.OCR.License = New Aspose.OCR.License()
ocrLicense.SetLicense("C:\Users\Aravind\Desktop\New folder\Aspose.OCR.lic")

'Create an instance of OcrEngine for recognition
Dim ocrEngine = New Aspose.OCR.OcrEngine()

'Iterate over the pages of the PDF (page indices are 1-based)
For pageCount As Integer = 1 To pdfDocument.Pages.Count
    'Create a MemoryStream to hold the page image temporarily
    Using imageStream = New System.IO.MemoryStream()
        'Create a Resolution object with the DPI value
        Dim resolution = New Aspose.Pdf.Devices.Resolution(300)

        'Create a JPEG device with the specified resolution and quality,
        'where quality ranges from 0 to 100 (100 is maximum)
        Dim jpegDevice = New Aspose.Pdf.Devices.JpegDevice(resolution, 100)

        'Convert the current page and save the image to the stream
        jpegDevice.Process(pdfDocument.Pages(pageCount), imageStream)

        'Rewind the stream so the OCR engine reads it from the beginning
        imageStream.Position = 0

        'Set the Image property of OcrEngine to the stream obtained in the previous step
        ocrEngine.Image = Aspose.OCR.ImageStream.FromStream(imageStream, Aspose.OCR.ImageStreamFormat.Jpg)

        'Perform the OCR operation on one page at a time
        If ocrEngine.Process() Then
            Console.WriteLine(ocrEngine.Text)
        End If
    End Using
Next

#2
Hi Aravind,

Thank you for your inquiry and for sharing your code.

To investigate this issue, we need the sample input PDF file you used; it will help us reproduce the problem. Please forward the file to us. We will test it at our end and share our findings in this thread.

For details on how to perform OCR on PDF documents, see the Performing OCR on PDF Documents article.


#3

Hi

Which languages are currently supported for OCR? Is Chinese supported?

Regards
Aravind

#4
Hi Aravind,

The Aspose.OCR for .NET API currently supports the following languages:

  • English
  • Spanish
  • French
  • Portuguese

We restructured the Aspose.OCR API because of performance issues, and the restructured API performs much better than the older one. Our team is now working on improving the features and languages that are already supported.

Supporting Chinese, Japanese and Korean requires a much more sophisticated engine, so implementing this feature has been postponed to a later release. We will not be able to start on it until the existing features and languages reach a certain level of maturity.

At the moment, we cannot share a reliable ETA; however, we will update you once our product team puts this feature back on its roadmap. We are sorry for the inconvenience.


#5

Hi

Thank you for your reply. May I know when you will release a new OCR API version that supports languages other than Chinese, Japanese and Korean?

Regards
Aravind

#6
Hi Aravind,

Our product team is working on improving the features and languages that are already supported. There are currently no plans to support new languages, including Chinese, Japanese and Korean. We will continue to publish monthly releases containing improvements and fixes for any bugs reported by our customers.


#7

Hi,
Please send sample code in VB. Here is the sample PDF file:
https://www.dropbox.com/s/8qcmzz44cyrlp03/a.pdf?dl=0


#8

@bpanchu,

Below is the code to perform OCR on PDF documents in Visual Basic .NET.

'Placeholder path: replace with the path to your PDF file
Dim pdfPath As String = "C:\path\to\a.pdf"
Dim pdfDocument = New Aspose.Pdf.Document(pdfPath)
Dim ocrEngine = New Aspose.OCR.OcrEngine()

'Render each page (1-based) to a 300 DPI JPEG in memory, then OCR it
For pageCount As Integer = 1 To pdfDocument.Pages.Count
    'Using disposes the MemoryStream after each page
    Using imageStream = New System.IO.MemoryStream()
        Dim resolution = New Aspose.Pdf.Devices.Resolution(300)
        Dim jpegDevice = New Aspose.Pdf.Devices.JpegDevice(resolution, 100)
        jpegDevice.Process(pdfDocument.Pages(pageCount), imageStream)
        imageStream.Position = 0
        ocrEngine.Image = Aspose.OCR.ImageStream.FromStream(imageStream, Aspose.OCR.ImageStreamFormat.Jpg)
        If ocrEngine.Process() Then
            Console.WriteLine(ocrEngine.Text)
        End If
    End Using
Next
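One possible reason for "nothing happens" is that `Process()` returns False for a page, in which case the loop above prints nothing for it. The following is a minimal sketch, assuming the same older Aspose.OCR version used in this thread (the one exposing `OcrEngine`, `ImageStream.FromStream` and `ImageStreamFormat.Jpg`; newer Aspose.OCR releases may expose a different API). It collects the text of all pages, marks pages where recognition returned False, and writes everything to a text file. The helper name `OcrPdfToText` and the file paths are hypothetical; apply your licenses first, as in the first post.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module PdfOcrSketch

    ' Hypothetical helper: OCRs every page of a PDF using the same
    ' Aspose.Pdf / Aspose.OCR calls as the sample above and returns
    ' the recognized text of all pages as a single string.
    Function OcrPdfToText(pdfPath As String) As String
        Dim allText As New StringBuilder()
        Dim pdfDocument As New Aspose.Pdf.Document(pdfPath)
        Dim ocrEngine As New Aspose.OCR.OcrEngine()

        ' 300 DPI rendering and maximum JPEG quality, as in the sample;
        ' the device is created once and reused for every page
        Dim resolution As New Aspose.Pdf.Devices.Resolution(300)
        Dim jpegDevice As New Aspose.Pdf.Devices.JpegDevice(resolution, 100)

        ' Aspose.Pdf page collections are 1-based
        For pageNumber As Integer = 1 To pdfDocument.Pages.Count
            Using imageStream As New MemoryStream()
                jpegDevice.Process(pdfDocument.Pages(pageNumber), imageStream)
                imageStream.Position = 0
                ocrEngine.Image = Aspose.OCR.ImageStream.FromStream(imageStream, Aspose.OCR.ImageStreamFormat.Jpg)

                allText.AppendLine("--- Page " & pageNumber & " ---")
                If ocrEngine.Process() Then
                    allText.AppendLine(ocrEngine.Text.ToString())
                Else
                    ' Process() returned False: record it instead of
                    ' silently skipping the page
                    allText.AppendLine("(no text recognized)")
                End If
            End Using
        Next

        Return allText.ToString()
    End Function

    Sub Main()
        ' Hypothetical input and output paths
        Dim text As String = OcrPdfToText("C:\path\to\a.pdf")
        File.WriteAllText("C:\path\to\a.txt", text)
        Console.WriteLine(text)
    End Sub

End Module
```

If the output file contains only "(no text recognized)" markers, the PDF pages were rendered but the engine found no text, which narrows the problem down to recognition rather than PDF loading.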

We hope this answers your question. Please feel free to reach out to us if you need additional information.