Hi
How can we convert non-OCR PDF documents to OCR (searchable) PDF documents when a document contains both images and text?
Also, does Aspose.OCR support OCR of Chinese documents that contain both images and Chinese text?
To convert non-OCR PDF documents that contain both images and text into OCR PDF documents, you can use Aspose.OCR. The library recognizes text in PDF files, and the recognition results can be used to produce a searchable, indexable document.
Here’s a general approach: use the OcrInput class to add your PDF document for recognition, then run recognition on it. The following C# snippet demonstrates how to recognize text from a PDF:
```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

namespace Aspose.OCR.Examples
{
    public class RecognizePdf
    {
        public static void Run()
        {
            // Initialize an instance of AsposeOcr
            AsposeOcr api = new AsposeOcr();

            // Create an OcrInput object and add the PDF document
            OcrInput input = new OcrInput(InputType.PDF);
            input.Add("path_to_your_pdf.pdf", startPage: 1, pagesCount: 1); // Adjust page range as needed

            // Perform OCR with default recognition settings
            List<RecognitionResult> results = api.Recognize(input, new RecognitionSettings());

            // Print the recognized text of each page
            foreach (var result in results)
            {
                Console.WriteLine($"Recognized Text: {result.RecognitionText}");
            }
        }
    }
}
```
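The snippet above only prints the recognized text; it does not yet write an OCR (searchable) PDF. A minimal sketch of that last step follows, assuming your Aspose.OCR for .NET version provides the static AsposeOcr.SaveMultipageDocument method and the SaveFormat.Pdf value (check the API reference for your release). The class name SearchablePdfSketch and the file paths are hypothetical placeholders, and the page range is passed in because page indexing may differ between versions.

```csharp
using System.Collections.Generic;
using Aspose.OCR;

// Hypothetical helper class, for illustration only.
public static class SearchablePdfSketch
{
    // Recognizes a scanned PDF and writes the results to a new PDF with a text layer.
    // Assumption: AsposeOcr.SaveMultipageDocument(string, SaveFormat, List<RecognitionResult>)
    // exists in the Aspose.OCR for .NET version you are using.
    public static void Convert(string sourcePdf, string outputPdf, int startPage, int pagesCount)
    {
        AsposeOcr api = new AsposeOcr();

        // Load the requested pages of the non-OCR PDF for recognition.
        OcrInput input = new OcrInput(InputType.PDF);
        input.Add(sourcePdf, startPage: startPage, pagesCount: pagesCount);

        // Run OCR with default settings; one result is returned per recognized page.
        List<RecognitionResult> results = api.Recognize(input, new RecognitionSettings());

        // Save all recognized pages into a single searchable PDF.
        AsposeOcr.SaveMultipageDocument(outputPdf, SaveFormat.Pdf, results);
    }
}
```

For example, SearchablePdfSketch.Convert("scanned.pdf", "searchable.pdf", 0, 1) would process one page and write searchable.pdf, provided the assumed save method is available.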
Regarding your second question: yes, Aspose.OCR supports OCR of Chinese documents that contain images and Chinese text. You can specify the language in the recognition settings to improve the accuracy of text extraction.
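Specifying the language might look like the following minimal sketch. It assumes that RecognitionSettings exposes a Language property and that the Language enum has a Chinese value named Language.Chi; please confirm both against the supported-languages list for your Aspose.OCR version. The class ChinesePdfSketch and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

// Hypothetical helper class, for illustration only.
public static class ChinesePdfSketch
{
    // Builds recognition settings that request Chinese text recognition.
    // Assumption: the Language enum contains the value Language.Chi.
    public static RecognitionSettings ChineseSettings()
    {
        return new RecognitionSettings { Language = Language.Chi };
    }

    // Recognizes the given pages of a Chinese PDF and prints the text of each page.
    public static void Run(string sourcePdf, int startPage, int pagesCount)
    {
        AsposeOcr api = new AsposeOcr();

        OcrInput input = new OcrInput(InputType.PDF);
        input.Add(sourcePdf, startPage: startPage, pagesCount: pagesCount);

        List<RecognitionResult> results = api.Recognize(input, ChineseSettings());
        foreach (RecognitionResult result in results)
        {
            Console.WriteLine(result.RecognitionText);
        }
    }
}
```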
For more detailed guidance, refer to the official documentation on creating searchable PDFs from scanned documents [1], which outlines the process; additional examples, including Java-specific OCR operations, are available in [2].
If you have any further questions or need additional assistance, feel free to ask!
Sources:
[1]: Creating a searchable PDF from the scan (Aspose Documentation)
[2]: OCR Operations (Aspose.OCR Java API)
I am unable to OCR a Chinese PDF document to PDF.
I get the error “Exception of type ‘System.Exception’ was thrown”.
Would you kindly share the sample file you are working with? We will test the scenario in our environment and address it accordingly.