Creating a Searchable PDF from a Scanned PDF File

Cengizkurban · November 11, 2025, 9:28am

Hi Everyone,
We are testing the Aspose C# SDK to create a searchable PDF from a scanned PDF file, but we are only getting the OCR text result. Is there any way to achieve this with the SDK? We can only see the OCR to Image SDK in the list.

Professionalize.Discourse · November 11, 2025, 9:29am

@Cengizkurban

Creating a Searchable PDF from a Scanned PDF with Aspose.PDF for .NET

You can turn a scanned (image‑only) PDF into a searchable PDF by combining two Aspose products:

| Product | Role |
| --- | --- |
| Aspose.PDF for .NET | Holds the final PDF document and can embed hidden text layers. |
| Aspose.OCR for .NET | Performs the optical‑character recognition on each scanned page. |

Below is a complete, step‑by‑step example (C#) that:

Loads the scanned PDF.
Renders each page to an image (required for OCR).
Runs OCR on the image to obtain the text.
Adds the original image back to a new PDF page.
Overlays the OCR‑generated text as hidden (searchable) text.
Saves the result as a searchable PDF.

1. Install the required NuGet packages

Install-Package Aspose.PDF
Install-Package Aspose.OCR

2. Sample code

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;
using Aspose.OCR;
using Aspose.OCR.Models;

class SearchablePdfFromScannedPdf
{
    static void Main()
    {
        // -----------------------------------------------------------------
        // 1️⃣ Load the scanned PDF (image‑only)
        // -----------------------------------------------------------------
        string scannedPdfPath = @"C:\Input\ScannedDocument.pdf";
        Document scannedPdf = new Document(scannedPdfPath);

        // -----------------------------------------------------------------
        // 2️⃣ Create a new PDF that will hold the searchable result
        // -----------------------------------------------------------------
        Document searchablePdf = new Document();

        // -----------------------------------------------------------------
        // 3️⃣ Process each page
        // -----------------------------------------------------------------
        for (int pageIdx = 1; pageIdx <= scannedPdf.Pages.Count; pageIdx++)
        {
            // ----- a) Render the page to an image (Aspose.PDF rasterization) -----
            var rasterizer = new Aspose.Pdf.Devices.JpegDevice(300, 300);
            using (var imageStream = new System.IO.MemoryStream())
            {
                rasterizer.Process(scannedPdf.Pages[pageIdx], imageStream);
                imageStream.Position = 0;

                // ----- b) Run OCR on the rendered image -----
                var ocrEngine = new OcrEngine();
                ocrEngine.Image = new Aspose.OCR.Image(imageStream);
                OcrResult ocrResult = ocrEngine.Recognize();

                // ----- c) Add a new page to the result PDF -----
                var resultPage = searchablePdf.Pages.Add();

                // ----- d) Insert the original scanned image back onto the page -----
                var image = new Aspose.Pdf.Image
                {
                    ImageStream = new System.IO.MemoryStream(imageStream.ToArray())
                };
                resultPage.Paragraphs.Add(image);

                // ----- e) Overlay the OCR text as hidden text -----
                // The text is placed at (0,0) with a very small font size – it is
                // invisible but searchable.
                var hiddenFragment = new TextFragment(ocrResult.Text)
                {
                    Position = new Position(0, 0),
                    Font = new Font(FontFamily.Helvetica, 0.01f), // virtually invisible
                    TextState = { RenderingMode = TextRenderingMode.Invisible }
                };

                resultPage.Paragraphs.Add(hiddenFragment);
            }
        }

        // -----------------------------------------------------------------
        // 4️⃣ Save the searchable PDF
        // -----------------------------------------------------------------
        string outputPath = @"C:\Output\SearchableDocument.pdf";
        searchablePdf.Save(outputPath);

        Console.WriteLine($"Searchable PDF created at: {outputPath}");
    }
}

What the code does

| Step | Explanation |
| --- | --- |
| Render page to image | `JpegDevice` (or `PngDevice`) rasterizes the scanned page because `Aspose.OCR` works with `Image` objects. |
| Run OCR | `OcrEngine` extracts the textual content from the image. |
| Add original image | The rasterized image is re‑added to the new PDF page so the visual appearance stays the same. |
| Add hidden text | A `TextFragment` with `RenderingMode = Invisible` (or a very tiny font) is placed on the same page. PDF viewers can index this text, making the PDF searchable. |
| Save | The final document is a normal PDF that looks like the scanned one but is fully searchable. |

3. Important notes

| Topic | Guidance |
| --- | --- |
| Image quality | Use a resolution of 300 DPI (or higher) for the rasterized image; this gives the OCR engine enough detail. |
| Language support | Set `ocrEngine.Config.Language = Language.English;` (or any other supported language) before `Recognize()` if your document is not English. |
| Performance | For large PDFs consider processing pages in parallel (`Parallel.For`) and re‑using a single `OcrEngine` instance. |
| Alternative | If you have Aspose.PDF Cloud, the same operation can be performed via the OCR API – see the cloud docs for `POST /pdf/{fileName}/ocr`. |
| License | Ensure you have a valid commercial license for both Aspose.PDF and Aspose.OCR; otherwise the output will contain evaluation watermarks. |
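
To make the resolution guidance concrete, here is a small sketch of the arithmetic behind it. PDF page sizes are expressed in points (1 point = 1/72 inch), so the pixel size of a rendered page is roughly points / 72 × DPI. The `DpiMath` class and its `PixelsAt` method are hypothetical names introduced only for this illustration, and the A4 size of 595 × 842 points is an assumed example page, not one taken from this thread.

```csharp
using System;

// Hypothetical helper (not part of any Aspose API): converts a page
// dimension in PDF points (1 point = 1/72 inch) to pixels at a given DPI.
static class DpiMath
{
    public static int PixelsAt(double points, int dpi)
    {
        return (int)Math.Round(points / 72.0 * dpi);
    }
}

class DpiDemo
{
    static void Main()
    {
        // Assumed example page: A4, roughly 595 x 842 points.
        double widthPt = 595, heightPt = 842;

        foreach (int dpi in new[] { 100, 300 })
        {
            int w = DpiMath.PixelsAt(widthPt, dpi);
            int h = DpiMath.PixelsAt(heightPt, dpi);
            Console.WriteLine($"{dpi} DPI: {w} x {h} px");
        }
    }
}
```

For this assumed page the program prints `100 DPI: 826 x 1169 px` and `300 DPI: 2479 x 3508 px`. A 100 DPI render therefore has about one ninth of the pixels of a 300 DPI render, which leaves much less detail for character recognition.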

asad.ali · November 17, 2025, 5:09pm

@Cengizkurban

We have checked your document and noticed that the images contain text in different directions. We have logged an investigation ticket, OCRNET-1131, in our issue tracking system to analyze this scenario from the Aspose.OCR perspective, because Aspose.OCR alone supports generating searchable PDFs from a scanned PDF document.

We will look into the details of the logged ticket and keep you posted with the status of its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.

Cengizkurban · November 17, 2025, 6:39pm

Hi,
But this is not only about a vertical PDF. We are having the same issue with all scanned documents. Would it be possible for you to look at our code and let us know what we are doing wrong? It should work on an ordinary scanned PDF, right?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Xml.Linq;
using Aspose.Pdf;
using Aspose.OCR;
using Aspose.Pdf.Devices;
using System.IO;
using Aspose.Pdf.Text;

namespace asposeocrsearchabele
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputPdf = @"C:\YesPDF\data\localweb\uploads\input\ss.pdf";
            string outputPdf = @"C:\YesPDF\data\localweb\uploads\input\ss_ocr.pdf";

            string tempDir = Path.Combine(Path.GetTempPath(), "PdfOcrTemp_" + Guid.NewGuid());
            Directory.CreateDirectory(tempDir);

            try
            {
                // OCR engine
                AsposeOcr ocrEngine = new AsposeOcr();

                // Open the PDF file
                Document pdfDoc = new Document(inputPdf);

                // Final PDF
                Document resultPdf = new Document();

                int pageIndex = 1;
                foreach (Page pdfPage in pdfDoc.Pages)
                {
                    // Save the PDF page as a PNG (intended: 300 DPI; the value passed below is 100)
                    string imagePath = Path.Combine(tempDir, $"page_{pageIndex}.png");
                    Resolution resolution = new Resolution(100);
                    PngDevice pngDevice = new PngDevice(resolution);
                    using (FileStream fs = new FileStream(imagePath, FileMode.Create))
                    {
                        pngDevice.Process(pdfPage, fs);
                    }

                    // Create the input for OCR
                    OcrInput ocrInput = new OcrInput(InputType.SingleImage);
                    ocrInput.Add(imagePath);

                    // Run the OCR
                    var ocrResults = ocrEngine.Recognize(ocrInput);

                    string recognizedText = string.Empty;
                    if (ocrResults != null && ocrResults.Count > 0)
                        recognizedText = ocrResults[0].RecognitionText;

                    // Create a new page
                    Page newPage = resultPdf.Pages.Add();

                    // Add the original image
                    Image img = new Image
                    {
                        File = imagePath,
                        FixWidth = pdfPage.Rect.Width,
                        FixHeight = pdfPage.Rect.Height
                    };
                    newPage.Paragraphs.Add(img);

                    // Add the OCR text to the PDF page (selectable)
                    TextFragment tf = new TextFragment(recognizedText);
                    newPage.Paragraphs.Add(tf);

                    pageIndex++;
                }

                // Save the PDF after OCR
                resultPdf.Save(outputPdf);

                Console.WriteLine("✅ OCR completed! Output: " + outputPdf);
            }
            catch (Exception ex)
            {
                Console.WriteLine("An error occurred: " + ex.Message);
            }
            finally
            {
                // Clean up temporary files
                try { Directory.Delete(tempDir, true); } catch { }
            }

            Console.WriteLine("Program finished.");
        }
    }
}

asad.ali · November 18, 2025, 6:36am

@Cengizkurban

Below is a code snippet that uses only the Aspose.OCR API to convert a scanned PDF document into a searchable PDF:

  AsposeOcr api = new AsposeOcr(); // recognition engine, created as in the code posted above
  OcrInput ocrInput = new(InputType.PDF);
  ocrInput.Add("file.pdf");
  OcrOutput output = api.Recognize(ocrInput);
  output.Save("ResultFile.pdf", Aspose.OCR.SaveFormat.Pdf);
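
For reference, the snippet above could be wrapped in a complete console program along the following lines. This is a sketch under a few assumptions: the `AsposeOcr` constructor is taken from the code posted earlier in this thread, the class name, default file names and existence check are illustrative, and the `Recognize` and `Save` calls are copied from the snippet as posted rather than verified against a particular Aspose.OCR version.

```csharp
using System;
using System.IO;
using Aspose.OCR;

class SearchablePdfWithOcrOnly
{
    static void Main(string[] args)
    {
        // Illustrative paths; pass your own as command-line arguments.
        string inputPdf = args.Length > 0 ? args[0] : "file.pdf";
        string outputPdf = args.Length > 1 ? args[1] : "ResultFile.pdf";

        if (!File.Exists(inputPdf))
        {
            Console.WriteLine("Input file not found: " + inputPdf);
            return;
        }

        // Recognition engine, created the same way as in the code posted earlier in the thread.
        AsposeOcr api = new AsposeOcr();

        // Hand the whole scanned PDF to the engine instead of rendering pages to images first.
        OcrInput ocrInput = new OcrInput(InputType.PDF);
        ocrInput.Add(inputPdf);

        // Recognize the document, then save the result as a PDF (calls as given in the snippet above).
        OcrOutput output = api.Recognize(ocrInput);
        output.Save(outputPdf, Aspose.OCR.SaveFormat.Pdf);

        Console.WriteLine("Searchable PDF saved to: " + outputPdf);
    }
}
```

The main difference from the page-by-page program posted earlier is that no intermediate PNG files or manually added `TextFragment` objects are involved; the output PDF is produced by the `Save` call.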
