Microsoft Visual Studio Community 2022 (64-bit) - Current
Version 17.14.17
.NET Framework 4.7.2
The OCR license has expired. May I please have an extension, like the one I got for the PDF license? As with the PDF API, if OCR works for us we will purchase a license. Thank you.
Yes, this seems to be a licensing issue.
Please post your request in our Purchase forum to get an extension for the trial license.
Furthermore, regarding the earlier logged ticket, please check the code sample below:
// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Text; using Aspose.OCR;
// 1) Build OCR input from the PDF path
var ocr = new AsposeOcr();
var settings = new RecognitionSettings
{
    DetectAreasMode = DetectAreasMode.TABLE
};
var input = new OcrInput(InputType.PDF);
input.Add(dataDir + "Grace_Lutheran.pdf");
// 2) Run OCR (returns List<RecognitionResult>); take the first result
List<RecognitionResult> results = ocr.Recognize(input, settings);
if (results == null || results.Count == 0)
    throw new InvalidOperationException("OCR returned no results.");
RecognitionResult result = results[0];
// 3) Save OCR result to XLSX in-memory
using (var xlsxStream = new MemoryStream())
{
    result.Save(xlsxStream, SaveFormat.Xlsx);
    xlsxStream.Position = 0;
    // 4) Use Aspose.Cells to convert XLSX -> CSV (UTF-8, quoted)
    var wb = new Aspose.Cells.Workbook(xlsxStream);
    var csvOpts = new Aspose.Cells.TxtSaveOptions(Aspose.Cells.SaveFormat.Csv)
    {
        Separator = ',',
        Encoding = Encoding.UTF8,
        AlwaysQuoted = true
    };
    // Write the CSV straight to disk; no intermediate stream is needed
    wb.Save(dataDir + "output1.csv", csvOpts);
}
Aspose.OCR can recognize PDF files directly, without conversion to PNG.
As for CSV, we will add this format to OCR.SaveFormat. It will be available in release 25.11.0.
We will also add a task for the RecognitionResult class to return a structure with cells (columns and rows containing coordinates and text). We just need your confirmation: would this kind of result help you achieve what you actually require?
THANK YOU AGAIN!
Not having to make a PNG first is awesome!!
Second, if you are familiar with how the Aspose.PDF API works: you return an absorber object which absorbs a page, like Recognize does. Since I am looking for tables, the absorber returns a TableList of all the tables it found on a page; each table has a RowList, and each row has a CellList.
If you could keep that paradigm, one you already have, and return tables as if they were absorbed, that would be great for everyone, as they already know what that is.
Heck, I’d even name the function “absorb”.
Here is my function to return an absorbed table:
Function getAbsorber(ByVal iPage As Aspose.Pdf.Page) As Aspose.Pdf.Text.TableAbsorber
    ' Stays Nothing if the absorber cannot be created or the visit fails
    Dim iReturn As Aspose.Pdf.Text.TableAbsorber = Nothing
    Try
        iReturn = New Aspose.Pdf.Text.TableAbsorber
        iReturn.Visit(iPage)
    Catch ex As Exception
        Console.WriteLine("getAbsorber Error: " & ex.Message)
    End Try
    Return iReturn
End Function
The tables are in the Aspose.Pdf.Text.TableAbsorber object. Tables have to be a widely needed feature, no?
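For anyone unfamiliar with that paradigm, here is a minimal sketch of how the absorbed tables might be walked. It assumes the Aspose.PDF types AbsorbedTable, AbsorbedRow and AbsorbedCell with their RowList, CellList and TextFragments members; the module name AbsorbedTableDump and the Sub DumpTables are hypothetical names chosen for illustration, and the project needs a reference to Aspose.PDF.

```vbnet
Imports System
Imports System.Text
Imports Aspose.Pdf
Imports Aspose.Pdf.Text

' Hypothetical helper module (not part of any Aspose API).
Module AbsorbedTableDump
    ' Prints every cell of every table found on one page, addressed by
    ' table, row and cell index within the page.
    Sub DumpTables(ByVal pdfPage As Aspose.Pdf.Page)
        Dim absorber As New TableAbsorber()
        absorber.Visit(pdfPage)

        For t As Integer = 0 To absorber.TableList.Count - 1
            Dim table As AbsorbedTable = absorber.TableList(t)
            For r As Integer = 0 To table.RowList.Count - 1
                Dim row As AbsorbedRow = table.RowList(r)
                For c As Integer = 0 To row.CellList.Count - 1
                    ' A cell may hold several text fragments; join them
                    Dim cellText As New StringBuilder()
                    For Each fragment As TextFragment In row.CellList(c).TextFragments
                        cellText.Append(fragment.Text)
                    Next
                    Console.WriteLine($"Table({t}).Rows({r}).Cells({c}) = {cellText}")
                Next
            Next
        Next
    End Sub
End Module
```

The indices here are positions within a table, not page coordinates, which is the shape of result being asked for from the OCR side.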
When is 25.11.0 available?
I don’t need coordinates on a page; I need, and everyone needs, coordinates within a table:
Table(0).Rows(3).Cells(4)
That is what we need.
It ran! Maybe the OCR API needs an error code for an invalid license?
But it seems to read the entire page as one grid, one table, not as separate grids.
Here are the class, the PDF, and the output.csv:
bidtracer.zip (628.2 KB)
Imports System.Collections.Generic
Imports System.IO
Imports Aspose
Imports Aspose.Cells
Imports Aspose.Pdf
Imports Aspose.Pdf.Devices
Imports Aspose.Pdf.Text
Imports System
Imports System.Text
Imports OCR = Aspose.OCR
' Intentionally avoid: Imports Aspose.OCR (to prevent name shadowing with this class name).
' Use fully-qualified names for Aspose.OCR types.
Public Class AsposeOCR
    Sub New()
        SetAsposeLicense()
    End Sub
    Public Shared Sub PerformOCROnPDFTable(ByVal dataDir As String, ByVal fileName As String)
        ' 1) Render the page to PNG bytes
        Dim pngBytes As Byte() = RenderPageToPng(Path.Combine(dataDir, fileName), 1, 300)
        ' 2) Build OCR input from the PNG stream
        ' The local is named "engine" rather than "ocr": VB is case-insensitive,
        ' so a local called "ocr" would hide the OCR import alias in expressions.
        Dim engine As New OCR.AsposeOcr()
        Dim settings As New OCR.RecognitionSettings() With {
            .DetectAreasMode = Aspose.OCR.DetectAreasMode.TABLE
        }
        Using ms As New MemoryStream(pngBytes)
            Dim input As New OCR.OcrInput(Aspose.OCR.InputType.SingleImage)
            ms.Position = 0
            input.Add(ms)
            ' 3) Run OCR (returns List(Of RecognitionResult)); take the first result
            Console.WriteLine("Start Recognize: " & Date.Now.ToString("yyyy-MM-dd HH:mm:ss"))
            Dim results As List(Of OCR.RecognitionResult) = engine.Recognize(input, settings)
            Console.WriteLine("End Recognize: " & Date.Now.ToString("yyyy-MM-dd HH:mm:ss"))
            If results Is Nothing OrElse results.Count = 0 Then
                Throw New InvalidOperationException("OCR returned no results.")
            End If
            Dim result As OCR.RecognitionResult = results(0)
            ' 4) Save OCR result to XLSX in-memory
            Using xlsxStream As New MemoryStream()
                result.Save(xlsxStream, Aspose.OCR.SaveFormat.Xlsx)
                xlsxStream.Position = 0
                ' 5) Use Aspose.Cells to convert XLSX -> CSV (UTF-8, quoted)
                Dim wb As New Workbook(xlsxStream)
                Dim csvOpts As New Aspose.Cells.TxtSaveOptions(Aspose.Cells.SaveFormat.Csv) With {
                    .Separator = ","c,
                    .Encoding = Encoding.UTF8,
                    .AlwaysQuoted = True
                }
                ' Save CSV to disk (e.g., App_Data\output.csv)
                Dim outCsvPath As String = Path.Combine(dataDir, "output.csv")
                wb.Save(outCsvPath, csvOpts)
            End Using
        End Using
    End Sub
    Private Shared Function RenderPageToPng(pdfPath As String, pageNumber As Integer, dpi As Integer) As Byte()
        Using doc As New Document(pdfPath)
            If pageNumber < 1 OrElse pageNumber > doc.Pages.Count Then
                Throw New ArgumentOutOfRangeException(NameOf(pageNumber),
                    $"Page {pageNumber} is out of range. Document has {doc.Pages.Count} pages.")
            End If
            Dim res As New Resolution(dpi)
            Dim device As New PngDevice(res)
            Using outMs As New MemoryStream()
                device.Process(doc.Pages(pageNumber), outMs)
                Return outMs.ToArray()
            End Using
        End Using
    End Function
    ' Extract all tables from a single flattened PDF page.
    ' Returns: List(Of Table) where Table = List(Of Row), Row = String() cells
    Public Function ExtractTablesFromFlattenedPdfPage(pdfPath As String, pageNumber As Integer) As List(Of List(Of String()))
        Dim allTables As New List(Of List(Of String()))()
        Using doc As New Document(pdfPath)
            If pageNumber < 1 OrElse pageNumber > doc.Pages.Count Then
                Return allTables
            End If
            Dim page As Page = doc.Pages(pageNumber)
            ' Only operate on flattened pages (no selectable text)
            If Not IsFlattened(page) Then
                Return allTables
            End If
            ' 1) Rasterize page to PNG bytes (higher DPI helps table detection)
            Dim pngBytes As Byte() = RenderPageToPng(doc, pageNumber, 250)
            Dim settings As New Aspose.OCR.RecognitionSettings()
            settings.DetectAreasMode = Aspose.OCR.DetectAreasMode.TABLE
            Dim engine As New Aspose.OCR.AsposeOcr()
            Dim results As IList(Of Aspose.OCR.RecognitionResult) = Nothing
            ' 2) Build OCR input from memory (the stream stays open until Recognize returns)
            Using ms As New MemoryStream(pngBytes)
                Dim input As New Aspose.OCR.OcrInput(Aspose.OCR.InputType.SingleImage)
                input.Add(ms)
                ' 3) Recognize with table detection enabled
                Try
                    results = engine.Recognize(input, settings)
                Catch ex As Exception
                    ' Report the failure instead of silently discarding it
                    Console.WriteLine("Recognize error: " & ex.Message)
                End Try
            End Using
            If results Is Nothing OrElse results.Count = 0 Then
                Return allTables
            End If
            ' 4) Wrap into OcrOutput for flexible saving
            Dim output As New Aspose.OCR.OcrOutput()
            For Each r In results
                output.Add(r)
            Next
            ' 5) Preferred: AI table saver to Markdown (single-parameter SaveMd)
            Dim tmpMd As String = Path.Combine(Path.GetTempPath(), $"aspose_tables_p{pageNumber}_{Guid.NewGuid():N}.md")
            Dim saved As Boolean = False
            Try
                Dim tableAI As New Aspose.OCR.AI.TableAIProcessor(Aspose.OCR.AI.AITableDetectionMode.AUTO)
                tableAI.SaveMd(tmpMd) ' <- correct signature: only the filename
                saved = File.Exists(tmpMd)
            Catch
                saved = False
            End Try
            ' Fallback: generic Markdown export (layout-based) if AI saver unavailable
            If Not saved Then
                Try
                    output.Save(tmpMd, Aspose.OCR.SaveFormat.Md)
                    saved = File.Exists(tmpMd)
                Catch
                    saved = False
                End Try
            End If
            ' 6) Parse Markdown pipe tables -> rows/cells
            If saved Then
                Dim md As String = File.ReadAllText(tmpMd)
                allTables.AddRange(ParseMarkdownPipeTables(md))
                Try : File.Delete(tmpMd) : Catch : End Try
            End If
        End Using
        Return allTables
    End Function
    ' ----------------- Helpers -----------------
    ' Flattened page = no selectable text fragments
    Private Function IsFlattened(p As Page) As Boolean
        Dim tfa As New TextFragmentAbsorber()
        p.Accept(tfa)
        Return (tfa.TextFragments Is Nothing OrElse tfa.TextFragments.Count = 0)
    End Function
    ' Render a PDF page to PNG bytes at a given DPI
    Private Function RenderPageToPng(doc As Document, pageNumber As Integer, dpi As Integer) As Byte()
        Dim res As New Resolution(dpi)
        Dim dev As New PngDevice(res)
        Using ms As New MemoryStream()
            dev.Process(doc.Pages(pageNumber), ms)
            Return ms.ToArray()
        End Using
    End Function
    ' -------- Markdown parsing: pipe tables -> rows/cells --------
    ' Accepts GitHub-style pipe tables like:
    ' | h1 | h2 |
    ' | --- | --- |
    ' | c11 | c12 |
    Private Function ParseMarkdownPipeTables(md As String) As List(Of List(Of String()))
        Dim tables As New List(Of List(Of String()))()
        Using sr As New StringReader(md)
            Dim line As String = sr.ReadLine()
            While line IsNot Nothing
                If IsPipeRow(line) Then
                    Dim header As String = line
                    Dim sep As String = sr.ReadLine()
                    If sep IsNot Nothing AndAlso IsSeparatorRow(sep) Then
                        Dim rows As New List(Of String())()
                        ' include header row (remove next line if you prefer data-only)
                        rows.Add(SplitPipeRow(header))
                        Dim peek As String = sr.ReadLine()
                        While peek IsNot Nothing AndAlso IsPipeRow(peek) AndAlso Not IsSeparatorRow(peek)
                            rows.Add(SplitPipeRow(peek))
                            peek = sr.ReadLine()
                        End While
                        tables.Add(rows)
                        line = peek
                        Continue While
                    End If
                    ' Not a table header: re-examine the line just read instead of skipping it
                    line = sep
                    Continue While
                End If
                line = sr.ReadLine()
            End While
        End Using
        Return tables
    End Function
    Private Function IsPipeRow(s As String) As Boolean
        If s Is Nothing Then Return False
        Dim t = s.Trim()
        Return t.StartsWith("|") AndAlso t.EndsWith("|") AndAlso t.Contains("|")
    End Function
    Private Function IsSeparatorRow(s As String) As Boolean
        If s Is Nothing Then Return False
        Dim t = s.Trim()
        If Not IsPipeRow(t) Then Return False
        ' A separator row has only -, :, spaces between pipes (e.g., | --- | :---: |)
        Dim parts = t.Split("|"c)
        For Each part In parts
            Dim p = part.Trim().Replace("-", "").Replace(":", "")
            If p.Length > 0 Then Return False
        Next
        Return True
    End Function
    Private Function SplitPipeRow(row As String) As String()
        Dim inner = row.Trim()
        If inner.StartsWith("|") Then inner = inner.Substring(1)
        If inner.EndsWith("|") Then inner = inner.Substring(0, inner.Length - 1)
        Dim cells = inner.Split("|"c)
        For i = 0 To cells.Length - 1
            cells(i) = cells(i).Trim()
        Next
        Return cells
    End Function
    ''Mase Woods
    ''11/06/2025
    ''Function to set Aspose dll license
    ''Replace license path with actual path
    Public Sub SetAsposeLicense()
        ' Load the license files
        Try
            Dim LicensePDF As Aspose.Pdf.License = New Aspose.Pdf.License()
            Dim licFilePDF As String = AppDomain.CurrentDomain.BaseDirectory & "Aspose.PDF.lic"
            LicensePDF.SetLicense(licFilePDF)
            Dim LicenseOCR As Aspose.OCR.License = New Aspose.OCR.License()
            Dim licFileOCR As String = AppDomain.CurrentDomain.BaseDirectory & "Aspose.OCR.lic"
            LicenseOCR.SetLicense(licFileOCR)
        Catch ex As Exception
            ' Report the failure so a missing or invalid license does not go unnoticed
            Console.WriteLine("SetAsposeLicense Error: " & ex.Message)
        End Try
    End Sub
End Class
Thanks for sharing all the details. We have gathered them and updated the ticket accordingly. We will go through them and update you as soon as we have feedback to share. Please allow us some time.
PS: 25.11 will be released this month, i.e. November 2025.