I’m using Aspose.PDF in C# .NET to read PDF documents and extract data from bank statements. It works well for most banks, except Vancity.
It inserts extra spaces in random places. For example, the word “package” becomes “pa cka ge”.
Unfortunately, I can’t post the file here because the data is sensitive. The C# code I use to read the PDF file is pretty simple:
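```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Open the statement ("statement.pdf" stands in for the real file)
Document pdfDocument = new Document("statement.pdf");
// Create a TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
// Get the extracted text
string extractedText = textAbsorber.Text;
```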
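For comparison, `TextAbsorber` can also be constructed with a `TextExtractionOptions` object that selects a formatting mode. The sketch below extracts the same file in the `Pure` and `Raw` modes so the outputs can be compared side by side. It assumes the Aspose.Pdf NuGet package; the path `statement.pdf` and the helper name `ExtractText` are hypothetical placeholders. Whether either mode avoids the split words on these statements is not verified:

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

static class StatementText
{
    // Hypothetical helper: extracts the text of every page using the given
    // formatting mode (Pure applies some layout formatting, Raw does not).
    public static string ExtractText(Document pdfDocument,
                                     TextExtractionOptions.TextFormattingMode mode)
    {
        var textAbsorber = new TextAbsorber(new TextExtractionOptions(mode));
        pdfDocument.Pages.Accept(textAbsorber);
        return textAbsorber.Text;
    }

    static void Main()
    {
        // "statement.pdf" is a placeholder for one of the Vancity statements.
        using var pdfDocument = new Document("statement.pdf");

        foreach (var mode in new[] { TextExtractionOptions.TextFormattingMode.Pure,
                                     TextExtractionOptions.TextFormattingMode.Raw })
        {
            // Print each mode's output under a header for easy comparison
            Console.WriteLine($"--- {mode} ---");
            Console.WriteLine(ExtractText(pdfDocument, mode));
        }
    }
}
```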
Am I missing an option (parameter), or is this something that needs a fix in Aspose itself?
I tested eight different PDF libraries, and Vancity statements work with iText, Spire.PDF and IronPDF.
Thanks,
Igor