I’m using Aspose to read PDF documents in C# .NET and extract data from bank statements. It works well for most banks, except Vancity.
For Vancity statements, the extracted text contains a lot of extra spaces in random places; for example, the word “package” becomes “pa cka ge”.
Unfortunately, I cannot post the file here because the data is sensitive. The C# code that reads the PDF file is pretty simple (see “Read PDF Documents in C# .NET”):
// C# Code
// Create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
// Get the extracted text
string extractedText = textAbsorber.Text;
Am I missing some option (parameters) or is this something that requires a code fix?
I tested 8 different solutions, and the Vancity statements extract correctly with iText, Spire PDF and IronPDF.
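
One option that may be relevant is the text formatting mode. The sketch below assumes the `TextExtractionOptions` class from Aspose.PDF for .NET and switches the absorber to `Raw` mode instead of the default layout-preserving mode. The file name `statement.pdf` is a hypothetical placeholder, and whether this removes the extra spaces for Vancity statements is untested.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ExtractRawText
{
    static void Main()
    {
        // Hypothetical path; replace with the actual statement file.
        Document pdfDocument = new Document("statement.pdf");

        // Assumption: Raw mode skips the layout reconstruction that the
        // default mode performs, which may avoid the inserted spaces.
        TextExtractionOptions options =
            new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw);
        TextAbsorber textAbsorber = new TextAbsorber(options);

        // Accept the absorber for all the pages
        pdfDocument.Pages.Accept(textAbsorber);

        // Print the extracted text for comparison with the default mode
        Console.WriteLine(textAbsorber.Text);
    }
}
```

Comparing this output with the default-mode output for the same page should show whether the spacing comes from the formatting step or from the way the text is stored in the PDF.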