Read PDF Documents in C# .NET

igorguerra · August 11, 2021, 7:49pm

I’m using Aspose to read PDF documents in C# .NET and extract data from bank statements. It works well for most banks, except Vancity.

It adds a lot of spaces in seemingly random places. For example, the word “package” becomes “pa cka ge”.

Unfortunately, I cannot post the file here because the data is sensitive. The C# code I use to read the PDF file is pretty simple:


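// C# Code
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Load the PDF ("statement.pdf" is a placeholder path, not the real file)
Document pdfDocument = new Document("statement.pdf");

// Create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();

// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);

// Get the extracted text
string extractedText = textAbsorber.Text;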

Am I missing some option (parameters) or is this something that requires a code fix?

I tested 8 different solutions, and the Vancity statements are read correctly by iText, Spire PDF and IronPDF.

Thanks,

Igor

asad.ali · August 12, 2021, 9:18am

@igorguerra

Can you please also share the sample PDF with us so that we can test the scenario in our environment and address it accordingly?

igorguerra · August 12, 2021, 7:44pm

Unfortunately I cannot share the file because it’s confidential. Let me try to find a similar file that causes the same problem. I’ll get back to you tomorrow.

Thanks for the help!

asad.ali · August 13, 2021, 11:19am

@igorguerra

Please take your time to gather the sample PDF file and share it with us. We will test the case at our end and address it accordingly.

igorguerra · August 23, 2021, 1:21pm

Hi Asad,

I really tried to find a public PDF from Vancity that could be used to replicate the issue while reading PDF documents in C# .NET, but I couldn’t: they all worked. I understand that it’s very hard to fix a problem that you can’t reproduce, and unfortunately I cannot share the PDF I have because it’s confidential.

Not sure what else we can do here, but feel free to close the ticket if there’s no further possible action.

Thanks,

Igor

asad.ali · August 23, 2021, 6:03pm

@igorguerra

Thanks for writing back. It seems the issue is specific to a particular PDF file, since you mentioned that you were unable to replicate it with other PDF files. We are not closing this ticket, so you are free to post here if you manage to find a sample PDF. We will check the issue then and address it accordingly.