How to find text after "Invoice #:"

Hi…
I can find the text “Invoice #”, but how do I get the text after “Invoice #”? The next 10 characters?
thx

@jon_elster_i3intel_com

You can use a regular expression like the one below with the TextFragmentAbsorber class:

[invoice #]+[0-9]{0,10}

Please feel free to let us know in case you face any issues.
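For the original question (the characters after "Invoice #"), here is a minimal sketch with plain .NET `System.Text.RegularExpressions`, independent of Aspose.PDF. The label is written literally and only the part after it sits in a capture group. The helper name `AfterInvoiceLabel` and the sample string are hypothetical, and the spacing around "#" and ":" in a real PDF is an assumption, which is why the pattern allows `\s*` and an optional `:`. It uses C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns up to 10 digits that follow the literal label
// "Invoice #" (optional colon and whitespace allowed), or "" if not found.
string AfterInvoiceLabel(string text)
{
    // "Invoice\s*#" is a literal label, unlike "[invoice #]+", which is a character class.
    // The parentheses capture only the digits, so the label itself is not returned.
    Match m = Regex.Match(text, @"Invoice\s*#\s*:?\s*(\d{1,10})", RegexOptions.IgnoreCase);
    return m.Success ? m.Groups[1].Value : string.Empty;
}

Console.WriteLine(AfterInvoiceLabel("Invoice #: 12345  Date: 2022-05-01")); // prints 12345
```

To capture any 10 characters rather than digits, `(.{1,10})` in place of `(\d{1,10})` takes up to 10 characters on the same line.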

Like this? I get 700+ fragments.

TextFragmentAbsorber textAbsorber = new TextFragmentAbsorber(new System.Text.RegularExpressions.Regex(@"[Invoice:]+[0-9]{0,10}"));

Please help.
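The large number of fragments follows from standard regex semantics rather than from the PDF: `[Invoice:]` is a character class that matches any single one of the characters I, n, v, o, i, c, e or `:`, and `[0-9]{0,10}` permits zero digits, so almost every run of those letters counts as a match. A small sketch on a made-up string (plain .NET, not Aspose.PDF) compares that pattern with a literal label that requires at least one digit:

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up sample text; a real page contains far more matching letters.
string sample = "Invoice: 12345 Customer: Acme Inc. Notes: none";

// [Invoice:]+ is a character class: one or more of the characters I n v o i c e :
// and [0-9]{0,10} allows zero digits, so any run of those characters is a match.
int loose = Regex.Matches(sample, @"[Invoice:]+[0-9]{0,10}").Count;

// A literal label followed by at least one digit only matches the real field.
int strict = Regex.Matches(sample, @"Invoice:\s*[0-9]{1,10}").Count;

Console.WriteLine($"character class: {loose} matches, literal label: {strict} match");
```

On this sample the character-class pattern yields 11 matches (for example "o", "e", "Inc" and "none"), while the literal pattern yields 1.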

@jon_elster_i3intel_com

Please share the sample PDF for our reference so that we can assist you further.

How do I share it privately?

The text absorber is not picking up the text. I have absorber.text = "Invoice: ", but my PDF contains “Invoice: 12345”.

Any ideas? thx

@jon_elster_i3intel_com

A private message has been sent to you so that you can share the file privately. You can reply to it and attach your file there.

@jon_elster_i3intel_com

We tried to use the regex [CERTIFICATE NUMBER:]+\s[0-9]{0,10} to find the text in your PDF, but the API was unable to extract it. We also checked the regular expression on https://www.regextester.com/ and it worked fine there. Therefore, it seems the API is not processing this kind of regular expression. An issue with ID PDFNET-51887 has been logged in our issue tracking system for further investigation. We will look into its details and keep you posted on the status of its resolution. Please give us some time.

We are sorry for the inconvenience.

If I get the text from the TextAbsorber, can't I even use a regex to find “CERTIFICATE”?
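`TextAbsorber.Text` is an ordinary .NET string, so any `Regex` can be run against it once extraction has finished. A minimal sketch, with a made-up string standing in for `tabsorber.Text` (the real line breaks and spacing depend on the PDF layout):

```csharp
using System;
using System.Text.RegularExpressions;

// Stand-in for tabsorber.Text; this sample is hypothetical, not the user's PDF.
string wholetext = "ACME TESTING LAB\r\nCERTIFICATE NUMBER:\r\n   12345678\r\nDate: 2022-05-01";

// \b...\b matches CERTIFICATE as a whole word; IgnoreCase also accepts "Certificate".
Match hit = Regex.Match(wholetext, @"\bCERTIFICATE\b", RegexOptions.IgnoreCase);

if (hit.Success)
    Console.WriteLine($"Found \"{hit.Value}\" at index {hit.Index}"); // index 18 for this sample
```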

@jon_elster_i3intel_com

How are you trying to find the text after using TextAbsorber? Can you please share the code snippet?

Hi

I’m looking for an 8-digit number like this, but some of the 8-digit numbers are not the one I need.

Also, the certificate number does not appear on the same line in the TextAbsorber output.

So none of this works:

Regex expression = new Regex(@"(?<!\d)\d{8}(?!\d)", RegexOptions.Multiline);

var results = expression.Matches(textAbsorber.Text.Replace("\n", " ").Replace("\r", " "));

thx
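One way to target only the certificate number is to anchor the pattern on its label instead of matching every 8-digit run. This is a sketch under two assumptions: the label reads "CERTIFICATE NUMBER" (wording taken from the regex used earlier in this thread), and only whitespace separates the label from the number in the extracted text. `FindCertificateNumber` and the sample text are hypothetical, and the code is plain .NET rather than an Aspose.PDF feature.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the 8-digit number that follows a
// "CERTIFICATE NUMBER" label, or an empty string if there is none.
// \s matches spaces, tabs, \r and \n, so the label and the number may be on
// different lines of the extracted text; no Replace() calls are needed.
string FindCertificateNumber(string text)
{
    var pattern = new Regex(@"CERTIFICATE\s+NUMBER\s*:?\s*(\d{8})(?!\d)", RegexOptions.IgnoreCase);
    Match m = pattern.Match(text);
    return m.Success ? m.Groups[1].Value : string.Empty;
}

// Made-up sample text with three 8-digit numbers; only one follows the label.
string sample = "Order 87654321\r\nCERTIFICATE NUMBER:\r\n  12345678\r\nLot 11223344";

Console.WriteLine(FindCertificateNumber(sample)); // prints 12345678
```

On the same sample, the unanchored `(?<!\d)\d{8}(?!\d)` pattern returns all three numbers, which matches the symptom described above.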

@jon_elster_i3intel_com

While testing with version 22.5 of the API and the code snippet below, we obtained 3 matches for this regular expression:

using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

Document pdfDocument = new Document(dataDir + "110403(2 of 5) 419.pdf");
TextAbsorber tabsorber = new TextAbsorber();
pdfDocument.Pages.Accept(tabsorber);
string wholetext = tabsorber.Text;

Regex expression = new Regex(@"(?<!\d)\d{8}(?!\d)", RegexOptions.Multiline);

var results = expression.Matches(wholetext);

Could you please try the latest version of the API and let us know if you still notice any issues?
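If other text can sit between the label and the number, the matches from the snippet above can instead be filtered by position: keep the first 8-digit match that starts after the word "CERTIFICATE". This is a hypothetical selection rule, not an Aspose.PDF feature, and the sample string below stands in for `tabsorber.Text`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Stand-in for tabsorber.Text (a made-up sample, not the content of the user's PDF).
string wholetext = "Order 87654321\nCERTIFICATE NUMBER: Issued to ACME\n12345678\nLot 11223344";

// Same pattern as in the snippet above: exactly 8 digits, not part of a longer number.
var expression = new Regex(@"(?<!\d)\d{8}(?!\d)");

// Hypothetical selection rule: take the first 8-digit match that starts after the
// word CERTIFICATE, ignoring any other text in between (including line breaks).
int labelIndex = wholetext.IndexOf("CERTIFICATE", StringComparison.OrdinalIgnoreCase);
Match certificate = labelIndex < 0
    ? Match.Empty
    : expression.Matches(wholetext).Cast<Match>().FirstOrDefault(m => m.Index > labelIndex) ?? Match.Empty;

Console.WriteLine(certificate.Success ? certificate.Value : "not found"); // prints 12345678
```

The trade-off against a label-anchored pattern is tolerance versus precision: this rule accepts arbitrary text after the label, but it would pick up the wrong number if the real one were missing and another 8-digit number followed.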

Thx… but we were targeting only item 2, the certificate number.

@jon_elster_i3intel_com

Please extract the text using TextAbsorber, copy all of it into a .txt file, and share that file with us. We will compare it with the results in our environment and assist you further.
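A minimal sketch of saving the extracted text, assuming the `tabsorber` object from the earlier snippet. To keep the block runnable without Aspose.PDF, a literal string stands in for `tabsorber.Text`; the helper name `SaveExtractedText` and the file location are hypothetical.

```csharp
using System;
using System.IO;

// Hypothetical helper: writes the extracted text to a .txt file
// (File.WriteAllText uses UTF-8 without a byte-order mark).
// With Aspose.PDF, as in the earlier snippet:
//   pdfDocument.Pages.Accept(tabsorber);
//   SaveExtractedText(tabsorber.Text, "extracted.txt");
void SaveExtractedText(string text, string path)
{
    File.WriteAllText(path, text);
}

string path = Path.Combine(Path.GetTempPath(), "extracted.txt");
SaveExtractedText("CERTIFICATE NUMBER:\r\n12345678", path);
Console.WriteLine($"Saved to {path}");
```

Saving the raw string unchanged keeps the original `\r\n` line breaks, which is what shows where the number lands relative to its label.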