Hello,
Here's the code we use to extract text from the PDF:
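using System.Text.RegularExpressions;
using Aspose.Pdf.Text;

// Regex search over the page text (TextSearchOptions(true) enables regular expressions)
string regexMatch = @"[,0-9A-Za-z ]..[.)0-9a-z ]";
var textFragmentAbsorber = new TextFragmentAbsorber(new Regex(regexMatch), new TextSearchOptions(true));
textFragmentAbsorber.Phrase = "any information (including any technology, know";
// Search only the page that holds the current replacement object
pdfDocument.Pages[replaceObj[i].pageSeq].Accept(textFragmentAbsorber);
textFragmentCollection = textFragmentAbsorber.TextFragments;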
The issue is that one of the fragments we are analyzing contains extra spaces:
“a. any information (including any technology, know-how, patent application, software, test”
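As a possible stopgap on our side (our own workaround, not an Aspose.PDF feature), the extra spaces could be collapsed after extraction. This is a minimal C# sketch, assuming the extra characters are ordinary whitespace; `FragmentText` and `NormalizeSpaces` are hypothetical names used only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FragmentText
{
    // Hypothetical helper: collapse every run of whitespace into one space.
    // In .NET, \s also matches Unicode space separators such as U+00A0 (non-breaking space).
    public static string NormalizeSpaces(string text) =>
        Regex.Replace(text, @"\s+", " ").Trim();

    public static void Main()
    {
        // Sample input with doubled spaces, similar to the fragment above
        string extracted = "a.  any information  (including any technology,   know-how";
        Console.WriteLine(NormalizeSpaces(extracted));
    }
}
```

Each `TextFragment.Text` in `textFragmentCollection` could be passed through such a helper before it is compared with the expected phrase. This only hides the symptom, though; it does not explain why the absorber returns the extra spaces in the first place.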
Here is the file for your reference:
Nitrogen.pdf (591.5 KB)
Please let me know if you need any other information.
Regards,
Sumit Awasthi