I need to search a PDF file for specific strings, and there is one case in which I can't match the string.
The string is “Unaudited Capital Account Statement”. In the PDF, the whitespace between each word of this string is actually a non-breaking space (Unicode \u00A0) rather than a regular space. Because of this, a search for the string written with regular spaces does not match the PDF content and fails.
One possible solution would be to replace every occurrence of the non-breaking space with a regular space (" ") before matching.
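The replacement idea can be sketched as follows. This is a minimal, library-independent illustration in Python, not Aspose.PDF code: the `normalize_spaces` helper and the `extracted` sample string are hypothetical, and it assumes the page text has already been extracted into an ordinary string.

```python
NBSP = "\u00a0"  # Unicode NO-BREAK SPACE


def normalize_spaces(text: str) -> str:
    """Return text with every non-breaking space replaced by a regular space."""
    return text.replace(NBSP, " ")


# Hypothetical sample: the phrase as it might appear in text extracted from the PDF.
extracted = "Unaudited\u00a0Capital\u00a0Account\u00a0Statement"
target = "Unaudited Capital Account Statement"

# A plain substring test fails on the raw text but succeeds after normalization.
found_raw = target in extracted
found_normalized = target in normalize_spaces(extracted)
```

Normalizing a copy of the text only helps when you control the string being searched; it does not change the PDF itself.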
When searching for the phrase “Unaudited Capital Account Statement”, please try the regular expression “Unaudited\sCapital\sAccount\sStatement” instead. Here \s matches any whitespace character, so either a regular space or a non-breaking space may appear between the words.
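As a rough illustration of why the \s pattern helps, here is a small sketch in Python rather than Aspose.PDF code. The `phrase_pattern` helper and the sample strings are hypothetical. For Python `str` patterns, \s matches Unicode whitespace, which includes \u00A0; it is worth confirming that the regex engine used for your PDF search behaves the same way.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    """Build a regex that allows any single whitespace character between words."""
    words = phrase.split()
    # Escape each word so that regex metacharacters in it are matched literally.
    return re.compile(r"\s".join(re.escape(word) for word in words))


pattern = phrase_pattern("Unaudited Capital Account Statement")

# Hypothetical samples: one with non-breaking spaces, one with regular spaces.
with_nbsp = "Unaudited\u00a0Capital\u00a0Account\u00a0Statement"
with_space = "Unaudited Capital Account Statement"

matches_nbsp = pattern.search(with_nbsp) is not None
matches_space = pattern.search(with_space) is not None
```

Using `\s+` instead of `\s` as the joiner would also tolerate runs of several whitespace characters between words.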
We hope this will be helpful. Please feel free to contact us if you need any further assistance.
Please share your sample PDF document so that we can investigate this scenario and help you out. Before getting back to us, please make sure you are using Aspose.PDF for .NET 19.11.