Hi Team!
There is an issue when searching for Hebrew text in a PDF with a regex. I have a C# Regex object with a Hebrew pattern, and when I use it with the TextFragmentAbsorber it finds nothing, even though the attached PDF file contains text that should match.
When I extract the page text with the TextAbsorber, the searched word appears in the output with spaces inside it, and I don’t know why.
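For reference, the two calls described above look roughly like the sketch below. This is an assumed minimal reproduction, not a copy of the attached sample project: the file path, class name, and variable names are placeholders, and it assumes the Regex is passed straight to the TextFragmentAbsorber constructor.

```csharp
using System;
using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class Program
{
    static void Main()
    {
        // Assumed path: the attached test file sits in the working directory.
        var document = new Document("test-dlp.pdf");

        // Regex search across all pages with the original Hebrew pattern.
        var regex = new Regex("סודי");
        var fragmentAbsorber = new TextFragmentAbsorber(regex);
        document.Pages.Accept(fragmentAbsorber);
        Console.WriteLine($"Matches: {fragmentAbsorber.TextFragments.Count}");

        // Plain text extraction of the same pages, to inspect how the
        // word is actually returned (character order and spacing).
        var textAbsorber = new TextAbsorber();
        document.Pages.Accept(textAbsorber);
        Console.WriteLine(textAbsorber.Text);
    }
}
```

Comparing the printed text with the pattern character by character should show whether the difference is in the order of the letters, the inserted spaces, or both.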
PDF file: test-dlp.pdf (46.4 KB)
Regex pattern: סודי
Sample Project: sample-project.zip (2.0 KB)
Aspose.PDF: 24.1.0
If I change the pattern (reverse it and add spaces), there is a match. Edited pattern: י ס וד
Why doesn’t the original pattern match, and why are there extra spaces in the extracted text?