Hi,
I’ve found an issue when processing PDF files on different platforms.
The problem occurs when a “TextFragmentAbsorber” instance is matched via “accept” on a document’s page object.
The issue itself: when matching an end of line without any additional characters from the next line, Windows and Linux behave differently.
The text we are trying to match: 7\r\n
Windows can match that, Linux cannot.
If we include a character from the next line: 7\r\n6
Both Windows and Linux work.
I also noticed that if I remove the \n character: 7\r
Both Windows and Linux work.
I have also attached a minimal test project that writes out the matched text for each regex as Unicode characters, which shows the difference when it is run on different platforms.
Is there anything I can do to avoid this behavior, or is this an issue that should be fixed on your side?
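To illustrate the kind of output I mean, here is a rough sketch in plain Python using only the standard `re` module, not the PDF library itself. The sample string and the helper names (`code_points`, `report`) are made up for illustration, and I am assuming the extracted text contains `\r\n` line breaks. It only shows the three patterns above and how the matches are printed as code points; it does not reproduce the platform difference.

```python
import re

# Hypothetical stand-in for the text extracted from a page; the actual
# line-break characters produced by the library may differ per platform.
SAMPLE_TEXT = "7\r\n6"

# The three patterns described above.
PATTERNS = [r"7\r\n", r"7\r\n6", r"7\r"]


def code_points(text):
    """Return the text as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def report(text, patterns):
    """Map each pattern to the code points of its first match, or None."""
    results = {}
    for pattern in patterns:
        match = re.search(pattern, text)
        results[pattern] = code_points(match.group()) if match else None
    return results


if __name__ == "__main__":
    for pattern, matched in report(SAMPLE_TEXT, PATTERNS).items():
        print(f"{pattern!r}: {matched}")
```

Printing the matches this way makes invisible characters such as `\r` and `\n` easy to compare between runs on different machines.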
Thanks in advance
PdfBreakLine.zip (12.7 KB)