I’m extracting text from PDFs together with its coordinates, and ideally I would like the coordinates of each individual character. I’ve tried using a TextFragmentAbsorber with a regular expression that matches a single character, and this almost works. However, there appears to be a bug: when a word contains a repeated character, the coordinates returned for the second occurrence are those of the first occurrence.
TextFragmentAbsorber tfa = new TextFragmentAbsorber(".")
{
    TextSearchOptions = new TextSearchOptions(true)
};
pdf.Pages[page].Accept(tfa);
foreach (TextFragment tf in tfa.TextFragments)
{
    Console.WriteLine("{0}: {1}", tf.Text, tf.Rectangle.LLX);
}
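One alternative that might avoid the regular-expression search altogether is to absorb whole fragments and then walk the per-character information on each segment. The following is only a sketch, not a verified fix: it assumes the installed Aspose.PDF version exposes TextSegment.Characters (a 1-based collection of CharInfo objects, each with a Rectangle), and that the collection lines up one-to-one with the segment's Text, which may not hold for every PDF. The file name "input.pdf" and the page number are placeholders.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class CharCoordinates
{
    static void Main()
    {
        // Placeholder input: substitute a real file and page number.
        Document pdf = new Document("input.pdf");
        int page = 1;

        // No search pattern: absorb every text fragment on the page.
        TextFragmentAbsorber tfa = new TextFragmentAbsorber();
        pdf.Pages[page].Accept(tfa);

        foreach (TextFragment tf in tfa.TextFragments)
        {
            foreach (TextSegment ts in tf.Segments)
            {
                // Assumption: Characters is 1-based and matches ts.Text
                // character for character.
                for (int i = 1; i <= ts.Characters.Count; i++)
                {
                    CharInfo ci = ts.Characters[i];
                    // Fall back to '?' if the counts do not line up.
                    char c = i <= ts.Text.Length ? ts.Text[i - 1] : '?';
                    Console.WriteLine("{0}: {1}", c, ci.Rectangle.LLX);
                }
            }
        }
    }
}
```

Because each CharInfo is read by its position within the segment rather than found by a text search, a repeated character should get its own rectangle, but I have not confirmed this against the document that shows the problem.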