Hi,
I am using the code below to extract the text content of a PDF:
TextAbsorber textAbsorber = new TextAbsorber();
doc.Pages.Accept(textAbsorber);
string extractedText = textAbsorber.Text;
I can see that "MSA audit" and "PCPA AUDIT" each appear only once in the attached SAMPLE_PDF.pdf, but the same phrases are found multiple times when I use the code below:
foreach (var tagmodel in ShapesWithSignerTagModel)
{
    // Search for this tag only on the distinct pages listed for it
    var pageArray = tagmodel.PageNo.Distinct();
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(tagmodel.Tag);
    foreach (int page in pageArray)
    {
        doc.Pages[page].Accept(absorber);
    }
    // Collect the matches for this tag
    textFragments.AddRange(absorber.TextFragments.AsEnumerable());
}
// Blank out every matched fragment, in page order
foreach (TextFragment textFragment in textFragments.OrderBy(a => a.Page.Number))
{
    textFragment.Text = ""; // **MSA audit** and **PCPA AUDIT** are found multiple times here
}
SAMPLE_PDF.pdf (106.5 KB)
I have uploaded the full sample code at the URL below: