I am using the code below to extract PDF content:
TextAbsorber textAbsorber = new TextAbsorber();
doc.Pages.Accept(textAbsorber);
string extractedText = textAbsorber.Text;
I can see that MSA audit and PCPA AUDIT each occur only once in the attached SAMPLE_PDF.pdf, but the same words are found multiple times when I use the code below:
foreach (var tagmodel in ShapesWithSignerTagModel)
{
    var pageArray = tagmodel.PageNo.Distinct();
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(tagmodel.Tag);
    foreach (int page in pageArray)
    {
        doc.Pages[page].Accept(absorber);
    }
    textFragments.AddRange(absorber.TextFragments.AsEnumerable());
}
foreach (TextFragment textFragment in textFragments.OrderBy(a => a.Page.Number))
{
    textFragment.Text = ""; // MSA audit and PCPA AUDIT are found multiple times
}
You are getting duplicate search results because the TextFragmentAbsorber instance is created outside the page loop, so the results of earlier searches are not cleared from its cache. Create the instance inside the loop instead, as shown below:
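foreach (int page in pageArray)
{
    // Create a fresh absorber per page so earlier matches are not carried over.
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(tagmodel.Tag);
    doc.Pages[page].Accept(absorber);
    // Collect the matches here, while this absorber is still in scope.
    textFragments.AddRange(absorber.TextFragments.AsEnumerable());
}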
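If duplicates still appear, one possible stopgap is to filter the collected matches before clearing them. The following is only a minimal sketch, not part of the Aspose.PDF API: FoundText and FragmentDedup are hypothetical names, and FoundText stands in for the values you would copy from each TextFragment (assumed here to be Page.Number, Text, and the lower-left corner of Rectangle). It treats two matches as the same when page, text, and rounded position agree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the identifying values of one found fragment.
public record FoundText(int Page, string Text, double Llx, double Lly);

public static class FragmentDedup
{
    // Keep the first match for each (page, text, rounded position) key,
    // then return the survivors ordered by page number.
    public static List<FoundText> Deduplicate(IEnumerable<FoundText> matches) =>
        matches
            .GroupBy(m => (m.Page, m.Text, Math.Round(m.Llx, 2), Math.Round(m.Lly, 2)))
            .Select(g => g.First())
            .OrderBy(m => m.Page)
            .ToList();

    public static void Main()
    {
        // Sample data only: the first two entries describe the same match.
        var found = new List<FoundText>
        {
            new FoundText(1, "MSA audit", 72.0, 700.0),
            new FoundText(1, "MSA audit", 72.0, 700.0),
            new FoundText(2, "PCPA AUDIT", 72.0, 650.0),
        };

        var unique = Deduplicate(found);
        Console.WriteLine(unique.Count); // prints 2
    }
}
```

Matches with the same text at different positions on a page are kept, since they are treated as separate occurrences.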
Feel free to let us know in case you still notice any issues.
We were able to replicate the issue in our environment while testing the scenario with version 22.4 of the API. It has therefore been logged as PDFNET-51732 in our issue management system. We will look into its details and keep you posted on the status of its rectification. Please be patient and spare us some time.
The ticket was only recently logged in our issue management system and has not yet been fully investigated. We will analyze and resolve it on a first come, first served basis, as per our free support policies, and will let you know as soon as we make definite progress towards its resolution. Please be patient and spare us some time.
As shared earlier, the issue has not yet been investigated, and without a full analysis we are afraid we cannot share a reliable ETA or timeframe for its fix. Your concerns have been recorded, and we will let you know in this forum thread as soon as updates are available. Please spare us some time.