I am trying to search content using TextFragmentAbsorber. I get text fragments for a single paragraph or a single word, but searching for content that spans multiple paragraphs does not work. Please provide a workaround. Refer to the image for the content I need to search and add a highlight annotation to.
Capture.PNG (145.8 KB)
Would you please share your sample PDF with us as well? We will test the scenario in our environment and address it accordingly.
Sample.pdf (91.0 KB)
Searching for the whole text in that file returns an empty text fragment collection.
Please refer to the sample PDF file.
Can you please also share the code sample that you are using to extract the text? We will test the scenario in our environment and address it accordingly.
The sample code below is used to search the text. If the text content contains two or more paragraphs, the list contains no text fragments.
// Build a whitespace-tolerant pattern from the text taken from the HTML.
string pattern = textContent.Trim()
    .Replace(" ", @"\s*")
    .Replace("(", @"\(").Replace(")", @"\)")
    .Replace(".", @"\.").Replace(":", @"\:").Replace("-", @"\-");
// Passing true to TextSearchOptions enables regular-expression search.
var tfa = new TextFragmentAbsorber(new Regex(pattern), new TextSearchOptions(true));
doc.Pages.Accept(tfa);
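For reference, the pattern above only relaxes literal space characters, so line breaks between paragraphs and other regex metacharacters in the text are left as they are. A minimal sketch of a more defensive pattern builder follows. It uses only the standard .NET Regex class; the SearchPattern class and its Build method are hypothetical names introduced here, and it has not been confirmed that this resolves the multi-paragraph search in Aspose.PDF.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchPattern
{
    // Hypothetical helper (not part of Aspose.PDF): splits the text on any
    // whitespace, escapes every regex metacharacter in each word, and joins
    // the words with \s* so that any run of spaces or line breaks (or none)
    // between two words is accepted.
    public static string Build(string text)
    {
        // A null separator array makes Split break on all whitespace characters.
        var tokens = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(@"\s*", tokens.Select(Regex.Escape));
    }
}
```

The result could be passed as new Regex(SearchPattern.Build(textContent)) in place of the chained Replace calls.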
We assume that you are copying all the text from the PDF and assigning it to the textContent variable. Is that right?
Yes, I get the text content from HTML, which is first converted to PDF, i.e.
string textContent = element.TextContent;
I then search for that text content in the PDF. It works for a single paragraph or a single word, but the search fails for multiple paragraphs.
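Since single-paragraph searches do return fragments, one possible interim approach is to split the text into paragraphs and search for each one separately. The sketch below covers only the splitting step in plain .NET. ParagraphSplitter is a hypothetical name, and it assumes that paragraphs in textContent are separated by line breaks, which may not hold for every HTML source.

```csharp
using System;
using System.Linq;

public static class ParagraphSplitter
{
    // Hypothetical helper: breaks the text on line breaks, trims each piece,
    // and drops empty pieces. Assumes paragraphs are separated by \r and/or \n.
    public static string[] Split(string text)
    {
        return text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(p => p.Trim())
                   .Where(p => p.Length > 0)
                   .ToArray();
    }
}
```

Each returned paragraph could then be searched with its own TextFragmentAbsorber, in the same way as the single-paragraph case that already works.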
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-57723
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.