The C# code below gets the inner text of a highlight annotation in a PDF, but if the highlight spans two lines it picks up extra text that is not needed. For example, if I highlight text at the end of one line and the beginning of the next, it also grabs the text on the second line that sits below the highlighted part of the first line, and the text on the first line that sits above the highlighted part of the second line. It appears to grab all of the text inside a rectangle. Is there any workaround to get this to work?
```csharp
if (currentAnnotation is HighlightAnnotation highlightAnnotation)
{
    var commentText = currentAnnotation.Contents;
    // Create a TextAbsorber to extract text fragments that intersect with the annotation's rectangle
    TextAbsorber absorber = new TextAbsorber();
    absorber.TextSearchOptions = new TextSearchOptions(true); // Enable regular expression search
    absorber.TextSearchOptions.Rectangle = highlightAnnotation.Rect;
    // Accept the absorber on the page to get the text fragments
    page.Accept(absorber);
    // Get the extracted text from the absorber
    var innerText = absorber.Text;
    var replaceTextAnnotation = new ReplaceTextAnnotation
    {
        InnerText = "Inner Text: " + innerText,
        Contents = "Comment Text: " + commentText,
        AnnotationType = currentAnnotation.GetType().ToString()
    };
    annotations.Add(replaceTextAnnotation);
}
```
HiglightTest.pdf (55.5 KB)
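The behaviour described above is consistent with how a highlight's `Rect` works: it is one bounding box around every highlighted line, so for a two-line highlight it also covers the unhighlighted start of the first line and the unhighlighted end of the second. In the PDF format, the per-line shape of a text-markup annotation is stored separately as quadrilaterals (`/QuadPoints`, four corner points per quad, usually one quad per highlighted line), and Aspose.PDF exposes these as the `QuadPoints` property (a `Point[]`) on text-markup annotations such as `HighlightAnnotation`. One possible workaround, sketched here as an assumption and not as the fix planned for the logged ticket, is to turn each quad into its own rectangle and run the absorber once per rectangle. The `LineBox`, `QuadPointSplitter` and `ToLineBoxes` names are hypothetical, and the helper takes plain `(X, Y)` tuples so it does not depend on any Aspose type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical axis-aligned box in PDF user space (origin bottom-left, y grows upward).
public readonly record struct LineBox(double Llx, double Lly, double Urx, double Ury);

// Hypothetical helper: splits highlight quad points into one box per quad.
public static class QuadPointSplitter
{
    // Expects a flat list of points, four per quad, as stored in a highlight's QuadPoints.
    // Taking the min/max of each quad's four corners avoids depending on the corner
    // order a particular PDF producer used. Boxes keep the quads' original order.
    public static List<LineBox> ToLineBoxes(IReadOnlyList<(double X, double Y)> points)
    {
        if (points.Count % 4 != 0)
            throw new ArgumentException("Quad points must come in groups of four.", nameof(points));

        var boxes = new List<LineBox>();
        for (int i = 0; i < points.Count; i += 4)
        {
            var quad = new[] { points[i], points[i + 1], points[i + 2], points[i + 3] };
            boxes.Add(new LineBox(
                quad.Min(p => p.X), quad.Min(p => p.Y),
                quad.Max(p => p.X), quad.Max(p => p.Y)));
        }
        return boxes;
    }
}
```

In the loop from the question, you could call `QuadPointSplitter.ToLineBoxes(highlightAnnotation.QuadPoints.Select(p => (p.X, p.Y)).ToList())`, then for each returned box create a fresh `TextAbsorber`, set its `TextSearchOptions.Rectangle` to `new Aspose.Pdf.Rectangle(box.Llx, box.Lly, box.Urx, box.Ury)`, call `page.Accept(absorber)`, and join the resulting `absorber.Text` values with a space. The absorber still works on whole rectangles, so tightly spaced lines or quads drawn slightly larger than the glyphs may still pull in a stray character; check the output against the attached test file.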
Here is the full code:
```csharp
SetLicenseExample();
Document document = new Document("filepath.pdf");
PdfAnnotationEditor annotationEditor = new PdfAnnotationEditor();
annotationEditor.BindPdf(document);
var annotationTypes = new[] { AnnotationType.StrikeOut, AnnotationType.Highlight, AnnotationType.FreeText, AnnotationType.Caret, AnnotationType.Text };
var annotations = new List<ReplaceTextAnnotation>();
foreach (Page page in document.Pages)
{
    var pageAnnotations = annotationEditor.ExtractAnnotations(page.Number, page.Number, annotationTypes);
    for (int i = 0; i < pageAnnotations.Count; i++)
    {
        var currentAnnotation = pageAnnotations[i];
        if (currentAnnotation is HighlightAnnotation highlightAnnotation)
        {
            var commentText = currentAnnotation.Contents;
            // Create a TextAbsorber to extract text fragments that intersect with the annotation's rectangle
            TextAbsorber absorber = new TextAbsorber();
            absorber.TextSearchOptions = new TextSearchOptions(true); // Enable regular expression search
            absorber.TextSearchOptions.Rectangle = highlightAnnotation.Rect;
            // Accept the absorber on the page to get the text fragments
            page.Accept(absorber);
            // Get the extracted text from the absorber
            var innerText = absorber.Text;
            var replaceTextAnnotation = new ReplaceTextAnnotation
            {
                InnerText = "Inner Text: " + innerText,
                Contents = "Comment Text: " + commentText,
                AnnotationType = currentAnnotation.GetType().ToString()
            };
            Console.WriteLine(innerText);
            //Console.WriteLine(commentText);
            annotations.Add(replaceTextAnnotation);
        }
    }
}
```
We were unable to resolve `ReplaceTextAnnotation` in your code snippet. Can you please share which API version you are using, along with the missing definitions, so that we can proceed with testing?
I'm using the Aspose.PDF NuGet package 23.7.0, license version 3.0. Let me know if you need any more information.
Not all of the code is in the code snippet; some of it is in the original post.
```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Facades;
using Aspose.Pdf.Text;

SetLicenseExample(); // the poster's own license-setup helper (not shown)
Document document = new Document("filepath.pdf");
PdfAnnotationEditor annotationEditor = new PdfAnnotationEditor();
annotationEditor.BindPdf(document);
var annotationTypes = new[] { AnnotationType.StrikeOut, AnnotationType.Highlight, AnnotationType.FreeText, AnnotationType.Caret, AnnotationType.Text };
var annotations = new List<ReplaceTextAnnotation>();
foreach (Page page in document.Pages)
{
    var pageAnnotations = annotationEditor.ExtractAnnotations(page.Number, page.Number, annotationTypes);
    for (int i = 0; i < pageAnnotations.Count; i++)
    {
        var currentAnnotation = pageAnnotations[i];
        if (currentAnnotation is HighlightAnnotation highlightAnnotation)
        {
            var commentText = currentAnnotation.Contents;
            // Create a TextAbsorber to extract text fragments that intersect with the annotation's rectangle
            TextAbsorber absorber = new TextAbsorber();
            absorber.TextSearchOptions = new TextSearchOptions(true); // Enable regular expression search
            absorber.TextSearchOptions.Rectangle = highlightAnnotation.Rect;
            // Accept the absorber on the page to get the text fragments
            page.Accept(absorber);
            // Get the extracted text from the absorber
            var innerText = absorber.Text;
            var replaceTextAnnotation = new ReplaceTextAnnotation
            {
                InnerText = "Inner Text: " + innerText,
                Contents = "Comment Text: " + commentText,
                AnnotationType = currentAnnotation.GetType().ToString()
            };
            Console.WriteLine(innerText);
            //Console.WriteLine(commentText);
            annotations.Add(replaceTextAnnotation);
        }
    }
}

// Plain data holder for the details collected from each annotation
public class ReplaceTextAnnotation
{
    public string Contents { get; set; }
    public string InnerText { get; set; }
    public string AnnotationType { get; set; }
    // Add any other properties you want to store for the ReplaceText annotation.
}
```
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-55155
You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.
Hi @asad.ali. Thank you again for taking the time to look at this issue. Were you able to reproduce the problem? Also, do you have an ETA for when it may be resolved?
Yes, we were able to reproduce the issue in our environment with Aspose.PDF for .NET 23.7, so a ticket has been logged to fix it. We are afraid we do not have an ETA at the moment because the ticket has not been investigated yet. We will look into it on a first-come, first-served basis and let you know once we make some progress. Please spare us some time.
We wanted to follow up: is this still an open issue, or has it been resolved in a recent update to the software? If it is still open, do you have an ETA for the fix?
We are afraid the earlier logged ticket has not been resolved yet because of other issues in the queue. Nevertheless, your concerns have been recorded and we will inform you once we make definite progress towards resolving it. We appreciate your patience and understanding, and we apologize for the inconvenience.