Hello,
I am using TextFragment to retrieve all the text in a PDF document, but the
highlighted text does not have its BackgroundColor property set to the
color used for the highlight (yellow).
Here's my code:
// Match every run of non-whitespace characters, i.e. all words
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"[\S]+");
// Set the text search option to enable regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
List<TextFragment> lst =
    textFragmentCollection.Cast<TextFragment>()
        .Where(o => o.TextState.BackgroundColor == Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow))
        .ToList();
=> My list is empty, even though the document contains words highlighted in yellow (using the Adobe Highlight tool). Am I missing something here?
Thank you for your help.
Sincerely,
Lucas
Hi Lucas,
Hello,
Please find attached the PDF file.
Thank you for the support.
Hi Lucas,
I have tested the scenario and I am able to reproduce the problem: background color information for a TextFragment is not being returned. I have logged it in our issue tracking system as PDFNEWNET-37503 so that it can be corrected. We will investigate this issue in detail and will keep you updated on the status of a correction.
We apologize for the inconvenience.
Hi Lucas,
Thanks for your patience. We have looked further into your document and would like to update you that it is not possible to retrieve the background color of text in a PDF document. The background is usually produced in some other, unspecified way (drawing operations or an annotation). For example, your document does not contain a text background; it contains a highlight annotation.
Moreover, please note that the value is not preserved as a text characteristic within the document. The BackgroundColor property of an object can only be retrieved if it was explicitly set beforehand through the BackgroundColor setter of that object.
Please feel free to contact us for any further assistance.
Best Regards,
Hello,
Thank you for the response. I am actually using the Highlight class to retrieve my highlighted text, so I still have a problem: I cannot get the content of the text. In other words, it is impossible to retrieve the text that has been highlighted in a PDF file. Is that correct?
Thank you
Lucas
Hi Lucas,
Thank you for the response, but I am still not getting it. I am using the HighlightAnnotation type to retrieve my highlighted "rectangles" and it works! But after that I cannot retrieve their content (the text).
Thank you very much for your help.
Here's my code:
Document pdfDoc = new Document(originalDocumentName);
// Page indexes in the Pages collection start at 1
for (int y = 1; y <= pdfDoc.Pages.Count; y++)
{
    Page page = pdfDoc.Pages[y];
    // Keep only the annotations whose color is yellow
    List<Aspose.Pdf.InteractiveFeatures.Annotations.Annotation> annotations =
        page.Annotations.Cast<Aspose.Pdf.InteractiveFeatures.Annotations.Annotation>()
            .Where(o => o.Color == Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Yellow)).ToList();
    foreach (Aspose.Pdf.InteractiveFeatures.Annotations.Annotation annotation in annotations)
    {
        if (annotation is HighlightAnnotation)
        {
            Rectangle rect = annotation.Rect;
            // No information about the text or the content itself is available here.
        }
    }
}
Hi Lucas,
Document pdfDocument = new Document(myDir + "testAspose.pdf");
foreach (Page aPage in pdfDocument.Pages)
{
    foreach (Aspose.Pdf.InteractiveFeatures.Annotations.Annotation anAnnotation in aPage.Annotations)
    {
        if (anAnnotation is HighlightAnnotation)
        {
            HighlightAnnotation highlightAnno = (HighlightAnnotation)anAnnotation;
            Aspose.Pdf.Rectangle rect = highlightAnno.Rect;
            // Create a TextAbsorber object to extract text
            TextAbsorber absorber = new TextAbsorber();
            // Restrict the search to the rectangle covered by the annotation
            absorber.TextSearchOptions.LimitToPageBounds = true;
            absorber.TextSearchOptions.Rectangle = rect;
            // Accept the absorber for the current page
            aPage.Accept(absorber);
            // Get the extracted text
            string extractedText = absorber.Text;
            Console.Out.WriteLine("HighlightAnnotation text: {0}", extractedText);
        }
    }
}
pdfDocument.Dispose();
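The snippet above recovers the highlighted text purely by position: the annotation carries no text of its own, so the absorber is limited to the annotation's rectangle. As a rough, library-independent sketch of that idea, the following assumes each word on a page is already known together with its bounding box. The `Box` and `Word` types and the `TextInside` helper are hypothetical names used for illustration only; they are not part of the Aspose.PDF API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a PDF rectangle (lower-left and upper-right corners).
public readonly struct Box
{
    public double Llx { get; }
    public double Lly { get; }
    public double Urx { get; }
    public double Ury { get; }

    public Box(double llx, double lly, double urx, double ury)
    {
        Llx = llx; Lly = lly; Urx = urx; Ury = ury;
    }

    // True when 'inner' lies entirely inside this box (edges included).
    public bool Contains(Box inner) =>
        inner.Llx >= Llx && inner.Lly >= Lly && inner.Urx <= Urx && inner.Ury <= Ury;
}

// Hypothetical stand-in for a piece of text with a known position on the page.
public sealed class Word
{
    public string Text { get; }
    public Box Bounds { get; }

    public Word(string text, Box bounds)
    {
        Text = text;
        Bounds = bounds;
    }
}

public static class HighlightSketch
{
    // Joins, in input order, the words whose bounds fall inside the highlight box.
    public static string TextInside(Box highlight, IEnumerable<Word> words) =>
        string.Join(" ", words.Where(w => highlight.Contains(w.Bounds)).Select(w => w.Text));
}
```

Whether to require full containment, as in this sketch, or to accept any overlap is a judgment call: a highlight drawn by hand may cut through the first or last glyphs of a word, so a real implementation may need a small tolerance around the rectangle.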
Please feel free to contact us for any further assistance.
Best Regards,
Hi codewarior,
How do you highlight text in a PDF file? Do you perhaps have code for that particular function?
Thanks for contacting support.
Please check the following code snippet, which searches for text inside a PDF document and highlights it.
Document document = new Document(dataDir + "TestInputPDF.pdf");
// Search for the text using a regular expression
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"Garrett\sNevels");
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
document.Pages.Accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection1 = textFragmentAbsorber.TextFragments;
if (textFragmentCollection1.Count > 0)
{
    foreach (TextFragment textFragment in textFragmentCollection1)
    {
        // Build a rectangle covering the matched fragment
        Aspose.Pdf.Rectangle fragmentRect = new Aspose.Pdf.Rectangle(
            textFragment.Position.XIndent,
            textFragment.Position.YIndent,
            textFragment.Position.XIndent + textFragment.Rectangle.Width,
            textFragment.Position.YIndent + textFragment.Rectangle.Height);
        // Add a semi-transparent highlight annotation over that rectangle
        Aspose.Pdf.Annotations.HighlightAnnotation highlight = new Aspose.Pdf.Annotations.HighlightAnnotation(textFragment.Page, fragmentRect);
        highlight.Opacity = 0.5;
        highlight.Color = Aspose.Pdf.Color.FromRgb(0.6, 0.8, 0.98);
        textFragment.Page.Annotations.Add(highlight);
    }
}
document.Save(dataDir + "TestInputPDF_out.pdf");
In the event of any further queries, please feel free to let us know.
Hello Asad
I tried the above code, but what I actually intended to ask is how to highlight fields in a PDF file and also add comments to the highlighted fields.
Thanks for writing back.
Please clarify whether you are asking about form fields inside a PDF document. Could you also add some more details about your requirements and share a sample PDF with us, so that we can check the details on our side and share our feedback?