Hello,
Is it possible to get the top and left positions of any paragraph without explicitly setting these values? I’ve tried getting these values by getting the .Top and .Left values but they’re returning -1, which I’m assuming is because I haven’t explicitly set them?
Hi James,
Thanks for your interest in our products.
Aspose.Pdf for .NET can search for a particular text segment in a PDF document and can also return the position attributes of the matched string. For further information, please see the documentation topic "Search and Get Text from a Single Page of a PDF Document".
In case I have not properly understood your requirement, please share some further details.
Thanks for the reply, that’s roughly what I’m looking for.
Would I be right in saying that the .Position property returns the coordinates of the bottom-left of the string?
Hi James,
//open document
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document("d:/pdftest/TestKDL.pdf");
//create TextAbsorber object to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("betalen");
//accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
//get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
//loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
Console.WriteLine("Text : {0} ", textFragment.Text);
// get lower left X of TextFragment
Console.WriteLine("LowerLeft X : {0} ", textFragment.Rectangle.LLX);
// get lower left Y of TextFragment
Console.WriteLine("LowerLeft Y : {0} ", textFragment.Rectangle.LLY);
// get Height of Text Fragment
Console.WriteLine("Height of Text Fragment : {0} ", textFragment.Rectangle.Height);
// The top edge (Y) of the TextFragment is LowerLeft Y + Height; the left edge is LowerLeft X
Console.WriteLine("Top Y : {0} ", textFragment.Rectangle.LLY + textFragment.Rectangle.Height);
}
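As a rough illustration of the arithmetic in the snippet above, here is a minimal sketch that does not depend on Aspose.Pdf. It assumes the page uses the default PDF coordinate space (origin at the bottom-left corner, Y increasing upward, no rotation). The `PdfCoords` class, its method names, and the sample numbers are hypothetical; only the idea of adding the height to the lower-left Y comes from the code above.

```csharp
using System;

// Hypothetical helper, not part of Aspose.Pdf: plain arithmetic on the values
// that Rectangle.LLX, Rectangle.LLY and Rectangle.Height report.
public static class PdfCoords
{
    // Top edge measured upward from the bottom of the page (PDF convention).
    public static double TopFromPageBottom(double lly, double height)
        => lly + height;

    // Top edge measured downward from the top of the page (screen convention).
    public static double TopFromPageTop(double pageHeight, double lly, double height)
        => pageHeight - (lly + height);
}

public static class Demo
{
    public static void Main()
    {
        // Assumed sample values in points; 842 is roughly the height of an A4 page.
        double pageHeight = 842, llx = 72, lly = 700, height = 12;

        Console.WriteLine("Left : {0}", llx);
        Console.WriteLine("Top (from page bottom) : {0}",
            PdfCoords.TopFromPageBottom(lly, height));
        Console.WriteLine("Top (from page top) : {0}",
            PdfCoords.TopFromPageTop(pageHeight, lly, height));
    }
}
```

If a top-down "Top" value is needed (as in most UI frameworks), the page height has to be supplied as well, which is why the second method takes it as a parameter.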
In case you still face any problem or have any further query, please feel free to contact us.
Hi Nayyer,
The Rectangle property is exactly what I need, thank you.
However, I’ve found that it only seems to work for short strings (such as “hello hello hello”) and not for a longer string (such as “Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse
adipiscing lacus ac metus tristique laoreet. Nulla nec viverra arcu.
Nulla leo leo, tincidunt et congue a, elementum sed leo. Curabitur at
sapien at lectus scelerisque cursus. Praesent vitae nunc massa, non
condimentum ipsum. Cras a est arcu. Sed mattis quam et elit hendrerit
mollis. Nam scelerisque sem ac nisi placerat quis pretium orci vehicula.
Quisque laoreet hendrerit magna sed pretium. Nunc iaculis auctor
posuere. Ut nisi massa, tempus et rhoncus non, suscipit in purus.”).
Is this a limitation of the API, or am I doing something wrong? I’ve attached a copy of my code just in case it’s the latter.
Hi James,
I have tested this scenario in detail and found that when retrieving the position of a TextFragment with more than 60 characters, the position information is not returned. However, if the length of the TextFragment is less than or equal to 60 characters, the position information is returned. I have logged this in our issue tracking system as PDFNEWNET-34329. We will investigate the issue in detail and keep you updated on the status of a correction.
We apologize for the inconvenience.
Hi Nayyer,
Thank you for the update. Do you have an estimate of how long this will take to correct?
Hi James,
Document pdfDocument = new Document(inputFile);
TextFragmentAbsorber absorber = new TextFragmentAbsorber("Charges for Exchange of Bonds; Negotiability; Registration, Transfer, Exchange.");
pdfDocument.Pages[2].Accept(absorber);
TextFragmentCollection fragmentCollection = absorber.TextFragments;
foreach (TextFragment tf in fragmentCollection)
{
Debug.WriteLine(tf.Rectangle.LLY);
Debug.WriteLine(tf.Rectangle.Height);
}
The text "AUTHORIZATION, EXECUTION, AUTHENTICATION, REGISTRATION AND D" cannot be absorbed because the last character of the string, 'D', is located on another line, and TextFragmentAbsorber works only within the bounds of a single line of text. This isn't a bug in our API. A similar issue was reported by another customer, who faced the following two problems:
- Text is not recognized when it is split across lines due to text wrapping
- The second instance on a line is not recognized; instead, the first instance is found and updated twice
Regarding the "second instance on a line" problem, we have made some improvements, and several nearby fragments are now processed correctly.
Regarding the "searching for text that is split across lines" problem, unfortunately the structure of text in a PDF is such that every line of text is a separate unit. TextFragment cannot process two parts of a phrase that are on different lines as a single whole. This is not a bug.
For such scenarios, we recommend the following code snippet, which processes the whole phrase and the parts of the phrase separately.
[C#]
Document doc = new Document(inputFile);
//Use TextFragmentAbsorber with a regular expression search
//to find "RTF text", "RTF" at the end of a line, and "text" at the start of a line.
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"(RTF\stext)|(RTF\s\r\n)|(^text)");
textFragmentAbsorber.TextSearchOptions.IsRegularExpressionUsed = true;
doc.Pages[1].Accept(textFragmentAbsorber);
//Prepare collection for selected fragments
TextFragmentCollection textFragmentCollection = new TextFragmentCollection();
//Select appropriate fragments
for (int i = 1; i <= textFragmentAbsorber.TextFragments.Count; i++)
{
TextFragment fragment = textFragmentAbsorber.TextFragments[i];
string text = fragment.Text;
Debug.WriteLine(text);
if (text == "RTF text")
{
textFragmentCollection.Add(textFragmentAbsorber.TextFragments[i]);
}
if (i == 1) continue;
TextFragment prevFragment = textFragmentAbsorber.TextFragments[i - 1];
if (text == "text" && prevFragment.Text == "RTF ")
{
textFragmentCollection.Add(prevFragment);
textFragmentCollection.Add(fragment);
}
}
//loop through the selected fragments and create hyperlinks
foreach (TextFragment textFragment in textFragmentCollection)
{
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Blue;
Aspose.Pdf.InteractiveFeatures.Annotations.LinkAnnotation link
= new Aspose.Pdf.InteractiveFeatures.Annotations.LinkAnnotation(textFragment.Page, textFragment.Rectangle);
Aspose.Pdf.InteractiveFeatures.Annotations.Border border
= new Aspose.Pdf.InteractiveFeatures.Annotations.Border(link);
border.Width = 1;
link.Border = border;
link.Destination = new Aspose.Pdf.InteractiveFeatures.XYZExplicitDestination(textFragment.Page, 0, 100, 0);
textFragment.Page.Annotations.Add(link);
}
doc.Save(outFile);
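To make the selection step in the snippet above easier to follow, here is a minimal sketch of the same pairing logic on plain strings, with no Aspose.Pdf dependency. `PhraseSelector`, `Select`, and the sample list are hypothetical names and data introduced only for illustration. The sketch uses 0-based indexing, whereas the loop above indexes the fragment collection from 1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Aspose.Pdf: mirrors the selection loop above
// using plain strings in place of TextFragment objects.
public static class PhraseSelector
{
    // Returns the indices of the matches that make up the phrase: either the
    // whole phrase on one line, or a tail match immediately followed by a head match.
    public static List<int> Select(IReadOnlyList<string> matches,
                                   string whole, string tail, string head)
    {
        var selected = new List<int>();
        for (int i = 0; i < matches.Count; i++)
        {
            if (matches[i] == whole)
            {
                // The phrase fits on a single line.
                selected.Add(i);
            }
            else if (i > 0 && matches[i] == head && matches[i - 1] == tail)
            {
                // The phrase wraps: keep both the end-of-line and start-of-line parts.
                selected.Add(i - 1);
                selected.Add(i);
            }
        }
        return selected;
    }
}

public static class Demo
{
    public static void Main()
    {
        // Assumed sample of regex matches in reading order.
        var matches = new List<string> { "RTF text", "RTF ", "text", "text" };
        List<int> picked = PhraseSelector.Select(matches, "RTF text", "RTF ", "text");
        Console.WriteLine(string.Join(", ", picked));
    }
}
```

A standalone "text" match that is not preceded by "RTF " is skipped, which is the same reason the loop above checks the previous fragment before adding the pair.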