Position of paragraphs

Hello,

Is it possible to get the top and left positions of any paragraph without explicitly setting these values? I’ve tried reading the .Top and .Left properties, but they return -1, which I assume is because I haven’t set them explicitly.

Hi James,

Thanks for your interest in our products.

Aspose.Pdf for .NET can find/search a particular text segment in a PDF document, and it can also return the position attributes of the matched string. For further information, please see the documentation article “Search and Get Text from a Single Page of a PDF Document”.

If I have not understood your requirement correctly, please share further details.

Thanks for the reply, that’s roughly what I’m looking for.

Would I be right in saying that the .Position property returns the coordinates of the bottom-left of the string?

Hi James,


Yes, you are correct. The Position property returns the bottom-left coordinates. However, there is also a Rectangle property, which provides the Height, Width, LLX (lower-left X) and LLY (lower-left Y) values of a TextFragment. You can use textFragment.Rectangle.LLX to get the left position, and add textFragment.Rectangle.Height to textFragment.Rectangle.LLY to get the top position of the TextFragment. Please try the following code snippet to accomplish your requirement.

[C#]

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// open the document
Document pdfDocument = new Document("d:/pdftest/TestKDL.pdf");

// create a TextFragmentAbsorber to find all instances of the search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("betalen");

// accept the absorber for all pages
pdfDocument.Pages.Accept(textFragmentAbsorber);

// get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

// loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    Console.WriteLine("Text : {0} ", textFragment.Text);

    // lower-left X of the fragment (its left position)
    Console.WriteLine("LowerLeft X : {0} ", textFragment.Rectangle.LLX);

    // lower-left Y of the fragment
    Console.WriteLine("LowerLeft Y : {0} ", textFragment.Rectangle.LLY);

    // height of the fragment
    Console.WriteLine("Height of Text Fragment : {0} ", textFragment.Rectangle.Height);

    // the top edge is LowerLeftY + Height (Y grows upwards from the bottom of the page)
    Console.WriteLine("Top Y : {0} ", textFragment.Rectangle.LLY + textFragment.Rectangle.Height);
}
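Because PDF user space puts the origin at the bottom-left corner of the page with Y increasing upwards, LLY + Height gives the Y of the fragment’s top edge measured from the bottom of the page, not its distance from the top. If you need a top-left-origin value, you also need the page height. The following is a minimal sketch in plain C#, independent of Aspose.Pdf; the names PdfCoords, TopY and OffsetFromPageTop are hypothetical, the sample numbers are illustrative only, and it assumes the page box starts at Y = 0.

```csharp
using System;

// Hypothetical helper (not part of Aspose.Pdf): converts values measured in
// PDF user space (origin at the bottom-left, Y growing upwards) into a
// top-of-page reference. Assumes the page box starts at Y = 0.
public static class PdfCoords
{
    // Y of the fragment's top edge, measured up from the bottom of the page.
    public static double TopY(double lly, double height) => lly + height;

    // Distance from the top edge of the page down to the fragment's top edge.
    public static double OffsetFromPageTop(double pageHeight, double lly, double height)
        => pageHeight - TopY(lly, height);
}

public static class Program
{
    public static void Main()
    {
        // Illustrative values only: in practice LLX, LLY and Height would come
        // from textFragment.Rectangle, and pageHeight from the page itself.
        double pageHeight = 842.0; // roughly an A4 page, in points
        double llx = 72.0, lly = 700.0, height = 12.0;

        Console.WriteLine("Left: {0}", llx);
        Console.WriteLine("Top Y (from page bottom): {0}", PdfCoords.TopY(lly, height));
        Console.WriteLine("Offset from page top: {0}", PdfCoords.OffsetFromPageTop(pageHeight, lly, height));
    }
}
```

With these sample values the fragment’s top edge is at Y = 712, which is 130 points below the top of the page.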


If you still face any problem or have any further questions, please feel free to contact us.

Hi Nayyer,

The Rectangle property is exactly what I need, thank you.

However, I’ve found that it only seems to work for short strings (such as “hello hello hello”) and not for a longer string (such as “Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse
adipiscing lacus ac metus tristique laoreet. Nulla nec viverra arcu.
Nulla leo leo, tincidunt et congue a, elementum sed leo. Curabitur at
sapien at lectus scelerisque cursus. Praesent vitae nunc massa, non
condimentum ipsum. Cras a est arcu. Sed mattis quam et elit hendrerit
mollis. Nam scelerisque sem ac nisi placerat quis pretium orci vehicula.
Quisque laoreet hendrerit magna sed pretium. Nunc iaculis auctor
posuere. Ut nisi massa, tempus et rhoncus non, suscipit in purus.”).

Is this a limitation of the API, or am I doing something wrong? I’ve attached a copy of my code just in case it’s the latter.

Hi James,


I am testing this scenario on my end and will get back to you soon. We are sorry for the inconvenience.

Hi James,


Thanks for your patience.

I have tested this scenario in detail and found that when retrieving the position of a TextFragment with more than 60 characters, no position information is returned, whereas fragments of 60 characters or fewer return their position as expected. I have logged this in our issue tracking system as PDFNEWNET-34329. We will investigate it in detail and keep you updated on the status of a fix.

We apologize for the inconvenience.

Hi Nayyer,

Thank you for the update. Do you have an estimate of how long this will take to correct?

Hi James,


As we have only just discovered this issue, I am afraid we cannot currently share an ETA for its resolution. However, as soon as we have made some progress, we will be more than happy to update you on the status of the fix. Please be patient and allow us a little time. We are sorry for the delay and inconvenience.

Hi James,


Thanks for your patience.

The development team has investigated the issue further and is unable to reproduce it with the latest release, Aspose.Pdf for .NET 10.0.0. There is no problem absorbing a text fragment that contains more than 60 characters. For example, the fragment “Charges for Exchange of Bonds; Negotiability; Registration, Transfer, Exchange.” from the same page contains 79 characters; it is absorbed successfully, and the correct position is returned when using the following code snippet.

[C#]

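using System.Diagnostics;
using Aspose.Pdf;
using Aspose.Pdf.Text;

string inputFile = "input.pdf"; // placeholder: path to the PDF discussed in this thread
Document pdfDocument = new Document(inputFile);

TextFragmentAbsorber absorber = new TextFragmentAbsorber("Charges for Exchange of Bonds; Negotiability; Registration, Transfer, Exchange.");

// Aspose.Pdf page collections are 1-based, so this searches the second page
pdfDocument.Pages[2].Accept(absorber);

TextFragmentCollection fragmentCollection = absorber.TextFragments;

foreach (TextFragment tf in fragmentCollection)
{
    // lower-left Y and height of each matched fragment
    Debug.WriteLine(tf.Rectangle.LLY);
    Debug.WriteLine(tf.Rectangle.Height);
}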

The text "AUTHORIZATION, EXECUTION, AUTHENTICATION, REGISTRATION AND D" cannot be absorbed because its last character, 'D', is located on the next line, and TextFragmentAbsorber works only within the bounds of a single line of text. This is not a bug in our API. A similar issue was reported by another customer, who faced the following two problems:

  • It doesn't recognize text that is split across lines due to text wrapping.
  • It doesn't recognize the second instance on a line, but instead finds and updates the first instance twice.

Regarding the "second instance on a line" problem, we have made improvements, and several nearby fragments on the same line are now processed correctly.

As for "searching for text that is split across lines", unfortunately the structure of text in a PDF is such that every line of text is a separate unit. A TextFragment cannot treat two parts of a phrase that lie on different lines as one whole. This is not a bug.


For such scenarios, we recommend the following code snippet, which processes the whole phrase and the parts of the phrase separately.

[C#]

using System.Diagnostics;
using Aspose.Pdf;
using Aspose.Pdf.Text;

string inputFile = "input.pdf";  // placeholder input path
string outFile = "output.pdf";   // placeholder output path

Document doc = new Document(inputFile);

// Use TextFragmentAbsorber with a regular expression to find "RTF text" on one line,
// as well as "RTF" at the end of a line and "text" at the start of a line.
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"(RTF\stext)|(RTF\s\r\n)|(^text)");
textFragmentAbsorber.TextSearchOptions.IsRegularExpressionUsed = true;
doc.Pages[1].Accept(textFragmentAbsorber);

// Prepare a collection for the selected fragments
TextFragmentCollection textFragmentCollection = new TextFragmentCollection();

// Select the appropriate fragments (TextFragments is 1-based)
for (int i = 1; i <= textFragmentAbsorber.TextFragments.Count; i++)
{
    TextFragment fragment = textFragmentAbsorber.TextFragments[i];
    string text = fragment.Text;
    Debug.WriteLine(text);

    // the whole phrase on a single line
    if (text == "RTF text")
    {
        textFragmentCollection.Add(fragment);
    }

    // the first fragment has no predecessor to pair with
    if (i == 1) continue;

    // "RTF " at the end of one line followed by "text" at the start of the next
    TextFragment prevFragment = textFragmentAbsorber.TextFragments[i - 1];
    if (text == "text" && prevFragment.Text == "RTF ")
    {
        textFragmentCollection.Add(prevFragment);
        textFragmentCollection.Add(fragment);
    }
}

// Loop through the selected fragments and create hyperlinks
foreach (TextFragment textFragment in textFragmentCollection)
{
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.Blue;

    Aspose.Pdf.InteractiveFeatures.Annotations.LinkAnnotation link =
        new Aspose.Pdf.InteractiveFeatures.Annotations.LinkAnnotation(textFragment.Page, textFragment.Rectangle);

    Aspose.Pdf.InteractiveFeatures.Annotations.Border border =
        new Aspose.Pdf.InteractiveFeatures.Annotations.Border(link);
    border.Width = 1;
    link.Border = border;

    link.Destination = new Aspose.Pdf.InteractiveFeatures.XYZExplicitDestination(textFragment.Page, 0, 100, 0);
    textFragment.Page.Annotations.Add(link);
}

doc.Save(outFile);
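The snippet above pairs a trailing "RTF " fragment with a leading "text" fragment on the next line. The same idea can be sketched for an arbitrary phrase in plain C#: treat a line break as a single space and check whether a phrase starts on one line and finishes on the next. This is only an illustration under stated assumptions. SplitPhraseFinder and Find are hypothetical names, not Aspose.Pdf APIs; the input is assumed to be the text of consecutive lines (however you obtain it), and mapping a hit back to rectangles would still require the two fragments involved, as in the snippet above.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch, not an Aspose.Pdf API: "lines" stands in for the text of
// consecutive lines on a page. A line break is treated as a single space.
public static class SplitPhraseFinder
{
    // Returns (line, column) for each occurrence of phrase, where column is the
    // start position within that line. A match may continue onto the next line.
    public static List<(int Line, int Column)> Find(IReadOnlyList<string> lines, string phrase)
    {
        var hits = new List<(int Line, int Column)>();
        if (string.IsNullOrEmpty(phrase)) return hits;

        for (int i = 0; i < lines.Count; i++)
        {
            // Case 1: the whole phrase lies on this line.
            int pos = lines[i].IndexOf(phrase, StringComparison.Ordinal);
            while (pos >= 0)
            {
                hits.Add((i, pos));
                pos = lines[i].IndexOf(phrase, pos + 1, StringComparison.Ordinal);
            }

            if (i + 1 >= lines.Count) continue;

            // Case 2: the phrase starts on this line and ends on the next one.
            string left = lines[i].TrimEnd();
            string right = lines[i + 1].TrimStart();
            string joined = left + " " + right;
            int boundary = left.Length; // index of the space standing in for the line break

            int s = joined.IndexOf(phrase, StringComparison.Ordinal);
            while (s >= 0)
            {
                // Keep only matches that begin before the break and end after it.
                if (s < boundary && s + phrase.Length > boundary + 1)
                    hits.Add((i, s));
                s = joined.IndexOf(phrase, s + 1, StringComparison.Ordinal);
            }
        }
        return hits;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample lines: "RTF text" is split across the first two
        // lines and appears whole on the third.
        string[] lines =
        {
            "Save the document as RTF",
            "text and then reopen it.",
            "An RTF text block on one line.",
        };

        foreach (var (line, column) in SplitPhraseFinder.Find(lines, "RTF text"))
            Console.WriteLine("Line {0}, column {1}", line, column);
    }
}
```

For the sample lines it reports line 0, column 21 (the occurrence split across the line break) and line 2, column 3 (the occurrence on a single line).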