TextFragment Rectangle must be rotated 90° when used as TextAbsorber Rectangle

I have been testing Aspose.PDF for an internal project that requires parsing some PDFs. The general approach is to use a TextFragmentAbsorber to find a label, then use a TextAbsorber to search the area around that label for possible values.


For most PDFs this works fine, but for certain PDFs I've found, through extensive debugging, that the Rectangle passed to the TextAbsorber must be rotated 90° clockwise relative to the page in order to find the same text that the TextFragmentAbsorber found.

For example, suppose I have a 1700×2200 PDF page and my TextFragmentAbsorber returns a fragment for the text “Name:” with a Rectangle of (LLX: 160, LLY: 1960, URX: 235, URY: 1980). A TextAbsorber search in that same Rectangle returns text from elsewhere on the page. However, if I pass the TextAbsorber a rotated Rectangle like this, it works:

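private Rectangle rotateRect(int page, Rectangle rect)
{
    // Read the page size (the document is opened and disposed on every call)
    using (Document doc = new Document(Filename))
    {
        Rectangle pageRect = doc.Pages[page].Rect;

        // Map each point (x, y) to (y, pageWidth - x): a 90° clockwise turn
        // followed by a shift of pageWidth so the result lands back on the page
        return new Rectangle(rect.LLY, pageRect.Width - rect.URX, rect.URY, pageRect.Width - rect.LLX);
    }
}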

That is, if I run the TextAbsorber with a search Rectangle of (1960, 1465, 1980, 1540), it returns "Name:".
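To make the arithmetic behind `rotateRect` easy to check without the library, here is a minimal sketch that replaces `Aspose.Pdf.Rectangle` with a plain `(LLX, LLY, URX, URY)` tuple. The tuple and the `RotateRect` name are assumptions for illustration only; the real helper takes a page number and reads the width from `Pages[page].Rect`. It reproduces the numbers above: the "Name:" rectangle on a 1700-wide page maps to (1960, 1465, 1980, 1540).

```csharp
using System;

// Hypothetical stand-in for Aspose.Pdf.Rectangle: a (LLX, LLY, URX, URY) tuple,
// so the transform can be checked without referencing Aspose.PDF.
static (double LLX, double LLY, double URX, double URY) RotateRect(
    (double LLX, double LLY, double URX, double URY) rect, double pageWidth)
{
    // A point (x, y) maps to (y, pageWidth - x): a 90° clockwise turn about the
    // origin, then a shift by pageWidth so the coordinates stay non-negative.
    // Because x is negated, the old URX produces the new LLY and the old LLX the new URY.
    return (rect.LLY, pageWidth - rect.URX, rect.URY, pageWidth - rect.LLX);
}

// The "Name:" fragment rectangle from the example above, on a 1700x2200 page.
var nameLabel = (LLX: 160.0, LLY: 1960.0, URX: 235.0, URY: 1980.0);
var rotated = RotateRect(nameLabel, pageWidth: 1700);

Console.WriteLine(rotated); // (1960, 1465, 1980, 1540)
```

Note that this is a turn about the page origin plus a shift, not a turn about the page centre; the two only coincide for square pages. The later sample in this thread needed an extra offset of 10 on the page width for one file, which this sketch does not model.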

This only happens for certain PDFs, and I have found no way to predict ahead of time when it will happen: the rotation reported for the Document is always zero. I would give more examples, but my trial license has expired, so I can no longer run the test program. This issue is essentially what has stopped us from purchasing a full license, since it complicates things prohibitively.

Any guesses as to why this is happening?

Hi Jonathan,

Thanks for using our APIs.

Could you please share the input PDF files that cause this problem, along with sample code or a sample project? That will help us replicate the issue in our environment.

Regarding the trial license expiry, you may request another temporary license to continue your evaluation. For more information, please visit Get a temporary license.

We just went ahead and purchased a full license because our workaround seems to be working, but I would still like to resolve the issue.


Here's sample code that demonstrates the problem; I will send you an example PDF by private message.

using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

static void sampleRotationProblem()
{
    Document doc = new Document("sampleRotationProblem.pdf");

    // use TextFragmentAbsorber to find fragments with "Customer Name:"
    TextFragmentAbsorber tfa = new TextFragmentAbsorber();
    tfa.Phrase = "Customer Name:";
    tfa.Visit(doc.Pages[1]);

    // get rectangle from first fragment (Aspose collections are 1-based)
    Rectangle tfaRect = tfa.TextFragments[1].Rectangle;

    // use TextAbsorber to search in that same Rectangle
    TextAbsorber textAbsorber = new TextAbsorber();
    textAbsorber.TextSearchOptions.LimitToPageBounds = false;
    textAbsorber.TextSearchOptions.Rectangle = tfaRect;
    textAbsorber.Visit(doc.Pages[1]);

    // write out found text
    Console.WriteLine(textAbsorber.Text.Trim()); // Output = text from elsewhere on the page

    // now rotate the Rectangle 90°
    Rectangle pageRect = doc.Pages[1].Rect;
    double widthAdjustment = pageRect.Width + 10; // needs a slight adjustment
    Rectangle rotatedRect = new Rectangle(tfaRect.LLY, widthAdjustment - tfaRect.URX, tfaRect.URY, widthAdjustment - tfaRect.LLX);

    // use TextAbsorber to search in the rotated Rectangle
    textAbsorber.TextSearchOptions.Rectangle = rotatedRect;
    textAbsorber.Visit(doc.Pages[1]);

    // write out found text
    Console.WriteLine(textAbsorber.Text.Trim()); // Output = "Customer Name:"
}

Hi Jonathan,


Thanks for sharing the code snippet and input document.

I have tested the scenario and was able to reproduce the same problem. I have logged it for correction as PDFNET-41517 in our issue tracking system. We will look further into the details and keep you posted on the status of the fix. Please be patient and give us a little time. We are sorry for the inconvenience.

The issue you found earlier (filed as PDFNET-41517) has been fixed in Aspose.PDF for .NET 20.4.