I have been testing out Aspose PDF for use with an internal project that requires parsing some PDFs. The general method is to use a TextFragmentAbsorber to search for a label, then use a TextAbsorber to search around that label for possible values.
For most PDFs this works fine, but for certain PDFs I’ve found through extensive debugging that the Rectangle passed to the TextAbsorber must be rotated 90° clockwise around the center of the PDF page in order to find the same text found with the TextFragmentAbsorber.
For example, if I have a 1700x2200 PDF page and my TextFragmentAbsorber returns a fragment for the text “Name:” with a Rectangle of (LLX: 160, LLY: 1960, URX: 235, URY: 1980), the TextAbsorber search in that same Rectangle will return some text from elsewhere in the page. However, if I pass the TextAbsorber a rotated Rectangle like so, then it works:
private Rectangle rotateRect(int page, Rectangle rect)
{
    // The page width is needed to re-anchor the rotated rectangle on the page.
    Rectangle pageRect = new Document(Filename).Pages[page].Rect;
    return new Rectangle(rect.LLY, pageRect.Width - rect.URX, rect.URY, pageRect.Width - rect.LLX);
}
That is, if I run the TextAbsorber with a search Rectangle of (1960, 1465, 1980, 1540), then it returns "Name:".
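To make the arithmetic easy to check without a license, here is a minimal sketch of the same mapping with no Aspose dependency. The `Rect` type and `RectDemo` class are hypothetical stand-ins I made up for illustration (they are not Aspose types), and the page width of 1700 is hard-coded from the example above.

```csharp
using System;

// Hypothetical stand-in for Aspose's Rectangle; holds only the four corner coordinates.
public readonly record struct Rect(double LLX, double LLY, double URX, double URY);

public static class RectDemo
{
    // Same coordinate mapping as rotateRect, with the page width passed in directly
    // instead of being read from the Document.
    public static Rect Rotate(Rect r, double pageWidth) =>
        new Rect(r.LLY, pageWidth - r.URX, r.URY, pageWidth - r.LLX);

    public static void Main()
    {
        // Rectangle reported for "Name:" on the 1700x2200 page.
        var fragment = new Rect(160, 1960, 235, 1980);
        var rotated = Rotate(fragment, 1700);

        // Prints: (1960, 1465, 1980, 1540)
        Console.WriteLine($"({rotated.LLX}, {rotated.LLY}, {rotated.URX}, {rotated.URY})");
    }
}
```

Each point (x, y) is sent to (y, pageWidth - x), so the lower-left and upper-right corners of the rotated rectangle come from different corners of the original.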
This only happens for certain PDFs, and I have found no way of determining ahead of time when it’s going to happen: the rotation on the Document is always zero. I would give more concrete examples, but my trial license has expired, so I can no longer run the test program. This is basically what has stopped us from purchasing a full license, since it prohibitively complicates things.
Any guesses as to why this is happening?