The Rectangle value of PDF file

65.pdf (610.4 KB)

I want the text in the red box, but what I get is the text underlined in red. How can I set the Rectangle value to extract the whole title part? I tried different values, but I don't know how to set them so that they cover the part of the PDF that I want.
This is the code:

TextAbsorber absorber = new TextAbsorber();
absorber.getTextSearchOptions().setLimitToPageBounds(true);
absorber.getTextSearchOptions().setRectangle(new Rectangle(200, 400, 400, 600));

// accept the absorber for the third page (page indexes start at 1)
doc.getPages().get_Item(3).accept(absorber);

String extractedText = absorber.getText();

// write the extracted text to a file; try-with-resources closes the writer even on error
try (BufferedWriter writer = new BufferedWriter(new FileWriter(new java.io.File("ExtractedText.txt")))) {
    writer.write(extractedText);
}

It extracts the Arabic words correctly, but not from the part that I want.

Or is there another method to get it?

My JDK version is 8.
Thank you.

@layal.khalid1,

Kindly send us your source PDF and highlight the target area with the help of a snapshot. Please also let us know which JDK version you are using. We will investigate and share our findings with you.

Thank you for your reply, I edited the post.

@layal.khalid1,

The Rectangle constructor takes 4 parameters, measured with the bottom-left corner of the page as the origin. Please set the rectangle coordinates as follows:

[Java]

// Parameters:
//   llx:
//     X of lower left corner.
//   lly:
//     Y of lower left corner.
//   urx:
//     X of upper right corner.
//   ury:
//     Y of upper right corner.
absorber.getTextSearchOptions().setRectangle(new Rectangle(130, 600, 470, 800));

Thank you, it works.

But can I ask how you did the calculation?
How do you know the coordinates of the corners and then get the exact location?

@layal.khalid1,

To adjust the horizontal extent, change the 1st (lower left X) and 3rd (upper right X) parameters (130 and 470). To adjust the vertical extent, change the 2nd (lower left Y) and 4th (upper right Y) parameters (600 and 800). You can also retrieve the rectangle coordinates of the whole page by calling the getRect() method of the Page instance, and then work out the region you need from them.

[Java]

Rectangle rectangle = doc.getPages().get_Item(1).getRect();
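
If you measure the target area in a PDF viewer, the position is usually given from the top-left corner of the page, while the Rectangle parameters are measured from the bottom-left corner. The sketch below shows one way to do that conversion in plain Java. The class name RegionCalc, the method fromTopLeft, and the page height of 842 points (A4) are assumptions made for illustration; in practice you would take the real page height from the rectangle returned by getRect().

```java
import java.util.Arrays;

public class RegionCalc {

    /**
     * Converts a box measured from the top-left corner of the page
     * (left, top, width, height, all in points) into the four values
     * {llx, lly, urx, ury} measured from the bottom-left corner.
     */
    public static double[] fromTopLeft(double pageHeight, double left, double top,
                                       double width, double height) {
        double llx = left;                       // X of lower left corner
        double lly = pageHeight - top - height;  // Y of lower left corner
        double urx = left + width;               // X of upper right corner
        double ury = pageHeight - top;           // Y of upper right corner
        return new double[] {llx, lly, urx, ury};
    }

    public static void main(String[] args) {
        double pageHeight = 842; // assumption: A4 page height in points
        // A box 130 pt from the left edge, 42 pt below the top edge,
        // 340 pt wide and 200 pt high.
        double[] r = fromTopLeft(pageHeight, 130, 42, 340, 200);
        System.out.println(Arrays.toString(r)); // prints [130.0, 600.0, 470.0, 800.0]
    }
}
```

The four resulting values can then be passed, in that order, to new Rectangle(llx, lly, urx, ury).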

Thank you, I get it now.
And I can get more than one line, right?

But the code doesn't extract the line all the way to the end.
Is it because of the temp version?

Thank you

@layal.khalid1

Thanks for contacting support.

If you are not able to process more than 4 pages of a PDF document through the API, that is a limitation of the trial version. Please note that, while using the trial version of the API, you can only process 4 elements of any collection (e.g. Pages, Annotations, Attachments, etc.). Please consider applying for and using a temporary license in order to have full access to the API features. If you still face any issue, feel free to contact us.