
Free Support Forum - aspose.com

Extracting text with a rectangle: the result has lots of line breaks

I want to copy the selected text from a PDF.
First, I choose a rectangular area and use its coordinates to initialize TextAbsorber.TextSearchOptions.Rectangle. Then I make the selected page accept this textAbsorber.
However, textAbsorber.Text ends with lots of line breaks, and sometimes extra line breaks also appear between lines.
Here is the sample code:

     var absorber = new TextAbsorber();
     absorber.TextSearchOptions.LimitToPageBounds = true;
     var ltPoint = CalculateSelectedAreaOnPage(SelectRectLeftTopPoint);
     var rbPoint = CalculateSelectedAreaOnPage(SelectRectRightBottomPoint);
     absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(ltPoint.X, ltPoint.Y, rbPoint.X, rbPoint.Y);
     // selectedPage: the Aspose.Pdf.Page the user selected (how it is obtained is not shown in the post)
     selectedPage.Accept(absorber);
     string extractedText = absorber.Text;

Please ignore the CalculateSelectedAreaOnPage() function; it just converts the selection points to page coordinates.
Please help me check the extracted text result. Thanks.


Could you please share your sample source PDF along with the expected output text? We will test the scenario in our environment and address it accordingly.

Thanks for your reply.
Here.pdf (95.8 KB)
The rectangle I select is shown in SelectedRect.png (55.5 KB),
and the extracted text result is shown in extractedTextResult.png (12.2 KB).
As you can see, I selected all the text, and the result contains extra line breaks.
Sometimes line breaks also appear in the spacing between lines; please check it.
If you have any problems, please let me know.


We need the values of ltPoint and rbPoint used in your code above. This way we will be able to test the scenario in our environment and share our feedback with you.

the ltPoint.X = 71.250475;
the ltPoint.Y = 485.250638;
the rbPoint.X = 543.749519;
the rbPoint.Y = 425.250763;


Please try the following code snippet, which switches the text formatting mode to Raw, to extract the text without the extra line breaks:

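Document doc = new Document(dataDir + "extractTextTextPDF.pdf");
TextAbsorber ta = new TextAbsorber();
// Restrict extraction to the selected area (the coordinates reported above)
ta.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(71.250475, 485.250638, 543.749519, 425.250763);
// Raw mode extracts the text without the layout-based formatting of the default mode
ta.ExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw);
// The absorber must visit a page before Text is filled; this assumes the selection is on page 1
doc.Pages[1].Accept(ta);
string text = ta.Text;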
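If some blank lines still remain after switching to Raw mode, the extracted string can also be cleaned up after extraction. The following is a minimal sketch, not part of Aspose.PDF: `ExtractedTextCleanup.RemoveBlankLines` is a hypothetical helper that assumes blank lines in the selection carry no meaning for your use case. It normalizes line endings, trims trailing whitespace from each line, and drops lines that end up empty.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an Aspose.PDF API): post-processes the string returned by TextAbsorber.Text.
public static class ExtractedTextCleanup
{
    // Removes blank lines and trailing line breaks; joins the remaining lines with `newline`.
    public static string RemoveBlankLines(string text, string newline = "\n")
    {
        if (string.IsNullOrEmpty(text))
            return string.Empty;

        // Normalize Windows ("\r\n") and old Mac ("\r") line endings to "\n".
        string normalized = text.Replace("\r\n", "\n").Replace('\r', '\n');

        var kept = new List<string>();
        foreach (string line in normalized.Split('\n'))
        {
            // Drop trailing spaces/tabs so whitespace-only lines count as blank.
            string trimmed = line.TrimEnd();
            if (trimmed.Length > 0)
                kept.Add(trimmed);
        }

        return string.Join(newline, kept);
    }
}
```

For example, `ExtractedTextCleanup.RemoveBlankLines(ta.Text, Environment.NewLine)` would return the selected text with each non-empty line separated by a single line break. Note that this also removes intentional paragraph breaks, so it is only a workaround when the layout of the selection does not matter.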