Extract text according to coordinates

Hi

Does Aspose.Pdf support extracting text according to coordinates?
I want to extract the text in a specific region of the PDF by setting (x, y, width, height). Is there any solution for this?

Hi,

Thanks for contacting support.

I’m afraid the requested feature is currently not supported. However, we have logged this requirement in our issue tracking system in the New Features list as PDFNEWNET-35107. We will investigate it in detail and keep you updated on its status. We apologize for the inconvenience.

As a workaround, you may consider cropping the page to the region of interest and then extracting text from that region.

[C#]

using Aspose.Pdf;

// open document
Document document = new Document(@"source.pdf");
Aspose.Pdf.Rectangle cropBox = document.Pages[1].CropBox;

// update the first page's crop box: move each edge 140 points inward
// (Rectangle takes lower-left x, lower-left y, upper-right x, upper-right y)
document.Pages[1].CropBox = new Aspose.Pdf.Rectangle(cropBox.LLX + 140, cropBox.LLY + 140, cropBox.URX - 140, cropBox.URY - 140);

// save output document
document.Save(@"Cropped.pdf");

For further details, please see the documentation article on Update Page Dimensions.
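The crop box update above moves each edge of the page 140 points inward (PDF units are points, 1/72 inch). The sketch below reproduces that arithmetic in plain C# so it runs without Aspose.Pdf. The `Box` record and the `CropMath.Inset` helper are hypothetical stand-ins for illustration, not part of the Aspose API. The helper also rejects a margin so large that the page would collapse to an empty region.

```csharp
using System;

// Hypothetical stand-in for Aspose.Pdf.Rectangle: lower-left (Llx, Lly) and
// upper-right (Urx, Ury) corners, in points.
public record Box(double Llx, double Lly, double Urx, double Ury)
{
    public double Width => Urx - Llx;
    public double Height => Ury - Lly;
}

public static class CropMath
{
    // Shrinks a box by the same margin on all four sides, mirroring the
    // CropBox update in the workaround (LLX + m, LLY + m, URX - m, URY - m).
    public static Box Inset(Box box, double margin)
    {
        var result = new Box(box.Llx + margin, box.Lly + margin, box.Urx - margin, box.Ury - margin);
        if (result.Width <= 0 || result.Height <= 0)
            throw new ArgumentException("Margin is too large for this page.", nameof(margin));
        return result;
    }

    public static void Main()
    {
        // US Letter page: 612 x 792 points.
        var letter = new Box(0, 0, 612, 792);
        var cropped = Inset(letter, 140);

        // prints "140, 140, 472, 652 (332 x 512 pt)"
        Console.WriteLine($"{cropped.Llx}, {cropped.Lly}, {cropped.Urx}, {cropped.Ury} ({cropped.Width} x {cropped.Height} pt)");
    }
}
```

With a 140-point margin, a Letter page keeps only its central 332 x 512 point area. To crop to an arbitrary region instead of a symmetric margin, the crop box would be set to that region's corners directly.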

Hi,

Thanks for your patience.

The feature to extract text from a particular page region, reported earlier as PDFNEWNET-35107, has been implemented. Please try the following code snippet with the latest release, Aspose.Pdf for .NET 8.1.0. If you encounter any issue or have any further query, please feel free to contact us.

[C#]

using Aspose.Pdf;
using Aspose.Pdf.Text;

// open document
Document doc = new Document("input.pdf");

// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();

// restrict extraction to the rectangle below, given as
// (lower-left x, lower-left y, upper-right x, upper-right y) in points
absorber.TextSearchOptions.LimitToPageBounds = true;
absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(200, 200, 450, 350);

// accept the absorber for the first page
doc.Pages[1].Accept(absorber);

// get the extracted text
string extractedText = absorber.Text;
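Note that `Aspose.Pdf.Rectangle` takes the lower-left and upper-right corners, not (x, y, width, height), so the rectangle above covers a 250 x 150 point area. PDF coordinates also start at the bottom-left of the page. If your region is described as x, y, width, height measured from the top-left corner (as in most image viewers), it has to be converted first. Below is a minimal sketch, assuming an unrotated page whose left edge and top edge are known (for example from the page's crop box). `RegionMath.RegionToCorners` is a hypothetical helper, not part of the Aspose API.

```csharp
using System;

public static class RegionMath
{
    // Converts a region given as (x, y, width, height), measured in points from
    // the TOP-LEFT corner of the page, into PDF corner coordinates
    // (llx, lly, urx, ury), which are measured from the BOTTOM-LEFT corner.
    // Assumes an unrotated page whose left edge is pageLlx and top edge is pageUry.
    public static (double Llx, double Lly, double Urx, double Ury) RegionToCorners(
        double x, double y, double width, double height,
        double pageLlx, double pageUry)
    {
        if (width <= 0 || height <= 0)
            throw new ArgumentException("Width and height must be positive.");

        double llx = pageLlx + x;
        double urx = llx + width;
        double ury = pageUry - y;   // flip the vertical axis
        double lly = ury - height;
        return (llx, lly, urx, ury);
    }

    public static void Main()
    {
        // A 250 x 150 pt region whose top-left corner is 200 pt from the left
        // edge and 442 pt from the top of a US Letter page (612 x 792 pt).
        var (llx, lly, urx, ury) = RegionToCorners(200, 442, 250, 150, pageLlx: 0, pageUry: 792);

        // prints "200, 200, 450, 350", the same rectangle as in the snippet above
        Console.WriteLine($"{llx}, {lly}, {urx}, {ury}");
    }
}
```

The four resulting values can then be passed to `new Aspose.Pdf.Rectangle(llx, lly, urx, ury)` and assigned to `absorber.TextSearchOptions.Rectangle`, as in the snippet above.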

The issues you have found earlier (filed as PDFNEWNET-35107) have been fixed in Aspose.Pdf for .NET 8.2.0.


This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

Hey, yeah, like he said, it's not really working so well. I’ve been using http://pdftoword.pro/ to carry this out, but it will not always work for every PDF…

Apart from that, I've been trying to figure out a way to do it with Aspose, but no luck so far.

Hi Sam,

Thanks for your inquiry. The feature in question has been implemented and is working fine. Could you please share your sample code and the problematic PDF document here, so that we can investigate and provide you with more information?

We are sorry for the inconvenience caused.

Best Regards,