Extracting Text based on sections

donhoonre · June 4, 2013, 1:23pm

Hi

I am using Aspose.Pdf; it's a very good component. Currently I am using it for searching and replacing text, but I need to extract particular text based on sections. For example:

I have a header title that is generic in input.pdf. I want to extract this generic header and replace it in another PDF document (input2.pdf).

Also, can we extract a part of input.pdf, name it Sectionheader, and use this Sectionheader to replace the text in input2.pdf?

Thanks

codewarior · June 10, 2013, 10:33am

Hi Don,

Thanks for contacting support and sorry for the delayed response.

Aspose.Pdf for .NET can extract text from a particular page region, and you can then use that text to replace content in another file. Please try the following code snippet to fulfill this requirement.

C#

using Aspose.Pdf;
using Aspose.Pdf.Text;

// open the source document
Document doc = new Document("c:/pdftest/HTMLConversion.pdf");

// create a TextAbsorber and restrict it to a rectangular page region
// (lower-left x, lower-left y, upper-right x, upper-right y, in points)
TextAbsorber absorber = new TextAbsorber();
absorber.TextSearchOptions.LimitToPageBounds = true;
absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(200, 200, 450, 350);

// run the absorber on the first page (page indexes start at 1)
doc.Pages[1].Accept(absorber);

// get the text extracted from that region
string extractedText = absorber.Text;
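
For the second half of the request (reusing the extracted header in input2.pdf), a rough sketch could look like the one below. It assumes the header sits in the same rectangle as above and that the text to be replaced in input2.pdf is known in advance. The search string "OLD HEADER TEXT", the output file name, and the class name are placeholders for illustration, not values from your documents.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ReplaceHeaderSketch
{
    static void Main()
    {
        // Extract the header from a region of the first page of input.pdf.
        // The rectangle is a placeholder; adjust it to the actual header area.
        Document source = new Document("input.pdf");
        TextAbsorber absorber = new TextAbsorber();
        absorber.TextSearchOptions.LimitToPageBounds = true;
        absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(200, 200, 450, 350);
        source.Pages[1].Accept(absorber);
        string sectionHeader = absorber.Text.Trim();

        // Find the text to be replaced in input2.pdf.
        // "OLD HEADER TEXT" is a hypothetical search phrase.
        Document target = new Document("input2.pdf");
        TextFragmentAbsorber finder = new TextFragmentAbsorber("OLD HEADER TEXT");
        target.Pages.Accept(finder);

        // Overwrite every match with the extracted header.
        foreach (TextFragment fragment in finder.TextFragments)
        {
            fragment.Text = sectionHeader;
        }

        // Save under a new name so input2.pdf stays untouched.
        target.Save("output.pdf");
    }
}
```

If the header can appear on pages other than the first, the same absorber can be run on each page in turn, with the rectangle adjusted per page as needed.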