Free Support Forum - aspose.com

Extracting Specific Text From Word Doc

Hi,

We want to use Aspose.Words to extract specific text from some Word documents. The documents are formatted as follows (also see the attached file):

----------------------------------

First Name: Suzanne
Last Name: Test

Home Phone: 905-123-4567
Work Phone: 905-234-5678
Other Phone: 905-333-2222
Email: abcd@test.ca
Comments: Oct/07: Comment line1.
Aug 10/07: Comment line2.
Aug 12/03: Comment line3

Some more text

----------------------

Is there any way to search for First Name: and retrieve the text "Suzanne", search for Comments: and retrieve the three lines of text, and so on?

Thanks for your help.

Dave

Hi

Thank you for your interest in Aspose.Words. I think you can achieve this using regular expressions and ReplaceEvaluator. For example, the following code extracts the first name.

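    using System.Text.RegularExpressions;
    using Aspose.Words;

    public void TestReplaceEvaluator_109307()
    {
        // Open the document
        Document doc = new Document(@"458_109307_queuesystems\in.doc");

        // Match "First Name:" and capture the rest of the paragraph
        // in the named group "value"
        Regex regex = new Regex(@"First Name:(?<value>.*?)\r");

        // Find the string; the evaluator is called for each match
        doc.Range.Replace(regex, new ReplaceEvaluator(ReplaceAction_109307), true);
    }

    static ReplaceAction ReplaceAction_109307(object sender, ReplaceEvaluatorArgs e)
    {
        // Get the first name from the matched text
        string firstName = e.Match.Groups["value"].Value;

        // Skip the replacement so the document is left unchanged
        return ReplaceAction.Skip;
    }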

You can use the following regex to extract the comments.

    Regex regex = new Regex(@"Comments:(?<value>.*?)\f");

Here "\r" is the paragraph break character and "\f" is the page break character.
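To see how these patterns behave without opening a document, here is a minimal sketch that runs them against a plain string. It assumes the text being searched uses "\r" for paragraph breaks and "\f" for the page break, as described above. FieldExtractor, Extract and Sample are hypothetical names used only for illustration; they are not part of Aspose.Words.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldExtractor
{
    // Hypothetical sample text: "\r" ends each paragraph, "\f" marks a page break.
    public const string Sample =
        "First Name: Suzanne\rLast Name: Test\r" +
        "Comments: Oct/07: Comment line1.\rAug 10/07: Comment line2.\f";

    // Returns the trimmed contents of the named group "value",
    // or null when the pattern does not match.
    public static string Extract(string text, string pattern)
    {
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Groups["value"].Value.Trim() : null;
    }

    public static void Main()
    {
        // The lazy ".*?" stops at the first paragraph break after the label.
        Console.WriteLine(Extract(Sample, @"First Name:(?<value>.*?)\r"));

        // The lazy ".*?" runs across paragraph breaks until the first page break.
        string comments = Extract(Sample, @"Comments:(?<value>.*?)\f");
        Console.WriteLine(comments.Replace("\r", Environment.NewLine));
    }
}
```

The captured value starts with the space that follows the label, which is why the sketch trims it. Note that the Comments: pattern captures everything up to the first page break, so any text between the comments and that break is included as well.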

I hope that this will help you.

Best regards

Thanks for the quick response!

Your code works great!

Hi Alexey,

Do you know how I can set up the regex to match from Comments: to the end of the document? Is there a special end-of-document character?

    Regex regex = new Regex(@"Comments:(?<value>.*?)\END?");

Thanks

Dave

Hi

I think that you can try using the following Regex.

    Regex regex = new Regex(@"Comments:(?<value>.*)");
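As a rough illustration of why no end-of-document character is needed here: in .NET, "." matches every character except "\n", so a greedy ".*" keeps going across "\r" paragraph breaks until the input ends. The sketch below shows this on a plain string. It assumes the searched text contains no "\n"; if it might, RegexOptions.Singleline makes "." match that too. GreedyCaptureDemo and CommentsToEnd are hypothetical names, not Aspose.Words API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyCaptureDemo
{
    // Captures everything after "Comments:" up to the end of the input,
    // provided the text contains no "\n" (which "." does not match by default).
    public static string CommentsToEnd(string text)
    {
        Match match = Regex.Match(text, @"Comments:(?<value>.*)");
        return match.Success ? match.Groups["value"].Value.Trim() : null;
    }

    public static void Main()
    {
        // Hypothetical sample text with "\r" as the paragraph break.
        string text =
            "Email: abcd@test.ca\r" +
            "Comments: Oct/07: Comment line1.\rAug 12/03: Comment line3\r";

        // Show each captured paragraph on its own line.
        Console.WriteLine(CommentsToEnd(text).Replace("\r", Environment.NewLine));
    }
}
```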

Hope this helps.

Best regards.