Hi,
We want to use Aspose.Words to extract specific text from some Word documents. The documents are formatted as follows (see also the attached file):
First Name: Suzanne
Last Name: Test
Home Phone: 905-123-4567
Work Phone: 905-234-5678
Other Phone: 905-333-2222
Email: abcd@test.ca
Comments: Oct/07: Comment line1.
Aug 10/07: Comment line2.
Aug 12/03: Comment line3
Some more text
Is there any way to search for First Name: and retrieve the text “Suzanne”, search for Comments: and retrieve the 3 lines of text, and so on?
Thanks for your help.
Dave
Hi
Thank you for your interest in Aspose.Words. I think you can achieve this using regular expressions and a ReplaceEvaluator. For example, the following code extracts the first name.
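using System.Text.RegularExpressions;
using Aspose.Words;

public void TestReplaceEvaluator_109307()
{
    // Open the document
    Document doc = new Document(@"458_109307_queuesystems\in.doc");
    // Match the text between "First Name:" and the paragraph break;
    // the named group "value" captures that text
    Regex regex = new Regex(@"First Name:(?<value>.*?)\r");
    // Run the search; the evaluator is called for each match
    doc.Range.Replace(regex, new ReplaceEvaluator(ReplaceAction_109307), true);
}

static ReplaceAction ReplaceAction_109307(object sender, ReplaceEvaluatorArgs e)
{
    // Get the first name from the match (trimmed, because a space follows "First Name:")
    string firstName = e.Match.Groups["value"].Value.Trim();
    // Skip the replacement so the document text is left unchanged
    return ReplaceAction.Skip;
}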
You can use the following regex to extract the comments.
Regex regex = new Regex(@"Comments:(?<value>.*?)\f");
Here “\r” is the paragraph break character and “\f” is the page break character.
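To try both patterns without opening a Word document, here is a minimal sketch that uses only System.Text.RegularExpressions. The class name RegexSketch, the helper Extract, and the sample string are assumptions made for illustration: the string joins paragraphs with "\r" and ends with "\f" to mimic the break characters described above. It is not taken from the attached file.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // Hypothetical sample text: "\r" between paragraphs, "\f" at the end.
    public const string Sample =
        "First Name: Suzanne\rLast Name: Test\r" +
        "Comments: Oct/07: Comment line1.\rAug 10/07: Comment line2.\r" +
        "Aug 12/03: Comment line3\rSome more text\f";

    // Returns the trimmed "value" group of the first match,
    // or an empty string when the pattern does not match.
    public static string Extract(string text, string pattern)
    {
        Match match = Regex.Match(text, pattern);
        return match.Success ? match.Groups["value"].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        // Lazy ".*?" stops at the first paragraph break after "First Name:".
        Console.WriteLine(Extract(Sample, @"First Name:(?<value>.*?)\r"));

        // "." matches "\r" in .NET (only "\n" is excluded by default),
        // so this capture spans several paragraphs up to the page break.
        string comments = Extract(Sample, @"Comments:(?<value>.*?)\f");
        foreach (string line in comments.Split('\r'))
        {
            Console.WriteLine(line);
        }
    }
}
```

Note that the Comments pattern captures everything up to the page break, so any text that follows the comments on the same page ("Some more text" in this sample) is included as well.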
I hope that this will help you.
Best regards
Thanks for the quick response!
Your code works great!
Hi Alexey,
Do you know how I can set up the regex to match from Comments: to the end of the document? Is there a special character for the end of the document?
Regex regex = new Regex(@"Comments:(?<value>.*?)\END?");
Thanks
Dave
Hi
I think that you can try using the following Regex.
Regex regex = new Regex(@"Comments:(?<value>.*)");
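The greedy pattern works because "." in .NET matches every character except "\n", so ".*" runs across "\r" paragraph breaks to the end of the text. A minimal sketch on a plain string follows. The class name CommentsToEndSketch, the helper CommentsToEnd, and the sample text are illustrative assumptions. RegexOptions.Singleline is an extra safeguard added here (not part of the suggestion above) in case the text also contains "\n".

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentsToEndSketch
{
    // Returns everything after "Comments:" up to the end of the text,
    // with surrounding whitespace (including a trailing "\r" or "\f") trimmed.
    public static string CommentsToEnd(string text)
    {
        // Singleline lets "." match "\n" too; without it ".*" would stop at a line feed.
        Match match = Regex.Match(text, @"Comments:(?<value>.*)", RegexOptions.Singleline);
        return match.Success ? match.Groups["value"].Value.Trim() : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical text: "\r" between paragraphs, "\f" at the end.
        string sample = "Email: abcd@test.ca\rComments: Comment line1.\rComment line2.\f";
        foreach (string line in CommentsToEnd(sample).Split('\r'))
        {
            Console.WriteLine(line);
        }
    }
}
```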
Hope this helps.
Best regards.