Extract content from a Word 2003 document and import it into a SharePoint document library

My company needs a product that can accept Word 2003 documents, extract the needed data from each document, and use it to populate columns in SharePoint.

We receive hundreds of job requisition emails from our customer every day.

The customer created the format, so we cannot alter how we receive the data. The format of the requisitions is standard, so that helps.

Each field has a header followed by a colon, and the fields always appear in the same order.

For example…

Location: Houston, TX
Rate: $50/hr
Description: blah blah blah
Required Skills: blah blah blah

We need a way to extract the information from these Word documents and populate columns in SharePoint.

Can Aspose.Words help us do this?

Hi James,

Thanks for your query, and please accept my apologies for the late response. Could you please explain the following statement? What do you mean by “populate columns in SharePoint”?

Extract the needed data from that document and use it to populate columns in SharePoint.
I have tried to understand your problem statement, and based on my understanding, the requested feature is not supported in Aspose.Words for SharePoint. Please see the following documentation links for reference:

http://docs.aspose.com/display/wordssharepoint/Features
http://docs.aspose.com/display/wordssharepoint/Introducing+Aspose.Words+for+SharePoint

I answered part of my own question: this has to be done with Aspose.Words for .NET.

I am trying to take that data and load it into custom columns in a SharePoint document library.

I have just about figured it out by using the following code:

using Aspose.Words;

// Load the document just uploaded to SharePoint (varDoc holds its file path or Stream)
Document doc = new Document(varDoc);
// Range.Text returns the document text, with '\r' between paragraphs
string text = doc.Range.Text;

Then I parse the text string using Substring calls.
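A minimal sketch of the IndexOf/Substring approach, assuming the text comes from `doc.Range.Text` (which separates paragraphs with `\r`) and that the headers always appear in a fixed order, so each value runs from its header to the next header. The class and method names (`RequisitionText`, `GetBetween`) are hypothetical, and only standard `System.String` methods are used.

```csharp
using System;

public static class RequisitionText
{
    // Hypothetical helper: returns the text between startHeader and endHeader,
    // trimmed of spaces and paragraph breaks. Pass null for endHeader to read
    // to the end of the text. Returns null if startHeader is not found.
    public static string GetBetween(string text, string startHeader, string endHeader)
    {
        int start = text.IndexOf(startHeader, StringComparison.Ordinal);
        if (start < 0)
            return null;
        start += startHeader.Length;

        // Default to the end of the text if there is no next header.
        int end = text.Length;
        if (endHeader != null)
        {
            int next = text.IndexOf(endHeader, start, StringComparison.Ordinal);
            if (next >= 0)
                end = next;
        }

        // Trim() removes spaces as well as '\r', '\n' and '\f'.
        return text.Substring(start, end - start).Trim();
    }
}
```

For example, `RequisitionText.GetBetween(text, "Location:", "Description:")` would return `Houston` for the sample document further down this thread. Because everything up to the next header is taken, a multi-paragraph value such as Description keeps its inner `\r` breaks. The catch is that IndexOf matches the first occurrence, so a header word that also appears inside body text can throw the boundaries off.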

The problem now is that when I load Word 2003 documents, the top half of the content is cut off. This is really frustrating, because Word 2007/2010 .docx files load just fine.

For example, suppose my document contains the content below.

If I set a breakpoint and inspect the value of “text” for a Word 2003 document, it contains only the content starting at “Description”. If I save the same document as .docx, everything is loaded into the variable.

Job Number: 4556-456

Status: Open

Resource Type: SE - Systems Analyst

Part/Full Time: Part Time

Skill Level: Level 3

Priority: High

Location: Houston

Description:

blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah

Mandatory Skills:

blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah

Hi James,

I regret to say that the requested feature is not available in Aspose.Words for SharePoint. Please review the feature list of Aspose.Words for SharePoint.

We apologize for the inconvenience.

I know it isn’t available in Aspose.Words for SharePoint.

It is, however, available in Aspose.Words for .NET.

I can do the SharePoint column updates without your product.

I do, however, need your product to work correctly when opening and parsing Word 2003 documents.

This is what I need help with. Can you help me?

Please use the following code snippet for your requirements. I also recommend reading our documentation; the following links may be useful:

Product Overview
Find and Replace
Aspose.Words Document Object Model
Working with Document
Working with Bookmarks
Working with Ranges

I figured it out on my own. Parsing it correctly takes a lot more code with IndexOf and Substring calls.

Your code doesn’t work for my documents because some fields at the bottom (e.g. Description and Skills) contain more than one paragraph, and your code skips that content.
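For fields such as Description and Mandatory Skills that span several paragraphs, one option is to walk the text paragraph by paragraph: a paragraph that begins with a known header starts a new field, and any other non-empty paragraph is appended to the field currently being read. This is a sketch, not the code used in this thread. It assumes the `\r` paragraph separator that `doc.Range.Text` produces, and the header list (taken from the sample document above) and the `RequisitionParser`/`Parse` names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class RequisitionParser
{
    // Hypothetical header list copied from the sample requisition;
    // it must match the customer's real format.
    static readonly string[] Headers =
    {
        "Job Number", "Status", "Resource Type", "Part/Full Time",
        "Skill Level", "Priority", "Location", "Description", "Mandatory Skills"
    };

    // Splits Range.Text-style text into header -> value pairs.
    // Continuation paragraphs are joined to their field with "\n".
    public static Dictionary<string, string> Parse(string text)
    {
        var fields = new Dictionary<string, string>();
        string current = null;

        foreach (string raw in text.Split('\r', '\n', '\f'))
        {
            string para = raw.Trim();
            if (para.Length == 0)
                continue; // skip blank paragraphs between fields

            string header = FindHeader(para);
            if (header != null)
            {
                // A header paragraph starts a new field; keep any text after the colon.
                current = header;
                fields[current] = para.Substring(header.Length + 1).Trim();
            }
            else if (current != null)
            {
                // Not a header: this paragraph belongs to the field being read.
                string existing = fields[current];
                fields[current] = existing.Length == 0 ? para : existing + "\n" + para;
            }
        }
        return fields;
    }

    // Returns the matching header if the paragraph starts with "Header:", else null.
    static string FindHeader(string para)
    {
        foreach (string h in Headers)
        {
            if (para.StartsWith(h + ":", StringComparison.OrdinalIgnoreCase))
                return h;
        }
        return null;
    }
}
```

Usage would look like `var fields = RequisitionParser.Parse(doc.Range.Text);` followed by reading `fields["Description"]` for the SharePoint column update. Because any paragraph that does not start with a listed header is treated as part of the previous field, the list must include every header the customer uses; otherwise a line such as "Rate: $50/hr" would be appended to the preceding field.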

Hi James,

It is nice to hear that you have solved this issue. We always appreciate feedback from our customers.