Pulling information from Word documents

I want to pull information from Word documents and save it to an Excel file. How can I achieve this?

Hi Katherine,

Thanks for your inquiry. Could you please share some more details about your query? What kind of information do you want to extract from the Word documents and insert into the Excel document? We will then provide more information about your query along with code.

In your case, I suggest you check the attached utility, which converts a Word document to an Excel document. It extracts the contents of the Word document (e.g. tables, text, and images) and inserts them into the Excel document. Hope this helps you.

Please use the following code snippet to convert a DOC/DOCX file to an Excel file.

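// ConverterDoc2Xls is the class provided by the attached utility; it is not part of Aspose.Words or Aspose.Cells.
// Load the Word document with Aspose.Words.
Document doc = new Document("in.doc");
// Convert its contents into an Aspose.Cells workbook.
ConverterDoc2Xls converter = new ConverterDoc2Xls();
Workbook wb = converter.Convert(doc);
// Save the result as an Excel file.
wb.Save("out.xls");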

There won’t be any images, just text. I have multiple Word documents, and I need to put each one on a different row of a single Excel file. Would I do this the same way?

Hi Katherine,

Thanks for your inquiry. Yes, you can extract text from a Word document using Aspose.Words and insert it into an Excel document using Aspose.Cells.

Please share the following details here for our reference. We will then provide more information about your query along with code.

  • Please attach your input Word documents.
  • Please attach your expected output Excel document so we can investigate how you expect the final document to look.

Please refer to the following documentation links:
https://docs.aspose.com/words/net/how-to-extract-selected-content-between-nodes-in-a-document/
https://docs.aspose.com/cells/net/view-and-edit-excel-data/

Attached are some examples. I want to preserve the merge fields and other types of fields. I am not sure how to accomplish this, so any guidance in the right direction would be helpful.

Hi Katherine,

Thanks for sharing the documents. The following code example replaces each field with its field code (wrapped in braces) and inserts the document’s text into an Excel file. Hope this helps you. Please let us know if you have any more queries.

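using Aspose.Words;
using Aspose.Words.Fields;

// Folder containing the input document (adjust to your environment).
string MyDir = @"C:\Temp\";

Document doc = new Document(MyDir + "Lorem ipsum.docx");
DocumentBuilder builder = new DocumentBuilder(doc);

// Replace every field with its field code in braces, e.g. "{ MERGEFIELD Name }".
NodeList fieldStarts = doc.SelectNodes("//FieldStart");
foreach (FieldStart fieldStart in fieldStarts)
{
    // A nested field is removed together with its outer field (whose code already
    // contains the nested code), so skip starts that are no longer in the document.
    if (fieldStart.ParentNode == null)
        continue;

    Field field = fieldStart.GetField();
    builder.MoveToField(field, true);   // move the cursor just after the field end
    builder.Write("{ " + field.GetFieldCode().Trim() + " }");
    field.Remove();
}
// Optionally save the modified Word document to check the result:
// doc.Save(MyDir + "Out.docx");

// A new workbook already contains one empty worksheet; use it rather than adding a second sheet.
Aspose.Cells.Workbook workbook = new Aspose.Cells.Workbook();
Aspose.Cells.Worksheet worksheet = workbook.Worksheets[0];

// Put the document's plain text into cell A1.
worksheet.Cells["A1"].PutValue(doc.ToString(SaveFormat.Text));

// Save the Excel file.
workbook.Save(MyDir + "book1.xls");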
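The code above handles a single document and writes all of its text into cell A1. Since you mentioned several Word documents that each need their own row in one Excel file, below is a minimal sketch of how the same steps might be looped over a folder. It assumes all input files are .docx files in one folder; the class name FieldsToExcel, the method Export, and the two-column layout (file name in column A, text in column B) are illustrative choices of ours, not part of the Aspose API.

```csharp
using System;
using System.IO;
using Aspose.Words;
using Aspose.Words.Fields;

// Hypothetical helper class: exports every .docx in a folder to one Excel sheet,
// one document per row, with fields replaced by their field codes.
public static class FieldsToExcel
{
    // Replaces each field with "{ <field code> }" so merge fields survive as plain text.
    static void ReplaceFieldsWithCodes(Document doc)
    {
        DocumentBuilder builder = new DocumentBuilder(doc);
        foreach (FieldStart fieldStart in doc.SelectNodes("//FieldStart"))
        {
            // Nested fields are removed with their outer field, whose code already
            // contains them; skip starts that are no longer in the document.
            if (fieldStart.ParentNode == null)
                continue;

            Field field = fieldStart.GetField();
            builder.MoveToField(field, true);
            builder.Write("{ " + field.GetFieldCode().Trim() + " }");
            field.Remove();
        }
    }

    // Writes one row per document (A = file name, B = text) and returns the row count.
    public static int Export(string inputDir, string outputPath)
    {
        string[] files = Directory.GetFiles(inputDir, "*.docx");
        // Sort so the row order is predictable (GetFiles does not guarantee an order).
        Array.Sort(files, StringComparer.OrdinalIgnoreCase);

        Aspose.Cells.Workbook workbook = new Aspose.Cells.Workbook();
        Aspose.Cells.Worksheet sheet = workbook.Worksheets[0];

        int row = 0;
        foreach (string path in files)
        {
            Document doc = new Document(path);
            ReplaceFieldsWithCodes(doc);
            sheet.Cells[row, 0].PutValue(Path.GetFileName(path));
            sheet.Cells[row, 1].PutValue(doc.ToString(SaveFormat.Text).Trim());
            row++;
        }

        workbook.Save(outputPath);
        return row;
    }
}
```

For example, FieldsToExcel.Export(@"C:\Temp\Docs", @"C:\Temp\book1.xls") would produce one row per document. Keep in mind that Excel limits a single cell to 32,767 characters, so very long documents may need to be split across several cells.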