Is it possible to make a regex-like template to extract data?


I’m interested in the following: is it possible to create some kind of template for extracting data from a Word document, or is it better to write a regex for this? Please point me in the best direction. I’ve attached a sample document from which I need to extract the data in each column and row. I’m using the Aspose.Words trial from NuGet.

Hi there,

Thanks for your inquiry. Please note that an MS Word document is a flow document and does not store any information about how its content is laid out into lines and pages. However, you can extract content from a Word document using Aspose.Words. Please see the following documentation article for reference:
How to Extract Selected Content Between Nodes in a Document

Could you please share some more details about your query, along with the expected output? We will then provide more information and sample code.

Thank you for the answer!

Sorry for my English; let me clarify the question. I’ve attached a document in which the text colored red is the data I need to get. The underlined red text is the data that the parser should classify as Credit or Debit. Debits are center-aligned and credits are right-aligned; I’ve marked them in green. How can I do this? Is it possible?

I’ve posted a sample screenshot of what I need to get.

Thank you!

Hi there,

Thanks for your inquiry. Please note that Aspose.Words is quite different from Microsoft Word’s object model: it represents the document as a tree of objects, much like an XML DOM tree. If you have worked with any XML DOM library, you will find Aspose.Words easy to understand and work with. When you load a Word document into Aspose.Words, it builds this DOM, and all document elements and their formatting are loaded into memory. Please read the following article for more information on the DOM:

Aspose.Words Document Object Model
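
As a rough illustration of walking that tree, the sketch below visits every Run node and checks its font color, underline, and paragraph alignment. It assumes the marked text is pure red (Color.Red) and that debit and credit values sit in centered and right-aligned paragraphs respectively; neither assumption is confirmed by the attached document, and "in.docx" is a placeholder path. If the amounts are positioned with tab stops or table cells instead, the alignment check would need to change.

```csharp
using System;
using System.Drawing;
using Aspose.Words;

class RedRunExtractor
{
    static void Main()
    {
        // "in.docx" is a placeholder path for the attached sample document.
        Document doc = new Document("in.docx");

        // A Run is a span of text with uniform formatting; true = search the whole tree.
        foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
        {
            // Compare ARGB values: Color equality also takes the color's name into account.
            if (run.Font.Color.ToArgb() != Color.Red.ToArgb())
                continue;

            string kind = "Data";
            if (run.Font.Underline != Underline.None)
            {
                // Assumption: centered paragraph = debit, right-aligned paragraph = credit.
                ParagraphAlignment alignment = run.ParentParagraph.ParagraphFormat.Alignment;
                if (alignment == ParagraphAlignment.Center)
                    kind = "Debit";
                else if (alignment == ParagraphAlignment.Right)
                    kind = "Credit";
                else
                    kind = "Unknown";
            }

            Console.WriteLine(kind + ": " + run.Text.Trim());
        }
    }
}
```

Word may split one visible value across several runs, so adjacent red runs in the same paragraph may need to be joined before use.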

In your case, we suggest saving the document as plain text and then extracting the data you need with .NET APIs. Hope this helps.

// MyDir is the folder that contains the input document.
Document doc = new Document(MyDir + "in.docx");
// Save the whole document as plain text.
doc.Save(MyDir + "Out.txt", SaveFormat.Text);
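
Once the text file exists, a regular expression can act as the "template" asked about in the question. The sketch below is only an illustration: the row layout (a dd.MM.yyyy date, a description, and an amount, separated by tabs) is hypothetical and has to be adapted to what Out.txt actually contains, and the class and method names are made up for this example.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public class TextTemplate
{
    // Hypothetical row layout: date <TAB> description <TAB> amount.
    private static readonly Regex Row = new Regex(
        @"^(?<date>\d{2}\.\d{2}\.\d{4})\t(?<desc>[^\t]+)\t(?<amount>\d+(?:[.,]\d{2})?)$");

    // Returns the match for one line; check Match.Success before reading groups.
    public static Match ParseRow(string line)
    {
        return Row.Match(line.TrimEnd('\r', '\n'));
    }

    public static void Main(string[] args)
    {
        // Path of the text file produced by doc.Save; "Out.txt" is the default.
        string path = args.Length > 0 ? args[0] : "Out.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("File not found: " + path);
            return;
        }

        foreach (string line in File.ReadLines(path))
        {
            Match m = ParseRow(line);
            if (!m.Success)
                continue; // Line does not fit the template; skip it.

            string date = m.Groups["date"].Value;
            string desc = m.Groups["desc"].Value;
            string amount = m.Groups["amount"].Value;
            Console.WriteLine(date + " | " + desc + " | " + amount);
        }
    }
}
```

Note that plain text carries no color, underline, or alignment, so this route alone cannot tell a debit from a credit; that distinction has to come from the position of the value in the line or from the document tree.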