Navigating through a Word document


#1

Can you help?

Is there an example of navigating (i.e. reading) an existing Word document?

I could not find one in the demos provided.

Thanks,

Viv


#2

Hi Viv,

Sorry, I don’t fully understand what you want to achieve.

Aspose.Word is mainly used to open, populate with data, modify and save MS Word documents. Access to the existing content is somewhat limited. It is possible to enumerate the document content if you create a class that implements the IDocumentVisitor interface. Here is a brief example of using it: http://www.aspose.com/forums/ShowPost.aspx?PostID=13556

Please let me know about the tasks you are facing and we will try to help.


#3

Thanks for the quick response.

Is it possible to navigate the document and extract information about its sections, text, images, tables, etc. (to insert into a database)?


#4

In what form do you want to extract the data?

If you use IDocumentVisitor, your code will be called for each document element, such as a section, image, paragraph, run of text and so on. You will have access to the properties of the document element, such as ParagraphFormat or Font, and to the plain text of each text run. IDocumentVisitor does not have handlers to receive information about tables, rows and cells, but we can quickly add them if this solution is suitable for you.
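
The callback style described above can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the Aspose.Word API: the `TextCollector` class, its method names, the `walk` driver and the tuple-based event stream are all hypothetical stand-ins for whatever the library actually passes to an IDocumentVisitor implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TextCollector:
    """Hypothetical visitor: gathers plain text grouped by section."""
    sections: list = field(default_factory=list)

    def section_start(self):
        # A new section begins: open a fresh list of paragraphs.
        self.sections.append([])

    def paragraph_start(self):
        # A new paragraph begins inside the current section.
        self.sections[-1].append("")

    def run(self, text):
        # A run of text: append it to the current paragraph.
        self.sections[-1][-1] += text


def walk(events, visitor):
    """Hypothetical driver: dispatches (name, *args) events to the visitor."""
    for name, *args in events:
        getattr(visitor, name)(*args)
    return visitor


# Assumed event stream for a two-section document.
events = [
    ("section_start",),
    ("paragraph_start",), ("run", "Intro"), ("run", "duction"),
    ("section_start",),
    ("paragraph_start",), ("run", "Body text"),
]
collector = walk(events, TextCollector())
# Flatten into (section index, paragraph text) pairs.
rows = [(i, p) for i, sec in enumerate(collector.sections) for p in sec]
```

Each `(section index, paragraph text)` pair in `rows` could then be written to a database table of your choice.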

Alternatively, you could save the document in HTML format and extract the data from that. It is also possible to save data in Aspose.Pdf.Xml format, which is an XML format used by Aspose.Pdf as input to produce PDF files. You can just load this XML file, parse it the way you want and store the results in the database. We plan to support WordML in the near future; let me know if this is a suitable format for you.


#5

The main aim would be to extract the text, including section names, from the document.

Being able to navigate the tables and extract the text from each cell would be great.

...

Would I be able to re-create the table from the table information in the Word document, i.e. reconstruct the cell and text information as a plain HTML table?

It sounds like this is not possible in this version.


#6

If you save the document in HTML format, it will create HTML tables from MS Word tables for you.
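
Once the document has been saved as HTML, the cell text can be pulled out with any HTML parser. Below is a minimal sketch using Python's standard `html.parser` module. The exact markup Aspose.Word emits is not shown in this thread, so the sketch assumes plain `<table>`, `<tr>` and `<td>`/`<th>` elements and does not handle nested tables; `TableTextParser`, `extract_tables` and the sample string are illustrative names and data only.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collects cell text from every <table> as a list of rows."""

    def __init__(self):
        super().__init__()
        self.tables = []       # list of tables; each table is a list of rows
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; everything else is ignored.
        if self._in_cell:
            self.tables[-1][-1][-1] += data


def extract_tables(html):
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    # Trim whitespace around each cell's text.
    return [[[c.strip() for c in row] for row in t] for t in parser.tables]


# Assumed sample markup, standing in for a saved document.
sample = (
    "<table><tr><td><p>Name</p></td><td>Qty</td></tr>"
    "<tr><td>Bolt</td><td> 4 </td></tr></table>"
)
tables = extract_tables(sample)
```

Here `tables` holds one entry per table, each a list of rows of cell strings, which maps naturally onto database rows.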

IDocumentVisitor does not provide table and cell information at the moment, but we were going to add this anyway and it can be done quickly. I'll mark this feature to be implemented ASAP.


#7

See the new IDocumentVisitor.TableStart, TableEnd, RowStart, RowEnd, CellStart and CellEnd methods in Aspose.Word 2.1.12: http://aspose.com/blogs/Roman.Korchagin/archive/2005/02/22/528.aspx
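
With start/end notifications for tables, rows and cells, rebuilding a plain HTML table becomes a matter of emitting a tag per callback. The sketch below illustrates the idea in Python. It is not the Aspose.Word API: the method names only mirror the ones listed above, and their real signatures, the `run` handler and the call order shown are assumptions.

```python
from html import escape


class HtmlTableBuilder:
    """Hypothetical visitor: turns table events into plain HTML."""

    def __init__(self):
        self.parts = []

    def table_start(self):
        self.parts.append("<table>")

    def table_end(self):
        self.parts.append("</table>")

    def row_start(self):
        self.parts.append("<tr>")

    def row_end(self):
        self.parts.append("</tr>")

    def cell_start(self):
        self.parts.append("<td>")

    def cell_end(self):
        self.parts.append("</td>")

    def run(self, text):
        # Escape text so characters like < and & stay valid in HTML.
        self.parts.append(escape(text))

    def html(self):
        return "".join(self.parts)


builder = HtmlTableBuilder()
# Assumed call order for a one-row, two-cell table.
builder.table_start()
builder.row_start()
builder.cell_start(); builder.run("A & B"); builder.cell_end()
builder.cell_start(); builder.run("42"); builder.cell_end()
builder.row_end()
builder.table_end()
result = builder.html()
```

In this sketch `result` is a single HTML string containing the reconstructed table, with cell text escaped.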