I’ve been evaluating Aspose.Word. I have a document made up of a number of sections, and I need to be able to retrieve the text from each of them. This is possible through the Range object associated with each section, but unfortunately that only returns plain text, without the formatting applied.
Is it possible to retrieve the formatted contents of a section? If not, can this be added?
How would you like to obtain the formatted content? Do you need it in HTML or RTF format, as objects and attributes, or as something else? What do you want to do with it?
HTML format would be ideal.
What I am trying to achieve is to read in formatted text from sections defined in a Word document. This formatted text would then be used to create other Word documents and to display on web pages. Creating other documents would be possible by using the DocumentBuilder and parsing the HTML to ensure the formatting went in correctly.
At the moment HTML can only be produced when saving the whole document, but I can see that producing HTML for a single section would be useful, so I will add this request to our task list.
In the meantime, if you need a workaround you can delete all sections from the document apart from the one you want to convert to HTML and save it in HTML format.
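To make the workaround concrete, here is a rough sketch of the logic in Python. It does not use the Aspose API: `Doc`, `Section`, and `to_html` are hypothetical stand-ins for a document, its sections, and a whole-document HTML save. The idea is simply to work on a copy, drop every section except the one you want, and then run the whole-document export on what is left.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from html import escape


@dataclass
class Section:
    # Hypothetical stand-in: a section is just a list of paragraph strings.
    paragraphs: list = field(default_factory=list)


@dataclass
class Doc:
    # Hypothetical stand-in for a document made up of sections.
    sections: list = field(default_factory=list)


def to_html(doc):
    # Stand-in for a "save the whole document as HTML" step.
    body = "".join(
        f"<p>{escape(p)}</p>" for s in doc.sections for p in s.paragraphs
    )
    return f"<html><body>{body}</body></html>"


def section_to_html(doc, index):
    # Work on a copy so the original document keeps all of its sections.
    copy = deepcopy(doc)
    # Keep only the wanted section; an out-of-range index raises IndexError.
    copy.sections = [copy.sections[index]]
    return to_html(copy)


doc = Doc([Section(["Intro"]), Section(["Body & details"]), Section(["End"])])
html = section_to_html(doc, 1)
```

Working on a copy matters here because the deletion is destructive; the original document still has all three sections afterwards.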
Beware, however, that not all HTML formatting is supported on import and not all DOC features are exported to HTML, so you might not get exactly the same result if you export to HTML and then import it again.
You can also copy and move sections between Word documents directly using the Sections and Section classes; you don’t need to convert to HTML for that. This way you only need the HTML conversion when you want to render a section on a web page.
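As an illustration of why copying sections as objects keeps the formatting, here is a small Python sketch. Again, this is not the Aspose API: `Run`, `Section`, `Doc`, and `copy_section` are hypothetical names for a stand-in model in which each piece of text carries its own formatting flag. Copying the section object carries those flags along, which is exactly what a plain-text extraction would lose.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Run:
    # Hypothetical stand-in for a piece of text with its formatting.
    text: str
    bold: bool = False


@dataclass
class Section:
    runs: list = field(default_factory=list)


@dataclass
class Doc:
    sections: list = field(default_factory=list)


def copy_section(src, index, dst):
    # Append an independent copy so later edits to dst do not touch src.
    clone = deepcopy(src.sections[index])
    dst.sections.append(clone)
    return clone


src = Doc([Section([Run("Title", bold=True), Run(" and body text")])])
dst = Doc()
copied = copy_section(src, 0, dst)

# Editing the copy leaves the source section unchanged.
copied.runs[0].text = "New title"
```

In this model the copy is deliberately independent of the source, so the same section can be reused in several target documents and edited separately in each.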