Extract RTF from between two nodes

I'm using the following code snippet to extract the RTF text between startNode and endNode
from a .doc Word document.

    Dim docStream As New FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)
    Dim doc As New Document(docStream)
    Dim extractedNodes As New ArrayList
    Dim currentNode As Node = startNode
    Dim flag As Boolean = True
    Dim par As New Paragraph(doc)

    Do While currentNode IsNot Nothing AndAlso flag
        ' Fetch the next node first: appending a Run to par moves it out of its original parent.
        Dim nextNode As Node = currentNode.NextPreOrder(currentNode.Document)
        If currentNode.NodeType = NodeType.Run Then
            par.AppendChild(currentNode)
        End If
        ' sb is declared outside this snippet.
        If currentNode.NodeType = NodeType.Paragraph Then extractedNodes.Add(par) : par = New Paragraph(doc) : sb.Append(vbCrLf)
        If currentNode.Equals(endNode) Then
            flag = False
            Exit Do
        End If
        currentNode = nextNode
    Loop
    extractedNodes.Add(par)

    ' GenerateDocument and getString are helper functions that are not shown here.
    Dim EditableRangeContent As Document = GenerateDocument(doc, extractedNodes)
    Dim RTFText As String = getString(EditableRangeContent, SaveFormat.Rtf)

The RTFText I get from this has far more tags and code than the RTF I get
when I simply copy and paste from the Word document itself into a new RTF file.

Is there a better way to extract RTF text between two nodes?


Hi Akram,

Thanks for your inquiry. Please see this article on Extract Content Between Paragraphs in a Document:

Hope this helps you. Please let us know if you have any more queries.

Even with the suggested code (which is the same as mine), the output RTF file still has a lot more info and code than a regular RTF file.
Please see the attached files: both the file size and the RTF content are different.

OriginalDocument.doc is the file where I extracted the RTF from
AsposeExtract.rtf is the file generated using the suggested code
CopyPaste.rtf is the file created by copying the content and pasting it into a new RTF file (using WordPad)

Hi Akram,

Thanks for your inquiry.

Please note that Aspose.Words tries to mimic the behavior of MS Word. If you copy the contents of OriginalDocument.doc into a new document using MS Word and save it as RTF, you will get the same output.

Please let us know if you have any more queries.

Is there a way I can get just the RTF of a paragraph without saving to an RTF file and then reading the ASCII content?

And do you have any function that would convert an HTML string to an RTF string?

Hi Akram,

Thanks for your request.

We will provide the ability to export individual nodes (e.g. Document, Paragraph, Table nodes) to RTF format using the Node.ToString(SaveFormat.Rtf) method. Your thread has been linked to the corresponding issue (WORDSNET-9111) in our issue tracking system, and you will be notified as soon as this feature is available. Sorry for the inconvenience.
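Until such a method exists, one possible interim approach is to copy the node into an empty document and save that document to a MemoryStream instead of a file. The following is only a sketch under assumptions: NodeToRtf and the module name are hypothetical, the node is assumed to be block-level (a Paragraph or Table), and the saved RTF is assumed to be 7-bit ASCII text. The result is still a complete RTF document (header, font table, styles), so it will not be as small as the WordPad output.

```vbnet
Imports System.IO
Imports System.Text
Imports Aspose.Words

Module NodeRtfSketch
    ' Hypothetical helper (not part of Aspose.Words): returns RTF for one
    ' block-level node by saving a temporary document to memory.
    Function NodeToRtf(ByVal node As Node) As String
        Dim tempDoc As New Document()

        ' A new Document starts with one empty paragraph in its body; remove it.
        tempDoc.FirstSection.Body.RemoveAllChildren()

        ' ImportNode makes a deep copy, so the source document is left unchanged.
        Dim imported As Node = tempDoc.ImportNode(node, True)
        tempDoc.FirstSection.Body.AppendChild(imported)

        Using stream As New MemoryStream()
            ' Save to memory rather than to a file on disk.
            tempDoc.Save(stream, SaveFormat.Rtf)
            ' Assumption: the RTF bytes are plain 7-bit ASCII.
            Return Encoding.ASCII.GetString(stream.ToArray())
        End Using
    End Function
End Module
```

Under these assumptions it could be called as NodeToRtf(paragraph) for any Paragraph taken from the loaded document.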

Once this API is available, you can use the DocumentBuilder.InsertHtml method to insert an HTML string into the document and implement custom logic over the inserted HTML content by using the INodeChangingCallback interface. You can then save each inserted HTML node to RTF, concatenate all the RTF strings, and return the final string.
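A simpler variant that does not depend on the pending per-node API is to insert the HTML into an empty document and save the whole document to memory. This is a minimal sketch, not the per-node approach described above: HtmlToRtf and the module name are hypothetical, it does not use INodeChangingCallback, and it assumes the saved RTF is 7-bit ASCII text.

```vbnet
Imports System.IO
Imports System.Text
Imports Aspose.Words

Module HtmlRtfSketch
    ' Hypothetical helper (not part of Aspose.Words): converts an HTML
    ' fragment to an RTF string through a temporary in-memory document.
    Function HtmlToRtf(ByVal html As String) As String
        Dim doc As New Document()
        Dim builder As New DocumentBuilder(doc)

        ' Parse the HTML and insert the resulting nodes at the builder's cursor.
        builder.InsertHtml(html)

        Using stream As New MemoryStream()
            ' Save the whole document as RTF into memory, not to a file.
            doc.Save(stream, SaveFormat.Rtf)
            ' Assumption: the RTF bytes are plain 7-bit ASCII.
            Return Encoding.ASCII.GetString(stream.ToArray())
        End Using
    End Function
End Module
```

The returned string is a full RTF document rather than a bare fragment, so it includes the usual RTF header and style tables.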

Best regards,

Thanks for the update.
Do you know when I can expect this change?

Hi Akram,

Thanks for your inquiry.

I have checked the status of this feature in our issue tracking system and regret to tell you that its implementation has been postponed to a later date. We will inform you as soon as there are any further developments.

We apologize for the inconvenience.