How can I retrieve the text of a paragraph without the text of its comments?
If I do
paragraph.ToTxt();
it includes the text of all children, including comments. GetText() seems to give the same result. How can I get only the paragraph's own text, without the text of its child nodes?
Using the DocumentVisitor would require some code refactoring that I would like to avoid if possible…
Using the debugger, the following seems to give similar results:
public string Text
{
    get
    {
        // paragraph is an instance of Aspose.Words.Paragraph
        if (paragraph.HasChildNodes)
        {
            string runsText = string.Empty;
            foreach (Aspose.Words.Run run in paragraph.Runs)
                runsText += run.ToTxt();
            return runsText;
        }
        else
            return paragraph.ToTxt();
    }
}
Is it safe to use this? Will it give the same results (i.e. will it give exactly the text entered in the paragraph in Word)?
Thanks for your request. Not exactly, because paragraphs can also contain fields. Each field consists of FieldStart, FieldSeparator and FieldEnd nodes, plus the field code text and the field value text. The field code and the field value are also represented by Runs, so the text returned by your code will also contain field codes, which, I suppose, are unnecessary in your case.
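As a rough illustration, here is a minimal sketch of one way the field codes could be skipped, based on the FieldStart / FieldSeparator / FieldEnd layout described above. ParagraphTextHelper and GetTextWithoutFieldCodes are hypothetical names used only for this example, not part of Aspose.Words. The sketch assumes every field starts and ends inside the same paragraph, so it would need adjusting for fields that span several paragraphs.

```csharp
using System.Collections.Generic;
using System.Text;
using Aspose.Words;

// Hypothetical helper, for illustration only (not an Aspose.Words class).
public static class ParagraphTextHelper
{
    // Collects the text of the Runs that are direct children of the paragraph,
    // skipping Runs that belong to a field code (the part between FieldStart
    // and FieldSeparator). Comments are separate child nodes, so their text is
    // not visited because only direct children are enumerated.
    public static string GetTextWithoutFieldCodes(Paragraph paragraph)
    {
        StringBuilder text = new StringBuilder();

        // One entry per open field: true while inside its field code,
        // false once its FieldSeparator has been passed (field value part).
        Stack<bool> inFieldCode = new Stack<bool>();

        // isDeep = false: immediate children only.
        foreach (Node node in paragraph.GetChildNodes(NodeType.Any, false))
        {
            switch (node.NodeType)
            {
                case NodeType.FieldStart:
                    inFieldCode.Push(true);
                    break;

                case NodeType.FieldSeparator:
                    // Assumption: the matching FieldStart is in this paragraph.
                    if (inFieldCode.Count > 0)
                    {
                        inFieldCode.Pop();
                        inFieldCode.Push(false);
                    }
                    break;

                case NodeType.FieldEnd:
                    if (inFieldCode.Count > 0)
                        inFieldCode.Pop();
                    break;

                case NodeType.Run:
                    // Keep the Run only if no enclosing field is still in its code part.
                    if (!inFieldCode.Contains(true))
                        text.Append(((Run)node).Text);
                    break;
            }
        }

        return text.ToString();
    }
}
```

With this in place, the getter above could return ParagraphTextHelper.GetTextWithoutFieldCodes(paragraph) instead of concatenating every Run. Field values (the displayed results) are kept, since that is what Word shows in the paragraph.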
Best regards.
It seems to me that Marie3’s solution is now correct, because the current version of Aspose.Words keeps FieldSeparators, etc., in their own type of child nodes. That is, it seems to me that as it stands now, traversing all the Runs directly under a paragraph should return only the actual text of the paragraph (including deletions and insertions); anything else (comments, fields, etc.) would be excluded.