Paragraph text without including the comments' text

Hi!

How can I retrieve the text of a paragraph without the comments’ text?

If I do

paragraph.ToTxt();

it includes the text of all children including comments. GetText() seems to give the same results. How can I get the text of the paragraph only without its children?

Thanks,
Marie

Hi

Thanks for your request. I think you can use DocumentVisitor to achieve this. You can find a good example that demonstrates the technique here:
https://docs.aspose.com/words/net/how-to-extract-selected-content-between-nodes-in-a-document/
Hope this helps. Please let me know in case of any issues.
Best regards.
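
For readers who want a starting point, here is a minimal sketch of the DocumentVisitor idea mentioned above. It is not the code from the linked article. The class name ParagraphTextWithoutComments is made up for illustration, and the sketch assumes that returning VisitorAction.SkipThisNode from VisitCommentStart is enough to skip a comment together with its children.

```csharp
using System.Text;
using Aspose.Words;

// Hypothetical helper: collects the text of all Runs it visits,
// but skips every Comment node together with its children.
class ParagraphTextWithoutComments : DocumentVisitor
{
    private readonly StringBuilder mText = new StringBuilder();

    // Text gathered so far.
    public string Text
    {
        get { return mText.ToString(); }
    }

    public override VisitorAction VisitRun(Run run)
    {
        mText.Append(run.Text);
        return VisitorAction.Continue;
    }

    public override VisitorAction VisitCommentStart(Comment comment)
    {
        // Assumption: skipping here keeps the comment's own Runs out of the result.
        return VisitorAction.SkipThisNode;
    }
}
```

Usage would look roughly like this: create the visitor, call paragraph.Accept(visitor), then read visitor.Text. Note that this sketch only deals with comments; any Runs that belong to field codes are still collected.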

Hi Alexey,

Thanks for the quick answer.

Using the DocumentVisitor would require some code refactoring that I would like to avoid if possible…

Using the debugger, the following seems to give similar results:

public string Text
{
    get
    {
        // paragraph is an instance of Aspose.Words.Paragraph
        if (paragraph.HasChildNodes)
        {
            string runsText = string.Empty;
            foreach (Aspose.Words.Run run in paragraph.Runs)
                runsText += run.ToTxt();

            return runsText;
        }
        else
            return paragraph.ToTxt();
    }
}

Is it safe to use this? Will it give the same results (i.e. will it give exactly the text entered in Word in the paragraph)?

Thank you!

Hi

Thanks for your request. Not exactly, because paragraphs can also contain fields. Each field consists of FieldStart, FieldSeparator and FieldEnd nodes, with the field code between FieldStart and FieldSeparator and the field value between FieldSeparator and FieldEnd. Both the field code and the field value are represented by Runs, so the text returned by your code will also contain field codes, which I suppose are unnecessary in your case.
Best regards.
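
One possible way to act on the point about fields is sketched below. It is only an illustration, not an official Aspose sample. It walks the direct children of the paragraph and drops any Run that sits between a FieldStart and its FieldSeparator (the field code), while keeping field values and ordinary text. The names ParagraphTextHelper and GetTextWithoutFieldCodes are invented for this example, and the sketch assumes each field starts and ends inside the same paragraph.

```csharp
using System.Collections.Generic;
using System.Text;
using Aspose.Words;

static class ParagraphTextHelper
{
    // Hypothetical helper: concatenates the text of the paragraph's direct
    // child Runs, leaving out Runs that hold field codes.
    public static string GetTextWithoutFieldCodes(Paragraph paragraph)
    {
        StringBuilder text = new StringBuilder();

        // One entry per open field: true while we are still in its code part.
        Stack<bool> inFieldCode = new Stack<bool>();

        // false = direct children only, so Runs inside comments are not visited.
        foreach (Node node in paragraph.GetChildNodes(NodeType.Any, false))
        {
            switch (node.NodeType)
            {
                case NodeType.FieldStart:
                    inFieldCode.Push(true);
                    break;
                case NodeType.FieldSeparator:
                    // The innermost open field switches from code to value.
                    if (inFieldCode.Count > 0)
                    {
                        inFieldCode.Pop();
                        inFieldCode.Push(false);
                    }
                    break;
                case NodeType.FieldEnd:
                    if (inFieldCode.Count > 0)
                        inFieldCode.Pop();
                    break;
                case NodeType.Run:
                    // Keep the Run only if no enclosing field is in its code part.
                    if (!inFieldCode.Contains(true))
                        text.Append(((Run)node).Text);
                    break;
            }
        }

        return text.ToString();
    }
}
```

The stack is there so that nested fields are handled: a Run is kept only when every field that is currently open has already passed its separator. Fields that span several paragraphs are not covered by this sketch.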

Hi!

Thanks for your solution. I am surprised that an easier way to do this simple task doesn’t exist, but I guess this gives more control…

Thanks for your help,
Marie

It seems to me that Marie3’s solution is now correct, because the current version of Aspose.Words keeps FieldSeparators, etc., in their own type of child nodes. That is, it seems to me that as it stands now, traversing all the Runs directly under a paragraph should return only the actual text of the paragraph (including deletions and insertions); anything else (comments, fields, etc.) would be excluded.

Please advise.

Hi Avi,

Thanks for your inquiry. Perhaps you can also use the following approach to get the plain text of a Paragraph, excluding the text of Comment nodes:

Paragraph clonedPara = (Paragraph) doc.FirstSection.Body.FirstParagraph.Clone(true);
NodeCollection comments = clonedPara.GetChildNodes(NodeType.Comment, true);
comments.Clear();
Console.WriteLine(clonedPara.ToString(SaveFormat.Text));

I hope this helps.

Best regards,