Comment.getText() does not return the complete comment text (java)

kml2020 · June 20, 2022, 6:56pm

Hello, we have noticed that the comment.getText() method does not always return the complete comment string. If the comment is a simple single line, for example “This is a comment on line one”, it works. However, if the comment has multiple lines and formatting, for example:
"This is a comment with multiple lines.

Unable to get the complete comment text here.

What is the best way to the get the complete comment text?"

the getText() call returns only the last line of the comment. What is the recommended way to get the complete comment text?

Thank you!

vadim.saltykov · June 21, 2022, 4:08am

@kml2020 Please describe how you output the result of Comment.getText(). A multiline string contains the carriage return character \r, and depending on the output method, you may need to remove it.

string.replace("\r", "")

kml2020 · June 21, 2022, 4:52pm

Hi @Vadim.Saltykov, thank you for the quick reply! In these cases, I have observed that we get only the last line of the comment message. Going by the above example, it will be:
“What is the best way to the get the complete comment text?”

vadim.saltykov · June 22, 2022, 3:51am

@kml2020 Please consider the following code.

String text = comment.getText();
text = text.replace("\r", " ").trim();
System.out.println(text);

kml2020 · June 22, 2022, 4:57am

@Vadim.Saltykov, that seems to work. However, it puts the whole comment on one line because “\r” is replaced with a space. We also tried a Visitor approach, overriding visitParagraphEnd, and were able to get the complete message with its line breaks preserved, which lets us extract the comment message as is.
Is using a Visitor a reasonable approach, or should we replace “\r”? We are looking to confirm a reliable approach.
Additionally, could you please confirm a reliable way to get the selected text for a comment as well (the text between a CommentRangeStart and a CommentRangeEnd)?

vadim.saltykov · June 22, 2022, 7:25am

@kml2020 In fact, I don’t see a problem with either multi-line or single-line comment text. Either way may be appropriate depending on the task, so to choose the best one you must first define the task. In any case, the Visitor functionality seems completely redundant to me for these tasks.
The text returned by Comment.getText() does not depend on the position of the CommentRange anchors. The comment range is used to specify the region of text that is being commented on. You can read more information about working with comments here.

kml2020 · June 22, 2022, 7:50pm

@Vadim.Saltykov, using replace("\r", "\n") on the result of getText() seems to preserve the new lines as well. That matters to us because we want to extract the comment and reply messages as is, without losing any line breaks.
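
For reference, the behaviour discussed above can be reproduced without Aspose.Words. The following is a minimal sketch in plain Java, assuming the string returned by getText() separates paragraphs with a bare "\r" as described earlier in the thread. The class name CommentTextNormalizer, the method toPrintable, and the sample string are hypothetical and only illustrate the replacement. On many consoles a bare "\r" moves the cursor back to the start of the line, which would explain why only the last line appeared to be printed.

```java
final class CommentTextNormalizer {
    private CommentTextNormalizer() {}

    /**
     * Converts carriage-return paragraph breaks to "\n" and drops trailing breaks.
     * The input is assumed to look like the text discussed in this thread.
     */
    static String toPrintable(String raw) {
        // Handle "\r\n" first so that existing Windows line endings are not doubled.
        String normalized = raw.replace("\r\n", "\n").replace("\r", "\n");

        // Drop trailing line breaks so the result does not end with empty lines.
        int end = normalized.length();
        while (end > 0 && normalized.charAt(end - 1) == '\n') {
            end--;
        }
        return normalized.substring(0, end);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the value returned by comment.getText().
        String raw = "This is a comment with multiple lines.\r"
                + "Unable to get the complete comment text here.\r";
        System.out.println(toPrintable(raw));
    }
}
```

Replacing with "\n" keeps one line per paragraph, whereas replacing with a space (as in the earlier snippet) flattens the comment onto a single line.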

vadim.saltykov · June 23, 2022, 4:36am

@kml2020

As far as I can see, you have found a suitable solution for displaying multiline text. Do you still have questions about Aspose.Words?

kml2020 · June 23, 2022, 5:49am

Hi @Vadim.Saltykov, thank you for your assistance so far! I do have a follow-up question. What is the best way to get the selected text for a comment? Please also consider complex scenarios like:
1> Overlapping comments.
2> A word that has its own comment and sits within a larger sentence that has another comment.
3> A comment on the word “solution” in one paragraph, and a different comment on another “solution” in a second paragraph (so two “solution” words have comments).

I have looked at various samples but could not find a comprehensive solution for this.

vadim.saltykov · June 23, 2022, 7:54am

@kml2020

What do you mean by “get the selected text for a comment” (see screenshot (1) or (2))?

To clarify the situation, please, create a separate document for each scenario and write in the post the expected text that Aspose.Words shall return.

kml2020 · June 24, 2022, 4:23am

@Vadim.Saltykov, I am referring to #1 in your screenshot, the commented text. I have attached a document with some comments; for each of them, what is the best way to get the commented text? Consider this sample code:
NodeCollection comments = doc.getChildNodes(NodeType.COMMENT, true);
This gives me four comment nodes. For each of these nodes, we want to get the commented text.
Sample Template A1.docx (17.4 KB)

vadim.saltykov · June 24, 2022, 7:05am

@kml2020 Please consider the following code:

StringBuilder rangeText = new StringBuilder();
Comment comment = (Comment) doc.getChild(NodeType.COMMENT, 0, true);
NodeCollection startColl = doc.getChildNodes(NodeType.COMMENT_RANGE_START, true);
NodeCollection endColl = doc.getChildNodes(NodeType.COMMENT_RANGE_END, true);

// Find the range start marker whose id matches the comment.
CommentRangeStart start = null;
for (CommentRangeStart cmtNode : (Iterable<CommentRangeStart>) startColl)
{
    if (cmtNode.getId() == comment.getId())
        start = cmtNode;
}

// Find the matching range end marker.
CommentRangeEnd end = null;
for (CommentRangeEnd cmtNode : (Iterable<CommentRangeEnd>) endColl)
{
    if (cmtNode.getId() == comment.getId())
        end = cmtNode;
}

// Walk the document in pre-order from the start marker to the end marker
// and collect the text of every Run met on the way.
// The null check stops the loop if a marker is missing.
Node node = start;
while (node != null && node != end)
{
    if (node.getNodeType() == NodeType.RUN)
    {
        Run run = (Run) node;
        rangeText.append(run.getText());
    }

    node = node.nextPreOrder(doc);
}
System.out.println(rangeText.toString());

kml2020 · June 24, 2022, 4:16pm

Thank you for the sample code, @Vadim.Saltykov.

The code actually seems to include the comment runs as well. Is it possible to avoid those? Consider the attached document as a sample docx.
Sample Template1.docx (17.7 KB)

Here the commented text for “Comment 2” is:
search online for the video that best fits your document. To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other

But instead, the code above seems to include the runs of the comments themselves, and the string builder output is:
Comment 1 Next LineComment 2 Formatted LineReply to comment 1search online for the video that best fits your document. To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other

I see the same issue in the following comments as well, where the commented text is nested.

alexey.noskov · June 25, 2022, 4:57am

@kml2020 In this case, I think you should use DocumentVisitor, which provides a more flexible way to iterate over the nodes in the document. I have created simple code that demonstrates the technique:

Document doc = new Document("C:\\Temp\\in.docx");
CommentedRangeTextCollector collector = new CommentedRangeTextCollector();
doc.accept(collector);

// Get all comments.
Iterable<Comment> comments = doc.getChildNodes(NodeType.COMMENT, true);
for (Comment c : comments)
{
    System.out.println(collector.getCommentedText(c.getId()));
    System.out.println("==========================");
}

private static class CommentedRangeTextCollector extends DocumentVisitor
{
    @Override
    public int visitCommentStart(Comment comment) throws Exception {
        // Skip comments
        return VisitorAction.SKIP_THIS_NODE;
    }

    @Override
    public int visitCommentRangeStart(CommentRangeStart commentRangeStart) throws Exception {
        mCommentedTextMap.put(commentRangeStart.getId(), new StringBuilder());
        mCurrentRanges.add(commentRangeStart.getId());
        return VisitorAction.CONTINUE;
    }

    @Override
    public int visitCommentRangeEnd(CommentRangeEnd commentRangeEnd) throws Exception {
        mCurrentRanges.remove(mCurrentRanges.indexOf(commentRangeEnd.getId()));
        return VisitorAction.CONTINUE;
    }

    @Override
    public int visitParagraphEnd(Paragraph paragraph) throws Exception {

        for (Integer id : mCurrentRanges) {
            mCommentedTextMap.get(id).append("\r\n");
        }

        return VisitorAction.CONTINUE;
    }

    @Override
    public int visitRun(Run run) throws Exception {

        for (Integer id : mCurrentRanges) {
            mCommentedTextMap.get(id).append(run.getText());
        }

        return VisitorAction.CONTINUE;
    }

    public String getCommentedText(Integer commentId)
    {
        if(!mCommentedTextMap.containsKey(commentId))
            return "";

        return mCommentedTextMap.get(commentId).toString();
    }

    private Map<Integer, StringBuilder> mCommentedTextMap = new HashMap<Integer, StringBuilder>();
    private ArrayList<Integer> mCurrentRanges = new ArrayList<Integer>();
}

You might notice that some commented ranges are collected twice. This occurs because they are wrapped in both the main comment's range and the comment reply's range:

<w:commentRangeStart w:id="0"/>
<w:commentRangeStart w:id="1"/>
<w:r>
	<w:t xml:space="preserve">If you need to stop reading before you reach the end, Word remembers where you left off - even on another device. </w:t>
</w:r>
.....
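
The bookkeeping used by the visitor above (a list of currently open range ids, with each run appended to every open range) can be tried out without a document. The following is a minimal sketch in plain Java, not Aspose.Words code: the Event type and the collect method are hypothetical stand-ins for the visitor callbacks, and the event list in main mimics the XML above, where ranges 0 and 1 open at the same position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OverlappingRangeDemo {
    enum Kind { START, END, TEXT }

    /** One item of a flattened document: a range start, a range end, or a piece of text. */
    static final class Event {
        final Kind kind;
        final int id;
        final String text;

        Event(Kind kind, int id, String text) {
            this.kind = kind;
            this.id = id;
            this.text = text;
        }
    }

    static Event start(int id) { return new Event(Kind.START, id, ""); }
    static Event end(int id) { return new Event(Kind.END, id, ""); }
    static Event text(String text) { return new Event(Kind.TEXT, -1, text); }

    /** Returns the text covered by each range id, in the order the ranges were opened. */
    static Map<Integer, String> collect(List<Event> events) {
        Map<Integer, StringBuilder> buffers = new LinkedHashMap<>();
        List<Integer> open = new ArrayList<>();

        for (Event e : events) {
            if (e.kind == Kind.START) {
                buffers.put(e.id, new StringBuilder());
                open.add(e.id);
            } else if (e.kind == Kind.END) {
                // remove(Object) is used here so an unmatched end marker is simply ignored.
                open.remove(Integer.valueOf(e.id));
            } else {
                // Text belongs to every range that is open at this point.
                for (Integer id : open) {
                    buffers.get(id).append(e.text);
                }
            }
        }

        Map<Integer, String> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, StringBuilder> entry : buffers.entrySet()) {
            result.put(entry.getKey(), entry.getValue().toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Ranges 0 and 1 cover the same text; range 2 overlaps them partially.
        List<Event> events = Arrays.asList(
                start(0), start(1), text("A "), start(2), text("B "),
                end(0), end(1), text("C"), end(2));
        System.out.println(collect(events));
    }
}
```

Because ranges 0 and 1 open and close together, they collect identical text, which is the same effect as a comment and its reply reporting the same commented range.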

kml2020 · June 27, 2022, 6:08pm

@alexey.noskov, thank you for providing the sample code. With the same docx I attached to this ticket, Sample Template1.docx, I do not see the correct selected text being returned. Especially with overlapping or nested comments, the code returns the complete paragraphs instead of just the selected text. For “Comment 2” in the docx, the visitor returns:

Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document. To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other. For example, you can add a matching cover page, header, and sidebar. Click Insert and then choose the elements you want from the different galleries.

Themes and styles also help keep your document coordinated. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme. Save time in Word with new buttons that show up where you need them. To change the way a picture fits in your document, click it and a button for layout options appears next to it. When you work on a table, click where you want to add a row or a column, and then click the plus sign. Reading is easier, too, in the new Reading view.

Sample Template1.docx (17.7 KB)

alexey.noskov · June 28, 2022, 4:45am

@kml2020 On my side the output is correct. Here is the output of the code:

If you need to stop reading before you reach the end, Word remembers where you left off - even on another device. Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to 
==========================
If you need to stop reading before you reach the end, Word remembers where you left off - even on another device. Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to 
==========================
Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document. To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other
==========================
Video provides a powerful way to help you prove your point. When you click Online Video, you can paste in the embed code for the video you want to add. You can also type a keyword to search online for the video that best fits your document. To make your document look professionally produced, Word provides header, footer, cover page, and text box designs that complement each other
==========================
Theme, the pictures
==========================
Themes and styles also help keep your document coordinated. When you click Design and choose a new Theme, the pictures, charts, and SmartArt graphics change to match your new theme. When you apply styles, your headings change to match the new theme. Save time in Word with new buttons that show 
==========================

As far as I can see, the commented text collected by the visitor is correct.

kml2020 · July 8, 2022, 8:47am

@alexey.noskov, The solution is able to get the commented text. Thank you for your assistance!