Free Support Forum - aspose.com

Text extraction includes tracked changes

When a Word document contains revisions from tracked changes, the Document.ToTxt() method returns the text of those revisions as well as the final text. The workaround is to call Document.AcceptAllRevisions() first, but I’ll bet that most developers won’t know about this until they run across such a document in their production environments.

Can you add an overload of ToTxt() with parameters that control which pieces of text are extracted, such as revisions, comments, or any other non-final-document bits of text that may exist? I assume that most developers would normally want “final document text”, but it should also be possible to include/exclude the other pieces.

Hi,

Thanks for your inquiry. If needed, you can get the text of any individual node in the document. For example:

doc.ToTxt();
doc.FirstSection.ToTxt();
doc.FirstSection.Body.ToTxt();
etc.

We also plan to implement a RejectRevisions() method, so I don’t think an overload of ToTxt() is necessary.

Best regards.

I think you’ve missed the point. If the original document had the word “apple”, revision tracking was then turned on, and “apple” was changed to “banana”, ToTxt() returns “applebanana”. RejectRevisions() would not help in this case, because we want the word “banana” but not the word “apple”. Also, navigating down into individual document nodes isn’t helpful, since doc.FirstSection.Body.ToTxt() still returns the text of all revisions, smooshed together.
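To make the apple/banana example concrete, here is a minimal sketch of the three possible ways to extract text from tracked changes. It does not use Aspose.Words: the `RevisionKind` enum, the `TextRun` class and the `RevisionText` helpers are hypothetical stand-ins that model a paragraph as runs marked as unchanged, inserted or deleted.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tracked-change state of a run; NOT the Aspose.Words object model.
public enum RevisionKind
{
    None,     // unchanged text
    Inserted, // text added while tracking was on
    Deleted   // text removed while tracking was on (still stored in the file)
}

// Hypothetical run of text carrying its revision state.
public sealed class TextRun
{
    public TextRun(string text, RevisionKind kind)
    {
        Text = text ?? throw new ArgumentNullException(nameof(text));
        Kind = kind;
    }

    public string Text { get; }
    public RevisionKind Kind { get; }
}

public static class RevisionText
{
    // Ignores revision state entirely, so a deleted "apple" followed by an
    // inserted "banana" comes out as "applebanana".
    public static string AllText(IEnumerable<TextRun> runs) =>
        string.Concat(runs.Select(r => r.Text));

    // Text as it reads with every revision accepted: deleted runs are dropped.
    public static string FinalText(IEnumerable<TextRun> runs) =>
        string.Concat(runs.Where(r => r.Kind != RevisionKind.Deleted).Select(r => r.Text));

    // Text as it reads with every revision rejected: inserted runs are dropped.
    public static string OriginalText(IEnumerable<TextRun> runs) =>
        string.Concat(runs.Where(r => r.Kind != RevisionKind.Inserted).Select(r => r.Text));
}
```

With the runs "The fruit is " (unchanged), "apple" (deleted) and "banana" (inserted), AllText returns "The fruit is applebanana", FinalText returns "The fruit is banana" (the accept-all result the poster wants), and OriginalText returns "The fruit is apple" (the reject-all result, which is why rejecting revisions does not help here).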

It’s not a big deal, because I have a workaround that works fine for my situation. I was merely suggesting that 1) this issue be documented on ToTxt() and similar methods so that people are aware of it, and 2) there should be some way to get the final text from ToTxt() without having to call AcceptAllRevisions().

Hi,

Thanks for your request. I understand what you mean, but I don’t think an overload of ToTxt() is needed in this case: calling AcceptAllRevisions() before ToTxt() solves this problem.

Best regards.