Request for Assistance: Accessing Old and New Content of Revisions in Aspose

Dear Aspose Support Team,

I hope this message finds you well. Our team is currently working on a project that involves the extraction of old and new content from revisions within DOCX files using Aspose for Python. We have found Aspose to be a reliable tool for document processing, and we believe it can help us achieve our specific objectives.

To provide you with more context, our primary goal is to retrieve the old content and new content of revisions within DOCX files. We aim to access and store this content for further analysis and processing.

Having reviewed the Aspose documentation and code samples, we kindly request your expert guidance and support to help us achieve the following objectives:

  1. Retrieve the old content and new content of revisions within DOCX files using Aspose for Python.
  2. Access and store this content in a structured format for subsequent analysis and processing.

If there are any specific Aspose APIs, methods, or code examples that you could recommend to efficiently extract the old and new content from DOCX file revisions, it would be greatly appreciated.

Additionally, if there are any best practices or considerations unique to working with DOCX file revisions using Aspose, we would be eager to learn from your expertise.

Please let us know if you require any additional information about our project or the specific DOCX files we are working with. Your prompt assistance will be invaluable to us, and we look forward to your guidance and support in achieving our goals.

Thank you for your time and expertise in advance. We are enthusiastic about using Aspose to enhance our DOCX file revision analysis.

Example for reference:
We have tried using doc.revisions, but it does not give the proper old and new text.
By old text we mean the deleted text shown in track changes, and by new text we mean the updated text.
You can run a compare/merge on any two files and check the result for reference.

@alexey.noskov can you help us with this?

@cacglo You can access revisions in the document using the Document.revisions collection. But in MS Word revisions are usually grouped. So to access revisions in the same way as they are represented in MS Word, it is more convenient to use RevisionGroup. For example, see the following code:

import aspose.words as aw

# Load the document and walk its revisions, grouped the way MS Word shows them.
doc = aw.Document("C:\\Temp\\in.docx")
for group in doc.revisions.groups:
    print(f"Revision author: {group.author};\r\nRevision type: {group.revision_type}\r\nRevision text: {group.text}")
    print("=============================")

Using the RevisionGroup.revision_type property, you can determine the type of revision.

We have used this, but we need to know which text is the old text and which is the new text that replaces it.

We need to create a dictionary of the form
{old text: new text}

The order in which we get the revisions is not suitable for this.

@cacglo I am afraid MS Word document revisions do not have information about the relations between inserted and deleted text. So Aspose.Words cannot provide such information either.
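
Since the document itself does not store which insertion replaces which deletion, any old/new mapping has to be a heuristic. The sketch below is one possible approach, not something Aspose.Words provides: it assumes a replacement shows up as a deletion group immediately followed by an insertion group, which will not hold for every document. The names pair_adjacent_revisions and load_groups are hypothetical helpers. The pairs are returned as a list rather than a dictionary so that repeated old texts are not overwritten.

```python
def pair_adjacent_revisions(groups):
    """Heuristically pair each deletion with the insertion right after it.

    `groups` is a list of (kind, text) tuples in document order, where kind
    is "delete" or "insert". Returns a list of (old_text, new_text) tuples.
    ASSUMPTION: adjacency means replacement; the file does not record this.
    """
    pairs = []
    i = 0
    while i < len(groups):
        kind, text = groups[i]
        following = groups[i + 1] if i + 1 < len(groups) else None
        if kind == "delete" and following is not None and following[0] == "insert":
            # Deletion directly followed by an insertion: treat as a replacement.
            pairs.append((text, following[1]))
            i += 2
            continue
        if kind == "delete":
            pairs.append((text, ""))   # text removed with no replacement
        elif kind == "insert":
            pairs.append(("", text))   # text added with nothing removed
        i += 1
    return pairs


def load_groups(path):
    """Read (kind, text) tuples from a DOCX file via Aspose.Words."""
    # Imported here so the pairing helper above works without Aspose installed.
    import aspose.words as aw

    doc = aw.Document(path)
    result = []
    for group in doc.revisions.groups:
        if group.revision_type == aw.RevisionType.DELETION:
            result.append(("delete", group.text))
        elif group.revision_type == aw.RevisionType.INSERTION:
            result.append(("insert", group.text))
        # Other revision types (formatting, moves) are skipped in this sketch.
    return result
```

With both helpers in place, pair_adjacent_revisions(load_groups("C:\\Temp\\in.docx")) would give the candidate pairs, which should be reviewed by hand before they are trusted.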

@alexey.noskov Is there any way we could keep only the revision text in the document and remove all the unchanged text?

@cacglo You can use Inline.is_delete_revision, Inline.is_insert_revision, Inline.is_format_revision, Inline.is_move_from_revision and Inline.is_move_to_revision to check whether an inline node has a revision. The Paragraph node has similar properties. So you can loop through the nodes in the document and remove those that are not revisions.
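
The loop described above could look roughly like the following. This is a minimal sketch that only handles Run nodes and assumes the five revision properties named in the reply. The helper names has_revision and keep_only_revised_runs are hypothetical, and paragraphs left empty after the removal are not cleaned up here.

```python
# The revision flags listed in the reply above.
REVISION_FLAGS = (
    "is_insert_revision",
    "is_delete_revision",
    "is_format_revision",
    "is_move_from_revision",
    "is_move_to_revision",
)


def has_revision(node):
    """Return True if any of the node's revision flags is set."""
    return any(getattr(node, flag, False) for flag in REVISION_FLAGS)


def keep_only_revised_runs(in_path, out_path):
    """Remove every Run that carries no revision, then save a copy."""
    # Imported here so has_revision can be used without Aspose installed.
    import aspose.words as aw

    doc = aw.Document(in_path)
    # Snapshot the runs first so removing nodes does not disturb the iteration.
    runs = [node.as_run() for node in doc.get_child_nodes(aw.NodeType.RUN, True)]
    for run in runs:
        if not has_revision(run):
            run.remove()
    doc.save(out_path)
```

A call such as keep_only_revised_runs("C:\\Temp\\in.docx", "C:\\Temp\\out.docx") writes the result to a separate file (the output path is illustrative), which keeps the original document available for comparison.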