Getting incorrect revision information using Aspose.words Java library

Hi Team,

I’m using Aspose.words for Java to get revision info (like the number of insertions, deletions, and format changes) with the code below. But I found that sometimes, the revision info is incorrect since the counts don’t match MS Word review counts. I’ve attached files and screenshots for you to check out. It’s really important for us to show this info accurately, so any help would be great.

Code to check revision information:

import com.aspose.words.Document;
import com.aspose.words.RevisionGroup;
import com.aspose.words.RevisionType;
import java.util.HashMap;
import java.util.Map;

// The two methods below live inside the application class.
public static void main(String[] args) throws Exception {
	loadAsposeLicense(); // helper defined elsewhere in the project (not shown)
	Document original = new Document("./<file-with-revision-info>.docx");
	revisionInformation(original);
}

public static void revisionInformation(Document doc) throws Exception {
	Map<String, Integer> revisionMap = new HashMap<>();
	// Count revision groups per revision type (insertion, deletion, format change, ...).
	for (RevisionGroup revisionGroup : doc.getRevisions().getGroups()) {
		String revisionName = RevisionType.toString(revisionGroup.getRevisionType());
		revisionMap.merge(revisionName, 1, Integer::sum);
	}
	// Print the count for each revision type.
	System.out.println("===== Revision information =====");
	for (Map.Entry<String, Integer> entry : revisionMap.entrySet()) {
		System.out.println(entry.getKey() + ": " + entry.getValue());
	}
}

Sample file:
test_sample.docx (99.6 KB)

MS word revision information for sample file

Aspose.words revision information for sample file

@Kldv

It seems that you are experiencing discrepancies between the revision information reported by Aspose.Words for Java and what is shown in Microsoft Word. This can happen due to various reasons, including differences in how revisions are tracked and reported by the two applications.

Here are a few suggestions to help you troubleshoot and potentially resolve the issue:

  1. Check Document Loading Options: Ensure that you are using the correct loading options when opening the document. You might want to specify the MS Word version that matches the document’s format. This can be done using the LoadOptions class, where you can set the mswVersion property to match the version of MS Word that was used to create or edit the document.

    LoadOptions loadOptions = new LoadOptions();
    loadOptions.setMswVersion(MsWordVersion.WORD_2019); // or the appropriate version
    Document doc = new Document("path/to/document.docx", loadOptions);
    
  2. Review Revision Types: You can iterate through the revisions in the document and print out their types and details. This will help you understand what types of revisions are being tracked and if any are being missed.

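    for (Revision revision : doc.getRevisions()) {
        // getRevisionType() returns an int constant in the Java API; convert it to a readable name.
        System.out.println("Type: " + RevisionType.toString(revision.getRevisionType()));
        System.out.println("Author: " + revision.getAuthor());
        System.out.println("Date: " + revision.getDateTime());
    }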
    
  3. Compare with MS Word: If possible, manually compare the revision counts from Aspose.Words with those in MS Word. This can help identify specific types of revisions that are not being counted correctly.

  4. Update Aspose.Words: Ensure that you are using the latest version of Aspose.Words for Java, as updates may include bug fixes and improvements related to revision tracking.

  5. Consult Documentation: Review the Aspose.Words documentation for any specific notes on revision handling. There may be additional settings or methods that can help align the revision information with what is shown in MS Word.
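
The manual comparison in suggestion 3 can be made less error-prone with a small helper. The following is a minimal sketch in plain Java that does not call Aspose.Words at all: it assumes you have already collected the Aspose counts into a map (as the original poster's code does) and typed in the MS Word counts by hand. The class name `RevisionCountDiff`, the method `mismatches`, the key names, and the numbers in `main` are all hypothetical and used for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

public class RevisionCountDiff {

    /**
     * Returns one entry per revision type whose counts differ.
     * Each value is {asposeCount, wordCount}; a type missing from
     * either map is treated as a count of 0.
     */
    public static Map<String, int[]> mismatches(Map<String, Integer> asposeCounts,
                                                Map<String, Integer> wordCounts) {
        // Union of both key sets, sorted so the report order is stable.
        TreeSet<String> types = new TreeSet<>(asposeCounts.keySet());
        types.addAll(wordCounts.keySet());

        Map<String, int[]> result = new LinkedHashMap<>();
        for (String type : types) {
            int aspose = asposeCounts.getOrDefault(type, 0);
            int word = wordCounts.getOrDefault(type, 0);
            if (aspose != word) {
                result.put(type, new int[] {aspose, word});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical numbers and key names, not taken from the attached files.
        Map<String, Integer> aspose = Map.of("Insertion", 7, "Deletion", 3, "FormatChange", 9);
        Map<String, Integer> word = Map.of("Insertion", 8, "Deletion", 3, "FormatChange", 6);

        mismatches(aspose, word).forEach((type, c) ->
                System.out.println(type + ": Aspose=" + c[0] + ", MS Word=" + c[1]));
    }
}
```

Only the revision types that disagree are reported, which makes it easier to see which kind of revision to investigate first.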

If the issue persists after trying these suggestions, consider providing a minimal reproducible example along with the document in question to the Aspose support team for further assistance.

Feel free to reach out if you have any more questions or need further clarification!

It looks like this issue is happening with .doc files too. Can someone check whether it’s a known bug or whether something in my code is affecting the revision info?

Sample file:
sample_file.zip (37.9 KB)

MS word revision info:

Aspose.words revision info:

@Kldv
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27982

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Hi @alexey.noskov, is there any update on this issue?

@Kldv We have completed analyzing the issue. The problem occurs because MS Word builds revision groups for deleted cells in a very strange way: it creates a revision group for every row that contains a deleted cell, plus one revision titled “Deleted Cells”. As for the difference in formatting revisions, it appears that MS Word ignores the SizeBi attribute on the paragraph break mark.
We will keep you updated and let you know once the issue is resolved.
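
To make the deleted-cells rule described above concrete, here is a minimal sketch in plain Java. It is only one reading of that description (one group per row containing a deleted cell, plus a single “Deleted Cells” entry); it is not a workaround, it does not use the Aspose.Words API, and it is not confirmed to match MS Word in every case. The class `DeletedCellGroupModel`, the method `expectedGroupCount`, and the boolean-grid representation of a table are all assumptions made for illustration.

```java
public class DeletedCellGroupModel {

    /**
     * deleted[r][c] is true when cell c of row r is marked as deleted.
     * Returns (rows containing at least one deleted cell) + 1 for the
     * single "Deleted Cells" entry, or 0 when nothing is deleted.
     */
    public static int expectedGroupCount(boolean[][] deleted) {
        int rowsWithDeletedCell = 0;
        for (boolean[] row : deleted) {
            for (boolean cellDeleted : row) {
                if (cellDeleted) {
                    rowsWithDeletedCell++;
                    break; // count each row only once
                }
            }
        }
        return rowsWithDeletedCell == 0 ? 0 : rowsWithDeletedCell + 1;
    }

    public static void main(String[] args) {
        boolean[][] table = {
            {false, true,  true},   // two deleted cells, row counted once
            {false, false, false},  // untouched row
            {true,  false, false},  // one deleted cell
        };
        System.out.println(expectedGroupCount(table)); // 2 rows + 1 = 3
    }
}
```

Under this reading, a table with deleted cells in two rows would show three entries in MS Word, which may explain part of the gap when cells are deleted in tables.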

Thanks for your reply, @alexey.noskov. Is there any workaround for this issue until an actual fix is made?

@Kldv Unfortunately, there is no workaround that we can suggest right now.

Hi @alexey.noskov, you have identified the issue, but I am not fully aware of the internal workings. Could you please categorize the specific changes or types where this issue occurs? For example, is it only related to table cells, or to other specific kinds of changes as well? This information will greatly help us narrow down the pain point.

@Kldv As mentioned above, the problem occurs because MS Word builds revision groups for deleted cells in a very strange way: it creates a revision group for every row that contains a deleted cell, plus one revision titled “Deleted Cells”. As for the difference in formatting revisions, it appears that MS Word ignores the SizeBi attribute on the paragraph break mark.

Thanks for your reply, @alexey.noskov. We can see that MS Word shows the correct revision info in its Review section. Can we get the same revision counts programmatically, for example from the document metadata?

@Kldv The logged defect is exactly about this, i.e. Aspose.Words incorrectly calculates the number of revisions in the document. Unfortunately, we cannot suggest a programmatic workaround for the issue, so I would suggest waiting for an actual fix. We will keep you updated and let you know once the issue is resolved.

@alexey.noskov Was this working earlier, and did it break in the latest version, i.e. v25.2? If yes, can you tell us the version in which it worked?

If not, could you please prioritise this as early as possible? We bought a license for Aspose.Words for Java and are in the process of getting a license for paid support. We have a release planned in the coming days, this is a blocker for that release, and we rely heavily on Aspose.Words.

Hi @Kldv
We are also facing the same issue, @alexey.noskov.
Could you please ask your team to look into this? Our customer uses a lot of docx files with tables, and this gives a very wrong picture in front of the customer.

@Kldv @abhishek.sonkar We have already completed analyzing the issue, but it is not yet scheduled for development, so we cannot provide an estimate right now. We will keep you updated and let you know once it is resolved.
Once you get a license for paid support, you can escalate the issue there; this will move it up in the queue.

Hi @alexey.noskov

In the following file, I also found that the revision information shown by MS Word does not match the Aspose output.

Aspose shows the format changes count as 8, but according to MS Word, it is only 5.

Aspose output docx file:

aspose-output-format-changes.docx (29.2 KB)

A post was split to a new topic: Aspose.Words identifies changes that were not made upon comparing

@Kldv
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-28065

Hi @alexey.noskov

Here I am providing more redlined files where the revision counts do not match MS Word (highlighted in bold). The mismatch does not appear to be limited to the Deleted Cells and SizeBi attribute issues you mentioned in Getting incorrect revision information using Aspose.words Java library - #10 by alexey.noskov: I found discrepancies in the insertions as well. Could you please review the files listed below and let us know what might be causing this?

File Name       Insertions          Deletions           Format Changes
                (Aspose/MS Word)    (Aspose/MS Word)    (Aspose/MS Word)
test_127.docx   04/05               02/03               06/05
test_128.docx   02/03               01/01               00/00
test_129.docx   01/01               01/01               24/02
test_132.docx   00/00               00/00               02/01
test_134.docx   18/18               04/04               43/103
test_135.docx   17/16               06/05               04/04
test_136.docx   15/14               06/05               04/04
test_140.docx   02/02               03/03               06/03
test_142.docx   18/21               25/25               1127/1110
test_144.docx   01/01               01/01               11/10
test_145.docx   11/10               08/07               29/18
test_146.docx   04/04               00/00               12/52
test_147.docx   04/05               00/00               03/03
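
For a list like this, the mismatching columns can be picked out mechanically rather than by eye. Below is a minimal sketch in plain Java that parses rows in the "Aspose/MS Word" format used above and reports which columns differ. The class `CountTableCheck` and its methods are hypothetical helpers written for this illustration; the sample rows in `main` are copied from the table.

```java
import java.util.ArrayList;
import java.util.List;

public class CountTableCheck {

    /** Parses an "aspose/word" cell such as "04/05" into {4, 5}. */
    static int[] parsePair(String cell) {
        String[] parts = cell.split("/");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    /**
     * Takes a row like "test_127.docx 04/05 02/03 06/05" and returns a
     * description of every column where the two counts differ.
     */
    public static List<String> mismatchedColumns(String row) {
        String[] columns = {"Insertions", "Deletions", "Format Changes"};
        String[] fields = row.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            int[] pair = parsePair(fields[i + 1]); // fields[0] is the file name
            if (pair[0] != pair[1]) {
                result.add(columns[i] + " (Aspose " + pair[0] + ", MS Word " + pair[1] + ")");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] rows = {
            "test_127.docx 04/05 02/03 06/05",
            "test_129.docx 01/01 01/01 24/02",
            "test_134.docx 18/18 04/04 43/103",
        };
        for (String row : rows) {
            System.out.println(row.trim().split("\\s+")[0] + " -> " + mismatchedColumns(row));
        }
    }
}
```

Grouping the files by which column mismatches (insertions, deletions, or format changes) may help when reporting each case separately.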

Aspose-comparison output files listed above

count-mismatch.zip (6.7 MB)

This is the main feature we are offering our users in the current release, and we are unsure in which scenarios Aspose might produce incorrect results like those shown above.

@Kldv
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-28090, WORDSNET-28091, WORDSNET-28092, WORDSNET-28093, WORDSNET-28094, WORDSNET-28095, WORDSNET-28096, WORDSNET-28097, WORDSNET-28098, WORDSNET-28099, WORDSNET-28100, WORDSNET-28101

The problem is not reproducible on my side with the latest version of Aspose.Words, 25.3, for the test_135.docx and test_136.docx documents.