Strict per-word comparison

With Granularity = WORD_LEVEL, the comparison frequently groups several adjacent words into a single delete/insert instead of producing clear per-word revisions.

CHAR_LEVEL is not suitable for our use case and is significantly slower on larger docs.

Environment:
Aspose.Words for Python via .NET v25.7.0 · Python 3.11.0 · Windows 11 · Word 2016 compatibility

Is multi-word grouping at WORD_LEVEL by design? Any way to enforce stricter per-word behavior?

@SKDev

The behavior you’re experiencing with Granularity = WORD_LEVEL, which groups adjacent words into a single delete/insert, is indeed by design. This optimization in the comparison process can lead to multiple words being treated as a single unit during revisions.

Currently, there is no built-in option to enforce stricter per-word behavior while using WORD_LEVEL. If CHAR_LEVEL is not suitable due to performance concerns, I recommend implementing custom logic that post-processes the WORD_LEVEL results at a finer granularity.

Here’s a clear plan to address this:

  1. Perform the comparison using WORD_LEVEL to get the initial results.
  2. Post-process the results to split any grouped changes into individual word changes. This involves iterating through the results and checking for adjacent words that were modified together.
  3. Reconstruct the changes to reflect individual word modifications.

Here’s a minimal code snippet to illustrate the idea:

import datetime

import aspose.words as aw

# Load documents
doc1 = aw.Document("original.docx")
doc2 = aw.Document("modified.docx")

# Set comparison options (CompareOptions and Granularity live in aw.comparing)
options = aw.comparing.CompareOptions()
options.granularity = aw.comparing.Granularity.WORD_LEVEL

# Compare documents; the resulting revisions are recorded in doc1
doc1.compare(doc2, "Author", datetime.datetime.now(), options)

# Post-process the changes
for revision in doc1.revisions:
    # Logic to split changes into individual words if needed
    pass
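
The splitting logic left as a placeholder in the loop above could be sketched in plain Python. This is only an illustration, not part of the Aspose.Words API: it assumes the deleted and inserted text of one grouped change have already been extracted as strings, and the function name split_grouped_change and the sample sentences are hypothetical. It uses difflib from the standard library to diff the two word lists.

```python
import difflib


def split_grouped_change(deleted_text, inserted_text):
    """Break one grouped delete/insert pair into per-word operations."""
    old_words = deleted_text.split()
    new_words = inserted_text.split()
    # autojunk=False disables the heuristic that ignores very frequent
    # items in long sequences, so common words are still matched.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.extend(("keep", word) for word in old_words[i1:i2])
        else:
            # "replace", "delete" and "insert" all reduce to these two steps;
            # one of the slices is simply empty for a pure delete or insert.
            ops.extend(("delete", word) for word in old_words[i1:i2])
            ops.extend(("insert", word) for word in new_words[j1:j2])
    return ops


# Hypothetical sample text, for illustration only
for kind, word in split_grouped_change(
    "notify the Disclosing Party in writing",
    "notify the Disclosing Party by email",
):
    print(kind, word)
```

Note that this only produces a per-word report of a grouped change. It does not rewrite the revisions stored in the document.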

This approach allows you to maintain the performance benefits of WORD_LEVEL while achieving the granularity you need. If you have any further questions or need additional assistance, please feel free to ask!

@SKDev This is by design. Aspose.Words behavior is the same as MS Word behavior when you use word level comparison granularity:

Unfortunately, there is no way to enforce stricter per-word behavior.

Thanks for confirmation. I’m trying to replicate MS Word Compare behavior using Aspose.Words (Python, v25.7.0) with these options:

opts = aw.comparing.CompareOptions()
opts.granularity = aw.comparing.Granularity.WORD_LEVEL
opts.compare_moves = True
opts.ignore_case_changes = False
opts.ignore_comments = False
opts.ignore_formatting = False
opts.ignore_tables = False
opts.ignore_fields = False
opts.ignore_footnotes = False
opts.ignore_textboxes = False
opts.ignore_headers_and_footers = False  

adv = opts.advanced_options
adv.ignore_dml_unique_id = True
adv.ignore_store_item_id = True

Even with these flags, Aspose often clubs multiple words into a single insert/delete, whereas in MS Word this rarely happens.

Questions:

  1. Do these flags best replicate MS Word’s Compare settings?
  2. In MS Word there’s a check box for “white space”. What’s the equivalent in Aspose?

@SKDev Could you please attach your problematic input document where Aspose.Words comparison behavior does not match MS Word behavior? We will check the issue and provide you more information.

ORGINAL.docx (17.1 KB)

REVISED.docx (16.7 KB)

MS_Compare_RL.docx (20.3 KB)

example_from_ASPOSE_compare

Please find attached the original and revised sample documents, along with the redlined document generated using MS Word Compare.

Additionally, I have included a screenshot highlighting section 4, where the phrase “Promptly notify the Disclosing Party in writing of any” is entirely struck off and re-inserted by Aspose.Words, whereas MS Word Compare handles it differently by keeping the changes inline without striking off the entire phrase.
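
To gauge how often this grouping occurs across a whole document, one could flag every change whose text spans more than one word. The following is a minimal sketch in plain Python. It assumes the changes have already been collected as (kind, text) pairs, for example from the type and text of each revision; the function name multiword_changes and the sample list are hypothetical and only mirror the section 4 case described above.

```python
def multiword_changes(changes, max_words=1):
    """Return the (kind, text) pairs whose text has more than max_words words."""
    return [
        (kind, text)
        for kind, text in changes
        if len(text.split()) > max_words
    ]


# Hypothetical sample shaped like the section 4 case, for illustration only
sample = [
    ("deletion", "Promptly notify the Disclosing Party in writing of any"),
    ("insertion", "Promptly notify the Disclosing Party in writing of any"),
    ("insertion", "immediately"),
]

flagged = multiword_changes(sample)
for kind, text in flagged:
    print(f"{kind}: {len(text.split())} words: {text}")
```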

For reference, here is the snippet we used for document comparison:

import datetime

org.compare(
    document=mod,
    author="SKDev",
    date_time=datetime.datetime.now(),
    options=opts
)

(The opts configuration has already been shared earlier.)

Please note that MS Word Compare settings are identical to the ones shown in your screenshot, except that we did not enable the Headers and Footers option.

Could you please advise what changes are required in the comparison options or configuration so that Aspose.Words produces results consistent with MS Word Compare behavior?

@SKDev
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-28607

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.