I am comparing two DOCX files. The formatting of the resulting file changes slightly, especially in lists and nested items.
original_content = await original.read()
modified_content = await modified.read()
original_doc = aw.Document(BytesIO(original_content))
original_doc.update_fields()
original_doc.accept_all_revisions()
modified_doc = aw.Document(BytesIO(modified_content))
modified_doc.update_fields()
modified_doc.accept_all_revisions()
compare_options = CompareOptions()
compare_options.ignore_formatting = True
compare_options.ignore_footnotes = True
compare_options.ignore_headers_and_footers = True
original_doc.compare(modified_doc, "Comparer", datetime.datetime.now(), compare_options)
output_stream = BytesIO()
original_doc.save(output_stream, aw.SaveFormat.DOCX)
output_stream.seek(0)
return StreamingResponse(output_stream, media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document")
@LikithaThatikonda
When comparing two DOCX files using Aspose.Words, it is common to encounter formatting changes, particularly with lists or nested items. To minimize these formatting issues during the comparison, you can utilize the CompareOptions class to ignore formatting differences.
In your code, you have already set compare_options.ignore_formatting = True, which is a good approach. This setting ensures that the comparison focuses solely on the textual content, disregarding any formatting changes.
If you still notice formatting discrepancies, consider the following:
- Review Other Options: Ensure that other relevant options, such as ignore_headers_and_footers and ignore_footnotes, are configured according to your needs.
- Granularity: Adjusting the granularity of the comparison may yield better results, especially for nested lists. For instance, comparing at the word level can enhance accuracy.
Here’s an example of how to set the granularity:
compare_options.granularity = aw.comparing.Granularity.WORD_LEVEL  # Compare at word level
By focusing on the content rather than formatting, you should achieve a more accurate comparison result.
If you continue to experience issues, please provide more details about the specific formatting changes you are observing.
@LikithaThatikonda Could you please attach the problematic input documents here for testing? We will check the issue and provide you more information.
The comparison itself is good, but the formatting of the output document changes, so it looks different from the original document.
Sure, @alexey.noskov
I have attached the two documents and the response after comparison. Here the position of the text in the list changes.
Original document:
doc_1.docx (5.0 MB)
Modified document:
doc_2.docx (46.9 KB)
Response:
doc_res.docx (3.9 MB)
In the output, the ordering of the lists is off. This is just one sample document. There are multiple scenarios where the document formatting is affected when nested lists are present.
@LikithaThatikonda As far as I can see, the numbering in the output document is the same as in the comparison output produced by MS Word: ms.docx (49.3 KB)
Could you please check on your side?
Hi @alexey.noskov,
Thank you for your response.
I agree that the numbering is correct. My enquiry is about the formatting: additional tabs and spaces appear, and the document looks different from before.
This happens wherever there are nested lists.
Adding screenshots for your reference.
This is one scenario. In any document that has such lists, the formatting goes wrong and the alignment gets badly distorted. Is there any way to avoid this?
@LikithaThatikonda This occurs because the node structure of the compared documents is different. Please see the screenshot:
As you can see, in one of the documents the content is moved right using tabs, but in the other it is moved right using a left indent.
Please note that Aspose.Words, the same as MS Word, does not compare documents visually; it compares the node structure of the documents.
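To see this kind of structural difference for yourself, one option is to look at the raw WordprocessingML inside each DOCX. The following is only a minimal sketch using the Python standard library, not an Aspose.Words API: the helper names `read_document_xml` and `describe_paragraphs` and the two sample XML strings are made up for illustration. It reports, for every paragraph, how many tab characters precede the text and what direct left indent (in twips) is set on the paragraph. Indents that come from a list's numbering definition or from a style are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_document_xml(path):
    """Return the raw bytes of word/document.xml from a DOCX file (hypothetical helper)."""
    with zipfile.ZipFile(path) as docx:
        return docx.read("word/document.xml")


def describe_paragraphs(document_xml):
    """Return (text, leading_tabs, left_indent_twips) for each paragraph."""
    root = ET.fromstring(document_xml)
    rows = []
    for para in root.iter(f"{{{W}}}p"):
        # Direct paragraph indent only; style and numbering indents are ignored.
        ind = para.find("w:pPr/w:ind", NS)
        left = 0
        if ind is not None:
            left = int(ind.get(f"{{{W}}}left") or ind.get(f"{{{W}}}start") or 0)
        leading_tabs = 0
        text_parts = []
        # Walk runs only, so tab-stop definitions under w:pPr are not counted.
        for run in para.iter(f"{{{W}}}r"):
            for node in run:
                if node.tag == f"{{{W}}}tab" and not text_parts:
                    leading_tabs += 1
                elif node.tag == f"{{{W}}}t" and node.text:
                    text_parts.append(node.text)
        rows.append(("".join(text_parts), leading_tabs, left))
    return rows


# Two illustrative paragraphs that would render alike but differ structurally.
TABBED = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    "<w:r><w:tab/><w:t>Nested item</w:t></w:r>"
    "</w:p></w:body></w:document>"
)
INDENTED = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    '<w:pPr><w:ind w:left="720"/></w:pPr>'
    "<w:r><w:t>Nested item</w:t></w:r>"
    "</w:p></w:body></w:document>"
)

print(describe_paragraphs(TABBED))
print(describe_paragraphs(INDENTED))
```

For real files, `describe_paragraphs(read_document_xml("doc_1.docx"))` and the same call for doc_2.docx can be compared side by side: paragraphs with the same text but different tab counts or indents are the places where a structural comparison will report changes.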
Thank you for the explanation. It helps me understand the issue better. Is there a way to overcome this?
@LikithaThatikonda I am afraid there is no way to overcome this. The behavior is expected and Aspose.Words properly detects changes in the document.