Style Definition Change Issue when comparing two DOCX documents

Hi,

Comparing two XML (Word 2003 WordML) documents works fine, but I am facing an issue when comparing two DOCX documents.

I am not comparing the two documents directly. Instead, for each bookmark in the source document, I extract the bookmark's content and generate a new document from it. I do the same for the corresponding bookmark in the destination document. I then compare the two generated bookmark documents and check the revisions.

The requirement is as follows: the user uploads a WordML document (sample1.xml), which I need to convert to DOCX and store in the DB. Using Aspose, I load the file as WordML and convert it to DOCX with SaveFormat.DOCX.

If the user downloads the saved document, edits its content, and uploads it again, I need to compare the original document stored in the DB with the modified document.

I have attached two documents: the source document (XML converted to DOCX and saved in the DB, sample.docx) and the destination document (the edited and uploaded DOCX, sample1.docx).

When I compare these two, the revisions show the user's edits along with a style definition change for all the bookmarks that I haven't modified.

The code below loads both DOCX documents with the load format set to DOCX and compares them bookmark by bookmark.

The only difference is that the source document was generated by Aspose and never edited in MS Word, while the destination document was generated by Aspose and then modified in MS Word.


// Load both documents explicitly as DOCX.
LoadOptions loadOptions = new LoadOptions();
loadOptions.setLoadFormat(LoadFormat.DOCX);
Document srcDoc = new Document("C:\\Users\\muthu\\Desktop\\1454\\Sample.docx", loadOptions);
Document destDoc = new Document("C:\\Users\\muthu\\Desktop\\1454\\Sample1.docx", loadOptions);

BookmarkCollection srcBookMarks = srcDoc.getRange().getBookmarks();
for (int i = 0; i < srcBookMarks.getCount(); i++) {
    Bookmark srcBookMark = srcBookMarks.get(i);
    String srcBookMarkName = srcBookMark.getName();
    System.out.println(srcBookMarkName);

    // Build a standalone document from the source bookmark's content.
    ArrayList<Node> srcExtractedNodesInclusive = extractContent(
            srcBookMark.getBookmarkStart(),
            srcBookMark.getBookmarkEnd(), true);
    Document srcChoreoDoc = generateDocument(srcDoc, srcExtractedNodesInclusive);
    srcChoreoDoc.save("C:\\debug\\src_" + srcBookMarkName + ".docx");

    // Look up the bookmark with the same name in the destination document.
    Bookmark destBookmark = destDoc.getRange().getBookmarks().get(srcBookMarkName);
    if (destBookmark != null) {
        ArrayList<Node> destExtractedNodesInclusive = extractContent(
                destBookmark.getBookmarkStart(),
                destBookmark.getBookmarkEnd(), true);
        Document desChoreoDoc = generateDocument(destDoc, destExtractedNodesInclusive);
        desChoreoDoc.save("C:\\debug\\dst_" + srcBookMarkName + ".docx");

        // Compare the two bookmark documents and list the revision types.
        srcChoreoDoc.compare(desChoreoDoc, "ChoreCompare", new Date());
        for (Revision rev : srcChoreoDoc.getRevisions()) {
            System.out.println(rev.getRevisionType());
        }
    }
}

Please let me know why the style definition changes are shown.

Thanks,
Muthulakshmi


Hi Muthu,


Thanks for your inquiry. Your code uses general-purpose methods such as "extractContent" and "generateDocument". For your specific needs, you need to understand how these methods work and customize them to meet your requirements. As an example, I have modified the "generateDocument" method to improve the results: it now starts from a clone of the source document with its body emptied, instead of a new blank document, so the generated document keeps the source document's styles and settings.

public static Document generateDocument(Document srcDoc, ArrayList<Node> nodes) throws Exception
{
    // Clone the source document and empty the body of its first section.
    Document dstDoc = (Document) srcDoc.deepClone();
    dstDoc.getFirstSection().getBody().removeAllChildren();

    // Import each extracted node into the clone, keeping the source formatting.
    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    for (Node node : (Iterable<Node>) nodes)
    {
        Node importNode = importer.importNode(node, true);
        dstDoc.getFirstSection().getBody().appendChild(importNode);
    }

    return dstDoc;
}
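
If some style definition changes still show up after this change, one possible workaround (not part of the fix above) is to skip those revisions when checking for user edits. The sketch below only illustrates the filtering logic. To stay self-contained it uses a local stand-in enum named Kind whose values mirror the RevisionType constant names; Kind, RevisionFilter and userEdits are hypothetical names, and in real code you would presumably test rev.getRevisionType() against RevisionType.STYLE_DEFINITION_CHANGE instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RevisionFilter {
    // Local stand-in for the kinds of revision a comparison can report.
    // Assumption: the names mirror Aspose's RevisionType constants; they are
    // declared here only so the sketch compiles without the library.
    enum Kind { INSERTION, DELETION, FORMAT_CHANGE, STYLE_DEFINITION_CHANGE, MOVING }

    // Returns the reported revisions with style definition changes removed,
    // preserving the original order of the remaining entries.
    static List<Kind> userEdits(List<Kind> reported) {
        List<Kind> kept = new ArrayList<>();
        for (Kind kind : reported) {
            if (kind != Kind.STYLE_DEFINITION_CHANGE) {
                kept.add(kind);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example: two real edits mixed with two style definition changes.
        List<Kind> reported = Arrays.asList(
                Kind.STYLE_DEFINITION_CHANGE,
                Kind.INSERTION,
                Kind.STYLE_DEFINITION_CHANGE,
                Kind.DELETION);
        System.out.println(userEdits(reported)); // prints [INSERTION, DELETION]
    }
}
```

A bookmark whose filtered list comes back empty can then be treated as unmodified by the user.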

Hope this helps.

Best regards,