Java SDK to Compare Word Documents Offline | Get Differences as Insertion Deletion Edit and Format Change Revisions

Hi Team,

I have an urgent requirement to compare two or more documents (PDF/Word) and generate an output summary document that highlights the differences between them. Is there an offline Java SDK available from Aspose that I can use to meet this requirement?

I don’t want to upload my documents to any third-party vendor’s cloud or website because they are confidential.

So far, I have found an online SDK, GroupDocs, that meets the same requirement, but it uploads my documents to the GroupDocs cloud and generates the summary document only on their cloud platform.

The client already owns an Aspose.Words license. If any additional product needs to be purchased for this requirement, please let us know.

Documents.zip (30.7 KB)

Please find the attached sample documents: the target document, the source document, and the final output document.

Any help on this will be appreciated.

Thanks in advance!

@pratik.uddeshya1,

With the Aspose.Words for Java API you can compare Word documents only (see supported Word file formats), and yes, you can perform all document processing tasks offline. There is no need to purchase any additional APIs to compare Word documents.

You can use the Document.compare method to compare two Word documents and see the differences between them. This method mimics Microsoft Word’s Compare feature (see ms word compare.png (12.2 KB)) and expresses the differences as a set of edit and format revisions. The main idea is that rejecting all revisions yields a document equal to the original, while accepting all revisions yields the final (comparison target) document. For more details, please refer to the document comparison section of the documentation.

Regarding comparison of PDF files using Java APIs of Aspose, please post a separate thread in Aspose.PDF Product Family where you will be guided appropriately.


@awais.hafeez
We’ve gone through the code snippets in [How to Compare Two Word Documents], which only tell whether the files are equal or not. However, we couldn’t find a code snippet that generates a document containing the differences between the two files.

If the offline SDK supports this requirement, can you please share the code snippet or a link?

(https://docs.aspose.com/words/java/compare-documents/)

@naveekhan,

The following Java code example shows how to apply the compare method to two documents, inspect the resulting revisions, and then save the document with revisions to disk.

import com.aspose.words.*;
import java.util.Date;

Document doc1 = new Document();
DocumentBuilder builder = new DocumentBuilder(doc1);
builder.writeln("This is the original document.");

// You can load it from disk as well
// Document doc1 = new Document("C:\\Temp\\Original.docx");

Document doc2 = new Document();
builder = new DocumentBuilder(doc2);
builder.writeln("This is the edited document.");

// You can load it from disk as well
// Document doc2 = new Document("C:\\Temp\\Revised.docx");

// If either document has a revision, an exception will be thrown
if (doc1.getRevisions().getCount() == 0 && doc2.getRevisions().getCount() == 0)
{
    doc1.compare(doc2, "authorName", new Date());
}

// If doc1 and doc2 are different, doc1 now has some revisions after the comparison, which can now be viewed and processed
for (Revision r : doc1.getRevisions())
{
    System.out.println("Revision type: " + RevisionType.getName(r.getRevisionType()) +
            ", on a node of type " + NodeType.getName(r.getParentNode().getNodeType()));
    System.out.println("\tChanged text: " + r.getParentNode().getText());
}

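// All the revisions in doc1 are differences between doc1 and doc2, so accepting them on doc1 would transform doc1 into doc2.
// Uncomment the next line to do that; leave it commented out to keep the differences visible as tracked changes.
// doc1.getRevisions().acceptAll();

// Save doc1; with the revisions left in place, the output shows the differences as tracked changes
doc1.save("C:\\Temp\\Document.Compare.docx");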
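
To make the accept/reject idea behind those last comments concrete, here is a minimal sketch in plain Java. It is a toy model, not the Aspose.Words API: the RevisionSketch class, its Token type, and the Kind enum are hypothetical names invented for illustration, and each word is simply tagged as unchanged, inserted, or deleted.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types belong to Aspose.Words.
class RevisionSketch {
    enum Kind { UNCHANGED, INSERTION, DELETION }

    // One word of the compared document, tagged with how it differs from the original.
    static final class Token {
        final String text;
        final Kind kind;

        Token(String text, Kind kind) {
            this.text = text;
            this.kind = kind;
        }
    }

    // Accepting every revision keeps insertions and drops deletions: the edited text remains.
    static String acceptAll(List<Token> tokens) {
        return joinWithout(tokens, Kind.DELETION);
    }

    // Rejecting every revision keeps deletions and drops insertions: the original text remains.
    static String rejectAll(List<Token> tokens) {
        return joinWithout(tokens, Kind.INSERTION);
    }

    // Join all tokens with spaces, skipping those of the dropped kind.
    private static String joinWithout(List<Token> tokens, Kind dropped) {
        List<String> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (t.kind != dropped) {
                kept.add(t.text);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        List<Token> compared = List.of(
                new Token("This", Kind.UNCHANGED),
                new Token("is", Kind.UNCHANGED),
                new Token("the", Kind.UNCHANGED),
                new Token("original", Kind.DELETION),
                new Token("edited", Kind.INSERTION),
                new Token("document.", Kind.UNCHANGED));

        System.out.println(rejectAll(compared)); // This is the original document.
        System.out.println(acceptAll(compared)); // This is the edited document.
    }
}
```

Running main prints the original sentence first and the edited sentence second, which mirrors what rejecting or accepting all revisions does to doc1 in the snippet above.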

@awais.hafeez,

Thanks for the sample code. It is working as expected. However, we have identified one issue.
We are comparing sourceDoc.docx and targetDoc.docx. We have not changed anything in their table, but we still get a format-change comment in outputDoc.docx, and the radio button image appears struck through in outputDoc.docx, as you can see in the attached document.

I am attaching the sample document for your reference.

Please let us know why this behavior occurs in the output document and whether there is a solution for it.

SampleDoc.zip (218.3 KB)

@pratik.uddeshya1,

We have logged this problem in our issue tracking system with ID WORDSNET-20987 so that it can be corrected in the Aspose.Words API. We will look further into the details of this problem and keep you updated on the status of the linked issue. We apologize for the inconvenience.

I have compared the two documents by using MS Word 2019 and attached the output DOCX here for your reference:

Do you see this problem in the above DOCX generated by MS Word 2019? Please also create and attach a comparison screenshot that highlights (encircles) the problematic areas in the DOCX generated by Aspose.Words 20.8, relative to the DOCX generated by MS Word. We will then investigate this issue further on our end and provide you with more information.

@awais.hafeez,

Please find the requested document attached.
I have tried with both Aspose version 20.8 and MS Office.
The issue persists in both output documents.

Please let me know if anything else is required from my end.

RequiredDoc.zip (59.8 KB)

@pratik.uddeshya1,

It seems the issue appears on your end because the Review > “Display for Review” option of MS Word is set to “All Markup”. The issue should go away when you change it to “Simple Markup”.

@awais.hafeez

The issue still persists; you can test it with the sample documents shared above.
After the comparison, when we click the red bar in the left margin of the output document, the unchanged image is still shown as changed.

Best Regards,
Uddeshya Pratik

Hey @awais.hafeez,

Please help us find a solution for this. Any input you can give would be appreciated.

Best Regards,
Uddeshya Pratik

@pratik.uddeshya1,

MS Word 2019 produces a total of 24 Revisions in output document after comparing and Aspose.Words 20.8 produces a total of 25 Revisions in output document. The problem of extra (Formatted Table) Revision will be addressed by WORDSNET-20987. Other than that, Aspose.Words mimics the behavior of MS Word 2019 in this case.
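
When two comparison outputs disagree on the number of revisions, as with the 24 versus 25 above, one way to find the odd one out is to tally the revisions by type in each output and diff the tallies. The following is a minimal plain-Java sketch of that idea. The RevisionTally class and the sample lists are hypothetical; in practice the type names would have to be collected from each output document, for example with RevisionType.getName(r.getRevisionType()) as in the earlier snippet.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch: the strings stand in for revision type names; no Aspose.Words calls are made here.
class RevisionTally {
    // Count how many times each revision type name occurs.
    static Map<String, Integer> tally(List<String> typeNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : typeNames) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    // For each type, compute (count in actual) minus (count in expected), omitting types with no difference.
    static Map<String, Integer> surplus(Map<String, Integer> expected, Map<String, Integer> actual) {
        Map<String, Integer> diff = new TreeMap<>();
        for (Map.Entry<String, Integer> e : expected.entrySet()) {
            diff.merge(e.getKey(), -e.getValue(), Integer::sum);
        }
        for (Map.Entry<String, Integer> e : actual.entrySet()) {
            diff.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        diff.values().removeIf(v -> v == 0);
        return diff;
    }

    public static void main(String[] args) {
        // Made-up sample data, much shorter than the real outputs discussed in this thread.
        List<String> fromWord = List.of("insertion", "deletion", "deletion");
        List<String> fromAspose = List.of("insertion", "deletion", "deletion", "format change");

        System.out.println(surplus(tally(fromWord), tally(fromAspose))); // {format change=1}
    }
}
```

A positive number means the second output has that many more revisions of the given type, and a negative number means it has fewer.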

Can you please also summarize your remaining problem/requirement with the help of comparison screenshot along with the source code that you are currently using to compare documents on your end?

@awais.hafeez

Please find attached the requested sample documents, the code, and the issue description sheet.

Note: The issue occurs only for images, regardless of whether the image is part of the header, a table, or the footer. No comments are generated for these images either.

Best Regards,
Uddeshya Pratik

Samples.zip (750.7 KB)

@pratik.uddeshya1,

Aspose.Words tries to mimic the behavior of MS Word. If you compare these source and target Word documents by using MS Word, you will see that MS Word also injects revisions against the table in the final document (see Comparison by ms word 2019.zip (102.5 KB)). However, you can instruct Aspose.Words not to compare tables during comparison by using the following code:

import com.aspose.words.*;
import java.util.Date;

Document doc = new Document("C:\\temp\\Samples\\sourceDoc.docx");
Document doc2 = new Document("C:\\temp\\Samples\\targetDoc.docx");

CompareOptions compareOptions = new CompareOptions();
compareOptions.setIgnoreTables(true);
doc.compare(doc2, "AW", new Date(), compareOptions);

doc.save("C:\\temp\\Samples\\awjava-20.8-IgnoreTables.docx");

Please open output.docx with MS Word and go to Review tab > Reviewing Pane Vertical to see the revision inserted against the image (see image insertion revision.png (260.9 KB)).

The issues you found earlier (filed as WORDSNET-20987) have been fixed in the Aspose.Words for .NET 21.4 and Aspose.Words for Java 21.4 updates.