We have 2 identical PDF documents that, when compared using the code below, return 6 differences, each of which is an insertion or deletion of null text.
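' Load both PDFs into the Aspose.Words document model
Dim _Document1 As New Aspose.Words.Document(iDocument1)
Dim _Document2 As New Aspose.Words.Document(iDocument2)
' Compare; any differences are recorded as revisions in _Document1
_Document1.Compare(_Document2, "Caseflow", DateTime.Now)
' Number of differences found (0 would mean the documents match)
oDifferences = _Document1.Revisions.Count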
The 2 documents are identical: Adobe (full version) reports no differences, and neither do the online comparison tools draftable.com and diffchecker.com.
Why is Aspose showing differences?
Thanks
Document 1.pdf (252.8 KB)
Document 2.pdf (252.8 KB)
@caseflow,
The Aspose code finds differences in the images containing signatures on the third page. Although these images are pairwise identical in the two PDF documents, the order in which they are defined differs. The source documents can be simplified to demonstrate the essence of the issue. Please consider the attached documents Doc1.docx and Doc2.docx. Both documents are visually indistinguishable and contain two absolutely identical shapes. The only difference is the order in which the shapes are written: in Doc1.docx the circle comes first and the square second, while in Doc2.docx the square comes first and the circle second. If we now perform the comparison using VBA in the MS Word editor, with Doc1.docx as the active document:
Sub CompareDocument()
    ActiveDocument.Compare Name:="Doc2.docx", _
        CompareTarget:=wdCompareTargetNew
End Sub
we can see that MS Word shows the differences by suggesting to remove the circle shape, add the square shape, and so on. Aspose mimics MS Word behavior in this case. You will see the same differences if you run your source code against these simplified documents.
Doc1.docx (14.5 KB)
Doc2.docx (14.5 KB)
Thanks Vadim for the prompt response.
Our issue is that we are using the Words.Net compare feature to compare 2 signed PDF documents from DocuSign. The PDFs are returned to us by the DocuSign API, and we need to determine whether we have a new version of a signed document or just another copy of the same document (if it is the same, we can ignore it).
Based on this requirement, is there a way to ignore the changes that you observed in our 2 PDF documents within the current Aspose function set?
@caseflow
Unfortunately, the Document.Compare() process is not well suited to comparing shapes that are written at different positions in the documents.
As a workaround, you would need to carry out a rather deep analysis of the revisions produced by the comparison, namely, analyze the _Document1.Revisions[index].ParentNode values. In your case, these can only be Shape objects. You would then need to check whether the content of these shapes is identical, which can be established by calculating a checksum of Shape.ImageData.ImageBytes.
In your case, the identical shapes are in revisions 0 and 4, and in revisions 1 and 2. But there can be more than two swapped shapes, in which case the analysis becomes even more difficult.
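The analysis described above could be sketched roughly as follows in VB.NET. This is only an illustration under assumptions, not a built-in Aspose feature: the names ReorderedImageCheck, Tally, IsBalanced and OnlyReorderedImages are made up for this example, SHA-256 stands in for the checksum, and the code assumes that a harmless difference is always an inserted or deleted image shape (any other revision is treated as a real change).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Security.Cryptography
Imports Aspose.Words
Imports Aspose.Words.Drawing

Module ReorderedImageCheck

    ' Hypothetical helper: adds delta to the running count kept for one image hash.
    Sub Tally(balance As Dictionary(Of String, Integer), imageBytes As Byte(), delta As Integer)
        Using sha As SHA256 = SHA256.Create()
            Dim key As String = Convert.ToBase64String(sha.ComputeHash(imageBytes))
            Dim current As Integer = 0
            balance.TryGetValue(key, current) ' current stays 0 when the key is new
            balance(key) = current + delta
        End Using
    End Sub

    ' Hypothetical helper: True when every hash was inserted as often as it was deleted.
    Function IsBalanced(balance As Dictionary(Of String, Integer)) As Boolean
        For Each count As Integer In balance.Values
            If count <> 0 Then Return False
        Next
        Return True
    End Function

    ' Call after doc.Compare(...). Returns True only if all revisions are
    ' insertions/deletions of image shapes that cancel each other out by hash.
    Function OnlyReorderedImages(doc As Document) As Boolean
        Dim balance As New Dictionary(Of String, Integer)()
        For Each rev As Revision In doc.Revisions
            ' Checking the type first also avoids reading ParentNode on
            ' style-definition revisions, which have no parent node.
            If rev.RevisionType <> RevisionType.Insertion AndAlso
               rev.RevisionType <> RevisionType.Deletion Then Return False

            ' Assumption: harmless revisions always sit on an image shape.
            Dim shp As Shape = TryCast(rev.ParentNode, Shape)
            If shp Is Nothing OrElse Not shp.HasImage Then Return False

            Dim delta As Integer = If(rev.RevisionType = RevisionType.Insertion, 1, -1)
            Tally(balance, shp.ImageData.ImageBytes, delta)
        Next
        Return IsBalanced(balance)
    End Function

End Module
```

After _Document1.Compare(...), a True result from OnlyReorderedImages(_Document1) would suggest that the second PDF is just another copy, while False means the revisions should be treated as real changes. Counting matches per hash, rather than pairing revisions by index, is meant to cope with more than two swapped shapes, but please verify it against your own documents before relying on it.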
Comparing the documents before they are processed by DocuSign is also worth considering.