Hi team
We are using Aspose.Words for Java and are seeing unexpected changes reported in table content when comparing documents.
- Compare the attached File 3 and File 4 and check the resulting DOCX file.
Actual: The table format is reported as changed, although the user did not modify it. Please see result file 2 for reference.
File 3.docx (58.9 KB)
File 4.docx (38.3 KB)
Please find the attached files for reference.
We used the following code snippet.
private void compareDocuments(Document original, Document revised, ComparisonRequest comparisonRequest)
        throws Exception {
    // Accept any existing revisions in both documents before comparing
    if (original.hasRevisions())
        original.acceptAllRevisions();
    if (revised.hasRevisions())
        revised.acceptAllRevisions();

    // The differences are recorded as revisions in the original document
    original.compare(revised, "Author", new java.util.Date());

    // Save the comparison result as DOCX
    updateRevisionProperties(original);
    original.save(
            comparisonRequest.getOutputFileDirectory() + File.separator + comparisonRequest.getOutputFileName());

    // Save the comparison result as PDF; PDF/UA-1 compliance is set for accessibility
    PdfSaveOptions saveOptions = new PdfSaveOptions();
    saveOptions.setCompliance(PdfCompliance.PDF_UA_1);
    original.save(comparisonRequest.getOutputPDFFileDirectory() + File.separator
            + comparisonRequest.getOutputPDFFileName(), saveOptions);
}
@abhishek.sonkar As far as I can see, MS Word also detects changes in the table. Here is the comparison result produced by MS Word on my side:
ms.docx (39.6 KB)
So the Aspose.Words behavior is expected.
I inspected your documents. As far as I can see, File 3.docx was produced by an old 16.2 version, while File 4.docx has been edited or simply opened and saved by MS Word or another tool. This operation changed some properties in the table.
Before (File 3.docx):
<w:tblPr>
    <w:tblStyle w:val="TableGrid" />
    <w:tblW w:w="0" w:type="auto" />
    <w:tblInd w:w="0" w:type="dxa" />
    <w:tblBorders>
        <w:top w:val="none" w:sz="0" w:space="0" w:color="auto" />
        <w:left w:val="none" w:sz="0" w:space="0" w:color="auto" />
        <w:bottom w:val="none" w:sz="0" w:space="0" w:color="auto" />
        <w:right w:val="none" w:sz="0" w:space="0" w:color="auto" />
        <w:insideH w:val="none" w:sz="0" w:space="0" w:color="auto" />
        <w:insideV w:val="none" w:sz="0" w:space="0" w:color="auto" />
    </w:tblBorders>
    <w:tblLayout w:type="fixed" />
    <w:tblCellMar>
        <w:top w:w="0" w:type="dxa" />
        <w:left w:w="108" w:type="dxa" />
        <w:bottom w:w="0" w:type="dxa" />
        <w:right w:w="108" w:type="dxa" />
    </w:tblCellMar>
    <w:tblLook w:val="04A0" />
</w:tblPr>
After (File 4.docx):
<w:tblPr>
    <w:tblStyle w:val="TableGrid"/>
    <w:tblW w:w="0" w:type="auto"/>
    <w:tblBorders>
        <w:top w:val="none" w:sz="0" w:space="0" w:color="auto"/>
        <w:left w:val="none" w:sz="0" w:space="0" w:color="auto"/>
        <w:bottom w:val="none" w:sz="0" w:space="0" w:color="auto"/>
        <w:right w:val="none" w:sz="0" w:space="0" w:color="auto"/>
        <w:insideH w:val="none" w:sz="0" w:space="0" w:color="auto"/>
        <w:insideV w:val="none" w:sz="0" w:space="0" w:color="auto"/>
    </w:tblBorders>
    <w:tblLayout w:type="fixed"/>
    <w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
</w:tblPr>
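As you can see, the table properties differ: File 3.docx has w:tblInd and w:tblCellMar elements that are missing from File 4.docx, and File 4.docx has extra attributes on w:tblLook. Most likely these differences are detected by both the Aspose.Words and MS Word comparison algorithms.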
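To spot this kind of difference without reading the XML by eye, something like the following could be used. It is only a rough sketch using the standard JDK XML parser, not part of Aspose.Words: the class name TblPrDiff and its methods are hypothetical, it assumes each input string is the complete main document XML of a DOCX file (so the w: namespace is declared), and it looks only at the first table. It compares element names only, so attribute-level changes such as the extra w:tblLook attributes would not be reported.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists which <w:tblPr> child elements exist in one document but not the other.
public class TblPrDiff {

    // WordprocessingML main namespace (the "w:" prefix in document.xml)
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the local names of the child elements of the first w:tblPr, in document order. */
    static Set<String> tblPrChildren(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));

        Set<String> names = new LinkedHashSet<>();
        NodeList tblPrs = doc.getElementsByTagNameNS(W_NS, "tblPr");
        if (tblPrs.getLength() == 0) {
            return names; // no table in the document
        }
        NodeList children = tblPrs.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getLocalName());
            }
        }
        return names;
    }

    /** Returns the w:tblPr child names present in the first document but absent from the second. */
    static Set<String> onlyInFirst(String firstXml, String secondXml) throws Exception {
        Set<String> result = tblPrChildren(firstXml);
        result.removeAll(tblPrChildren(secondXml));
        return result;
    }
}
```

For documents whose first table has the two w:tblPr fragments quoted above, onlyInFirst(file3Xml, file4Xml) would contain tblInd and tblCellMar.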
Thank you for this response, @alexey.noskov. Could you please explain how we can view this XML data, so that we can avoid this in the future? Thank you for the quick reply.
I'll check it again on my side.
@abhishek.sonkar You can simply unzip the DOCX document (DOCX is a ZIP archive with XML files inside) and inspect document.xml.
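If unzipping by hand is inconvenient, the main document XML can also be read in code. The following is a minimal sketch using only java.util.zip from the JDK; the class name DocxXmlReader is hypothetical, it requires Java 9 or later for readAllBytes(), and it assumes the entry is UTF-8 encoded, which is what MS Word normally writes. Inside the archive the file is stored as word/document.xml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: pulls word/document.xml out of a DOCX (ZIP) package.
public class DocxXmlReader {

    /** Returns the text of word/document.xml, or null if the archive has no such entry. */
    public static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // On a ZipInputStream, readAllBytes() stops at the end of the current entry.
                    // Assumption: the XML is UTF-8 encoded.
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    /** Convenience overload that opens the DOCX file from disk. */
    public static String readDocumentXml(Path docxFile) throws IOException {
        try (InputStream in = Files.newInputStream(docxFile)) {
            return readDocumentXml(in);
        }
    }
}
```

For example, DocxXmlReader.readDocumentXml(Path.of("File 3.docx")) would return the XML that contains the w:tblPr element, assuming the file is in the working directory.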
Hi @alexey.noskov
I tried opening the DOCX file with an unzip tool, but it fails. Could you please explain how you are doing it? Thank you.
@abhishek.sonkar You can simply change the file extension to .zip and unzip it like any other ZIP archive.
Hi @alexey.noskov
I tried doing that, but the file still does not open. Could you share a short video showing the steps?
@abhishek.sonkar I usually use 7-Zip, but if you change the extension to ZIP, you can use the standard Windows tools to unpack the archive.
Actually, I was facing this issue because I was opening the file on macOS. I found this link and am checking whether it works:
https://superuser.com/questions/278260/how-do-i-see-the-xml-of-my-docx-document
@abhishek.sonkar You are right, for some reason the standard Archive Utility on macOS cannot unzip a DOCX document, but I can successfully unzip it using the Unzip One tool.
Hi @alexey.noskov
Thank you for these inputs. I have one more question: how do you compare the files when the XML is large? Are there any specific tools for comparing XML?
@abhishek.sonkar I usually use TortoiseSVN to compare text files. In the case of XML, it is also convenient to pretty-format the files first. I use XML Notepad or Visual Studio (Ctrl+K, Ctrl+D).
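Pretty-formatting can also be scripted, so that any line-based diff tool gets one element per line. This is just a sketch with the standard JDK XSLT identity transformer; the class name XmlPrettyPrinter is hypothetical, and the exact indentation width depends on the JDK version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: re-serializes XML with line breaks and indentation.
public class XmlPrettyPrinter {

    /** Returns the given XML re-serialized with indentation and without the XML declaration. */
    public static String prettyPrint(String xml) throws Exception {
        // An identity transformer copies the input unchanged apart from the output settings below.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Running both document.xml files through prettyPrint and saving the results as text files should make the differing w:tblPr lines easy to find in a side-by-side comparison.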
Hi @alexey.noskov
We have talked to our stakeholders, and they said that the extra lines should not be shown when a customer has only opened and saved the DOCX file in a different version, because they give incorrect information about what actually changed.
@abhishek.sonkar As mentioned above, Aspose.Words mimics MS Word behavior when comparing documents. MS Word also shows differences between the documents you provided in your initial post. So there is no problem with the Aspose.Words comparison algorithm here.
Hi @alexey.noskov, can we connect over a call? For a few files I can see only 1 insertion as the difference in the DOCX, but the code shows different values.
@abhishek.sonkar Unfortunately, I do not see a way to correct this via code, since making changes to the document might lead to other unexpected revisions in the comparison output.