Convert DOCX to HTML & HTML to Word using Java | Compare Final Document with Original DOCX | Track Changes (Revisions)

Simple example:

Document doc = new Document("C:\\bug\\test1.docx");

String pHtml  = doc.getFirstSection().getBody().toString(SaveFormat.HTML);
DocumentBuilder builder = new DocumentBuilder();
builder.insertHtml(pHtml);
Document newDoc = builder.getDocument();
newDoc.save("C:\\bug\\test11.docx");

doc.compare(newDoc, "me", new Date());
doc.save("C:\\bug\\test123.docx");

Problem: after exporting the document to HTML and re-importing it into Aspose.Words, the space character in the resulting document has changed. After the comparison I get a revision, even though the document did not visibly change. How can I fix this? Visual explanation: http://joxi.ru/a2XaaZBHwylYXA
bug.zip (22.9 KB)
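
To check which space character actually differs between the two documents, it can help to dump the code points of every space-like character in their text. The following is only a minimal sketch using the plain JDK. The class name `WhitespaceProbe` and the method `describeSpaces` are hypothetical helpers, not part of Aspose.Words, and the non-breaking space in the sample is an assumption about what the HTML round trip might produce, not a confirmed cause.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; it is not part of Aspose.Words.
final class WhitespaceProbe {

    // Returns one entry per space-like character in the form "index:U+XXXX".
    static List<String> describeSpaces(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            // isWhitespace() skips non-breaking spaces, so isSpaceChar() is checked as well.
            if (Character.isWhitespace(cp) || Character.isSpaceChar(cp)) {
                found.add(i + ":" + String.format("U+%04X", cp));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // Assumed sample data: a regular space versus a non-breaking space.
        String original = "Hello world";
        String roundTripped = "Hello\u00A0world";
        System.out.println(describeSpaces(original));
        System.out.println(describeSpaces(roundTripped));
    }
}
```

With the real files, the two strings could be taken from `doc.getText()` and `newDoc.getText()` instead of the sample literals. If the lists differ only in the code point, the revision comes from a different space character rather than from a change in the visible text.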

@handmade,

You can work around this issue by using the following code:

Document doc = new Document("D:\\Temp\\bug\\test1.docx");

HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.HTML);
String pHtml  = doc.getFirstSection().getBody().toString(opts);

DocumentBuilder builder = new DocumentBuilder();
builder.insertHtml(pHtml);
Document newDoc = builder.getDocument();
newDoc.save("D:\\temp\\bug\\html-doc.docx");

doc.compare(newDoc, "me", new Date());

doc.save("D:\\Temp\\bug\\awjava-18.7.doc");

We tested the scenario and reproduced the problem on our end. We have logged it in our issue tracking system as WORDSNET-17198. We will look into the details and keep you updated on the status of the fix. We apologize for the inconvenience.

@handmade,

Please try running the following code with the latest version of Aspose.Words for Java, i.e. 19.6, on your end:

Document doc = new Document("E:\\Temp\\bug\\test1.docx");

HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.HTML);
String pHtml = doc.getFirstSection().getBody().toString(opts);

DocumentBuilder builder = new DocumentBuilder();
builder.insertHtml(pHtml);
Document newDoc = builder.getDocument();
newDoc.save("E:\\temp\\bug\\html-doc.docx");

doc.compare(newDoc, "me", new Date());

doc.save("E:\\Temp\\bug\\awjava-19.6.doc"); 

Attachments: Docs.zip (17.0 KB)

Please see the revisions in the msw-2019-compared.docx document (using the Revision pane). There are a few formatting revisions, but you originally reported that MS Word 2016 does not produce revisions. The Aspose.Words output closely matches the output of MS Word 2019. Please let us know if any problems remain.

In addition, the steps below show how msw-2019-compared.docx was produced on our end.

  • Open original test1.docx with MS Word 2019. For us it is opened in Compatibility Mode by default.
  • Go to Review -> Compare
  • Specify ‘Original Document’ as test1.docx and ‘Revised Document’ as html-doc.docx (it is also attached)
  • Press OK.
  • Go to File -> Save As and save as a .docx file.

This was how msw-2019-compared.docx was produced.

@vhostt,

Regarding WORDSNET-17198, we have decided to close this issue due to a lack of further information from your end. If you have further inquiries or need any help in the future, please let us know, and we will create a new ticket and look into it further.

The issues you found earlier (filed as WORDSNET-17198) have been fixed in the Aspose.Words for .NET 20.6 and Aspose.Words for Java 20.6 updates.