Converting .doc to .html to .doc

Hi,
I want to let the users of my Java application open a Word document and add merge fields to it. To do so, I assume I have to use the following workflow:

  • convert .doc to .html so that I can open the document in a JPanel
  • the user places the cursor where they want to add the merge field and clicks to insert it
  • my app adds the merge field to the HTML code, then takes the whole HTML source as a string and converts it back to .doc

Is this workflow the right way to do what I want?
I’m using Aspose.Words 13.12.0.0 and I’m facing several problems: poor .doc to .html conversion (especially with tables), and an OutOfMemoryError when converting .html to .doc or, when the conversion succeeds, a new Word document that contains the HTML markup…
I’m just getting started with the Aspose API and I’m not yet convinced that what I want to do is possible…
Would I be able to properly convert my .doc to .html, insert new markup, then convert it back to .doc without making a mess of it?
Regards
Yoann

Hi Yoann,

Thanks for your inquiry. Please upgrade to the latest version of Aspose.Words for Java (15.5.0) and let us know if you still face any issues while using Aspose.Words.

Please use HtmlSaveOptions.ExportRoundtripInformation to specify whether to write roundtrip information when saving to HTML, MHTML or EPUB. The default value is true for HTML and false for MHTML and EPUB.

Saving the roundtrip information makes it possible to restore document properties such as tab stops, comments, headers and footers when the HTML document is loaded back into a Document object. When true, the roundtrip information is exported as -aw-* CSS properties of the corresponding HTML elements.
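
One way to check whether an exported HTML string actually carries roundtrip information is to look for -aw-* property names in it. The following is a minimal sketch in plain Java that does not call Aspose.Words. The class RoundtripCheck and the method roundtripProperties are hypothetical names, and the sample markup in main is made up for illustration rather than copied from real Aspose.Words output.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RoundtripCheck {
    // Matches a CSS property name that starts with "-aw-" and is followed by a colon.
    // The lookbehind avoids matching "-aw-" in the middle of a longer identifier.
    private static final Pattern AW_PROPERTY =
            Pattern.compile("(?<![A-Za-z0-9_-])(-aw-[a-z0-9-]+)\\s*:");

    /** Returns the distinct -aw-* property names found in the HTML, sorted. */
    static Set<String> roundtripProperties(String html) {
        Set<String> names = new TreeSet<>();
        Matcher m = AW_PROPERTY.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative markup only (assumption, not real exporter output).
        String html = "<div style=\"-aw-headerfooter-type:header-primary\">"
                + "<p style=\"margin:0pt; -aw-import:ignore\">x</p></div>";
        System.out.println(roundtripProperties(html));
    }
}
```

An empty result suggests the HTML was saved without roundtrip information, in which case loading it back is unlikely to restore the properties listed above.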

Could you please share the merge field code you are inserting into the HTML, along with the output saved as DOCX? We will then provide more information on this, along with code.

Hi,

I haven’t tried to insert any code yet, as I’m not even able to convert my .doc to HTML and then back to .doc and get my original Word document back.

I tried many things: working with streams, with files, etc…

Here is the code I’m using:

  • import and convert .doc:
File file = jFileChooser.getSelectedFile();
InputStream stream = new FileInputStream(file);
doc = new Document(stream);
HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
options.setExportEmbeddedCss(true);
options.setExportEmbeddedFonts(true);
options.setExportEmbeddedImages(true);
options.setExportEmbeddedSvg(true);
ByteArrayOutputStream dstStream = new ByteArrayOutputStream();
doc.save(dstStream, options);
ByteArrayInputStream srcStream = new ByteArrayInputStream(dstStream.toByteArray());
InputStreamReader reader = new InputStreamReader(srcStream);
int data = reader.read();
StringBuilder texte = new StringBuilder();
while(data != -1){
    char theChar = (char) data;
    data = reader.read();
    texte.append(theChar);
}
reader.close();
editeurHtmlJPanel.setHTMLText(texte.toString());

I have attached an image of what I get in my htmleditorJPanel.

  • convert HTML to .doc:
File fichierHtml = DesktopUtil.createFile("out.html", true, Session.getInstance().getAppDirTmp());
File fichierDoc = DesktopUtil.createFile("out.doc", true, Session.getInstance().getAppDirTmp());
HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
options.setExportEmbeddedCss(true);
options.setExportEmbeddedFonts(true);
options.setExportEmbeddedImages(true);
options.setExportEmbeddedSvg(true);
ByteArrayInputStream srcStream = new ByteArrayInputStream(editeurHtmlJPanel.getHTMLText().getBytes());
Document docu = new Document(srcStream);
docu.save(fichierHtml.getPath(),options);
docu = new Document(fichierHtml.getPath());
docu.save(fichierDoc.getPath(),SaveFormat.DOC);
DesktopUtil.open(fichierDoc);

This throws an OutOfMemoryError…

For now I’m evaluating what we would be able to do with Aspose… I don’t need to succeed right away, but I want to know whether it is worth spending time on it… Are you confident enough in your product to say that Aspose will be able to perform these tasks properly?

Hi Yoann,

Thanks for sharing the details. Please note that Aspose.Words mimics the behavior of MS Word. Aspose.Words converts MS Word documents to HTML in the same way as the MS Word save option “Web Page, Filtered”. If you convert your document to HTML using MS Word, you will get the same output.

Moreover, some HTML features might be lost during processing. You can find a list of the limitations of HTML export and import here:
https://docs.aspose.com/words/java/load-in-the-html-html-xhtml-mhtml-format/

In your case, I suggest you use HtmlSaveOptions and the following approach to achieve your requirements.

  1. Insert a bookmark in the HTML, e.g. html_mergefield.
  2. Open the HTML in the Aspose.Words DOM.
  3. Move the cursor to the bookmark using the DocumentBuilder.moveToBookmark method.
  4. Insert the mail merge field using the DocumentBuilder.insertField method.
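
As a rough illustration of step 1, the sketch below inserts an empty named anchor into an HTML string at a caret offset. It is plain Java and does not call Aspose.Words. The class BookmarkAnchor, the method insertAnchor and the way the offset is obtained are hypothetical and chosen for this example. It also assumes that Aspose.Words imports an <a name="..."> element as a bookmark with that name, which is worth verifying on your version.

```java
import java.util.regex.Pattern;

final class BookmarkAnchor {
    // Conservative name check (letters, digits, underscore) so the name
    // cannot break out of the attribute value.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns a copy of html with an empty named anchor inserted at offset. */
    static String insertAnchor(String html, int offset, String name) {
        if (offset < 0 || offset > html.length()) {
            throw new IndexOutOfBoundsException("offset out of range: " + offset);
        }
        if (!VALID_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("unsupported bookmark name: " + name);
        }
        // Simplified check: refuse offsets that sit between '<' and '>'.
        int lastOpen = html.lastIndexOf('<', offset - 1);
        int lastClose = html.lastIndexOf('>', offset - 1);
        if (lastOpen > lastClose) {
            throw new IllegalArgumentException("offset " + offset + " falls inside a tag");
        }
        String anchor = "<a name=\"" + name + "\"></a>";
        return html.substring(0, offset) + anchor + html.substring(offset);
    }

    public static void main(String[] args) {
        String html = "<p>Dear ,</p>";
        // Offset 8 is the position right after "Dear ".
        System.out.println(insertAnchor(html, 8, "html_mergefield"));
    }
}
```

The resulting HTML would then be loaded into a Document (step 2), after which DocumentBuilder.moveToBookmark("html_mergefield") and DocumentBuilder.insertField cover steps 3 and 4. The tag check is deliberately simple and does not handle a '<' inside attribute values, comments or scripts.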

Hope this helps. Please let us know if you have any more queries.