Aspose.Words to HTML

Hi,
We are trying to convert docx to html, but we are facing several layout issues. These are the two approaches we tried:

  1. Using HtmlSaveOptions:

         com.aspose.words.HtmlSaveOptions htmlSaveOptions = new com.aspose.words.HtmlSaveOptions();
         htmlSaveOptions.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);
         htmlSaveOptions.setExportImagesAsBase64(true);
         htmlSaveOptions.setExportFontsAsBase64(true);
         htmlSaveOptions.setUseHighQualityRendering(true);
         htmlSaveOptions.setEncoding(StandardCharsets.UTF_8);
         htmlSaveOptions.setPrettyFormat(true);
         
         com.aspose.words.Document wordsDocument = new com.aspose.words.Document(new ByteArrayInputStream(FileUtils.readFileToByteArray(new File("C:\\Project_Data\\sblc\\Enhancement at TCFE322.docx"))));
    
       wordsDocument.save(outputStreamHtml, htmlSaveOptions);
    

With this approach, the converted html is not properly aligned, has missing fields, and has many bookmarks attached.

So we tried another approach:

  2. Using HtmlFixedSaveOptions:

        com.aspose.words.HtmlSaveOptions htmlSaveOptions = new com.aspose.words.HtmlSaveOptions();
         htmlSaveOptions.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);
         htmlSaveOptions.setExportImagesAsBase64(true);
         htmlSaveOptions.setExportFontsAsBase64(true);
         htmlSaveOptions.setUseHighQualityRendering(true);
         htmlSaveOptions.setEncoding(StandardCharsets.UTF_8);
         htmlSaveOptions.setPrettyFormat(true);
    
         com.aspose.words.Document wordsDocument = new com.aspose.words.Document(new ByteArrayInputStream(FileUtils.readFileToByteArray(new File("C:\\Project_Data\\sblc\\Enhancement at TCFE322.docx"))));
    
        wordsDocument.save(outputStreamHtml, htmlSaveOptions);
    

With this approach the converted html looks fine, but the first two fields still overlap. Also, the converted html size is 1135 KB, whereas with the first approach it was just 64 KB.

===================================================================

I’ve attached the docx file. Please help with this.

Thanks!

Enhancement at TCFE322.docx (25.9 KB)

@surajnayak57
We have reproduced the issue with the incorrect layout. Could you please send us the exact code you use in the HtmlFixed conversion case, so that the fix meets your expectations? The code you sent seems to be just a copy-paste of the first case. Please also send us the output documents you get in both the Html and HtmlFixed cases.
Please ZIP and upload these files here.
We will check the issue and provide you more information.

Hi,

Please find the requested files attached.

Also, please take note of the converted file size, apart from the layout problems.

Thanks!

Aspose_WordToHtml_Issue.zip (231.5 KB)

@surajnayak57 Thank you for the additional information. I have logged these issues as WORDSNET-23411, WORDSNET-23412 and WORDSNET-23413. We will keep you informed and let you know once they have been resolved.
As for the differences in bookmarks, this is because you process them differently: in one case you delete them all, and in the other you don’t delete any, since your condition is never met.

Hi,

No, I had tried removing that condition too. In HtmlSaveOptions, anchors are getting created inside

, maybe because of that it is able to detect them as bookmarks. Please check after removing that condition.

Thanks!

@surajnayak57 I opened the original “Enhancement at TCFE322.docx”, then opened “Enhancement_at_TCFE322_Using_HtmlSaveOptions.html” produced with the bookmark-removal lines commented out. In the output I get the same number of bookmarks: 6. Please give the name of the bookmark that you think gets lost.

Hi,

Yes, only those 6 bookmarks. I wanted to remove them, but the code below is not detecting them.

    for (Bookmark bm : wordsDocument.getRange().getBookmarks()) {
        bm.remove();
    }

I am guessing that it is because those bookmarks are in

tag. Please let me know how I can remove those bookmarks, because I’m displaying this html content inside TinyMCE in the UI, and there I am getting black bookmark markers for all these anchor tags.

@surajnayak57 In my case, all bookmarks are deleted.

Enhancement_at_TCFE322_Using_HtmlSaveOptions.zip (10.1 KB)
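
If anchors still show up in the exported html and TinyMCE renders them as bookmark markers, one possible fallback is to strip them from the html string after the save. The following is only a minimal sketch in plain Java, not an Aspose API: the class name `AnchorStripper` and its method are hypothetical, and it assumes the bookmarks appear as empty anchors of the form `<a name="..."></a>`, which should be checked against the actual output.

```java
import java.util.regex.Pattern;

public class AnchorStripper {
    // Assumption: exported bookmarks look like <a name="X"></a> with no other
    // attributes and no content. Anchors with href or inner text are left alone.
    private static final Pattern EMPTY_NAMED_ANCHOR = Pattern.compile(
            "<a\\s+name\\s*=\\s*\"[^\"]*\"\\s*>\\s*</a>",
            Pattern.CASE_INSENSITIVE);

    // Returns the html with every empty named anchor removed.
    static String stripEmptyNamedAnchors(String html) {
        return EMPTY_NAMED_ANCHOR.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p><a name=\"Field1\"></a>Applicant: <a href=\"#x\">link</a></p>";
        // The named anchor is dropped; the href link is kept.
        System.out.println(stripEmptyNamedAnchors(html));
    }
}
```

A regular expression is a blunt tool for html, so this only fits markup as regular as the assumed pattern. Removing the bookmarks from the document before saving remains the cleaner route when it works.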

@surajnayak57 Regarding the increase of the output document size in HtmlFixed format: this is due to how the Shading of runs is converted. When exporting to HtmlFixed format, an image that implements the Shading effect is drawn in the background of each run symbol. When you perform the conversion with the ExportEmbeddedImages option turned off, the image is stored once as a separate file and is referenced by a link at every place of use in the resulting html. When you perform the conversion with the ExportEmbeddedImages option enabled, the image is included in the html text as Base64 code at every place of use. This increases the size of the resulting html document.
The rest of your questions are being discussed now. We will inform you within this forum thread as soon as we have more information.
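
To illustrate the size effect described above, here is a rough back-of-the-envelope sketch in plain Java. It does not call Aspose.Words. The class name `EmbeddedImageSize`, the `<img>` tag shape, the file name `shade.png`, and the numbers (a 300-byte image used 1000 times) are all assumptions made up for the arithmetic, not the markup or sizes the library actually produces.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EmbeddedImageSize {
    // Bytes of markup when the image is inlined as a Base64 data URI at every use.
    static long embeddedSize(byte[] image, int uses) {
        String dataUri = "data:image/png;base64,"
                + Base64.getEncoder().encodeToString(image);
        String tag = "<img src=\"" + dataUri + "\">";
        return (long) tag.getBytes(StandardCharsets.UTF_8).length * uses;
    }

    // Bytes of markup plus one external file when every use is only a link.
    static long linkedSize(byte[] image, int uses, String fileName) {
        String tag = "<img src=\"" + fileName + "\">";
        return (long) tag.getBytes(StandardCharsets.UTF_8).length * uses
                + image.length;
    }

    public static void main(String[] args) {
        byte[] image = new byte[300]; // hypothetical shading image
        int uses = 1000;              // hypothetical number of shaded symbols
        System.out.println("embedded: " + embeddedSize(image, uses) + " bytes");
        System.out.println("linked:   " + linkedSize(image, uses, "shade.png") + " bytes");
    }
}
```

Base64 also inflates the image by about a third, but the dominant factor is that the encoded image is repeated at every use instead of being stored once.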

The issues you have found earlier (filed as WORDSNET-23412) have been fixed in this Aspose.Words for Java 22.4 update.