HTML to docx conversion is missing fonts and images

Hi Team, I am using Aspose.Words to convert DOCX to HTML so that the document can be edited in the browser. The browser shows everything as one page, without any page separation. Please let me know your suggestions for showing page separation with SaveFormat.HTML.

Note: I have tried SaveFormat.HTMLFIXED, which does show page separation (I wrote the output to a .html file and loaded it into the browser to edit). After editing in the browser, when I convert the HTML back to DOCX, the fonts and images are missing from the DOCX file. I have tried setting the font and image folder paths explicitly, and I have also copied the fonts into Windows, but the issue persists. It looks like the path is not recognized.

Regards
Arul

@arulsundarama

Can you please provide more details about how you are converting the DOCX file to HTML and the specific code you are using for the conversion? Additionally, what method are you using to save the edited HTML back to DOCX?

Hi, below is the code I use to convert DOCX to HTML and HTML back to DOCX. Please review it.
Code to convert DOCX to HTML:

```java
import com.aspose.words.*;
import java.nio.charset.StandardCharsets;

Document doc = new Document(myDir + fileName);
// Work on a deep copy so the loaded document itself is not modified
Document printDoc = doc.deepClone();

PageSetup pageSetup = printDoc.getFirstSection().getPageSetup();
// Page orientation (setLayoutMode expects a SectionLayoutMode value, not an Orientation)
pageSetup.setOrientation(Orientation.PORTRAIT);
// The calls below re-apply the document's current values, so as written they change nothing;
// pass explicit values to change lines per page, paper size or margins
pageSetup.setLinesPerPage(pageSetup.getLinesPerPage());
pageSetup.setPaperSize(pageSetup.getPaperSize());
pageSetup.setTopMargin(pageSetup.getTopMargin());
pageSetup.setBottomMargin(pageSetup.getBottomMargin());
pageSetup.setLeftMargin(pageSetup.getLeftMargin());
pageSetup.setRightMargin(pageSetup.getRightMargin());
printDoc.updatePageLayout();

HtmlSaveOptions htmlSaveOptions = new HtmlSaveOptions(SaveFormat.HTML);
htmlSaveOptions.setExportFontResources(true);
htmlSaveOptions.setExportFontsAsBase64(true);
htmlSaveOptions.setExportRoundtripInformation(true);
htmlSaveOptions.setExportDocumentProperties(true);
htmlSaveOptions.setExportPageSetup(true);
htmlSaveOptions.setExportPageMargins(true);
htmlSaveOptions.setExportImagesAsBase64(true);
htmlSaveOptions.setPrettyFormat(true);
htmlSaveOptions.setExportTocPageNumbers(true);
htmlSaveOptions.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);
htmlSaveOptions.setExportHeadersFootersMode(ExportHeadersFootersMode.FIRST_SECTION_HEADER_LAST_SECTION_FOOTER);
htmlSaveOptions.setEncoding(StandardCharsets.UTF_8); // how to set "chunked"?
String html = printDoc.toString(htmlSaveOptions);
```

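Code to convert HTML to DOCX:

```java
import com.aspose.words.*;

// Build a new document from the edited HTML
Document inputDocx = new Document();
DocumentBuilder builder = new DocumentBuilder(inputDocx);
builder.insertHtml(htmlData);

// DocSaveOptions targets the binary .doc format; OoxmlSaveOptions is the DOCX counterpart
OoxmlSaveOptions outputDocOptions = new OoxmlSaveOptions(SaveFormat.DOCX);
outputDocOptions.setPrettyFormat(true);
inputDocx.save(myDir + docFileName, outputDocOptions);
```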
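A plausible cause of the missing images is that `insertHtml` receives the HTML as a bare string, so relative `src` paths (such as those pointing into an HtmlFixed resources folder) have nothing to be resolved against. Below is a minimal sketch, assuming the images sit in a local folder whose location you know: the hypothetical helper `ImageSrcResolver.absolutize` rewrites relative `<img src>` values into absolute `file:` URIs before the string is handed to `builder.insertHtml(...)`. It uses a regular expression rather than a real HTML parser, so treat it as an illustration. If you load the HTML as a whole document instead of inserting it, Aspose.Words' `LoadOptions.setBaseUri` should serve the same purpose. This only concerns image paths; it does not address fonts, and it does not make HtmlFixed output suitable for a roundtrip.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ImageSrcResolver {
    // Matches src="..." or src='...' inside an <img> tag; "\s" before src skips attributes like data-src
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\ssrc\\s*=\\s*)([\"'])(.*?)\\2", Pattern.CASE_INSENSITIVE);

    private ImageSrcResolver() {}

    /**
     * Resolves relative image sources against baseUri (a folder URI ending with "/").
     * Absolute sources (data:, http:, file:, ...) and values that are not valid URIs are left as they are.
     */
    public static String absolutize(String html, URI baseUri) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String src = m.group(3);
            String resolved = src;
            try {
                URI uri = new URI(src);
                if (!uri.isAbsolute()) {
                    resolved = baseUri.resolve(uri).toString();
                }
            } catch (URISyntaxException e) {
                // e.g. unescaped spaces: keep the original value
            }
            String quote = m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + quote + resolved + quote));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        URI base = URI.create("file:///C:/docs/out_files/");
        System.out.println(absolutize("<p><img src=\"img1.png\"></p>", base));
    }
}
```

For example, with the images saved under a hypothetical `resourcesDir` folder: `builder.insertHtml(ImageSrcResolver.absolutize(htmlData, Paths.get(resourcesDir).toUri()));`. For an existing directory, `Path.toUri()` ends with a slash, which `URI.resolve` needs in order to treat the base as a folder.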

@arulsundarama You should note that Aspose.Words is designed to work with MS Word documents, and the object models of MS Word documents and HTML documents are quite different. This makes it not always possible to provide 100% fidelity when converting from one format to the other. In most cases Aspose.Words mimics MS Word's behavior when working with HTML documents.
Also, as you may know, MS Word documents are flow documents by nature, and there is no “page” concept in them. The consumer application, such as MS Word or OpenOffice, reflows the document content into pages on the fly. So there is no way to export a document to HTML with page separation. However, Aspose.Words provides the Document.ExtractPages method, which lets you extract content from the document page by page.
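A minimal sketch of how page-by-page output could be shown with visible separation in the browser, assuming each page has already been saved to its own HTML string. On the Aspose.Words side this would presumably be `doc.extractPages(i, 1)` (the Java name of `Document.ExtractPages`) followed by `toString(htmlSaveOptions)`; that part appears only in a comment, so the sketch runs without the library. The hypothetical helper `PagedHtml.join` keeps the `<body>` content of each page and wraps it in a `div` with a border and a CSS page break. Because each page's `<head>` is dropped, it assumes the CSS was exported inline (for example with `CssStyleSheetType.INLINE`) rather than embedded. The wrappers are plain `div` elements that separate pages visually; they are not Word page breaks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PagedHtml {
    // Captures everything between <body ...> and the last </body>
    private static final Pattern BODY = Pattern.compile(
            "<body\\b[^>]*>(.*)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private PagedHtml() {}

    /** Returns the inner <body> markup, or the whole string if it has no <body> element. */
    public static String bodyOf(String pageHtml) {
        Matcher m = BODY.matcher(pageHtml);
        return m.find() ? m.group(1) : pageHtml;
    }

    /**
     * Joins per-page HTML strings into one document, one <div class="page"> per page.
     * Assumed source of pageHtmls (Aspose.Words, not executed here):
     *   for (int i = 0; i < doc.getPageCount(); i++)
     *       pageHtmls.add(doc.extractPages(i, 1).toString(htmlSaveOptions));
     */
    public static String join(List<String> pageHtmls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html><html><head><meta charset=\"utf-8\"><style>")
          // Visible gap between pages on screen, a page break after each page when printing
          .append(".page{border:1px solid #ccc;margin:16px auto;padding:24px;page-break-after:always;}")
          .append("</style></head><body>");
        for (String page : pageHtmls) {
            sb.append("<div class=\"page\">").append(bodyOf(page)).append("</div>");
        }
        return sb.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList(
                "<html><body><p>Page 1</p></body></html>",
                "<html><body><p>Page 2</p></body></html>")));
    }
}
```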

If the output HTML is for viewing purposes only, i.e. it is not supposed to be edited or processed, you can consider using the HtmlFixed format. In this case the output should look exactly as it does in MS Word. The HtmlFixed format is designed to preserve the original document layout for viewing purposes only, so if your goal is to display the HTML on a page, this format can be considered as an alternative. Unfortunately, it does not support a roundtrip to DOCX at all.

Thank you so much for the prompt reply.
