Header footer image issue

The image in the headers is removed when I extract HTML from the document and create a document from that HTML. Additionally, there is an issue with some footer text.

My code is below:

    public static void main(String[] args) throws Exception {
        // Apply the license before loading any document; otherwise the first conversion runs in evaluation mode.
        com.aspose.words.License wordLicense = new com.aspose.words.License();
        wordLicense.setLicense(new FileInputStream("/home/aspose-licence"));
        docToHtml(new Document(new FileInputStream("/home/hari/Downloads/msa-amended-and-restated-w-settlement-changes-073116-36.docx")), "/home/hari/redline-sku/original-html/3d449eed-44a4-416d-9626-2b3bf11e3c47_msa-amended-and-restated-w-settlement-changes-073116-36.html");
        String html = FileUtil.readStringFromFile("/home/hari/3d449eed-44a4-416d-9626-2b3bf11e3c47_msa-amended-and-restated-w-settlement-changes-073116-36.html");
        // Encode with an explicit charset so the bytes match the UTF-8 HTML produced by docToHtml.
        htmlToDoc(new Document(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8))), "/home/original-html/test.docx");
    }

    public static void docToHtml(Document document, String outputDirectory) {
        try {
            HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.HTML);
            opts.setExportPageSetup(true);
            opts.setExportDocumentProperties(true);
            opts.setExportListLabels(ExportListLabels.BY_HTML_TAGS);
            opts.setExportImagesAsBase64(true);
            opts.setExportFontsAsBase64(true);
            opts.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);
            opts.setExportTocPageNumbers(true);
            ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
            document.save(byteArrayOutputStream, opts);
            String html = byteArrayOutputStream.toString(StandardCharsets.UTF_8);
            html = html.replaceAll("[\uFEFF-\uFFFF]", "");
            FileUtil.writeToFile(outputDirectory, html.getBytes());
        } catch (Exception ex) {
            ex.printStackTrace();
            System.out.println("Exception occurred while converting document to html");
        }
    }
    public static void htmlToDoc(Document document, String outputDirectory) {
        try {
            document.save(outputDirectory, SaveFormat.DOCX);
        } catch (Exception ex) {
            ex.printStackTrace();
            System.out.println("Exception occurred while converting html to document");
        }
    }

msa-amended-and-restated-w-settlement-changes-073116-36.docx (49.0 KB)

@hariomgupta73 This is expected behavior. It is hard to output headers and footers to HTML in a meaningful way because HTML is not paginated. By default, Aspose.Words exports only the primary headers/footers of each section when saving to HTML. Your document, however, uses a first-page header, so it is not exported to HTML. You can try setting ExportHeadersFootersMode to ExportHeadersFootersMode.FIRST_PAGE_HEADER_FOOTER_PER_SECTION so that the first-page header and footer of each section are exported instead:

Document doc = new Document("C:\\Temp\\in.docx");
HtmlSaveOptions opt = new HtmlSaveOptions();
opt.setExportHeadersFootersMode(ExportHeadersFootersMode.FIRST_PAGE_HEADER_FOOTER_PER_SECTION);
doc.save("C:\\Temp\\out.html", opt);

Note, however, that the HTML and MS Word document object models are quite different, so it is not always possible to achieve 100% fidelity when converting from one format to the other.
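
To tell whether the header image is already missing from the exported HTML, or is only lost when the HTML is loaded back into a document, one option is to count the embedded images in the HTML string before the second conversion. The following is a minimal sketch using only the JDK. It assumes images are exported as base64 data URIs (as with setExportImagesAsBase64(true) in the question); the class and method names are hypothetical and not part of Aspose.Words.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of the Aspose.Words API.
final class EmbeddedImageCheck {

    // Matches an <img> tag whose src attribute is a base64 data URI,
    // with either single or double quotes around the attribute value.
    private static final Pattern BASE64_IMG = Pattern.compile(
            "<img\\b[^>]*\\bsrc\\s*=\\s*[\"']data:image/[^;\"']+;base64,",
            Pattern.CASE_INSENSITIVE);

    // Returns the number of <img> tags that embed their data as base64.
    static int countBase64Images(String html) {
        Matcher matcher = BASE64_IMG.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Small stand-in for the HTML string produced by the export step.
        String sample = "<div><img src=\"data:image/png;base64,iVBORw0KGgo=\" alt=\"logo\"/></div>"
                + "<p>Body text</p>";
        System.out.println("Embedded images: " + countBase64Images(sample));
    }
}
```

If the count does not change after switching the headers/footers mode, the image is probably being dropped for a different reason than the one described above. A regular expression is only a rough check here; a real HTML parser would be more robust.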

Thank you very much @alexey.noskov
