Convert Word Document to HTML Stream and back to HTML String using Java | Preserve Multi-layer Numbering

Gptrnt · May 21, 2020, 12:25pm

Hi,
From an uploaded document, I am extracting the content between hidden characters. I then convert the extracted nodes to an HTML string and store it in the database. I have to create another document that is the same as the input document (all the content is taken from the uploaded document). While creating this new document, some of the numbering is lost. For example, the input has:

Testing
1.1. tests
1.2. sample
1.3. Items

The newly created document, built from the extracted HTML content, shows:

Testing
1 tests
2 sample
3 Items
That is, the numbering is not the exact numbering I entered. I am attaching my source code, input_doc, output_doc, and expected_output.
sample_doc.zip (596.4 KB)

Please check and give me a solution for it.

awais.hafeez · May 22, 2020, 7:53am

@Gptrnt,

We are working on your query and will get back to you soon.

Gptrnt · June 8, 2020, 6:00am

Hi,
Any update on this issue?

awais.hafeez · June 8, 2020, 9:52am

@Gptrnt,

You can use either of the following two ways to fix this problem:

Workaround 1:

Inside your ‘getHtmlContentFromBookMark’ method, instead of getting the HTML string by using the Document.toString() method, please save the document to an HTML stream by using the Document.save method. Please check the last lines of this function:

private static String getHtmlContentFromBookMark(String field, BookmarkCollection bookmarkCollection, Document document, SaveOptions saveOptions, int slNo) {
    try {
        ...
        ...
        Document dstHTML = generateDocument(document, extractedNodes);

        // Save the extracted content to an in-memory HTML stream
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        dstHTML.save(baos, saveOptions);
        // Decode the saved bytes as UTF-8 to get the HTML string
        return baos.toString("UTF-8");

        // Previous approach, replaced by the stream-based save above:
        //return dstHTML.toString(saveOptions);
    } catch (Exception e) {
        System.out.println("error while fetching bookmark of preamble " + field + slNo);
    }
    // Fall back to an empty string if extraction or saving fails
    return "";
}
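
The stream-to-string step in Workaround 1 can be tried in isolation with the standard library alone. The following is only a minimal sketch and not part of the original project: the class StreamToHtmlString and its writeHtml helper are hypothetical stand-ins for the Document.save call, and it assumes the saved HTML is UTF-8 encoded. It shows that decoding the buffer with the same charset that was used for writing returns the original markup, including non-ASCII characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class StreamToHtmlString {

    // Hypothetical stand-in for Document.save(stream, saveOptions):
    // writes the given markup to the stream as UTF-8 bytes.
    static void writeHtml(OutputStream out, String html) throws IOException {
        out.write(html.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes the buffered bytes with the charset they were written in,
    // mirroring baos.toString("UTF-8") in Workaround 1.
    static String toHtmlString(ByteArrayOutputStream baos) throws IOException {
        return baos.toString(StandardCharsets.UTF_8.name());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        writeHtml(baos, "<p>1.1. tests caf\u00e9</p>");
        System.out.println(toHtmlString(baos));
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, which may differ between the machine that saves the HTML and the one that reads it back from the database.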

Workaround 2:

If you must use the Document.toString() method, then please replace only your ‘htmlSaveOption’ method with the following:

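private static HtmlSaveOptions htmlSaveOption() throws Exception {
    HtmlSaveOptions options = new HtmlSaveOptions();
    options.setSaveFormat(SaveFormat.HTML);
    // Embed images in the HTML as Base64 data instead of separate files
    options.setExportImagesAsBase64(true);
    // Write list labels (such as "1.1.") as inline text rather than HTML list elements
    options.setExportListLabels(ExportListLabels.AS_INLINE_TEXT);
    return options;
}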

Hope this helps.

Gptrnt · June 8, 2020, 6:06pm

Hi
Thank you so much. It's working fine.
