DOCX>HTML>DOCX conversion issue with bullet rendering using Java

Gptrnt · June 23, 2020, 8:23am

Hi,

I am importing a Word document (the input doc) and extracting the nodes between hidden characters. I then convert the extracted nodes to an HTML string and store it in the database. From this content I create another document that should look the same as the input doc.

If the input document contains bulleted or numbered paragraphs, they are not properly aligned in the downloaded document.
I am attaching a sample doc to reproduce the issue: BulletRulerIssue.zip (603.2 KB)

Thank you

tahir.manzoor · June 23, 2020, 7:09pm

@Gptrnt

You are using ExportListLabels.AS_INLINE_TEXT in your code, as shown below. We suggest using ExportListLabels.BY_HTML_TAGS instead to get the correct bullet list.

HtmlSaveOptions options = new HtmlSaveOptions();
options.setSaveFormat(SaveFormat.HTML);
options.setExportImagesAsBase64(true);
options.setExportListLabels(ExportListLabels.AS_INLINE_TEXT);

Gptrnt · June 24, 2020, 5:37am

Hi,
I tried your solution and the issue is solved, but it creates another issue with the list numbering. For example, this list in the input:

Test Item 1
1.1. Test sub Item 1
1.2. Test sub Item 2
1.3. Test sub Item 3
1.3.1. Test inner sub Item 1
1.3.2. Test inner sub Item 2

comes out like this in the output:

Test Item 1
1. Test sub Item 1
2. Test sub Item 2
3. Test sub Item 3
  1. Test inner sub Item 1
  2. Test inner sub Item 2

Is there any way to solve both issues?
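
To make the difference between the two listings concrete: in the first, each label is built from the counters of every level down to the item (1.3.1.), while in the second each item shows only the counter of its own level (1.). The sketch below is plain Java with no Aspose calls. The class and method names are hypothetical, and it only models the two labeling schemes, not how Aspose.Words or a browser computes them.

```java
import java.util.ArrayList;
import java.util.List;

public class ListLabelSketch {

    // Labels built from the counters of all levels down to the item, e.g. "1.3.1."
    static List<String> multilevelLabels(int[] levels) {
        return labels(levels, true);
    }

    // Labels showing only the counter of the item's own level, e.g. "1."
    static List<String> ownLevelLabels(int[] levels) {
        return labels(levels, false);
    }

    private static List<String> labels(int[] levels, boolean includeAncestors) {
        int[] counters = new int[9]; // nine levels is an assumption made for this sketch
        List<String> result = new ArrayList<>();
        for (int level : levels) {
            counters[level]++;
            // A new item at this level restarts every deeper level.
            for (int i = level + 1; i < counters.length; i++) {
                counters[i] = 0;
            }
            StringBuilder label = new StringBuilder();
            for (int i = includeAncestors ? 0 : level; i <= level; i++) {
                label.append(counters[i]).append('.');
            }
            result.add(label.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Levels of the six items in the example: one top-level item,
        // three sub items, two inner sub items.
        int[] levels = {0, 1, 1, 1, 2, 2};
        System.out.println(multilevelLabels(levels));
        System.out.println(ownLevelLabels(levels));
    }
}
```

The top-level item gets the label 1. in both schemes; the listings in the post do not show a label for it.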

tahir.manzoor · June 24, 2020, 3:28pm

@Gptrnt

We removed the following line of code from your application and tested the scenario using the latest version, Aspose.Words for Java 20.6. The output generated by your application is as expected. Please check the attached DOCX: 20.6.zip (13.8 KB)

options.setExportListLabels(ExportListLabels.AS_INLINE_TEXT);

Gptrnt · June 25, 2020, 12:38pm

Hi,

I upgraded to version 20.6, but after the upgrade the bullets extracted from the input document no longer appear as bullets in the output document (they appear as plain text). Please check with the same source. If there is any setting I have missed, let me know.

tahir.manzoor · June 25, 2020, 5:24pm

@Gptrnt

Could you please check the attached Word document (20.6.zip) in my previous post? We have not found any issue with it.

Gptrnt · July 14, 2020, 11:28am

Hi,
I checked your attached Word document, and it looks fine there. However, when I upgraded the same sample code to 20.6 and ran it against the same input file, the bullets in my output document still do not appear as bullets; they appear as normal text. I am attaching the whole code, input and output below.
BulletRulerIssue (2).zip (603.4 KB)

tahir.manzoor · July 14, 2020, 6:03pm

@Gptrnt

We used ExportListLabels.AUTO in your code and the generated output is as expected. Please read the details of ExportListLabels.

In your code, you are inserting the HTML into a document, so you need to set the HtmlSaveOptions.ExportListLabels property according to your requirement.

Gptrnt · July 15, 2020, 6:17am

Hi,

I tried your suggestion, and the bullets now appear as bullets. However, the list format below does not come out the same in the output:

Test 1
1.2. Test 1.2
Sample code: BulletRulerIssue.zip (604.5 KB)

tahir.manzoor · July 15, 2020, 11:06am

@Gptrnt

We have gone through your complete project and noticed that you are extracting content from the document and inserting it back as HTML.

In your code, you are saving the document to HTML using Node.ToString. The list labels are exported incorrectly by Node.ToString. We have logged this issue as WORDSNET-20795 in our issue tracking system. You will be notified via this forum thread once this issue is resolved. We apologize for the inconvenience.

tahir.manzoor · July 15, 2020, 12:26pm

@Gptrnt

Further to my previous post, you can use the following modified code to get the desired output. These methods save the document to HTML using the Document.Save method. Hope this helps you.

private static String getHtmlContentFromBookMark(String field, BookmarkCollection bookmarkCollection, Document document, SaveOptions saveOptions, int slNo) {
    try {
        // Bookmarks are looked up by field name plus serial number.
        Bookmark bookmark = bookmarkCollection.get(field + slNo);
        if (bookmark == null) return "";
        Node startNode = bookmark.getBookmarkStart();
        Node endNode = bookmark.getBookmarkEnd();
        // extractContent and generateDocument are helpers defined elsewhere in the project (not shown in this thread).
        ArrayList<Node> extractedNodes = extractContent(startNode, endNode, false);
        Document dstHTML = generateDocument(document, extractedNodes);
        // Save through Document.save instead of Node.toString, which exports the list labels incorrectly.
        ByteArrayOutputStream docStream = new ByteArrayOutputStream();
        dstHTML.save(docStream, saveOptions);

        return docStream.toString();
        //return takeOnlyBodyContent(dstHTML.toString(saveOptions));
    } catch (Exception e) {
        System.out.println("error while fetching bookmark of preamble " + field + slNo);
    }
    // Reached only when an exception was caught above.
    return "";
}

private static HtmlSaveOptions htmlSaveOption() throws Exception {
    HtmlSaveOptions options = new HtmlSaveOptions();
    options.setSaveFormat(SaveFormat.HTML);
    options.setExportImagesAsBase64(true);
    return options;
}
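
A side note on the helper above: it turns the saved HTML bytes into a String, and the insertion step later turns that String back into bytes. Both conversions depend on a charset, and the no-argument forms of ByteArrayOutputStream.toString() and String.getBytes() use the platform default, which can corrupt bullet glyphs and other non-ASCII characters on some systems. The sketch below is plain Java with no Aspose calls. The class and method names are hypothetical, and it assumes the HTML was saved as UTF-8, so the encoding configured in your save options should be checked before relying on it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HtmlBytesRoundTrip {

    // Decode saved HTML bytes with an explicit charset instead of the platform default.
    static String decodeHtml(ByteArrayOutputStream docStream) {
        return new String(docStream.toByteArray(), StandardCharsets.UTF_8);
    }

    // Encode the stored HTML string with the same charset before loading it again.
    static ByteArrayInputStream encodeHtml(String html) {
        return new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Stand-in for the stream that Document.save would have written to.
        byte[] saved = "<li>\u2022 Test Item 1</li>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream docStream = new ByteArrayOutputStream();
        docStream.write(saved, 0, saved.length);

        String html = decodeHtml(docStream);
        ByteArrayInputStream input = encodeHtml(html);

        // The bytes going back in match the bytes that came out.
        System.out.println(input.available() == saved.length);
    }
}
```

Using the same explicit charset on both sides keeps the stored HTML stable regardless of the server's default encoding.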

Gptrnt · July 20, 2020, 10:55am

Hi,

Thank you so much, the above code is working.

tahir.manzoor · July 20, 2020, 4:00pm

@Gptrnt

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

Gptrnt · July 28, 2020, 3:14pm

Hi,

I changed my code according to your solution, and it fixed all of the issues above. However, I am now facing another serious issue: the list numbering now continues in the downloaded document. My input document is continue number _input.zip (16.8 KB), and the output I get is output.zip (12.9 KB). This is my changed source code: wrdHtmlWithReplacePoc (2).zip (38.1 KB).

Kindly check this issue.

Thank you

tahir.manzoor · July 28, 2020, 8:18pm

@Gptrnt

We are checking this use case and will get back to you soon.

tahir.manzoor · July 31, 2020, 2:27pm

@Gptrnt

Please use the ImportFormatOptions.KeepSourceNumbering property as shown below to get the desired output.

private static void addAgendaItemContent(DocumentBuilder builder, String htmlString, String field) throws Exception {
    try {
        if (!htmlString.isEmpty()) {
            // Load the stored HTML into a temporary document.
            // Note: getBytes() without arguments uses the platform default charset.
            ByteArrayInputStream bais = new ByteArrayInputStream(htmlString.getBytes());
            LoadOptions opts = new LoadOptions();
            opts.setLoadFormat(LoadFormat.HTML);
            Document tempDoc = new Document(bais, opts);

            // Keep the numbering of the source (HTML) document when importing.
            ImportFormatOptions importFormatOptions = new ImportFormatOptions();
            importFormatOptions.setKeepSourceNumbering(true);

            builder.insertDocument(tempDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING, importFormatOptions);
        }
    } catch (Exception e) {
        // Fallback: insert the raw HTML directly (Constant is a class from the poster's project).
        builder.insertHtml(htmlString + "</p>", Constant.USE_BUILDER_FORMATTING);
        System.out.println("Error while creating document for inserting " + field);
        e.printStackTrace();
    }
}

aspose.notifier · February 18, 2023, 9:45am

The issues you have found earlier (filed as WORDSNET-20795) have been fixed in this Aspose.Words for Java 23.2 update.