Remove Extra Line Break while Inserting HTML in Word Document after a Hidden Character using Java Code

Hi,
While creating a document, I add a hidden word (using builder.write()), then insert an HTML string (using builder.insertHtml()), and after that add another hidden word with builder.write(). I want the HTML to start immediately after the first hidden word (without any line break), and the closing hidden word to follow directly after the end of the HTML (no blank line after the HTML content). Currently this produces a lot of unwanted line breaks.

I am attaching my sample code, input, and output below.
sample_doc.zip (596.4 KB)
It’s a very critical issue for me, so please help me find a solution ASAP.

Thank You

@Gptrnt,

This seems to be expected behavior. However, you can work around this problem by using the following code:

public static Document generateDocument(Item item) throws Exception {
    // Create a blank document.
    Document dstDoc = new Document();
    // dstDoc.protect(ProtectionType.READ_ONLY);
    // Creating builder for the document
    DocumentBuilder builder = new DocumentBuilder(dstDoc);
    double tokenSize = builder.getFont().getSize();
    Color tokenColor = builder.getParagraphFormat().getStyle().getFont().getColor();
    String tokenFontName = builder.getFont().getName();
    int tokenParagraphAlignment = builder.getParagraphFormat().getAlignment();
    boolean isTokenInItalicFont = builder.getFont().getItalic();
    Style tokenStyle = builder.getParagraphFormat().getStyle();
    item.getItemDetails().forEach(details -> {
        try {
            builder.getFont().setBold(true);
            builder.getParagraphFormat().setAlignment(ParagraphAlignment.LEFT);
            builder.getParagraphFormat().setKeepTogether(true);
            String title = null;
            if (details.getTitle() != null) {
                String serialNum = details.getSlNo() + ": ";
                title = serialNum + details.getTitle();
            }
            builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1);
            builder.getParagraphFormat().getStyle().getFont().setColor(tokenColor);
            builder.getFont().setSize(tokenSize);
            builder.getParagraphFormat().getStyle().getFont().setUnderline(Underline.NONE);
            builder.getFont().setName(tokenFontName);
            builder.getParagraphFormat().setAlignment(tokenParagraphAlignment);
            builder.getFont().setItalic(isTokenInItalicFont);
            builder.getParagraphFormat().setSpaceAfterAuto(true);
            addHiddenWord(builder, details.getSlNo(), false, Constant.HIDDEN_TITLE_KEY);
            builder.write(title);
            addHiddenWord(builder, details.getSlNo(), true, Constant.HIDDEN_TITLE_KEY);

            builder.getParagraphFormat().setStyle(tokenStyle);
            builder.getFont().setBold(false);
            builder.getFont().setUnderline(Underline.NONE);
            builder.getParagraphFormat().setSpaceAfterAuto(false);

            if (details.getNote() != null) {
                addHiddenWord(builder, details.getSlNo(), false, Constant.HIDDEN_NOTE_KEY);
                // builder.insertHtml(details.getNote().trim() + "</p>", Constant.USE_BUILDER_FORMATTING);

                String htmlString = details.getNote().trim();
                // Check the string's content; != would only compare object references.
                if (!htmlString.isEmpty()) {
                    ByteArrayInputStream bais = new ByteArrayInputStream(htmlString.getBytes());
                    Document tempDoc = new Document(bais);
                    builder.insertDocument(tempDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                }

                addHiddenWord(builder, details.getSlNo(), true, Constant.HIDDEN_NOTE_KEY);
            }
            addHiddenWord(builder, details.getSlNo(), false, Constant.HIDDEN_DESCRIPTION_KEY);
            builder.insertHtml((details.getDescription() != null ? details.getDescription().trim() : Constant.ONE_SPACE) + "</p>", Constant.USE_BUILDER_FORMATTING);
            addHiddenWord(builder, details.getSlNo(), true, Constant.HIDDEN_DESCRIPTION_KEY);

            builder.getFont().setUnderline(Underline.NONE);
            builder.getParagraphFormat().setSpaceAfterAuto(false);
        } catch (Exception e) {
            System.out.println("Error while inserting HTML into the document: " + e.getMessage());
        }
    });
    return dstDoc;
}

Hi awais,

I tried your solution. I also use CKEditor HTML content as the default for download (if the user has not imported a Word document). In that case this solution runs into a problem. I am attaching the faulty output along with sample code to reproduce it (the CKEditor content is hard-coded inside the code) and the expected output.

SampleIssue.zip (591.1 KB)

To get around this issue, I used the following solution (commented out in the sample code uploaded above):

    if (!htmlString.startsWith("<html>")) {
        Document tempDoc = new Document();
        DocumentBuilder tempBuilder = new DocumentBuilder(tempDoc);
        tempBuilder.insertHtml(htmlString);
        htmlString = tempDoc.toString(saveOption);
    }

With this code the content is printed, but I get one extra line break at the end. Please help me with a better solution.

Thank you

@Gptrnt,

Please try specifying HTML load options and then loading the HTML document/string. Here is sample code:

...
...
if (!htmlString.isEmpty()) {
    ByteArrayInputStream bais = new ByteArrayInputStream(htmlString.getBytes());
    LoadOptions opts = new LoadOptions();
    opts.setLoadFormat(LoadFormat.HTML);
    Document tempDoc = new Document(bais, opts);
    builder.insertDocument(tempDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
}
...
...

Hope this helps.

Hi awais,

Thank you so much for your help. It's working well.

Hi,
I am formatting a template document with some data. In the template, I defined all the required styles for the Heading_1 and Normal style identifiers. I replace a special word with content, and I set the style identifier Heading_1 for the heading and Normal for the content. However, the styles defined for these style identifiers are not applied to the heading and content in the downloaded document. Please help me find a solution to this issue.

I am attaching sample code, output, and expected output.
styleIdentifierIssue.zip (580.0 KB)

Thank You

@Gptrnt,

We are checking this scenario and will get back to you soon.

Hi awais,
Any updates on this issue?

@Gptrnt,

We have logged the following issue in our issue tracking system:

  • WORDSNET-20644: Unable to set Heading 1 or Normal Styles to Content of a Document

We will look further into the details of this problem and keep you updated on the status of the linked issue. We apologize for the inconvenience.

Any updates on this issue?

@Gptrnt,

As can be seen in the following screenshot, the formatting is applied both via Style and by using “direct attributes”:

So, the reason is that you set both the Style and direct (paragraph and font) properties, and therefore some of the style's properties are overridden. To avoid this, use either only the style or only direct properties. We suggest using only the Style: remove direct settings such as ‘builder.getFont().setSize(tokenSize);’ and keep only ‘builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_1);’. It should then work as expected.

Also some code looks a bit strange, and you may want to remove it. For example:

Color tokenColor = builder.getParagraphFormat().getStyle().getFont().getColor();
skipped
builder.getParagraphFormat().getStyle().getFont().setColor(tokenColor);

We are planning to close the linked issue (WORDSNET-20644) with “Not a Bug” status.

P.S. Please note that formatting is applied on a few different levels. For example, let’s consider the formatting of simple text. Text in documents is represented by Run elements, and a Run can only be a child of a Paragraph. You can apply formatting 1) to Run nodes by using Styles, e.g. a Glyph (character) Style, 2) to the parent of those Run nodes, i.e. a Paragraph node (possibly via paragraph Styles), and 3) you can also apply ‘direct formatting’ to Run nodes by using Run attributes (Font). In this case the Run inherits the formatting of the Paragraph Style, then the Glyph Style, and then the direct formatting, so a value set at a later level overrides the same value from an earlier one.
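
The layering described above can be illustrated with a small plain-Java model. This is only a toy sketch and does not use Aspose.Words: the class FormattingPrecedence, the resolve method, and the attribute names and values are all hypothetical, chosen just to show why a direct attribute such as a font size hides the value defined in the style.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormattingPrecedence {

    /**
     * Merges formatting layers in the order given. A later layer
     * overrides any attribute that an earlier layer also defines.
     * (Hypothetical helper, not part of any library.)
     */
    @SafeVarargs
    public static Map<String, Object> resolve(Map<String, Object>... layers) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, Object> layer : layers) {
            effective.putAll(layer);
        }
        return effective;
    }

    public static void main(String[] args) {
        // Assumed example values standing in for a heading style.
        Map<String, Object> paragraphStyle = Map.of("size", 16.0, "bold", true);
        Map<String, Object> characterStyle = Map.of();
        Map<String, Object> directFormatting = Map.of("size", 11.0);
        Map<String, Object> noDirectFormatting = Map.of();

        // Style plus direct attributes: the direct size wins.
        Map<String, Object> mixed =
                resolve(paragraphStyle, characterStyle, directFormatting);
        System.out.println(mixed.get("size")); // 11.0
        System.out.println(mixed.get("bold")); // true

        // Style only: the style's own size is used.
        Map<String, Object> styleOnly =
                resolve(paragraphStyle, characterStyle, noDirectFormatting);
        System.out.println(styleOnly.get("size")); // 16.0
    }
}
```

In this model, dropping the direct "size" entry is the counterpart of removing the direct font settings from the builder code and keeping only the style identifier.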