Line Height issue within EMF Image for Japanese characters when using NotoSans font

Hello team,

We are using the free Japanese font “NotoSans JP - Regular.ttf” (Open Font License), which was downloaded from this link. We are converting source HTML to an EMF image using this library, and it does a pretty good job. However, we noticed that the EMF output has a huge gap between the lines, which is probably causing the last line to be truncated or hidden within the EMF image. We did not notice this issue with other Japanese fonts such as “MS Mincho”, where the line spacing was much smaller and all the text content was visible.

How do we fix the line gap or height issue when using the NotoSans JP font? Is there some property that needs to be set during the EMF generation?

Line Height difference between MS Mincho and NotoSans JP font:

Attachment: Sample Test Case - LineSpacing issue with NotoSans Font.zip (92.1 KB)

I have attached the HTML input, EMF Image output and the Word Document generated by the following code for both the fonts.

Sample code:

import com.aspose.words.*;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public void getRenderedDocument(String inpFile) {
    try {
        // inpFile is the base name without extension: first "text1 - NotoSans JP", then "text1 - MS Mincho"
        byte[] htmlBytes = Files.readAllBytes(Paths.get(inpFile + ".html"));
        // Define HTML load options to load the HTML bytes into a Word document
        HtmlLoadOptions options = new HtmlLoadOptions();
        options.setConvertMetafilesToPng(false);
        options.setEncoding(StandardCharsets.UTF_8);

        // Initialize the Word document from the HTML bytes (the constructor takes a stream).
        Document doc = new Document(new ByteArrayInputStream(htmlBytes), options);
        
        // Get DocumentBuilder instance to update document properties
        // such as Size, Alignment, Format, etc.
        DocumentBuilder builder = new DocumentBuilder(doc);
        PageSetup pageSetup = builder.getPageSetup();
        Section section = doc.getFirstSection();
        Body body = section.getBody();

        // update the Page Properties such as Margin and Size
        updatePageProperties(pageSetup);

        TableCollection tables = body.getTables();
        if (tables.getCount() == 1) {
            Table table = tables.get(0);
            updateTableProperties(table);
        }

        // reset document last paragraph formatting properties
        resetLastParagraphProperties(body);

        //Save docx as EMF image
        ImageSaveOptions emfOptions = new ImageSaveOptions(SaveFormat.EMF);
        emfOptions.setPageSet(new PageSet(0));        
        doc.save(inpFile + ".emf", emfOptions);
    } catch (Exception ex) {
        throw new IllegalStateException(ex.getMessage(), ex);
    }
}

private void updatePageProperties(PageSetup pageSetup) {
    double imgHeight = 78.0;
    double imgWidth = 495.0;
    double margin = 0;

    // reset page margin
    pageSetup.setLeftMargin(margin);
    pageSetup.setRightMargin(margin);
    pageSetup.setTopMargin(margin);
    pageSetup.setBottomMargin(margin);

    // Set header and footer distance to default 0. Required to ensure no
    // extra spacing is coming from header or footer.
    pageSetup.setFooterDistance(0);
    pageSetup.setHeaderDistance(0);

    // Default paper type is LETTER so change to CUSTOM when setting new
    // size.
    pageSetup.setPaperSize(PaperSize.CUSTOM);
    pageSetup.setPageWidth(imgWidth);
    pageSetup.setPageHeight(imgHeight);
}

private void updateTableProperties(Table table) throws Exception{
    // Reset left/right indent since table might be shifted left
    table.setLeftIndent(0);
    
    // Set Table {@code AutoFitBehavior} value
    table.autoFit(AutoFitBehavior.FIXED_COLUMN_WIDTHS);
    
    double rowHeight = 78.0;
    double cellWidth = 495.0;

    for (Row row : table.getRows()) {
        RowFormat rowFmt = row.getRowFormat();
        rowFmt.setAllowBreakAcrossPages(false);
        rowFmt.setHeight(rowHeight);
        rowFmt.setHeightRule(HeightRule.EXACTLY);

        for (Cell cell : row.getCells()) {
            CellFormat cellFmt = cell.getCellFormat();
            cellFmt.setWidth(cellWidth);
            cellFmt.setTopPadding(0);
            cellFmt.setBottomPadding(0);
        }
    }
    
    Node node = table.getLastRow().getLastChild();
    if (node != null && node.getNodeType() == NodeType.PARAGRAPH &&
        "\uFEFF \r".equals(node.getText())) {
        node.remove();
    }
}

private void resetLastParagraphProperties(Body body) {
    Paragraph lastPara = body.getLastParagraph();
    String paratext = lastPara.getText();
    if (StringUtils.isNullOrBlank(paratext)) {
        ParagraphFormat paraFmt = lastPara.getParagraphFormat();
        paraFmt.setPageBreakBefore(true);
    }
}

@oraspose Different fonts have different font metrics, so the distance between lines and between glyphs may vary depending on the font used. This is normal behavior. If you render the MS Word documents using MS Word with the specified fonts, the result is the same as Aspose.Words’ result:
text1 - MS Mincho.pdf (25.4 KB)
text1 - NotoSans JP.pdf (37.2 KB)

So I do not see any problem in Aspose.Words here.
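
To make the font-metrics point concrete, here is a minimal sketch in plain Java (no Aspose.Words dependency). It assumes a simplified model in which a font's natural line height is (ascent + descent + line gap) scaled to the font size, and in which an At Least rule takes the larger of the minimum and that natural height. The class name, method names, and the two metric sets are hypothetical and illustrative only; they are not measured from MS Mincho or NotoSans JP, and a real layout engine may choose its metrics differently.

```java
import java.util.Locale;

// Illustrative model only: not Aspose.Words code and not real font data.
final class AtLeastLineHeight {

    // Natural line height in points, from font design units (simplified model).
    static double naturalLineHeight(int ascent, int descent, int lineGap,
                                    int unitsPerEm, double fontSizePt) {
        return (ascent + descent + lineGap) * fontSizePt / unitsPerEm;
    }

    // "At Least" spacing: the line is never shorter than the minimum,
    // but grows when the font itself needs more room.
    static double atLeast(double minimumPt, double naturalPt) {
        return Math.max(minimumPt, naturalPt);
    }

    public static void main(String[] args) {
        double fontSizePt = 14.0;
        double minimumPt = 16.8; // the At Least value reported later in this thread

        // Hypothetical "compact" font: 1.0 em tall.
        double compact = naturalLineHeight(860, 140, 0, 1000, fontSizePt);
        // Hypothetical "tall" font: about 1.45 em tall.
        double tall = naturalLineHeight(1160, 288, 0, 1000, fontSizePt);

        System.out.printf(Locale.ROOT, "compact font line: %.3fpt%n", atLeast(minimumPt, compact));
        System.out.printf(Locale.ROOT, "tall font line:    %.3fpt%n", atLeast(minimumPt, tall));
    }
}
```

Under these assumed metrics the compact font stays at the 16.8pt minimum while the tall font pushes every line past it, so the same three lines need more vertical space on a fixed-height page.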

@alexey.noskov Thank you for the confirmation.

@alexey.noskov The HTML input specifies a line-height of “1.2” for all 3 lines, so how does ASPOSE Words interpret it during the EMF Image generation? When I save the generated WORD document before the EMF image conversion starts, I notice that each line has the LineSpacing Rule “At Least” with a value of 16.8pt.

Is it advisable to change the LineSpacing property of the Document? Can we change it at Table level?

@oraspose The line-height attribute is interpreted as paragraph line spacing. For example, see the following simple HTML:

<html>
<body>
    <p style="line-height:100%">line-height:100%</p>
    <p style="line-height:110%">line-height:110%</p>
    <p style="line-height:120%">line-height:120%</p>
    <p style="line-height:130%">line-height:130%</p>
</body>
</html>

And the output produced by the following code:

Document doc = new Document("C:\\Temp\\in.html");
doc.save("C:\\Temp\\out.docx");

out.docx (7.5 KB)

You can see that the line spacing increases in the output DOCX document.

No, there is no way to change it at the table level; you should change the property at the paragraph level.

Please note that Aspose.Words is designed to work with MS Word documents. The HTML and MS Word document object models are quite different, and it is not always possible to provide 100% fidelity when converting one format to the other. In most cases Aspose.Words mimics MS Word behavior when working with HTML documents.

We agree that different line-height values affect the spacing, but what we are trying to understand is how ASPOSE Words converts the HTML CSS “line-height” property with a value of 1.2 during Word document generation, before the document is converted to an EMF Image.

I have attached 2 HTML files: one is the source HTML input that our product generates, and the other is the HTML generated by ASPOSE Words when we save the Word Document in HTML format.

If you look at the source HTML, you will notice that the lines have a line-height of 1.2, while in the ASPOSE generated HTML the same lines have different line-heights. For example, the 1st line has a line-height of 18pt while the remaining lines have a line-height of 16.8pt, and the LineSpacing Rule for these lines in the ASPOSE generated WORD Document is set to “At Least”.

A couple of questions for our understanding:

  1. How does the ASPOSE Words library arrive at these line-heights of 16.8pt or 18pt when the original line-height is 1.2 for all the lines?
  2. Why does the ASPOSE Words library always set the LineSpacing Rule to “At Least”? Is this the default value for this property?

HTML files for reference: Source HTML and ASPOSE generated HTML.zip (1.5 KB)

@oraspose

  1. A line-height value of 1.2 is interpreted as 120% and is calculated based on the paragraph font size. For example, see the following simple HTML:
<html>
<body>
    <p style="line-height: 1; font-size: 18pt">line-height: 1; font-size: 18pt</p>
    <p style="line-height: 1.2; font-size: 18pt">line-height: 1.2; font-size: 18pt</p>
    <p style="line-height: 1.4; font-size: 18pt">line-height: 1.4; font-size: 18pt</p>
    <p style="line-height: 1.6; font-size: 18pt">line-height: 1.6; font-size: 18pt</p>
    <p />
    <p style="line-height: 1; font-size: 12pt">line-height: 1; font-size: 12pt</p>
    <p style="line-height: 1.2; font-size: 12pt">line-height: 1.2; font-size: 12pt</p>
    <p style="line-height: 1.4; font-size: 12pt">line-height: 1.4; font-size: 12pt</p>
    <p style="line-height: 1.6; font-size: 12pt">line-height: 1.6; font-size: 12pt</p>
</body>
</html>

If you print the paragraph font size and line spacing, you will see that the line spacing is calculated based on the paragraph’s font size:

Document doc = new Document("C:\\Temp\\in.html");

for (Paragraph p : doc.getFirstSection().getBody().getParagraphs())
{
    System.out.println("Size: " + p.getParagraphBreakFont().getSize() + "pt; " +
            "Line spacing: " + p.getParagraphFormat().getLineSpacing() + "pt; " +
            "Ratio: " + p.getParagraphFormat().getLineSpacing() / p.getParagraphBreakFont().getSize());
    System.out.println("------------------");
}

The output is the following:

Size: 18.0pt; Line spacing: 18.0pt; Ratio: 1.0
------------------
Size: 18.0pt; Line spacing: 21.6pt; Ratio: 1.2000000000000002
------------------
Size: 18.0pt; Line spacing: 25.2pt; Ratio: 1.4
------------------
Size: 18.0pt; Line spacing: 28.8pt; Ratio: 1.6
------------------
Size: 12.0pt; Line spacing: 12.0pt; Ratio: 1.0
------------------
Size: 12.0pt; Line spacing: 14.4pt; Ratio: 1.2
------------------
Size: 12.0pt; Line spacing: 16.8pt; Ratio: 1.4000000000000001
------------------
Size: 12.0pt; Line spacing: 19.2pt; Ratio: 1.5999999999999999
------------------
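
The ratios printed above come down to a single multiplication. As a minimal sketch in plain Java (no Aspose.Words dependency; the class and method names are hypothetical and only model the behavior described in this reply), a unitless line-height maps to a point value like this:

```java
import java.util.Locale;

// Illustrative model only: not Aspose.Words code.
final class UnitlessLineHeight {

    // A unitless CSS line-height acts as a multiplier of the font size.
    static double toPoints(double lineHeight, double fontSizePt) {
        return lineHeight * fontSizePt;
    }

    public static void main(String[] args) {
        double[] fontSizes = {18.0, 12.0, 14.0};
        for (double size : fontSizes) {
            System.out.printf(Locale.ROOT,
                    "Size: %.1fpt; line-height 1.2 -> %.1fpt%n",
                    size, toPoints(1.2, size));
        }
    }
}
```

Rounded to one decimal place, a multiplier of 1.2 gives 21.6pt for 18pt text, 14.4pt for 12pt text, and 16.8pt for 14pt text.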

If you save your input HTML to DOCX, you will notice that the paragraph break font size is 14pt, so 120% of it is 16.8pt, which is the value used as line spacing for the paragraphs. When saving to HTML, however, there is no At Least rule, so Aspose.Words uses the following formula to calculate line height:

double lineSpacing = System.Math.Max(pf.LineSpacing, maxFontSize);

So in case of the first paragraph, since font size is greater than 16.8pt, font size is used as line height, i.e. 18pt.
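
The same formula can be tried out in plain Java. This is only a sketch of the rule quoted above, with hypothetical class and method names; the 14pt maximum font size used for the remaining lines is an assumption, since the thread only states that their exported value is 16.8pt.

```java
import java.util.Locale;

// Illustrative model only: not Aspose.Words code.
final class HtmlExportLineHeight {

    // Exported line height is the larger of the paragraph's At Least
    // spacing and the largest font size used on the line.
    static double exportedLineHeight(double lineSpacingPt, double maxFontSizePt) {
        return Math.max(lineSpacingPt, maxFontSizePt);
    }

    public static void main(String[] args) {
        // First paragraph: the font size (18pt) exceeds the 16.8pt spacing.
        System.out.printf(Locale.ROOT, "first line:  %.1fpt%n",
                exportedLineHeight(16.8, 18.0));
        // Remaining paragraphs: assumed 14pt text, so the 16.8pt spacing wins.
        System.out.printf(Locale.ROOT, "other lines: %.1fpt%n",
                exportedLineHeight(16.8, 14.0));
    }
}
```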

  2. Yes, the At Least line spacing rule is the default. Upon reading the line-height attribute, Aspose.Words uses the At Least rule, or the Multiple rule if the value of the line-height attribute is specified as a percentage:
<html>
<body>
    <p style="line-height: 100%; font-size: 18pt">line-height: 100%; font-size: 18pt</p>
    <p style="line-height: 120%; font-size: 18pt">line-height: 120%; font-size: 18pt</p>
    <p style="line-height: 140%; font-size: 18pt">line-height: 140%; font-size: 18pt</p>
    <p style="line-height: 160%; font-size: 18pt">line-height: 160%; font-size: 18pt</p>
</body>
</html>
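
The choice between the two rules can be modeled in a few lines of plain Java. This sketch is not Aspose.Words code: the class, enum, and method names are hypothetical, it covers only the two value forms discussed in this thread (unitless and percentage), and it assumes Word's convention that under the Multiple rule one line equals 12 points.

```java
// Illustrative model only: not Aspose.Words code.
final class LineHeightRuleSketch {

    enum Rule { AT_LEAST, MULTIPLE }

    static final class Spacing {
        final Rule rule;
        final double points;

        Spacing(Rule rule, double points) {
            this.rule = rule;
            this.points = points;
        }
    }

    // Assumed convention: under the Multiple rule, one line equals 12 points.
    static final double POINTS_PER_LINE = 12.0;

    // Handles only "120%"-style and "1.2"-style values; other CSS forms
    // (for example "16pt") are outside this sketch and will throw.
    static Spacing fromCss(String lineHeight, double fontSizePt) {
        String value = lineHeight.trim();
        if (value.endsWith("%")) {
            String number = value.substring(0, value.length() - 1).trim();
            double lines = Double.parseDouble(number) / 100.0;
            return new Spacing(Rule.MULTIPLE, lines * POINTS_PER_LINE);
        }
        return new Spacing(Rule.AT_LEAST, Double.parseDouble(value) * fontSizePt);
    }

    public static void main(String[] args) {
        Spacing percent = fromCss("120%", 18.0);
        Spacing unitless = fromCss("1.2", 14.0);
        System.out.println(percent.rule + " " + percent.points);
        System.out.println(unitless.rule + " " + unitless.points);
    }
}
```

In this model a percentage value does not depend on the font size at import time, while a unitless value is fixed to a point minimum derived from the paragraph font size.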

@alexey.noskov Really appreciate your support and help in understanding this problem. Thank you for this information.

So, to confirm: ASPOSE is working as expected when it comes to line-height/line-spacing, which is determined by the font and font size. Correct?

@oraspose Yes, that is right.

Just a follow-up question using the same Noto-Sans Japanese font.

Does ASPOSE play a role in font fidelity during EMF generation using the WORDS library? In the screenshot attached at the top, the characters in the EMF image generated with MS Mincho are rendered more clearly than those generated with NOTO Sans; the MS Mincho text is sharper and cleaner.

@oraspose No, as far as I know, Aspose.Words does not play a role in font fidelity during EMF generation.