Hello Team,
I have two HTML inputs containing Japanese text with different font sizes and formatting, but both use the same font family, “MS Mincho”. I have installed this font on the Linux system where the HTML is converted to an EMF image using the Aspose.Words library v24.1. I have also installed it on the Windows machine where the EMF image is embedded into the MS Excel document.
I notice that the “text2” HTML input is converted to an EMF image correctly. The “text1” HTML input is not: its EMF image shows garbled characters instead of the Japanese text.
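The conversion code reads each file with Files.readAllBytes and passes the raw bytes to the loader. That means the HTML loader has to work out the character encoding on its own, for example from a byte order mark or a <meta charset> declaration. Garbled Japanese text can also come from bytes being decoded with the wrong charset before any font is involved. A quick check is whether each input is valid UTF-8 and which charset it declares. This is a minimal plain-Java sketch with no Aspose dependency. The class and method names are hypothetical, and it assumes text1.html and text2.html are in the working directory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlCharsetCheck {

    // Finds charset=... in <meta charset="..."> or in a Content-Type <meta> tag.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:\\-]+)", Pattern.CASE_INSENSITIVE);

    // True if the bytes decode as strict UTF-8 (malformed input is reported, not replaced).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3 && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF;
    }

    // First declared charset, or null if none. ISO-8859-1 maps each byte to one char,
    // so an ASCII <meta> tag stays readable in ASCII-compatible encodings
    // such as UTF-8 or Shift_JIS.
    static String declaredCharset(byte[] bytes) {
        Matcher m = CHARSET.matcher(new String(bytes, StandardCharsets.ISO_8859_1));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        for (String name : new String[] {"text1", "text2"}) {
            byte[] html = Files.readAllBytes(Paths.get(name + ".html"));
            System.out.printf("%s: declared=%s, utf8Bom=%b, validUtf8=%b%n",
                    name, declaredCharset(html), hasUtf8Bom(html), isValidUtf8(html));
        }
    }
}
```

Suppose “text1” turns out not to be valid UTF-8 and declares no charset, while “text2” is valid or declares one. That would point to an encoding issue rather than a font issue. In that case, setting the encoding explicitly on the load options may be worth trying. I believe Aspose.Words for Java exposes this as LoadOptions.setEncoding.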
Sample Logic:
import com.aspose.words.*;
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public void getRenderedDocument(String inpFile) {
    try {
        // inpFile is "text1" or "text2"
        byte[] htmlBytes = Files.readAllBytes(Paths.get(inpFile + ".html"));
        // Define HTML load options to load the HTML bytes into a Word document
        HtmlLoadOptions options = new HtmlLoadOptions();
        // Keep metafile images as they are instead of converting them to PNG
        options.setConvertMetafilesToPng(false);
        // Initialize the Word document from the HTML bytes
        Document doc = new Document(new ByteArrayInputStream(htmlBytes), options);
        // Get a DocumentBuilder to update document properties
        // such as size, alignment and format
        DocumentBuilder builder = new DocumentBuilder(doc);
        PageSetup pageSetup = builder.getPageSetup();
        Section section = doc.getFirstSection();
        Body body = section.getBody();
        // Update page properties such as margins and size
        updatePageProperties(pageSetup);
        // The source HTML passed to the Document contains at least
        // one <table> element when it is a Text/Note/Grid object
        TableCollection tables = body.getTables();
        if (tables.getCount() == 1) {
            Table table = tables.get(0);
            updateTableProperties(table);
        }
        // Reset the formatting properties of the document's last paragraph
        resetLastParagraphProperties(body);
        // Save the first page (index 0) as an EMF image
        ImageSaveOptions emfOptions = new ImageSaveOptions(SaveFormat.EMF);
        emfOptions.setPageSet(new PageSet(0));
        doc.save(inpFile + ".emf", emfOptions);
    } catch (Exception ex) {
        throw new IllegalStateException(ex.getMessage(), ex);
    }
}
private void updatePageProperties(PageSetup pageSetup) {
    double imgHeight = ConvertUtil.pixelToPoint(70.0);
    double imgWidth = ConvertUtil.pixelToPoint(385.0);
    double margin = 0;
    // Reset page margins
    pageSetup.setLeftMargin(margin);
    pageSetup.setRightMargin(margin);
    pageSetup.setTopMargin(margin);
    pageSetup.setBottomMargin(margin);
    // Set the header and footer distances to 0 so that no
    // extra spacing comes from the header or footer
    pageSetup.setFooterDistance(0);
    pageSetup.setHeaderDistance(0);
    // The default paper size is LETTER, so switch to CUSTOM
    // before setting a new size
    pageSetup.setPaperSize(PaperSize.CUSTOM);
    pageSetup.setPageWidth(imgWidth);
    pageSetup.setPageHeight(imgHeight);
}
private void updateTableProperties(Table table) throws Exception {
    // Reset the left indent since the table might be shifted left
    table.setLeftIndent(0);
    // Set the table's AutoFitBehavior
    table.autoFit(AutoFitBehavior.FIXED_COLUMN_WIDTHS);
    double rowHeight = ConvertUtil.pixelToPoint(70.0);
    double cellWidth = ConvertUtil.pixelToPoint(385.0);
    for (Row row : table.getRows()) {
        RowFormat rowFmt = row.getRowFormat();
        rowFmt.setAllowBreakAcrossPages(false);
        rowFmt.setHeight(rowHeight);
        rowFmt.setHeightRule(HeightRule.EXACTLY);
        for (Cell cell : row.getCells()) {
            CellFormat cellFmt = cell.getCellFormat();
            cellFmt.setWidth(cellWidth);
            cellFmt.setTopPadding(0);
            cellFmt.setBottomPadding(0);
        }
    }
    Node node = table.getLastRow().getLastChild();
    if (node != null && node.getNodeType() == NodeType.PARAGRAPH &&
            "\uFEFF \r".equals(node.getText())) {
        node.remove();
    }
}
private void resetLastParagraphProperties(Body body) {
    Paragraph lastPara = body.getLastParagraph();
    String paratext = lastPara.getText();
    if (StringUtils.isNullOrBlank(paratext)) {
        ParagraphFormat paraFmt = lastPara.getParagraphFormat();
        paraFmt.setPageBreakBefore(true);
    }
}
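The StringUtils.isNullOrBlank helper used above is not shown. Paragraph.getText() includes a trailing end-of-paragraph control character. Also, Character.isWhitespace (which typical isBlank checks rely on) does not treat U+FEFF as whitespace. So a paragraph holding only U+FEFF and a space may not count as blank. Below is a hypothetical stand-in, assuming "blank" should cover whitespace (including the ideographic space U+3000), control characters and U+FEFF. It is a sketch, not the project's actual helper.

```java
public class BlankParagraphCheck {

    // Hypothetical stand-in for the project's StringUtils.isNullOrBlank.
    // A string is blank if every char is whitespace (Character.isWhitespace covers
    // \r and the ideographic space U+3000), an ISO control char (covers other
    // end-of-paragraph marks), or U+FEFF, which isWhitespace does not cover.
    static boolean isNullOrBlank(String text) {
        if (text == null) {
            return true;
        }
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean blankChar = Character.isWhitespace(c)
                    || Character.isISOControl(c)
                    || c == '\uFEFF';
            if (!blankChar) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNullOrBlank("\uFEFF \r"));      // true
        System.out.println(isNullOrBlank("\u3000\r"));       // true
        System.out.println(isNullOrBlank("\u660e\u671d\r")); // false
    }
}
```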
Attachments: SampleText.zip (17.7 KB)
Any idea why the Japanese characters are not rendered correctly only for the input file “text1”? Both inputs use the same font and go through the same Aspose.Words conversion. The issue is clear when the HTML and EMF files for “text1” are compared visually.
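To confirm that the MS Mincho font files are actually present on the Linux host, the usual font folders can be scanned for them. This is a plain-Java sketch. The folder list is an assumption about typical Linux locations, not necessarily the exact set of folders Aspose.Words searches. The search assumes the font file name contains "mincho", and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FontFileCheck {

    // Recursively lists .ttf/.ttc/.otf files under root whose name contains
    // fragment (case-insensitive). A missing or unreadable folder yields an empty list.
    static List<Path> findFontFiles(Path root, String fragment) {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList();
        }
        String needle = fragment.toLowerCase(Locale.ROOT);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return name.contains(needle) && (name.endsWith(".ttf")
                                || name.endsWith(".ttc") || name.endsWith(".otf"));
                    })
                    .collect(Collectors.toList());
        } catch (IOException | UncheckedIOException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // Assumed typical Linux font folders; adjust to where MS Mincho was installed.
        String[] roots = {"/usr/share/fonts", "/usr/local/share/fonts",
                System.getProperty("user.home") + "/.fonts"};
        for (String root : roots) {
            List<Path> hits = findFontFiles(Paths.get(root), "mincho");
            System.out.println(root + ": " + (hits.isEmpty() ? "no match" : hits));
        }
    }
}
```

If no folder contains a match, the font is probably not installed where the converter looks for it. Aspose.Words' FontSettings can also be pointed at a specific fonts folder.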