Greek characters change to question marks when converting a DOCX template to PDF with a JSON data source (Java)

In a Maven Java Spring project on Windows 10 Enterprise, I’m trying to generate a PDF from a Word template and a JSON data source.
Greek characters are rendered as boxes with question marks.
After seeing a similar issue here, I tried Arial Unicode MS.
I installed it on my system and also loaded it programmatically.
The boxes just change to rhombuses.
Other fonts make no difference.

Here is my code:

FontSourceBase[] originalFontSources = FontSettings.getDefaultInstance().getFontsSources();
// Create a font source from a folder that contains fonts.
FolderFontSource folderFontSource = new FolderFontSource("C:\\AsposeFonts", false);
// Apply a new array of font sources that contains the original font sources, as well as our custom font.
FontSourceBase[] updatedFontSources = {originalFontSources[0], folderFontSource};
FontSettings.getDefaultInstance().setFontsSources(updatedFontSources);
JsonDataSource dataSource = new JsonDataSource(new ByteArrayInputStream("{\"greekWords\":\"Ελληνικές λέξεις.\"}".getBytes()));
Document doc = new Document("C:\\TemplateTest\\TestTemplate.docx");
ReportingEngine engine = new ReportingEngine();
engine.buildReport(doc, dataSource);
doc.save("genFile.pdf");

Here are alternative solutions I’ve tried (skipping common code):

doc.setFontSettings(new FontSettings());
SystemFontSource systemFontSource = (SystemFontSource) doc.getFontSettings().getFontsSources()[0];
FolderFontSource folderFontSource = new FolderFontSource("C:\\AsposeFonts", false);
doc.getFontSettings().setFontsSources(new FontSourceBase[]{systemFontSource, folderFontSource});

doc.setFontSettings(new FontSettings());
doc.getFontSettings().getSubstitutionSettings().getFontInfoSubstitution().setEnabled(true);
doc.getFontSettings().getSubstitutionSettings().getTableSubstitution().addSubstitutes("arial-unicode-ms");

LoadOptions loadOptions = new LoadOptions();
loadOptions.setEncoding(java.nio.charset.Charset.forName("UTF-8"));//also tried UTF8 following [this](https://forum.aspose.com/t/cyrillic-and-greek-alphabets-garbage-characters/127354) post
Document doc = new Document("C:\\TemplateTest\\TestTemplate.docx", loadOptions);

FontSettings.getDefaultInstance().getFontsSources()[1].getAvailableFonts().get(0).getFullFontName()
DocumentBuilder builder = new DocumentBuilder(doc);
builder.getFont().setName(FontSettings.getDefaultInstance().getFontsSources()[1].getAvailableFonts().get(0).getFullFontName());

Fonts I’ve tried:

NotoSans-ExtraLight.ttf
NotoSans-ExtraLightItalic.ttf
NotoSans-Italic.ttf
NotoSans-Light.ttf
NotoSans-LightItalic.ttf
NotoSans-Medium.ttf
NotoSans-MediumItalic.ttf
NotoSans-Regular.ttf
NotoSans-SemiBold.ttf
NotoSans-SemiBoldItalic.ttf
NotoSans-Thin.ttf
NotoSans-ThinItalic.ttf
times new roman bold italic.ttf
times new roman bold.ttf
times new roman italic.ttf
times new roman.ttf
arial-unicode-ms.ttf
FreeSerif.ttf
FreeSerifBold.ttf
FreeSerifBoldItalic.ttf
FreeSerifItalic.ttf
NotoSans-Black.ttf
NotoSans-BlackItalic.ttf
NotoSans-Bold.ttf
NotoSans-BoldItalic.ttf
NotoSans-ExtraBold.ttf
NotoSans-ExtraBoldItalic.ttf

This zip contains the pdf I generated with Arial Unicode MS and the word template:
TemplateTest.zip (109.8 KB)

@NickZaf Unfortunately, I cannot reproduce the problem on my side. Even if I put only the Arial Unicode MS font into the fonts folder, Greek words are rendered properly.

FontSettings.getDefaultInstance().setFontsSources(new FontSourceBase[] {new FolderFontSource("C:\\Temp\\fonts", true)});
    
JsonDataSource dataSource = new JsonDataSource(new ByteArrayInputStream("{\"greekWords\":\"Ελληνικές λέξεις.\"}".getBytes()));
Document doc = new Document("C:\\Temp\\in.docx");
ReportingEngine engine = new ReportingEngine();
engine.buildReport(doc, dataSource);
doc.save("C:\\Temp\\out.pdf");

Here is the output PDF document produced on my side: out.pdf (25.6 KB)

I have Arial Unicode MS version 1.01.

@alexey.noskov Can you give me that exact font?
What version of Aspose.Words are you using?
I’m using the latest (23.8), so it should not matter.

@NickZaf Please find the attached font. You should remove the .zip extension before unzipping the file.
ARIALUNI.zip.001.zip (5 MB)
ARIALUNI.zip.002.zip (5 MB)
ARIALUNI.zip.003.zip (4.1 MB)

I just created a new Java Maven project and everything works fine there.
Any idea what could be going wrong with my original project?

@NickZaf Unfortunately, it is difficult to guess what is wrong with your project. Have you checked the input JSON string? Maybe it is damaged for some reason and the Greek characters are replaced with garbage characters.
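
One way to check whether the JSON is damaged is to look at the bytes that are actually handed to the data source, not only at the string in the debugger. The following is a minimal sketch using only the JDK; the class name JsonBytesCheck and the method decodesTo are hypothetical names chosen for illustration, and the sketch assumes the bytes are expected to be UTF-8. The Greek text is written with Unicode escapes so the result does not depend on the encoding of the source file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a byte array is valid UTF-8
// and decodes back to exactly the text that was intended.
class JsonBytesCheck {

    static boolean decodesTo(byte[] bytes, String expected) {
        try {
            String decoded = StandardCharsets.UTF_8.newDecoder()
                    // Fail loudly instead of silently inserting replacement characters.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
            return decoded.equals(expected);
        } catch (CharacterCodingException e) {
            // The bytes are not well-formed UTF-8 at all.
            return false;
        }
    }

    public static void main(String[] args) {
        // "Ελληνικές" written with Unicode escapes.
        String greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ad\u03c2";
        String json = "{\"greekWords\":\"" + greek + "\"}";

        // Bytes produced with an explicit charset survive the round trip.
        System.out.println(decodesTo(json.getBytes(StandardCharsets.UTF_8), json));

        // Bytes produced with the platform default may or may not survive,
        // depending on how the JVM is configured.
        System.out.println(decodesTo(json.getBytes(), json));
    }
}
```

If the second line prints false in the affected project, the string itself is fine and the damage happens when it is converted to bytes.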

@alexey.noskov The characters seemed fine, but using json.getBytes(StandardCharsets.UTF_8) instead of json.getBytes() fixed the problem! Thanks!
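
For anyone hitting the same symptom: String.getBytes() with no argument uses the JVM's default charset, which before JDK 18 is taken from the operating system settings and can differ between projects or launch configurations. The sketch below uses only the JDK and is an illustration, not the code from the affected project. The class name GreekBytesDemo and the method encodeThenReadAsUtf8 are hypothetical, and ISO-8859-1 stands in for whatever non-UTF-8 default the affected project may have had, which this thread does not state.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: encode text with a given charset, then read the
// bytes back as UTF-8, as a consumer that expects UTF-8 input would.
class GreekBytesDemo {

    static String encodeThenReadAsUtf8(String text, Charset encoding) {
        byte[] bytes = text.getBytes(encoding);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Ελληνικές" written with Unicode escapes, so the result does not
        // depend on the encoding of this source file.
        String greek = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ad\u03c2";

        // Shows which charset a bare getBytes() call would use in this JVM.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Explicit UTF-8: the Greek letters survive.
        System.out.println(encodeThenReadAsUtf8(greek, StandardCharsets.UTF_8));

        // ISO-8859-1 cannot represent Greek letters, so each one is
        // replaced with '?' during encoding and the text is lost.
        System.out.println(encodeThenReadAsUtf8(greek, StandardCharsets.ISO_8859_1));
    }
}
```

Passing an explicit charset, as in json.getBytes(StandardCharsets.UTF_8), removes the dependency on the environment, which would explain why the same code behaved differently in a freshly created project.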
