Set Encoding to Preserve Japanese Unicode Characters in Word DOCX Document when Converting DOCX to PDF Java Linux

Hi Aspose team,

We have a serious issue with Word conversion. We take a Japanese Word document, replace some words in it, and convert it to PDF.

Here is our Java code:

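import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.List;
import java.util.Map;
import com.aspose.words.FindReplaceDirection;
import com.aspose.words.FindReplaceOptions;
import com.aspose.words.SaveFormat;
// StringUtils and MimeUtil are project dependencies not shown here.

public static byte[] replaceStringAndConvertToPDF(byte[] fileContent, Map<String, String> replaceMap, List<String> blackList) throws Exception {
  // Only Word documents are processed; any other file type returns null.
  String fileType = MimeUtil.detectFileType(fileContent);
  if (fileType.equalsIgnoreCase("doc") || fileType.equalsIgnoreCase("docx")) {
    ByteArrayInputStream inStream = new ByteArrayInputStream(fileContent);
    com.aspose.words.Document doc = new com.aspose.words.Document(inStream);
    // Apply every replacement whose search key is not blacklisted.
    for (Map.Entry<String, String> entry : replaceMap.entrySet()) {
      String search = entry.getKey();
      String replace = entry.getValue();
      if (!blackList.contains(search) && !StringUtils.isBlank(replace)) {
        doc.getRange().replace(search, replace, new FindReplaceOptions(FindReplaceDirection.FORWARD));
      }
    }
    // Render the modified document to PDF in memory.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    doc.save(out, SaveFormat.PDF);
    return out.toByteArray();
  }
  return null;
}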

It works well on a local desktop; however, the Japanese characters get garbled when we move to the Linux server.

We compared the byte lengths, and the output becomes much smaller on the server.

Could you kindly look into this and advise? Is there any setting or configuration we can use to preserve the Japanese Unicode characters?

Thanks
Jing

@sai_potluri,

To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:

  • Your simplified input Word document you are getting this problem with
  • Font files used in Word document
  • Instead of PDF, save the final output to DOCX format on Linux and share DOCX with us
  • Aspose.Words 20.2 generated PDF file showing the undesired behavior (Linux version)
  • Aspose.Words 20.2 generated PDF file showing the correct output (Desktop version)
  • Please also create a standalone simple Java application (source code without compilation errors) that helps us to reproduce your current problem on our end and attach it here for testing. Please do not include Aspose.Words JAR files in it to reduce the file size.

As soon as you have these pieces of information ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

Aspose input output.zip (118.1 KB)

Please find the attachment as you required.
The font is Yu Gothic, which is in Word's default font list. I don't know how to get the font file, but you can find the info through here.

The Aspose.Words version we use is 17.7.

Thanks

@sai_potluri,

We are working on your query and will get back to you soon.

@sai_potluri,

Your input Word document uses the following fonts. To render this document to PDF correctly, please install them on your Linux machine; you can simply copy them from your Windows machine to the Linux server.

  • Yu Gothic
  • 等线
  • Symbol
  • Calibri
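
As a quick diagnostic, the following minimal sketch lists which of the fonts above the JVM cannot see on the current machine. It uses only the JDK (`GraphicsEnvironment`); the class name `FontCheck` and the helper `missingFonts` are hypothetical names chosen for illustration. The JVM's font list is only a rough proxy here: Aspose.Words may locate fonts differently, and localized family names such as 等线 may be reported under an English name, depending on the JVM locale.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not part of Aspose.Words.
class FontCheck {

    // Returns the required font families that do not appear in the
    // available list, compared case-insensitively.
    static List<String> missingFonts(List<String> required, String[] available) {
        Set<String> installed = new HashSet<>();
        for (String name : available) {
            installed.add(name.toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!installed.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Fonts reported as used by the input document in this thread.
        List<String> required = Arrays.asList("Yu Gothic", "等线", "Symbol", "Calibri");
        // Font families visible to this JVM (assumption: a rough proxy
        // for what the PDF renderer can find on the same machine).
        String[] available = GraphicsEnvironment
                .getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        System.out.println("Missing fonts: " + missingFonts(required, available));
    }
}
```

Running this on both the desktop and the Linux server and comparing the output should show whether the server lacks the fonts. If copying fonts into a system folder is not an option, Aspose.Words also provides a `FontSettings` class for pointing the renderer at a custom fonts folder; check the documentation for the version in use.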

Please check the following sections of documentation: