Retain Chinese Text during Word DOC to PDF Conversion using Java over Red Hat OpenShift

kslau · April 8, 2021, 10:31am

We are using aspose-words-21.2-jdk17 (.jar).

We convert a Word document to PDF. However, in our Red Hat OpenShift environment, some Chinese characters are missing (blank) in the PDF.

WORD:
image.png (1.5 KB)

PDF:
image.png (10.4 KB)

Note that not all Chinese characters are blank.

Any recommendations?

Code snippet:

@Service
class PdfConverter {

    private val log = LoggerFactory.getLogger(PdfConverter::class.java)

    private val pdfSaveOptions = PdfSaveOptions().apply {
        fontEmbeddingMode = EMBED_ALL
        compliance = PDF_A_1_B
    }

    fun convert(wordDocument: Document): Document {
        val outputStream = ByteOutputStream()
        val executionTime = measureTimeMillis {
            val document = com.aspose.words.Document(wordDocument.contentStream())
            document.save(outputStream, pdfSaveOptions)
        }

        log.info("PDF conversion completed in ${executionTime / 1000.0} seconds")
        return Document(PDF, outputStream.bytes)
    }
}

awais.hafeez · April 8, 2021, 2:06pm

@kslau,

Please try the latest (21.3) version of Aspose.Words for Java and see how it goes on your end. If the problem persists, please ZIP and upload your input Word document and the PDF file generated by Aspose.Words that shows the undesired behavior here for testing. We will then investigate the issue on our end and provide you with more information.

kslau · April 13, 2021, 3:10am

Hi

I have upgraded to the latest 21.3 version in our Red Hat OpenShift environment.

The problem still remains: some Chinese characters are missing.

Let me know if you have any recommendations.

Regards

Sun

LA_SLD_Index_Chinese.zip (4.53 MB)

awais.hafeez · April 13, 2021, 11:33am

@kslau,

We have logged this problem in our issue tracking system. The ID of this issue is WORDSNET-22111. We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.

awais.hafeez · April 20, 2021, 8:24am

@kslau,

Regarding WORDSNET-22111, we have completed the analysis of this issue and decided to close it with “not a bug” status. The problem occurs because the “PMingLiU” font embedded in the document you shared does not have the glyphs required for proper rendering. It seems that MS Word cannot embed this font properly. I have attached a document with the font embedded by Aspose.Words, and it renders well on our end.

Attachment: AWEmbededFontsFull.zip (3.8 MB)
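
As a rough illustration of what "missing glyphs" means here, the sketch below lists the Chinese (Han) characters in a piece of text that a font reports it cannot display. It is not Aspose.Words code and does not inspect the font embedded in the document; it uses only the Java standard library. The class name GlyphCoverageCheck, the method missingHanGlyphs, and the stand-in predicate in main are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

class GlyphCoverageCheck {

    // Returns each distinct Han character in `text` that the supplied predicate
    // reports as not displayable, in order of first appearance.
    // `canDisplay` is an assumption: any function that answers
    // "does the font have a glyph for this code point?".
    static List<String> missingHanGlyphs(String text, IntPredicate canDisplay) {
        Set<String> missing = new LinkedHashSet<>();
        text.codePoints()
            .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
            .filter(cp -> !canDisplay.test(cp))
            .forEach(cp -> missing.add(new String(Character.toChars(cp))));
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) {
        // Stand-in for a real font, for illustration only:
        // it pretends the font covers U+4E2D and nothing else.
        IntPredicate fakeFont = cp -> cp == 0x4E2D;

        // The sample text contains U+4E2D and U+6587, each twice.
        String sample = "\u4E2D\u6587 PDF \u4E2D\u6587";
        System.out.println(missingHanGlyphs(sample, fakeFont));
    }
}
```

With a real font file, the predicate could be based on java.awt.Font.canDisplay(int), for example cp -> font.canDisplay(cp), assuming the font can be loaded with Font.createFont in your environment.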

kslau · April 20, 2021, 8:42am

Hi

Thanks a lot for your reply!

Do you know how to save the Word document with the font properly embedded (I have already checked the embed fonts option)? Or can the Aspose.Words tools do this?

Alternatively, does that mean I should put the TrueType font in the ECS font directory in order to generate the correct PDF?

Regards

Sun

image003.jpg (169 Bytes)

awais.hafeez · April 21, 2021, 4:15am

@kslau,

You can use the following Java code to embed all fonts in a Word document:

Document doc = new Document("input.docx");
doc.getFontInfos().setEmbedTrueTypeFonts(true);
doc.getFontInfos().setSaveSubsetFonts(false);
doc.getFontInfos().setEmbedSystemFonts(true);
doc.save("outFile.docx");

At the moment, Aspose.Words cannot properly embed font subsets for the document you shared, so the full fonts must be embedded.

Yes, providing font files to Aspose.Words in the environment where the document will be converted to PDF is a viable option. You can copy the latest versions of the required font files from a Windows 10 machine into a separate folder inside Red Hat OpenShift and try running the following code with the latest 21.4 version of Aspose.Words for Java:

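import com.aspose.words.Document;
import com.aspose.words.FolderFontSource;
import com.aspose.words.FontSettings;
import com.aspose.words.FontSourceBase;

// myDir is the base directory that contains the CustomFonts folder.
Document doc = new Document("input.docx");

// Add the custom font folder to the font sources used for this document.
FontSettings fontSettings = new FontSettings();
addFontFolder(fontSettings, myDir + "CustomFonts/");
doc.setFontSettings(fontSettings);

doc.save("output.pdf");

// Appends a folder (scanned recursively) to the existing font sources.
private static void addFontFolder(FontSettings fontSettings, String folder)
{
    FontSourceBase[] fontSourceBases = fontSettings.getFontsSources();
    FontSourceBase[] newFontSourceBases = new FontSourceBase[fontSourceBases.length + 1];
    System.arraycopy(fontSourceBases, 0, newFontSourceBases, 0, fontSourceBases.length);
    newFontSourceBases[newFontSourceBases.length - 1] = new FolderFontSource(folder, true);
    fontSettings.setFontsSources(newFontSourceBases);
}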
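
Before pointing Aspose.Words at the folder, it may help to confirm that the folder inside the container actually exists and contains font files. The following is a minimal sketch using only the Java standard library; the class name FontFolderCheck, the method listFontFiles, the default folder name, and the list of file extensions (.ttf, .ttc, .otf) are assumptions made for illustration and are not part of the Aspose.Words API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FontFolderCheck {

    // Recursively lists files under `folder` whose names end with a common
    // font extension. Throws IllegalArgumentException if the folder is missing.
    static List<Path> listFontFiles(Path folder) {
        if (!Files.isDirectory(folder)) {
            throw new IllegalArgumentException("Font folder not found: " + folder);
        }
        try (Stream<Path> paths = Files.walk(folder)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(FontFolderCheck::hasFontExtension)
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed set of extensions; adjust to the font formats you deploy.
    private static boolean hasFontExtension(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".ttf") || name.endsWith(".ttc") || name.endsWith(".otf");
    }

    public static void main(String[] args) {
        // "CustomFonts" mirrors the folder name in the snippet above; pass another path as an argument.
        Path folder = Paths.get(args.length > 0 ? args[0] : "CustomFonts");
        if (!Files.isDirectory(folder)) {
            System.out.println("Font folder not found: " + folder.toAbsolutePath());
            return;
        }
        List<Path> fonts = listFontFiles(folder);
        System.out.println(fonts.size() + " font file(s) found in " + folder.toAbsolutePath());
        fonts.forEach(System.out::println);
    }
}
```

If the list is empty or the folder is missing, the conversion would have no additional fonts to draw on, which would be worth ruling out before investigating further.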

kslau · April 21, 2021, 6:21am

Thanks a lot!

Very useful information.

image003.jpg (169 Bytes)

image004.jpg (163 Bytes)

awais.hafeez · April 21, 2021, 9:22am

@kslau,

I am afraid the images you attached here are not visible to me. If you have further questions or need any help in the future, please let us know.