Some special characters are not saved correctly when converting from DOC to PDF or PNG

When converting Microsoft Word DOC files to PDF or PNG files, we found some characters are not saved correctly.


I created a simple test case with a few lines:

public void testSpecialCharacters() throws Exception {
    InputStream inputStream = getClass().getResourceAsStream("/docs/P331_Jun_17,_2014_75742.doc");
    Document document = new Document(inputStream);

    // Render the document to PDF
    File pdfFile = new File(FileUtils.getTempDirectory(), "P331_Jun_17,_2014_75742.pdf");
    document.save(new FileOutputStream(pdfFile), SaveFormat.PDF);
    log.debug("Saved PDF document to: {}", pdfFile.getAbsolutePath());
    Assert.assertTrue(pdfFile.exists());

    // Render the document to PNG
    File pngFile = new File(FileUtils.getTempDirectory(), "P331_Jun_17,_2014_75742.png");
    document.save(new FileOutputStream(pngFile), SaveFormat.PNG);
    log.debug("Saved PNG document to: {}", pngFile.getAbsolutePath());
    Assert.assertTrue(pngFile.exists());
}

I’ve attached all the test files to this post, including the DOC file and the output PDF and PNG files. You can see that some characters are shown as boxes in the output PDF and PNG files.

I guess this could be a bug in the Aspose.Words for Java library. Please check and confirm. These special characters are very important in our documents.

Thanks,
Jake

Hi Jake,

Please accept my apologies for the late response.

Thanks for your inquiry. Please note that Aspose.Words requires TrueType fonts when rendering documents to fixed-page formats (PDF, XPS or SWF). Make sure all the fonts used in the document are installed on the machine you’re using to convert the Word document to PDF. I suggest reading the following articles:

http://www.aspose.com/docs/display/wordsjava/How+to++Specify+True+Type+Fonts+Location
http://www.aspose.com/docs/display/wordsjava/How+Aspose.Words+Uses+True+Type+Fonts

I would also suggest upgrading to the latest version (v14.6.0) and letting us know how it goes on your side. I hope this helps.

If you still face the problem, please share the following details for investigation purposes.

  • What environment are you running on?
    • OS (Windows Version or Linux Version)
    • Architecture (32 / 64 bit)
    • Java version
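A minimal sketch for collecting these details from inside the JVM that runs the conversion, using only standard system properties. The class name `EnvironmentReport` is made up for illustration. Note that `os.arch` reflects the architecture of the running JVM, which may differ from the operating system's on a 64-bit machine running a 32-bit Java.

```java
public class EnvironmentReport {

    /** Builds a short, paste-ready description of the runtime environment. */
    static String describe() {
        return "OS: " + System.getProperty("os.name") + " " + System.getProperty("os.version") + "\n"
             // os.arch is the architecture the JVM was built for
             + "Arch: " + System.getProperty("os.arch") + "\n"
             + "Java version: " + System.getProperty("java.version") + "\n"
             + "JVM: " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this on the same machine and with the same Java installation as the failing conversion avoids reporting a different JVM than the one actually in use.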


Hi Tahir,


Thanks for the reply.

We’ve upgraded to Aspose.Words v14.6.0 and I tested it again. It still didn’t work for a few characters (not all special characters).

This issue occurred on both Mac (my dev machine) and Linux (the prod environment).

I checked that all the TrueType fonts seem to be in place, including symbol.ttf and wingding.ttf copied from a Windows machine.

The environment of my test is:

OS: Mac OS X 10.9.3 (13D65)
Arch: 64bit
Java version:
----------------------
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
----------------------

Could you please try the test files I attached?

Thanks,
Jake

Hi Jake,

Thanks for sharing the detail.

I have tested the scenario and managed to reproduce the same issue on my side. I have logged this problem in our issue tracking system as WORDSJAVA-910 and linked this forum thread to it. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

Hi Jake,

Thanks for your patience.

Could you please copy all the fonts from Windows to your Linux operating system, use the FontSettings.setFontsFolder method, and let us know if you still face the issue? Please see the following documentation link for reference.
http://www.aspose.com/docs/display/wordsjava/How+to++Install+True+Type+Fonts+on+Linux
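Before pointing Aspose.Words at the copied folder, it can help to confirm that the font files actually arrived. The following is a rough sketch using only the JDK (Java 7 compatible). The class name `FontFolderCheck` is invented for illustration, and the `FontSettings.setFontsFolder(folder, true)` call appears only as a comment: its two-argument static form is an assumption about the 14.x API, so please verify it against the documentation for your version.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FontFolderCheck {

    /** Returns every .ttf/.ttc file under the folder, including subfolders (symlinks are not followed). */
    static List<Path> listTrueTypeFonts(Path folder) throws IOException {
        final List<Path> fonts = new ArrayList<>();
        Files.walkFileTree(folder, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Compare case-insensitively: fonts copied from Windows are often upper-case
                String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
                if (attrs.isRegularFile() && (name.endsWith(".ttf") || name.endsWith(".ttc"))) {
                    fonts.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return fonts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Usage: java FontFolderCheck <fonts-folder>");
            return;
        }
        Path folder = Paths.get(args[0]);
        List<Path> fonts = listTrueTypeFonts(folder);
        System.out.println(fonts.size() + " TrueType font file(s) under " + folder);
        for (Path font : fonts) {
            System.out.println("  " + font);
        }
        // With Aspose.Words on the classpath, the same folder would then be registered
        // before loading the document (assumed signature, check your version):
        // FontSettings.setFontsFolder(folder.toString(), true);
    }
}
```

If the listing is missing fonts such as symbol.ttf or wingding.ttf, the boxes in the output are most likely a font substitution problem rather than a rendering one.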

It worked. Thanks.

Hi Jake,

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.