We are having an issue where certain Unicode characters do not render correctly when a document is saved to PDF. All of the affected characters seem to be from the Latin Extended-B Unicode block.
I have verified this is not a font issue by turning off font embedding altogether when saving to PDF. When I open the PDF on my laptop, which has Arial installed, the characters appear as squares. When I open the Word document, the characters are rendered correctly in Arial.
See the attached zip file, which contains:
- input.docx
- output.pdf
- actual.pdf
example documents.zip (413.3 KB)
Here is our Aspose code (written in Python with JPype):
import jpype

def convert_to_pdf(src_path, dest_path):
    # Assumes the JVM has already been started with the Aspose.Words jar on the classpath.
    Document = jpype.JClass('com.aspose.words.Document')
    SaveFormat = jpype.JClass('com.aspose.words.SaveFormat')
    Color = jpype.JClass('java.awt.Color')
    doc = Document(src_path)
    # Fix Pink Background Issue
    # - Right now we are going to always set background color to white.
    # - If that causes problems then we could target specific colors
    #   that are causing problems such as:
    #   pink = Color(255, 153, 204)
    white = Color(255, 255, 255)
    doc.setPageColor(white)
    PdfSaveOptions = jpype.JClass('com.aspose.words.PdfSaveOptions')
    options = PdfSaveOptions()
    options.setEmbedFullFonts(True)
    doc.save(dest_path, options)
The server is running Ubuntu 16.
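To help pin down exactly which characters are affected, a small check like the one below could list every character in a piece of text that falls inside the Latin Extended-B range (U+0180 to U+024F). This is only a rough sketch and not part of our conversion code; the function name latin_extended_b_chars and the sample string are made up for illustration.

```python
import unicodedata

# Latin Extended-B covers U+0180..U+024F in the Unicode standard.
LATIN_EXT_B_START = 0x0180
LATIN_EXT_B_END = 0x024F


def latin_extended_b_chars(text):
    """Return (char, code point, Unicode name) for each Latin Extended-B character in text.

    Hypothetical helper for diagnosis only; it does not touch Aspose or the PDF.
    """
    found = []
    for ch in text:
        if LATIN_EXT_B_START <= ord(ch) <= LATIN_EXT_B_END:
            found.append((ch, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN")))
    return found


if __name__ == "__main__":
    # Illustrative sample: U+0218 and U+0192 are in the block, U+00E9 is not.
    sample = "\u0218 and \u0192 are in the block; \u00e9 is not."
    for ch, code_point, name in latin_extended_b_chars(sample):
        print(ch, code_point, name)
```

Running this over the text extracted from input.docx would give a concrete list of code points to compare against the squares in the PDF.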