Font Issues with content containing Arabic and Chinese when saving to PDF on Linux

Hello,

We have been having issues when adding content that contains Arabic text to documents that we save as PDF files: incorrect fonts are used in the generated PDF (particularly on Linux, but it also affects SVGs added to PDFs generated on Windows). We use nearly identical code to download the same document as a DOCX file (the only difference is the SaveFormat, as seen in the snippet below), which works fine and uses the expected fonts.

Our webapp uses Calibri when showing Arabic in the browser, but when downloading the same content as a PDF, the Arabic font changes to NotoSansArabic, which is too wide and overflows the text areas in the SVGs that we add to the documents. We researched other Arabic fonts that were more in line with what we wanted and decided on Amiri Quran, which we install on our Linux machine as fonts-hosny-amiri (we also install several other fonts, such as fonts-noto-cjk, fonts-noto-core, and fonts-liberation).

This fixed the Arabic in the main HTML content sections of the downloaded PDF, but the SVG that we add to the document still uses NotoSansArabic. Additionally, it caused our Chinese PDFs to attempt to use Amiri Quran, resulting in Chinese characters being shown as boxes (an issue we had previously fixed by installing fonts-noto-cjk).

I have attached PDF and DOCX files resulting from several situations:

Here is an approximation of our code flow:

byte[] downloadPDF(String svg) throws Exception
{
   ByteArrayOutputStream baos = new ByteArrayOutputStream();
   Document doc = new Document();
   DocumentBuilder builder = new DocumentBuilder(doc);

   if (SystemUtils.IS_OS_LINUX)
   {
      // We added this section after having issues with Chinese characters showing as boxes
      FontSettings fontSettings = new FontSettings();
      fontSettings.getFallbackSettings().loadNotoFallbackSettings();
      builder.getDocument().setFontSettings(fontSettings);
   }

   builder.insertHtml(someHtmlGoesHere, true);
   builder.insertBreak(BreakType.PARAGRAPH_BREAK);

   builder.insertImage(svg.getBytes(StandardCharsets.UTF_8));

   builder.insertBreak(BreakType.PARAGRAPH_BREAK);
   builder.insertHtml(someOtherHtmlGoesHere, true);

   SaveOptions saveOptions = DocSaveOptions.createSaveOptions(SaveFormat.PDF);
   // SaveFormat.DOCX for word version

   doc.save(baos, saveOptions);
   return baos.toByteArray();
}

In my head, I see three different potential causes:

  1. We’re doing something wrong with our font settings or other setup. This seems less likely because, as you can see from the shared files, the docx files are all working correctly, but could be possible.

  2. Maybe it’s a bug in Aspose Words. I did see other topics in the forum here that had bugs related to foreign languages and pdfs, but most of them seem to have been resolved already.

  3. We’re misusing Aspose Words for something it wasn’t intended to do, and perhaps we should be using Aspose.PDF to create and save PDF files.

Phew, that was a mouthful. Thank you so much for your time, and any assistance you can provide.

Best regards,
Avery Norris

@averyscottnorris Your code is correct, and you are using Aspose.Words correctly to convert the document to PDF.
Note that DOCX is a flow document format and normally does not embed the fonts used in the document, so the consumer application (MS Word or OpenOffice) uses the fonts installed on your system to display it. PDF, on the other hand, is a fixed-page format, and fonts are embedded into the file. This guarantees that the document looks identical in any environment where it is viewed. So, to save a document as PDF, Aspose.Words requires the fonts used in the document. If Aspose.Words cannot find them, the fonts are substituted. This might lead to layout differences, since substitute fonts might have different font metrics. You can implement IWarningCallback to get a notification when font substitution is performed.
Regarding the incorrect/inaccurate rendering of Arabic text in the diagram: MS Word uses OpenType features by default. You should explicitly enable OpenType features in Aspose.Words to get accurate output for such content.

So, to get an accurate PDF result, the fonts used in the document should be available in the environment where the conversion is performed, and in your case OpenType features must also be enabled.

Thank you for the response, Alexey.

I have enabled OpenType features as instructed and added substitution and fallback tables. The substitution table is the default, and the fallback table is the Noto fallback with Noto Sans Arabic swapped out for Amiri. This did the trick in getting the correct Arabic font into the PDF’s SVGs (yay!).

Unfortunately, it did not fix the problem with Chinese characters showing as boxes. I have attached a PDF with both Chinese and Arabic present, generated with the current fallback and substitution tables. The Arabic comes through flawlessly, while the Chinese shows as boxes.

linux_ChineseArabicRussian.pdf (47.3 KB)

I also set the default font to Liberation Sans. It seems that the fallback is not working correctly for the Chinese characters.

Here are our substitution settings:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<TableSubstitutionSettings xmlns="Aspose.Words">
	<SubstitutesTable>
		<Item OriginalFont="Arial" SubstituteFonts="Liberation Sans, FreeSans, Garuda, DejaVu Sans" />
		<Item OriginalFont="Charcoal" SubstituteFonts="FreeSerif" />
		<Item OriginalFont="Comic Sans MS" SubstituteFonts="DejaVu Sans" />
		<Item OriginalFont="Courier New" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans Mono" />
		<Item OriginalFont="Georgia" SubstituteFonts="Norasi, Liberation Serif, FreeSerif, DejaVu Serif" />
		<Item OriginalFont="Helvetica" SubstituteFonts="FreeSans" />
		<Item OriginalFont="Lucida Grande" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans" />
		<Item OriginalFont="Lucida Sans Unicode" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans" />
		<Item OriginalFont="Lucida Console" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans Mono" />
		<Item OriginalFont="New York" SubstituteFonts="DejaVu Serif" />
		<Item OriginalFont="Tahoma" SubstituteFonts="DejaVu Sans, Kalimati" />
		<Item OriginalFont="Times New Roman" SubstituteFonts="FreeSerif, Liberation Serif, DejaVu Serif" />
		<Item OriginalFont="Palatino Linotype" SubstituteFonts="FreeSerif" />
		<Item OriginalFont="Verdana" SubstituteFonts="DejaVu Sans Mono" />
		<Item OriginalFont="Trebuchet MS" SubstituteFonts="Liberation Sans, FreeSans, Garuda, DejaVu Sans" />
		<Item OriginalFont="Impact" SubstituteFonts="Rekha, DejaVu Sans" />
		<Item OriginalFont="Arabic Transparent" SubstituteFonts="KacstArt" />
		<Item OriginalFont="Arial Baltic" SubstituteFonts="Liberation Sans, FreeSans, Garuda, DejaVu Sans" />
		<Item OriginalFont="Arial CE" SubstituteFonts="Liberation Sans, FreeSans, Garuda, DejaVu Sans" />
		<Item OriginalFont="Arial Cyr" SubstituteFonts="Liberation Sans, FreeSans, Garuda, DejaVu Sans" />
		<Item OriginalFont="Arial Greek" SubstituteFonts="Liberation Sans, FreeSans, Garuda, DejaVu Sans" />
		<Item OriginalFont="Arial TUR" SubstituteFonts="Liberation Sans, FreeSans, Garuda, DejaVu Sans" />
		<Item OriginalFont="Courier New Baltic" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans Mono" />
		<Item OriginalFont="Courier New CE" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans Mono" />
		<Item OriginalFont="Courier New Cyr" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans Mono" />
		<Item OriginalFont="Courier New Greek" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans Mono" />
		<Item OriginalFont="Courier New TUR" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans Mono" />
		<Item OriginalFont="Courier" SubstituteFonts="FreeMono, Liberation Mono, DejaVu Sans Mono" />
		<Item OriginalFont="Tahoma Armenian" SubstituteFonts="DejaVu Sans" />
		<Item OriginalFont="Times" SubstituteFonts="FreeSerif, Liberation Serif, DejaVu Serif" />
		<Item OriginalFont="Times New Roman Baltic" SubstituteFonts="FreeSerif, Liberation Serif, DejaVu Serif" />
		<Item OriginalFont="Times New Roman CE" SubstituteFonts="FreeSerif, Liberation Serif, DejaVu Serif" />
		<Item OriginalFont="Times New Roman Cyr" SubstituteFonts="FreeSerif, Liberation Serif, DejaVu Serif" />
		<Item OriginalFont="Times New Roman Greek" SubstituteFonts="FreeSerif, Liberation Serif, DejaVu Serif" />
		<Item OriginalFont="Times New Roman TUR" SubstituteFonts="FreeSerif, Liberation Serif, DejaVu Serif" />
		<Item OriginalFont="Microsoft Sans Serif" SubstituteFonts="DejaVu Sans" />
		<Item OriginalFont="MS UI Gothic" SubstituteFonts="TakaoPGothic" />
		<Item OriginalFont="PMingLiU-ExtB" SubstituteFonts="FreeSerif" />
		<Item OriginalFont="Cambria Math" SubstituteFonts="FreeSerif Italic" />
		<Item OriginalFont="Calibri" SubstituteFonts="Liberation Sans" />
		<Item OriginalFont="MS PGothic" SubstituteFonts="TakaoPGothic" />
		<Item OriginalFont="Arial Unicode MS" SubstituteFonts="TakaoPGothic" />
		<Item OriginalFont="Microsoft YaHei" SubstituteFonts="MSungGB18030C-Medium" />
	</SubstitutesTable>
</TableSubstitutionSettings>

And here are our fallback settings:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<FontFallbackSettings xmlns="Aspose.Words">
	<FallbackTable>
		<Rule Ranges="1E900-1E95F" FallbackFonts="Noto Sans Adlam Unjoined, Noto Sans Adlam" />
		<Rule Ranges="14400-1467F" FallbackFonts="Noto Sans AnatoHiero" />
		<Rule Ranges="0530-058F, FB13-FB17" FallbackFonts="Noto Sans Armenian" />
		<Rule Ranges="10B00-10B3F" FallbackFonts="Noto Sans Avestan" />
		<Rule Ranges="1B00-1B7F" FallbackFonts="Noto Sans Balinese" />
		<Rule Ranges="A6A0-A6FF, 16800-16A3F" FallbackFonts="Noto Sans Bamum" />
		<Rule Ranges="1BC0-1BFF" FallbackFonts="Noto Sans Batak" />
		<Rule Ranges="0980-09FF" FallbackFonts="Noto Sans Bengali" />
		<Rule Ranges="11000-1107F" FallbackFonts="Noto Sans Brahmi" />
		<Rule Ranges="1A00-1A1F" FallbackFonts="Noto Sans Buginese" />
		<Rule Ranges="1740-175F" FallbackFonts="Noto Sans Buhid" />
		<Rule Ranges="1400-167F, 18B0-18FF" FallbackFonts="Noto Sans Canadian Aboriginal" />
		<Rule Ranges="102A0-102DF" FallbackFonts="Noto Sans Carian" />
		<Rule Ranges="11100-1114F" FallbackFonts="Noto Sans Chakma" />
		<Rule Ranges="AA00-AA5F" FallbackFonts="Noto Sans Cham" />
		<Rule Ranges="13A0-13FF, AB70-ABBF" FallbackFonts="Noto Sans Cherokee" />
		<Rule Ranges="0370-03FF, 2C80-2CFF" FallbackFonts="Noto Sans Coptic" />
		<Rule Ranges="12000-1254F" FallbackFonts="Noto Sans Cuneiform" />
		<Rule Ranges="10800-1083F" FallbackFonts="Noto Sans Cypriot" />
		<Rule Ranges="10400-1044F" FallbackFonts="Noto Sans Deseret" />
		<Rule Ranges="0900-097F, 1CD0-1CFF, A830-A83F, A8E0-A8FF" FallbackFonts="Noto Sans Devanagari" />
		<Rule Ranges="13000-1342F" FallbackFonts="Noto Sans EgyptHiero" />
		<Rule Ranges="1200-139F, 2D80-2DDF, AB00-AB2F" FallbackFonts="Noto Sans Ethiopic" />
		<Rule Ranges="10A0-10FF, 2D00-2D2F" FallbackFonts="Noto Sans Georgian" />
		<Rule Ranges="2C00-2C5F, 1E000-1E02F" FallbackFonts="Noto Sans Glagolitic" />
		<Rule Ranges="10330-1034F" FallbackFonts="Noto Sans Gothic" />
		<Rule Ranges="0A80-0AFF" FallbackFonts="Noto Sans Gujarati" />
		<Rule Ranges="0A00-0A7F" FallbackFonts="Noto Sans Gurmukhi" />
		<Rule Ranges="1720-173F" FallbackFonts="Noto Sans Hanunoo" />
		<Rule Ranges="0590-05FF, FB1D-FB4F" FallbackFonts="Noto Sans Hebrew" />
		<Rule Ranges="10840-1085F" FallbackFonts="Noto Sans ImpAramaic" />
		<Rule Ranges="10B60-10B7F" FallbackFonts="Noto Sans InsPahlavi" />
		<Rule Ranges="10B40-10B5F" FallbackFonts="Noto Sans InsParthi" />
		<Rule Ranges="A980-A9DF" FallbackFonts="Noto Sans Javanese" />
		<Rule Ranges="11080-110CF" FallbackFonts="Noto Sans Kaithi" />
		<Rule Ranges="0C80-0CFF" FallbackFonts="Noto Sans Kannada" />
		<Rule Ranges="A900-A92F" FallbackFonts="Noto Sans Kayah Li" />
		<Rule Ranges="10A00-10A5F" FallbackFonts="Noto Sans Kharoshthi" />
		<Rule Ranges="1780-17FF, 19E0-19FF" FallbackFonts="Noto Sans Khmer" />
		<Rule Ranges="0E80-0EFF" FallbackFonts="Noto Sans Lao" />
		<Rule Ranges="1C00-1C4F" FallbackFonts="Noto Sans Lepcha" />
		<Rule Ranges="1900-194F" FallbackFonts="Noto Sans Limbu" />
		<Rule Ranges="10000-1013F" FallbackFonts="Noto Sans Linear B" />
		<Rule Ranges="A4D0-A4FF" FallbackFonts="Noto Sans Lisu" />
		<Rule Ranges="10280-1029F" FallbackFonts="Noto Sans Lycian" />
		<Rule Ranges="10920-1093F" FallbackFonts="Noto Sans Lydian" />
		<Rule Ranges="0D00-0D7F" FallbackFonts="Noto Sans Malayalam" />
		<Rule Ranges="0840-085F" FallbackFonts="Noto Sans Mandaic" />
		<Rule Ranges="AAE0-AAFF, ABC0-ABFF" FallbackFonts="Noto Sans Meetei Mayek" />
		<Rule Ranges="1800-18AF" FallbackFonts="Noto Sans Mongolian" />
		<Rule Ranges="1000-109F, A9E0-A9FF, AA60-AA7F" FallbackFonts="Noto Sans Myanmar" />
		<Rule Ranges="07C0-07FF" FallbackFonts="Noto Sans N'Ko" />
		<Rule Ranges="1980-19DF" FallbackFonts="Noto Sans NewTaiLue" />
		<Rule Ranges="1680-169F" FallbackFonts="Noto Sans Ogham" />
		<Rule Ranges="1C50-1C7F" FallbackFonts="Noto Sans Ol Chiki" />
		<Rule Ranges="10300-1032F" FallbackFonts="Noto Sans Old Italic" />
		<Rule Ranges="103A0-103DF" FallbackFonts="Noto Sans OldPersian" />
		<Rule Ranges="10A60-10A7F" FallbackFonts="Noto Sans OldSouArab" />
		<Rule Ranges="10C00-10C4F" FallbackFonts="Noto Sans Old Turkic" />
		<Rule Ranges="0B00-0B7F" FallbackFonts="Noto Sans Oriya" />
		<Rule Ranges="104B0-104FF" FallbackFonts="Noto Sans Osage" />
		<Rule Ranges="10480-104AF" FallbackFonts="Noto Sans Osmanya" />
		<Rule Ranges="A840-A87F" FallbackFonts="Noto Sans Phags Pa" />
		<Rule Ranges="10900-1091F" FallbackFonts="Noto Sans Phoenician" />
		<Rule Ranges="A930-A95F" FallbackFonts="Noto Sans Rejang" />
		<Rule Ranges="16A0-16FF" FallbackFonts="Noto Sans Runic" />
		<Rule Ranges="0800-083F" FallbackFonts="Noto Sans Samaritan" />
		<Rule Ranges="A880-A8DF" FallbackFonts="Noto Sans Saurashtra" />
		<Rule Ranges="10450-1047F" FallbackFonts="Noto Sans Shavian" />
		<Rule Ranges="0D80-0DFF, 111E0-111FF" FallbackFonts="Noto Sans Sinhala" />
		<Rule Ranges="1B80-1BBF, 1CC0-1CCF" FallbackFonts="Noto Sans Sundanese" />
		<Rule Ranges="A800-A82F" FallbackFonts="Noto Sans Syloti Nagri" />
		<Rule Ranges="1700-171F" FallbackFonts="Noto Sans Tagalog" />
		<Rule Ranges="1760-177F" FallbackFonts="Noto Sans Tagbanwa" />
		<Rule Ranges="1950-197F" FallbackFonts="Noto Sans Tai Le" />
		<Rule Ranges="1A20-1AAF" FallbackFonts="Noto Sans Tai Tham" />
		<Rule Ranges="AA80-AADF" FallbackFonts="Noto Sans Tai Viet" />
		<Rule Ranges="0B80-0BFF" FallbackFonts="Noto Sans Tamil" />
		<Rule Ranges="0C00-0C7F" FallbackFonts="Noto Sans Telugu" />
		<Rule Ranges="0E00-0E7F" FallbackFonts="Noto Sans Thai" />
		<Rule Ranges="0F00-0FFF" FallbackFonts="Noto Sans Tibetan" />
		<Rule Ranges="10380-1039F" FallbackFonts="Noto Sans Ugaritic" />
		<Rule Ranges="A500-A63F" FallbackFonts="Noto Sans Vai" />
		<Rule Ranges="0600-06FF, 0750-077F, 08A0-08FF, FB50-FDFF, FE70-FEFF" FallbackFonts="Amiri" />
		<Rule Ranges="0780-07BF, 0300-036F" FallbackFonts="Noto Sans Thaana" />
		<Rule Ranges="0700-074F, 0300-036F" FallbackFonts="Noto Sans Syriac Estrangela, Noto Sans Syriac Eastern, Noto Sans Syriac Western" />
		<Rule Ranges="2D30-2D7F, 0300-036F" FallbackFonts="Noto Sans Tifinagh" />
		<Rule Ranges="1100-11FF, 2500-259F, 2E80-2FDF, 2FF0-4DBF, 4E00-9FFF, A960-A97F, AC00-D7FF, F900-FAFF, FE10-FE1F, FE30-FE6F, FF00-FFEF, 1F100-1F2FF, 20000-2A6DF, 2A700-2EBEF, 2F800-2FA1F" FallbackFonts="Noto Sans CJK JP Regular, Noto Sans CJK KR Regular, Noto Sans CJK SC Regular, Noto Sans CJK TC Regular" />
		<Rule Ranges="3000-303F, A000-A4CF, FE50-FE6F" FallbackFonts="Noto Sans Yi" />
		<Rule Ranges="2150-21FF, 2300-23FF, 2460-24FF, 2600-27BF, 1F100-1F1FF, 1F300-1F5FF, 1F700-1F77F" FallbackFonts="Noto Sans Symbols" />
		<Rule Ranges="2190-245F, 25A0-27BF, 2800-28FF, 2B00-2BFF, 4DC0-4DFF, 10140-101FF, 102E0-102FF, 10E60-10E7F, 1D300-1D37F, 1F000-1F0FF, 1F300-1F5FF, 1F650-1F6FF, 1F780-1F9FF" FallbackFonts="Noto Sans Symbols2" />
		<Rule Ranges="2190-21FF, 2300-23FF, 2600-27BF, 1F100-1F64F, 1F680-1F6FF" FallbackFonts="Noto Emoji" />
		<Rule FallbackFonts="Noto Sans" />
	</FallbackTable>
</FontFallbackSettings>
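As a rough illustration of how a table like this maps characters to fonts, here is a minimal Java sketch. It assumes that rules are tried from top to bottom and that a rule without a Ranges attribute matches any character; both are assumptions about the lookup order, not confirmed Aspose.Words behavior. The names FallbackLookup, Rule, and fontsFor are hypothetical, the CJK rule is shortened for brevity, and the sketch only models range matching, not whether the named font is installed or contains the glyph.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FallbackLookup {
    // One fallback rule: the "Ranges" string and the names from "FallbackFonts".
    static final class Rule {
        final String ranges;      // e.g. "0600-06FF, 0750-077F"; null = no Ranges attribute
        final List<String> fonts; // font names in the order they are listed

        Rule(String ranges, String fonts) {
            this.ranges = ranges;
            this.fonts = Arrays.asList(fonts.trim().split("\\s*,\\s*"));
        }

        // True if the code point falls inside any of the hexadecimal ranges.
        boolean covers(int codePoint) {
            if (ranges == null) {
                return true; // assumption: a rule without ranges matches everything
            }
            for (String range : ranges.trim().split("\\s*,\\s*")) {
                String[] ends = range.split("-");
                int low = Integer.parseInt(ends[0], 16);
                int high = Integer.parseInt(ends[ends.length - 1], 16);
                if (codePoint >= low && codePoint <= high) {
                    return true;
                }
            }
            return false;
        }
    }

    // Returns the fonts of the first rule that covers the code point
    // (assumed top-to-bottom order), or an empty list if no rule matches.
    static List<String> fontsFor(List<Rule> rules, int codePoint) {
        for (Rule rule : rules) {
            if (rule.covers(codePoint)) {
                return rule.fonts;
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
            new Rule("0600-06FF, 0750-077F, 08A0-08FF, FB50-FDFF, FE70-FEFF", "Amiri"),
            new Rule("4E00-9FFF", "Noto Sans CJK JP Regular, Noto Sans CJK SC Regular"),
            new Rule(null, "Noto Sans"));
        System.out.println(fontsFor(rules, 0x0627)); // Arabic letter alef
        System.out.println(fontsFor(rules, 0x4E2D)); // a CJK ideograph
        System.out.println(fontsFor(rules, 'A'));    // falls through to the catch-all rule
    }
}
```

In this model, a Chinese character can only be served by the CJK rule, so the result depends entirely on the font names listed there being resolvable on the machine.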

Please let me know if there’s any other information I can provide, and I appreciate your time and assistance.

Best regards,
Avery

@averyscottnorris Could you please also attach your source MS Word document here for testing? We will check the conversion and provide you more information.

Here is a DOCX generated from the same content

linux_ChineseArabicRussian.docx (21.4 KB)

Here are the fonts we install in the linux environment where this was generated:

apt-get install -y fonts-liberation fontconfig libfreetype6 fonts-noto-cjk fonts-hosny-amiri fonts-noto-core

@averyscottnorris Thank you for the additional information. I will check the scenario on my side and provide you more information.

@averyscottnorris It seems that Aspose.Words does not have access to the Noto CJK fonts you have installed. By default, Aspose.Words scans the “home/<username>/.fonts”, “/usr/share/fonts”, “/usr/local/share/fonts”, and “/usr/X11R6/lib/X11/fonts” folders on Linux systems. Could you please check the location where the fonts are installed? Alternatively, you could download the required fonts from the Noto website and explicitly set their location in FontSettings.
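One quick way to check what is actually present in those folders is to list the font files under them. The following is a minimal sketch using only the Java standard library; it does not call Aspose.Words, and it only shows which files exist, not which ones the library loads. The class and method names (FontFolderScan, findFontFiles) are hypothetical, and mapping the first folder to the current user's home directory is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

class FontFolderScan {
    // Recursively lists .ttf, .ttc and .otf files under the given folders.
    // Folders that do not exist are skipped.
    static List<Path> findFontFiles(List<Path> folders) throws IOException {
        List<Path> found = new ArrayList<>();
        for (Path folder : folders) {
            if (!Files.isDirectory(folder)) {
                continue;
            }
            try (Stream<Path> paths = Files.walk(folder)) {
                paths.filter(Files::isRegularFile)
                     .filter(FontFolderScan::looksLikeFont)
                     .forEach(found::add);
            }
        }
        Collections.sort(found);
        return found;
    }

    // Matches on file extension only; the file content is not inspected.
    private static boolean looksLikeFont(Path path) {
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".ttf") || name.endsWith(".ttc") || name.endsWith(".otf");
    }

    public static void main(String[] args) throws IOException {
        // Folders taken from the list above; the home-directory mapping is an assumption.
        List<Path> folders = Arrays.asList(
            Paths.get(System.getProperty("user.home"), ".fonts"),
            Paths.get("/usr/share/fonts"),
            Paths.get("/usr/local/share/fonts"),
            Paths.get("/usr/X11R6/lib/X11/fonts"));
        for (Path font : findFontFiles(folders)) {
            System.out.println(font);
        }
    }
}
```

Running it inside the same container or user account as the web application matters, since the home folder and installed packages may differ from an interactive shell.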

Here are screenshots of the layout of /usr/share/fonts on our system:

Noto-CJK installed, Amiri NOT installed (Chinese correct font, Arabic wrong font):

Noto-CJK AND Amiri installed (Arabic correct font, Chinese showing as boxes):

Interestingly, the noto-cjk fonts (and, when installed, Amiri) seem to be installed into fonts/opentype, while liberation and noto-core are installed into fonts/truetype, even though all the fonts seem to be .ttf or .ttc files. When Amiri is not installed, this seems to cause no problem, as the Chinese fonts load properly.

@averyscottnorris Thanks for sharing the additional info. I missed the issue the first time. The font names in the “fonts-noto-cjk” package (“Noto Sans CJK JP”) are different from the ones used in the fallback settings (“Noto Sans CJK JP Regular”). It seems that these font names have changed since the fallback settings were last updated. Please change the name in the fallback settings to “Noto Sans CJK JP” (or you could use another Noto CJK font if you prefer). That should fix the issue.
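A name mismatch like this can be caught before conversion by comparing the names in the fallback XML with the font family names installed on the machine. Below is a minimal sketch using only the Java standard library. It assumes the caller supplies the installed family names (for example, collected from the output of `fc-list : family`), and it compares names by exact string match, which may be stricter or looser than the matching Aspose.Words performs. The names FallbackNameCheck and missingFonts are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class FallbackNameCheck {
    // Returns every name listed in a FallbackFonts attribute that is not in
    // installedFamilies, in the order the names first appear in the XML.
    static Set<String> missingFonts(String fallbackXml, Set<String> installedFamilies)
            throws Exception {
        Document xml = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(fallbackXml.getBytes(StandardCharsets.UTF_8)));
        NodeList rules = xml.getElementsByTagName("Rule");
        Set<String> missing = new LinkedHashSet<>();
        for (int i = 0; i < rules.getLength(); i++) {
            Element rule = (Element) rules.item(i);
            for (String name : rule.getAttribute("FallbackFonts").split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty() && !installedFamilies.contains(trimmed)) {
                    missing.add(trimmed);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // Shortened fallback table for illustration only.
        String xml = "<FontFallbackSettings xmlns=\"Aspose.Words\"><FallbackTable>"
                + "<Rule Ranges=\"0600-06FF\" FallbackFonts=\"Amiri\" />"
                + "<Rule Ranges=\"4E00-9FFF\" FallbackFonts=\"Noto Sans CJK JP Regular\" />"
                + "<Rule FallbackFonts=\"Noto Sans\" />"
                + "</FallbackTable></FontFallbackSettings>";
        // Family names as assumed to be reported by the system.
        Set<String> installed = new HashSet<>(
                Arrays.asList("Amiri", "Noto Sans", "Noto Sans CJK JP"));
        System.out.println(missingFonts(xml, installed));
    }
}
```

With the sample data above, the only unmatched name is the CJK entry with the "Regular" suffix, which is the same mismatch described in this reply.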

That seems to have done the trick! Thank you so much @alexey.noskov and @Konstantin.Kornilov, I really appreciate your time and assistance!

Best,
Avery
