Aspose.PDF for .NET ShowText operator does not support Chinese characters

The Aspose.PDF ShowText operator, which implements the PDF Tj operator, does not render Chinese characters. I am using Windows 10 and Aspose.PDF for .NET 21.10.1 (the latest version at the time of writing), and the problem is still present in this version. Here is my code snippet:

using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            var document = new Document();
            var page = document.Pages.Add();

            // Register Arial (not embedded) in the page resources and
            // keep the resource name that Aspose.PDF assigns to it.
            string fontResName;
            {
                Font font = FontRepository.FindFont("Arial");
                font.IsEmbedded = false;
                page.Resources.GetFonts(true).Add(font, out fontResName);
            }

            // Write one line of text with raw content-stream operators.
            {
                Font font = page.Resources.Fonts[1];
                var fontSize = 9.0;
                var line = "CJK Unified Ideographs from BMP (wiki): 一 丁 丂 七 丄 丅 丆 万 丈 三 上 下 丌 不 与 丏";

                page.Contents.Add(new Aspose.Pdf.Operators.GSave());
                page.Contents.Add(new Aspose.Pdf.Operators.BT());
                page.Contents.Add(new Aspose.Pdf.Operators.SelectFont(fontResName, fontSize));
                page.Contents.Add(new Aspose.Pdf.Operators.MoveTextPosition(120.0, 750.0));
                // ShowText corresponds to the PDF Tj operator.
                page.Contents.Add(new Aspose.Pdf.Operators.ShowText(line, font));
                page.Contents.Add(new Aspose.Pdf.Operators.ET());
                page.Contents.Add(new Aspose.Pdf.Operators.GRestore());
            }

            document.Save("test_chinese_characters.pdf");
        }
    }
}

I am using the Arial font installed on my system, and the word processors on my system display the problematic characters without issue, so it is unlikely to be a font issue. I also tried the Courier New and Times New Roman fonts: Aspose.PDF finds these fonts and renders glyphs for European languages correctly, but Asian characters still do not render, so the issue appears to be specific to Asian characters.

In addition to Chinese characters, at least the following character sets also fail to render: Hindi letters, Japanese kanji, Japanese hiragana, Japanese katakana, the Korean alphabet, Korean hanja, Japanese hiragana from the Unicode Basic Multilingual Plane (BMP), and CJK Unified Ideographs from the BMP.

I am aware that the TextFragment class offers better support for Unicode characters. However, due to the technical constraints of my project I cannot use TextFragment (or TextStamp) to display text; it has to be the PDF Tj operator as implemented by the ShowText class in Aspose.PDF. This request is specifically about Unicode support in the Text property of the ShowText class rather than about text display in general.

Aspose_PDF_Chinese_characters_support.zip (39.4 KB)

In the attachment, please find the following: (a) an example C# project/solution that uses the latest Aspose.PDF for .NET and reproduces the issue; (b) test_chinese_characters.pdf, an example of the broken PDF generated by Aspose.PDF; (c) _unicode_characters_test.txt, an extended test suite of Unicode characters, in which most of the Asian characters fail (all characters display correctly in my word processor with the Arial, Courier New, and Times New Roman fonts).

What is expected: Chinese and other characters are rendered correctly by the ShowText operator.

Can you please help me with this issue?

After studying the issue in more detail, I now realize that my word processors have been (silently) performing font substitution for the problematic characters. If I select the font that my word processor suggests for the problematic characters, everything starts working correctly. So there is probably no issue with the ShowText operator after all.

So my next question is: do you offer an API that would let me implement such font substitution programmatically? That is, knowing the characters in my string, could I somehow iterate over the fonts installed on my system and find one that supports all of those characters? I think I saw an example of this on the web (perhaps for Aspose.PDF for Java), but I cannot recall where.

Edit: Here is the example that I meant, but it is for Java: Chinese characters issw - #8 by tilal.ahmad

System.out.println(FontRepository.findFont("MSGothic").doesFontContainAllCharacters("竤"));

Anything like this for .NET?

@AlexeyM

In the Aspose.PDF for .NET API, you can set the ReplacementFont property of the TextEditOptions class. The specified font will be used (if accessible) when the font given in TextState.Font cannot display the segment text. The API can detect that a character is not supported, but it cannot determine which font supports the characters by iterating over all fonts installed on your system.

segment.TextEditOptions = new TextEditOptions(TextEditOptions.NoCharacterAction.UseCustomReplacementFont)
{
    ReplacementFont = FontRepository.FindFont("Lucida Handwriting")
};
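
For the "iterate installed fonts and find one that covers the string" part of the question, here is a minimal sketch that does not use Aspose.PDF at all. It assumes a Windows project with WPF available (PresentationCore) and relies on the glyph map exposed by System.Windows.Media.GlyphTypeface. The FontCoverage class and its method names are made up for this illustration. Whether Aspose.PDF resolves the returned family names in the same way is also an assumption that would need to be checked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Media; // WPF (PresentationCore), Windows only

// Hypothetical helper, not part of Aspose.PDF.
public static class FontCoverage
{
    // Yields the Unicode code points of a string, combining surrogate pairs.
    public static IEnumerable<int> CodePoints(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                yield return char.ConvertToUtf32(text, i);
                i++; // skip the low surrogate
            }
            else
            {
                yield return text[i];
            }
        }
    }

    // Returns the names of installed font families that have at least one
    // typeface whose character map contains every code point of the text.
    public static IEnumerable<string> FindCoveringFamilies(string text)
    {
        int[] needed = CodePoints(text).Distinct().ToArray();

        foreach (FontFamily family in Fonts.SystemFontFamilies)
        {
            foreach (Typeface typeface in family.GetTypefaces())
            {
                GlyphTypeface glyphs;
                if (typeface.TryGetGlyphTypeface(out glyphs)
                    && needed.All(cp => glyphs.CharacterToGlyphMap.ContainsKey(cp)))
                {
                    yield return family.Source;
                    break; // one matching typeface per family is enough
                }
            }
        }
    }
}
```

A caller could loop over FontCoverage.FindCoveringFamilies(line), pass the first name to FontRepository.FindFont, and register that font in the page resources the same way the original snippet does for Arial. This only checks that the font maps each character to a glyph; it does not check shaping or rendering quality.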

Thank you, Mudassir, for your prompt reply. As stated in the original question, this request is really about the PDF Tj operator, that is, the ShowText class in the Aspose.PDF operator API. As I understand it, your suggestion applies to the regular, non-operator text display API, so I do not think it can be used with the ShowText operator. Can you offer a font substitution API that can be used together with the ShowText operator in a meaningful way? Thanks again.

@AlexeyM

An investigation ticket with ID PDFNET-50815 has been created in our issue tracking system. This thread has been linked to the ticket so that you will be notified as soon as there is an update to share.