Font issues converting Word to PDF with Thai

Hello,

We are experiencing an issue when we insert a Thai watermark into a document and convert it to PDF: the watermark isn't displayed correctly.

If the document is saved as DOCX, Word replaces the font with Leelawadee UI and can display the symbols. The issue only occurs when saving to PDF.

Since the symbols may be missing from the font, I read about the font fallback settings on the documentation page "Manipulate and Substitute TrueType Fonts" (Aspose.Words for Java).

So I tried LoadMsOfficeFallbackSettings, LoadNotoFallbackSettings and BuildAutomatic, but none of them seems to mimic Word's behaviour of changing the font for the affected symbols.

// See https://aka.ms/new-console-template for more information

using Aspose.Words;
using Aspose.Words.Drawing;
using System.Drawing;
using Aspose.Words.Fonts;
using Aspose.Words.Loading;

var lic = new License();
lic.SetLicense(@"S:\Aspose.Total.NET.lic");

var fontSettings = FontSettings.DefaultInstance;

fontSettings.FallbackSettings.LoadMsOfficeFallbackSettings();
//fontSettings.FallbackSettings.LoadNotoFallbackSettings();
//fontSettings.FallbackSettings.BuildAutomatic();

var options = new LoadOptions() { FontSettings = fontSettings };

var doc = new Document(@"S:\tmp\in.docx", options);

var watermark = new Shape(doc, ShapeType.TextPlainText);
// Delphi uses the opposite byte order, i.e. one byte each for B, G, R
var realColor = Color.FromArgb(150, 190, 200);


watermark.TextPath.Text = "แบบร่าง";
watermark.TextPath.FontFamily = "Times New Roman";
//watermark.TextPath.FontFamily = "Leelawadee UI";
watermark.Width = 600;
watermark.Height = 200;
watermark.Rotation = 315;
watermark.Fill.ForeColor = realColor;
watermark.Fill.Opacity = 0.5;
watermark.StrokeColor = Color.Transparent;

watermark.RelativeHorizontalPosition = RelativeHorizontalPosition.Margin;
watermark.RelativeVerticalPosition = RelativeVerticalPosition.Margin;
watermark.WrapSide = WrapSide.Largest;
watermark.WrapType = WrapType.None;
watermark.VerticalAlignment = VerticalAlignment.Center;
watermark.HorizontalAlignment = HorizontalAlignment.Center;

InsertWatermarkToHeader(watermark, HeaderFooterType.HeaderPrimary);
InsertWatermarkToHeader(watermark, HeaderFooterType.HeaderFirst);
InsertWatermarkToHeader(watermark, HeaderFooterType.HeaderEven);


doc.Save(@"S:\tmp\out.docx");
doc.Save(@"S:\tmp\out.pdf");

void InsertWatermarkToHeader(Shape watermark, HeaderFooterType headerType)
{
    var builder = new DocumentBuilder(doc);

    builder.MoveToHeaderFooter(headerType);
    var header = builder.CurrentSection.HeadersFooters[headerType];
    if (header == null)
    {
        header = new HeaderFooter(doc, headerType);
        builder.CurrentSection.HeadersFooters.Add(header);
    }

    header.FirstParagraph.AppendChild(watermark.Clone(true));
}

files.zip (69.9 KB)

@Serraniel, I have tried running your program with the line that sets FontFamily to Leelawadee UI uncommented, and it produces a PDF file with the watermark rendered correctly.

Times New Roman does not contain glyphs for Thai, so it has to be substituted with another font. Leelawadee UI does contain glyphs for Thai, so it is used to render the Thai characters.

Am I correct that in some cases the PDF produced by Aspose.Words contains rectangles instead of Thai characters, as in the attached out.pdf? Under what conditions does this happen? What font and what operating system are used when this issue occurs?

Yes, this is correct. In all cases with Times New Roman, the PDF contains the rectangles, and neither the font nor the symbols are substituted.
My expectation was that FallbackSettings.LoadMsOfficeFallbackSettings (or any of the other methods) would substitute Leelawadee for Times New Roman in the PDF, as this is what Word does, and the help describes the fallback settings as being used when individual symbols are missing from a font:

The Font fallback mechanism is used when the font is resolved, but it does not contain a specific character. In this case, Aspose.Words tries to use one of the fallback fonts for the character.

@Serraniel, I would like to get more details about this issue:

  1. Is the Leelawadee UI font present on the computer where you generate the PDF file? Rectangles instead of characters usually occur when Aspose.Words cannot find any suitable font to render these characters.
  2. What happens if you uncomment the line watermark.TextPath.FontFamily = "Leelawadee UI"; and run the program? Does the output PDF contain Thai characters?
  3. Does the issue occur on Windows or Linux?

  1. Is the Leelawadee UI font present on the computer where you generate PDF file? The rectangles instead of characters usually occur when Aspose.Words cannot find any suitable font to render these characters.

Yes, the font is installed: image.png (127.7 KB)

  2. What happens if you uncomment this line watermark.TextPath.FontFamily = "Leelawadee UI"; and run the program? Does the output PDF contain Thai characters?

In this case, the Thai symbols are correct in the PDF. The issue only occurs if the originally used font does not contain the symbols. That's why I thought it might be a bug in the font fallback settings. I also don't know whether Leelawadee UI is the correct or standard font for these symbols, but it is the font Microsoft Word uses on my PC to replace the symbols missing from the Times New Roman watermark when I save as DOCX and open the file in Word.

  3. Does the issue occur on Windows or Linux?

It's a Windows 10 system.
image.png (4.3 KB)
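
A font shown in the Windows font list is not necessarily the same thing as a font Aspose.Words can resolve, so it can help to ask the library which families it actually sees. The following is a minimal sketch, assuming the default font sources are in use; the two family names are the ones discussed in this thread, and the `wanted` array is only an illustrative helper.

```csharp
using System;
using System.Linq;
using Aspose.Words.Fonts;

// Font families mentioned in this thread (illustrative list, adjust as needed).
string[] wanted = { "Leelawadee UI", "Angsana New" };

// Collect the family names of all fonts visible through the current font sources.
var available = FontSettings.DefaultInstance
    .GetFontsSources()
    .SelectMany(source => source.GetAvailableFonts())
    .Select(font => font.FontFamilyName)
    .ToHashSet(StringComparer.OrdinalIgnoreCase);

// Report which of the wanted families Aspose.Words can find.
foreach (var name in wanted)
    Console.WriteLine($"{name}: {(available.Contains(name) ? "found" : "not found")}");
```

If a family is reported as not found here although it is installed, the font sources rather than the fallback settings would be the first thing to look at.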

@Serraniel Aspose.Words uses the “Angsana New” font as a fallback for Thai characters. This matches the MS Word behavior when the “Angsana New” font is installed. To solve the issue with the missing watermark, you could either install the “Angsana New” font or create a custom fallback table that uses “Leelawadee UI” for Thai characters.
Aspose.Words emulates the font fallback of the latest MS Word version with all Windows/Office fonts installed. At the moment, Aspose.Words does not emulate the MS Word behavior when only a limited number of fonts is installed. So, to get output closer to MS Word, you could consider installing the fonts for all languages (via “Download fonts for all languages” in the Fonts control panel) or at least the supplemental fonts for the languages you are using (via “Optional features” in Settings).
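
The custom fallback table mentioned above could look roughly like the sketch below. It assumes the XML rule format described in the Aspose.Words font fallback documentation and maps the Thai Unicode block (U+0E00 to U+0E7F) to Leelawadee UI; the single-rule table and the `fallbackXml` name are illustrative, and the file paths are the ones used earlier in this thread. Note that, as far as I understand, loading a custom table replaces the predefined one, so other scripts would need their own rules.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Fonts;

// Illustrative fallback table: one rule that maps the Thai block to Leelawadee UI.
const string fallbackXml = @"<FontFallbackSettings xmlns=""Aspose.Words"">
  <FallbackTable>
    <Rule Ranges=""0E00-0E7F"" FallbackFonts=""Leelawadee UI"" />
  </FallbackTable>
</FontFallbackSettings>";

// Load the table into a dedicated FontSettings instance.
var fontSettings = new FontSettings();
using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(fallbackXml)))
{
    fontSettings.FallbackSettings.Load(stream);
}

// Attach the settings to the document before it is rendered to PDF.
var doc = new Document(@"S:\tmp\in.docx");
doc.FontSettings = fontSettings;

// ... insert the watermark as in the original program ...

doc.Save(@"S:\tmp\out.pdf");
```

With such a rule in place, Thai characters that are missing from Times New Roman should be taken from Leelawadee UI instead of Angsana New, provided Leelawadee UI is available to Aspose.Words.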

Thanks for your reply. After installing the fonts for all languages, the squares are gone and the text is visible.

However, there is still an issue with the placement:

var doc = new Document(@"S:\tmp\in.docx");

var watermark = new Shape(doc, ShapeType.TextPlainText);
var realColor = Color.FromArgb(150, 190, 200);


watermark.TextPath.Text = "แบบร่าง";
watermark.TextPath.FontFamily = "Times New Roman";
watermark.Width = 600;
watermark.Height = 200;
watermark.Rotation = 315;
watermark.Fill.ForeColor = realColor;
watermark.Fill.Opacity = 0.5;
watermark.StrokeColor = Color.Transparent;

watermark.RelativeHorizontalPosition = RelativeHorizontalPosition.Margin;
watermark.RelativeVerticalPosition = RelativeVerticalPosition.Margin;
watermark.WrapSide = WrapSide.Largest;
watermark.WrapType = WrapType.None;
watermark.VerticalAlignment = VerticalAlignment.Center;
watermark.HorizontalAlignment = HorizontalAlignment.Center;

InsertWatermarkToHeader(watermark, HeaderFooterType.HeaderPrimary);
InsertWatermarkToHeader(watermark, HeaderFooterType.HeaderFirst);
InsertWatermarkToHeader(watermark, HeaderFooterType.HeaderEven);

doc.Save(@"S:\tmp\out.pdf");

void InsertWatermarkToHeader(Shape watermark, HeaderFooterType headerType)
{
    var builder = new DocumentBuilder(doc);

    builder.MoveToHeaderFooter(headerType);
    var header = builder.CurrentSection.HeadersFooters[headerType];
    if (header == null)
    {
        header = new HeaderFooter(doc, headerType);
        builder.CurrentSection.HeadersFooters.Add(header);
    }

    header.FirstParagraph.AppendChild(watermark.Clone(true));
}

When saving with Times New Roman, the watermark is placed in a different position (out.pdf).
For comparison, if I set the font to Angsana New, it is placed in the middle of the document (out2.pdf).

files.zip (48.3 KB)

@Serraniel
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25804

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

The issues you found earlier (filed as WORDSNET-25804) have been fixed in the Aspose.Words for .NET 23.10 update, which is also available on NuGet.