Mail merge issue with Thai language in a Docker/Kubernetes container

Hi,

We are using Aspose.Words for .NET to perform a mail merge, and we see a problem with Thai-language documents.
Thai text is converted to unrecognized characters during the merge. This happens only on the Linux server (Azure Kubernetes); it works fine on a local Windows machine.

Input (Docx)

Merged Output (Pdf)

Below is the C# code we use for the mail merge:

// Namespaces used: System.IO, System.Linq, System.Xml, Aspose.Words,
// Aspose.Words.Reporting, Aspose.Words.Saving, Aspose.Words.Shaping.HarfBuzz
IDictionary<string, object> parameters = new Dictionary<string, object> { { "PERSONNAME", "Jagan" } };

IEnumerable<TemplateSequenceParameter> sequenceParameters = new List<TemplateSequenceParameter>
{
    new TemplateSequenceParameter
    {
        Key = "CONTACTTYPE",
        Values = new List<object> { "home", "office", ... }
    },
    new TemplateSequenceParameter
    {
        Key = "EMPLOYEEFIRSTNAME",
        Values = new List<object> { "john", "peter", ... }
    }
};

// Serialize the parameters to XML so they can be passed to the reporting engine.
using (MemoryStream stream = new MemoryStream())
{
    using (XmlWriter writer = XmlWriter.Create(stream, new XmlWriterSettings()))
    {
        writer.WriteStartDocument();
        writer.WriteStartElement(DataSourceName);

        // Single-value parameters become child elements of the root.
        foreach (KeyValuePair<string, object> parameter in parameters)
        {
            writer.WriteElementString(parameter.Key, parameter.Value.ToString());
        }

        // Sequence parameters become one repeated element per row.
        int colCount = sequenceParameters.Count();
        if (colCount > 0)
        {
            int rowCount = sequenceParameters.First().Values.Count();

            for (int rowInd = 0; rowInd < rowCount; rowInd++)
            {
                writer.WriteStartElement(DataSourceSequenceParameterPrefix);

                for (int colInd = 0; colInd < colCount; colInd++)
                {
                    writer.WriteElementString(sequenceParameters.ElementAt(colInd).Key, sequenceParameters.ElementAt(colInd).Values.ElementAt(rowInd).ToString());
                }

                writer.WriteEndElement();
            }
        }

        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();
    }

    stream.Position = 0;

    XmlDataSource dataSource = new XmlDataSource(stream);

    ReportingEngine engine = new ReportingEngine();
    engine.Options |= ReportBuildOptions.AllowMissingMembers;
    engine.Options |= ReportBuildOptions.RemoveEmptyParagraphs;

    engine.BuildReport(document, dataSource, DataSourceName);

    using (MemoryStream ms = new MemoryStream(4096))
    {
        // HarfBuzz text shaping, added to support Thai.
        document.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;
        document.Save(ms, GetSaveOptions("pdf"));
        byte[] documentContent = ms.ToArray();

        return documentContent;
    }
}


private static SaveOptions GetSaveOptions(string outputFormat)
{
    switch (outputFormat)
    {
        case "pdf":
            PdfSaveOptions options = new PdfSaveOptions();

            options.MemoryOptimization = true;
            options.ImageCompression = PdfImageCompression.Jpeg;
            options.JpegQuality = 0; // strongest JPEG compression, lowest image quality
            options.UseHighQualityRendering = false;
            options.DmlRenderingMode = DmlRenderingMode.Fallback;
            options.DmlEffectsRenderingMode = DmlEffectsRenderingMode.None;

            return options;

        default:
            // Every code path must return a value or throw.
            throw new NotSupportedException($"Unsupported output format: {outputFormat}");
    }
}

Input file -
PND91SPECIALTAX_FinalUpdated (1) (2) (1).docx (349.4 KB)

Output merged file -
PND91SPECIALTAX_Merged.pdf (157.4 KB)


Below is the Dockerfile we use to publish:

FROM mcr.microsoft.com/dotnet/aspnet:8.0

# Install dependencies
RUN sed -i 's/^Components: main$/& contrib/' /etc/apt/sources.list.d/debian.sources
RUN apt-get update \
    && apt-get install -y \
        libfreetype6 \
        libfontconfig1 \
        libgif-dev autoconf libtool automake build-essential gettext libglib2.0-dev libcairo2-dev libtiff-dev libexif-dev \
        apt-utils libgdiplus libc6-dev \
        ttf-mscorefonts-installer fontconfig \
    && fc-cache -f -v \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY ./dist /app

ENV ASPNETCORE_URLS=http://+:5000

ENTRYPOINT [ "dotnet", "Hrx.DocGen.WebApi.dll" ]

Please note that I added the line below to support Thai:

document.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;

After that, the mail merge throws the exception below while saving the document:

{
	"ClassName": "System.Exception",
	"Message": "Text shaper factory failed to return text shaper for '/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf', face index '0'",
	"Data": null,
	"InnerException": {
		"ClassName": "System.DllNotFoundException",
		"Message": "Unable to load shared library 'harfbuzz' or one of its dependencies. In order to help diagnose loading problems, consider using a tool like strace. If you're using glibc, consider setting the LD_DEBUG environment variable: \n/app/harfbuzz.so: cannot open shared object file: No such file or directory\n/app/libharfbuzz.so: cannot open shared object file: No such file or directory\n/app/harfbuzz: cannot open shared object file: No such file or directory\n/app/libharfbuzz: cannot open shared object file: No such file or directory\n",
		"Data": null,
		"InnerException": null,
		"HelpURL": null,
		"StackTraceString": "   at \u000F\u000F.\u0002.d(IntPtr d, UInt32 v, b c, IntPtr t, IntPtr n)\n   at \u000F\u000F.\u0002(IntPtr \u0002, UInt32 \b, b \u0005, IntPtr \u0006, IntPtr \u0003)\n   at \u000E\u000F.\u0002(Byte[] \u0002, Int32 \b)\n   at Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.GetTextShaper(String fontPath, Int32 faceIndex)\n   at Aspose.Words.Shaping.BasicTextShaperCache.Aspose.Words.Shaping.ITextShaperFactory.GetTextShaper(String fontPath, Int32 faceIndex)\n   at MUi.d(String[] d, QU5 v, ITextShaperFactory c, UnicodeScript t, Boolean n, VariationAxisCoordinate[] B)",
		"RemoteStackTraceString": null,
		"RemoteStackIndex": 0,
		"ExceptionMethod": null,
		"HResult": -2146233052,
		"Source": "Aspose.Words.Shaping.HarfBuzz",
		"WatsonBuckets": null,
		"TypeLoadClassName": null,
		"TypeLoadAssemblyName": null,
		"TypeLoadMessageArg": null,
		"TypeLoadResourceID": 0
	},
	"HelpURL": null,
	"StackTraceString": "   at MUi.d(String[] d, QU5 v, ITextShaperFactory c, UnicodeScript t, Boolean n, VariationAxisCoordinate[] B)\n   at MUi.d(kUJ d, ITextShaperFactory v)\n   at YUi.n(kUJ d)\n   at YUi.d(kUJ d)\n   at kUJ.kUJiFl3()\n   at kUJ.cU7iFp()\n   at pUq.cU7iFp()\n   at vU5.b()\n   at vU5.d(Boolean d, Int32 v)\n   at vU5.d(Boolean d)\n   at OU5.d(vU5 d)\n   at OU5.d(mU5 d, Int32 v)\n   at oUZ.d(mU5 d)\n   at oUZ.d(mU5 d, SUq v)\n   at GUZ.d(mU5 d, SUq v)\n   at GUZ.v(mU5 d, Int32 v, Boolean c)\n   at zU7.v()\n   at zU7.v(wU7 d, Int32 v, Boolean c)\n   at zUJ.v()\n   at zUJ.v(nU7 d, Int32 v, Boolean c, Boolean t)\n   at pU7.c()\n   at pU7.v(jU7 d, Int32 v)\n   at pU7.d(jU7 d, Int32 v)\n   at GU5.v(jU7 d)\n   at GU5.c()\n   at GU5.fUZiFd(cU7 d)\n   at rU7.d(cU7 d)\n   at PU7.d(JU5 d)\n   at RU7.m()\n   at tUi.c()\n   at Aspose.Words.Document.UpdatePageLayout()\n   at jUC.v(nCQ d, FP v)\n   at jUC.YUYiFd(nCQ d)\n   at wUa.wUaiFd(nCQ d)\n   at Aspose.Words.Document.d(nCQ d)\n   at Aspose.Words.Document.d(Stream d, String v, SaveOptions c)\n   at Hrx.DocGen.Infrastructure.Services.WordsReportBuilderService.GetDocumentData(DocumentGenerationModel generationModel, Document document) in /home/vsts/work/1/s/src/Hrx.DocGen.Infrastructure/Services/WordsReportBuilderService.cs:line 374",
	"RemoteStackTraceString": null,
	"RemoteStackIndex": 0,
	"ExceptionMethod": null,
	"HResult": -2146233088,
	"Source": "Aspose.Words",
	"WatsonBuckets": null
}

Can you please suggest how to solve this issue?

@mjjagan

Cause

The issue you’re experiencing with Thai language text during mail merge in a Linux environment is likely due to missing font support or libraries required for rendering complex scripts. The error message indicates that the HarfBuzz library, which is essential for shaping text in languages like Thai, is not found in your Docker container.

Solution

To resolve this issue, make sure that the HarfBuzz library and its dependencies are installed in your Docker container. You can do this by adding the HarfBuzz packages to the apt-get install list in your Dockerfile.

Code

Here’s an updated version of your Dockerfile:

FROM mcr.microsoft.com/dotnet/aspnet:8.0

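# Install dependencies (the contrib component is needed for ttf-mscorefonts-installer)
RUN sed -i 's/^Components: main$/& contrib/' /etc/apt/sources.list.d/debian.sources
RUN apt-get update \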
    && apt-get install -y \
        libfreetype6 \
        libfontconfig1 \
        libharfbuzz-dev \
        libharfbuzz0b \
        libgif-dev \
        autoconf \
        libtool \
        automake \
        build-essential \
        gettext \
        libglib2.0-dev \
        libcairo2-dev \
        libtiff-dev \
        libexif-dev \
        apt-utils \
        libgdiplus \
        libc6-dev \
        ttf-mscorefonts-installer \
        fontconfig \
    && fc-cache -f -v \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY ./dist /app

ENV ASPNETCORE_URLS=http://+:5000

ENTRYPOINT [ "dotnet", "Hrx.DocGen.WebApi.dll" ]

Make sure to rebuild your Docker image after making these changes.

Next Steps

  1. Rebuild your Docker image using the updated Dockerfile.
  2. Run your application again and check if the Thai text is processed correctly during the mail merge.
  3. If you continue to experience issues, ensure that the fonts you are using support Thai characters and are correctly installed in your Docker container.

Please proceed with these steps, and let me know if you need further assistance!

@mjjagan Such problems usually occur because the fonts used in your input document are not available on the machine where the document is converted to PDF. The fonts are required to build the document layout. If Aspose.Words cannot find a font used in the document, the font is substituted. This can lead to font mismatches and layout differences because the substituted fonts have different metrics. You can implement IWarningCallback to get notifications when font substitution is performed.
Please see our documentation to learn where Aspose.Words looks for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/
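
If the required font files are available to you, one way to provide them is to copy them into a folder that is scanned for fonts. The following is a minimal sketch, not a definitive recipe: `install_fonts` is a hypothetical helper name, and the default destination `/usr/local/share/fonts/custom` is an assumption. Use a folder that your Aspose.Words font settings actually search, as described in the documentation linked above.

```shell
# Hypothetical helper: copy TrueType/OpenType files from a source folder
# (searched recursively) into a fonts folder, then refresh the fontconfig
# cache if fc-cache is installed.
# The default destination is an assumption; adjust it to your setup.
install_fonts() {
    src="$1"
    dest="${2:-/usr/local/share/fonts/custom}"

    if [ ! -d "$src" ]; then
        echo "Source folder not found: $src" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1

    # Copy only font files; everything else in the source folder is ignored.
    find "$src" -type f \( -iname '*.ttf' -o -iname '*.ttc' -o -iname '*.otf' \) \
        -exec cp {} "$dest"/ \;

    # Refreshing the cache is optional, so a failure here only prints a warning.
    if command -v fc-cache >/dev/null 2>&1; then
        fc-cache -f "$dest" || echo "Warning: fc-cache failed for $dest" >&2
    fi
}
```

For example, `install_fonts /app/fonts` would copy every font file found under `/app/fonts` (a hypothetical path) into the default destination.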

Regarding HarfBuzzTextShaperFactory: on Windows, no additional effort is required to install HarfBuzz because Aspose.Words.Shaping.HarfBuzz already includes a compiled HarfBuzz library.

On other systems, Aspose.Words.Shaping.HarfBuzz relies on an already installed HarfBuzz library. For instance, many Linux-based systems have HarfBuzz installed system-wide by default. If not, there is usually a package that can be installed via the package manager.

For example, in a clean Ubuntu Docker image, HarfBuzz must be installed additionally using a command like this:

RUN apt-get update && apt-get install -y libharfbuzz-dev
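
To confirm from inside the container that the library is actually present, a small shell check can help. The sketch below is illustrative only: `find_harfbuzz` and `has_harfbuzz` are hypothetical helper names, and the default search directories are an assumption about a Debian- or Ubuntu-based image. The exception above shows the runtime probing for the unversioned name `libharfbuzz.so`. On Debian-based images that name is normally shipped by the `-dev` package, which may be why `libharfbuzz-dev` is the package to install.

```shell
# Hypothetical helper: print every HarfBuzz shared library found under the
# given directories. With no arguments it searches the usual library
# locations of a Debian/Ubuntu image (an assumption; adjust as needed).
find_harfbuzz() {
    if [ "$#" -eq 0 ]; then
        set -- /usr/lib /usr/local/lib
    fi
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # Matches libharfbuzz.so as well as versioned names such as
            # libharfbuzz.so.0.
            find "$dir" -name 'libharfbuzz.so*' 2>/dev/null
        fi
    done
}

# Succeeds (exit status 0) only when at least one matching file was found.
has_harfbuzz() {
    [ -n "$(find_harfbuzz "$@")" ]
}
```

Inside the container, `has_harfbuzz && echo "HarfBuzz found"` gives a quick yes/no answer, and `find_harfbuzz` prints the matching paths so you can see whether the unversioned `libharfbuzz.so` is among them.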

Thanks Alexey,

Below is the updated Dockerfile:

FROM mcr.microsoft.com/dotnet/aspnet:8.0

# Install dependencies
RUN sed -i 's/^Components: main$/& contrib/' /etc/apt/sources.list.d/debian.sources
RUN apt-get update \
    && apt-get install -y \
        libfreetype6 \
        libfontconfig1 \
        libharfbuzz-dev \
        libharfbuzz0b \
        libgif-dev autoconf libtool automake build-essential gettext libglib2.0-dev libcairo2-dev libtiff-dev libexif-dev \
        apt-utils libgdiplus libc6-dev \
        ttf-mscorefonts-installer fontconfig \
    && fc-cache -f -v \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY ./dist /app

I installed libharfbuzz-dev and the mail merge now works, but the Thai document is still not rendered correctly.

MailMerged document -
mailMerge after harfbuzz.pdf (282.9 KB)

I have the lines below before saving:

document.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;
document.Save(ms, GetSaveOptions(generationModel.OutputFormat));

@mjjagan Have you tried implementing IWarningCallback to check whether fonts required for rendering are available in your environment?

Thanks Alexey.

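I added the warning callback code as shown below.

// Assumption: warningCollector is a WarningInfoCollection registered as the document's warning callback.
WarningInfoCollection warningCollector = new WarningInfoCollection();
document.WarningCallback = warningCollector;

FontSettings fontSettings = new FontSettings();
fontSettings.SubstitutionSettings.DefaultFontSubstitution.DefaultFontName = "Arial";
fontSettings.SubstitutionSettings.FontInfoSubstitution.Enabled = true;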

// Original font metrics should be used after font substitution.
document.LayoutOptions.KeepOriginalFontMetrics = true;

// We will get a font substitution warning if we save a document with a missing font.
document.FontSettings = fontSettings;

RenderDocument(document);

DocumentData documentData = SaveDocumentData(document);

foreach (WarningInfo info in warningCollector)
{
	if (info.WarningType == WarningType.FontSubstitution)
	{
		_logger.LogWarning("DocGen WarningInfoCollection - {Description}", info.Description);
	}
}

I see the font substitution warnings below on the server.


So, after the mail merge and font substitution, the resulting file has some unrecognized characters, as shown below.

How can we fix this?

Do we need to install all the required fonts (‘SimSun-ExtB’, ‘Palatino Linotype’, ‘Tahoma’, ‘Leelawadee UI’, ‘Microsoft Sans Serif’) on the Linux server?

A few fonts are already installed through the Dockerfile:

ttf-mscorefonts-installer fontconfig

@mjjagan Yes, for proper document rendering the required fonts should be provided. The substitution fonts might not contain the requested glyphs; in that case a missing-glyph placeholder is rendered instead of the actual glyph.
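
One way to see whether any installed font covers Thai is to ask fontconfig. This is only a sketch under assumptions: `list_fonts_for_lang` and `has_fonts_for_lang` are hypothetical helper names, the `FC_LIST` override exists purely for illustration, and fontconfig's view of the installed fonts may not match exactly the folders that Aspose.Words scans.

```shell
# Hypothetical helper: print the font families that fontconfig reports as
# covering a language (default: th, Thai), one per line, without duplicates.
# FC_LIST may name an alternative fc-list command; it defaults to fc-list.
list_fonts_for_lang() {
    lang="${1:-th}"
    "${FC_LIST:-fc-list}" ":lang=$lang" family | sort -u
}

# Succeeds (exit status 0) only when at least one font covers the language.
has_fonts_for_lang() {
    [ -n "$(list_fonts_for_lang "${1:-th}")" ]
}
```

Running `list_fonts_for_lang th` inside the container shows which families are candidates for Thai text. If `has_fonts_for_lang th` fails, no installed font that fontconfig knows about covers Thai, which would be consistent with missing glyphs in the output.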

Thanks for the suggestion. The Thai PDF renders fine after I installed Thai fonts through the Dockerfile:

RUN apt-get update && apt-get install -y xfonts-thai
