Aspose software's IVS encoding format problem

We are now using Aspose software to convert Word documents to PDF files in our systems. We have noticed that when using Aspose software to output PDF files, the software is unable to correctly read characters in the IVS encoding format.
An IVS (Ideographic Variation Sequence) combines the U+ code point of an encoded Chinese character with a variation selector. It is used to define and identify variant forms of encoded Chinese characters, so that unencoded variants can be associated with the encoded character and represented within the ISO/IEC 10646 system while preserving the glyph of the variant. For more information on IVS/UVS, see: cmap - Character To Glyph Index Mapping Table (OpenType 1.9.1) - Typography | Microsoft Learn

@songweili
Please attach an example PDF document with this encoding that the library does not display properly.

Below is our test case. In the file, the Chinese characters are followed by square boxes, which may be due to the IVS encoding of these characters. The other file shows the encoding of the example text.

Testingdata information.pdf (13.1 KB)

Testingdata.pdf (42.6 KB)

@songweili
If I understand you correctly, you have converted a Word document to a PDF document. In PDF documents obtained in this way, when opening them in Acrobat, do you see squares instead of some symbols?

You are right; I didn’t explain clearly. When we converted the document to a PDF and opened it with Acrobat or other software, a square appeared. This square is not replacing any specific symbol. It’s because we tried to output an IVS character. IVS characters have two parts in their encoding (for example, U+36C7 E0101), whereas ordinary characters usually have just one part (like U+36C7). We believe the software treats the “E0101” part as a separate symbol, which causes the display issue.
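To make the "two parts" concrete, here is a minimal sketch that builds the sequence from the example above (U+36C7 followed by U+E0101) and lists its code points. It uses only the .NET base library; the class and method names (`IvsDemo`, `CodePoints`, `IsIvsSelector`) are made up for illustration, and the sketch says nothing about how Aspose.Words handles the sequence internally.

```csharp
using System;
using System.Collections.Generic;

public static class IvsDemo
{
    // Variation Selectors Supplement block (U+E0100..U+E01EF),
    // the range used by ideographic variation sequences.
    public static bool IsIvsSelector(int codePoint) =>
        codePoint >= 0xE0100 && codePoint <= 0xE01EF;

    // Splits a well-formed UTF-16 string into Unicode code points.
    public static List<int> CodePoints(string text)
    {
        var result = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            result.Add(char.ConvertToUtf32(text, i));
            if (char.IsHighSurrogate(text[i])) i++; // skip the low surrogate
        }
        return result;
    }

    public static void Main()
    {
        // Base ideograph U+36C7 followed by variation selector U+E0101.
        string ivs = char.ConvertFromUtf32(0x36C7) + char.ConvertFromUtf32(0xE0101);

        foreach (int cp in CodePoints(ivs))
            Console.WriteLine($"U+{cp:X4} selector={IsIvsSelector(cp)}");
        // Prints:
        // U+36C7 selector=False
        // U+E0101 selector=True
    }
}
```

The string holds two code points but three UTF-16 code units, because U+E0101 lies outside the Basic Multilingual Plane. A renderer that does not pair the selector with the preceding ideograph may draw the selector as a separate missing-glyph box.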

@songweili
You have worked with the Aspose.Words product and I will move your question to the appropriate section of the forum.

@songweili Could you please provide the original Word document here for testing?

Testingdata.docx (12.9 KB)

Sorry for the late reply. The original Word document has been uploaded.
@vyacheslav.deryushev

@songweili Please try using the following code before saving the document:

doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;

After using the code you provided, the output PDF does not display the IVS characters correctly even though the square symbols no longer appear.
Testingdata(1).pdf (32.9 KB)

@Cally_Chong The MingLiU_MSCS font is used in your document, but this font is not available on my side. Could you please attach this font here for testing? Font substitution might affect the appearance of glyphs in the output document.

The font file of Ming_MSCS is attached. Thanks for your help.
Ming_MSCS.zip (155.5 KB)

@Cally_Chong Thank you for the additional information. However, the font name specified in the source document does not match the name of the provided font, so it is substituted anyway. I had to add a custom substitution rule to make Aspose.Words use the provided font:

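// Requires: using Aspose.Words; using Aspose.Words.Fonts;
// Search the system fonts plus C:\Temp\fonts (including subfolders).
FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[] { new SystemFontSource(), new FolderFontSource(@"C:\Temp\fonts", true) });
// The document requests "MingLiU_MSCS"; map it to the provided "Ming_MSCS" font.
FontSettings.DefaultInstance.SubstitutionSettings.TableSubstitution.AddSubstitutes("MingLiU_MSCS", "Ming_MSCS");

Document doc = new Document(@"C:\Temp\in.docx");
// Enable HarfBuzz text shaping (from the Aspose.Words.Shaping.HarfBuzz package).
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
// FontSubstitutionWarningCallback is a user-defined IWarningCallback (not shown in this post).
doc.WarningCallback = new FontSubstitutionWarningCallback();
doc.Save(@"C:\Temp\out_HarfBuzz.pdf");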

out_HarfBuzz.pdf (30.4 KB)
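The `FontSubstitutionWarningCallback` class used in the code above is not shown anywhere in this thread. The following is one possible minimal sketch of it, assuming it only needs to print font substitution warnings to the console. The body is an assumption, not the code that was actually run.

```csharp
using System;
using Aspose.Words;

// Hypothetical implementation: the thread uses this class name but never shows its body.
public class FontSubstitutionWarningCallback : IWarningCallback
{
    // Aspose.Words calls this method for each warning raised while
    // loading, laying out, or saving the document.
    public void Warning(WarningInfo info)
    {
        // Report only font substitution warnings; ignore all other warning types.
        if (info.WarningType == WarningType.FontSubstitution)
            Console.WriteLine(info.Description);
    }
}
```

If this prints a message for MingLiU_MSCS during conversion, the requested font was not found and another font was used in its place.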

I am sorry, I provided the wrong font file. I am attaching the correct font file, MingLiU_MSCS, for you to test.
https://drive.google.com/file/d/1snocjNSK8MlT0tdc_SG0-gb34Z-kxcpn/view?usp=sharing

@Cally_Chong Thank you for additional information. With this font the document is rendered the same as in MS Word:
Aspose.Words: out_HarfBuzz.pdf (16.4 KB)
MS Word: ms.pdf (92.9 KB)

Thank you for your assistance. However, we are still unable to solve the problem of IVS characters not being displayed correctly. We have ensured that the correct font file (MingLiU_MSCS) is used, and we have applied the code snippet provided earlier:
doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;

The issue persists despite these steps. Here are the details of our current setup:

  • Operating System: Linux
  • Aspose Version: 23.5.0

We would like to inquire if there are any additional settings or configurations required to ensure the correct display of IVS characters in our environment.

@Cally_Chong Unfortunately, I cannot reproduce the problem on my side. Here is the output produced on my side: out_linux.pdf (16.4 KB)
Here is my test Dockerfile:

FROM mcr.microsoft.com/dotnet/runtime:7.0 AS base
WORKDIR /app
RUN apt-get update && apt-get install -y libfontconfig1
RUN apt-get update && apt-get install -y libharfbuzz-dev

FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
WORKDIR /src
COPY ["TestNet.csproj", "."]
RUN dotnet restore "./TestNet.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "TestNet.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "TestNet.csproj" -c Release -r linux-x64 --no-self-contained -o /app/publish

FROM base AS final

WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "TestNet.dll"]

And here is the test code:

FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[] { new FolderFontSource(@"/temp/fonts", true), new FolderFontSource(@"/winfonts", true) });

Document doc = new Document(folder+"in.docx");
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
doc.WarningCallback = new FontSubstitutionWarningCallback();
doc.Save(folder + "out_linux.pdf");

/temp/fonts contains the MingLiU_MSCS font, and /winfonts contains the standard Windows fonts.
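Because an earlier reply in this thread traced a substitution to a font-name mismatch, it may help to check which font names Aspose.Words actually finds in the font folder on the Linux machine. The sketch below lists them. It assumes the same /temp/fonts path as the test above and that `FontSourceBase.GetAvailableFonts()` is available in the installed Aspose.Words version; the class name `FontCheck` is made up for illustration.

```csharp
using System;
using Aspose.Words.Fonts;

public static class FontCheck
{
    public static void Main()
    {
        // Assumed path: the folder that holds the MingLiU_MSCS font in the test above.
        var source = new FolderFontSource("/temp/fonts", true);

        // Print every font the source exposes, so the family name can be
        // compared with the name requested by the document.
        foreach (PhysicalFontInfo font in source.GetAvailableFonts())
            Console.WriteLine($"{font.FontFamilyName} | {font.FullFontName} | {font.FilePath}");
    }
}
```

If MingLiU_MSCS does not appear in the output, the font file is not being picked up from that folder, and the document would be rendered with a substitute font.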