We are now using Aspose software to convert Word documents to PDF files in our systems. We have noticed that when using Aspose software to output PDF files, the software is unable to correctly read characters in the IVS encoding format.
An IVS (Ideographic Variation Sequence) combines the U+ code point of an encoded Chinese character with a variation selector. It is used to define and identify variants of encoded Chinese characters, so that unencoded variant characters can be associated with the encoded characters and represented within the ISO/IEC 10646 system while preserving the glyph of the variant. For more information on IVS/UVS, see: cmap - Character To Glyph Index Mapping Table (OpenType 1.9.1) - Typography | Microsoft Learn
@songweili
Please attach an example of a PDF document with this encoding that is not displayed properly by the library.
Below is our test case. In the file, the Chinese characters are followed by squares, which may be caused by the IVS encoding. The other file shows the encoding of the example characters.
Testingdata information.pdf (13.1 KB)
Testingdata.pdf (42.6 KB)
@songweili
If I understand you correctly, you have converted a Word document to a PDF document. In PDF documents obtained in this way, when opening them in Acrobat, do you see squares instead of some symbols?
You are right; I didn’t explain clearly. When we converted the document to a PDF and opened it with Acrobat or other software, a square appeared. This square is not replacing any specific symbol. It’s because we tried to output an IVS character. IVS characters have two parts in their encoding (for example, U+36C7 E0101), whereas ordinary characters usually have just one part (like U+36C7). We believe the software treats the “E0101” part as a separate symbol, which causes the display issue.
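To make the two-part structure concrete, here is a minimal Java sketch that builds the sequence from the example above (U+36C7 followed by U+E0101) and scans a string for such pairs. It assumes the selector falls in the Variation Selectors Supplement range U+E0100 to U+E01EF; the class and method names are hypothetical and are not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.List;

class IvsScan {
    // Variation Selectors Supplement block (assumed range for IVS selectors).
    static final int VS_FIRST = 0xE0100;
    static final int VS_LAST = 0xE01EF;

    static boolean isIvsSelector(int codePoint) {
        return codePoint >= VS_FIRST && codePoint <= VS_LAST;
    }

    // Returns every base + selector pair in the text, formatted like "U+36C7 U+E0101".
    static List<String> findIvsSequences(String text) {
        List<String> found = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        for (int i = 1; i < cps.length; i++) {
            // A selector only forms a sequence with a preceding non-selector base.
            if (isIvsSelector(cps[i]) && !isIvsSelector(cps[i - 1])) {
                found.add(String.format("U+%04X U+%04X", cps[i - 1], cps[i]));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Base character U+36C7 followed by variation selector U+E0101.
        String ivs = new String(Character.toChars(0x36C7))
                + new String(Character.toChars(0xE0101));
        System.out.println("UTF-16 units: " + ivs.length());
        System.out.println("Code points:  " + ivs.codePointCount(0, ivs.length()));
        System.out.println("Sequences:    " + findIvsSequences("A" + ivs + "B"));
    }
}
```

A check like this can be run on the text of the source document to confirm that the selectors are really present before looking at font or shaping settings.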
@songweili
You have worked with the Aspose.Words product and I will move your question to the appropriate section of the forum.
Testingdata.docx (12.9 KB)
Sorry for the late reply. I have uploaded the original Word document.
@vyacheslav.deryushev
@songweili Please try using the following code before saving the document:
doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;
After using the code you provided, the output PDF does not display the IVS characters correctly even though the square symbols no longer appear.
Testingdata(1).pdf (32.9 KB)
@Cally_Chong The MingLiU_MSCS font is used in your document, but this font is not available on my side. Could you please attach this font here for testing? Font substitution might affect the appearance of the glyphs in the output document.
@Cally_Chong Thank you for the additional information. However, the font name specified in the source document does not match the name of the provided font, so it is substituted anyway. I had to add a custom substitution rule to make Aspose.Words use the provided font:
FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[] { new SystemFontSource(), new FolderFontSource(@"C:\Temp\fonts", true) });
FontSettings.DefaultInstance.SubstitutionSettings.TableSubstitution.AddSubstitutes("MingLiU_MSCS", "Ming_MSCS");
Document doc = new Document(@"C:\Temp\in.docx");
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
doc.WarningCallback = new FontSubstitutionWarningCallback();
doc.Save(@"C:\Temp\out_HarfBuzz.pdf");
out_HarfBuzz.pdf (30.4 KB)
I am sorry that I provided the wrong glyph file. I am attaching the correct glyph file, MingLiu_MSCS, for you to test.
https://drive.google.com/file/d/1snocjNSK8MlT0tdc_SG0-gb34Z-kxcpn/view?usp=sharing
@Cally_Chong Thank you for the additional information. With this font, the document is rendered the same as in MS Word:
Aspose.Words: out_HarfBuzz.pdf (16.4 KB)
MS Word: ms.pdf (92.9 KB)
Thank you for your assistance. However, we are still unable to solve the problem of IVS characters not being displayed correctly. We have ensured that the correct glyph file (MingLiu_MSCS) is used, and we have applied the code snippet provided earlier:
doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;
The issue persists despite these steps. Here are the details of our current setup:
- Operating System: Linux
- Aspose Version: 23.5.0
We would like to inquire if there are any additional settings or configurations required to ensure the correct display of IVS characters in our environment.
@Cally_Chong Unfortunately, I cannot reproduce the problem on my side. Here is the output produced on my side: out_linux.pdf (16.4 KB)
Here is my test Dockerfile:
FROM mcr.microsoft.com/dotnet/runtime:7.0 AS base
WORKDIR /app
RUN apt-get update && apt-get install -y libfontconfig1
RUN apt-get update && apt-get install -y libharfbuzz-dev
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
WORKDIR /src
COPY ["TestNet.csproj", "."]
RUN dotnet restore "./TestNet.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "TestNet.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "TestNet.csproj" -c Release -r linux-x64 --no-self-contained -o /app/publish
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "TestNet.dll"]
And here is the test code:
FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[] { new FolderFontSource(@"/temp/fonts", true), new FolderFontSource(@"/winfonts", true) });
Document doc = new Document(folder+"in.docx");
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
doc.WarningCallback = new FontSubstitutionWarningCallback();
doc.Save(folder + "out_linux.pdf");
/temp/fonts contains the MingLiU_MSCS font and /winfonts contains the standard Windows fonts.
Thanks for your previous help. However, we are still facing issues with IVS characters not displaying correctly in PDF output, despite using the correct MingLiu_MSCS font and applying the code snippet:
doc.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;
We have two specific concerns:
- Linux Environment: One of our teams is unable to get HarfBuzz working on Linux. Are there additional dependencies or configurations required?
- Java Environment: Another team using the Aspose.Words Java version also encounters issues with IVS characters in PDF output. Is there a different approach for Java to ensure proper IVS character rendering compared to the .NET implementation?
Aspose.Words.Shaping.HarfBuzz uses P/Invoke to call unmanaged functions from HarfBuzz.
On Windows, no additional effort is required to install HarfBuzz because Aspose.Words.Shaping.HarfBuzz already includes a compiled HarfBuzz library.
On other systems, Aspose.Words.Shaping.HarfBuzz relies on an already installed HarfBuzz library. For instance, many Linux-based systems have HarfBuzz installed system-wide by default. If not, there is usually a package that can be installed via the package manager.
For example, in a clean Ubuntu Docker image, HarfBuzz must be installed additionally with a command like this:
RUN apt-get update && apt-get install -y libharfbuzz-dev
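As a rough way to check whether a HarfBuzz shared library is present on a Linux machine, the following Java sketch looks for files whose names start with libharfbuzz.so in a few library directories. The directory list, the file name pattern, and the class and method names are all assumptions made for illustration. Finding a file does not guarantee that Aspose.Words can load it, and the library may live elsewhere on your distribution.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class HarfBuzzProbe {
    // Returns files named libharfbuzz.so* found directly inside the given directories.
    static List<Path> findHarfBuzz(List<Path> dirs) {
        List<Path> hits = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue; // directory does not exist on this system
            }
            try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(dir, "libharfbuzz.so*")) {
                for (Path p : stream) {
                    hits.add(p);
                }
            } catch (IOException e) {
                // Unreadable directory: skip it rather than fail the whole probe.
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed common library locations; adjust for your distribution.
        List<Path> dirs = List.of(
                Paths.get("/usr/lib"),
                Paths.get("/usr/lib64"),
                Paths.get("/usr/lib/x86_64-linux-gnu"),
                Paths.get("/usr/local/lib"));
        List<Path> hits = findHarfBuzz(dirs);
        if (hits.isEmpty()) {
            System.out.println("No libharfbuzz.so* found in the probed directories.");
        } else {
            hits.forEach(p -> System.out.println("Found: " + p));
        }
    }
}
```

If nothing is found, installing the package as in the command above is the first thing to try.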
The approach for Java is the same as for .NET. Advanced typography features are supported by Aspose.Words via the Aspose.Words.Shaping.HarfBuzz package.
<dependency>
<groupId>com.aspose</groupId>
<artifactId>aspose-words</artifactId>
<version>24.10</version>
<classifier>jdk17</classifier>
</dependency>
<dependency>
<groupId>com.aspose</groupId>
<artifactId>aspose-words</artifactId>
<version>24.10</version>
<classifier>shaping-harfbuzz-plugin</classifier>
</dependency>
You should add the dependencies above and modify the code as shown below:
Document doc = new Document("C:\\Temp\\in.docx");
doc.getLayoutOptions().setTextShaperFactory(com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory.getInstance());
doc.save("C:\\Temp\\out.pdf");