Word to PDF loses formatting and fonts

When creating a PDF from a Word document, the formatting is lost and the fonts are replaced. Sample files attached:
test.docx (38.5 KB)
test.pdf (106.6 KB)

@ito_software_gov_sk_ca I can see the issues in the documents that you attached, but when I run the conversion code on my side I cannot reproduce them. Please see the following example code. I am using Aspose.Words for .NET (C#) version 23.1.0, together with the Aspose.Words.Shaping.HarfBuzz package at the same version:

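using Aspose.Words;
using Aspose.Words.Saving;

// Load the document.
Document doc = new Document("C:\\Temp\\test.docx");

// Use HarfBuzz for text shaping (requires the Aspose.Words.Shaping.HarfBuzz package).
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;

// Configure the PDF output: no image downsampling, anti-aliasing on, JPEG image compression.
PdfSaveOptions opt = new PdfSaveOptions();
opt.DownsampleOptions.DownsampleImages = false;
opt.UseAntiAliasing = true;
opt.ImageCompression = PdfImageCompression.Jpeg;

// Save the document.
doc.Save("C:\\Temp\\test.pdf", opt);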

Please note that to get better results when converting documents from Word to PDF, you should use the PdfSaveOptions class and set a TextShaperFactory.

@ito_software_gov_sk_ca This is the same problem as the one you reported in your other thread:
https://forum.aspose.com/t/formatting-lost-and-fonts-replaced/258552/3
The problem occurs because the fonts used in your original document are not available in the environment where the document is converted.

FYI: @Eduardo_Canal

Actually, I’m the dev on this one. I applied your code and I get an exception.
image.png (67.0 KB)

@ludo.d Did you install the NuGet package Aspose.Words.Shaping.HarfBuzz? Which development environment are you using?

Yes I did install the version 23.1.0. I’m on Windows 11, VS22.
image.png (38.6 KB)
image.png (26.1 KB)

@ludo.d I looked deeper into the issue you posted, and I think it could be related to a known issue. The NuGet package contains two native harfbuzz.dll files, compiled for the x86 and x64 platforms, and a special MSBuild script.
Depending on the project’s target platform, the MSBuild script chooses the matching native harfbuzz.dll (x86 or x64) and copies it to the TargetDir of the project.

This MSBuild script detects the project’s target platform using the {PlatformTarget} and {Prefer32Bit} parameters of the project. This works for most project types, but not for ASP.NET WebForms.
There, the target platform may also depend on the {Use64BitIISExpress} parameter of the project.
In general, there is no good way to detect the actual target platform if {Use64BitIISExpress} is set to “Default” and {PlatformTarget} is set to “Any CPU”.

The workaround: please set {Use64BitIISExpress} explicitly to x86 or x64, and then REBUILD the project.
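
To confirm which native harfbuzz.dll the running process actually needs, it can help to log the process bitness at startup. The following is only a minimal sketch that uses standard .NET APIs; the class name PlatformCheck is a hypothetical name chosen for illustration and is not part of Aspose.Words:

```csharp
using System;
using System.Runtime.InteropServices;

public static class PlatformCheck
{
    // Returns "64-bit" or "32-bit" for the running process.
    // The native harfbuzz.dll copied to the output folder must match this.
    public static string ProcessBitness()
    {
        return Environment.Is64BitProcess ? "64-bit" : "32-bit";
    }

    public static void Main()
    {
        Console.WriteLine("Process bitness: " + ProcessBitness());
        Console.WriteLine("Process architecture: " + RuntimeInformation.ProcessArchitecture);
        Console.WriteLine("OS: " + RuntimeInformation.OSDescription);
    }
}
```

If the reported bitness differs from the harfbuzz.dll that was copied to the output folder, the native library will fail to load regardless of the managed code.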

Use64BitIISExpress.png (23.4 KB)

Actually, this fix does work when using IIS Express, but it does not work when using Docker. The app is a .NET 6 API.

@ludo.d To get the application working in Docker, please follow the guide on how to configure Aspose.Words with Docker. In the “More Examples” section you will find how to configure the Dockerfile to add the libharfbuzz library to the image.

@Eduardo_Canal I followed the steps but I still get the same exception.
image.png (17.0 KB)

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-24886

You can obtain Paid Support services if you need support on a priority basis, along with the direct access to our Paid Support management team.

@ludo.d Could you please provide your Dockerfile here for testing? Which base image do you use?
You should note that Aspose.Words.Shaping.Harfbuzz uses P/Invoke technology for invoking unmanaged functions from HarfBuzz.

On Windows, no additional effort is required to install HarfBuzz, because Aspose.Words.Shaping.Harfbuzz already includes a compiled HarfBuzz library.

On other systems, Aspose.Words.Shaping.Harfbuzz relies on an already installed HarfBuzz library. For instance, many Linux-based systems have HarfBuzz installed system-wide by default. If not, there is usually a package that can be installed via the package manager.

For example, in a clean Ubuntu Docker image, HarfBuzz must be installed additionally with a command like this:

RUN apt-get update && apt-get install -y libharfbuzz-dev

Regarding the original issue reported in the initial post: the rendering difference occurs because the fonts used in the original document are not available in the environment where the document is rendered. Most likely your Docker image does not have any fonts at all, since the last-resort font, Fanwood, is used for rendering the document. This font is embedded into the Aspose.Words DLL and is used if no other fonts are available. Please note that Aspose.Words needs the fonts used in the document to build the document layout upon conversion to PDF. If Aspose.Words cannot find a font used in the document, the font is substituted. This might lead to font mismatches and document layout differences due to the different font metrics. You can implement IWarningCallback to get notifications when font substitution is performed.
In your case, you can copy the required fonts into your Docker image. Please see our documentation to learn how to specify the location where Aspose.Words will look for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/
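
As a rough illustration of both suggestions, the sketch below logs font substitution warnings through IWarningCallback and points Aspose.Words at a fonts folder via FontSettings.SetFontsFolder. The class name FontWarningLogger and all paths (/app/data, /app/fonts) are hypothetical and would need to match your own image layout:

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Fonts;

// Prints a line for every font substitution reported while the layout is built.
public class FontWarningLogger : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        if (info.WarningType == WarningType.FontSubstitution)
            Console.WriteLine("Font substitution: " + info.Description);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical input path inside the container.
        Document doc = new Document("/app/data/test.docx");

        // Hypothetical folder with the fonts copied into the Docker image.
        FontSettings fontSettings = new FontSettings();
        fontSettings.SetFontsFolder("/app/fonts", true); // true = also scan subfolders
        doc.FontSettings = fontSettings;

        // Report substitutions instead of letting them happen silently.
        doc.WarningCallback = new FontWarningLogger();

        doc.Save("/app/data/test.pdf");
    }
}
```

If the console still shows substitution warnings after this, the named font is missing from the folder. Note that, as far as I understand, SetFontsFolder replaces the default font sources rather than adding to them, so the folder should contain every font the documents need.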

@alexey.noskov this is the output I got

And here is the docker file

@ludo.d I have tested on my side with a simple console application with the following code:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Write("Test document");

doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;

doc.Save(@"/temp/out.pdf");

and the following simple Dockerfile:

FROM mcr.microsoft.com/dotnet/runtime:6.0 AS base

WORKDIR /app
RUN apt-get update && apt-get install -y libharfbuzz-dev
RUN apt-get update && apt-get install -y libfontconfig1

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["TestNet6.csproj", "."]
RUN dotnet restore "./TestNet6.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "TestNet6.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "TestNet6.csproj" -c Release -r linux-x64 --no-self-contained -o /app/publish

FROM base AS final

WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "TestNet6.dll"]

And everything works fine.

From your log, it looks like Aspose.Words cannot find libharfbuzz: it tries different paths without success.
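
One way to check this independently of Aspose.Words is to ask the OS loader for the library directly from inside the container. This is only a diagnostic sketch for .NET Core 3.0 or later; the class name HarfBuzzProbe is hypothetical, and the candidate library names are assumptions based on common Linux packaging, not necessarily the names Aspose.Words itself tries:

```csharp
using System;
using System.Runtime.InteropServices;

public static class HarfBuzzProbe
{
    // Assumed names: the versioned soname usually shipped by the runtime package,
    // and the unversioned name usually added by the -dev package.
    private static readonly string[] Candidates =
    {
        "libharfbuzz.so.0",
        "libharfbuzz.so"
    };

    // Returns true and the matching name if the OS loader can load any candidate.
    public static bool TryFind(out string found)
    {
        foreach (string name in Candidates)
        {
            if (NativeLibrary.TryLoad(name, out IntPtr handle))
            {
                NativeLibrary.Free(handle);
                found = name;
                return true;
            }
        }
        found = string.Empty;
        return false;
    }

    public static void Main()
    {
        if (TryFind(out string name))
            Console.WriteLine("HarfBuzz found as " + name);
        else
            Console.WriteLine("HarfBuzz not found; check that the install step ran in this image.");
    }
}
```

If this reports that the library is missing inside the running container, the problem lies in the image itself rather than in the application code.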

If possible, could you please create a simple application that will allow us to reproduce the problem?

I finally managed to make it work. The problem was that Visual Studio was not recreating the container, so the apt-get install command for libharfbuzz was never run :expressionless:
In the generated PDF the font is still not Calibri but DejaVuSans. We worked around that by setting the fonts folder as you suggested.
The only minor thing now is that some bullet points are not rendered correctly.


The first one is the bullet point in the Word document and the second one is the bullet point in the PDF document.

@ludo.d This bullet symbol is in the Wingdings font. It looks like this font is not available and Aspose.Words cannot find an alternative font that contains this symbol. Please put the Windows symbol fonts (“Symbol”, “Webdings”, “Wingdings”, etc.) into the fonts folder to get the desired result.

Ok I understand. I’ll try it and let you know.

Hi @alexey.noskov It worked, thanks.

The issues you found earlier (filed as WORDSNET-24886) have been fixed in the Aspose.Words for .NET 23.2 update, which is also available on NuGet.