Unicode Characters are partly broken while exporting from Word to PDF

Hello,

we have a problem saving certain Unicode sequences such as A̋A̋A̋ to PDF, and we are unsure why.
The preview here is also broken, but see the attached image:

If we save as a DOCX file and open it in Word, it’s OK. Only the direct export to PDF is wrong.
We tested Aspose Words 23.10.0 as well as the 24.9.0 version.

Any ideas what may be wrong?

var doc = new Aspose.Words.Document();

DocumentBuilder builder = new DocumentBuilder(doc);
builder.Font.Name = "Liberation Serif";
builder.Writeln("ABC");
builder.Writeln("A̋A̋A̋");

Aspose.Words.Saving.PdfSaveOptions saveOptions = (Aspose.Words.Saving.PdfSaveOptions)Aspose.Words.Saving.SaveOptions.CreateSaveOptions(Aspose.Words.SaveFormat.Pdf);
saveOptions.ExportDocumentStructure = true;
saveOptions.Compliance = Aspose.Words.Saving.PdfCompliance.PdfA4;
saveOptions.EmbedFullFonts = true;
saveOptions.DisplayDocTitle = true;
saveOptions.FontEmbeddingMode = Aspose.Words.Saving.PdfFontEmbeddingMode.EmbedAll;

var outputStream = new MemoryStream();
doc.Save(outputStream, saveOptions);
outputStream.Seek(0, SeekOrigin.Begin);
return File(outputStream, "application/pdf");

@mafaust The problem occurs because MS Word uses OpenType font features by default. The Aspose.Words.Shaping.HarfBuzz package provides support for OpenType features in Aspose.Words using the HarfBuzz text shaping engine. You should enable OpenType features to get the expected result. To achieve this, add a reference to the Aspose.Words.Shaping.HarfBuzz plugin and use the following code to convert your document:

var doc = new Aspose.Words.Document();
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;

DocumentBuilder builder = new DocumentBuilder(doc);
builder.Font.Name = "Liberation Serif";
builder.Writeln("ABC");
builder.Writeln("A̋A̋A̋");

Aspose.Words.Saving.PdfSaveOptions saveOptions = (Aspose.Words.Saving.PdfSaveOptions)Aspose.Words.Saving.SaveOptions.CreateSaveOptions(Aspose.Words.SaveFormat.Pdf);
saveOptions.ExportDocumentStructure = true;
saveOptions.Compliance = Aspose.Words.Saving.PdfCompliance.PdfA4;
saveOptions.EmbedFullFonts = true;
saveOptions.DisplayDocTitle = true;
saveOptions.FontEmbeddingMode = Aspose.Words.Saving.PdfFontEmbeddingMode.EmbedAll;

doc.Save(@"C:\Temp\out.pdf", saveOptions);

out.pdf (669.6 KB)

Thanks for your fast reply. This worked like a charm. Hint: the TextShaperFactory must be set before the content is imported.

Dear Alexey,

the Windows version works fine, but I’m having trouble with the Linux container version.
I’ve tried but this produces bad output.
Which version of HarfBuzz should I use for Aspose.Words.Shaping.HarfBuzz version 23.10.0?

Thanks, Martin

@mafaust On Windows platforms no additional effort is required to install HarfBuzz, because Aspose.Words.Shaping.HarfBuzz already includes a compiled HarfBuzz library.

For other systems, Aspose.Words.Shaping.HarfBuzz relies on an already installed HarfBuzz library. For instance, many Linux-based systems have HarfBuzz installed system-wide by default. If not, there is usually a package that can be installed via the package manager.

For example, in a clean Ubuntu Docker image HarfBuzz has to be installed additionally, using a command like this:

RUN apt-get update && apt-get install -y libharfbuzz-dev
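
To confirm from inside the container that a system-wide HarfBuzz library is visible to the .NET process, a small probe like the one below may help. It is only a diagnostic sketch: the candidate library names are assumptions, and the file name that Aspose.Words.Shaping.HarfBuzz actually loads may differ. It uses the standard `NativeLibrary` API available in .NET Core 3.0 and later.

```csharp
using System;
using System.Runtime.InteropServices;

// Candidate names are assumptions for illustration; the name the plugin
// actually resolves on a given distribution may be different.
string[] candidates = { "libharfbuzz.so.0", "libharfbuzz.so", "libharfbuzz" };

foreach (string name in candidates)
{
    // TryLoad returns false instead of throwing when the library is missing.
    if (NativeLibrary.TryLoad(name, out IntPtr handle))
    {
        Console.WriteLine($"Loaded native library: {name}");
        NativeLibrary.Free(handle); // release the handle, this is only a probe
    }
    else
    {
        Console.WriteLine($"Could not load: {name}");
    }
}
```

If none of the names load, installing the HarfBuzz package as shown above is the first thing to try.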

Actually, it was an old Liberation font from Debian that caused the trouble ;-(

@mafaust So the problem does not occur if you use a newer version of the Liberation font?
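
To see which Liberation font files and versions Aspose.Words picks up on a given machine or container, the available fonts can be listed from the default font sources. This is a sketch rather than a definitive check: it assumes the default `FontSettings` instance is the one used for rendering, and the "Liberation" name filter is just an illustration.

```csharp
using System;
using Aspose.Words.Fonts;

// Walk all font sources known to the default font settings and print
// family, version and file path of every Liberation font found there.
foreach (FontSourceBase source in FontSettings.DefaultInstance.GetFontsSources())
{
    foreach (PhysicalFontInfo font in source.GetAvailableFonts())
    {
        if (font.FontFamilyName.StartsWith("Liberation", StringComparison.OrdinalIgnoreCase))
        {
            Console.WriteLine($"{font.FullFontName} | version: {font.Version} | {font.FilePath}");
        }
    }
}
```

Comparing this output between the Windows machine and the Linux container should show whether an older font file is being used in one of them.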