Save DOCX to PDF character difference

Hello,
If you have a PDF document, you don’t have to type the text to read it.

Thanks in advance

1P272pStr18.docx (44.1 KB)

@benestom The problem is not reproducible on my side using the latest 25.7 version of Aspose.Words. Could you please attach your problematic output document here for our reference?

I downloaded the latest version, the one you are using, and the result is the same. I probably described the problem poorly in the text, so I'd rather send a screenshot that shows the difference clearly.

@benestom Unfortunately, I cannot reproduce the problem on my side using the latest 25.7 version of Aspose.Words and the following simple code:

Document doc = new Document(@"C:\Temp\in.docx");
doc.Save(@"C:\Temp\out.pdf");

Here is the produced output: out.pdf (111.0 KB)

Usually, such problems occur because the fonts used in your input document are not available on the machine where the document is converted to PDF. The fonts are required to build the document layout. If Aspose.Words cannot find a font used in the document, the font is substituted. This might lead to a font mismatch and document layout differences because the substitute font has different metrics. You can implement IWarningCallback to get notifications when font substitution is performed.
Please see our documentation to learn where Aspose.Words looks for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/
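As a minimal sketch of such a callback: the class name FontSubstitutionLogger is made up for illustration, and the paths are the same placeholders as in the snippet above. It only prints the warnings to the console, so adapt the logging to your environment.

```csharp
using System;
using Aspose.Words;

// Hypothetical helper class: prints every font substitution warning.
public class FontSubstitutionLogger : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        // Other warning types are ignored in this sketch.
        if (info.WarningType == WarningType.FontSubstitution)
            Console.WriteLine("Font substitution: " + info.Description);
    }
}

public static class Program
{
    public static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx");

        // Assign the callback before saving, because the layout is built during Save.
        doc.WarningCallback = new FontSubstitutionLogger();

        doc.Save(@"C:\Temp\out.pdf");
    }
}
```

If nothing is printed, no font was substituted during the conversion and the cause is most likely elsewhere.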

It is wrong in your output PDF too. The characters change; see the picture: Word is on the left, the PDF is on the right.

The font is Arial, which is definitely installed on the server. Aspose probably loads the font incorrectly when creating a grouped scheme.

If I generate the PDF in MS Word, it is fine.

@benestom Could you please attach the PDF document produced on your side?

1P272pStr18_257Wrod.pdf (171.4 KB)

This is a PDF generated in MS Word.

The document I generate with Aspose is the same as yours.

@benestom MS Word on my side produces the same output as Aspose.Words does: ms.pdf (189.2 KB)

What preferred editing language is specified in MS Word on your side?

You probably can’t simulate it because I have the Czech language pack in MS Word.

I found another document that contains the same image, but the problem doesn’t appear there. Do you know what the cause could be? At first glance the documents look the same, but for this one the result is OK for me.
4P272pStr18_257.docx (65.8 KB)

@benestom The problem is not reproducible with the 4P272pStr18_257.docx document on my side either. The images in the two documents are actually different: in 1P272pStr18.docx the image is in WMF format, and in 4P272pStr18_257.docx the image is in EMF format.
I tried setting Czech as the default editing language on my side, but MS Word still shows the WMF image like this:

I thought the PDF was generated incorrectly because I was using the older WMF format, but my customer prepared a document for me that contains an EMF, and it was also converted to PDF incorrectly.

Could you look into why other EMFs with Czech names don’t work either?

I am sending Word, a PDF generated from Word and a JPEG that shows the differences.

inputETM.docx (46.8 KB)

inputETM.pdf (110.1 KB)

@benestom Here is how your input document looks in MS Word on my side:

It is rendered to PDF the same way as it is shown in MS Word. Here is the PDF produced from your document using MS Word on my side: ms.pdf (189.4 KB)

The EMF was created in MS PowerPoint and inserted into the document. Everything was created with the Czech language pack. When the document is opened in MS Word, it is displayed with Czech characters. After exporting from MS Word to PDF, the Czech letters are also displayed.

What should I do to get the Czech letters rendered, even though you do not see them with the English setup?

@benestom I have installed the Czech language pack in MS Word and set Czech as both the MS Word UI language and the default editing language:

The metafile still looks the same; the letter is displayed as Ø.

Thanks for checking.

Many EMF files store text as 8-bit (ANSI) characters, and the system code page of the computer is used when rendering. “ř” is byte 0xF8 in CP-1250, but in the Western European CP-1252, 0xF8 is the character “ø”. That’s why my colleague sees ø instead of ř on his EN computer. Setting Options > Language in Office does not affect this; the system setting “Language for non-Unicode programs” decides.
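To show the mapping in code, here is a minimal sketch that decodes the same byte with both code pages. The class and method names (CodePageDemo, DecodeByte) are made up for illustration, and it assumes .NET Core 3.0+ or .NET 5+, where the legacy Windows code pages must be registered first. On .NET Framework the registration is not needed.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    static CodePageDemo()
    {
        // Assumption: .NET Core / .NET 5+, where code pages such as 1250
        // and 1252 are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes a single byte using the given Windows code page.
    public static string DecodeByte(byte value, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(new[] { value });
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(DecodeByte(0xF8, 1250)); // Central European: ř
        Console.WriteLine(DecodeByte(0xF8, 1252)); // Western European: ø
    }
}
```

The byte stored in the metafile is identical in both cases; only the code page used to interpret it differs.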

I have to insert it as “office shapes”; then the encoding is fine.

@benestom Thank you for the additional information. I will consult with our developers and provide you with more information once I get a response.

@benestom There are two possible workarounds we can suggest.

  1. One of your test documents uses a WMF metafile. MS Word uses the system encoding to render these metafiles, while Aspose.Words always uses the Win-1252 encoding for rendering metafiles. If you are using the .NET Framework version of Aspose.Words, you can delegate metafile rendering to GDI+:
Document doc = new Document(@"C:\Temp\in.docx");

PdfSaveOptions saveOptions = new PdfSaveOptions();
// Render metafiles to a bitmap instead of converting them to vector graphics.
saveOptions.MetafileRenderingOptions.RenderingMode = MetafileRenderingMode.Bitmap;

doc.Save(@"C:\Temp\out.pdf", saveOptions);
  2. The other test document you provided uses an EMF+ metafile. If you use the EmfPlusDualRenderingMode.Emf mode, the text will be rendered properly:
Document doc = new Document(@"C:\Temp\in.docx");

PdfSaveOptions saveOptions = new PdfSaveOptions();
// For EMF+ Dual metafiles, render the EMF part instead of the EMF+ part.
saveOptions.MetafileRenderingOptions.EmfPlusDualRenderingMode = EmfPlusDualRenderingMode.Emf;

doc.Save(@"C:\Temp\out.pdf", saveOptions);

out.pdf (114.4 KB)