HTML to PDF with Chinese characters

Hi, I am evaluating Aspose.Words for purchase. I have not been able to get it to work with Chinese (Simplified) characters in an HTML document that is then saved as PDF. Saving to RTF seems to work fine. See the attached zip file with the source and the two outputs.

Hi

Thanks for your request. I think you have the same issue as described in this thread:

Best regards.

No, this is not the same. There are no fonts involved. The HTML renders fine in all browsers and converts properly to RTF.

Hi

Thank you for the additional information. I managed to reproduce the problem on my side. You will be notified as soon as it is fixed. As a workaround, you can specify the font in your HTML. See the following example:


筛选确认

研究:ABC 867-5309



用户名:Smith, John L
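The markup of the example above did not survive in this thread, only its rendered text. As a rough sketch of the workaround (not Aspose.Words code), the Python snippet below wraps each line in a span that names the font explicitly. The helper name wrap_in_font_span and the surrounding p/html skeleton are assumptions made for illustration; PMingLiU is the font discussed in this thread.

```python
from html import escape

def wrap_in_font_span(text: str, font_family: str = "PMingLiU") -> str:
    """Wrap text in a <span> whose inline style names the font explicitly.

    Assumes font_family contains no quote characters. The name is wrapped in
    single quotes so that names with spaces stay valid inside the
    double-quoted style attribute.
    """
    return f"<span style=\"font-family:'{font_family}'\">{escape(text)}</span>"

# Sample lines taken from the example in this thread.
lines = ["筛选确认", "研究:ABC 867-5309", "用户名:Smith, John L"]

# Put the span directly around the text instead of styling <body>,
# because the converter is reported not to inherit styles from parents.
body = "\n".join(f"<p>{wrap_in_font_span(line)}</p>" for line in lines)
html_doc = (
    '<html><head><meta charset="utf-8"></head><body>\n'
    f"{body}\n"
    "</body></html>"
)
print(html_doc)
```

The generated HTML can then be saved to a file and fed to the converter in the usual way.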

Hope this helps.

Best regards.

The workaround does not work for me. I still get the blocks in the PDF instead of the characters.

Hi

Have you installed PMingLiU fonts as I described here?

Best regards.

Thanks, it works after installing that font. However, that’s not a good long-term solution as we add additional languages to the system. Please let me know when the issue is resolved so that it will “just work” like it does with RTF.

This happens because PDF conversion has to read the font files. Since your HTML does not specify any fonts, the default font (Times New Roman) is used during conversion. This works fine for RTF, because MS Word automatically detects the language and specifies the necessary font. For PDF, a subset of the font is stored in the PDF file, and since Times New Roman does not contain Chinese characters, they are replaced with the missing-glyph symbol.
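To illustrate the mechanism, here is a minimal Python sketch of what happens when text is laid out with a font that lacks the required glyphs. It is not how Aspose.Words is implemented. The coverage table is hand-made and heavily simplified (real fonts expose their coverage through the cmap table), and the names FONT_COVERAGE, has_glyph and render are hypothetical.

```python
# Simplified, hand-made coverage ranges (assumption for illustration only).
FONT_COVERAGE = {
    # The real font covers more scripts, but no CJK ideographs.
    "Times New Roman": [(0x0020, 0x024F)],
    # ASCII plus the CJK Unified Ideographs block.
    "PMingLiU": [(0x0020, 0x007E), (0x4E00, 0x9FFF)],
}

def has_glyph(font: str, ch: str) -> bool:
    """Return True if the code point of ch falls in one of the font's ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in FONT_COVERAGE[font])

def render(text: str, font: str, missing: str = "\u25A1") -> str:
    """Replace every character the font cannot draw with a box symbol."""
    return "".join(ch if has_glyph(font, ch) else missing for ch in text)

print(render("筛选确认", "Times New Roman"))  # → □□□□
print(render("筛选确认", "PMingLiU"))         # → 筛选确认
```

With no font named in the HTML, every Chinese character goes down the first path, which matches the blocks seen in the PDF.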

Best regards.

Thanks, makes sense. Two more minor questions:

  1. Why doesn’t this same workaround work with the Arial Unicode MS font which is also installed on the same computer?
  2. Why is it required to be a span tag? I would have assumed that applying the style to the body tag would have done the same thing.

Hi

Thanks for your inquiry.

  1. It seems there is a problem reading font-family when the font name includes whitespace. In this case only Arial is read, and Arial does not contain Chinese characters. As a workaround, you can use the <font> element, as shown in the following example:


筛选确认

研究:ABC 867-5309

  2. The problem occurs because Aspose.Words does not currently support inheriting styles from parent elements.

Aspose.Words currently expects font formatting to be set in <span> or <font> elements, paragraph formatting in <p> or <div> elements, and so on.

Best regards.

Hello!

Thank you for your patience. I have addressed the issue related to specifying font-family in CSS. Strictly speaking, a declaration that leaves a multi-word font name such as Arial Unicode MS unquoted is incorrect.

If a font name contains spaces, it should be quoted. Because the whole style value is enclosed in double quotation marks, the font name should use single ones, for example font-family:'Arial Unicode MS'.

You can find all the rules here:

http://www.w3.org/TR/CSS2/fonts.html#propdef-font-family

On the other hand, Microsoft Word and Internet Explorer recognize CSS declaration with this format deviation. So I suggested doing the same. In the next version Aspose.Words will read both cases as expected.
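The difference between the two readings can be sketched in Python. This is not Aspose.Words code: naive_first_family mimics the behaviour reported in this thread (stopping at the first whitespace), while parse_font_family is a simplified take on the lenient reading, splitting on commas outside quotes and collapsing whitespace inside a name. Both function names are hypothetical.

```python
def naive_first_family(value: str) -> str:
    """Mimic the reported bug: the name is cut at the first whitespace."""
    return value.split(",")[0].split()[0]

def parse_font_family(value: str) -> list[str]:
    """Split a font-family value into family names.

    Commas inside quotes do not split, and runs of whitespace inside a name
    become a single space. Simplification: whitespace inside quoted names is
    collapsed as well, and escape sequences are not handled.
    """
    families, current, quote = [], [], None
    for ch in value:
        if quote:
            if ch == quote:
                quote = None          # closing quote
            else:
                current.append(ch)
        elif ch in "'\"":
            quote = ch                # opening quote
        elif ch == ",":
            families.append("".join(current))
            current = []
        else:
            current.append(ch)
    families.append("".join(current))
    return [" ".join(name.split()) for name in families if name.strip()]

print(naive_first_family("Arial Unicode MS"))          # → Arial
print(parse_font_family("Arial Unicode MS"))           # → ['Arial Unicode MS']
print(parse_font_family("'Arial Unicode MS', serif"))  # → ['Arial Unicode MS', 'serif']
```

Quoted and unquoted spellings give the same family list in this sketch, which is the "read both cases" behaviour described above.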

Alexey’s workaround will work, but it is not a good solution since the <font> element is deprecated in HTML.

In most cases compliant code is preferable. Otherwise it could lead to difficulties with other software.

Regards,

The issues you have found earlier (filed as 9634) have been fixed in this update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

The issues you have found earlier (filed as 9624) have been fixed in this update.



The issues you have found earlier (filed as WORDSNET-2021) have been fixed in this .NET update and this Java update.

