Chinese font problem in XML file generated by Document.Save as FormatAsposePdf

Hi,
I am glad to have found the Aspose products for converting Word files to PDF. But there is one problem when the content contains Chinese text.

I saved a Word document to an XML file with doc.Save("doc-1.xml", SaveFormat.FormatAsposePdf)
and traced the output again and again to find a pattern.

Here are 2 examples :
ex1 : In Word, if I select a Chinese font (e.g. 新細明體) first and then type "新細明體", the Segment tag in the XML is <Segment FontName="新細明體" IsUnicode="true" Color="rgb 0 0 0" FontSize="14">新細明體</Segment> => this is converted to PDF correctly.

ex2 : In Word, if I select the font Times New Roman first and then type "新細明體", the Segment tag in the XML is <Segment FontName="Times New Roman" IsUnicode="true" Color="rgb 0 0 0" FontSize="14">新細明體</Segment> => this is wrong, and the text shows up empty in the PDF file.

In these 2 examples, users see the same content in the same font in Word, because Word automatically selects a suitable Chinese font for "新細明體" in ex2 (I guess!). But when the document is saved to XML, ex2 ends up with the wrong font (Times New Roman, which does not support Chinese), and the text comes out empty.

I think this is a common problem with CJK text. Please help us. Thanks.

Best regards,
YC Juang.

I think I know what is happening here. MS Word has a special attribute on the text run that defines which font to use for characters belonging to Far East languages. It corresponds to the Font.NameFarEast property in Aspose.Words. When you type Chinese characters in MS Word and the font is set to, for example, Times New Roman, MS Word implicitly sets this attribute to the name of a font that actually supports Far East characters. In MS Word it is Arial Unicode MS.

Unfortunately, this attribute is not relayed during the transformation. The workaround is to set the font for Far East characters explicitly in MS Word. That way MS Word puts the text with Chinese characters in a separate run, and Aspose.Words creates a separate text segment for it in the XML file with the font set correctly. That approach may prove impractical, however, if your text heavily mixes Western and Far East words. I suggest the following solution for the Aspose.Pdf team to implement: check the character code page during import, and set the font for particular characters to Arial Unicode MS if the font of the containing text segment does not support Far East characters. Or maybe they will come up with a better solution.
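To illustrate the kind of check suggested above, here is a minimal, self-contained sketch that splits a string into runs of Far East and non-Far East characters and assigns a font name to each run. It does not use any Aspose API. The class and method names (FarEastSplitter, IsFarEast, Split) are hypothetical, and the Unicode ranges are a rough assumption that covers common CJK blocks only (characters outside the Basic Multilingual Plane are not handled).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of Aspose.Words or Aspose.Pdf.
public static class FarEastSplitter
{
    // Rough test for Far East characters (assumed ranges, not exhaustive).
    public static bool IsFarEast(char c)
    {
        return (c >= '\u2E80' && c <= '\u9FFF')   // CJK radicals, kana, CJK ideographs
            || (c >= '\uAC00' && c <= '\uD7AF')   // Hangul syllables
            || (c >= '\uF900' && c <= '\uFAFF')   // CJK compatibility ideographs
            || (c >= '\uFF00' && c <= '\uFFEF');  // halfwidth and fullwidth forms
    }

    // Splits text into consecutive runs and pairs each run with a font name:
    // farEastFont for Far East runs, baseFont for everything else.
    public static List<(string Font, string Text)> Split(
        string text, string baseFont, string farEastFont)
    {
        var runs = new List<(string Font, string Text)>();
        var current = new StringBuilder();
        bool currentIsFarEast = false;

        foreach (char c in text)
        {
            bool isFarEast = IsFarEast(c);
            // Flush the current run when the script class changes.
            if (current.Length > 0 && isFarEast != currentIsFarEast)
            {
                runs.Add((currentIsFarEast ? farEastFont : baseFont, current.ToString()));
                current.Clear();
            }
            currentIsFarEast = isFarEast;
            current.Append(c);
        }

        if (current.Length > 0)
        {
            runs.Add((currentIsFarEast ? farEastFont : baseFont, current.ToString()));
        }
        return runs;
    }
}
```

For example, FarEastSplitter.Split("abc新細明體def", "Times New Roman", "新細明體") yields three runs: "abc" and "def" paired with Times New Roman, and "新細明體" paired with the Far East font. Each run could then be written out as its own text segment.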

I am transferring this thread to Aspose.Pdf forum. Let's wait for their answer.

Vladimir's suggestion sounds reasonable. I will try to resolve this problem ASAP.

Hi,
Thanks for your reply. I think it is a good suggestion. But could you take it a step further?

Times New Roman does not always map to Arial Unicode MS; the mapping can differ on every PC. On mine, for example, it is 新細明體.

So, is it possible to get the correct Chinese font mapping and create a separate segment in the XML file?

Best regards,
YC Juang.

I need to test this issue and contact the Aspose.Words team. Maybe I can provide a property to set the default font for far east languages.

Hi,

That sounds good. Thanks for your support. I hope the default-font setting will be available soon :)

Best regards,
YC Juang.

Hi,

Is this problem still pending?
I have the same problem: the FontName in the XML file defaults to "Times New Roman" when I convert from HTML to XML using the code below.

Aspose.Words.Document asposeDoc = new Aspose.Words.Document("c:\\temp\\test.htm");
asposeDoc.Save("c:\\temp\\test.xml", SaveFormat.AsposePdf);

My goal is to generate the PDF file properly even when the HTML contains double-byte characters.

Thanks
Kazu

Sorry, we have not found a good solution for this problem yet, and the task has been postponed. I hope we can resolve it soon. The issue is logged as PDFNET-3535.
For now, to meet your requirement, you have to write a piece of code that modifies the font settings before saving the PDF.

Hi Tommy

Thank you for your quick response.
I wrote the code to modify the FontName to "MS PGothic".

====================================================================
// Convert from Html to Xml with Aspose Pdf format
Aspose.Words.Document asposeDoc = new Aspose.Words.Document(this.DocHtmlFileName, Aspose.Words.LoadFormat.Html, null);

// Modify Font Name to TrueType for double-byte characters
for (int i = 0; i < asposeDoc.Styles.Count; i++)
{
    Aspose.Words.Style asposeStyle = asposeDoc.Styles[i];
    if (asposeStyle.Type == StyleType.Paragraph || asposeStyle.Type == StyleType.Character)
    {
        asposeStyle.Font.Name = "MS PGothic";
    }
}

asposeDoc.Save(docXmlFileName, SaveFormat.AsposePdf);
====================================================================

But some FontName attributes in the XML file still contain "monospace", as below:

日本語のテスト

After I convert from XML to PDF, these segments do not appear in the PDF file.
Could you point out what is wrong with my code, or suggest another solution?

Thanks in advance
Kazu

Hi,

You should also set IsUnicode="true".
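As a rough sketch of one way to apply this, the intermediate XML file could be post-processed before it is converted to PDF. The snippet below uses only System.Xml.Linq, not any Aspose API. It assumes the file contains Segment elements with FontName and IsUnicode attributes, as in the examples earlier in this thread. The class and method names (SegmentFontFixer, Fix) are hypothetical, and the U+2E80 threshold is a rough assumption for detecting Far East text.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of Aspose.Pdf.
public static class SegmentFontFixer
{
    // For every Segment element whose text contains Far East characters,
    // sets FontName to farEastFont and IsUnicode to "true".
    // Returns the number of segments changed.
    public static int Fix(XDocument doc, string farEastFont)
    {
        int changed = 0;
        // Match on the local name so the code works with or without an XML namespace.
        var segments = doc.Descendants()
                          .Where(e => e.Name.LocalName == "Segment")
                          .ToList();

        foreach (XElement segment in segments)
        {
            // Rough test (assumption): any character at or above U+2E80.
            if (!segment.Value.Any(c => c >= '\u2E80'))
            {
                continue; // leave Western-only segments untouched
            }
            segment.SetAttributeValue("FontName", farEastFont);
            segment.SetAttributeValue("IsUnicode", "true");
            changed++;
        }
        return changed;
    }
}
```

Typical usage would be to load the saved file with XDocument.Load, call SegmentFontFixer.Fix(doc, "MS PGothic"), and write it back with doc.Save before converting the XML to PDF.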

Hi

Pdf conversion works for double byte characters!

Thank you
Kazu

Hi,

Each font supports only a certain set of characters. Sometimes users assign a font to a Segment that does not support every character appearing in that Segment. In this release, fonts can be adjusted automatically: a suitable font is selected for the Segment according to its contents (this works like MS Word). To turn on this function, please set Pdf.IsAutoFontAdjusted = true; before calling the Pdf.Save() method. Please download our new release and try it. Thanks.

Best regards.