Hi Alexei,
Thanks for your inquiry. Your query is related to Aspose.Pdf, so I am moving this thread to the Aspose.Pdf forum. My colleagues from the Aspose.Pdf team will reply to you shortly.
You may also use Aspose.Words to insert an HTML fragment or a whole HTML document into the Aspose.Words DOM and convert the final document to PDF. Please see the following code example for reference.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.InsertHtml(html); // html: the HTML fragment or whole HTML document to insert
doc.Save(MyDir + "Out.pdf");
Hi Alexei,
Thanks for contacting support.
To display multibyte/special characters in a PDF file, you need to use a font that supports Unicode characters, e.g. Arial Unicode MS. Please try the following code snippet to generate the correct output. For your reference, I have also attached the resulting PDF generated on my end. We are sorry for the inconvenience.
Code:
var pdf = new Aspose.Pdf.Generator.Pdf();
string html1 = "<div>TEST_<font name='Arial'> 中文文档</font>资料_TEXT</div>";
// create text object
Aspose.Pdf.Generator.Text text1 = new Aspose.Pdf.Generator.Text(html1);
// indicate to render HTML tags inside PDF
text1.IsHtmlTagSupported = true;
// use TextInfo style
text1.UseTextInfoStyle = true;
// specify the font for PDF contents
text1.TextInfo.FontName = "Arial Unicode MS";
// add text paragraph to paragraphs collection of section object
pdf.Sections.Add().Paragraphs.Add(text1);
// embed font inside PDF file
pdf.SetUnicode();
// save PDF file
pdf.Save(@"c:\pdftest\SpecialCharacters_test2.pdf");
This solution is unacceptable for us:
It would mean a lot of work on our side to parse fonts in HTML files. When we switched to Aspose libraries, we expected that any legitimate HTML code would be migrated correctly. Aspose.Words does it seamlessly, without any work on our side.
Could you fix this issue on your side?
Thank you,
Alexei
Hi Alexei,
Aspose.Pdf for .NET and Aspose.Words for .NET are two separate APIs and both have separate/different document rendering engines. Both APIs use individual techniques to render objects inside the targeted file format.
However, to render Unicode characters in a PDF file without specifying font information, you may try the Document Object Model (DOM) of the Aspose.Pdf namespace. Note that with this approach the HTML tags are not parsed and appear in the output as literal tag text. To address this, we have already logged the requirement of “parsing HTML tags when using Aspose.Pdf namespace” in our issue tracking system as PDFNEWNET-35804. The development team is looking into the details of this requirement, and we will keep you updated on the status of a correction. We are sorry for the inconvenience.
Document doc = new Document("c:/pdftest/Paysage.pdf");
string html1 = "<div>TEST_中文文档资料_TEXT</div>";
doc.Pages.Add().Paragraphs.Add(new Aspose.Pdf.Text.TextFragment(html1));
doc.Save("c:/pdftest/UniCodeTextDOM.pdf");
Yes, this may be a workaround, but we prefer to use the Pdf.Generator method and expect that it will eventually convert all multibyte characters correctly into the PDF file.
pdf.ParseToPdf(html);
Looking forward to seeing this feature implemented,
Thank you,
Alexei
Hi Alexei,
Thanks for sharing the details.
I have logged an investigation ticket in our issue tracking system as PDFNEWNET-36825, and the development team will look into whether the font/multi-byte text problem can be fixed in the ParseToPdf(..) method. We will keep you updated on the status of a correction.
We apologize for any inconvenience caused.
The issues you have found earlier (filed as PDFNEWNET-35804) have been fixed in Aspose.Pdf for .NET 9.5.0.
This message was posted using Notification2Forum from Downloads module by Aspose Notifier.
The issues you have found earlier (filed as PDFNEWNET-36825) have been fixed in Aspose.Pdf for .NET 9.6.0.