Special Character Getting Truncated while HTML to PDF Conversion

cba_ecs · October 23, 2013, 8:14pm

Hi,

We have recently bought the complete suite of Aspose DLLs for .NET.

We are converting HTML to PDF using Aspose.Pdf with the following code. Special characters are getting truncated in this conversion from HTML to PDF. The issue is that the HTML is built at run time from database data, and we cannot parse either the database data or the HTML content to filter or handle these special characters, as that would degrade the performance of the overall operation.

We are using UTF-8 encoding for generating the HTML.

Content as visible in HTML: -<$2 million Referrals

Content as visible in PDF: -

Please suggest how we could resolve this issue for the scenario described above.

private void ExportToPDF(string strTemplateRead)
{
    try
    {
        //Encoding encoding = Encoding.GetEncoding(65001);

        // Instantiate an object PDF class
        Aspose.Pdf.Generator.Pdf pdf = new Aspose.Pdf.Generator.Pdf();
        // add the section to PDF document sections collection
        Aspose.Pdf.Generator.Section section = pdf.Sections.Add();

        // Read the contents of HTML file into StreamReader object
        StreamReader r = File.OpenText(strTemplateRead);
        //StreamReader r = new StreamReader(Response.GetResponseStream(), encoding);

        pdf.HtmlInfo.CharSet = "UTF-8";

        //Create text paragraphs containing HTML text
        Aspose.Pdf.Generator.Text text2 = new Aspose.Pdf.Generator.Text(section, r.ReadToEnd());
        // enable the property to display HTML contents within their own formatting
        text2.IsHtmlTagSupported = true;
        //Add the text paragraphs containing HTML text to the section
        section.Paragraphs.Add(text2);
        Response.ClearHeaders();
        Response.ContentType = "application/pdf";
        Response.Clear();
        Response.ContentEncoding = Encoding.Default;
        Response.AppendHeader("Content-Disposition", "attachment");

        pdf.Save("NewDoc.pdf", Apdf.SaveType.OpenInBrowser, Response);
        r.Close();
    }
    #region Exceptions
    catch (Exception notsupex)
    {
        new CustomExceptionHandler("ReportsPDF - ExportToPDF - Template : Not Supported Exception :", notsupex).WriteLog(HttpContext.Current.User.Identity.Name);
    }
    #endregion
}

Regards,

Dinesh

codewarior · October 24, 2013, 2:48am

Hi, Luke,

Thanks for using our products.

I have tested the scenario and I am able to reproduce the same problem. I have logged it in our issue tracking system as PDFNEWNET-35954 so that it can be corrected. We will investigate this issue in detail and will keep you updated on the status of a correction.

We apologize for the inconvenience.

tilal.ahmad · June 1, 2014, 11:54pm

Hi Luke,

Thanks for your patience. We have rechecked the issue and found that it is not a bug. Please note that in HTML the special character < is treated as a markup symbol. So, to get the sample working as intended, it is necessary to replace it with the HTML escape sequence '&lt;'.

So the relevant line of the sample code must look like this:

Aspose.Pdf.Generator.Text text2 = new Aspose.Pdf.Generator.Text("-&lt;$2\nmillion Referrals");

In that case the code works as intended (please refer to the attached document, from_corrected_source_HTML.pdf).
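
Since the HTML in this scenario is built at run time from database values, one possible way to apply the same escaping is to encode each value at the moment it is merged into the markup, rather than parsing the finished HTML afterwards. The following is only a minimal sketch under that assumption: the class HtmlEscapeSketch and the helper EncodeValue are hypothetical names used for illustration and are not part of Aspose.Pdf. The only library call used is System.Net.WebUtility.HtmlEncode from the .NET Framework.

```csharp
using System;
using System.Net;

public static class HtmlEscapeSketch
{
    // Hypothetical helper (not an Aspose API): HTML-encodes a single raw
    // database value so that characters such as '<' are emitted as entities
    // (for example "&lt;") instead of being read as the start of a tag.
    public static string EncodeValue(string rawValue)
    {
        // Treat a missing value as an empty string before encoding.
        return WebUtility.HtmlEncode(rawValue ?? string.Empty);
    }

    public static void Main()
    {
        // The value from the original report, as it might come from the database.
        string raw = "-<$2 million Referrals";

        // Only the data is encoded; the surrounding template markup is left as is.
        string html = "<p>" + EncodeValue(raw) + "</p>";

        Console.WriteLine(html); // prints: <p>-&lt;$2 million Referrals</p>
    }
}
```

Encoding is a single pass over each value, so it should add little overhead compared with parsing the whole generated document, although that would need to be measured in your own application.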

Please feel free to contact us for any further assistance.

Best Regards,

aspose.notifier · July 9, 2014, 8:00am

The issues you have found earlier (filed as PDFNEWNET-35954) have been fixed in Aspose.Pdf for .NET 9.4.0.

This message was posted using Notification2Forum from Downloads module by Aspose Notifier.

tilal.ahmad · August 7, 2014, 12:21am

Hi Luke,

In addition to the above reply, please note that we have enhanced our DOM approach to HTML to PDF conversion, and it now supports the above HTML scenario. The new HTML engine mimics the way a browser reads HTML syntax. Hopefully, using the new approach will resolve the issue.

Best Regards,