Cannot convert document to PDF in Chinese

Hi,

We are trying to convert text (in different languages) into PDF. It works fine for other languages, but it fails to generate the PDF correctly for Chinese.

Please find my code below:

public MemoryStream ConvertToPDF(string PdfContent)
{
    MemoryStream ms = null;
    License setLicense = new License();
    setLicense.SetLicense("Aspose.Total.lic");
    Document doc = new Document();
    DocumentBuilder docBuilder = new DocumentBuilder(doc);
    docBuilder.InsertHtml(PdfContent);
    ms = new MemoryStream();
    doc.Save(ms, SaveFormat.Pdf);
    ms.Position = 0;
    return ms;
}

Hi,

Thanks for your inquiry. Could you please attach the input document (or HTML string) that reproduces this problem here for testing? We will investigate the issue on our end and provide you with more information.

Best regards,

Thanks for your reply.

Please find the sample text below:

中國哲學書電子化計劃

Sample html string:

Sample -Letter

This agreement contains the terms and conditions upon which XXXX grants to your
company a limited license to use XXXX

中國哲學書電子化計劃

By clicking “I agree”, I,Saurabh, confirm that I have the authority to bind Vendorname1subsite to this agreement and thereby I confirm I have read and understood the terms and conditions of this Sample Letter and agree to the above terms and condition.

Agreement Version Number : V1
Name : Stephen Pie
Title : Test
Email : abc@yahoo.com
Date of Consent : 25 JUN 2014
Member Firm Legal Name : XXXX CA
Site Name : SS
Site URL : http://google.com/aaaa/ca-Apr28Testsite1/SS
Where the data will be hosted :
Client / Vendor Name : Vendorname1subsite

Hi Awaisv,

I can see discrepancies when converting Hebrew and Arabic text as well.

Please find the sample text below.

Chinese : 中國哲學書電子化計劃
Hebrew : אָנֹכִי מְצַוֶּה | אֶתְכֶם הַיּוֹם, לְאַהֲבָה אֶת יְיָ | אֱלֹֽהֵיכֶם, וּלְעָבְדוֹ
Arabic : جامعة الدول العربية هي
English: Sample Text

The output text in pdf is as below:

Chinese : ����������
Hebrew : ודבעלו ,םכיהלא | יי תא הבהאל ,םויה םכתא | הוצמ יכנא
Arabic : يه ةيبرعلا لودلا ةعماج
English: Sample Text

Please find the html string below while converting the text to pdf:

Company Agreement - Supplemental Engagement Letter

This agreement contains the terms and conditions upon which XXXX grants to your
company a limited license to use XXXX Central - External

Chinese : 中國哲學書電子化計劃

Hebrew : אָנֹכִי מְצַוֶּה | אֶתְכֶם הַיּוֹם, לְאַהֲבָה אֶת יְיָ | אֱלֹֽהֵיכֶם, וּלְעָבְדוֹ

Arabic : جامعة الدول العربية هي

English: Sample Text

By clicking “I agree”, I,Saurabh Saha, confirm that I have the authority to bind Vendorname1subsite to this agreement and thereby I confirm I have read and understood the terms and conditions of this Supplemental Engagement Letter and agree to the above terms and condition.

Agreement Version Number : V1
Name : Stephen Pie
Title : Test
Email : abc@yahoo.com
Date of Consent : 25 JUN 2014
Member Firm Legal Name : XXXX CA
Site Name : SS
Site URL : http://google.com/audit/ca-Apr28Testsite1/SS
Where the data will be hosted :
Client / Vendor Name : Vendorname1subsite

Hi,

Thanks for your inquiry. Please find attached a couple of HTML files and corresponding output PDF files here with this post. These PDFs were produced using Aspose.Words for .NET 14.5.0 using the following code snippet:

Document doc = new Document();
DocumentBuilder docBuilder = new DocumentBuilder(doc);
docBuilder.InsertHtml(File.ReadAllText(MyDir+ "html2.htm", Encoding.UTF8));
doc.Save(MyDir + @"out2.pdf");

Best regards,

Hi Awais,

Thanks for your reply.

Actually, we are NOT saving the HTML string to any file on the server. Instead, we fetch the HTML string from a rich text editor (in SharePoint) and convert the string to PDF using the function I posted earlier.

So in our case, we won’t be able to use the File.ReadAllText() method.

Would you please let me know how I can directly convert the input HTML string to Encoding.UTF8, pass it to docBuilder.InsertHtml(), and generate the PDF in my function?

Appreciate your kind help.
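
For what it is worth, a .NET string is already Unicode (UTF-16) in memory, so there is nothing to "convert to UTF-8" before handing it to a method that takes a string. An encoding only matters where text crosses a byte boundary (file, stream, HTTP body). The sketch below uses only the .NET base class library and does not call Aspose.Words at all; the class and method names are hypothetical, chosen for illustration. It shows that a string survives a UTF-8 round trip unchanged, which is all that writing a temp file and calling File.ReadAllText(path, Encoding.UTF8) would achieve.

```csharp
using System;
using System.IO;
using System.Text;

static class HtmlStringEncodingSketch
{
    // Hypothetical helper: encodes an in-memory HTML string as UTF-8 bytes,
    // for APIs that accept a Stream rather than a string.
    public static MemoryStream ToUtf8Stream(string html)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(html));
    }

    // Hypothetical helper: decodes the stream the same way
    // File.ReadAllText(path, Encoding.UTF8) would decode a file.
    public static string ReadUtf8(Stream stream)
    {
        using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // "\u4E2D\u570B" is the first two characters of the Chinese sample text;
        // escapes are used so the result does not depend on the source file encoding.
        string html = "<p>Chinese : \u4E2D\u570B</p>";

        string roundTripped = ReadUtf8(ToUtf8Stream(html));

        // The round trip is lossless, so the comparison prints "True".
        Console.WriteLine(roundTripped == html);
    }
}
```

If that holds, the string coming from the rich text editor can be passed straight to docBuilder.InsertHtml(PdfContent), as in the original ConvertToPDF function. Garbled output would then more likely come from an earlier decoding step or from the rendering side than from the missing file round trip.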

We are using Aspose version 10.3.0.0

We call the following function when the user clicks “Save PDF” in the UI, and the PDF is saved to the user’s desktop.

public void DownloadFile(string htmlText, string strAgreementTxtKey, bool isSel)
{
    try
    {
        string strHttpResCT = "application/pdf";
        string strHttpHdrCD = string.Empty;
        if (isSel)
            strHttpHdrCD = "attachment; filename=SELConsentDocument.pdf";
        else
            strHttpHdrCD = "attachment; filename=TOUConsentDocument.pdf";
        if (!string.IsNullOrEmpty(htmlText))
        {
            htmlText = XmlConvert.DecodeName(htmlText);
            using (MemoryStream ms = ConvertToPDF(htmlText))
            {
                if (ms != null)
                {
                    HttpContext.Current.Response.Buffer = true;
                    HttpContext.Current.Response.ContentType = strHttpResCT;
                    HttpContext.Current.Response.Clear();
                    HttpContext.Current.Response.AddHeader("Content-Disposition", strHttpHdrCD);
                    byte[] bytesInStream = ms.ToArray();
                    HttpContext.Current.Response.BinaryWrite(bytesInStream);
                }
            }
            HttpContext.Current.Response.Flush();
            // Response.End() aborts the current thread (ThreadAbortException), so the
            // Clear/ClearHeaders/ClearContent calls that used to follow it never ran.
            HttpContext.Current.Response.End();
        }
        else
        {
            //Log in event viewer that cache was empty 
        }
    }
    catch (Exception ex)
    {
        Logger.Write(new ExceptionLogEntry(ex.Message));
    }
}

Hi Awais,

Just for your information, I have tried the same code you shared with me, using Aspose version 10.3.0.0.

But it didn’t work for me. Is it related to the version of Aspose? I need your kind guidance.

public MemoryStream ConvertToPDF(string PdfContent)
{
    MemoryStream ms = null;
    try
    {
        SPSecurity.RunWithElevatedPrivileges(delegate ()
        {
            string strHtmFile = SPUtility.GetGenericSetupPath(@"Template\Layouts\") + "Mydir\\temp.htm";
            File.WriteAllText(strHtmFile, PdfContent);
            License setLicense = new License();
            setLicense.SetLicense("Aspose.Total.lic");
            Document doc = new Document();
            DocumentBuilder docBuilder = new DocumentBuilder(doc);
            docBuilder.InsertHtml(File.ReadAllText(strHtmFile, Encoding.UTF8));
            // docBuilder.InsertHtml(PdfContent);
            ms = new MemoryStream();
            doc.Save(ms, SaveFormat.Pdf);
            ms.Position = 0;
        });
    }
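    catch (Exception ex)
    {
        // Log the actual exception message (sErrMsg was never declared).
        Logger.Write(new ExceptionLogEntry(ex.Message));
    }
    // Return the stream; it stays null if the conversion failed.
    return ms;
}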

Hi,

Thanks for your inquiry. You are using a very old version of Aspose.Words. We encourage you to use the latest version, as it contains new features, enhancements, and fixes for previously reported issues. Please download the latest version, 14.5.0, from the following link:

https://downloads.aspose.com/words/net

I hope this helps.

Best regards,

We downloaded the Aspose.Words DLL (14.5.0.0) from the URL you shared with me and ran our modified code.

It worked fine with Chinese, Arabic and Hebrew. However, I can still see issues with the following languages:

  • Hindi
  • Armenian
  • Bengali
  • Georgian
  • Thai

Can you please guide me on this?

Please find the attached document with all our findings after using the latest version and the modified code.

Hi,

Thanks for your inquiry. I am afraid I was unable to reproduce this issue on my side when using Aspose.Words for .NET 14.5.0. Could you please create a standalone runnable console application that helps us reproduce your problem on our end and attach it here for testing? As soon as you get this simple application ready, we’ll start further investigation into your issue and provide you with more information.

Best regards,

Hi,

Please find the console application with the input and output file attached herewith.

I am using the Aspose.Words 14.5.0.0 (Aspose.Words.dll under net3.5_ClientProfile_AuthenticodeSigned folder) for this console app.

Please find the code below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Aspose.Words;

namespace AsposeDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            ConvertToPDF();
        }

        public static void ConvertToPDF()
        {
            MemoryStream ms = null;
            try
            {
                Console.WriteLine("Application Started..Please wait");
                License setLicense = new License();
                Document doc = new Document();
                string exe_path = System.Reflection.Assembly.GetEntryAssembly().Location;
                exe_path = exe_path.Substring(0, exe_path.LastIndexOf('\\') + 1);
                DocumentBuilder docBuilder = new DocumentBuilder(doc);
                docBuilder.InsertHtml(File.ReadAllText(exe_path + "pdfContent.htm", Encoding.UTF8));
                ms = new MemoryStream();
                doc.Save(ms, SaveFormat.Pdf);
                ms.Position = 0;
                Console.WriteLine("File getting created...");
                using (FileStream file = new FileStream("Output.pdf", FileMode.Create, FileAccess.Write))
                {
                    ms.WriteTo(file);
                }
                Console.WriteLine("File created successfully...");
                Console.ReadLine();
            }
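            catch (Exception ex)
            {
                // Report the failure instead of silently swallowing it.
                Console.WriteLine("Conversion failed: " + ex);
            }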
        }
    }
}

Hi,

Thanks for the additional information. I have generated a PDF file using your code and attached it here for your reference. This PDF was generated on a Windows 7 machine on my side. Could you please share the details of the development environment (e.g. OS, .NET Framework versions) of the machine on which you’re getting this problem? Does this problem also occur when saving to Word formats such as DOCX or DOC?

Best regards,
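
A small console snippet can collect the environment details asked for above directly on the affected machine. This is only a sketch using standard .NET properties; the class and method names are hypothetical. Note that Environment.Version reports the CLR version rather than the framework version, so a .NET 3.5 application shows a 2.0.x CLR.

```csharp
using System;
using System.Globalization;

static class EnvironmentReportSketch
{
    // Hypothetical helper: builds a short report of the OS version,
    // the CLR version, and the current culture of the running process.
    public static string Describe()
    {
        return "OS: " + Environment.OSVersion + Environment.NewLine
             + "CLR: " + Environment.Version + Environment.NewLine
             + "Culture: " + CultureInfo.CurrentCulture.Name;
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Running it on both the server and the desktop gives a side-by-side comparison that can be pasted into the thread.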

Hi,

Please find the server details below:

Windows Server® 2008 Standard
Version 6
.Net Framework 3.5

I have saved the file in docx format and the same result occurred.

However, as you mentioned, I ran the same code from my local desktop (Win 7 && .Net 4.0) and it worked there.

So it seems Aspose version 14.5.0.0 works properly only with a particular OS and .Net version?

Can you please guide me on the OS and .net version?

Saurabh

Hi Saurabh,

Thanks for the additional information. It seems the problem only occurs when opening the Word document with Microsoft Word 2010. In MS Word 2010, could you please select the Hindi text, apply the “Mangal” font, and see whether that resolves the problem? Likewise, select the Bengali text and format it with the “Vrinda” font, and format the Thai text with “Angsana New”. This problem does not occur when viewing the document with Microsoft Word 2013, so it does not seem to be an issue with Aspose.Words. If we can help you with anything else, please feel free to ask.

Best regards,

Thanks for your reply.

However, our requirement is to save the string in PDF format, NOT in docx format. Please refer to my earlier reply. So I would like to know which OS version and .Net Framework version are required to get the full functionality of Aspose 14.5.0.0, so that the PDF is generated with the proper text for each language.

Saurabh

Hi Saurabh,

Thanks for your inquiry. We were still unable to reproduce this issue on our end when using Aspose.Words for .NET 14.6.0. Please create a standalone runnable console application (with complete source code) that helps us reproduce your problem on our end and attach it here for testing. As soon as you get this simple application ready, we’ll start further investigation into your issue and provide you more information. Thanks for your cooperation.

Best regards,

Hi,

As I have already mentioned, I am facing this issue when running the application from my dev (and production) servers, which have the following configuration.

Server Details:
Windows Server® 2008 Standard Version 6
.Net Framework 3.5

I would like to get the following information from your end:

  • What is the minimum requirement of OS version to get the complete PDF using Aspose 14.5.0.0?
  • What is the minimum requirement of .Net Framework version to get the complete PDF using Aspose 14.5.0.0?

Using the above input I would upgrade my dev (and production) servers accordingly.

Saurabh

Hi Saurabh,

Thanks for your inquiry. Please make sure that the required fonts are all installed on your machines. Aspose.Words requires TrueType fonts when rendering documents to fixed-page formats (PDF, XPS or SWF):
https://docs.aspose.com/words/net/using-truetype-fonts/

I hope this helps. Please run the code from the following article and check whether it reports any missing fonts; if it does, please install them:
https://docs.aspose.com/words/net/manipulating-and-substitution-truetype-fonts

Moreover, Aspose.Words supports the environment you have mentioned. Please refer to the following article for more details:
https://docs.aspose.com/words/net/system-requirements/

Best regards,
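
As a quick cross-check that is independent of Aspose.Words, the fonts installed on the server can be listed with the standard System.Drawing API and compared against the fonts the affected scripts need. The sketch below is only an illustration: the class and method names are hypothetical, "Mangal", "Vrinda" and "Angsana New" are the fonts named earlier in this thread, and "Sylfaen" for Armenian and Georgian is an assumption about the usual Windows font for those scripts. The project needs a reference to System.Drawing.dll.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Text;

static class FontCheckSketch
{
    // Hypothetical helper: returns the required font names that do not
    // appear in the installed list (comparison is case-insensitive).
    public static List<string> FindMissing(IEnumerable<string> installed, IEnumerable<string> required)
    {
        HashSet<string> have = new HashSet<string>(installed, StringComparer.OrdinalIgnoreCase);
        List<string> missing = new List<string>();
        foreach (string name in required)
        {
            if (!have.Contains(name))
            {
                missing.Add(name);
            }
        }
        return missing;
    }

    static void Main()
    {
        // Collect the family names of all fonts installed on this machine.
        List<string> installed = new List<string>();
        using (InstalledFontCollection fonts = new InstalledFontCollection())
        {
            foreach (FontFamily family in fonts.Families)
            {
                installed.Add(family.Name);
            }
        }

        // Fonts assumed to cover Hindi, Bengali, Thai, Armenian and Georgian.
        string[] required = { "Mangal", "Vrinda", "Angsana New", "Sylfaen" };

        foreach (string name in FindMissing(installed, required))
        {
            Console.WriteLine("Missing font: " + name);
        }
    }
}
```

If the server reports fonts as missing that the Windows 7 desktop has, that would be consistent with the same code working on one machine and not on the other.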

Hi Awais,

Thanks for your reply. I did what you suggested, but in vain. However, I must insist that the problem is related to the culture and not to the fonts. All other languages are rendered correctly in the PDF except Thai, Bengali, Hindi, Armenian and Georgian. The output PDF is the same as the one I shared with you earlier.

Please advise.

Please find the updated console application:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

namespace AsposeDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            ConvertToPDF();
        }

        public static void ConvertToPDF()
        {
            MemoryStream ms = null;
            try
            {
                Console.WriteLine("Application Started..Please wait");
                License setLicense = new License();
                Document doc = new Document();
                HandleDocumentWarnings callback = new HandleDocumentWarnings();
                PdfSaveOptions saveOptions = new PdfSaveOptions();
                saveOptions.WarningCallback = callback;
                string exe_path = System.Reflection.Assembly.GetEntryAssembly().Location;
                exe_path = exe_path.Substring(0, exe_path.LastIndexOf('\\') + 1);
                DocumentBuilder docBuilder = new DocumentBuilder(doc);
                docBuilder.InsertHtml(File.ReadAllText(exe_path + "pdfContent.htm", Encoding.UTF8));
                ms = new MemoryStream();
                doc.Save(ms, saveOptions);
                ms.Position = 0;
                Console.WriteLine("File getting created...");
                using (FileStream file = new FileStream("Output.pdf", FileMode.Create, FileAccess.Write))
                {
                    ms.WriteTo(file);
                }
                Console.WriteLine("File created successfully...");
                Console.ReadLine();
            }
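            catch (Exception ex)
            {
                // Report the failure instead of silently swallowing it.
                Console.WriteLine("Conversion failed: " + ex);
            }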
        }
    }
}

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Aspose.Words;

namespace AsposeDemo
{
    class HandleDocumentWarnings : IWarningCallback
    {
        /// <summary>
        /// Our callback only needs to implement the "Warning" method. This method is called whenever there is a
        /// potential issue during document processing. The callback can be set to listen for warnings generated during document
        /// load and/or document save.
        /// </summary>
        public void Warning(WarningInfo info)
        {
            // We are only interested in fonts being substituted.
            if (info.WarningType == WarningType.FontSubstitution)
            {
                Console.WriteLine("Font substitution: " + info.Description);
            }
        }
    }
}