Add Accented Characters in PDF Documents in C# using Aspose.PDF for .NET

Hi,


When I try to generate a PDF with the word “Książęce” in it, it comes out with spaces in the place of certain characters.

See a complete, reproducible code listing:
class Program
{
    static void Main(string[] args)
    {
        string testStr = "Książęce";
        System.IO.File.WriteAllBytes("TestFile.pdf", SavePdf(testStr));
    }

    /// <summary>
    /// Saves the PDF.
    /// </summary>
    /// <param name="inputHtml">The input HTML.</param>
    /// <returns>A byte array representing the generated PDF</returns>
    public static byte[] SavePdf(string inputHtml)
    {
        Pdf pdf = new Pdf();
        Section section = pdf.Sections.Add();

        section.PageInfo.Margin.Top = 5;
        section.PageInfo.Margin.Left = 5;
        section.PageInfo.Margin.Bottom = 5;
        section.PageInfo.Margin.Right = 5;
        section.PageInfo.PageHeight = Aspose.Pdf.Generator.PageSize.A4Height;
        section.PageInfo.PageWidth = Aspose.Pdf.Generator.PageSize.A4Width;

        Text text = new Text(section, inputHtml);
        text.IsHtmlTagSupported = true;
        section.Paragraphs.Add(text);
        text.IsFitToPage = true;

        byte[] pdfBytes;
        using (MemoryStream s = new MemoryStream())
        {
            pdf.Save(s);
            pdfBytes = s.ToArray();
        }

        return pdfBytes;
    }
}

I have attached a generated PDF to this post.

This was reproduced using Aspose.Pdf.dll 8.4.0.0

Many Thanks,
James

Hi James,

Thanks for your inquiry. We have observed the issue you reported in the Aspose.Pdf.Generator namespace. However, that is the old generator, and it is now obsolete. We are making changes and improvements in the new generator, Aspose.Pdf (the DOM approach), which is more efficient. Please use the DOM approach instead; the following code snippet should help you fix the issue.

public static byte[] SavePdf(string inputHtml)
{
    // Create a new, empty document
    Document pdfDocument = new Document();

    // Add a page to the document
    Page pdfPage = pdfDocument.Pages.Add();

    //Create text fragment
    TextFragment textFragment = new TextFragment(inputHtml);
    textFragment.Position = new Position(100, 600);

    //Set text properties
    textFragment.TextState.FontSize = 12;
    textFragment.TextState.Font = FontRepository.FindFont("TimesNewRoman");
    textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.LightGray);
    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Red);

    //Create TextBuilder object
    TextBuilder textBuilder = new TextBuilder(pdfPage);

    //Append the text fragment to the PDF page
    textBuilder.AppendText(textFragment);

    byte[] pdfBytes;
    using (MemoryStream s = new MemoryStream())
    {
        pdfDocument.Save(s);
        pdfBytes = s.ToArray();
    }

    return pdfBytes;
}

Please feel free to contact us for any further assistance.

Best Regards,

Hi Tilal,


As you can probably tell from my method signature, we convert a block of HTML to a PDF. Is there a better supported method of doing this than in my example? I’m not sure if the DOM approach works for HTML to PDF.

Thanks,
James

Hi James,

I am pleased to share that the new Document Object Model (DOM) of the Aspose.Pdf namespace supports converting HTML to PDF. However, for conversion, the HTML should be supplied as a file. Please try the following code snippet to accomplish this requirement.

[C#]

HtmlLoadOptions options = new HtmlLoadOptions();
// use the new conversion engine
options.UseNewConversionEngine = true;
// load HTML file
Document pdfDocument = new Document(myDir + "Sample.html", options);
// save output as PDF format
pdfDocument.Save(myDir + "HTMLtoPDF_DOM.pdf");

This is good to hear.


Does the Document class support a MemoryStream instance? We use this as part of a long-running operation, with the conversion occurring in memory. I have attached a screenshot of the exception I get when I try to use a MemoryStream.

Hi James,

I have tested the scenario with Aspose.Pdf for .NET 9.1.0 using the following code snippet, and I was unable to reproduce any issue. Please try it, and if you still face any issue, feel free to contact us.

C#

using (FileStream fileStream = File.OpenRead("c:/pdftest/Rectangle-fb2.html")) {
    MemoryStream memStream = new MemoryStream();
    memStream.SetLength(fileStream.Length);
    fileStream.Read(memStream.GetBuffer(), 0, (int)fileStream.Length);

    Document doc = new Document(memStream, new HtmlLoadOptions());
    doc.Save("c:/pdftest/HTMLCOnversionDirect.pdf");
}
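
Editorial note: the snippet above fills the MemoryStream with a single Read call. Stream.Read is allowed to return fewer bytes than requested, so a more defensive way to load a file into memory is Stream.CopyTo, which loops until the end of the file. The following is a minimal sketch using only the .NET base class library; the names StreamHelper and LoadIntoMemory are hypothetical and not part of Aspose.Pdf.

```csharp
using System;
using System.IO;

public static class StreamHelper
{
    // Hypothetical helper: copies the whole file into memory and rewinds
    // the stream so that a consumer starts reading from the first byte.
    public static MemoryStream LoadIntoMemory(string path)
    {
        var memStream = new MemoryStream();
        using (FileStream fileStream = File.OpenRead(path))
        {
            // CopyTo keeps reading until the source reports end of file.
            fileStream.CopyTo(memStream);
        }

        // CopyTo leaves the position at the end; reset it before handing
        // the stream to a reader.
        memStream.Position = 0;
        return memStream;
    }
}
```

The returned stream could then be passed to new Document(memStream, new HtmlLoadOptions()) in the same way as in the snippet above.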

Hi Nayyer,


I have tested the code below with Aspose.Pdf for .NET 9.1.0, and the generated PDF file contains ??? in place of the accented characters. See the attached PDF file.

class Program
{
    static void Main(string[] args)
    {
        string testStr = "Książęce";
        Aspose.Pdf.License lic = new License();
        lic.SetLicense("Aspose.Total.lic");
        System.IO.File.WriteAllBytes("TestFile.pdf", SavePdfEx(testStr));
    }

    public static byte[] SavePdfEx(string inputHtml)
    {
        byte[] inputBytes = Encoding.ASCII.GetBytes(inputHtml);
        HtmlLoadOptions options = new HtmlLoadOptions();

        MemoryStream inputStream = new MemoryStream(inputBytes);
        MemoryStream outputStream = new MemoryStream();

        Document pdf = new Document(inputStream, options);
        pdf.Save(outputStream);
        return outputStream.ToArray();
    }
}

Thanks,

Hi James,

I have tested the scenario and I am able to reproduce the same problem. I have logged it in our issue tracking system as PDFNEWNET-36815 so that it can be corrected. We will investigate this issue in detail and keep you updated on the status of the fix.

We apologize for the inconvenience.

Hi James,

Thanks for your patience. We have investigated the issue further: accented characters require a Unicode encoding, such as UTF-8, when the HTML string is converted to bytes. Please check the following code snippet; hopefully it will help you accomplish the task.

string testStr = "Książęce";

byte[] inputBytes = Encoding.UTF8.GetBytes(testStr);
HtmlLoadOptions options = new HtmlLoadOptions();

MemoryStream inputStream = new MemoryStream(inputBytes);

Document pdf = new Document(inputStream, options);

pdf.Save("c:/pdftest/Accented_characters.pdf");
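
Editorial note: the difference from the earlier SavePdfEx listing is the encoding used to turn the string into bytes. Encoding.ASCII cannot represent ą, ż or ę and substitutes a question mark for each of them, which matches the ??? seen in the generated PDF. The sketch below shows this with the .NET base class library only, without Aspose.Pdf; the names EncodingDemo and RoundTrip are hypothetical and chosen for illustration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: encodes the text to bytes and decodes it back
    // with the same encoding, showing what a reader of those bytes sees.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        const string testStr = "Książęce";

        // ASCII has no code for ą, ż or ę, so each one is replaced by '?'.
        string ascii = RoundTrip(testStr, Encoding.ASCII);
        Console.WriteLine(ascii == testStr
            ? "ASCII preserved the text"
            : "ASCII lost characters: " + ascii);

        // UTF-8 can encode every Unicode character, so nothing is lost.
        string utf8 = RoundTrip(testStr, Encoding.UTF8);
        Console.WriteLine(utf8 == testStr
            ? "UTF-8 preserved the text"
            : "UTF-8 lost characters");
    }
}
```

With Encoding.ASCII the round trip yields Ksi???ce, while Encoding.UTF8 returns the original string unchanged.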

Please feel free to contact us for any further assistance.

Best Regards,

The issues you have found earlier (filed as PDFNEWNET-36815) have been fixed in Aspose.Pdf for .NET 9.4.0.

