Classic ASP(!) Extended character set

We’re using 3.3.5.0 (from back in 2007, I think) and I’m having trouble including extended characters in a PDF. We’re creating the PDF from classic ASP, first generating an XML document and then passing it to the BindXML_2 method, but it throws an “Invalid character in given encoding” error.

I’ve tried XML-encoding characters such as é, but then the encoded form just appears literally in the final document.

Is this even possible? If so, how?

Hi Felbrigg,


Thanks for contacting support.

Could you please share the source XML so that we can test the conversion on our end? We are sorry for the inconvenience.

Hi, Here’s the part of the XML that’s causing the issue. It’s the special characters in the first CDATA section. I’ve attached the xml in a file to this post. I hope you can see it.

Hi Felbrigg,


Thanks for sharing the resource file.

I have tested the scenario using the following code and, as per my observations, a System.Xml.XmlException is generated. So that it can be corrected, I have logged it in our issue tracking system as PDFNEWNET-37670. We will investigate this issue in detail and keep you updated on the status of a correction.

We apologize for the inconvenience.


[C#]

Pdf pdf = new Pdf();

pdf.BindXML("c:/pdftest/sampleXML.txt", null);

pdf.Save("c:/pdftest/sampleXML.pdf");

Hi Felbrigg,


Thanks for your patience.

We have further investigated the issue reported earlier and, as per our observations, it does not seem to be an issue with the Aspose.Pdf for .NET API. The problem appears to be in the source XML file, which also generates an error when opened in any other XML viewer.

The reason is that, although the document’s XML header references “utf-8” encoding, the source file is actually in some other encoding. When we opened the source file in a text editor and saved it as UTF-8, the saved file and the original file were different (please look at sampleXML_corrected.xml and the attached screenshot, which illustrates the difference).
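A quick way to check for this kind of mismatch is to decode the raw bytes of the file with a strict UTF-8 decoder. The following is only a minimal sketch, not part of Aspose.Pdf: the class and method names (EncodingCheck, IsValidUtf8) are made up for illustration, and it only tells you whether the bytes are valid UTF-8, not which encoding the file is really in.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of any Aspose API.
public static class EncodingCheck
{
    // Returns true when the given bytes decode as strict UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw on malformed
        // sequences instead of silently substituting U+FFFD.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            // At least one byte sequence is not valid UTF-8.
            return false;
        }
    }
}
```

It could be called as EncodingCheck.IsValidUtf8(System.IO.File.ReadAllBytes(path)), where path points at the XML file in question. A file that declares utf-8 in its header but contains a lone 0xE9 byte for é (as a single-byte encoding would write it) fails this check.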

I have also attached an example file that really uses UTF-8 encoding and contains the language-specific character (é) you originally asked about in this forum thread (see attached 36760.xml), along with the output PDF (36760.pdf), which is successfully generated from 36760.xml with the supplied snippet. The resulting PDF is fine (it contains the same character, é), which confirms our understanding: when the source file is in the correct encoding (that is, the encoding defined in the header), the PDF document is generated properly.

So the cause of your problem is that your application generates the XML file in the wrong encoding (not the utf-8 encoding referenced in the XML header).
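One possible way to bring the file in line with its header is to re-encode the bytes as UTF-8 before binding. The sketch below assumes the file was actually written in ISO-8859-1 (Latin-1); that is only a guess, since the real encoding of your file has not been identified here, and the names Reencode and Latin1ToUtf8 are hypothetical. The alternative is to leave the bytes alone and change the encoding named in the XML header to match them.

```csharp
using System.Text;

// Hypothetical helper for illustration only; not part of any Aspose API.
public static class Reencode
{
    // Converts bytes assumed to be ISO-8859-1 (Latin-1) into UTF-8 bytes.
    // ASSUMPTION: the source really is Latin-1; substitute the encoding
    // your application actually writes.
    public static byte[] Latin1ToUtf8(byte[] latin1Bytes)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Encoding utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

        // Encoding.Convert decodes with the source encoding and
        // re-encodes with the destination encoding (no BOM is added).
        return Encoding.Convert(latin1, utf8, latin1Bytes);
    }
}
```

For example, the single Latin-1 byte 0xE9 (é) becomes the two UTF-8 bytes 0xC3 0xA9, which is what a parser expects when the header says utf-8.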