Free Support Forum - aspose.com

Character encoding for converted documents?

I am using the PHP SDK to convert various document types to HTML. I just noticed after converting one particular Word document that some bullet characters were returned with bad UTF-8 character values. Is there something that I need to be doing or is there an issue with something in the conversion?


Submitted from: http://saaspose.com/blog/saaspose-words/archive/2012/07/02/convert-document-to-other-file-formats-and-images-using-saaspose-words-rest-api.html

Hi Steve,

Can you please share the input Word document you're having a problem with? We'll check it at our end and guide you accordingly. We're sorry for the inconvenience caused by this issue.

Thanks & Regards
Shahzad Latif
Saaspose Support Team.

Hi Shahzad,

I've attached a zip archive containing the original test Word document and an HTML version we got back after calling the following from a PHP script.

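// Convert a local Word document to HTML using the PHP SDK.
$doc = new WordDocument('');
$path = '.../Basic Styles 2011.docx';             // input Word document
$outputFilename = 'Basic Styles 2011.docx.html';  // name for the converted HTML file
$result = $doc->ConvertLocalFile($path, $outputFilename, 'html');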

I found that three types of bullet characters were encoded as

EF 82 A7 = U+F0A7 (bullet)
EF 82 B7 = U+F0B7 (bullet)
EF 82 BC = U+F0BC (checkmark)

which are all in the Unicode Private Use Area (U+E000 to U+F8FF). I was expecting the bullets to be U+2022 and the checkmark to be U+2713, although a checkmark does seem to be displayed.
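For anyone who wants to check byte sequences like these themselves, here is a minimal sketch in plain PHP. It decodes a single UTF-8 sequence written as hex bytes and reports whether the resulting code point falls in the Private Use Area. The helper names hexBytesToCodePoint and isPrivateUse are hypothetical and are not part of the Saaspose SDK. The sketch does not reject overlong encodings.

```php
<?php
// Sketch: decode one UTF-8 sequence, written as space-separated hex bytes,
// into its Unicode code point. Helper names are illustrative only.
function hexBytesToCodePoint(string $hex): int
{
    $bytes = array_map('hexdec', preg_split('/\s+/', trim($hex)));
    $first = $bytes[0];

    // The lead byte determines how many continuation bytes must follow.
    if ($first < 0x80) {
        $cp = $first;
        $extra = 0;
    } elseif (($first & 0xE0) === 0xC0) {
        $cp = $first & 0x1F;
        $extra = 1;
    } elseif (($first & 0xF0) === 0xE0) {
        $cp = $first & 0x0F;
        $extra = 2;
    } elseif (($first & 0xF8) === 0xF0) {
        $cp = $first & 0x07;
        $extra = 3;
    } else {
        throw new InvalidArgumentException("Invalid UTF-8 lead byte in: $hex");
    }

    if (count($bytes) !== $extra + 1) {
        throw new InvalidArgumentException("Unexpected sequence length in: $hex");
    }

    // Each continuation byte (10xxxxxx) contributes its low six bits.
    for ($i = 1; $i <= $extra; $i++) {
        if (($bytes[$i] & 0xC0) !== 0x80) {
            throw new InvalidArgumentException("Invalid continuation byte in: $hex");
        }
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// True for code points in the Basic Multilingual Plane Private Use Area.
function isPrivateUse(int $cp): bool
{
    return $cp >= 0xE000 && $cp <= 0xF8FF;
}

foreach (['EF 82 A7', 'EF 82 B7', 'EF 82 BC', 'E2 9C 93'] as $hex) {
    $cp = hexBytesToCodePoint($hex);
    printf("%s = U+%04X (%s)\n", $hex, $cp, isPrivateUse($cp) ? 'private use' : 'not private use');
}
```

The first three sequences decode to U+F0A7, U+F0B7 and U+F0BC, all inside the private use range, while E2 9C 93 decodes to U+2713, which is outside it.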

Steve

Hi Steve,

Thank you for the details. I've attached a sample HTML file for your reference. Please share your feedback on it; judging from this output, I believe only the checkmark character encoding still needs to be fixed. The checkmark encoding problem has been logged in our issue tracking system as SAASCELLS-60.

Best Regards,
Imran Rafique
Support Developer, Saaspose Sialkot Team
http://www.saaspose.com

Hi Steve,

Please check the attachments in this post.

Best Regards,
Imran Rafique
Support Developer, Saaspose Sialkot Team
http://www.saaspose.com

Hi Imran,

I looked at output.html and the one item that is still not displaying
correctly is the second item labeled "First item is plain" that should have
a checkmark next to it. The checkmark character is coming through as EF 82
BC = U+F0BC when I believe it should be E2 9C 93 = U+2713. (I'm referencing
http://www.fileformat.info/info/unicode/char/2713/index.htm.)

And would it be possible for you to run a conversion on the original Word
document I sent to you so that I could compare it with what I was seeing
before?

Thank you for taking the time to look into this.

Steve

Hi Steve,

Thank you for the details. I have converted the original Word document to HTML so that you can compare the two results.
Please find the zip file attached.

Best Regards,
Imran Rafique
Support Developer, Saaspose Sialkot Team
http://www.saaspose.com

Hi Imran,

Overall it looks good except for the checkmark issue I mentioned before. Is it possible to change that character sequence to use the "official" Unicode checkmark character, or are there other reasons to preserve the character that I don't understand?

Thanks,
Steve

Hi Steve,

Thank you for your inquiry. Our development team is currently analyzing the checkmark issue. As soon as we have made significant progress, we will update you on the status of the fix and let you know once it is resolved.

Best Regards,
Imran Rafique
Support Developer, Saaspose Sialkot Team
http://www.saaspose.com

Hi Steve,

This is to update you that our development team has finished analysing your issue and concluded that the behaviour you're observing is actually not a bug. To solve it, you can change the bullet font to "Arial Unicode MS" or "Segoe UI Symbol" and use the U+2713 symbol as the bullet. If we can help you with anything else, please feel free to ask.
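If editing the source document is not practical, another option might be to post-process the returned HTML. The following is only a rough sketch, not a feature of the SDK: the function name replaceSymbolFontChars is hypothetical, and the mapping is an assumption based solely on the three code points and the expected replacements reported earlier in this thread. Other documents and symbol fonts may need a different mapping, and the HTML may still request the original symbol font for those characters.

```php
<?php
// Sketch: swap the private-use characters reported in this thread for
// standard Unicode characters. The mapping is an assumption, not an SDK rule.
function replaceSymbolFontChars(string $html): string
{
    $map = [
        "\u{F0A7}" => "\u{2022}", // private-use bullet    -> U+2022 BULLET
        "\u{F0B7}" => "\u{2022}", // private-use bullet    -> U+2022 BULLET
        "\u{F0BC}" => "\u{2713}", // private-use checkmark -> U+2713 CHECK MARK
    ];
    // Replacing whole UTF-8 sequences byte-for-byte is safe because
    // UTF-8 is self-synchronizing.
    return strtr($html, $map);
}

$html = "<li>\u{F0BC} First item is plain</li>";
echo bin2hex(replaceSymbolFontChars("\u{F0BC}")), "\n"; // hex of the replacement bytes
echo replaceSymbolFontChars($html), "\n";
```

Applied to the HTML file saved by the conversion call, this would turn the EF 82 BC sequence into E2 9C 93 and leave all other text untouched.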

Best Regards,
Imran Rafique
Support Developer, Saaspose Sialkot Team
http://www.saaspose.com

Hi Steve,

Was your issue resolved by the last suggestion from our support team? Do you need any further help from our side? If you have any questions or need assistance, please feel free to let us know and we'll be glad to help you.

Thanks & Regards
Shahzad Latif
Saaspose Support Team.