Missing Unicode characters in the first TOC entry (DOCX to EPUB)

Caris · October 26, 2011, 5:42am

Hello!

We are using Aspose.Words for .NET to convert several hundred documents from DOCX to EPUB format, and we have encountered a strange bug. The first TOC entry is missing all special Unicode characters; in our case these are the accented Hungarian letters (e.g. ő, ű, ó, á). It happens with every converted document, and only with the first TOC entry. The rest of the entries are perfect. Are we missing something, or is it a bug?

Thanks,

awais.hafeez · October 26, 2011, 7:37am

Hi Szabolcs,

Thanks for your request. Could you please attach your input (DOCX) and output (EPUB) documents here for testing? I will investigate the issue on my side and provide you with more information.

Best Regards,

Caris · October 26, 2011, 8:03am

Hello,

Thanks for your quick answer. I have attached a DOCX file and the converted EPUB (zipped, because the .epub extension is not allowed for uploads) that shows the problem described above.

As you can see, the first TOC entry should be “Ecsedy Ildikó bibliográfiája”, based on the DOCX document title, but in the EPUB it is changed to “Ecsedy Ildik bibliogrfija”. The accented Hungarian characters (ó and á in this case) are missing.

Thanks,

Szabolcs

awais.hafeez · October 26, 2011, 9:53am

Hi,

Thanks for providing the additional information.

After investigating your DOCX and EPUB documents, I am afraid I cannot find any problems with the rendering of Hungarian characters after conversion to EPUB on my side. I have attached a screenshot for illustration; please see the attached file.

I think the issue may occur only with your current EPUB reader software. If so, could you please try a different reader? For reference, I viewed your EPUB file with the EPUBReader 1.4.1.0 add-on for Firefox.

Please let us know if you need more information. We are always glad to help you.

Best Regards,

Caris · October 27, 2011, 2:59am

Hello!

Thank you for your answer. Maybe I wasn’t perfectly clear before: only the top level of the TOC, which is based on the Title of the Word document, is missing Unicode characters. I have attached a picture to demonstrate the problem. As you can see, the title in Word is correct, but after conversion it loses the Unicode characters. You can see this in the converted document if you examine the NCX file inside it.
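One way to check the NCX file without an EPUB reader is a small script. The following is a minimal Python sketch, assuming the EPUB contains a standard EPUB 2 NCX file using the namespace http://www.daisy.org/z3986/2005/ncx/; the function name `ncx_toc_labels` is hypothetical and not part of Aspose.Words. It opens the EPUB as a ZIP archive, reads the first `.ncx` entry, and returns the `navLabel` text of every `navPoint` in document order, so the first entry can be compared with the expected title.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by EPUB 2 NCX files (assumption: the EPUB follows EPUB 2 / NCX).
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def ncx_toc_labels(epub_file):
    """Return the navLabel texts of all navPoints in the EPUB's NCX file.

    epub_file may be a path or a binary file-like object. Hypothetical helper,
    for inspecting converted files only.
    """
    with zipfile.ZipFile(epub_file) as epub:
        # The NCX file name varies between producers, so look it up by extension.
        ncx_names = [n for n in epub.namelist() if n.lower().endswith(".ncx")]
        if not ncx_names:
            raise ValueError("no .ncx file found in the EPUB")
        # Parse the raw bytes so the XML declaration decides the encoding.
        root = ET.fromstring(epub.read(ncx_names[0]))
    # Each navPoint carries its visible TOC title in navLabel/text;
    # nested navPoints are returned after their parent (document order).
    path = ".//ncx:navPoint/ncx:navLabel/ncx:text"
    return [el.text or "" for el in root.iterfind(path, NCX_NS)]
```

For example, with a hypothetical file name, `ncx_toc_labels("output.epub")[0] == "Ecsedy Ildikó bibliográfiája"` would be `False` for an EPUB affected by this issue, where the first label reads “Ecsedy Ildik bibliogrfija”.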

Thanks,

Szabolcs

awais.hafeez · October 27, 2011, 4:39am

Hi,

Thanks for reporting this issue to us and providing the additional information. I managed to reproduce this issue on my side. I logged this problem in our Issue Tracking System and your request has also been linked to this issue. You will be notified as soon as it is resolved.

If we can help with anything else, please feel free to ask.

Best Regards,

aspose.notifier · August 9, 2015, 3:10am

The issues you have found earlier (filed as WORDSNET-5425) have been fixed in this .NET update and this Java update.
