Free Support Forum - aspose.com

Missing Unicode characters from the first TOC entry (docx to epub)

Hello!

We are using Aspose.Words for .NET to convert several hundred documents from DOCX to EPUB format, and we have encountered a strange bug. The first TOC entry is missing all special Unicode characters; in our case these are the accented Hungarian letters (e.g. ő, ű, ó, á). It happens with every converted document and only with the first TOC entry; the rest of the entries are perfect. Are we missing something, or is it a bug?

Thanks,

Hi Szabolcs,

Thanks for your request. Could you please attach your input (DOCX) and output (EPUB) documents here for testing? I will investigate the issue on my side and provide you with more information.

Best Regards,

Hello,

Thanks for your quick answer. I have attached a DOCX and the converted EPUB (zipped, because the .epub extension is not allowed for uploads), which shows the problem described above.

As you can see, the first TOC entry should be “Ecsedy Ildikó bibliográfiája”, based on the DOCX document title, but in the EPUB it is changed to “Ecsedy Ildik bibliogrfija”. It is missing the accented Hungarian characters (ó and á in this case).
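To make the symptom concrete, here is a small Python sketch (not Aspose code; the helper names are made up for illustration) showing that the broken entry is exactly what you get when every non-ASCII character is dropped from the title. It only describes the observed output and does not claim to show what the converter does internally.

```python
import unicodedata

expected = "Ecsedy Ildikó bibliográfiája"  # title as it appears in the DOCX
observed = "Ecsedy Ildik bibliogrfija"     # first TOC entry in the EPUB


def strip_non_ascii(text):
    """Drop every character outside the ASCII range (hypothetical helper)."""
    # Normalize to NFC first so each accented letter is a single code point.
    composed = unicodedata.normalize("NFC", text)
    return "".join(ch for ch in composed if ord(ch) < 128)


def dropped_chars(text):
    """Return the distinct non-ASCII characters in text, sorted by code point."""
    composed = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in composed if ord(ch) >= 128})
```

Comparing `strip_non_ascii(expected)` with `observed` shows whether the two strings differ only by the missing accented letters, and `dropped_chars(expected)` lists which letters those are.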

Thanks,

Szabolcs

Hi,

Thanks for providing the additional information.

After investigating your DOCX and EPUB documents, I am afraid I cannot find any problems with the rendering of Hungarian characters upon conversion to EPUB on my side. I have attached a screenshot for illustration; please see the attached file.

I think the issue may occur only with your current EPUB reader software. If so, could you please try a different reader? For reference, I used the EPUBReader 1.4.1.0 add-on for Firefox to view your EPUB file.

Please let us know if you need more information. We are always glad to help you.

Best Regards,

Hello!

Thank you for your answer. Maybe I wasn’t perfectly clear before: only the top-level TOC entry, which is based on the title of the Word document, is missing the Unicode characters. I have attached a picture to demonstrate the problem. As you can see, the title in Word is correct, but after conversion it loses the Unicode characters. You can see this in the converted document if you examine the NCX file inside it.
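For anyone who wants to check the NCX without unzipping by hand, here is a minimal Python sketch using only the standard library. It assumes the EPUB contains a standard EPUB 2 NCX file (a `.ncx` entry in the DAISY NCX namespace); the function name `ncx_labels` and the file name in the usage note are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by EPUB 2 NCX files.
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def ncx_labels(epub_file):
    """Return (docTitle text, list of navPoint label texts) from an EPUB.

    epub_file may be a path or a binary file-like object. The first entry
    whose name ends in ".ncx" is taken to be the table of contents.
    """
    with zipfile.ZipFile(epub_file) as zf:
        ncx_name = next(n for n in zf.namelist() if n.lower().endswith(".ncx"))
        root = ET.fromstring(zf.read(ncx_name))

    doc_title = root.findtext(
        "ncx:docTitle/ncx:text", default="", namespaces=NCX_NS
    )
    labels = [
        el.text or ""
        for el in root.iterfind(".//ncx:navPoint/ncx:navLabel/ncx:text", NCX_NS)
    ]
    return doc_title, labels
```

Calling `ncx_labels("converted.epub")` and printing both results lets you compare the title-based entry with the other TOC labels and see which of them have lost their accented letters.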

Thanks,

Szabolcs

Hi,


Thanks for reporting this issue to us and providing the additional information. I managed to reproduce this issue on my side. I logged this problem in our Issue Tracking System and your request has also been linked to this issue. You will be notified as soon as it is resolved.

If we can help with anything else, please feel free to ask.

Best Regards,

The issues you have found earlier (filed as WORDSNET-5425) have been fixed in this .NET update and this Java update.

