Document.GetText does not return the correct symbol value using Java

Hi,
I am using Aspose.Words API for Java.
I recently noticed that some characters (possibly outside the UTF-8 range) are being changed automatically by the API. Let me explain the situation.

1. The document contains a special character used for bullets. Its integer value is 61553.
2. On reading the document through Aspose.Words, I see it displayed as the letter ‘q’ (value 113).
3. On encountering more such characters, I noticed that characters with a value greater than 61440 are returned as (value - 61440).
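
The arithmetic in the list above can be checked in plain Java. 61440 is 0xF000, and 61553 is 0xF071, which lies in the Unicode Private Use Area. The sketch below assumes (the thread does not confirm this) that the bullet comes from a symbol font such as Wingdings, which commonly stores its glyphs at 0xF000 plus the font's own character code. The class and method names are hypothetical, and no Aspose.Words API is involved.

```java
public class SymbolOffsetDemo {
    // 61440 in decimal; assumed base of the 0xF000-0xF0FF symbol range.
    static final int SYMBOL_BASE = 0xF000;

    // True if the code point lies in the 0xF000-0xF0FF range discussed above.
    static boolean isShiftedSymbol(int codePoint) {
        return codePoint >= SYMBOL_BASE && codePoint <= SYMBOL_BASE + 0xFF;
    }

    // Subtracts the base, mirroring what the observed output appears to do.
    // Code points outside the range are returned unchanged.
    static int stripBase(int codePoint) {
        return isShiftedSymbol(codePoint) ? codePoint - SYMBOL_BASE : codePoint;
    }

    public static void main(String[] args) {
        int original = 61553;
        int stripped = stripBase(original);
        System.out.println(Integer.toHexString(original));    // f071
        System.out.println(stripped + " " + (char) stripped); // 113 q
    }
}
```

If this assumption holds, the ‘q’ is the font-level character code of the bullet glyph, so the text only looks like a bullet when rendered in the original symbol font.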

What is the possible reason for this?
My machine uses UTF-8 encoding, but I want my program to be independent of the machine it runs on. Kindly provide an answer and I’ll change my code accordingly.
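
On the machine-independence point: a Java String holds UTF-16 code units regardless of the platform's default charset, and 61553 (U+F071) is a valid code point that UTF-8 can represent. A minimal sketch using only the standard library (the class and method names are hypothetical) that round-trips the character with an explicit charset instead of the platform default:

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encode with an explicit charset so the result does not depend on the machine.
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String bullet = new String(Character.toChars(61553));
        byte[] utf8 = encode(bullet);

        // Print the UTF-8 bytes in hex, then the code point after decoding.
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());        // EF 81 B1
        System.out.println(decode(utf8).codePointAt(0));  // 61553
    }
}
```

Since the value survives a UTF-8 round trip unchanged, the shift to 113 is unlikely to be caused by the machine's encoding.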

Thanks,
Divyansh

Hi Divyansh,

Thanks for your query. Could you please share your document so that we can investigate?


Contenido : Se refiere al conjunto de datos con su correspondiente descripción.

Referencial : Contienen referencias bibliográficas a los documentos que contienen los datos pero no contienen los datos.

This is the reference string. The document is too large to share, and I’m facing this problem for only these 2 lines. The unknown characters (which appear before the bold word) are shown differently when using para.toTxt().

Copy and paste this entire text into a document and call toTxt().
The first character appears as ‘q’, but its actual integer value is 61553.

Hi Divyansh,

Thanks for your query. I was able to reproduce the problem on my end and have logged it as WORDSNET-6517 in our issue tracking system. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

Hi Divyansh,

Thanks for your patience. WORDSNET-6517 has been fixed in the latest version, Aspose.Words for Java 17.5. Please upgrade to that version.