We are using Aspose 3.x to create MS Word documents from MS Word templates in the 2007 format. After successfully creating a document, we use the document convert method to generate a text stream and send the text contents through Solaris sendmail. We have letters in English and Spanish, and the necessary locales for both languages are installed on Solaris. However, in the text stream generated from the Aspose Word document by the convert method, we see “?” characters in both the English and Spanish documents. Examples are below:
Spanish:
Tambi?n puede ver el saldo de su cuenta en nuestro sitio Web, siguiendo estos pasos:
English:
.?Unfortunately, we are unable to accept credit and/or debit card payments online at this time.?
?
Some of the Spanish characters, as well as blank lines, are converted to the “?” sign. Has anyone experienced this issue, and are there any suggestions or a recommended solution?
Hi
Thanks for your inquiry. Could you please attach sample documents here and provide sample code? I will try to reproduce the problem on my side and provide you with more information.
Best regards.
Attached is the sample file. Below is code we use:
public void setDocumentText(String docDir, String docName, String docExtension, Boolean isSpanish)
{
    try
    {
        docName = docName + docExtension;
        Document doc = openDocument(docDir, docName);
        DocumentBuilder builder = new DocumentBuilder(doc);
        Font font = builder.getFont();
        if (isSpanish)
            font.setLocaleId(21514);
        else
            font.setLocaleId(1033);
        docText = builder.getDocument().toTxt();
        // docText = doc.toTxt();
        logger.info("Document Text: " + docText);
    }
    catch (FileNotFoundException e)
    {
        status = ERROR;
    }
    catch (Exception e)
    {
        status = ERROR;
    }
}
Hi
Thank you for the additional information. I used the following code for testing, and the returned string was displayed correctly:
Document doc = new Document("C:\\Temp\\Sample_Spanish_Document.doc");
System.out.println(doc.toTxt());
Best regards.
In which environment did you run your code? Please note that if we run this on a Windows server, it works fine. We have issues only on Solaris.
Do we need specific language and locale environment variables set for the session in which we are running Aspose?
Hi
Thank you for the additional information. I ran this code on Windows. I will consult with our lead Java developer regarding your issue and provide you with more information.
Best regards.
Hi,
Have you tried using System.out.println() instead of the logger? The Solaris logger is probably configured to use a 7-bit encoding or something similar.
The thing is that Document.toTxt() doesn’t use any encoding options: it just returns a Java string (which is Unicode UTF-16 and can hold text in any language). So it is the developer's responsibility to encode the string properly when saving it to a file or stream.
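To illustrate the point about encoding, here is a minimal sketch that uses only the JDK (no Aspose calls) and assumes Java 7 or later for StandardCharsets. The class name EncodingDemo and the helper writeWithCharset are made up for this example. It writes the same Java string to a stream twice: a 7-bit charset replaces “é” with “?”, while UTF-8 preserves it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical helper: writes text to an in-memory stream using an
    // explicitly chosen charset and returns the raw bytes that were written.
    static byte[] writeWithCharset(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by Document.toTxt().
        String text = "Tambi\u00e9n puede ver el saldo";

        // US-ASCII cannot represent U+00E9, so the encoder substitutes '?'.
        byte[] ascii = writeWithCharset(text, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));
        // prints "Tambi?n puede ver el saldo"

        // UTF-8 can represent every character, so the text round-trips intact.
        byte[] utf8 = writeWithCharset(text, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text));
        // prints "true"
    }
}
```

If the stream handed to sendmail or the logger is created without an explicit charset, the JVM default is used, which on some Solaris setups may be a 7-bit encoding.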
For convenience, you can use Document.save(".txt"), Document.save("filename", SaveFormat.TEXT), or Document.save( , SaveFormat.TEXT) instead of Document.toTxt(). In that case, you can control the encoding of the output by calling doc.getSaveOptions().setTxtExportEncoding(java.nio.charset.Charset).
You can also delete all of the DocumentBuilder-Font-setLocaleId() code: 1) these settings are MS Word specific and are not saved to txt; 2) these settings affect only new text added by DocumentBuilder.
Best Regards,
Any update on this? Did anyone try this on Solaris?
I tried changing encoding at the OS level, JBoss level and by setting Java properties. Still the same issue.
Hi,
Please read my previous message: changing the default encoding doesn’t help, since Document.toTxt() just produces a Java string and doesn’t use any encoding options.
Have you tried using System.out.println() or something else instead of your logger?
Can you look at the resulting string in a debugger?
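If a debugger is not convenient on the server, a rough alternative is to dump the string's code points using ASCII-only output, so the result cannot be damaged by the console or logger encoding. This is only a sketch: the class StringInspector and the method toCodePoints are hypothetical names, and the sample string stands in for the value returned by Document.toTxt().

```java
import java.nio.charset.Charset;

public class StringInspector {

    // Hypothetical helper: returns each code point of the text in U+XXXX
    // form, separated by spaces, so the output itself is pure ASCII.
    static String toCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shows what the JVM actually uses when no charset is given explicitly.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // Stand-in for the string returned by Document.toTxt().
        String docText = "Tambi\u00e9n";
        System.out.println(toCodePoints(docText));
        // prints "U+0054 U+0061 U+006D U+0062 U+0069 U+00E9 U+006E"
    }
}
```

If the dump shows U+00E9 where the log shows “?”, the string itself is intact and the character is lost later, when it is written out. If the dump shows U+003F, the “?” is already present in the string.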
Regards,