When I try to convert a Shift_JIS-encoded text file to PDF, the Japanese characters are displayed as garbled characters. Even if I set the encoding, I get the same result. The file was created with Sakura Editor.
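import java.nio.charset.Charset;
import com.aspose.words.*;
// decompressInput (InputStream), outputStream, fontSettings and WordsFontWarning
// (an IWarningCallback implementation) are defined elsewhere in the application.
com.aspose.words.LoadOptions options = new com.aspose.words.LoadOptions();
options.setLoadFormat(com.aspose.words.LoadFormat.TEXT);
options.setEncoding(Charset.forName("Shift_JIS"));
// Pass the load options to the constructor; otherwise the encoding set above is never used.
Document document = new Document(decompressInput, options);
WordsFontWarning warning = new WordsFontWarning();
document.setWarningCallback(warning);
document.setFontSettings(fontSettings);
document.save(outputStream, SaveFormat.PDF);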
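Garbled output of this kind is what you get when Shift_JIS bytes are decoded with a different charset. The following plain-JDK sketch (no Aspose.Words involved) encodes the katakana テスト (written as `\u30C6\u30B9\u30C8` so the source stays ASCII) in Shift_JIS and decodes it twice. The class name `MojibakeDemo`, the helper `decodeAs`, and the choice of windows-1252 as the "wrong" charset are illustrative assumptions.

```java
import java.nio.charset.Charset;

public class MojibakeDemo {
    static final Charset SHIFT_JIS = Charset.forName("Shift_JIS");
    // Assumption: windows-1252 stands in for a Western single-byte default encoding.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decodes raw file bytes with the given charset. new String(byte[], Charset)
    // never throws: invalid sequences are silently replaced with U+FFFD.
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Katakana "te su to", written as Unicode escapes.
        String original = "\u30C6\u30B9\u30C8";
        // Stand-in for the bytes an editor writes when saving the file as Shift_JIS.
        byte[] fileBytes = original.getBytes(SHIFT_JIS);

        // Right charset: the text round-trips.
        String right = decodeAs(fileBytes, SHIFT_JIS);
        // Wrong charset: no error is raised, the result is "\u0192e\u0192X\u0192g".
        String wrong = decodeAs(fileBytes, WINDOWS_1252);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false
    }
}
```

Because the wrong decode succeeds without any exception, a converter that guesses the wrong encoding produces garbled text rather than an error.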
My text/CSV files can be in any encoding. Is it possible for Aspose to convert these files to PDF without specifying the encoding in code?
You do not need to specify the encoding in the load options. By default, Aspose.Words tries to detect the proper encoding when loading a text file into its DOM.
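As a general illustration of how encoding detection can work (this is a common technique, not a description of Aspose.Words internals), a detector can rule out any candidate charset whose strict decoder rejects the bytes. The sketch below uses only the JDK's `CharsetDecoder` with `CodingErrorAction.REPORT`; the class name `StrictDecodeCheck` and helper `isValidIn` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeCheck {
    // Returns true if every byte sequence is well-formed and mappable in `charset`.
    // A candidate that fails this check cannot be the file's encoding.
    static boolean isValidIn(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Katakana "te su to" saved as Shift_JIS.
        byte[] sjisBytes = "\u30C6\u30B9\u30C8".getBytes(Charset.forName("Shift_JIS"));
        System.out.println(isValidIn(sjisBytes, Charset.forName("Shift_JIS"))); // true
        // Shift_JIS lead byte 0x83 is a stray continuation byte in UTF-8.
        System.out.println(isValidIn(sjisBytes, StandardCharsets.UTF_8));      // false
    }
}
```

Strict decoding only eliminates candidates; when several charsets accept the same bytes, some further heuristic is still needed to choose between them.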
You are testing with a very simple document. Please use an original TXT document (with more text) that is in Shift-JIS or ANSI encoding. If you still face the problem, please ZIP and attach your original text document here for testing. We will investigate the issue and provide you with more information.
Thanks for the suggestion. We will check with files having more text.
But we would like to know: are there any limitations on the Aspose library's ability to auto-detect the encoding of a text/CSV file?
The file we shared previously was provided by our customer, and since the customer can put any number of characters in the file (even a single character), we would like to know the limitations of the Aspose library in this case.
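One limitation applies to any detector, not only Aspose.Words: text made only of ASCII characters (a CSV header, a number, a single letter) is encoded to exactly the same bytes in UTF-8, Shift_JIS and windows-1252, so the file itself carries no evidence about which encoding was used. The plain-JDK sketch below checks this; the class `AmbiguousInput`, the helper `sameBytesInAll`, and the three candidate charsets are illustrative assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AmbiguousInput {
    // True if `text` encodes to exactly the same bytes in every given charset,
    // i.e. no detector could tell those charsets apart from the file contents.
    static boolean sameBytesInAll(String text, Charset... charsets) {
        byte[] first = text.getBytes(charsets[0]);
        for (Charset cs : charsets) {
            if (!Arrays.equals(first, text.getBytes(cs))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Charset[] candidates = {
            StandardCharsets.UTF_8, Charset.forName("Shift_JIS"), Charset.forName("windows-1252")
        };
        // ASCII-only content: identical bytes in all three candidates.
        System.out.println(sameBytesInAll("id,name\n1", candidates));          // true
        // Japanese text (katakana "te su to") produces different bytes per charset.
        System.out.println(sameBytesInAll("\u30C6\u30B9\u30C8", candidates)); // false
    }
}
```

The fewer non-ASCII characters a file contains, the less evidence a detector has, which is consistent with the behavior reported for very small files.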
[Update:]
We tried text files with more text, and the Aspose library auto-detected the encoding. But when we try the same with CSV files, the Aspose library does not auto-detect the encoding, and the output contains garbled characters.
I have attached the CSV files we used for testing.
We have tested the scenario using the latest version, Aspose.Words for Java 21.7, and could not reproduce the issue. So, please use Aspose.Words for Java 21.7. We have attached the output PDF files to this post for your reference.
Could you please answer the query below? We use Aspose.Words to convert TXT to PDF.
When we use a TXT file with very few characters, the encoding is not detected properly and we get garbled characters in the PDF output. If we use a TXT file with more characters, the encoding is detected correctly. So we would like to know: are there any limitations on the Aspose library's ability to auto-detect the encoding of a text/CSV file?
Aspose.Words detects the encoding of TXT files for both small and large documents. However, if a TXT document contains little text, or contains text in different encodings, Aspose.Words selects the encoding it finds most suitable for the text.
Please remove the English text from the TXT files and convert them to PDF; you will get the correct output.
I understand this issue is not related to file size. Thanks.
So if a text document has both English and Japanese characters and the file is saved in Shift-JIS encoding, Aspose cannot recognize Shift-JIS when auto-detecting the encoding. Should I consider this a limitation of Aspose?
Please note that Aspose.Words reads the whole text file, counts the characters that match each candidate encoding, and chooses the encoding that matches the greatest number of characters.
In your case, when the Shift-JIS characters outnumber those of the default encoding, Aspose.Words chooses Shift-JIS. You can check this by adding more Japanese text to the TXT file.
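The "majority wins" behavior described above can be illustrated with a deliberately simplified, hypothetical heuristic. This is not Aspose.Words' actual algorithm, which the thread does not show: `CountingHeuristic`, `guessEncoding`, the two candidates (Shift_JIS versus windows-1252), and the byte-range rules are all assumptions made for illustration. It counts byte pairs that form a Shift_JIS double-byte character against everything else read as single bytes (including ASCII letters and spaces) and picks the larger group, which reproduces why mostly-English files with a few Japanese characters are misdetected.

```java
import java.nio.charset.Charset;

public class CountingHeuristic {
    // Hypothetical sketch only. Shift_JIS lead bytes 0x81-0x9F and 0xE0-0xEF followed by a
    // trail byte 0x40-0x7E or 0x80-0xFC count as one double-byte character. All other bytes,
    // including ASCII and half-width katakana (0xA1-0xDF), count as single-byte characters.
    static String guessEncoding(byte[] bytes) {
        int doubleByte = 0;
        int singleByte = 0;
        int i = 0;
        while (i < bytes.length) {
            int lead = bytes[i] & 0xFF;
            boolean isLead = (lead >= 0x81 && lead <= 0x9F) || (lead >= 0xE0 && lead <= 0xEF);
            if (isLead && i + 1 < bytes.length) {
                int trail = bytes[i + 1] & 0xFF;
                if ((trail >= 0x40 && trail <= 0x7E) || (trail >= 0x80 && trail <= 0xFC)) {
                    doubleByte++;
                    i += 2;
                    continue;
                }
            }
            singleByte++;
            i++;
        }
        // Majority wins; ties go to the single-byte default.
        return doubleByte > singleByte ? "Shift_JIS" : "windows-1252";
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        // Japanese only: 3 double-byte vs 0 single-byte characters.
        System.out.println(guessEncoding("\u30C6\u30B9\u30C8".getBytes(sjis)));       // Shift_JIS
        // "Hello " adds 6 single-byte characters against 3 double-byte ones.
        System.out.println(guessEncoding("Hello \u30C6\u30B9\u30C8".getBytes(sjis))); // windows-1252
    }
}
```

Under a rule like this, removing the English text or adding more Japanese text tips the count toward Shift_JIS, matching the advice given in the thread.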