EUC and JIS encoding

Hello,


I need to support the EUC and JIS text encodings. I am using Aspose.Words to extract document data and render page images from text files. No errors are thrown, but the characters displayed are incorrect. Most of them show up as the unknown character symbol.

Is there any support for these encoding types?

Thanks

Hi Ralph,

Thanks for your inquiry. To investigate, we need some more detail about your scenario. Please provide the following:


  • Please attach your input Word document.
  • Please create a standalone, runnable sample application (for example, a Console Application project) that demonstrates the Aspose.Words code you used to generate your output document.
  • Please attach the output documents that show the undesired behavior.
  • Please tell us which file format you are saving your final document to.

Unfortunately, it is difficult to say what the problem is without the document(s) and a simplified application. We need both to reproduce the problem. As soon as you send us this information, we'll start investigating your issue.

Hello,


Attached is a Visual Studio project that demonstrates what I am trying to do. The text files and resulting output images are located in the root folder. I am seeing issues with the EUC, JIS, and ShiftJIS encodings. The Unicode text file works fine and is included for comparison.

Thanks

Hi Ralph,

Thanks for sharing the details. By default, Aspose.Words tries to detect the proper encoding when loading a text file into its DOM. The Aspose.Words encoding detector tries to detect the following encodings:


  • latin1
  • BigEndianUnicode
  • Unicode
  • UTF32
  • UTF7
  • UTF8

In your case, however, you need to specify the desired character encoding using the LoadOptions.Encoding property. Aspose.Words tries to mimic the way Microsoft Word works: if you open this text file with Microsoft Word, it also asks you to choose an encoding. This is by design.

Please check the following code example for your kind reference.

// Japanese JIS (code page 50222)
LoadOptions option = new LoadOptions();
option.Encoding = Encoding.GetEncoding(50222);
Document doc = new Document(MyDir + "aiueoJIS.txt", option);

// Render only the first page as a PNG image
ImageSaveOptions options = new ImageSaveOptions(SaveFormat.Png);
options.PageCount = 1;
options.PageIndex = 0;
doc.Save(MyDir + "aiueoJIS.png", options);

// Shift-JIS (code page 932)
option = new LoadOptions();
option.Encoding = Encoding.GetEncoding(932);
doc = new Document(MyDir + "aiueoShiftJIS.txt", option);

options = new ImageSaveOptions(SaveFormat.Png);
options.PageCount = 1;
options.PageIndex = 0;
doc.Save(MyDir + "aiueoShiftJIS.png", options);

// EUC Japanese (code page 51932)
option = new LoadOptions();
option.Encoding = Encoding.GetEncoding(51932);
doc = new Document(MyDir + "aiueoEUC.txt", option);

options = new ImageSaveOptions(SaveFormat.Png);
options.PageCount = 1;
options.PageIndex = 0;
doc.Save(MyDir + "aiueoEUC.png", options);
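One caveat, in case it applies: the thread does not say which runtime the project targets. On .NET Framework the three code pages above are available out of the box, but on .NET Core and .NET 5+ Encoding.GetEncoding typically throws NotSupportedException for them unless the code-page provider is registered first. The sketch below shows one way to handle that. The JapaneseEncodings class and its member names are hypothetical helpers for illustration, not part of Aspose.Words. Depending on the target framework, CodePagesEncodingProvider may require the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.Text;

// Hypothetical helper (not an Aspose.Words type): returns the Japanese
// encodings used above, after registering the code-page provider.
public static class JapaneseEncodings
{
    // Code page numbers as used in the example above.
    public const int Jis = 50222;
    public const int ShiftJis = 932;
    public const int EucJp = 51932;

    // The static constructor runs once, before the first call to Get.
    static JapaneseEncodings()
    {
        // Makes the legacy code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding Get(int codePage)
    {
        return Encoding.GetEncoding(codePage);
    }
}
```

With this helper, the line that sets the encoding would become, for example, option.Encoding = JapaneseEncodings.Get(JapaneseEncodings.ShiftJis); and the rest of the loading and saving code stays the same.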


Perfect.


Thank you.

Hi Ralph,

Thanks for your feedback. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help.