Change the encoding type for an .ODT document

I am trying to change the Encoding property of a document that will be saved as .ODT, because the output shows some characters as a diamond question mark.
We had the same issue with Cells and solved it by instantiating a TxtLoadOptions, changing its Encoding property to UTF7, and then passing the instance as an argument when instantiating the workbook. I thought there might be something similar for Words and tried 2 approaches:

  1. Aspose.Words.Saving
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.Encoding = System.Text.Encoding.UTF7;
doc.Save(sRepName, saveOptions);
  2. Aspose.Words.Loading
//Tried this by instantiating a 2nd document that will load 1st doc in (.odt format)
TxtLoadOptions loading = new TxtLoadOptions();
loading.LoadFormat = LoadFormat.Odt;
loading.Encoding = System.Text.Encoding.UTF7;
Document docEncoded = new Document(sRepName, loading);
docEncoded.Save(@"C:\Temp\testEnc.odt");
Process.Start(@"C:\Temp\testEnc.odt");

Neither approach worked.

@Remus87 Could you please attach your problematic input and output documents here for testing? We will check the issue and provide you more information.

Hi Alexey,
Attached you’ll find the inputs (2 txt files) and the output (final .odt report): Input_Output.zip (9.8 KB)
In sched1Pt1.txt the affected line is line 11 (Staveley Gardens…), and in sched2Pt1.txt it is line 3 (Summerwood Road…).

In the output (.odt) you’ll see the question mark on the corresponding rows (Schedule 1 - row 3 and Schedule 2 - row 11).

@Remus87 Could you please provide the code you use to convert your TXT files to ODT? A simple conversion of the TXT document to ODT does not reproduce the problem:

Document doc = new Document(@"C:\Temp\sched1Pt1.txt");
doc.Save(@"C:\Temp\out.odt");

The code is long, so I just created a basic example that uses similar functionality: it populates the records from the text files through an instance of DocumentBuilder.
You should be able to see the wrong characters.

public void JustATest()
{
    string text1 = System.IO.File.ReadAllText(@"C:\Temp\Input_Output\sched1Pt1.txt");
    string text2 = System.IO.File.ReadAllText(@"C:\Temp\Input_Output\sched2Pt1.txt");

    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);

    string[] feeds = new string[] { text1, text2 };
    foreach (var feed in feeds)
    {
        builder.Writeln(feed);
    }
    doc.Save(@"C:\Temp\Test77.odt");
    Process.Start(@"C:\Temp\Test77.odt");
}

@Remus87 The problem occurs because File.ReadAllText uses UTF-8 encoding by default, but the file must be read with the Western European (Windows) 1252 encoding. You can let Aspose.Words detect the correct encoding of the TXT file. Please try the following code:

FileFormatInfo info = FileFormatUtil.DetectFileFormat(@"C:\Temp\sched1Pt1.txt");

string text1 = System.IO.File.ReadAllText(@"C:\Temp\sched1Pt1.txt", info.Encoding);
string text2 = System.IO.File.ReadAllText(@"C:\Temp\sched2Pt1.txt", info.Encoding);

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

string[] feeds = new string[] { text1, text2 };
foreach (var feed in feeds)
{
    builder.Writeln(feed);
}
doc.Save(@"C:\Temp\out.odt");
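
To illustrate where the diamond question mark comes from, here is a minimal sketch that decodes the same bytes twice. It does not use Aspose.Words or the attached files: the sample bytes ("Café") and the class and method names are made up for illustration, and ISO-8859-1 is assumed as a stand-in for Windows-1252 because it is available on every .NET runtime without registering an extra encoding provider (the byte 0xE9 means "é" in both).

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of Aspose.Words.
public static class EncodingMismatchDemo
{
    // Decodes the given bytes with the given encoding.
    public static string DecodeAs(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Illustrative data: "Café" saved in a single-byte Western European encoding.
        // The last byte, 0xE9, is not a valid UTF-8 sequence on its own.
        byte[] bytes = { 0x43, 0x61, 0x66, 0xE9 };

        // Decoding as UTF-8 replaces the invalid byte with U+FFFD,
        // which most viewers render as a diamond question mark.
        string asUtf8 = DecodeAs(bytes, Encoding.UTF8);

        // Decoding with the matching single-byte encoding keeps the character.
        string asLatin1 = DecodeAs(bytes, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine("UTF-8 result contains U+FFFD: " + (asUtf8.IndexOf('\uFFFD') >= 0));
        Console.WriteLine("Single-byte result equals \"Caf\\u00E9\": " + (asLatin1 == "Caf\u00E9"));
    }
}
```

The replacement happens while the text is read into a string, before it reaches the document, which is consistent with the save and load options tried earlier having no effect.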

Thanks! Works fine.
