Keep indentation of lists when saving as text

Hi,

Please advise how to keep the indentation of lists when saving a DOCX document as a text file.
I assumed PrettyFormat would do the job.

I have attached a text file that I fixed manually to show the expected result.
Here is my code sample:

{
    Document d = new Document(Path.Combine(SampleFiles, "Sample.docx"));
    d.Save(Path.Combine(SampleFiles, "WordSample.txt"), new Aspose.Words.Saving.TxtSaveOptions()
    {
        PrettyFormat = true,
        Encoding = Encoding.UTF8,
        PreserveTableLayout = true,
    });
}

WordSample.zip (14.2 KB)

@arn951,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-17778. We will further look into the details of this problem and will keep you updated on the status of the linked issue.

I am considering the following workaround, which works for Latin text (I still have an issue with Hebrew).
You can see the correct indentation in the attached file.

{
    Aspose.Words.Document d = new Aspose.Words.Document(Path.Combine(SampleFiles, "SampleIdent.docx"));

    d.Save(Path.Combine(SampleFiles, "WordSampleIdent.docx.txt"), new Aspose.Words.Saving.TxtSaveOptions()
    {
        PrettyFormat = true,
        Encoding = Encoding.UTF8,
        PreserveTableLayout = true,
    });

    // Workaround with Aspose.PDF: render to PDF in memory, then extract the text
    using (MemoryStream ms = new MemoryStream())
    {
        d.Save(ms, new Aspose.Words.Saving.PdfSaveOptions()
        {
            PrettyFormat = true,
            UpdateFields = true,
        });
        ms.Seek(0, SeekOrigin.Begin); // Reset the MemoryStream to position 0 (the beginning)
        using (Aspose.Pdf.Document pdfFile = new Aspose.Pdf.Document(ms))
        {
            // Create a TextAbsorber object to extract text
            Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();
            // Accept the absorber for all the pages
            pdfFile.Pages.Accept(textAbsorber);
            // Get the extracted text
            string extractedText = textAbsorber.Text;
            // Create a writer and open the output file
            using (TextWriter tw = new StreamWriter(Path.Combine(SampleFiles, "Word2PdfSampleIdent.docx.txt"), false, Encoding.UTF8))
            {
                // Write the extracted text to the file; the using block closes the writer
                tw.WriteLine(extractedText);
            }
        }
    }
}

SampleIdent.zip (149.2 KB)

@arn951,

Thanks for the additional information. We have logged these details in our issue tracking system. We will inform you via this thread as soon as this issue (WORDSNET-17778) is resolved. We apologize for any inconvenience.

The issue you reported earlier (filed as WORDSNET-17778) has been fixed in the Aspose.Words for .NET 19.3 update and the Aspose.Words for Java 19.3 update.

@arn951,

Regarding WORDSNET-17778, the following public class is introduced that allows specifying how list levels are indented when exporting to a plain text format:

/// <summary>
/// Specifies how list levels are indented when document is exporting to SaveFormat.Text format.
/// </summary>
public class TxtListIndentation

It has two public properties, Count and Character, which specify how many characters, and which character, to use for indenting list levels. A new option is also added to the TxtSaveOptions class:

/// <summary>
/// Gets a ListIndentation object that specifies how many and which character to use for indentation of list levels.
/// By default it is zero count of character '\0', that means no indentation.
/// </summary>
public TxtListIndentation ListIndentation
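
To picture what Count and Character control, here is a minimal sketch in plain C# that does not use Aspose.Words at all. The ListIndentDemo class and its Render helper are hypothetical names made up for illustration, and the sketch assumes that a top-level item gets no prefix and that each deeper level adds Count copies of Character. The actual exporter may differ in details.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ListIndentDemo
{
    // Hypothetical helper, not part of Aspose.Words.
    // Assumption: an item at list level N is prefixed with N * count copies
    // of the indentation character, so level 0 gets no prefix.
    public static string Render(IEnumerable<(int Level, string Text)> items, int count, char character)
    {
        var sb = new StringBuilder();
        foreach (var (level, text) in items)
        {
            sb.Append(new string(character, level * count));
            sb.Append(text);
            sb.Append('\n');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        var items = new List<(int Level, string Text)>
        {
            (0, "1. Item 1"),
            (1, "a. Item 2"),
            (2, "i. Item 3"),
        };

        // Three spaces per level, as in a Count = 3, Character = ' ' setup.
        Console.Write(Render(items, 3, ' '));
    }
}
```

With Count set to 0 the prefix is empty whatever the character is, which matches the documented default of no indentation.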

Here are a few use cases:

UC1: Use one tab character per level for list indentation:

Document doc = new Document("input_document");
 
TxtSaveOptions options = new TxtSaveOptions();
options.ListIndentation.Count = 1;
options.ListIndentation.Character = '\t';
 
doc.Save("output.txt", options);

UC2: Use three spaces per level for list indentation:

Document doc = new Document("input_document");
 
TxtSaveOptions options = new TxtSaveOptions();
options.ListIndentation.Count = 3;
options.ListIndentation.Character = ' ';
 
doc.Save("output.txt", options);

UC3: Do not use any list level indentation (default behavior):

Document doc1 = new Document("input_document");
doc1.Save("output1.txt");
 
Document doc2 = new Document("input_document");
TxtSaveOptions options = new TxtSaveOptions();
doc2.Save("output2.txt", options);
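
For the code sample from the original question, the new option can probably be combined with the save options already in use. The following is a sketch only, assuming Aspose.Words for .NET 19.3 or later. The SaveListsAsText class, its Run method, and the sampleFiles parameter are placeholder names standing in for the SampleFiles path used earlier in the thread, and the choice of three spaces per level is arbitrary.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Saving;

public static class SaveListsAsText
{
    // sampleFiles is a placeholder for the folder that holds Sample.docx.
    public static void Run(string sampleFiles)
    {
        Document doc = new Document(Path.Combine(sampleFiles, "Sample.docx"));

        // Same options as in the original question.
        TxtSaveOptions options = new TxtSaveOptions
        {
            PrettyFormat = true,
            Encoding = Encoding.UTF8,
            PreserveTableLayout = true,
        };

        // New in 19.3: indent each list level with three spaces.
        options.ListIndentation.Count = 3;
        options.ListIndentation.Character = ' ';

        doc.Save(Path.Combine(sampleFiles, "WordSample.txt"), options);
    }
}
```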

Hope this helps.