ToTxt and lists/bullets

aneumann · January 7, 2009, 1:53am

Hi, I am using the ToTxt method and Save with SaveFormat.Text to get the text representation of a Word document. Currently neither method outputs the numbers or bullet symbols of lists in the document (the item contents are shown, but not the labels).

Can I enable this somehow in either method?

Also, are there any other options I should turn on so that as much of a Word document's content as possible is included?

Thanks a bunch!

alexey.noskov · January 7, 2009, 2:35am

Hi
Thanks for your request. This problem is logged in our defect database as issue #4919:
Issue #4919 - Replace all bullets in the document with “\u2022” when converting to TXT.
In the meantime, you can implement this behavior yourself by creating a custom TXT converter, as described here:
https://docs.aspose.com/words/net/how-to-extract-selected-content-between-nodes-in-a-document/
Best regards.
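
As a rough illustration of what such a converter has to produce for each list item, here is a minimal sketch that turns a list kind and an item number into a text label. It is not Aspose.Words code: `LabelKind` and `ListLabelSketch` are hypothetical names invented for this example, and real Word number formats (multi-level labels, custom format strings, letter styles) are considerably richer.

```csharp
using System;
using System.Text;

// Hypothetical, simplified set of list kinds (not an Aspose.Words type).
public enum LabelKind { Bullet, Arabic, UpperRoman }

public static class ListLabelSketch
{
    // Returns the text label for one list item, e.g. "•", "3." or "XIV.".
    public static string Format(LabelKind kind, int number)
    {
        switch (kind)
        {
            case LabelKind.Bullet: return "\u2022";
            case LabelKind.Arabic: return number + ".";
            case LabelKind.UpperRoman: return ToRoman(number) + ".";
            default: throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }

    // Converts a positive integer to upper-case Roman numerals.
    private static string ToRoman(int number)
    {
        if (number <= 0) throw new ArgumentOutOfRangeException(nameof(number));
        int[] values = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
        string[] symbols = { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };
        var result = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            // Greedily take the largest symbol that still fits.
            while (number >= values[i])
            {
                result.Append(symbols[i]);
                number -= values[i];
            }
        }
        return result.ToString();
    }
}
```

A converter would prepend the returned label, followed by a tab, to the text of each list paragraph.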

aneumann · January 8, 2009, 6:52am

Hi again, I have attempted to implement a custom DocumentVisitor, but I am unable to replace the bullets. Could you please provide a more detailed code sample of the required implementation? Cheers!

alexey.noskov · January 8, 2009, 8:49am

Hi

Thanks for your request. Please try the following code.
The ListLabelsExtractor class it uses is attached.

public void ToText(string input, string output)
{
    // Open the document we want to convert.
    Document doc = new Document(input);
    // Create an object that inherits from the DocumentVisitor class.
    MyDocToTxtWriter myConverter = new MyDocToTxtWriter();
    // This is the well-known Visitor pattern. Get the model to accept a visitor.
    // The model will iterate through itself by calling the corresponding methods
    // on the visitor object (this is called visiting).
    // 
    // Note that every node in the object model has the Accept method so the visiting
    // can be executed not only for the whole document, but for any node in the document.
    doc.Accept(myConverter);
    // Once the visiting is complete, we can retrieve the result of the operation,
    // which in this example has accumulated in the visitor.
    string text = myConverter.GetText();
    // Save the text to the output file (requires System.IO).
    File.WriteAllText(output, text);
    Console.WriteLine(text);
}
/// <summary>
/// Simple implementation of saving a document in the plain text format. Implemented as a Visitor.
/// </summary>
public class MyDocToTxtWriter : DocumentVisitor
{
    public MyDocToTxtWriter()
    {
        mIsSkipText = false;
        mBuilder = new StringBuilder();
    }
    /// <summary>
    /// Gets the plain text of the document that was accumulated by the visitor.
    /// </summary>
    public string GetText()
    {
        return mBuilder.ToString();
    }
    /// <summary>
    /// Called when a Run node is encountered in the document.
    /// </summary>
    public override VisitorAction VisitRun(Run run)
    {
        AppendText(run.Text);
        // Let the visitor continue visiting other nodes.
        return VisitorAction.Continue;
    }
    /// <summary>
    /// Called when a FieldStart node is encountered in the document.
    /// </summary>
    public override VisitorAction VisitFieldStart(FieldStart fieldStart)
    {
        // In Microsoft Word, a field code (such as "MERGEFIELD FieldName") follows
        // after a field start character. We want to skip field codes and output field 
        // result only, therefore we use a flag to suspend the output while inside a field code.
        //
        // Note this is a very simplistic implementation and will not work very well
        // if you have nested fields in a document. 
        mIsSkipText = true;
        return VisitorAction.Continue;
    }
    /// <summary>
    /// Called when a FieldSeparator node is encountered in the document.
    /// </summary>
    public override VisitorAction VisitFieldSeparator(FieldSeparator fieldSeparator)
    {
        // Once we reach a field separator node, we enable the output because we are
        // now entering the field result nodes.
        mIsSkipText = false;
        return VisitorAction.Continue;
    }
    /// <summary>
    /// Called when a FieldEnd node is encountered in the document.
    /// </summary>
    public override VisitorAction VisitFieldEnd(FieldEnd fieldEnd)
    {
        // Make sure we enable the output when we reach a field end, because some fields
        // have neither a field separator nor a field result.
        mIsSkipText = false;
        return VisitorAction.Continue;
    }
    public override VisitorAction VisitParagraphStart(Paragraph paragraph)
    {
        if (paragraph.IsListItem && paragraph.HasChildNodes)
        {
            string label = string.Empty;
            if (paragraph.ListFormat.ListLevel.NumberStyle != NumberStyle.Bullet)
            {
                ListLabelsExtractor extractor = ListLabelsExtractor.GetLabelExtractor(paragraph.ListFormat.List);
                // Get the label of the numbered list item.
                label = extractor.GetListLabel(paragraph.ListFormat.ListLevelNumber) + "\t";
            }
            else
            {
                // Bulleted item: use a generic bullet character as the label.
                label = "\u2022" + "\t";
            }
            AppendText(label);
        }
        return VisitorAction.Continue;
    }
    /// <summary>
    /// Called when visiting of a Paragraph node is ended in the document.
    /// </summary>
    public override VisitorAction VisitParagraphEnd(Paragraph paragraph)
    {
        // When outputting to plain text we output Cr+Lf characters.
        AppendText(ControlChar.CrLf);
        return VisitorAction.Continue;
    }
    public override VisitorAction VisitBodyStart(Body body)
    {
        // We can detect beginning and end of all composite nodes such as Section, Body, 
        // Table, Paragraph etc and provide custom handling for them.
        mBuilder.Append("*** Body Started ***\r\n");
        return VisitorAction.Continue;
    }
    public override VisitorAction VisitBodyEnd(Body body)
    {
        mBuilder.Append("*** Body Ended ***\r\n");
        return VisitorAction.Continue;
    }
    /// <summary>
    /// Called when a HeaderFooter node is encountered in the document.
    /// </summary>
    public override VisitorAction VisitHeaderFooterStart(HeaderFooter headerFooter)
    {
        // Returning this value from a visitor method causes visiting of this
        // node to stop and move on to visiting the next sibling node.
        // The net effect in this example is that the text of headers and footers
        // is not included in the resulting output.
        return VisitorAction.SkipThisNode;
    }
    /// <summary>
    /// Adds text to the current output. Honours the enabled/disabled output flag.
    /// </summary>
    private void AppendText(string text)
    {
        if (!mIsSkipText)
            mBuilder.Append(text);
    }
    private readonly StringBuilder mBuilder;
    private bool mIsSkipText;
}
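
The comment in VisitFieldStart notes that the single mIsSkipText flag does not cope well with nested fields. One possible way around that, sketched here on a simplified token stream rather than on real Aspose.Words nodes, is to track every open field on a stack and output text only when no open field is still in its field code. `Token`, `TokenKind` and `NestedFieldSketch` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins for FieldStart, FieldSeparator, FieldEnd and Run nodes.
public enum TokenKind { FieldStart, FieldSeparator, FieldEnd, Text }

public sealed class Token
{
    public Token(TokenKind kind, string text = "") { Kind = kind; Text = text; }
    public TokenKind Kind { get; }
    public string Text { get; }
}

public static class NestedFieldSketch
{
    // Returns the text outside fields plus field results, skipping all field codes.
    public static string ExtractResults(IEnumerable<Token> tokens)
    {
        var output = new StringBuilder();
        // One entry per open field: true while we are still inside its field code.
        var inCode = new Stack<bool>();
        // Number of open fields whose field code we are currently inside.
        int codeDepth = 0;

        foreach (Token token in tokens)
        {
            switch (token.Kind)
            {
                case TokenKind.FieldStart:
                    inCode.Push(true);
                    codeDepth++;
                    break;
                case TokenKind.FieldSeparator:
                    // The innermost field moves from its code to its result.
                    if (inCode.Count > 0 && inCode.Peek())
                    {
                        inCode.Pop();
                        inCode.Push(false);
                        codeDepth--;
                    }
                    break;
                case TokenKind.FieldEnd:
                    // A field without a separator ends while still in its code.
                    if (inCode.Count > 0 && inCode.Pop())
                        codeDepth--;
                    break;
                case TokenKind.Text:
                    if (codeDepth == 0)
                        output.Append(token.Text);
                    break;
            }
        }
        return output.ToString();
    }
}
```

In the visitor above, the same bookkeeping could replace mIsSkipText: push in VisitFieldStart, flip the top entry in VisitFieldSeparator, pop in VisitFieldEnd, and have AppendText check the depth counter.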

Hope this helps.
Best regards.

aneumann · May 18, 2009, 6:54am

Hey, has there been any progress in fixing this issue? Thanks in advance.

alexey.noskov · May 18, 2009, 7:30am

Hi

Thanks for your request. Unfortunately, the issue is still unresolved.
Best regards.

aneumann · July 26, 2009, 4:56am

Hey, just checking to see if there is any progress on this issue? Thanks

Klepus · July 26, 2009, 6:25am

Hello!
Sorry for the inconvenience. The issue is still open. Have you tried the workaround with a custom DocumentVisitor suggested by Alexey?
Regards,

aneumann · September 9, 2009, 2:09am

We have, yes, thanks. However, we would prefer to have this functionality built into the Aspose.Words library. Are there any updates on when this will be fixed?

As a side note, the above workaround leaks memory when the visitor is used and GetText is called in a loop in a long-running application. The static dictionaries keep the Aspose objects in memory.
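
The ListLabelsExtractor source is not reproduced in this thread, so the following is only a sketch of the general pattern being described, with hypothetical stand-in types (`FakeList`, `FakeExtractor`, `LeakyCache`, `WeakCache`). A static dictionary keyed by document objects holds strong references to them for the life of the process, whereas a `ConditionalWeakTable` does not keep its keys alive.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for an Aspose.Words list object.
public sealed class FakeList { }

// Hypothetical stand-in for a per-list label extractor.
public sealed class FakeExtractor { }

public static class LeakyCache
{
    // Strong references: every list ever passed in stays reachable until the process exits.
    private static readonly Dictionary<FakeList, FakeExtractor> sCache =
        new Dictionary<FakeList, FakeExtractor>();

    public static FakeExtractor Get(FakeList list)
    {
        FakeExtractor extractor;
        if (!sCache.TryGetValue(list, out extractor))
        {
            extractor = new FakeExtractor();
            sCache.Add(list, extractor);
        }
        return extractor;
    }

    // Grows by one for every distinct list seen and never shrinks.
    public static int Count { get { return sCache.Count; } }
}

public static class WeakCache
{
    // Entries here do not prevent their key from being garbage collected.
    private static readonly ConditionalWeakTable<FakeList, FakeExtractor> sCache =
        new ConditionalWeakTable<FakeList, FakeExtractor>();

    public static FakeExtractor Get(FakeList list)
    {
        return sCache.GetValue(list, key => new FakeExtractor());
    }
}
```

Another option, assuming the cache is only needed during a single conversion, would be to make the dictionary an instance field of the visitor so that it is released together with the visitor.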

alexey.noskov · September 9, 2009, 7:16am

Hi

Thanks for your inquiry. Unfortunately, I cannot give you a reliable estimate for this issue at the moment.
Could you please provide sample code that reproduces the memory leak? I will investigate the problem and give you more information.
Best regards.

aneumann · May 10, 2010, 8:04am

Hey, just wondering if this is still an issue on your radar?
Thanks, Adam

alexey.noskov · May 10, 2010, 8:34am

Hi

Thanks for your inquiry. The issue is still open in our defect database, so for now you can use the workaround suggested earlier in this thread.
Best regards.

aspose.notifier · September 6, 2015, 1:58am

The issues you have found earlier (filed as WORDSNET-1643) have been fixed in this .NET update and this Java update.
