Free Support Forum - aspose.com

Aspose.Words.Document.ToTxt() returning an extra \r\n at end of string

Hi there,


I’m trying to convert some RTF to plain text and I noticed that the method Aspose.Words.Document.ToTxt() is returning an extra CR+LF at the end of the string.

The rtf I am converting is this:
{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs20 This is \ul\b test \ulnone\b0 text.}

and the plain text representation I am expecting is this:
This is test text.

This is the code I’m using to convert:
using System.IO;
using System.Text;
using Aspose.Words;

string rtfText = @"{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Arial;}}
{\*\generator Msftedit 5.41.15.1515;}\viewkind4\uc1\pard\f0\fs20 This is \ul\b test \ulnone\b0 text.}";

string retVal;
// RTF is 7-bit ASCII, so encode the string as ASCII bytes before loading it.
using (MemoryStream stream = new MemoryStream(Encoding.ASCII.GetBytes(rtfText)))
{
    Document document = new Document(stream);
    retVal = document.ToTxt();
}

At this point in the code the value of retVal is:
This is test text.\r\n

Let me know if I’m missing something or if there’s a better method I should be using for the conversion.

Thanks!

Hi

Thanks for your request. Aspose.Words returns the correct value. An MS Word document must contain at least one paragraph, so the line break you see at the end of your string is that paragraph's break. If you do not need this break, you can easily trim it in your code.
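
A minimal sketch of such trimming, assuming the final paragraph break is always a single CR+LF as in the output shown above. The class and method names here are hypothetical and not part of Aspose.Words:

```csharp
using System;

public static class PlainTextHelper
{
    // Removes exactly one trailing CR+LF (the final paragraph break), if present.
    // Unlike TrimEnd, this keeps earlier blank lines that belong to the content.
    public static string StripFinalParagraphBreak(string text)
    {
        const string paragraphBreak = "\r\n";
        if (text != null && text.EndsWith(paragraphBreak, StringComparison.Ordinal))
        {
            return text.Substring(0, text.Length - paragraphBreak.Length);
        }
        return text;
    }
}
```

It could then be applied to the conversion result, for example retVal = PlainTextHelper.StripFinalParagraphBreak(document.ToTxt());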

Best regards,

Hi,


Thanks for the quick response. I agree that it would be easy for me to fix and I’ll do that. Just to be clear, can I assume there will always be a \r\n at the end of every piece of RTF I convert in this way?

While I understand it seems trivial, it’s clearly not the case that every paragraph in an RTF file ends with a line break. If you open WordPad, type a line of text and don’t hit Enter, you have a paragraph that does not end with a line break.

Thanks Again

Hi

Thanks for your request. Even if the document has only one paragraph, that paragraph is still there, and so is its paragraph break. Please see the attached screenshot.

Best regards,

Sorry, I didn’t mean to suggest the paragraph didn’t end there. I meant that the end of a paragraph does not necessarily mean a newline.


Anyway, if I can rely on that newline always being there, I’ll just always strip a newline off the end of the converted text. Any non-empty converted RTF will have the paragraph break (plain text newline) at the end, right?

Thanks

Hi

Thanks for your inquiry. Even an empty document will have one paragraph break.

You can also implement your own plain-text converter as described here:

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/howto-extract-content-using-documentvisitor.html
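
As a rough sketch of what such a converter might look like, assuming the DocumentVisitor callbacks VisitParagraphStart and VisitRun work as described in the linked article. PlainTextVisitor is a hypothetical name, and the sketch has no special handling for tables, fields, headers or footers:

```csharp
using System.Text;
using Aspose.Words;

// Hypothetical visitor: collects run text and writes a break only
// between paragraphs, so the result never ends with a trailing CR+LF.
public class PlainTextVisitor : DocumentVisitor
{
    private readonly StringBuilder mText = new StringBuilder();
    private bool mSeenParagraph;

    public string GetText()
    {
        return mText.ToString();
    }

    public override VisitorAction VisitParagraphStart(Paragraph paragraph)
    {
        // Emit the break before every paragraph except the first one.
        if (mSeenParagraph)
            mText.Append("\r\n");
        mSeenParagraph = true;
        return VisitorAction.Continue;
    }

    public override VisitorAction VisitRun(Run run)
    {
        // Append the text of each run as it is visited.
        mText.Append(run.Text);
        return VisitorAction.Continue;
    }
}
```

Usage would be along the lines of creating the visitor, calling document.Accept(visitor), and then reading visitor.GetText().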

Best regards,