Free Support Forum - aspose.com

SOLUTION: Opening RTF document - seems to fail if document is too short

Hello,

I am using Aspose.Words for .NET, version 14.1.0.0 (runtime version v2.0.50727), with a .NET 4.5.1 C# application. I am trying to read an RTF string and open it as an Aspose Document. It works properly for some RTF values but not others. I finally narrowed it down: it is the shorter RTFs that don’t read correctly.

Here is my code (cleaned up to remove the extraneous portions):

using (var stream = new MemoryStream())
{
    var rtfmemo = "{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Arial;}}\r\n\\viewkind4\\uc1\\pard\\fs20 Lorem ipsum \\par\r\n}";
    var sw = new StreamWriter(stream);
    sw.Write(rtfmemo);
    stream.Position = 0;
    var loadOptions = new Aspose.Words.LoadOptions()
        { LoadFormat = Aspose.Words.LoadFormat.Rtf };
    var doc = new Aspose.Words.Document(stream, loadOptions);
    var txt = doc.GetText();
}
The sample above results in the document text being “\f”, rather than “Lorem ipsum”. This seems to happen with literally any short sample.

However, if I merely lengthen the data while keeping exactly the same RTF structure, it starts working perfectly once the string reaches a certain length. That point is somewhere between two and three copies of the full “lorem ipsum” paragraph, so the sample below has three of them:
var rtfmemo = "{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Arial;}}\r\n\\viewkind4\\uc1\\pard\\fs20 Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\\par\r\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\\par\r\nLorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.\\par\r\n}";
Obviously I could just pad all of my RTFs so that they are long enough, but is there a better way to fix this problem so that short RTFs are read correctly?

Thanks in advance!
David

Hi David,


Thanks for your inquiry. Please use the Document.ToString(SaveFormat.Text) method instead of the Document.GetText method to get the desired output. Hope this helps you.

The Node.GetText method gets the text of this node and of all its children. The returned string includes all control and special characters as described in ControlChar.
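To make the difference between the two calls concrete, here is a minimal sketch. It assumes the Aspose.Words for .NET package is referenced; the class name TextExtractionSketch is hypothetical, and the document is built in memory with DocumentBuilder so that no input file is needed. It only checks whether each result contains the "\f" control character mentioned in this thread.

```csharp
using System;
using Aspose.Words;

// Hypothetical sketch class; requires a reference to Aspose.Words for .NET.
public static class TextExtractionSketch
{
    public static void Main()
    {
        // Build a small document in memory.
        var doc = new Document();
        var builder = new DocumentBuilder(doc);
        builder.Writeln("Lorem ipsum");

        // GetText returns the node text including control characters.
        string raw = doc.GetText();

        // ToString(SaveFormat.Text) exports the content as plain text.
        string plain = doc.ToString(SaveFormat.Text);

        // Compare whether each result contains the form-feed control character.
        Console.WriteLine(raw.Contains("\f"));
        Console.WriteLine(plain.Contains("\f"));
    }
}
```

Neither call changes what was loaded from the stream, so if the stream handed to the Document constructor is empty, both are likely to return little more than control characters or whitespace.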

Hi Tahir,

Yes, that’s what I’m doing, and it gives essentially the same results as GetText. I was using GetText for brevity. Using the following code in conjunction with the code in my first post:

doc.Save(stream, Aspose.Words.SaveFormat.Html);
var sr = new StreamReader(stream);
var html = sr.ReadToEnd();
This is what I get back in my HTML string:

 


No matter what body is in the RTF file, if it is less than about 1000 characters, the only actual text is this: "&#xa0;"
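One general point about the save-and-read-back snippet in this post: after data is written to a MemoryStream, its position sits at the end, so a StreamReader attached at that point returns an empty string unless the position is reset first. The posted code may simply have elided that step. The sketch below shows the general behaviour with System.IO only; RewindDemo and WriteThenRead are hypothetical names, and a plain Write stands in for doc.Save.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Aspose.Words or of the original post.
public static class RewindDemo
{
    // Writes content into a MemoryStream, optionally rewinds it,
    // then reads from the current position to the end.
    public static string WriteThenRead(string content, bool rewind)
    {
        using (var stream = new MemoryStream())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(content);
            stream.Write(bytes, 0, bytes.Length); // position is now at the end

            if (rewind)
            {
                stream.Position = 0; // go back to the start before reading
            }

            using (var reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(WriteThenRead("<p>Lorem ipsum</p>", rewind: false).Length);
        Console.WriteLine(WriteThenRead("<p>Lorem ipsum</p>", rewind: true));
    }
}
```

Saving into a fresh MemoryStream, rather than reusing the one that held the RTF input, also avoids mixing input and output bytes in the same buffer.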

Anyway, fortunately I found a workaround. It seems the Aspose Words document just wants a stream of at least 1025 characters, and doesn’t care what kind of characters they are. So I am just adding a bunch of white space after the closing “}” at the end of the RTF, and if I do that, I get the actual text from the RTF document instead of the “\f” or "&#xa0;".

rtfmemo += new string(' ', 1025);
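A possible explanation for the 1025-character threshold, not confirmed anywhere in this thread, is StreamWriter buffering rather than RTF parsing. The code in the first post never flushes sw, and a StreamWriter holds written characters in an internal buffer (1024 characters by default in the .NET implementations I am aware of) until it is flushed, disposed, or the buffer overflows. Under that assumption, a short string never reaches the MemoryStream at all. The sketch below uses only System.IO; WriterBufferingDemo and BytesInStream are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.Words or of the original post.
public static class WriterBufferingDemo
{
    // Writes text through a StreamWriter and returns how many bytes have
    // actually reached the MemoryStream, with or without an explicit Flush.
    public static long BytesInStream(string text, bool flush)
    {
        using (var stream = new MemoryStream())
        {
            var writer = new StreamWriter(stream);
            writer.Write(text);

            if (flush)
            {
                writer.Flush(); // push buffered characters into the stream
            }

            return stream.Length;
        }
    }

    public static void Main()
    {
        var shortRtf = "{\\rtf1\\ansi Lorem ipsum \\par}";

        // Bytes visible to a reader of the stream without and with Flush.
        Console.WriteLine(BytesInStream(shortRtf, flush: false));
        Console.WriteLine(BytesInStream(shortRtf, flush: true));
    }
}
```

If this is indeed the cause, calling sw.Flush() before stream.Position = 0 in the original snippet, or building the stream directly with new MemoryStream(Encoding.UTF8.GetBytes(rtfmemo)), should make the padding unnecessary.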
Hi David,

Thanks for your inquiry. We have tested the scenario using the latest version of Aspose.Words for .NET, 16.12.0, and could not reproduce the issue you reported. Please upgrade to Aspose.Words for .NET 16.12.0.