Line Break Changes


#1

Hi,

I have just upgraded from 2.3.0.0 to 3.0.3. Our system, which uses Aspose to read documents, broke: all lines are joined into one line. After some checking, we found that the line break has changed from "carriage return + line feed" to "carriage return + carriage return".

We made changes to our code to capture the correct line break and it works again.

In which version was this change made? I glanced through the release history but couldn't find this information. Will Aspose change this again? Or what should I change in my code to correctly identify all line breaks in every Aspose version?

Thanks,

Shu Yih


#2

Line break handling in Aspose.Word should not have changed in a way that causes problems like this.

Please let me know what you do with Aspose.Word and how. Do you read a text file and then push it into a document using DocumentBuilder.WriteLine, or are you talking about saving a text file from Aspose.Word? Post or attach your code if you think it will help us. We will investigate and will probably restore the previous behaviour.




#3

Thanks for your prompt reply.

We are reading text from Word documents and processing it. We did not attempt to write it to another Word document.

We read the text, look for the line breaks, and split it into lines.
We loop through each character of the text obtained from "New Aspose.Word.Document(File.FullName).Range.Text", look for line breaks, and start a new line at each one.

Previously, we used:

If chCurr = vbCr AndAlso intCharIndex < EndIndex AndAlso Text.Chars(intCharIndex + 1) = vbLf Then
    ' 2-char line break (vbCr + vbLf)
    ' create a new line
ElseIf chCurr = vbLf Then
    ' 1-char line break (vbLf)
    ' create a new line
End If

Now we have changed it to:

If chCurr = vbCr AndAlso intCharIndex < EndIndex AndAlso (Text.Chars(intCharIndex + 1) = vbLf OrElse Text.Chars(intCharIndex + 1) = vbCr) Then
    ' 2-char line break (vbCr + vbLf or vbCr + vbCr)
ElseIf chCurr = vbLf OrElse chCurr = vbCr Then
    ' 1-char line break (vbLf or vbCr)
End If

It seems that the line breaks returned from Aspose used to be "vbCr+vbLf" or "vbLf", and now they have changed to "vbCr+vbCr" or "vbCr".
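A version-independent way to split the text is to treat CR+LF as one break and a lone CR or a lone LF as one break each. The sketch below shows that rule in Python rather than VB.NET, purely for illustration; `split_lines` is a hypothetical helper, not part of Aspose.Word. Unlike the revised rule above, it counts CR+CR as two breaks, so an empty paragraph survives as an empty string instead of being swallowed. Python's built-in `str.splitlines` is deliberately not used because it also splits on other characters such as "\x0c".

```python
def split_lines(text: str) -> list[str]:
    """Split on CR+LF, lone CR, or lone LF; each counts as exactly one break."""
    lines = []
    current = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\r":
            lines.append("".join(current))
            current = []
            # A CR immediately followed by LF is a single two-character break,
            # so skip the LF as well.
            if i + 1 < n and text[i + 1] == "\n":
                i += 1
        elif ch == "\n":
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    # Keep any trailing text that is not terminated by a break.
    if current:
        lines.append("".join(current))
    return lines


# Old-style ("\r\n") and new-style ("\r") text give the same result.
print(split_lines("first\r\nsecond\r\n"))
print(split_lines("first\rsecond\r"))
```

Because the rule accepts both forms, the same code works whether Range.Text returns "\r\n" (older versions) or "\r" (3.0 and later).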





#4

In a Word file, VbCr (or “\r”) is a paragraph break.

In a text file, vbCr + vbLf ("\r\n") is usually a paragraph break.

Internally, Aspose.Word always stores \r as a paragraph break. When Aspose.Word saves (exports) to a text file, it "normalizes" paragraph breaks into the "\r\n" combination.
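If existing code expects the "\r\n" form, one option is to apply a similar normalization yourself before processing. The following is a minimal sketch, assuming only paragraph breaks matter; `normalize_paragraph_breaks` is a hypothetical helper, and it approximates, rather than reproduces, whatever the real text export does with other control characters.

```python
import re


def normalize_paragraph_breaks(text: str) -> str:
    """Rewrite every CR that is not already followed by LF as CR+LF."""
    # The negative lookahead (?!\n) leaves existing "\r\n" pairs untouched,
    # so the function is safe to run on text in either form.
    return re.sub(r"\r(?!\n)", "\r\n", text)


print(repr(normalize_paragraph_breaks("first\rsecond\r")))
```

Consecutive CRs (empty paragraphs) each become their own "\r\n", so no paragraph is lost.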

Before Aspose.Word 3.0, Range.Text returned text using the same algorithm as export to text, so it returned "\r\n". Since Aspose.Word 3.0, Range.Text returns text using Node.GetText, which does not "normalize" control characters and simply returns the text as it is stored in a Word document.

So you are quite right: where Range.Text used to return vbCr + vbLf, it now returns only vbCr. The vbCr + vbCr combination is not used as a paragraph break, but you can get several vbCr characters in a row if you have empty paragraphs.

Sorry that the Range.Text behaviour changed and caused you problems. It is going to stay the way it works now: "\r" marks the end of a paragraph, "\x000c" the end of a section, and "\x0007" the end of a table cell and of a table row. There can be other control characters; see the ControlChar class in Aspose.Word.
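To handle the three control characters listed here, code can split on each of them and record which kind of boundary it found. The sketch below is a minimal Python illustration; `split_on_control_chars` and the boundary names are hypothetical, the sample string is made up rather than captured from a real document, and it covers only these three characters, not everything in the ControlChar class. Note that "\x000c" and "\x0007" in .NET notation are the same characters as Python's "\x0c" and "\x07".

```python
# Control characters named in this thread; ControlChar lists others.
TERMINATORS = {
    "\r": "paragraph",
    "\x0c": "section",
    "\x07": "cell or row",
}


def split_on_control_chars(text: str) -> list[tuple[str, str]]:
    """Return (segment, boundary kind) pairs in document order."""
    pieces = []
    start = 0
    for i, ch in enumerate(text):
        kind = TERMINATORS.get(ch)
        if kind is not None:
            pieces.append((text[start:i], kind))
            start = i + 1
    # Text after the last control character has no terminator.
    if start < len(text):
        pieces.append((text[start:], "unterminated"))
    return pieces


for segment, kind in split_on_control_chars("Title\rA\x07B\x07\x07End\r\x0c"):
    print(repr(segment), kind)
```

Keeping the boundary kind alongside each segment lets the caller decide, for example, to treat cell ends as column separators and paragraph ends as line ends.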