Why is the text split into several runs?

A Word document is attached to this message. It contains only a single word. I do not understand why the word is distributed across three different runs, and I cannot find out why this is done.

The document was created with MS Word 2003 SP2. I tried loading the document into Aspose.Words and then saving it in different formats. However, the three runs were never joined.

In this project I read Word documents with the help of Aspose.Words. If certain text appears in a run, something has to be done. However, finding the text becomes impossible if it is distributed among several runs.

Thanks a lot in advance.

Michael G. Schneider

Hi Michael.
Thank you for your inquiry.
MS Word can split text into several runs even if they have identical formatting. That is a known fact. Why it does so is a complex question; for instance, it helps to handle the change history (Undo/Redo). Aspose.Words doesn’t join the runs found in the document. If you have any application logic relying on runs, it would be better to consider paragraphs instead. They are fully predictable. Since you are searching for a substring in text runs, you are locating particular places in the document. Most probably, if you explain the whole task, we could provide better suggestions.
In the attachment I put an example of how MS Word splits runs. In both docs I wrote the single word “Apple”. In 1.doc I did it without any tricks and saved the document. In 2.doc I typed the whole word, then made the last two letters bold, then removed the bold attribute and saved. These two letters ended up in a separate run.
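
To illustrate why paragraph-level logic is more predictable than run-level logic, here is a minimal, library-free sketch in Python. It does not use the Aspose.Words API; a paragraph is simply modelled as a list of run texts, and the function names are hypothetical.

```python
# Assumption: a paragraph is modelled as a plain list of run texts.
# This is an illustration only, not the Aspose.Words object model.

def found_in_single_run(runs, needle):
    """True only if one run contains the whole needle."""
    return any(needle in run for run in runs)

def found_in_paragraph(runs, needle):
    """True if the needle occurs in the combined text of all runs."""
    return needle in "".join(runs)

# "Apple" split as in 2.doc: the last two letters sit in their own run.
runs = ["App", "le"]
print(found_in_single_run(runs, "Apple"))  # False
print(found_in_paragraph(runs, "Apple"))   # True
```

Searching the combined paragraph text finds the word no matter how the runs happen to be split.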
Regards,

Hello Viktor,

thanks a lot for the answer. I do understand the problem.

I did some more tests. Maybe I found something interesting. If I open the document with Aspose.Words, then perform a replace and change the word to itself

doc.Range.Replace("blabla", "blabla", True, False)

and then save the document, the three runs will be joined. So it seems as if there were some function in Aspose.Words for joining those runs. Can I call this function? Or maybe there is a function for replacing each word with itself?

Regarding your question about what I am really trying to do: the program’s task is to find certain visible markup in a Word file, and then delete some text. The text might for example be…

Lorem ipsum dolor sit [START]amet, consectetuer adipiscing elit. Quisque arcu elit, ultricies ut, tempor in, eleifend eget, ante. Etiam dui. Vivamus feugiat imperdiet enim. Integer ullamcorper tortor et ipsum. Etiam at dui. Maecenas dictum, odio[END] at consequat sagitti.

Anything between [START] and [END] would have to be deleted. There might be different fonts in between. However, I would appreciate it if each of the words [START] and [END] were not split into different runs.
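
The task can be sketched without requiring the markers to sit in a single run. The following Python sketch is an illustration only and does not use the Aspose.Words API: runs are modelled as a list of strings, `delete_marked` is a hypothetical helper, and it assumes that the markers themselves are deleted together with the text between them. It locates the markers in the combined text and then trims each run by its offset.

```python
# Assumption: runs are plain strings; formatting is ignored in this model.

def delete_marked(runs, start="[START]", end="[END]"):
    """Return the run texts with the first start..end span removed,
    markers included. Runs that become empty are dropped."""
    text = "".join(runs)
    begin = text.find(start)
    if begin == -1:
        return list(runs)
    stop = text.find(end, begin + len(start))
    if stop == -1:
        return list(runs)
    stop += len(end)  # delete the end marker as well

    result = []
    offset = 0  # position of the current run within the combined text
    for run in runs:
        run_start = offset
        offset += len(run)
        # Part of this run that lies before the span, and part after it.
        before = run[: max(0, min(len(run), begin - run_start))]
        after = run[max(0, min(len(run), stop - run_start)):]
        kept = before + after
        if kept:
            result.append(kept)
    return result

# Both markers are split across run boundaries here.
runs = ["sit [ST", "ART]amet, ", "odio[E", "ND] at"]
print(delete_marked(runs))  # ['sit ', ' at']
```

Because each surviving run keeps its own text, the formatting of the untouched parts would be preserved in a real document model.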

Michael G. Schneider

When we replace something in the text we don’t join runs. We remove parts of existing runs or whole runs and insert what is needed.
Your task is what I expected. If you are changing text between some [start] and [end], you will find the bookmark approach useful. Type some text in your document. It can be split into several runs or even paragraphs and have different formatting. Select it, then open the dialog Insert → Bookmark. Type any string there; it will be your bookmark’s name, for instance “FragmentToSubst”. Note that bookmark names must be unique within the document.
Now you can work with this bookmark programmatically. See this article in Aspose.Words documentation:
https://docs.aspose.com/words/net/working-with-bookmarks/
Let me know whether it helps in your case.
Good luck,

Thanks a lot for the answer.

I do not know what Aspose.Words does with runs internally. However, the result of a Replace seems promising. Suppose a Word document contains the word

Michael

Due to the problems that you explained, it might be distributed across three runs

Run 1: Mic
Run 2: ha
Run 3: el

Now, if I perform a

doc.Range.Replace("Michael", "Michael", True, False)

and then save the document, only one run with the full text will survive.

Run: Michael

I am currently playing with the ReplaceEvaluator. It seems possible to perform a document wide replacement with the help of a regular expression, and all unwanted runs are removed.
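
What I am after is, in effect, a merge of neighbouring runs that share the same formatting. As a rough illustration, here is a minimal Python sketch. It is not the Aspose.Words API and not what Replace does internally; runs are modelled as hypothetical (text, format) tuples.

```python
from itertools import groupby

# Assumption: a run is a (text, fmt) tuple, where fmt is any comparable
# value standing in for the run's formatting.

def merge_runs(runs):
    """Merge adjacent runs whose formatting compares equal."""
    return [
        ("".join(text for text, _ in group), fmt)
        for fmt, group in groupby(runs, key=lambda run: run[1])
    ]

runs = [("Mic", "plain"), ("ha", "plain"), ("el", "plain")]
print(merge_runs(runs))  # [('Michael', 'plain')]
```

Runs with different formatting stay separate, so a bold run in the middle of a word would still split it.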

You are right. Bookmarks might be used for tasks like the one that I described. However, I do not like bookmarks for two main reasons.

  1. The bookmarks’ names have to be unique. Often the variable (the condition controlling the substitution) appears in the document several times. Then you have to create unique names for each appearance.
  2. The bookmarks are not visible. I have not been able to make the bookmark’s name appear next to the place where it is used. So it is difficult to read and understand such a document. The “logic” behind those bookmarks is not easy to see.

Michael G. Schneider

I believe that your task can be approached in many ways. It’s nice that you have found one which fits your needs. Please feel free to ask any questions at any time. We are glad to help you.
Thank you,