Words are split across runs

dleifker · November 8, 2013, 1:25pm

Our company is evaluating Aspose.Words (Java) and we simply need to pull out all the italicized words from a document. No changes to the document are necessary.

The runs we get back are splitting words. For example, the word “Lesson” comes back as 2 runs, “L” and “esson”, even though “Lesson” is a single word in the document and has one style for the entire word. Attached is a short file that exhibits this behavior for me.

Previous posts on this Forum reveal that Word itself may, for complex reasons, split a word into multiple runs and that Aspose.Words will not merge them. Those posts were from many years ago, however, and I cannot find any recent messages on this topic.

Is there any update or known workaround now? If not, I will have to write code that merges adjacent runs when they have the same style and are not separated by new paragraphs, white space, etc. Is this in fact the best solution? Thank you.

awais.hafeez · November 10, 2013, 10:10am

Hi Daniel,

Thanks for your inquiry. Please use the Document.JoinRunsWithSameFormatting optimization method to work around this problem. Some documents contain adjacent runs with the same formatting; this usually occurs if a document was heavily edited by hand. Joining these runs reduces the document size and speeds up further processing.
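
To illustrate what joining adjacent runs with the same formatting amounts to, here is a minimal, self-contained sketch in plain Java. It does not call Aspose.Words and is not the library's implementation: the TextRun class and the joinRuns and italicWords helpers are hypothetical names invented for this example, and formatting is reduced to a single italic flag. In the Java edition of the library the method name is, as far as I know, written in camelCase (joinRunsWithSameFormatting); please check the API reference for your version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a document run: a piece of text plus one formatting flag.
// This is NOT the Aspose.Words Run class.
final class TextRun {
    final String text;
    final boolean italic;

    TextRun(String text, boolean italic) {
        this.text = text;
        this.italic = italic;
    }
}

final class RunJoiner {
    // Merge neighbouring runs whose formatting is identical, preserving order.
    static List<TextRun> joinRuns(List<TextRun> runs) {
        List<TextRun> joined = new ArrayList<>();
        for (TextRun run : runs) {
            int last = joined.size() - 1;
            if (last >= 0 && joined.get(last).italic == run.italic) {
                // Same formatting as the previous run: append the text to it.
                TextRun prev = joined.get(last);
                joined.set(last, new TextRun(prev.text + run.text, prev.italic));
            } else {
                joined.add(run);
            }
        }
        return joined;
    }

    // Collect the whitespace-separated words of every italic run, after joining.
    static List<String> italicWords(List<TextRun> runs) {
        List<String> words = new ArrayList<>();
        for (TextRun run : joinRuns(runs)) {
            if (!run.italic) {
                continue;
            }
            for (String word : run.text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "Lesson" arrives split into "L" + "esson", as in the attached document.
        List<TextRun> runs = List.of(
            new TextRun("L", true),
            new TextRun("esson", true),
            new TextRun(" one", false));
        System.out.println(italicWords(runs)); // prints [Lesson]
    }
}
```

With a real document, the same idea would be to join the runs first and then walk the remaining runs, keeping those whose font is italic.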

Best regards,

dleifker · November 10, 2013, 5:20pm

Awais: Thank you so much! This solved our problem completely. Our evaluation looks like it will be very successful now.

awais.hafeez · November 11, 2013, 1:46am

Hi Daniel,

It’s great you were able to find what you were looking for. Please let us know any time you have any further queries. We’re always glad to help you.

Best regards,