OOXML: Don't Repeat Yourself 30 Times!

From the first day I saw the OOXML specification, it struck me how "strange" it is. You know borders in MS Word: text, paragraph and table borders. The complete specification for a border takes 6 pages in OOXML. I'd say that is a bit much for something as simple as a border, but that's okay; I guess the more detailed the spec, the better.

The problem is that this 6-page border specification is repeated in the OOXML document 30 times. Yes, for every left, right, bottom and top border, and for every type of object, you get those 6 pages again and again.

As you know, any decent programming book will tell you "don't repeat yourself" and "duplication is bad".

To me it looks like defining a simple base data type, such as integer, and then repeating its full specification in every place in the documentation where it is used. I am completely lost as to why things are done this way in OOXML, and borders are not the only thing that is duplicated.
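
To make the analogy concrete, here is a minimal sketch of "define once, reference everywhere". The class names and fields (`Border`, `Borders`, `Paragraph`, `TableCell`, the width unit) are invented for illustration and are not taken from the OOXML schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Border:
    """Hypothetical, simplified border. This is the only place it is described."""
    style: str = "single"
    width: int = 4       # assumed unit for this sketch: eighths of a point
    color: str = "auto"


@dataclass(frozen=True)
class Borders:
    """Four sides, each one referring to the single Border definition above."""
    top: Border = Border()
    left: Border = Border()
    bottom: Border = Border()
    right: Border = Border()


@dataclass(frozen=True)
class Paragraph:
    borders: Borders = Borders()


@dataclass(frozen=True)
class TableCell:
    borders: Borders = Borders()
```

A change to `Border` is made in one place and every object that uses it picks it up, which is what a spec gets when it describes a shared type once and cross-references it.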

Removing such basic duplication could have made the OOXML spec a lot shorter. Removing the border duplication alone could have made it almost 200 pages shorter.
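
A rough check of that estimate, using only the two figures quoted above (6 pages, 30 occurrences) and assuming that a cross-reference to a shared definition takes negligible space:

```python
BORDER_SPEC_PAGES = 6     # pages per border specification, as quoted in the post
BORDER_OCCURRENCES = 30   # times the specification appears, as quoted in the post


def duplicated_pages(pages: int, occurrences: int) -> int:
    """Pages used when the full specification is repeated at every occurrence."""
    return pages * occurrences


def linked_pages(pages: int) -> int:
    """Pages used when the specification is written once and only referenced
    elsewhere (assumption: the references themselves cost no pages)."""
    return pages


saved = duplicated_pages(BORDER_SPEC_PAGES, BORDER_OCCURRENCES) - linked_pages(BORDER_SPEC_PAGES)
print(saved)  # 174
```

That puts the saving for borders at about 174 pages under these assumptions.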

It will be very interesting to see how the OOXML spec turns out when it is final and published by ISO. I cannot find any information on this, and my gut feeling is that the duplication will remain, which will make it a very sad standard.

Here is more on DRY: http://en.wikipedia.org/wiki/Don't_repeat_yourself

On the bright side, the more complex the standard, the better for component vendors such as Aspose!

You've got to be kidding if you try to work with OOXML documents without a decent class library such as Aspose.Words. It took us a year to support OOXML to a reasonable level of conformance (even with all the experience with MS Word formats we already had), and we are still working on it.

Your thoughts seem to support the detractors of OOXML as an ISO standard for portable documents. All I can say is that I’m grateful that you folks have taken on the challenge of sorting through the spec, AND for supporting the ODF format (for output at least).

Thanks for the comments.

ODT will be supported for export and import.

Whatever the standards are, we will support all formats that users ask for.

OOXML and ODT both have enough technical flaws.

Repeating the specifications may mean that they will differ in the future, or that MS does not want developers to assume they will always be identical (to keep its hands free when creating new versions of the specification).

Meh. DRY is good for software development, but not necessarily a rule for informational/descriptive documents, where repetition may in fact help the reader.



For instance, if I wanted to look at the spec for a paragraph, I wouldn’t want to go flipping back to where borders were described, or any of the other elements that might be part of a paragraph but described elsewhere.



You could always load up the spec document (using Aspose.Words) and produce a ‘linked’ rather than ‘aggregated’ version pretty easily.