Separation of content and format
If there's a single characteristic that impedes the effectiveness of traditional authoring tools, such as word processors, it is their focus on formatting. Traditional tools have been designed to make it easy for authors to make documents look good. In doing so, they have turned authors into desktop publishers. From the perspective of reuse, this is a problem, for two reasons.
First, word processors began life as an alternative to typewriters. They allowed authors to make documents attractive and, potentially, more usable. But they were still very typewriter-like because their focus was the current document. Authors entered the characters that formed the content, then selected those characters to apply the formatting. This made it hard to repeat formatting consistently within a document; it relied on authors remembering that a section title was 18 pt. Helvetica Bold centered on a 36 pica line. The result was inconsistent formatting for all but the most dedicated authors. Later versions allowed authors to create "styles": named sets of formatting that could be applied as required. But even now, none of the word processors provides any functionality to ensure that formatting is applied consistently. Authors can define new styles, redefine existing styles, or ignore them altogether. Yet consistency (or, more accurately, predictability) in the application of style names is vital for reuse.
Second, all that formatting power comes at a price. Simply put, you end up with large, bloated data files that contain not only the content but also all the details of its formatting. Further, that formatting is specific to the output the tool is designed to support, and most word processing applications, not surprisingly, are biased toward paper. This complicates reuse because the formatting must be removed to make the content independent of any one output; to reuse the content, authors must then apply formatting appropriate to each output. Stripping and reapplying formatting is tricky and usually not 100% effective. Format conversions always require correction by hand or complicated scripting.
For reuse, XML has a significant advantage over traditional word processors. XML grew out of the original goal of markup languages: making documents transportable across systems and applications. The proponents of markup languages knew that embedded formatting commands and binary file formats were the main impediments to cross-platform transportability. Their solution was to separate format from content. XML describes the structure of a document, not its presentation. The presentation information (styles) is maintained in separate files, called style sheets, that are associated with the document when it is published or used.
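To make the idea concrete, here is a minimal Python sketch, not drawn from any particular publishing system, in which a single XML fragment is rendered by two separate "style sheets." The element names (procedure, title, step) and the functions render_print and render_html are hypothetical stand-ins; in practice this role is usually played by XSLT or CSS rather than hand-written code.

```python
import xml.etree.ElementTree as ET

# Content only: structure and text, no formatting (hypothetical element names).
CONTENT = """<procedure>
  <title>Replacing the filter</title>
  <step>Turn off the unit.</step>
  <step>Remove the cover.</step>
</procedure>"""


def render_print(root):
    """Hypothetical 'print' style sheet: uppercase title, numbered steps."""
    lines = [root.findtext("title").upper()]
    for n, step in enumerate(root.findall("step"), start=1):
        lines.append(f"{n}. {step.text}")
    return "\n".join(lines)


def render_html(root):
    """Hypothetical 'web' style sheet: the same content emitted as HTML."""
    items = "".join(f"<li>{step.text}</li>" for step in root.findall("step"))
    return f"<h2>{root.findtext('title')}</h2><ol>{items}</ol>"


root = ET.fromstring(CONTENT)
print(render_print(root))
print(render_html(root))
```

Neither renderer changes CONTENT; adding a third output means writing a third style function, not editing the document.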
The separation of content and format offers immense flexibility. For example, the sample XML procedure includes a <note> element. Traditionally, a note is formatted for output along the lines shown in Figure 1.
Figure 1 Simple formatted note.
The signal word "NOTE" is not part of the actual content; it is supplied by the style sheet. Through the style sheet, the signal word could just as easily be replaced by a signal icon (see Figure 2).
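The difference between Figures 1 and 2 can be sketched in a few lines of Python, assuming a toy "style sheet" modeled as a dictionary that maps element names to rendering rules. TEXT_STYLE, ICON_STYLE, render, and the file name note-icon.png are all hypothetical; the point is only that the stored content never contains the word "NOTE," so swapping the style changes the signal without touching the content.

```python
import xml.etree.ElementTree as ET

# The content carries only the note text; no "NOTE" label is stored in it.
NOTE = "<note>Back up your data before upgrading.</note>"

# Two hypothetical style sheets; each decides how a <note> is signaled.
TEXT_STYLE = {"note": lambda text: f"NOTE: {text}"}
ICON_STYLE = {"note": lambda text: f'<img src="note-icon.png" alt="Note"/> {text}'}


def render(xml_text, style):
    """Apply the rule registered in `style` for the root element's tag."""
    elem = ET.fromstring(xml_text)
    return style[elem.tag](elem.text)


print(render(NOTE, TEXT_STYLE))  # signal word, as in Figure 1
print(render(NOTE, ICON_STYLE))  # signal icon, as in Figure 2
```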
Figure 2 Note with signal icon.