Know Your Content
We have already mentioned that the XML parser used in Cocoon can validate the XML data it parses, using either DTDs or XML Schemas. While you are building the application, however, you will probably not yet have a DTD for all of your data. In that case you cannot use XML validation in Cocoon, because validation can be activated only for all documents, not for individual ones. Even if you do not have the parser validate your data, you should document your XML with either a DTD or an XML Schema before moving the application into a production environment. (Of course, the earlier the data's format is documented, the better.)
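Because Cocoon's parser turns validation on or off for all documents at once, it can be useful to check individual documents outside Cocoon, for instance documents arriving from one particular source. The following is a minimal sketch using the JDK's standard javax.xml.validation API with an XML Schema. The schema, its news/title/body elements, and the NewsValidation class are hypothetical examples, not part of Cocoon.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class NewsValidation {
    // Hypothetical schema for a minimal news item: a required title followed by a body.
    static final String NEWS_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='news'><xs:complexType><xs:sequence>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='body' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if xml conforms to the schema in xsd; malformed or invalid documents return false.
    static boolean isValid(String xsd, String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            // The schema itself could not be compiled: a programming error, not bad content.
            throw new IllegalArgumentException("invalid schema", e);
        }
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validation error is thrown as a SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call such as `NewsValidation.isValid(NewsValidation.NEWS_XSD, submittedXml)` can then gate a single upload without switching on validation for every document Cocoon parses.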
As more and more XML tools come onto the market, they offer advanced features such as automatically validating the data you enter into, say, an editor. Now suppose you have a Cocoon-based system and authors who write content for it. Often they will use third-party tools to do this and then upload the content to the system or deploy it by some other means (perhaps saving it to a database). In this situation it is ideal if you can give these authors a DTD for the data. They can then use the DTD inside their editing program, and you know that the data they submit will be in the format you expect and have written stylesheets for.
While the designers are developing the stylesheets that will present the data, that data also needs to be defined and documented.
Document Your Data Sources
We talked briefly about external data sources when we discussed application performance. However, other factors also need to be taken into account when data is obtained from an external provider, such as a news feed.
The most important point is that you know exactly what format the data will be in. The best way to ensure this is for the data's format to be documented in some way, such as in a DTD. You read about the various ways of documenting XML data in Chapter 2. It is an enormous advantage if your provider can send you the data in a standardized format. This saves a great deal of time when you have to integrate several sources that can all provide their data in the same format, because you can then reuse the stylesheets. This was the case with the news providers we used when building the Cocoon news portal in this book: because they deliver their news in RSS format, the same stylesheet works for several different feeds.
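Because RSS feeds share one format, a single stylesheet can serve all of them. The sketch below illustrates the idea outside Cocoon with the JDK's JAXP API: the stylesheet is compiled once into a Templates object and then applied to each feed in turn. The stylesheet (which simply lists item titles), the RssHeadlines class, and the assumption of RSS 2.0 feeds with an rss/channel/item structure are all illustrative choices, not the portal's actual code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RssHeadlines {
    // Hypothetical stylesheet: writes the title of every RSS 2.0 item on its own line.
    static final String RSS_TO_TEXT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='rss/channel/item'>"
        + "<xsl:value-of select='title'/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet once and applies it to each feed in turn.
    static List<String> transformAll(String xslt, List<String> feeds) {
        try {
            Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
            List<String> results = new ArrayList<>();
            for (String feed : feeds) {
                StringWriter out = new StringWriter();
                // A Transformer is not thread-safe, so each feed gets a fresh one;
                // the compiled Templates object is the part that is shared.
                compiled.newTransformer().transform(
                    new StreamSource(new StringReader(feed)), new StreamResult(out));
                results.add(out.toString());
            }
            return results;
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

Adding another RSS feed then means adding another entry to the list, not writing another stylesheet.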
When designing the flow of data through your application, you need to consider two important points. The first is the internal data definition. As shown in Figure 11.1, this is the format of the news data within your application. Every external data format must be converted into this format, so you need a stylesheet for every data source that uses a different format. It therefore makes sense to choose a standardized format as your internal format: sources that already deliver data in that format need no stylesheet transformation at all, which reduces the number of transformations you have to write.
The second point is the logical layout format. News data is not normally structured for presentation, so you need to define a format that can be transformed into the end format, such as HTML or PDF. If your application publishes other types of information besides news, consider defining a logical layout format that is not data-specific. This lets you easily publish different types of data using the same stylesheets.
If you opt to use a standard format such as WML or XHTML as your logical layout format, make sure you will still be able to convert this format into a different layout, as shown on the right side of Figure 11.1.
This concept leaves you with three different transition areas:
Incoming data must be transformed into your news data format.
The news format must then be transformed into the logical layout format.
Finally, the logical layout format must be transformed into the final output format, such as HTML or PDF.
Figure 11.1 Format transitions using stylesheets.
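In Cocoon these three transitions would normally be successive XSLT transformers in a sitemap pipeline. The sketch below steps outside Cocoon and uses the JDK's standard JAXP API (javax.xml.transform) to show the same three-stage chain: incoming RSS to an internal news format, news format to a logical layout format, and layout format to HTML. The element names news, headline, document, and para, the three stylesheets, and the NewsPipeline class are hypothetical stand-ins for your own formats.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NewsPipeline {
    private static final String XSL_OPEN =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Stage 1 (hypothetical): incoming RSS -> internal news format.
    static final String RSS_TO_NEWS = XSL_OPEN
        + "<xsl:template match='/'><news>"
        + "<xsl:for-each select='rss/channel/item'>"
        + "<headline><xsl:value-of select='title'/></headline>"
        + "</xsl:for-each></news></xsl:template></xsl:stylesheet>";

    // Stage 2 (hypothetical): internal news format -> logical layout format.
    static final String NEWS_TO_LAYOUT = XSL_OPEN
        + "<xsl:template match='/news'><document>"
        + "<xsl:for-each select='headline'><para><xsl:value-of select='.'/></para></xsl:for-each>"
        + "</document></xsl:template></xsl:stylesheet>";

    // Stage 3 (hypothetical): logical layout format -> HTML output format.
    static final String LAYOUT_TO_HTML = XSL_OPEN
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/document'><html><body>"
        + "<xsl:for-each select='para'><p><xsl:value-of select='.'/></p></xsl:for-each>"
        + "</body></html></xsl:template></xsl:stylesheet>";

    private static final TransformerFactory FACTORY = TransformerFactory.newInstance();

    // Applies one stylesheet to one document, both given as strings.
    static String apply(String xslt, String xml) {
        try {
            Transformer t = FACTORY.newTransformer(new StreamSource(new StringReader(xslt)));
            // Keep intermediate XML results free of XML declarations.
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    // Runs the three format transitions in order: source -> news -> layout -> HTML.
    static String render(String rss) {
        String news = apply(RSS_TO_NEWS, rss);
        String layout = apply(NEWS_TO_LAYOUT, news);
        return apply(LAYOUT_TO_HTML, layout);
    }
}
```

Only the first stylesheet depends on the source format, and only the last depends on the output format; producing PDF instead of HTML would mean replacing stage 3, while a second news source in another format would need only its own version of stage 1.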
Check whether your data source is always online. Nothing is more embarrassing than discovering, when your news portal crashes on its first night, that your news provider is online only during the day. Use appropriate selectors in the pipeline so that you access the online server during the day and, perhaps, a database repository at night.
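In Cocoon, the day/night decision would be made by a selector in the sitemap. The sketch below does not use Cocoon's selector API; it is a plain Java illustration of the decision logic only. The 06:00 to 22:00 service window and the names FeedSourceSelector and Source are assumptions chosen for illustration; substitute your provider's actual hours.

```java
import java.time.LocalTime;

class FeedSourceSelector {
    enum Source { LIVE_PROVIDER, LOCAL_REPOSITORY }

    // Hypothetical service window of the news provider (start inclusive, end exclusive).
    static final LocalTime ONLINE_FROM = LocalTime.of(6, 0);
    static final LocalTime ONLINE_UNTIL = LocalTime.of(22, 0);

    // Picks the live provider inside its service window and the local repository otherwise.
    // The caller is assumed to pass the time in the provider's time zone.
    static Source select(LocalTime now) {
        boolean online = !now.isBefore(ONLINE_FROM) && now.isBefore(ONLINE_UNTIL);
        return online ? Source.LIVE_PROVIDER : Source.LOCAL_REPOSITORY;
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside `select`, keeps the rule easy to test for the night-time case before the portal goes live.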
Make sure you can obtain the data you need with as few requests as possible. We have seen a Cocoon-based application built to present stock information in which a single block of information (such as an overview page) required the middleware to send more than 20 requests to the data provider. Even worse, most of these requests had to be sent in sequence, because they depended on each other. The problem isn't that Cocoon can't handle this; it can. But if you recall the earlier tips on performance, you will see why this point is worth stressing.
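A rough latency model shows why. It is a simplification that ignores connection setup, server load, and Cocoon's own processing, and the 100 ms per-request figure used below is purely an assumed value. When requests must be sent in sequence, their latencies add up; requests that are independent of each other could in principle be issued concurrently, so their combined wait approaches that of the slowest one:

```latex
T_{\text{sequential}} = \sum_{i=1}^{n} t_i,
\qquad
T_{\text{concurrent}} \approx \max_{1 \le i \le n} t_i
```

With n = 20 and an assumed t_i = 100 ms for every request, T_sequential = 20 × 100 ms = 2 s for a single overview page, before any transformation takes place. Because dependent requests cannot be overlapped, the only effective remedy is to reduce n itself.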
After you've defined the functions your system should have, the layout you want to present to the user, and the data format that is to be the core of your application, you need to look at the Cocoon components you can use to do all this.