Practical Examples and Tips
This chapter has covered a lot of topics so far. Hopefully you have been able to use some of these new features to extend an application you already have built. We will now look at some examples and give you a few tips on getting the most out of Cocoon when you use it to build applications that other people might also use.
The following two examples help you understand the components and concepts presented so far. The first is a small example showing how to use the sql transformer to fetch data from a database. If you have problems connecting to the database, you might also need the log transformer in this example. The second is a larger, real-world example: the Cocoon Documentation System. This system uses nearly all the concepts explained so far.
We will then look at how you can make sure that your Cocoon application is set up to handle all the requests it might receive when you release it into a production environment.
A SQL Example
The following example requires a database that can be used from Java, so you need a JDBC driver. Instead of using your own database, you can use the HSQLDB database that ships with the Cocoon distribution. It is written entirely in Java and can be started automatically when Cocoon starts.
However, if you want to use your own database, you have to include a suitable driver. Put the driver either as a JAR file in Cocoon's WEB-INF/lib directory or as a class file in the WEB-INF/classes directory. To make the driver available, you have to add it to the list of loaded classes in the web application deployment descriptor (web.xml), as shown in Listing 6.24. The load-class parameter takes a list of classes that are loaded automatically at startup.
Listing 6.24 Adding Drivers
<!--
  This parameter is used to list classes that should be loaded
  at initialization time of the servlet.
  Usually these classes are JDBC Drivers used
-->
<init-param>
  <param-name>load-class</param-name>
  <param-value>
    <!-- For HSQLDB: -->
    org.hsqldb.jdbcDriver
    <!-- ODBC -->
    sun.jdbc.odbc.JdbcOdbcDriver
  </param-value>
</init-param>
Next you have to add a connection to your database in cocoon.xconf. Listing 6.25 is an excerpt from cocoon.xconf that shows a custom connection called personnel.
Listing 6.25 Configuring Data Sources
<datasources>
  <jdbc name="personnel">
    <dburl>jdbc:hsqldb:hsql://localhost:9002</dburl>
    <user>sa</user>
    <password></password>
  </jdbc>
</datasources>
For this connection, you can define the URL to the database, the username, and the password. These three settings depend on which database you use. The user and password might be optional. If you want to use the HSQLDB, the values shown here should work right out of the box.
After you have defined your database connection, you can use it in the sql transformer by specifying the use-connection element for the transformer. Save the XML document shown in Listing 6.26 to the Cocoon context directory, and name it sqlexample.xml.
Listing 6.26 A Simple SQL Example
<document>
  <sql:execute-query xmlns:sql="http://apache.org/cocoon/SQL/2.0">
    <sql:use-connection>personnel</sql:use-connection>
    <sql:query>
      select id,name from department
    </sql:query>
  </sql:execute-query>
</document>
If you are using your own database, you might need to adjust the select statement. A stylesheet for the SQL data, transforming it to a simple HTML table, could look like Listing 6.27. Save this stylesheet, and name it sqlexample.xsl.
Listing 6.27 A Simple SQL Stylesheet
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sql="http://apache.org/cocoon/SQL/2.0">

  <xsl:template match="document">
    <html><body><table>
      <xsl:apply-templates select="sql:rowset/sql:row"/>
    </table></body></html>
  </xsl:template>

  <xsl:template match="sql:row">
    <tr>
      <xsl:apply-templates/>
    </tr>
  </xsl:template>

  <xsl:template match="sql:id|sql:name">
    <td>
      <xsl:value-of select="."/>
    </td>
  </xsl:template>

</xsl:stylesheet>
Again, if you use a custom database and table, you might have to adjust the stylesheet to reflect different column names. The pipeline for this example is very simple. It is shown in Listing 6.28.
Listing 6.28 A Sample SQL Pipeline
<map:pipeline>
  <map:match pattern="sqldocument">
    <map:generate src="sqlexample.xml"/>
    <map:transform type="sql"/>
    <map:transform src="sqlexample.xsl"/>
    <map:serialize/>
  </map:match>
</map:pipeline>
Start your browser, and request http://localhost:8080/cocoon/sqldocument. You will get the XML data from the database displayed as an HTML table. If you are using your custom database and you face any problems, add the log transformer after the sql transformer to see what data is coming from your database.
Using databases with Cocoon is very easy, as you can see from this example. To demonstrate more of the features introduced in this chapter, we will now look at a larger working example.
The Cocoon Documentation System
One of the best sample applications using many of the features we have described in this and the previous chapters is the Cocoon Documentation System. Because Cocoon itself is an XML publishing framework, the documentation is, of course, generated by Cocoon. Some of the features the documentation system uses include content aggregation, subsitemaps, the cocoon protocol, and image generation using SVG. All these features allow the documentation to be written in a fashion that separates the content from the layout.
Because this example is rather complex and uses many resources, we will examine only the basic idea behind this system. In addition, we will look at some excerpts from each of the files. If you're interested in seeing more than what's presented here, the whole system can be found inside the Cocoon distribution.
The Cocoon documentation (see Figure 6.9) is served by a subsitemap that is independent of the main sitemap. (You will find the subsitemap and all other resources in the documentation directory of your Cocoon context directory.)
Figure 6.9 The Cocoon documentation.
The documentation is currently available in HTML. Each HTML page consists of a static header, a navigation bar on the left side, and the content for the current document on the right.
The navigation bar is created by an index, which is called a book in Cocoon. The documentation is arranged in several hierarchically nested books. There is one main book, and it contains documents and subbooks. You can compare this to a directory structure such as your filesystem. A book is similar to a directory: It has a name, and it contains documents (files) or other books (directories).
As you might have already guessed, each HTML document is created using content aggregation and the cocoon protocol. Let's have a look at the sitemap entries, shown in Listing 6.29.
Listing 6.29 An Excerpt from the Cocoon Documentation Sitemap
<map:pipeline>

  <map:match pattern="*.html">
    <map:aggregate element="site">
      <map:part src="cocoon:/book-{1}.xml"/>
      <map:part src="cocoon:/body-{1}.xml"/>
    </map:aggregate>
    <map:transform src="stylesheets/site2xhtml.xsl">
      <map:parameter name="use-request-parameters" value="true"/>
      <map:parameter name="header" value="graphics/{1}-header.jpg"/>
    </map:transform>
    <map:serialize/>
  </map:match>

  <map:match pattern="**book-**.xml">
    <map:generate src="xdocs/{1}book.xml"/>
    <map:transform src="stylesheets/book2menu.xsl">
      <map:parameter name="use-request-parameters" value="true"/>
      <map:parameter name="resource" value="{2}.html"/>
    </map:transform>
    <map:serialize type="xml"/>
  </map:match>

  <map:match pattern="body-**.xml">
    <map:generate src="xdocs/{1}.xml"/>
    <map:transform src="stylesheets/document2html.xsl"/>
    <map:serialize/>
  </map:match>

</map:pipeline>
The document names available from these pipelines do not follow our recommendation: They use explicit endings such as .xml and .html. The HTML document is aggregated from two parts: a part called book and a part called body. The book part reads the current book and creates the navigation bar from it. This navigation bar is transformed by a stylesheet to partial XHTML.
The body part reads the real content from the XML document and transforms it into partial XHTML as well. The main pipeline for the document aggregates these two parts and combines the XHTML fragments using a stylesheet. It also adds the constant header.
The navigation bar and title displayed in the document's header are actually images. These images are rendered using SVG. We left out the pipelines for the images, but they are specified in the installed Cocoon application. Inside the Cocoon context directory is a directory called documentation. This directory contains a subsitemap named sitemap.xmap that contains all pipelines for the whole documentation system.
This example of a real application shows how a web site can be built very easily with Cocoon. By using content aggregation, you separate the different parts of one document and can maintain them more easily. Just take your time and have a look at this application and how it works. It will help you understand the concepts you have learned so far. You will also get a look behind the scenes of Cocoon's documentation system.
Of course, one of the most important features of any Internet application, such as the documentation system or a portal built with Cocoon, is how fast the required information is returned. After all, no one wants to wait around for minutes until the browser displays the requested document. Cocoon provides two methods of speeding up the application: pipeline caching and component pooling.
The Cocoon Caching Mechanism
As you have seen, Cocoon generates documents using pipelines that contain a variety of components. You have seen that each time a request reaches a pipeline, the required document is generated and returned to the calling application. Using Cocoon's caching mechanism, you can control whether the document is actually generated or whether it can be returned from a cache. This speeds up the time it takes to return the document, because the pipeline does not have to be processed completely. Cocoon's caching algorithm is very flexible, but fortunately it is also very easy to handle. Let's start with a description of the caching algorithm.
Cocoon generates a stream pipeline for each request. This stream pipeline either is a reader or consists of an event pipeline and a serializer. The event pipeline in turn is assembled from a generator and any transformers that are used.
Cocoon's caching algorithm can cache the result of a stream pipeline and/or an event pipeline. The caching for such a pipeline is turned on or off in cocoon.xconf (see Listing 6.30). Because everything in Cocoon is implemented using Avalon components, you simply specify which implementation for an event or stream pipeline should be used: the caching or the noncaching one. You will learn more about these components when we explain Cocoon from the developer perspective in Chapter 8.
Listing 6.30 Turning on Caching in cocoon.xconf
<event-pipeline
    class="org.apache.cocoon.components.pipeline.CachingEventPipeline"/>
<stream-pipeline
    class="org.apache.cocoon.components.pipeline.CachingStreamPipeline"/>
These lines turn on caching for both pipelines. The code shown in Listing 6.31 turns it off. Of course, you can mix it and turn on caching for event pipelines but not for stream pipelines. If you want to change your setting, locate the lines for event-pipeline and stream-pipeline in your cocoon.xconf and change the class attribute.
Listing 6.31 Turning off Caching
<event-pipeline
    class="org.apache.cocoon.components.pipeline.NonCachingEventPipeline"/>
<stream-pipeline
    class="org.apache.cocoon.components.pipeline.NonCachingStreamPipeline"/>
But what does it mean if caching is turned on? The following explanation is simplified for the user perspective. We will look at the full power of the caching algorithm in Chapter 8.
For now, let's start with the stream pipelines. The result of a stream pipeline can be cached if, for example, the pipeline is a reader that supports caching. So we can rephrase the question: When can a reader cache?
A reader (and this is also true for the other sitemap components, as you will soon see) can cache if it can detect that the content has changed since it was last read. For example, the resource reader reads a file. It can detect whether the file has changed by looking at what time the file was last changed.
So the first time the resource reader reads a document, the caching algorithm stores this document, along with the current time. The next time this document is requested, the caching algorithm provides this time to the reader, which simply checks whether the cached content is still valid. If it is, the cache serves the document. If it is not valid, the cached content is discarded, the reader reads the file again, and the cache stores this along with the current time.
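The check described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Cocoon's implementation: the class ValidityCacheSketch, the Source interface, and the fileSource helper are hypothetical names, and the sketch records the source's modification time instead of the wall-clock time, which is a simplification of the algorithm in the text.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a last-modified validity check; not Cocoon API. */
public class ValidityCacheSketch {

    /** Assumed abstraction of anything that can report when it last changed. */
    public interface Source {
        long lastModified();   // modification time in milliseconds
        String read();         // loads the current content
    }

    /** Cached content plus the modification time seen when it was read. */
    private static final class Entry {
        final String content;
        final long lastModified;

        Entry(String content, long lastModified) {
            this.content = content;
            this.lastModified = lastModified;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private int reads = 0;

    /** Returns cached content while the source is unchanged, else rereads it. */
    public String get(String key, Source source) {
        long modified = source.lastModified();
        Entry cached = entries.get(key);
        if (cached != null && cached.lastModified == modified) {
            return cached.content;            // cached copy is still valid
        }
        String fresh = source.read();         // first request or stale entry
        reads++;
        entries.put(key, new Entry(fresh, modified));
        return fresh;
    }

    /** Number of times the source actually had to be read. */
    public int reads() {
        return reads;
    }

    /** Adapts a file on disk to the Source interface. */
    public static Source fileSource(File file) {
        return new Source() {
            @Override public long lastModified() {
                return file.lastModified();
            }
            @Override public String read() {
                try {
                    return new String(Files.readAllBytes(file.toPath()),
                                      StandardCharsets.UTF_8);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```

Calling get twice for an unchanged file reads it from disk only once. A source that cannot report a reliable modification time, such as a document fetched over HTTP, has no meaningful lastModified value, which is why such content cannot be cached this way.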
But there are cases in which the reader cannot detect content changes, for example, when it fetches the file via HTTP or another network connection. In this case, the reader can't support caching, so nothing is cached. This means that even though Cocoon provides a means of caching pipelines, it still depends on the data source to provide a means of determining whether the content has changed since it was last accessed.
If the stream pipeline consists of an event pipeline and a serializer, both parts must support caching. Most serializers in Cocoon support caching, because they are only dependent on the XML they receive from the event pipeline.
The question of whether an event pipeline can be cached is more complex, because the pipeline consists of several components. It is completely cacheable only if all the components are themselves cacheable. In the event pipeline, the caching algorithm asks each component if it supports caching, starting with the generator. For each component that supports it, a unique key is generated. Then the next pipeline component is queried. This process continues until either all components are queried or one component is not cacheable.
The keys of all cacheable components are chained, and together they build the cache key. The request is processed, and the document is built. The cache stores the result of the last component that indicated cacheability. The next time this document is requested, the key is built, and the cached content is fetched from the cache.
Next, the cache asks all components of the event pipeline if their input has changed since the time the content was cached. For example, the generator checks this by looking at the last modification date of the XML document, the xslt transformer checks the date of the stylesheet, and so on. The cached content is used only if all components report that it is still valid. Otherwise, the document is generated from scratch. So the event pipeline tries to cache as much of the XML processing pipeline as possible.
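The key chaining and the validity check can be sketched as follows. This is a simplified illustration, not Cocoon's code: PipelineKeySketch, Component, Fixed, and the key strings are hypothetical, and a null key is assumed to mean that a component does not support caching.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of key chaining in an event pipeline; not Cocoon API. */
public class PipelineKeySketch {

    /** Assumed stand-in for a generator or transformer. */
    public interface Component {
        /** Unique key for this component, or null if it cannot cache. */
        String cacheKey();

        /** True if the component's input is unchanged since cachedAt. */
        boolean isValidSince(long cachedAt);
    }

    /** Simple component whose input last changed at a fixed time. */
    public static final class Fixed implements Component {
        private final String key;
        private final long lastChanged;

        public Fixed(String key, long lastChanged) {
            this.key = key;
            this.lastChanged = lastChanged;
        }

        @Override public String cacheKey() {
            return key;
        }

        @Override public boolean isValidSince(long cachedAt) {
            return lastChanged <= cachedAt;
        }
    }

    /** Chains the keys of the leading cacheable components, in pipeline order. */
    public static List<String> buildKey(List<Component> pipeline) {
        List<String> key = new ArrayList<>();
        for (Component c : pipeline) {
            String part = c.cacheKey();
            if (part == null) {
                break;                 // first non-cacheable component ends the chain
            }
            key.add(part);
        }
        return key;
    }

    /** Cached output is reusable only if every keyed component is still valid. */
    public static boolean isCacheValid(List<Component> pipeline,
                                       int keyedCount, long cachedAt) {
        for (Component c : pipeline.subList(0, keyedCount)) {
            if (!c.isValidSince(cachedAt)) {
                return false;
            }
        }
        return true;
    }
}
```

For a pipeline made of a generator, a cacheable transformer, and then a transformer that returns null, buildKey yields a two-part key, so only the output of the second component would be cached and the remaining steps would run on every request.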
Caching the pipeline results and returning them as fast as possible is perhaps the key factor in whether an Internet application built with Cocoon will be successful and whether people will like using it. Cocoon's built-in caching already provides a powerful mechanism for doing this and should be used whenever possible. Another important factor in any component-based system is the speed with which new components are created when they are needed.
Pooling Your Components
Nearly everything inside Cocoon is an Avalon component. Without going into too much detail about the Avalon component model and the life cycle of components, we'll explain how you can fine-tune your application in this area.
For each request received by Cocoon, a lot of Avalon components are generated: one event pipeline, one stream pipeline, one generator, one or more transformers, and a serializer. (In fact, there are more, but these will do for the moment.)
If several documents are requested at the same time, this set of components is created for each request. For example, if 50 documents are requested simultaneously, you end up with 50 event pipelines, 50 stream pipelines, 50 generators, and so on.
One of the most time-consuming operations in Java is the creation and destruction of objects. Therefore, the Avalon component model supports the pooling of objects. This means that a component is created once, locked while it is used to process a request, and released for further use after the request is processed. It is not destroyed and can be reused for the next request.
If only one request at a time is processed, such a pooled component is created once, locked for this request, used for this request, and released afterwards. When the next request arrives, the same process starts again.
If more than one request is processed at the same time, a pooled component must be created for each request. If 50 requests arrive simultaneously, 50 components must be created. If they all can be pooled, the pool grows to 50 components. At first glance, this seems desirable, but imagine that one day 1000 requests are processed simultaneously. You end up with 1000 components in your pool, although the average number of simultaneous requests is much lower.
In order to adjust your application to the load you might have, you can control the pooling of the Avalon components. You can define how many components are to be stored inside the pool by specifying a minimum and maximum number, as well as how the pool should grow if no free component is available from the pool. If your pool reaches the maximum, but there are more requests to serve, Avalon creates new components to process the request, but these components are discarded afterwards and are not added to the pool.
The configuration of this pooling is on a per-component basis. So you set the values separately for each component: for the stream pipeline, for the event pipeline, for the file generator, and so on. Listing 6.32 shows a sample pooling configuration.
Listing 6.32 An Example of a Pooling Configuration
<stream-pipeline
    class="org.apache.cocoon.components.pipeline.CachingStreamPipeline"
    pool-max="32" pool-min="16" pool-grow="4"/>

<generator name="file"
    src="org.apache.cocoon.generation.FileGenerator"
    pool-max="64" pool-min="16" pool-grow="4"/>
In Listing 6.32, you see the configuration for the stream pipeline, which is done in cocoon.xconf, and for the file generator, taken from the sitemap. Remember that both the sitemap and cocoon.xconf contain components that are based on Avalon and therefore can be pooled.
Both configurations are similar in that they use three special attributes. pool-min defines the minimum number of components in the pool. When the pool is instantiated, this number of components is created at startup. pool-max defines the maximum number of components to hold in the pool. pool-grow gives the number by which the pool increases each time no free component is available.
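To make the interplay of the three attributes concrete, here is a minimal single-threaded sketch of such a pool. It is not the Avalon implementation: PoolSketch and its methods are hypothetical names, and the behavior is modeled only on the description in this section (create pool-min instances up front, grow by pool-grow when no instance is free, and hand out throwaway instances once pool-max is reached).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical, non-thread-safe sketch of min/max/grow pooling; not Avalon API. */
public class PoolSketch<T> {
    private final Supplier<T> factory;
    private final int max;
    private final int grow;
    private final Deque<T> free = new ArrayDeque<>();
    // Identity-based set of every instance the pool owns (free or in use).
    private final Set<T> owned =
            Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());

    public PoolSketch(Supplier<T> factory, int min, int max, int grow) {
        this.factory = factory;
        this.max = max;
        this.grow = grow;
        addInstances(min);                 // pool-min instances exist from the start
    }

    /** Creates up to count new pooled instances without exceeding max. */
    private void addInstances(int count) {
        for (int i = 0; i < count && owned.size() < max; i++) {
            T instance = factory.get();
            owned.add(instance);
            free.push(instance);
        }
    }

    /** Hands out a free instance, growing the pool by pool-grow if needed. */
    public T acquire() {
        if (free.isEmpty()) {
            addInstances(grow);
        }
        if (!free.isEmpty()) {
            return free.pop();
        }
        return factory.get();              // pool is at pool-max: temporary instance
    }

    /** Returns a pooled instance to the pool; temporary instances are dropped. */
    public void release(T instance) {
        if (owned.contains(instance)) {
            free.push(instance);
        }
    }

    /** Number of instances owned by the pool. */
    public int size() {
        return owned.size();
    }

    /** Number of pooled instances currently free. */
    public int available() {
        return free.size();
    }
}
```

With min 2, max 4, and grow 1, the pool starts with two instances, grows to four under load, and serves a fifth simultaneous request with an instance that is discarded on release. A real pool also needs synchronization, which this sketch leaves out.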
If you set the log level to DEBUG, you can see whether your pools are too small by searching the log for messages containing the phrase "decommissioning instance of". This message is output each time a poolable instance is created when the pool has reached maximum capacity. The component's class name follows the phrase, so you can adjust the settings for exactly this component.
With the tips on caching and component pooling, we have covered the two most important ways to make a Cocoon application as fast as possible. These features are provided by Cocoon and can be used in different application scenarios. Depending on the type of application being built, other factors can influence the application's performance. We will cover some further aspects when we talk about different types of applications in Chapter 11, "Designing Cocoon Applications."