Creating XSLT Style Sheets

XSLT transformations accept a document tree as input and produce a tree as output. From the XSLT point of view, documents are trees built of nodes, and there are seven types of nodes XSLT recognizes; here are those nodes, and how XSLT processors treat them:

    Node                      Description
    Document root             Is the very start of the document
    Attribute                 Holds the value of an attribute after entity references have been expanded and surrounding whitespace has been trimmed
    Comment                   Holds the text of a comment, not including <!-- and -->
    Element                   Consists of all character data in the element, which includes character data in any of the children of the element
    Namespace                 Holds the namespace’s URI
    Processing instruction    Holds the text of the processing instruction, which does not include <? and ?>
    Text                      Holds the text of the node
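
As a rough illustration of the table above, here’s a small annotated document with each kind of node labeled in comments. It’s only a sketch: the ASTRO prefix, its URI, and the COLOR attribute are assumptions made up for this example.

```xml
<?xml version="1.0"?>
<!-- The root node sits above everything here; it has no markup of its own. -->
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<!-- The line above is a processing instruction node, and each of these
     remarks is itself a comment node. The XML declaration on the first
     line is not a node at all. -->
<PLANETS xmlns:ASTRO="http://www.example.com/astro">
    <!-- PLANETS is an element node; the ASTRO declaration gives it (and
         its descendants) a namespace node. -->
    <PLANET COLOR="RED">
        <!-- COLOR="RED" is an attribute node of this PLANET element. -->
        <NAME>Mercury</NAME>
        <!-- "Mercury" is a text node, the only child of the NAME element. -->
    </PLANET>
</PLANETS>
```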

To indicate what node or nodes you want to work on, XSLT supports various ways of matching or selecting nodes. For example, the character / stands for the root node. To get us started, I’ll create a short example here that will replace the root node—and, therefore, the whole document—with an HTML page.

As you might expect, XSLT style sheets must be well-formed XML documents, so you start a style sheet with the XML declaration. Next, you use an <xsl:stylesheet> element; XSLT style sheets conventionally use the namespace prefix xsl, which, now that XSLT has been standardized, corresponds to http://www.w3.org/1999/XSL/Transform. You must also include the version attribute in the <xsl:stylesheet> element; the examples in this chapter set it to 1.0:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    .
    .
    .

That’s how you start an XSLT style sheet (in fact, if you’re using a standalone program that requires you to give the name of the style sheet you’re using, you can usually omit the <xsl:stylesheet> element). To work with specific nodes in an XML document, XSLT uses templates. When you match or select nodes, a template tells the XSLT processor how to transform the node for output. In this example, I want to replace the root node with a whole new HTML document, so I start by creating a template with the <xsl:template> element, setting the match attribute to the node to match, "/":

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">
    .
    .
    .
    </xsl:template>
</xsl:stylesheet>

When the root node is matched, the template is applied to that node. In this case, I want to replace the root node with an HTML document, so I just include that HTML document directly as the content of the <xsl:template> element:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">
        <HTML>
            <HEAD>
                <TITLE>
                    A trivial transformation
                </TITLE>
            </HEAD>
            <BODY>
                This transformation has replaced
                the entire document.
            </BODY>
        </HTML>
    </xsl:template>
</xsl:stylesheet>

And that’s all it takes; by using the <xsl:template> element, I’ve set up a rule in the style sheet. When the XSLT processor reads the document, the first node that it sees is the root node. This rule matches that root node, so the XSLT processor replaces it with the HTML document, producing this result:

<HTML>
    <HEAD>
        <TITLE>
            A trivial transformation
        </TITLE>
    </HEAD>
    <BODY>
        This transformation has replaced
        the entire document.
    </BODY>
</HTML>

That’s our first, rudimentary transformation. All we’ve done is replace the entire document with another one. But, of course, that’s just the beginning.

The xsl:apply-templates Element

The template I used in the previous section applied to only one node—the root node—and performed a trivial action, replacing the entire XML document with an HTML document. However, you can also apply templates to the children of a node that you’ve matched, and you do that with the <xsl:apply-templates> element.

For example, say that I want to convert planets.xml to HTML. The document element in that document is <PLANETS>, so I can match that element with a template, setting the match attribute to the name of the element I want to match. Then I replace the <PLANETS> element with an <HTML> element, like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
    .
    .
    .
        </HTML>
    </xsl:template>
    .
    .
    .
</xsl:stylesheet>

But what about the children of the <PLANETS> element? To make sure that they are transformed correctly, you use the <xsl:apply-templates> element this way:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>
    .
    .
    .
</xsl:stylesheet>

Now you can provide templates for the child nodes. In this case, I’ll just replace each of the three <PLANET> elements with some text, which I place directly into the template for the <PLANET> element:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <P>
            Planet data will go here....
        </P>
    </xsl:template>
</xsl:stylesheet>

And that’s it; now the <PLANETS> element is replaced by an <HTML> element, and the <PLANET> elements are also replaced:

<HTML>

    <P>
        Planet data will go here....
    </P>

    <P>
        Planet data will go here....
    </P>

    <P>
        Planet data will go here....
    </P>
</HTML>

You can see that this transformation works, but it’s still less than useful; all we’ve done is replace the <PLANET> elements with some text. What if we wanted to access some of the data in the <PLANET> element? For example, say that we wanted to place the text from the <NAME> element in each <PLANET> element in the output document:

    <PLANET>
        <NAME>Mercury</NAME>
        <MASS UNITS="(Earth = 1)">.0553</MASS>
        <DAY UNITS="days">58.65</DAY>
        <RADIUS UNITS="miles">1516</RADIUS>
        <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
        <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
    </PLANET>

To gain access to this kind of data, you can use the select attribute of the <xsl:value-of> element.

Getting the Value of Nodes with xsl:value-of

In this example, I’ll extract the name of each planet and insert that name into the output document. To get the name of each planet, I’ll use the <xsl:value-of> element in a template targeted at the <PLANET> element, and I’ll select the <NAME> element with the select attribute like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <xsl:value-of select="NAME"/>
    </xsl:template>
</xsl:stylesheet>

Using select like this, you can select nodes. The select attribute is much like the match attribute of the <xsl:template> element, except that the select attribute is more powerful. With it, you can specify the node or nodes to select using the full XPath specification, as we’ll see later in this chapter. The select attribute is an attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, <xsl:copy-of>, and <xsl:sort> elements.

Applying the previous style sheet, the <xsl:value-of select="NAME"/> element directs the XSLT processor to insert the name of each planet into the output document, so that document looks like this:

<HTML>

  Mercury

  Venus

  Earth
</HTML>

Handling Multiple Selections with xsl:for-each

When its select attribute matches more than one node, the <xsl:value-of> element uses only the first of them in document order. What if you have multiple nodes that could match? For example, say that you can have multiple <NAME> elements for each planet:

<PLANET>
    <NAME>Mercury</NAME>
    <NAME>Closest planet to the sun</NAME>
    <MASS UNITS="(Earth = 1)">.0553</MASS>
    <DAY UNITS="days">58.65</DAY>
    <RADIUS UNITS="miles">1516</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
    <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
</PLANET>

The <xsl:value-of> element’s select attribute by itself will select only the first <NAME> element; to loop over all possible matches, you can use the <xsl:for-each> element like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <xsl:for-each select="NAME">
            <P>
                <xsl:value-of select="."/>
            </P>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

This style sheet will catch all <NAME> elements, place their values in a <P> element, and add them to the output document, like this:

<HTML>

    <P>Mercury</P>
    <P>Closest planet to the sun</P>

    <P>Venus</P>

    <P>Earth</P>
</HTML>

We’ve seen now that you can use the match and select attributes to indicate what nodes you want to work with. The actual syntax that you can use with these attributes is fairly complex but worth knowing. I’ll take a look at the match attribute in more detail first, and I’ll examine the select attribute later in this chapter.

Specifying Patterns for the match Attribute

You can use an involved syntax with the <xsl:template> element’s match attribute, and an even more involved syntax with the select attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, <xsl:copy-of>, and <xsl:sort> elements. We’ll see them both in this chapter, starting with the syntax you can use with the match attribute.

Matching the Root Node

As we’ve already seen, you can match the root node with /, like this:

<xsl:template match="/">
    <HTML>
        <xsl:apply-templates/>
    </HTML>
</xsl:template>

Matching Elements

You can match specific XML elements simply by giving their name, as we’ve also seen:

<xsl:template match="PLANETS">
    <HTML>
        <xsl:apply-templates/>
    </HTML>
</xsl:template>

Matching Children

You can use the / operator to separate element names when you want to refer to a child of a particular node. For example, say that you wanted to create a rule that applies only to <NAME> elements that are children of <PLANET> elements. In that case, you can match to the expression "PLANET/NAME". Here’s a rule that will surround the text of such elements in an <H3> element:

<xsl:template match="PLANET/NAME">
  <H3><xsl:value-of select="."/></H3>
</xsl:template>

Notice the expression "." here. You use "." with the select attribute to specify the current node, as we’ll see when discussing the select attribute.

You can also use the * character as a wildcard, standing for any element (* can match only elements). For example, this rule applies to all <NAME> elements that are grandchildren of <PLANET> elements:

<xsl:template match="PLANET/*/NAME">
  <H3><xsl:value-of select="."/></H3>
</xsl:template>

Matching Element Descendants

In the previous section, I used the expression "PLANET/NAME" to match all <NAME> elements that are direct children of <PLANET> elements, and I used the expression "PLANET/*/NAME" to match all <NAME> elements that are grandchildren of <PLANET> elements. However, there’s an easier way to perform both matches: Just use the expression "PLANET//NAME", which matches all <NAME> elements that are inside <PLANET> elements, no matter how many levels deep. (The matched elements are called descendants of the <PLANET> element.) In other words, "PLANET//NAME" matches "PLANET/NAME", "PLANET/*/NAME", "PLANET/*/*/NAME", and so on:

<xsl:template match="PLANET//NAME">
  <H3><xsl:value-of select="."/></H3>
</xsl:template>

Matching Attributes

You can match attributes if you preface their name with @. Here’s an example; in this case, I’ll display the data in planets.xml in an HTML table. You might note, however, that the units for the various measurements are stored in attributes, like this:

<PLANET>
    <NAME>Earth</NAME>
    <MASS UNITS="(Earth = 1)">1</MASS>
    <DAY UNITS="days">1</DAY>
    <RADIUS UNITS="miles">2107</RADIUS>
    <DENSITY UNITS="(Earth = 1)">1</DENSITY>
    <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
</PLANET>

To recover the units and display them as well as the values for the mass and so on, I’ll match the UNITS attribute with @UNITS. Here’s how that looks; note that I’m using the <xsl:text> element to insert a space into the output document (more on <xsl:text> later):

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/PLANETS">
        <HTML>
            <HEAD>
                <TITLE>
                    The Planets Table
                </TITLE>
            </HEAD>
            <BODY>
                <H1>
                    The Planets Table
                </H1>
                <TABLE>
                    <TR>
                        <TD>Name</TD>
                        <TD>Mass</TD>
                        <TD>Radius</TD>
                        <TD>Day</TD>
                    </TR>
                    <xsl:apply-templates/>
                </TABLE>
            </BODY>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <TR>
            <TD><xsl:value-of select="NAME"/></TD>
            <TD><xsl:apply-templates select="MASS"/></TD>
            <TD><xsl:apply-templates select="RADIUS"/></TD>
            <TD><xsl:apply-templates select="DAY"/></TD>
        </TR>
    </xsl:template>

    <xsl:template match="MASS">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>

    <xsl:template match="RADIUS">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>

    <xsl:template match="DAY">
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@UNITS"/>
    </xsl:template>
</xsl:stylesheet>

Now the resulting HTML table includes not only values, but also their units of measurement. (The spacing leaves a little to be desired, but HTML browsers will have no problem with it; we’ll take a look at ways of handling whitespace later in this chapter.)

<HTML>
<HEAD>
<TITLE>
                    The Planets Table
                </TITLE>
</HEAD>
<BODY>
<H1>
                    The Planets Table
                </H1>
<TABLE>
<TR>
<TD>Name</TD><TD>Mass</TD><TD>Radius</TD><TD>Day</TD>
</TR>

    <TR>
<TD>Mercury</TD><TD>.0553 (Earth = 1)</TD><TD>1516 miles</TD><TD>58.65 days</TD>
</TR>

    <TR>
<TD>Venus</TD><TD>.815 (Earth = 1)</TD><TD>3716 miles</TD><TD>116.75 days</TD>
</TR>

    <TR>
<TD>Earth</TD><TD>1 (Earth = 1)</TD><TD>2107 miles</TD><TD>1 days</TD>
</TR>

</TABLE>
</BODY>
</HTML>

You can also use the @* wildcard to select all attributes of an element. For example, "PLANET/@*" selects all attributes of <PLANET> elements.
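
Here’s a minimal sketch of the @* wildcard at work. Because the <PLANET> elements shown so far carry no attributes of their own, it assumes you want the attributes of <MASS> elements instead; the bracketed output format is just an illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Print the mass, then hand its attributes to the template rules.
         A bare <xsl:apply-templates/> would skip them, because it
         selects child nodes only and attributes are not children. -->
    <xsl:template match="MASS">
        <xsl:value-of select="."/>
        <xsl:apply-templates select="@*"/>
    </xsl:template>

    <!-- Matches every attribute of a MASS element, whatever its name. -->
    <xsl:template match="MASS/@*">
        <xsl:text> [</xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="."/>
        <xsl:text>]</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Applied to planets.xml, Mercury’s mass should come out as .0553 [UNITS: (Earth = 1)]. The text of the other elements still passes through under the default rules, because this sketch supplies no rules for them.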

Matching by ID

You can also match elements that have a specific ID value using the pattern id(). To use this selector, you must give elements an ID attribute, and you must declare that attribute of type ID, as you can do in a DTD. Here’s an example rule that adds the text of all elements that have the ID Christine:

<xsl:template match = "id('Christine')">
    <H3><xsl:value-of select="."/></H3>
</xsl:template>
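
For id() to find anything, the parser has to know which attribute is of type ID. As a minimal sketch, here is a hypothetical PEOPLE document (not part of planets.xml; the element and attribute names are invented for illustration) that declares one in an internal DTD subset:

```xml
<?xml version="1.0"?>
<!DOCTYPE PEOPLE [
    <!ELEMENT PEOPLE (PERSON*)>
    <!ELEMENT PERSON (#PCDATA)>
    <!-- NAME is declared with type ID, so id('Christine') can find it. -->
    <!ATTLIST PERSON NAME ID #REQUIRED>
]>
<PEOPLE>
    <PERSON NAME="Christine">Christine's entry</PERSON>
    <PERSON NAME="Edward">Edward's entry</PERSON>
</PEOPLE>
```

Whether the DTD is actually read depends on the parser behind your XSLT processor; if it isn’t, id() simply matches nothing.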

Matching Comments

You can match the text of comments with the pattern comment(). You should not store data that should go into the output document in comments in the input document, of course. However, you might want to convert comments from the <!--comment--> form into something another markup language might use, such as a <COMMENT> element.

Here’s an example; planets.xml was designed to include comments so that we could see how to extract them:

<PLANET>
    <NAME>Venus</NAME>
    <MASS UNITS="(Earth = 1)">.815</MASS>
    <DAY UNITS="days">116.75</DAY>
    <RADIUS UNITS="miles">3716</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
    <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
</PLANET>

To extract comments and put them into <COMMENT> elements, I’ll include a rule just for comments:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>

    <xsl:template match="comment()">
        <COMMENT>
            <xsl:value-of select="."/>
        </COMMENT>
    </xsl:template>
</xsl:stylesheet>

Here’s what the result is for Venus, where I’ve transformed the comment into a <COMMENT> element:

Venus
.815
116.75
3716
.943

66.8<COMMENT>At perihelion</COMMENT>

Note that the text for the other elements in the <PLANET> element is also inserted into the output document. The reason for that is that the default rule for each element is to include its text in the output document. Because I haven’t provided a rule for elements, their text is simply included in the output document. I’ll take a closer look at default rules later in the chapter.
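
Strictly speaking, the default rule for elements copies nothing itself. It only passes processing on to the element’s children, and the text then arrives through the default rule for text nodes. As a sketch of what the XSLT 1.0 recommendation describes as the built-in rule for the root node and for elements, written out explicitly:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Equivalent of the built-in rule for the root node and for elements:
         output nothing for the node itself, just keep processing its
         children. -->
    <xsl:template match="* | /">
        <xsl:apply-templates/>
    </xsl:template>
</xsl:stylesheet>
```

Adding this template to a style sheet that already has more specific rules, such as the one for <PLANETS>, shouldn’t change the result, because the more specific rules take priority over the * wildcard.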

Matching Text Nodes with text()

You can match the text in a node with the pattern text(). There’s really not much reason to ever use text(), however, because XSLT includes a default rule: If there are no other rules for a text node, the text in that node is inserted into the output document. If you were to make that default rule explicit, it might look like this:

<xsl:template match="text()">
    <xsl:value-of select="."/>
</xsl:template>

You can override this rule by not sending the text in text nodes to the output document, like this:

<xsl:template match="text()">
</xsl:template>

In the previous example, you can see that a great deal of text made it from the input document to the output document because there was no explicit rule besides the default one for text nodes—the only output rule that I used was for comments. If you turn off the default rule for text nodes by adding the previous two lines to the version of planets.xsl used in the previous example, the text of those text nodes does not go into the output document. This is the result:

<HTML>
<COMMENT>At perihelion</COMMENT>
<COMMENT>At perihelion</COMMENT>
<COMMENT>At perihelion</COMMENT>
</HTML>

Matching Processing Instructions

You can use the pattern processing-instruction() to match processing instructions.

<xsl:template match="/processing-instruction()">
    <I>
        Found a processing instruction.
    </I>
</xsl:template>

You can also specify what processing instruction you want to match by giving the name of the processing instruction (excluding <? and ?>), as in this case, where I’m matching the processing instruction <?xml-include?>:

<xsl:template match="/processing-instruction('xml-include')">
    <I>
        Found an xml-include processing instruction.
    </I>
</xsl:template>

One of the major reasons that XSLT makes a distinction between the root node (at the very beginning of the document) and the document element is so that you have access to the processing instructions and other nodes in the document’s prolog.

Using the Or Operator

You can match to a number of possible patterns, which is very useful when your documents get a little more involved than the ones we’ve been using so far in this chapter. Here’s an example; in this case, I want to display <NAME> and <MASS> elements in bold, which I’ll do with the HTML <B> tag. To match either <NAME> or <MASS> elements, I’ll use the Or operator, which is a vertical bar (|), in a new rule, like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <P>
            <xsl:apply-templates/>
        </P>
    </xsl:template>

    <xsl:template match="NAME | MASS">
        <B>
            <xsl:apply-templates/>
        </B>
    </xsl:template>
</xsl:stylesheet>

Here are the results; note that the name and mass values are both enclosed in <B> elements. (Also note that, because of the XSLT default rules, the text from the other child elements of the <PLANET> element is also displayed.)

<HTML>

  <P>
    <B>Mercury</B>
    <B>.0553</B>
    58.65
    1516
    .983
    43.4
  </P>

  <P>
    <B>Venus</B>
    <B>.815</B>
    116.75
    3716
    .943
    66.8
  </P>

  <P>
    <B>Earth</B>
    <B>1</B>
    1
    2107
    1
    128.4
  </P>
</HTML>

You can use any valid pattern with the | operator, such as expressions like PLANET | PLANET//NAME, and you can use multiple | operators, such as NAME | MASS | DAY, and so on.

Testing with []

You can use the [] operator to test whether a certain condition is true. For example, you can test the following:

  • Whether an attribute has a given string value

  • The value of an element

  • Whether an element encloses a particular child, attribute, or other element

  • The position of a node in the node tree

Here are some examples:

  • This expression matches <PLANET> elements that have child <NAME> elements:

      <xsl:template match = "PLANET[NAME]">

  • This expression matches any element that has a <NAME> child element:

      <xsl:template match = "*[NAME]">

  • This expression matches any <PLANET> element that has either a <NAME> or a <MASS> child element:

      <xsl:template match="PLANET[NAME | MASS]">
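
The examples above all test for the presence of a child. Here’s a minimal sketch of the other two kinds of test, an element’s value and a node’s position, assuming the three-planet planets.xml used so far; the output text is made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="PLANET"/>
        </HTML>
    </xsl:template>

    <!-- Tests the value of a child element. -->
    <xsl:template match="PLANET[NAME = 'Venus']">
        <P>Found Venus.</P>
    </xsl:template>

    <!-- Tests position: the first PLANET child of its parent. -->
    <xsl:template match="PLANET[position() = 1]">
        <P><xsl:value-of select="NAME"/> comes first.</P>
    </xsl:template>

    <!-- Any other PLANET produces no output. -->
    <xsl:template match="PLANET"/>
</xsl:stylesheet>
```

With Mercury listed first and Venus second, each planet matches at most one of the two tested rules. If Venus were listed first, both would match the same element with equal priority, which is a conflict you’d want to resolve with an explicit priority attribute.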

Say that we gave the <PLANET> elements in planets.xml a new attribute—COLOR—which holds the planet’s color:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="planets.xsl"?>
<PLANETS>

  <PLANET COLOR="RED">
    <NAME>Mercury</NAME>
    <MASS UNITS="(Earth = 1)">.0553</MASS>
    <DAY UNITS="days">58.65</DAY>
    <RADIUS UNITS="miles">1516</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.983</DENSITY>
    <DISTANCE UNITS="million miles">43.4</DISTANCE><!--At perihelion-->
  </PLANET>

  <PLANET COLOR="WHITE">
    <NAME>Venus</NAME>
    <MASS UNITS="(Earth = 1)">.815</MASS>
    <DAY UNITS="days">116.75</DAY>
    <RADIUS UNITS="miles">3716</RADIUS>
    <DENSITY UNITS="(Earth = 1)">.943</DENSITY>
    <DISTANCE UNITS="million miles">66.8</DISTANCE><!--At perihelion-->
  </PLANET>

  <PLANET COLOR="BLUE">
    <NAME>Earth</NAME>
    <MASS UNITS="(Earth = 1)">1</MASS>
    <DAY UNITS="days">1</DAY>
    <RADIUS UNITS="miles">2107</RADIUS>
    <DENSITY UNITS="(Earth = 1)">1</DENSITY>
    <DISTANCE UNITS="million miles">128.4</DISTANCE><!--At perihelion-->
  </PLANET>
</PLANETS>

This expression matches <PLANET> elements that have COLOR attributes:

<xsl:template match="PLANET[@COLOR]">

What if you wanted to match planets whose COLOR attribute was BLUE? You can do that with the = operator, like this:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates/>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET[@COLOR = 'BLUE']">
            The <xsl:value-of select="NAME"/> is blue.
    </xsl:template>

    <xsl:template match="text()">
    </xsl:template>
</xsl:stylesheet>

This style sheet picks out the planets whose color is blue and omits the others by turning off the default rule for text nodes. Here’s the result:

<HTML>
        The Earth is blue.
</HTML>

In fact, the expressions you can use in the [] operators are W3C XPath expressions. XPath expressions give you ways of specifying nodes in an XML document using a fairly involved syntax. And because the select attribute, which we’re about to cover, uses XPath, I’ll take a look at XPath as well.

Specifying Patterns for the select Attribute

I’ve taken a look at the kinds of expressions that you can use with the <xsl:template> element’s match attribute. You can use an even more involved syntax with the select attribute of the <xsl:apply-templates>, <xsl:value-of>, <xsl:for-each>, <xsl:copy-of>, and <xsl:sort> elements.

The select attribute uses XPath expressions; XPath has been a W3C recommendation since November 16, 1999. You can find the XPath specification at www.w3.org/TR/xpath.

We’ve seen that you can use the match attribute to find nodes by name, child element(s), attributes, or even descendant. We’ve also seen that you can make some tests to see whether elements or attributes have certain values. You can do all that and more with the XPath specification supported by the select attribute, including finding nodes by parent or sibling elements, as well as much more involved tests. XPath is much more of a true language than the expressions you can use with the match attribute; for example, XPath expressions can return not only lists of nodes, but also Boolean, string, and numeric values.

The XML for Java package has a handy example program, ApplyXPath.java, that enables you to apply an XPath expression to a document and see what the results would be. This is great for testing. For example, if I applied the XPath expression "PLANET/NAME" to planets.xml, here is what the result would look like, displaying the values of all <NAME> elements that are children of <PLANET> elements (the <output> tags are added by ApplyXPath):

%java ApplyXPath planets.xml PLANET/NAME
<output>
<NAME>Mercury</NAME><NAME>Venus</NAME><NAME>Earth</NAME></output>

XPath expressions are more powerful than the match expressions we’ve seen; for one thing, they’re not restricted to working with the current node or child nodes because you can work with parent nodes, ancestor nodes, and more. Specifying what node you want to work in relation to is called specifying an axis in XPath. I’ll take a look at XPath syntax in detail next.

Understanding XPath

To specify a node or set of nodes in XPath, you use a location path. A location path, in turn, consists of one or more location steps, separated by / or //. If you start the location path with /, the location path is called an absolute location path because you’re specifying the path from the root node; otherwise, the location path is relative, starting with the current node, which is called the context node. Got all that? Good, because there’s more.

A location step is made up of an axis, a node test, and zero or more predicates. For example, in the expression child::PLANET[position() = 5], child is the name of the axis, PLANET is the node test, and [position() = 5] is a predicate. You can create location paths with one or more location steps, such as /descendant::PLANET/child::NAME, which selects all the <NAME> elements that have a <PLANET> parent. The best way to understand all this is by example, and we’ll see plenty of them in a few pages. In the meantime, I’ll take a look at what kind of axes, node tests, and predicates XPath supports.

XPath Axes

In the location path child::NAME, which refers to a <NAME> element that is a child of the current node, the child is called the axis. XPath supports many different axes, and it’s important to know what they are. Here’s the list:

    Axis                  Description
    ancestor              Holds the ancestors of the context node. The ancestors of the context node are the parent of the context node and the parent’s parent and so forth, back to and including the root node.
    ancestor-or-self      Holds the context node and the ancestors of the context node.
    attribute             Holds the attributes of the context node.
    child                 Holds the children of the context node.
    descendant            Holds the descendants of the context node. A descendant is a child or a child of a child, and so on.
    descendant-or-self    Contains the context node and the descendants of the context node.
    following             Holds all nodes in the same document as the context node that come after the context node.
    following-sibling     Holds all the following siblings of the context node. A sibling is a node on the same level as the context node.
    namespace             Holds the namespace nodes of the context node.
    parent                Holds the parent of the context node.
    preceding             Contains all nodes that come before the context node.
    preceding-sibling     Contains all the preceding siblings of the context node. A sibling is a node on the same level as the context node.
    self                  Contains the context node.

You can use axes to specify a location step or path, as in this example, where I’m using the child axis to indicate that I want to match to child nodes of the context node, which is a <PLANET> element. (We’ll see later that an abbreviated version lets you omit the child:: part.)

<xsl:template match="PLANET">
    <HTML>
        <CENTER>
            <xsl:value-of select="child::NAME"/>
        </CENTER>
        <CENTER>
            <xsl:value-of select="child::MASS"/>
        </CENTER>
        <CENTER>
            <xsl:value-of select="child::DAY"/>
        </CENTER>
    </HTML>
</xsl:template>

In these expressions, child is the axis, and the element names NAME, MASS, and DAY are node tests.
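
The other axes work the same way. As a minimal sketch, assuming the planets.xml document used throughout (the wording of the output is invented for illustration), this style sheet combines the child, attribute, and following-sibling axes:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="child::PLANET"/>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <P>
            <!-- child axis: the NAME child of this planet -->
            <xsl:value-of select="child::NAME"/>
            <xsl:text>: radius given in </xsl:text>
            <!-- attribute axis: the UNITS attribute of the RADIUS child -->
            <xsl:value-of select="child::RADIUS/attribute::UNITS"/>
            <!-- following-sibling axis: the nearest PLANET after this one,
                 if there is one -->
            <xsl:for-each select="following-sibling::PLANET[position() = 1]">
                <xsl:text>; next planet is </xsl:text>
                <xsl:value-of select="child::NAME"/>
            </xsl:for-each>
        </P>
    </xsl:template>
</xsl:stylesheet>
```

For Mercury this should produce a paragraph reading Mercury: radius given in miles; next planet is Venus. Earth has no following sibling, so the <xsl:for-each> body is skipped for it.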

XPath Node Tests

You can use names of nodes as node tests, or you can use the wild card * to select element nodes. For example, the expression child::*/child::NAME selects all <NAME> elements that are grandchildren of the context node. Besides nodes and the wild card character, you can also use these node tests:

    Node Test                   Description
    comment()                   Selects comment nodes.
    node()                      Selects any type of node.
    processing-instruction()    Selects a processing instruction node. You can specify the name of the processing instruction to select in the parentheses.
    text()                      Selects a text node.
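
To see how these node tests differ from the * wildcard, here’s a minimal sketch that assumes the planets.xml document used throughout, in which each <PLANET> element contains one comment. It relies on the count() function, which returns the number of nodes in a node set.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="child::PLANET"/>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <P>
            <xsl:value-of select="child::NAME"/>
            <xsl:text> has </xsl:text>
            <!-- * counts element children only -->
            <xsl:value-of select="count(child::*)"/>
            <xsl:text> element children and </xsl:text>
            <!-- node() also counts text nodes and comments -->
            <xsl:value-of select="count(child::node())"/>
            <xsl:text> child nodes in all. Comment: </xsl:text>
            <!-- comment() picks out the comment child -->
            <xsl:value-of select="child::comment()"/>
        </P>
    </xsl:template>
</xsl:stylesheet>
```

The second count is normally the larger one, because it includes the comment and any whitespace text nodes between the child elements; the exact figure depends on how your processor handles whitespace.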

XPath Predicates

The predicate part of an XPath step is perhaps its most intriguing part because it gives you the most power. You can work with all kinds of expressions in predicates; here are the possible types:

  • Node sets

  • Booleans

  • Numbers

  • Strings

  • Result tree fragments

I’ll take a look at these various types in turn.

XPath Node Sets

As its name implies, a node set is simply a set of nodes. An expression such as child::PLANET returns a node set of all <PLANET> elements. The expression child::PLANET/child::NAME returns a node set of all <NAME> elements that are children of <PLANET> elements. To select a node or nodes from a node set, you can use various functions that work on node sets in predicates.

    Function                   Description
    last()                     Returns the number of nodes in the context node set.
    position()                 Returns the position of the context node in the context node set (starting with 1).
    count(node-set)            Returns the number of nodes in node-set. The node-set argument is required.
    id(string ID)              Returns a node set containing the element whose ID matches the string passed to the function, or returns an empty node set if no element has the specified ID. You can list multiple IDs separated by whitespace, and this function will return a node set of the elements with those IDs.
    local-name(node-set)       Returns the local name of the first node in the node set. Omitting node-set makes this function use the context node.
    namespace-uri(node-set)    Returns the URI of the namespace of the first node in the node set. Omitting node-set makes this function use the context node.
    name(node-set)             Returns the full, qualified name of the first node in the node set. Omitting node-set makes this function use the context node.

Here’s an example; in this case, I’ll number the elements in the output document using the position() function:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <HEAD>
                <TITLE>
                    The Planets
                </TITLE>
            </HEAD>
            <BODY>
                <xsl:apply-templates select="PLANET"/>
            </BODY>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <P>
            <xsl:value-of select="position()"/>.
            <xsl:value-of select="NAME"/>
        </P>
    </xsl:template>
</xsl:stylesheet>

Here’s the result, where you can see that the planets are numbered:

<HTML>
<HEAD>
<TITLE>
                    The Planets
                </TITLE>
</HEAD>
<BODY>
<P>1.
            Mercury</P>
<P>2.
            Venus</P>
<P>3.
            Earth</P>
</BODY>
</HTML>

You can use functions that operate on node sets in predicates, as in child::PLANET[position() = last()], which selects the last <PLANET> child of the context node.
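
To see several of these node-set functions side by side, the PLANET template in the stylesheet above could be swapped for a sketch like the following. It assumes the same PLANETS template as before, so that position() and last() refer to the <PLANET> elements chosen by <xsl:apply-templates select="PLANET"/>.

```xml
<xsl:template match="PLANET">
    <P>
        <!-- position() and last() refer to the node set chosen by apply-templates -->
        Planet <xsl:value-of select="position()"/>
        of <xsl:value-of select="last()"/>:
        <xsl:value-of select="NAME"/>
        <!-- count(*) counts the element children of this PLANET element;
             name() with no argument gives the name of the context node -->
        (<xsl:value-of select="count(*)"/> child elements,
        element name <xsl:value-of select="name()"/>)
    </P>
</xsl:template>
```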

XPath Booleans

You can also use Boolean values in XPath expressions. Numbers are considered false if they’re zero (or NaN, the not-a-number value) and are considered true otherwise. An empty string ("") is also considered false, and all other strings are considered true. A node set is considered true if it contains at least one node.

You can use XPath logical operators to produce Boolean true/false results; here are the logical operators:

    Operator

    Description

    !=

    Is not equal to.

    <

    Is less than. (Use &lt; in XML documents.)

    <=

    Is less than or equal to. (Use &lt;= in XML documents.)

    =

    Is equal to. (C, C++, Java, JavaScript programmers take note—this operator is one = sign, not two.)

    >

    Is greater than.

    >=

    Is greater than or equal to.

You shouldn’t use < directly in XML documents; use the entity reference &lt; instead.

You can also use the keywords and and or to connect Boolean clauses with a logical And or Or operation, as we’ve seen when working with JavaScript and Java.

Here’s an example using the logical operator >. This rule applies to all <PLANET> elements after position 5:

<xsl:template match="PLANET[position() > 5]">
    <xsl:value-of select="."/>
</xsl:template>
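
The and and or keywords combine with these operators in the same way. The two templates below are a minimal sketch, not part of the original example; note that the less-than comparison has to be written as &lt; because it sits inside an attribute value.

```xml
<!-- Matches every PLANET except the first and the last one.
     The less-than operator is written as &lt; inside the attribute value. -->
<xsl:template match="PLANET[position() > 1 and position() &lt; last()]">
    <xsl:value-of select="NAME"/>
</xsl:template>

<!-- Matches the first or the last PLANET -->
<xsl:template match="PLANET[position() = 1 or position() = last()]">
    <xsl:value-of select="NAME"/>
</xsl:template>
```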

There is also a true() function that always returns a value of true, and a false() function that always returns a value of false.

You can also use the not() function to reverse the logical sense of an expression, as in this case, where I’m selecting all but the last <PLANET> element:

<xsl:template match="PLANET[not(position() = last())]">
    <xsl:value-of select="."/>
</xsl:template>

Finally, the lang() function returns true or false, depending on whether the language of the context node (which is given by xml:lang attributes) is the same as the language you pass to this function.
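
Here is a short sketch of lang() in a predicate. It assumes, purely for illustration, that the planets document carries xml:lang attributes; nothing in the sample data shown so far confirms that.

```xml
<!-- Matches PLANET elements whose language, set by xml:lang on the element
     or inherited from an ancestor, is English (en, en-US, en-GB, and so on) -->
<xsl:template match="PLANET[lang('en')]">
    <xsl:value-of select="NAME"/>
</xsl:template>
```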

XPath Numbers

In XPath, numbers are actually stored in double-precision floating-point format. (See Chapter 10, "Understanding Java," for more details on doubles; technically speaking, all XPath numbers are stored in 64-bit IEEE 754 floating-point double-precision format.) All numbers are stored as doubles, even integers such as 5, as in the example we just saw:

<xsl:template match="PLANET[position() > 5]">
    <xsl:value-of select="."/>
</xsl:template>

You can use several operators on numbers:

    Operator

    Action

    +

    Adds.

    -

    Subtracts.

    *

    Multiplies.

    div

    Divides. (The / character, which stands for division in other languages, is already heavily used in XML and XPath.)

    mod

    Returns the modulus of two numbers (the remainder after dividing the first by the second).

For example, the element <xsl:value-of select="180 + 420"/> inserts the string "600" into the output document. This example selects all planets whose day (measured in Earth days) divided by mass (where the mass of Earth = 1) is greater than 100:

<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            <xsl:apply-templates select="PLANET[DAY div MASS > 100]"/>
        </BODY>
    </HTML>
</xsl:template>
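
The mod operator can be used in a predicate in the same way. This sketch, a variation on the template above rather than part of the original example, processes only the even-numbered <PLANET> children:

```xml
<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            <!-- position() mod 2 = 0 holds for the 2nd, 4th, 6th, ... PLANET child -->
            <xsl:apply-templates select="PLANET[position() mod 2 = 0]"/>
        </BODY>
    </HTML>
</xsl:template>
```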

XPath also supports these functions that operate on numbers:

    Function

    Description

    ceiling()

    Returns the smallest integer that is not less than the number that you pass it

    floor()

    Returns the largest integer that is not greater than the number that you pass it

    round()

    Rounds the number that you pass it to the nearest integer

    sum()

    Returns the sum of the numeric values of the nodes in the node set that you pass it

For example, here’s how you can find the average mass of the planets in planets.xml:

<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            The average planetary mass is:
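            <!-- MASS elements are children of PLANET, not of PLANETS, so use the descendant axis for both -->
            <xsl:value-of select="sum(descendant::MASS) div count(descendant::MASS)"/>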
        </BODY>
    </HTML>
</xsl:template>
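
The rounding functions listed before this example can be tried with literal numbers. The following is a minimal sketch; the values are chosen only to show how a number that ends in .5 is handled.

```xml
<xsl:template match="/">
    <HTML>
        <BODY>
            <!-- floor(2.5) is 2 -->
            <P><xsl:value-of select="floor(2.5)"/></P>
            <!-- ceiling(2.5) is 3 -->
            <P><xsl:value-of select="ceiling(2.5)"/></P>
            <!-- round(2.5) is 3: ties go toward positive infinity -->
            <P><xsl:value-of select="round(2.5)"/></P>
            <!-- round(-2.5) is -2 for the same reason -->
            <P><xsl:value-of select="round(-2.5)"/></P>
        </BODY>
    </HTML>
</xsl:template>
```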

XPath Strings

In XPath, strings are made up of Unicode characters. A number of functions are specially designed to work on strings, as shown in this table.

    Function

    Description

    starts-with(string string1, string string2)

    Returns true if the first string starts with the second string

    contains(string string1, string string2)

    Returns true if the first string contains the second one

    substring(string string1, number offset, number length)

    Returns length characters from the string, starting at offset (the first character is at position 1); if you omit length, returns the rest of the string

    substring-before(string string1, string string2)

    Returns the part of string1 up to the first occurrence of string2

    substring-after(string string1, string string2)

    Returns the part of string1 after the first occurrence of string2

    string-length(string string1)

    Returns the number of characters in string1

    normalize-space(string string1)

    Returns string1 after leading and trailing whitespace is stripped and multiple consecutive whitespace is replaced with a single space

    translate(string string1, string string2, string string3)

    Returns string1 with all occurrences of the characters in string2 replaced by the matching characters in string3

    concat(string string1, string string2, ...)

    Returns all strings concatenated (that is, joined) together

    format-number(number number1, string string2, string string3)

    Returns a string holding the formatted string version of number1, using string2 as a formatting string (create formatting strings as you would for Java’s java.text.DecimalFormat class), and string3 as the optional name of a decimal format declared with <xsl:decimal-format>. This function is defined by XSLT rather than by XPath itself
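
A quick way to get a feel for these functions is to call them on literal strings. The template below is a sketch using made-up literal values; the comment above each line states the text that call produces.

```xml
<xsl:template match="/">
    <HTML>
        <BODY>
            <!-- concat() joins its arguments: Planet: Venus -->
            <P><xsl:value-of select="concat('Planet: ', 'Venus')"/></P>
            <!-- Character positions start at 1, so this gives: Mer -->
            <P><xsl:value-of select="substring('Mercury', 1, 3)"/></P>
            <!-- Text before the first period: 58 -->
            <P><xsl:value-of select="substring-before('58.65', '.')"/></P>
            <!-- Each character of 'enus' maps to the one at the same place in 'ENUS': VENUS -->
            <P><xsl:value-of select="translate('Venus', 'enus', 'ENUS')"/></P>
            <!-- Leading, trailing, and repeated spaces collapse: The Planets -->
            <P><xsl:value-of select="normalize-space('  The   Planets ')"/></P>
            <!-- string-length() counts characters: 5 -->
            <P><xsl:value-of select="string-length('Earth')"/></P>
            <!-- format-number() with a grouping pattern: 1,516 -->
            <P><xsl:value-of select="format-number(1516, '#,###')"/></P>
        </BODY>
    </HTML>
</xsl:template>
```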

XPath Result Tree Fragments

A result tree fragment is a piece of the result tree held as a value; strictly speaking, it is an XSLT 1.0 data type rather than part of XPath itself. You create a result tree fragment when an <xsl:variable> or <xsl:param> element supplies its value as content instead of with a select attribute. (The document() function, by contrast, returns an ordinary node set.)

You really can’t do much with result tree fragments in XPath. You can treat one only as you would a string, for example by passing it to the string() or boolean() functions; you cannot apply the /, //, or [] operators to it.
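
One common way a result tree fragment arises in XSLT 1.0 is when an <xsl:variable> element has content rather than a select attribute. The sketch below shows that case; the variable name heading is hypothetical, and <xsl:copy-of> is an XSLT instruction rather than an XPath function.

```xml
<xsl:template match="PLANETS">
    <!-- The variable has content rather than a select attribute,
         so its value is a result tree fragment -->
    <xsl:variable name="heading">
        <H1>The Planets</H1>
    </xsl:variable>
    <HTML>
        <BODY>
            <!-- xsl:copy-of copies the fragment to the output, markup included -->
            <xsl:copy-of select="$heading"/>
            <!-- string() yields only the text of the fragment: The Planets -->
            <P><xsl:value-of select="string($heading)"/></P>
        </BODY>
    </HTML>
</xsl:template>
```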

XPath Examples

We’ve seen a lot of XPath in theory; how about some examples? Here are a number of location path examples. Note that XPath enables you to use and or or in predicates to apply logical tests using multiple patterns.

    Example

    Action

    child::PLANET

    Returns the <PLANET> element children of the context node.

    child::*

    Returns all element children (* only matches elements) of the context node.

    child::text()

    Returns all text node children of the context node.

    child::node()

    Returns all the children of the context node, no matter what their node type is.

    attribute::UNIT

    Returns the UNIT attribute of the context node.

    descendant::PLANET

    Returns the <PLANET> element descendants of the context node.

    ancestor::PLANET

    Returns all <PLANET> ancestors of the context node.

    ancestor-or-self::PLANET

    Returns the <PLANET> ancestors of the context node. If the context node is a <PLANET> as well, also returns the context node.

    descendant-or-self::PLANET

    Returns the <PLANET> element descendants of the context node. If the context node is a <PLANET> as well, also returns the context node.

    self::PLANET

    Returns the context node if it is a <PLANET> element.

    child::NAME/descendant::PLANET

    Returns the <PLANET> element descendants of the child <NAME> elements of the context node.

    child::*/child::PLANET

    Returns all <PLANET> grandchildren of the context node.

    /

    Returns the document root (that is, the parent of the document element).

    /descendant::PLANET

    Returns all the <PLANET> elements in the document.

    /descendant::PLANET/child::NAME

    Returns all the <NAME> elements that have a <PLANET> parent.

    child::PLANET[position() = 3]

    Returns the third <PLANET> child of the context node.

    child::PLANET[position() = last()]

    Returns the last <PLANET> child of the context node.

    /descendant::PLANET[position() = 3]

    Returns the third <PLANET> element in the document.

    child::PLANETS/child::PLANET[position() = 4]/child::NAME[position() = 3]

    Returns the third <NAME> element of the fourth <PLANET> element of the <PLANETS> element.

    child::PLANET[position() > 3]

    Returns all the <PLANET> children of the context node after the first three.

    preceding-sibling::NAME[position() = 2]

    Returns the second previous <NAME> sibling element of the context node.

    child::PLANET[attribute::COLOR = "RED"]

    Returns all <PLANET> children of the context node that have a COLOR attribute with value of RED.

    child::PLANET[attribute::COLOR = "RED"][position() = 3]

    Returns the third <PLANET> child of the context node that has a COLOR attribute with value of RED.

    child::PLANET[position() = 3][attribute::COLOR="RED"]

    Returns the third <PLANET> child of the context node, only if that child has a COLOR attribute with value of RED.

    child::MASS[child::NAME = "VENUS" ]

    Returns the <MASS> children of the context node that have <NAME> children whose text is VENUS.

    child::PLANET[child::NAME]

    Returns the <PLANET> children of the context node that have <NAME> children.

    child::*[self::NAME or self::MASS ]

    Returns both the <NAME> and <MASS> children of the context node.

    child::*[self::NAME or self::MASS][position() = 1]

    Returns the first <NAME> or <MASS> child of the context node.
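
The order of predicates matters, as the two COLOR entries in this table suggest. The sketch below puts both forms in one template so they can be compared; the COLOR attribute is the hypothetical one used in the table, not something the sample data is known to contain.

```xml
<xsl:template match="PLANETS">
    <HTML>
        <BODY>
            <!-- Filter by COLOR first, then take the third of the matches -->
            <xsl:apply-templates
                select="child::PLANET[attribute::COLOR = 'RED'][position() = 3]"/>
            <!-- Take the third PLANET child first, then keep it only if it is RED -->
            <xsl:apply-templates
                select="child::PLANET[position() = 3][attribute::COLOR = 'RED']"/>
        </BODY>
    </HTML>
</xsl:template>
```

Each predicate filters the node set left by the one before it, so position() in the second predicate counts only the nodes that passed the first.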

As you can see, some of this syntax is pretty involved and a little lengthy to type. However, there is an abbreviated form of XPath syntax.

XPath Abbreviated Syntax

You can take advantage of a number of abbreviations in XPath syntax. Here are the rules:

    Expression

    Abbreviation

    self::node()

    .

    parent::node()

    ..

    child::childname

    childname

    attribute::attributename

    @attributename

    /descendant-or-self::node()/

    //

You can also abbreviate predicate expressions such as [position() = 3] as [3], [position() = last()] as [last()], and so on. Using the abbreviated syntax makes XPath expressions a lot easier to use. Here are some examples of location paths using abbreviated syntax—note how well these fit the syntax we saw with the match attribute earlier in the chapter:

    Path

    Description

    PLANET

    Returns the <PLANET> element children of the context node.

    *

    Returns all element children of the context node.

    text()

    Returns all text node children of the context node.

    @UNITS

    Returns the UNITS attribute of the context node.

    @*

    Returns all the attributes of the context node.

    PLANET[3]

    Returns the third <PLANET> child of the context node.

    PLANET[1]

    Returns the first <PLANET> child of the context node.

    */PLANET

    Returns all <PLANET> grandchildren of the context node.

    /PLANETS/PLANET[3]/NAME[2]

    Returns the second <NAME> element of the third <PLANET> element of the <PLANETS> element.

    //PLANET

    Returns all the <PLANET> descendants of the document root.

    PLANETS//PLANET

    Returns the <PLANET> element descendants of the <PLANETS> element children of the context node.

    //PLANET/NAME

    Returns all the <NAME> elements that have a <PLANET> parent.

    .

    Returns the context node itself.

    .//PLANET

    Returns the <PLANET> element descendants of the context node.

    ..

    Returns the parent of the context node.

    ../@UNITS

    Returns the UNITS attribute of the parent of the context node.

    PLANET[NAME]

    Returns the <PLANET> children of the context node that have <NAME> children.

    PLANET[NAME="Venus"]

    Returns the <PLANET> children of the context node that have <NAME> children with text equal to Venus.

    PLANET[@UNITS = "days"]

    Returns all <PLANET> children of the context node that have a UNITS attribute with value days.

    PLANET[6][@UNITS = "days"]

    Returns the sixth <PLANET> child of the context node, only if that child has a UNITS attribute with value days. Note that PLANET[@UNITS = "days"][6] is different: it returns the sixth of the <PLANET> children that have a UNITS attribute with value days.

    PLANET[@COLOR and @UNITS]

    Returns all the <PLANET> children of the context node that have both a COLOR attribute and a UNITS attribute.

Here’s an example in which I put the abbreviated syntax to work, moving up and down inside a <PLANET> element:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="PLANETS">
        <HTML>
            <xsl:apply-templates select="PLANET"/>
        </HTML>
    </xsl:template>

    <xsl:template match="PLANET">
        <xsl:apply-templates select="MASS"/>
    </xsl:template>

    <xsl:template match="MASS">
        <xsl:value-of select="../NAME"/>
        <xsl:value-of select="../DAY"/>
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>
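
As a variation, the MASS template could also reach an attribute with the @ abbreviation and separate the values with <xsl:text>. This is only a sketch: it assumes that <MASS> carries a UNITS attribute, which the examples shown here do not confirm.

```xml
<xsl:template match="MASS">
    <P>
        <!-- ../NAME climbs to the parent PLANET, then down to its NAME child -->
        <xsl:value-of select="../NAME"/>
        <xsl:text>: </xsl:text>
        <!-- . is the MASS element itself -->
        <xsl:value-of select="."/>
        <xsl:text> </xsl:text>
        <!-- @UNITS is the UNITS attribute of MASS (assumed to exist) -->
        <xsl:value-of select="@UNITS"/>
    </P>
</xsl:template>
```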

Default XSLT Rules

XSLT has some built-in, default rules that we’ve already seen in action. For example, the default rule for text nodes is to add the text in that node to the output document.

The most important default rule applies to elements and can be expressed like this:

<xsl:template match="/ | *">
    <xsl:apply-templates/>
</xsl:template>

This rule is simply there to make sure that every element, from the root on down, is processed with <xsl:apply-templates/> if you don’t supply some other rule. If you do supply another rule, it overrides the corresponding default rule.

The default rule for text can be expressed like this, where, by default, the text of a text node is added to the output document:

<xsl:template match="text()">
    <xsl:value-of select="."/>
</xsl:template>

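The same kind of default rule applies to attributes, whose values are added to the output document with a default rule like this (note that <xsl:apply-templates/> without a select attribute processes only child nodes, so attributes reach this rule only when you select them explicitly):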

<xsl:template match="@*">
    <xsl:value-of select="."/>
</xsl:template>

By default, processing instructions are not inserted in the output document, so their default rule can be expressed simply like this:

<xsl:template match="processing-instruction()"/>

The same goes for comments, whose default rule can be expressed this way:

<xsl:template match="comment()"/>
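
Because a rule you supply overrides the corresponding default rule, one way to keep unwanted text out of the output is to replace the default text rule with an empty template. The stylesheet below is a minimal sketch of that idea; it writes out only the <NAME> elements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Overrides the default text rule: text nodes produce no output
         unless a template asks for them explicitly -->
    <xsl:template match="text()"/>

    <!-- Only planet names are written out -->
    <xsl:template match="NAME">
        <P><xsl:value-of select="."/></P>
    </xsl:template>
</xsl:stylesheet>
```

The default element rule still walks down from the root, so the NAME template is reached without any further rules.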

The upshot of the default rules is that if you don’t supply any rules at all, all the parsed character data in the input document is inserted in the output document. Here’s what an XSLT style sheet with no explicit rules looks like:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
</xsl:stylesheet>

Here are the results of applying this style sheet to planets.xml:

<?xml version="1.0" encoding="UTF-8"?>



    Mercury
    .0553
    58.65
    1516
    .983
    43.4



    Venus
    .815
    116.75
    3716
    .943
    66.8



    Earth
    1
    1
    2107
    1

    128.4

    XSLT Rules and Internet Explorer

    One of the problems of working with XSLT in Internet Explorer is that the browser doesn’t supply any default rules. You have to supply all the rules yourself.
