Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2016-02-21T16:50:31+0000


XSLT, part 2: Advanced features

This supplementary XSLT tutorial concentrates on four topics: variables, keys, conditionals (<xsl:if> and <xsl:choose>), and the difference between push processing (<xsl:apply-templates> and <xsl:template>) and pull processing (<xsl:for-each> and <xsl:value-of>).

The <xsl:variable> element

If you’ve used variables in other programming languages before, be aware that variables in XSLT do not work like variables in most other languages. The value of a variable in XSLT cannot be updated once the variable has been declared.

The <xsl:variable> element requires a @name attribute, which names the variable for future use. The value of the variable is typically assigned through a @select attribute (in which case <xsl:variable> is an empty element), but it can also be specified as the content of the <xsl:variable> (in which case the element cannot have a @select attribute). It’s usually easier to use @select, since this typically produces less complicated code, but if you need to do anything particularly involved, you may not be able to use the @select method of assigning your value. To reference a variable later on, just use $variableName, where variableName is the value of the @name attribute you wrote when creating the variable. That is, when you declare a variable to, say, count the paragraphs (<p> elements) in your input, you might give it the name paragraphCount with something like:

<xsl:variable name="paragraphCount" select="count(//p)"/>

Note that there is no leading dollar sign associated with the name when you declare and define a variable. But should you later refer to the variable, you need the leading dollar sign. For example, to get the value of the variable, you might use:

<xsl:value-of select="$paragraphCount"/>

The <xsl:variable> element may be defined in different locations in the stylesheet, and the location makes a difference. When you define a variable (that is, use <xsl:variable>) as an immediate child of the root <xsl:stylesheet> element, the variable can be used anywhere in the stylesheet. When, on the other hand, you define a variable inside a template rule, it’s available only within that template rule (more precisely, only to the following siblings of the <xsl:variable> element and their descendants).
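
As a rough illustration of the difference in scope, the following sketch declares one global and one local variable. It assumes input without a namespace that contains <p> elements; the output element names (<report>, <first>, <item>) and the variable name firstParagraph are made up for the example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes"/>
    <!-- Global variable: a child of xsl:stylesheet, usable in every template -->
    <xsl:variable name="paragraphCount" select="count(//p)"/>
    <xsl:template match="/">
        <!-- Local variable: usable only inside this template, after this point -->
        <xsl:variable name="firstParagraph" select="(//p)[1]"/>
        <report>
            <first>
                <xsl:value-of select="$firstParagraph"/>
            </first>
            <xsl:apply-templates select="//p"/>
        </report>
    </xsl:template>
    <xsl:template match="p">
        <!-- $paragraphCount is in scope here; $firstParagraph would be an error -->
        <item>
            <xsl:value-of select="position()"/>
            <xsl:text> of </xsl:text>
            <xsl:value-of select="$paragraphCount"/>
        </item>
    </xsl:template>
</xsl:stylesheet>
```

Because paragraphCount is declared at the top level, count(//p) is written once and its value can be reused in every template.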

We often use a top-level <xsl:variable> element to avoid having to recalculate a value that is used often, as well as to access the tree from an atomized context (such as inside an <xsl:for-each> that iterates over atomic values, which we’ll explain when the situation arises). You might also use variables to avoid typing a long XPath expression within some other complicated instruction. Variables that are not strictly necessary and that are created for the convenience of the developer are called, not surprisingly, convenience variables.
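
Looking ahead to distinct-values() and <xsl:for-each>, which are discussed later in this tutorial, here is a minimal sketch of using a top-level variable to get back into the tree from an atomized context. It assumes the TEI-encoded Bad Hamlet input used in the later examples; the variable name root and the page title are our own choices.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <!-- Top-level variable: remembers the document node of the input -->
    <xsl:variable name="root" select="/"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Speeches per speaker</title>
            </head>
            <body>
                <!-- Each item in this loop is a string, not a node in the tree -->
                <xsl:for-each select="distinct-values(//speaker)">
                    <p>
                        <xsl:value-of select="."/>
                        <xsl:text>: </xsl:text>
                        <!-- A bare //sp would fail here because the context item
                             is not a node, so the path starts from $root instead -->
                        <xsl:value-of select="count($root//sp[speaker = current()])"/>
                    </p>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```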

For more information about variables, see Michael Kay, 500 ff.

The <xsl:key> element

<xsl:key> may be overlooked in situations where comparable functionality is available through other means, but it is often simpler (and almost always faster) to use <xsl:key> than the alternatives (we once reduced the run time for a transformation from twenty minutes to just a few seconds by switching to an implementation that used <xsl:key>!). The <xsl:key> element requires three attributes. Consider, for example, XML structured like:

<book>
    <title>XSLT 2.0 and XPath 2.0 Programmer's Reference</title>
    <author>Michael Kay</author>
    <publisher>Wrox</publisher>
    <edition>4</edition>
    <year>2008</year>
</book>

where you want to be able to find books by their authors. You could define a key as:

<xsl:key name="bookByAuthor" match="book" use="author"/>

The three required attributes in this case are @name, @match, and @use.

The @name attribute value is the name by which you refer to the key later. The @match attribute value of the key is the object (typically an element) that the processor will return when the key is referenced (see below), while the @use attribute value tells the processor what to use to look up those values. In the example above, you would be able to use the key to retrieve <book> elements according to their <author> child elements.

To retrieve information with the help of a key, you use the key() XPath function, which takes two or three arguments. The first argument is the name of the key (matching the @name value from the <xsl:key> element), and it must be in quotation marks (single or double). The second argument is the value to look up; for example, in the sample above, if you were to specify Michael Kay as the second argument to the key() function (key("bookByAuthor","Michael Kay")), you would retrieve all <book> elements with <author> children that have the value Michael Kay. The optional third argument is a node, typically a document node, that identifies the tree in which to look. When the third argument is omitted, the function searches the document that contains the context node. For further discussion of <xsl:key>, consult Michael Kay, page 376.
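
Putting the declaration and the lookup together might look something like the sketch below. It assumes input without a namespace in which <book> elements like the one above appear somewhere in the document; the output element names <results> and <hit> are invented for the example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes"/>
    <!-- Index every book by the string value of its author child -->
    <xsl:key name="bookByAuthor" match="book" use="author"/>
    <xsl:template match="/">
        <results>
            <!-- Retrieve all books whose author is Michael Kay -->
            <xsl:apply-templates select="key('bookByAuthor', 'Michael Kay')"/>
        </results>
    </xsl:template>
    <xsl:template match="book">
        <hit>
            <xsl:value-of select="title"/>
            <xsl:text> (</xsl:text>
            <xsl:value-of select="year"/>
            <xsl:text>)</xsl:text>
        </hit>
    </xsl:template>
</xsl:stylesheet>
```

Note that the key name is written in single quotation marks here because the XPath expression sits inside an attribute value that is already delimited by double quotation marks.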

Conditionals

<xsl:if>

<xsl:if> is useful when you have one particular feature whose value may sometimes require special treatment. For example, you might use <xsl:if> to color all speeches by Hamlet differently from all other speeches. <xsl:if> takes a required attribute @test, which takes a Boolean argument (that is, the attribute value has to describe a test that evaluates to either True or False) just like a predicate expression in XPath. The contents of the <xsl:if> element, then, describe what the system is to do if the result of @test is True: for example, you might want to apply templates or use <xsl:value-of> to display the results of a particular function, or you might want to create a special @class attribute value (if you are generating HTML) using <xsl:attribute> that can be styled with CSS (see our “Using <span> and @class to style your HTML” to refresh your memory about the @class attribute). Consider:

<xsl:template match="sp">
    <p>
        <xsl:if test="speaker='Hamlet'">
            <xsl:attribute name="class">mainCharacter</xsl:attribute>
        </xsl:if>
        ...
    </p>
</xsl:template>

In this example we are checking each <sp> (because we’re doing this inside the template rule for <sp> elements) to see whether its child <speaker> (remember that we default to the child axis) is equal to the string Hamlet. If the result of this test is True, we’ll go on to perform whatever is inside <xsl:if>. If it isn’t, the contents of <xsl:if> are skipped and nothing special happens. In this case, everywhere this test is True we’ll create an attribute using <xsl:attribute>, and we use the @name attribute to specify what name this attribute should have: in this case we’re creating the attribute @class. This attribute gets attached to the parent element: in this case, <p>. The contents of <xsl:attribute> indicate the value to be assigned to this new attribute: in this case the value of @class will be mainCharacter. This means that anywhere there’s a speech by Hamlet, we’re mapping it to something like:

<p class="mainCharacter"> . . . </p>

If we then have a rule in our CSS like:

.mainCharacter { color: red; }

then any <p> element that contains a speech by Hamlet will have this attribute and will be colored red.

<xsl:choose>

Although <xsl:if> can be useful, sometimes we need to code for multiple possible environments, or we care about what should happen when the results of our conditional are False, and this is where <xsl:choose> comes in. <xsl:if> can run only one test and can have only two results: True or False. On the other hand, you can use <xsl:choose> to specify a number of different conditional environments, as well as a fallback action if none of the conditions is true. <xsl:choose> takes at least one child <xsl:when> element (and up to as many as you want) and one optional <xsl:otherwise> element. <xsl:when> requires the same @test attribute that we discussed above. Since <xsl:otherwise> is the fallback condition, it doesn’t take this @test attribute; it only applies when all <xsl:when> tests return False.

<xsl:template match="sp">
    <p>
        <xsl:choose>
            <xsl:when test="speaker='Hamlet'">
                <xsl:text>[Hi, Hamlet!] </xsl:text>
            </xsl:when>
            <xsl:when test="speaker='Ophelia'">
                <xsl:text>[Hi, Ophelia!] </xsl:text>
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>[Neither Hamlet nor Ophelia] </xsl:text>
            </xsl:otherwise>
        </xsl:choose>
        <xsl:apply-templates/>
    </p>
</xsl:template>

In this example, we have two tests (<xsl:when>) and one fallback (<xsl:otherwise>), which is used if neither test returns True. The first test checks whether the child <speaker> element (remember that we’re in the template rule for <sp>) is equal to Hamlet. If it is, we use the <xsl:text> element to create a text() node with the content [Hi, Hamlet!] , which means that we return the plain text: [Hi, Hamlet!] . The second test works along the same lines, except that it checks whether the child <speaker> is equal to Ophelia. If this test is True, then we return plain text reading [Hi, Ophelia!] . If neither of these tests returns True (that is, if the speaker is anyone other than Hamlet or Ophelia), then the <xsl:otherwise> condition kicks in. In this case, that means that we return the plain text [Neither Hamlet nor Ophelia] . Note that we’ve put in a space at the end of each of these strings of plain text, because we apply templates at the end of this block of conditionals. If you run this code, your output should look like this (we’ve added bolding to the speaker names to make them easier to see here):

[Neither Hamlet nor Ophelia] Osric: It is indifferent cold, my lord, indeed.

[Hi, Hamlet!] Hamlet: But yet methinks it is very sultry and hot for my
complexion.

The examples above of <xsl:if> and <xsl:choose> came from the following stylesheet, which is included in its entirety for your reference. It outputs all of the speeches in Bad Hamlet normally, but we have the system do some extra formatting depending on whether the speaker is Hamlet, Ophelia, or anyone else.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>XSLT conditional practice</title>
            </head>
            <body>
                <h1>XSLT conditional practice</h1>
                <xsl:apply-templates select="//sp"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sp">
        <p>
            <xsl:if test="speaker='Hamlet'">
                <xsl:attribute name="class">mainCharacter</xsl:attribute>
            </xsl:if>
            <xsl:choose>
                <xsl:when test="speaker='Hamlet'">
                    <xsl:text>[Hi, Hamlet!] </xsl:text>
                </xsl:when>
                <xsl:when test="speaker='Ophelia'">
                    <xsl:text>[Hi, Ophelia!] </xsl:text>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:text>[Neither Hamlet nor Ophelia] </xsl:text>
                </xsl:otherwise>
            </xsl:choose>
            <xsl:apply-templates/>
        </p>
    </xsl:template>
    <xsl:template match="speaker">
        <strong>
            <xsl:apply-templates/>
            <xsl:text>: </xsl:text>
        </strong>
    </xsl:template>
    <xsl:template match="l | ab">
        <xsl:apply-templates/>
        <xsl:if test="following-sibling::l or following-sibling::ab">
            <br/>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

Push and pull design

The XSLT processing model supports both push and pull design. The push model, which is what we’ve been using exclusively so far, relies on <xsl:apply-templates> to identify what is supposed to get processed where, and on <xsl:template> to describe how it is supposed to be processed. This is called “push” because you push the elements and other components out into the stylesheet and rely on the templates to grab the individual pieces and process them. For example, you don’t say “take all the paragraphs and paint them blue”; what you do instead is say in one place “here are some paragraphs; take care of them” and in another “whenever you happen to run into a paragraph, paint it blue.” The great strength of push processing is that you don’t have to know the structure of your input document; that is, you don’t have to know which elements will be encountered where. The declarative template rules ensure that no matter where an element pops up, you’ll have a template around that will know what to do with it. Since the structure of humanities documents involves a lot of variable mixed content, this declarative approach creates a flexibility that is difficult to achieve with the sort of procedural programming that requires you to know at each moment exactly what is supposed to happen next.
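
The paragraph-painting idea can be sketched roughly as follows. The input element names (<section>, <p>, <emph>) are hypothetical and assumed to be in no namespace; the point is only that each template rule fires wherever its element turns up, without the stylesheet having to know the document structure in advance.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <!-- "Here are some paragraphs; take care of them" -->
    <xsl:template match="section">
        <div>
            <xsl:apply-templates/>
        </div>
    </xsl:template>
    <!-- "Whenever you happen to run into a paragraph, paint it blue" -->
    <xsl:template match="p">
        <p style="color: blue;">
            <xsl:apply-templates/>
        </p>
    </xsl:template>
    <!-- Inline emphasis is handled wherever it occurs, at any depth -->
    <xsl:template match="emph">
        <em>
            <xsl:apply-templates/>
        </em>
    </xsl:template>
</xsl:stylesheet>
```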

The pull model, on the other hand, is procedural in nature, and relies primarily on <xsl:for-each> and <xsl:value-of>. It is useful when you need to round up specific information, instead of dealing with it on the fly whenever it happens to come up. Pull design is useful for generating tables, for example, where you might want to create a row for each character in a play with columns for the name and the number of speeches (see the example below). In this case you don’t want to process each speech where it occurs; you want to go out and grab them all for one character, and then for the next, etc. The pull model would work poorly, on the other hand, for rendering each speech as it occurs, since it might contain an unpredictable variety of in-line elements, and you need to be able to deal with those as they arise, without having to know in advance which ones to call for explicitly.

About pull

Pull design is frequently overused by beginning XSLT programmers, especially if they have experience with procedural programming languages. In many cases the end result of using pull will be the same as the result of using push, but pull design is often harder to maintain because it is less consistent with the declarative nature of XSLT as a programming language. With that said, pull design does have its uses. As noted above, the two principal elements used in pull coding are <xsl:for-each> and <xsl:value-of>.

<xsl:for-each>

The <xsl:for-each> element is used to iterate over a sequence of items (most often elements, but other items are also permissible). <xsl:for-each> requires one attribute, @select, the value of which can be a full XPath expression (just like the value of the @select attribute with <xsl:apply-templates>). Each item that @select identifies becomes the context item in turn, so any relative XPath expressions used inside <xsl:for-each> are evaluated starting from that item, not from the document node.

We often use <xsl:for-each> with scalable vector graphics (SVG), which we’ll be introducing later in the semester. It is also useful for creating a sorted list when used in conjunction with <xsl:sort> (see Michael Kay for details).
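
As a small sketch of the sorting use case, the stylesheet below lists the roles of a play alphabetically. It assumes the same TEI-encoded Bad Hamlet input, with <role> elements, that is used in the example later in this tutorial; the page title is our own.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Cast list</title>
            </head>
            <body>
                <ul>
                    <xsl:for-each select="//role">
                        <!-- xsl:sort must come first inside xsl:for-each;
                             here the roles are sorted by their text content -->
                        <xsl:sort select="."/>
                        <li>
                            <xsl:value-of select="."/>
                        </li>
                    </xsl:for-each>
                </ul>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```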

<xsl:value-of>

Although the results of <xsl:value-of> and <xsl:apply-templates> are often the same, the real usefulness of <xsl:value-of> is that it allows you to output the results of functions and non-node values. For example, if you want to output a list of unique speakers in a play, the following code will generate an error message:

<xsl:for-each select="distinct-values(//speaker)">
    <xsl:apply-templates/>
</xsl:for-each>

The problem is that you can’t apply templates to an atomic value (Michael Kay: an item such as an integer, a string, a date, or a boolean, rather than a node [element, attribute, and a few others]). What you should do instead is:

<xsl:for-each select="distinct-values(//speaker)">
    <xsl:value-of select="."/>
</xsl:for-each>

If you want to do something to every instance of a <speaker> element in a play, though, repeats and all, you should prefer <xsl:apply-templates>. The difference is that each instance of a <speaker> element is a node in the tree, but the sequence produced by applying the distinct-values() function to all of the <speaker> nodes is a sequence of atomic values, and not of nodes.
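
The two situations can be put side by side in one sketch, again assuming the Bad Hamlet input; the headings and page title are made up for the example. The first list pushes every <speaker> node through a template rule, while the second pulls the atomic values returned by distinct-values().

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Speakers</title>
            </head>
            <body>
                <h2>Every speaker instance, repeats and all</h2>
                <ul>
                    <!-- These are nodes, so templates can be applied to them -->
                    <xsl:apply-templates select="//speaker"/>
                </ul>
                <h2>Unique speakers</h2>
                <ul>
                    <!-- These are atomic values, so we use xsl:value-of -->
                    <xsl:for-each select="distinct-values(//speaker)">
                        <li>
                            <xsl:value-of select="."/>
                        </li>
                    </xsl:for-each>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="speaker">
        <li>
            <xsl:apply-templates/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```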

The following example creates an HTML page that lists the number of speeches by each speaker in Bad Hamlet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Bad Hamlet Speeches</title>
            </head>
            <body>
                <xsl:for-each select="//role">
                    <p>
                        <xsl:value-of select="."/>
                        <xsl:text>: </xsl:text>
                        <xsl:value-of select="count(//sp[contains(@who, current()/@xml:id)])"/>
                    </p>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

What’s happening here is that we loop through each <role> element in the whole of Bad Hamlet (at any depth, as specified by the //) and create a <p> element:

<xsl:for-each select="//role">
    <p>
        ...
    </p>
</xsl:for-each>

Inside each <p>, return the value of the context node (specified by the .):

<xsl:value-of select="."/>

We then output a colon followed by a space, just as plain text:

<xsl:text>: </xsl:text>

Finally we return a count of all the <sp> elements that meet a certain condition (the predicate):

<xsl:value-of select="count(//sp[contains(@who, current()/@xml:id)])"/>

The XPath here says: get all the <sp> elements in the entire play and keep those whose @who attributes contain a particular substring. When we’re executing an <xsl:for-each> loop, the item being processed at the moment can be represented by current(). This means that we’re comparing the value of the @who attribute of each <sp> element to the @xml:id attribute of the <role> element that we’re currently processing. For example, when we process <role xml:id="Hamlet">Hamlet</role>, we check the @who attribute of every <sp> in the play to see whether it contains, as a substring, the value of the @xml:id attribute of that <role>. If it does, that’s a speech by Hamlet, so it gets included in our count. After we’ve gone through every <sp> and checked who the speaker is, we output the count of the speeches for the current <role>. Then we move on to the next <role>. When we run out of roles, the <xsl:for-each> terminates gracefully. (Our use of the XPath contains() function is brittle, since it could strike a false positive if, say, there were characters named both Ham and Hamlet. In that case we would erroneously identify Hamlet’s speeches as by both Ham and Hamlet. If false positive substring matches are a risk with your data, more robust methods are available.)

This is the first time you’ve seen the function current(), and you may be wondering why you can’t write:

count(//sp[contains(@who, ./@xml:id)])

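(with a dot instead of current()). The problem is that the dot refers to wherever you are at that moment in your current path. Since you’re inside a predicate that is being applied to the <sp> step that precedes it, a dot would check the @xml:id attribute of the <sp>, and not of the <role>. Since the <sp> doesn’t have an @xml:id attribute (it’s the <role> that does), this wouldn’t find the matches we care about. In fact, because contains() treats a missing second argument as an empty string, and every string contains the empty string, the test would succeed for every <sp>, and each role would be credited with every speech in the play.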

You may find the following distinction helpful: From a technical perspective, current() refers to the current context at the XSLT level and the dot refers to the current context in an XPath path expression. At the first step of an XPath path expression, the two mean the same thing. In the example above, when we output <xsl:value-of select="."/> we could instead have said <xsl:value-of select="current()"/>, since in this simple path the XSLT and XPath contexts are the same. We don’t have that choice in count(//sp[contains(@who, current()/@xml:id)]), though; here the more complicated XPath includes a new step, //sp, which changes the XPath context. Here we need to use current() because the XSLT context was set at the <xsl:for-each> stage, and is unaffected by the comparison. For more discussion, with examples, see Michael Kay, p. 735.

If you try to run this code, you’ll notice that it takes a bit longer than usual to finish. That’s because it’s looping through the entire play repeatedly, looking at every speech once for every role in the play. At 1137 <sp> elements and 37 <role> elements, that’s 42069 comparisons. This is part of why we usually avoid using <xsl:for-each> unless the problem really calls for it, and in those cases there are ways to speed it up (such as by using a key, as described above).
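
One possible key-based version is sketched below. It rests on an assumption we have not verified against the Bad Hamlet file: that each @who value is a whitespace-separated list of identifiers that match the @xml:id values of the <role> elements exactly (for example, with no leading #). The key name spByWho is our own. Unlike the contains() version, this matches whole identifiers rather than substrings, so the counts could differ if the data relies on substring matching.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <!-- Assumption: @who is a whitespace-separated list of role identifiers.
         Each sp is indexed once under every identifier in its @who value. -->
    <xsl:key name="spByWho" match="sp" use="tokenize(normalize-space(@who), ' ')"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Bad Hamlet Speeches</title>
            </head>
            <body>
                <xsl:for-each select="//role">
                    <p>
                        <xsl:value-of select="."/>
                        <xsl:text>: </xsl:text>
                        <!-- There is no predicate here, so @xml:id is read
                             from the role that is being processed -->
                        <xsl:value-of select="count(key('spByWho', @xml:id))"/>
                    </p>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

With a key, the processor can build its index of <sp> elements once and then look up each role directly, instead of rescanning every speech for every role.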

There are situations that can be managed with either push or pull strategies. In most of those cases, your instinct, unless you are a veteran XSLT programmer, will draw you toward pull. It’s much more common in humanities-oriented XSLT to use push programming, and where there’s a choice, we’d encourage you to train yourselves to think of push first, and fall back on pull only where it is truly more appropriate.