Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2016-04-06T12:03:54+0000


XSLT identity transformation

The XSLT identity transformation is used to transform an XML document to itself, that is, to generate XML output that is identical to the XML input. By itself this is not very useful, since there are more computationally efficient ways to produce an identical copy of a document. Where it pays off, though, is if you want to make an almost identical copy, except that you want to introduce a small but systematic change or two. You can do this by using the identity template to transform everything in the document except the parts that you want to modify. The result is that you produce a copy of the document that is identical to the original except that it includes your modifications.

For example, you might have a document filled with sonnets with a structure like:

<sonnet number="I">
    <line>From fairest creatures we desire increase,</line>
    <line>That thereby beauty's rose might never die,</line>
    <line>But as the riper should by time decease,</line>
    <line>His tender heir might bear his memory:</line>
    <line>But thou contracted to thine own bright eyes,</line>
    <line>Feed'st thy light's flame with self-substantial fuel,</line>
    <line>Making a famine where abundance lies,</line>
    <line>Thy self thy foe, to thy sweet self too cruel:</line>
    <line>Thou that art now the world's fresh ornament,</line>
    <line>And only herald to the gaudy spring,</line>
    <line>Within thine own bud buriest thy content,</line>
    <line>And tender churl mak'st waste in niggarding:</line>
    <line>Pity the world, or else this glutton be,</line>
    <line>To eat the world's due, by the grave and thee.</line>
</sonnet>

that you’d like to convert to:

<sonnet>
    <number>I</number>
    <line>From fairest creatures we desire increase,</line>
    <line>That thereby beauty's rose might never die,</line>
    <line>But as the riper should by time decease,</line>
    <line>His tender heir might bear his memory:</line>
    <line>But thou contracted to thine own bright eyes,</line>
    <line>Feed'st thy light's flame with self-substantial fuel,</line>
    <line>Making a famine where abundance lies,</line>
    <line>Thy self thy foe, to thy sweet self too cruel:</line>
    <line>Thou that art now the world's fresh ornament,</line>
    <line>And only herald to the gaudy spring,</line>
    <line>Within thine own bud buriest thy content,</line>
    <line>And tender churl mak'st waste in niggarding:</line>
    <line>Pity the world, or else this glutton be,</line>
    <line>To eat the world's due, by the grave and thee.</line>
</sonnet>

That is, your input has a @number attribute that you’d like to replace with a <number> child of the <sonnet> element. To make that change you can start with an identity transformation, which applies templates to every node in the document and tells it to reproduce itself unchanged in the output, except that you also write a template that handles the numbering specially.

The identity template looks like:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

The preceding is an entire XSLT stylesheet that transforms a document to itself unchanged. The way it works is that it matches all attributes (@*) and all other nodes (node(); this includes the document node and all elements, text() nodes, comments, etc.), makes a copy of those, and then applies templates to any attributes or other nodes associated with whatever it’s processing at the moment. Since an XSLT transformation starts automatically at the document node (and because the document node is a node, this template will match it), you wind up processing everything in the document, moving down step by step through the hierarchy. Because text() nodes are nodes, too, this also winds up copying all of the text.

There are two subtle details that enable such a simple template to do so much work:

  1. The <xsl:copy> element makes a shallow copy. This means that it creates a copy of the node that it is processing, but it doesn’t do anything with any attributes or child elements or textual content. For example, if you apply <xsl:copy> to a <sonnet> element in the example above, it will create a <sonnet> element in the output, but it won’t copy the @number attribute or the <line> child elements. This is why we need to apply templates explicitly inside the <xsl:copy> tags.
  2. We have to apply templates to both nodes and attributes. We don’t usually think of it this way, but when we apply templates to node(), we’re applying templates to all nodes on the child axis of the current context (that is, of the element we matched in the template that is doing the work at the moment). In other words, <xsl:apply-templates select="node()"/> is synonymous with <xsl:apply-templates select="child::node()"/>. Since attributes are on the attribute axis, and not on the child axis, this would not apply templates to attributes, which means that attributes on elements in the input document would be lost during the transformation. To avoid losing them, we have to apply templates to the union (using the union operator |) of anything on the attribute axis (@*) and any node on the child axis (node()).

One more bit of magic is that XSLT templates have precedence rules, which specifies what happens when more than one template matches a node that is being processed, and we exploit those rules to override the identity template when we want to change something during the transformation. The most important precedence rule is that the more specific @match value wins. Since the @match value of the identity template, above, is very general (it matches any attribute and any other node), just about any other @match value would be more specific. What we’ll do for our present task, then, is let the identity template take care of everything in our document except the bits that we want to change. The identity template will match absolutely everything in the input document, but our more specific template for what we want to change will override it where we need it to.

This simplified example contains just two element types: <sonnet> and <line>, but in Real Life you would have other elements: a root element (perhaps <sonnets>), some metadata, headers or titles, and perhaps more. We want to leave all of those other element types unchanged, and we also want to leave <line> unchanged, but we want to make two changes to <sonnet>:

  1. We want to remove the @number attribute, and
  2. We want to add a <number> child element.

A full XSLT stylesheet to do that would be:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="sonnet">
        <xsl:copy>
            <number>
                <xsl:value-of select="@number"/>
            </number>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

The identity template matches everything in the input document: the document node, all elements, all attributes, and all text() nodes. Both templates match <sonnet> elements; the second one matches them specifically and the identity template matches them because it matches all nodes. This means that the identity template will be used to make no changes in anything else, but the second template, which has the more specific @match attribute value and therefore outranks the identity template in case of a tie, will be the one that gets to handle the <sonnet> element. What the sonnet-specific template does is copy the element it just matched (that is, create a shallow copy of the <sonnet> element in the output) and then, inside the element it has just created in the output, create a new <number> child element, the content of which is the value of the @number attribute that was on the original <sonnet> we’re processing. Below that new <number> child element of the <sonnet> we apply templates without a @select attribute, which means that we apply templates to all of the child nodes of the <sonnet> we’re processing at the moment. Those child nodes are the line elements of the sonnet, and when we apply templates to them, the identity template does the processing and just copies them unchanged to the output.

So what happened to the original @number attribute? Attributes are not children because they aren’t on the child axis; they’re on the attribute axis. This means that in our template that matches <sonnet> elements, our <xsl:apply-templates/> without a @select attribute applies templates to all of the children of the current context (in this case, the <line> elements), but not to its attributes. This means that the original @number attribute disappears because we simply don’t apply templates to it.