Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2021-12-30T01:52:10+0000


XSLT identity transformation

The XSLT identity transformation is used to transform an XML document to itself, that is, to generate XML output that is identical to the XML input. By itself this is not very useful, since there are more computationally efficient ways to produce an identical copy of a document. Where it pays off, though, is if you want to make an almost identical copy, except that you want to introduce a small but systematic change or two. You can do this by using the identity template to transform everything in the document except the parts that you want to modify. The result is that you produce a copy of the document that is identical to the original except that it includes your modifications.

For example, you might have a document filled with sonnets with a structure like:

<sonnet number="I">
    <line>From fairest creatures we desire increase,</line>
    <line>That thereby beauty's rose might never die,</line>
    <line>But as the riper should by time decease,</line>
    <line>His tender heir might bear his memory:</line>
    <line>But thou contracted to thine own bright eyes,</line>
    <line>Feed'st thy light's flame with self-substantial fuel,</line>
    <line>Making a famine where abundance lies,</line>
    <line>Thy self thy foe, to thy sweet self too cruel:</line>
    <line>Thou that art now the world's fresh ornament,</line>
    <line>And only herald to the gaudy spring,</line>
    <line>Within thine own bud buriest thy content,</line>
    <line>And tender churl mak'st waste in niggarding:</line>
    <line>Pity the world, or else this glutton be,</line>
    <line>To eat the world's due, by the grave and thee.</line>
</sonnet>

that you’d like to convert to:

<sonnet>
    <number>I</number>
    <line>From fairest creatures we desire increase,</line>
    <line>That thereby beauty's rose might never die,</line>
    <line>But as the riper should by time decease,</line>
    <line>His tender heir might bear his memory:</line>
    <line>But thou contracted to thine own bright eyes,</line>
    <line>Feed'st thy light's flame with self-substantial fuel,</line>
    <line>Making a famine where abundance lies,</line>
    <line>Thy self thy foe, to thy sweet self too cruel:</line>
    <line>Thou that art now the world's fresh ornament,</line>
    <line>And only herald to the gaudy spring,</line>
    <line>Within thine own bud buriest thy content,</line>
    <line>And tender churl mak'st waste in niggarding:</line>
    <line>Pity the world, or else this glutton be,</line>
    <line>To eat the world's due, by the grave and thee.</line>
</sonnet>

That is, your input has a @number attribute that you’d like to replace with a <number> child of the <sonnet> element, but you’d like to keep the wrapper <sonnet> element and the internal <line> elements. To make that change you can start with an identity transformation, which applies templates to every node in the document and tells each node to reproduce itself unchanged in the output, and then add a template that handles the numbering specially.

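The identity template looks like:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:mode on-no-match="shallow-copy"/>
</xsl:stylesheet>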

The preceding is an entire XSLT stylesheet that transforms a document to itself unchanged. The way it works is that it says that if you encounter a node for which there is no explicit template, you make a shallow copy, that is, you copy the node and then process its children (and, if it is an element, also its attributes).

The <xsl:mode> element is a top-level element, which means that it’s a child of the root <xsl:stylesheet> element and a sibling of <xsl:output> and <xsl:template> elements. With its @on-no-match attribute set to "shallow-copy", it overrides the built-in rules, which would otherwise be that 1) if there’s no template for an element, throw away the tags and process its children, and 2) if there’s no template that matches text nodes, output the text. The new rule says that if there’s no template for a node (element, text, or anything else), make a shallow copy of it and then apply templates to all of its children and attributes. It will apply first to the document node, because that’s where XSLT starts its work, and it will then work its way down the tree.
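If you are working with an XSLT 1.0 or 2.0 processor, which does not support <xsl:mode>, much the same behavior is traditionally written as an explicit template. The following is a sketch of that older idiom, offered for comparison and not as a replacement for the <xsl:mode> approach described above. One difference worth noting is that the pattern node() does not match the document node, which is instead handled by the built-in rule that processes its children; the result is normally the same.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="xml"/>
    <!-- Explicit identity template: matches every attribute and every
         child node (element, text, comment, processing instruction) -->
    <xsl:template match="@*|node()">
        <!-- Make a shallow copy of the matched node -->
        <xsl:copy>
            <!-- Then process its attributes and children the same way -->
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Writing the rule out this way makes the two steps visible: copy the node itself, then apply templates to whatever is on its attribute and child axes.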

One bit of magic is that XSLT templates have precedence rules, which specify what happens when more than one template matches a node that is being processed, and we exploit those rules to override the identity template when we want to change something during the transformation. The most important precedence rule is that (with some simplification) the more specific match wins. Since the identity template, above, is very general (it matches all nodes of all types), any explicit template will be more specific. What we’ll do for our present task, then, is let the identity template take care of everything in our document except the bits that we want to change. The identity template will match absolutely everything in the input document, but our more specific template for what we want to change will override it where we need it to.
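To see “the more specific match wins” at work between two explicit templates, here is a small sketch. It relies on the default priorities defined by the XSLT specification, where a bare element name such as line gets priority 0 and a pattern with a predicate such as line[1] gets priority 0.5. The @type attribute and its value "first" are made up for this illustration and are not part of the sonnet markup discussed here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml"/>
    <xsl:mode on-no-match="shallow-copy"/>
    <!-- General template, default priority 0. It only copies the element,
         and is included just to create a conflict with the template below -->
    <xsl:template match="line">
        <xsl:copy>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
    <!-- More specific template, default priority 0.5. It also matches the
         first <line> child of each parent, and it wins for those lines -->
    <xsl:template match="line[1]">
        <xsl:copy>
            <!-- Hypothetical attribute, added only to make the effect visible -->
            <xsl:attribute name="type">first</xsl:attribute>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Both templates match the first line of a sonnet, but only the second one fires for it, so that line alone should come out with the extra attribute while the others are copied unchanged.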

This simplified example contains just two element types: <sonnet> and <line>, but in Real Life you would have other elements: a root element (perhaps <sonnets>), some metadata, headers or titles, and perhaps more. We want to leave all of those other element types unchanged, and we also want to leave <line> unchanged, but we want to make two changes to <sonnet>:

  1. We want to remove the @number attribute, and
  2. We want to add a <number> child element.

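A full XSLT stylesheet to do that would be:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:mode on-no-match="shallow-copy"/>
    <xsl:template match="sonnet">
        <xsl:copy>
            <number>
                <xsl:value-of select="@number"/>
            </number>
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>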

The identity template matches everything in the input document: the document node, all elements, all attributes, and all text nodes. Both templates match <sonnet> elements; the second one matches them specifically and the identity template matches them because it matches all nodes. This means that the identity template will copy everything else without change, but the second template, which has the more specific @match attribute value and therefore outranks the identity template in case of a tie, will be the one that gets to handle the <sonnet> element. What the sonnet-specific template does is copy the element it just matched (that is, create a shallow copy of the <sonnet> element in the output, so it copies the element node but not—unless instructed to do so explicitly—any of its attributes or children) and then, inside the element it has just created in the output, create a new <number> child element, the content of which is the value of the @number attribute that was on the original <sonnet> we’re processing. Below that new <number> child element of the <sonnet> we apply templates without a @select attribute, which means that we apply templates to all of the child nodes of the <sonnet> we’re processing at the moment. Those child nodes are the <line> elements of the sonnet, and when we apply templates to them, the identity template does the processing and just copies them unchanged to the output.

So what happened to the original @number attribute? Attributes are not children because they aren’t on the child axis; they’re on the attribute axis. This means that in our template that matches <sonnet> elements, our <xsl:apply-templates/> without a @select attribute applies templates to all of the children of the current context (in this case, the <line> elements), but not to its attributes. This means that the original @number attribute disappears because we simply don’t apply templates to it.
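A consequence of this is that any other attributes on <sonnet> would disappear too. If your real documents had, say, an @author attribute on <sonnet> (a hypothetical attribute, not part of the example above) that you wanted to keep, one way to handle it, sketched here under that assumption, is to apply templates explicitly to the attribute axis while excluding @number:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:mode on-no-match="shallow-copy"/>
    <xsl:template match="sonnet">
        <xsl:copy>
            <!-- Attributes have to be written before any child content,
                 so copy every attribute except @number first -->
            <xsl:apply-templates select="@* except @number"/>
            <!-- Turn the value of @number into a child element -->
            <number>
                <xsl:value-of select="@number"/>
            </number>
            <!-- No @select: this reaches the children only, not the attributes -->
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

The except operator requires XPath 2.0 or later. The attributes selected this way are handled by the shallow-copy rule, which copies them onto the new <sonnet> element unchanged.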