Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2021-12-27T22:03:57+0000


What’s new in XSLT 3.0 and XPath 3.1?

Introduction

This page aims to provide a brief introduction to small but useful enhancements to XPath and XSLT that have emerged since the publication of Michael Kay’s XSLT 2.0 and XPath 2.0 programmer’s reference, 4th edition. Two of the most significant additions to XSLT 3.0, streaming and packaging, are not covered here because, as important as they are for large files or complex transformations, we haven’t found a need for them at the smaller scale on which we usually operate.

Configuring <oXygen/>

To tell <oXygen/> that new XSLT files should default to XSLT 3.0, click on File → New → XSLT → Customize and select 3.0.

XPath 3.0 and 3.1

Variable declaration

XPath in XSLT allows the use of the let construction, which was previously available only in XQuery. See immediately below, under Concatenation.

Concatenation with ||

The string concatenation operator || can be used in situations that previously required the concat() function. For example, the following XPath expression:

let $a := 'hi', $b := 'bye' return $a || ' ' || $b

is equivalent to:

let $a := 'hi', $b := 'bye' return concat($a, ' ', $b)

Simple mapping with the bang operator (!)

The bang operator applies the operation to the right of the bang to each item in the sequence on the left. For example:

('curly', 'larry', 'moe') ! string-length(.)

returns a sequence of three integers: (5, 5, 3). The expression is equivalent to:

for $stooge in ('curly', 'larry', 'moe') return string-length($stooge)

The simple mapping operator is similar to /, except that 1) the sequence to the left of / must be a sequence of nodes, while the sequence to the left of ! can be a sequence of any items, and 2) when the result of / is a sequence of nodes, it is sorted into document order and duplicates are eliminated, while ! performs no sorting or deduplication and returns its results in the order in which they are produced.
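The difference from / can be seen in a minimal sketch like the following, which is our own illustration rather than an example from the specification. It maps over a sequence of atomic values that contains a duplicate and is not in ascending order:

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="text"/>
    <xsl:template name="xsl:initial-template">
        <!-- Atomic values are allowed to the left of "!";
             the duplicate 3 and the original order are both preserved -->
        <xsl:value-of select="(3, 1, 3) ! (. * 2)"/>
        <!-- By contrast, (3, 1, 3) / (. * 2) would raise a type error,
             because the left operand of "/" must be a sequence of nodes -->
    </xsl:template>
</xsl:stylesheet>
```

Run from the named initial template, this should output 6 2 6, with the values separated by the single space that <xsl:value-of> inserts by default.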

Function chaining with the arrow operator (=>)

The arrow operator pipes the value of the expression on the left into the first argument of the function on the right. It thus provides an alternative to nested parentheses. For example (from the XPath 3.1 spec, §3.16):

tokenize((normalize-unicode(upper-case($string))),"\s+")

is equivalent to:

$string => upper-case() => normalize-unicode() => tokenize("\s+")

The functionality of the bang and arrow operators overlaps where the operation on the right is a function, but only then. For that reason:

$book ! (@author, @title)

returns the @author and @title attributes of the element that is the value of the variable $book, but because the expression on the right is not a function call, replacing the bang with the arrow operator raises an error. The arrow operator does not use the dot to specify the first argument to the function because the operator supplies that argument itself.

Because the bang operator is a mapping and the arrow operator is a pipe, the following two expressions produce different results:

'curly larry moe' => tokenize('\s+') => count()

The preceding returns the integer value 3. But

'curly larry moe' ! tokenize(.,'\s+') ! count(.)

returns a sequence of three instances of the integer value 1. The difference is that after tokenize() returns a sequence of three items, the bang operator maps each item individually as the input to the count() function, while the arrow operator counts the items in the sequence.

unparsed-text-lines()

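unparsed-text-lines() works like unparsed-text(), except that instead of returning the entire file as a single string, it splits the input at newlines and returns a sequence of strings, one per line. This lets a processor read the input line by line instead of loading it all at once.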
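As a minimal sketch of how this might be used, the following stylesheet wraps each non-blank line of a plain-text file in a <line> element. The file name poem.txt, the $input parameter, and the <lines> and <line> element names are assumptions made up for this illustration; a relative file name is resolved against the location of the stylesheet.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <!-- Hypothetical input file; override the parameter to read a different one -->
    <xsl:param name="input" as="xs:string" select="'poem.txt'"/>
    <xsl:template name="xsl:initial-template">
        <lines>
            <!-- The predicate discards lines that are empty or contain only whitespace -->
            <xsl:for-each select="unparsed-text-lines($input)[normalize-space()]">
                <!-- position() counts only the lines that survived the filter -->
                <line n="{position()}">
                    <xsl:value-of select="."/>
                </line>
            </xsl:for-each>
        </lines>
    </xsl:template>
</xsl:stylesheet>
```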

Maps

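The following example creates a map and then serializes it as JSON on output. The curly braces inside <para> are evaluated as XPath only if @expand-text is enabled (see Content Value Templates, below):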

<xsl:variable name="mymap" as="map(*)"
        select='map {
        "Su" : "Sunday",
        "Mo" : "Monday",
        "Tu" : "Tuesday",
        "We" : "Wednesday",
        "Th" : "Thursday",
        "Fr" : "Friday",
        "Sa" : "Saturday"
        }'/>
<xsl:template match="/">
    <root>
        <text>Hi, Mom! Here’s some information:</text>
        <para>{
            serialize($mymap, map{"method":"json","indent":true()})
        }</para>
    </root>
</xsl:template>

$stuff?row ...

In eXist-db, to create a map using a for loop, use something like:

declare variable $map as map(*) := 
    map:merge(for $i in $realTitles return map:entry($i, count($items/tei:title[. eq $i])));

The map:entry() function creates a separate single-entry map for each item in $realTitles, with the string value of the title as the key and the number of times that title appears in the corpus as the value. Wrapping the FLWOR in the map:merge() function merges the individual maps into a single map, which is assigned to the variable $map. Note the syntax of the value specified by the as operator, which is necessary (should we choose to specify a datatype) because maps are not traditional atomic types. To access the values of the map, use something like:

for $bg in map:keys($map)
let $en as element(en) := $titles[bg eq $bg]/en
order by $en
return <option value="{$bg}">{$en || ' (' || $map($bg) || ')'}</option>

This gets each key from the map, uses it to retrieve the English translation of a Bulgarian title from the $titles variable, and then also uses it as the single argument of the $map() function to retrieve the number of times the Bulgarian title appears in the corpus. Note that because maps are functions, instead of indexing into them with square brackets, we execute them with the key as the single argument to the function, and the argument is in parentheses, as is usual for functions.
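Calling the map as a function is not the only way to look up a value. The following minimal sketch, which uses a made-up two-entry $days map rather than the corpus data above, compares the function-call syntax with the ? lookup operator and with the map:get() function. Note that the map prefix has to be bound to the standard map-function namespace before map:get() can be used in XSLT.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="#all" version="3.0">
    <xsl:output method="text"/>
    <!-- Hypothetical map, used only for this illustration -->
    <xsl:variable name="days" as="map(*)"
        select='map { "Su" : "Sunday", "Mo" : "Monday" }'/>
    <xsl:template name="xsl:initial-template">
        <!-- 1) call the map as a function, 2) use the ? lookup operator,
             3) use the map:get() function -->
        <xsl:value-of select='$days("Su"), $days?Mo, map:get($days, "Su")'
            separator=", "/>
    </xsl:template>
</xsl:stylesheet>
```

This should output Sunday, Monday, Sunday. The ? operator accepts a bare name as the key only when the key is a valid XML name; other keys need the function-call syntax or a parenthesized expression after the ?.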

Arrays

Add stuff here

XSLT 3.0

Boolean values

In XSLT 3.0, attributes that take Boolean values accept any of true/1/yes or false/0/no. For example, to turn on pretty-printed output, set the value of the @indent attribute of <xsl:output> to any of true, 1, or yes.

Starting from a named template

If you set the value of the @name attribute of an <xsl:template> element to xsl:initial-template and run a transformation from the command line with the -it (= ‘initial template’) switch, the template named xsl:initial-template is now the default. Previously you had to specify the name of your initial template on the command line.

Content Value Templates

Like Attribute Value Templates, Content Value Templates (called text value templates in the XSLT 3.0 specification) let you specify that certain text should be interpreted as XPath instead of being output literally. The syntax for CVTs is the same as for AVTs: surround the expression in curly braces (to output a literal curly brace, double it), and multiple values are output with a single space between them. CVTs work only where an @expand-text attribute with a positive Boolean value is in effect; the simplest way to enable them throughout a stylesheet is to set that attribute on the root <xsl:stylesheet> element. CVTs are similar to the use of curly braces in XQuery to switch from XML mode into XQuery mode, and they can be used in situations where you may previously have had to use <xsl:value-of> or something that converts its arguments to strings, like concat() or ||. Here’s an example:

<xsl:template name="xsl:initial-template">Hello, World! It’s {current-time()}</xsl:template>

The preceding is equivalent to:

<xsl:template name="xsl:initial-template">
    <xsl:text>Hello, World! It’s </xsl:text>
    <xsl:value-of select="current-time()"/>
</xsl:template>

or

<xsl:template name="xsl:initial-template">
    <xsl:value-of select="concat('Hello, World! It’s ', current-time())"/>
</xsl:template>

or

<xsl:template name="xsl:initial-template">
    <xsl:value-of select="'Hello, World! It’s ' || current-time()"/>
</xsl:template>
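The rules about doubled braces and multiple values can be seen in a small, self-contained sketch (our own illustration). It sets @expand-text on the root element and uses text output so that the result is easy to read:

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    expand-text="yes">
    <xsl:output method="text"/>
    <xsl:template name="xsl:initial-template">
        <!-- Doubled braces produce literal braces;
             the three integers are separated by single spaces -->
        <xsl:text>{{literal braces}} and a sequence: {1 to 3}</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

This should output {literal braces} and a sequence: 1 2 3.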

@item-separator

The @item-separator attribute on <xsl:output> can be used to change the item separator from the default space to something else. It must be combined with @build-tree="no"; otherwise the items are merged into a result tree before the separator can be applied.
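As a minimal sketch (our own illustration), the following stylesheet returns a sequence of three strings and relies on the serializer to join them with a comma and a space:

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <!-- build-tree="no" hands the raw sequence to the serializer,
         which then inserts the item separator between the items -->
    <xsl:output method="text" build-tree="no" item-separator=", "/>
    <xsl:template name="xsl:initial-template">
        <xsl:sequence select="('curly', 'larry', 'moe')"/>
    </xsl:template>
</xsl:stylesheet>
```

With a processor that supports @build-tree, this should output curly, larry, moe.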

Shadow attributes

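Shadow attributes mask regular attribute values, and have the same name as the regular attribute, but with a leading underscore. The value of a shadow attribute is an attribute value template whose expressions are evaluated statically, that is, when the stylesheet is compiled, so it can be used to compute attribute values that otherwise have to be written as literals.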
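A typical use is to let a static parameter control an attribute that cannot otherwise be computed. In the following minimal sketch, the parameter name $pretty and the <root> and <child> elements are assumptions made up for this illustration:

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all"
    version="3.0">
    <!-- A static parameter is available when the stylesheet is compiled -->
    <xsl:param name="pretty" as="xs:boolean" select="true()" static="yes"/>
    <!-- _indent shadows @indent; its value is computed from the static parameter -->
    <xsl:output method="xml" _indent="{if ($pretty) then 'yes' else 'no'}"/>
    <xsl:template name="xsl:initial-template">
        <root>
            <child>Hello</child>
        </root>
    </xsl:template>
</xsl:stylesheet>
```

Because the expression is evaluated statically, it can refer only to static parameters and static variables, not to the input document.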

Variables and functions

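Functions can be assigned to a variable, either as an inline (anonymous) function or as a reference to a named function. To call a function stored in a variable, add a parenthesized argument list after the variable name.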
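The following minimal sketch shows both possibilities; the variable names $shout and $measure are made up for this illustration. The first variable holds an inline function, and the second holds a reference to the built-in string-length() function with one argument:

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="text"/>
    <!-- An inline function that upper-cases its argument and appends "!" -->
    <xsl:variable name="shout" as="function(xs:string) as xs:string"
        select="function ($s as xs:string) as xs:string { upper-case($s) || '!' }"/>
    <!-- A named function reference: the function name, "#", and the number of arguments -->
    <xsl:variable name="measure" as="function(*)" select="string-length#1"/>
    <xsl:template name="xsl:initial-template">
        <!-- Both variables are called by adding an argument list in parentheses -->
        <xsl:value-of select="$shout('hello'), $measure('hello')"/>
    </xsl:template>
</xsl:stylesheet>
```

This should output HELLO! 5.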

Creating HTML5

XHTML (XML-conformant) 5

If you serve your HTML5 as MIME type application/xhtml+xml and want to validate it as XML, use:

<xsl:output method="xhtml" version="5.0" omit-xml-declaration="no" include-content-type="no"/>

HTML (not XHTML) 5

To create HTML5 (not XHTML) output, use:

<xsl:output method="html" version="5"/>

This omits the XML declaration and creates a <meta> element inside the <head>.

Identity transformation

The identity transformation can be expressed in a single top-level <xsl:mode> element:

<xsl:mode on-no-match="shallow-copy"/>
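In practice the <xsl:mode> declaration is combined with templates that override the default copying for specific nodes. As a minimal sketch, the following stylesheet copies its input unchanged except that it renames every <b> element to <strong>; the element names are assumptions chosen for this illustration.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <!-- Anything not matched by an explicit template is copied,
         and templates are then applied to its attributes and children -->
    <xsl:mode on-no-match="shallow-copy"/>
    <!-- Hypothetical override: rename <b> to <strong> and keep processing its content -->
    <xsl:template match="b">
        <strong>
            <xsl:apply-templates select="@* | node()"/>
        </strong>
    </xsl:template>
</xsl:stylesheet>
```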

Iteration

Iteration may sometimes be easier to write than recursion. The following code returns a running total of the integers from 1 through 10:

<xsl:iterate select="1 to 10">
    <xsl:param name="total" as="xs:integer" select="0"/>
    <xsl:variable name="newTotal" as="xs:integer" select="$total + ."/>
    <xsl:value-of select="concat($total, ' + ', . , ' = ' , $newTotal, '&#x0a;')"/>
    <xsl:next-iteration>
        <xsl:with-param name="total" select="$newTotal"/>
    </xsl:next-iteration>
</xsl:iterate>

This outputs the results of each iteration. To output only the final total, remove the <xsl:value-of> statement and use <xsl:on-completion>:

<xsl:iterate select="1 to 10">
    <xsl:param name="total" as="xs:integer" select="0"/>
    <xsl:on-completion select="$total"/>
    <xsl:variable name="newTotal" as="xs:integer" select="$total + ."/>
    <xsl:next-iteration>
        <xsl:with-param name="total" select="$newTotal"/>
    </xsl:next-iteration>
</xsl:iterate>

For this contrived problem it would, of course, be simpler to write <xsl:value-of select="sum(1 to 10)"/>.
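An iteration can also stop before the input sequence is exhausted. The following minimal sketch uses <xsl:break>, an XSLT 3.0 instruction that the examples above do not need, to return the last running total that does not exceed a limit. The limit of 20 is an arbitrary assumption for illustration.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="text"/>
    <xsl:template name="xsl:initial-template">
        <xsl:iterate select="1 to 10">
            <xsl:param name="total" as="xs:integer" select="0"/>
            <!-- Used only if the loop reaches the end of the sequence without a break -->
            <xsl:on-completion select="$total"/>
            <xsl:variable name="newTotal" as="xs:integer" select="$total + ."/>
            <xsl:choose>
                <!-- Stop as soon as the next addition would push the total above 20 -->
                <xsl:when test="$newTotal gt 20">
                    <xsl:break select="$total"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:next-iteration>
                        <xsl:with-param name="total" select="$newTotal"/>
                    </xsl:next-iteration>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:iterate>
    </xsl:template>
</xsl:stylesheet>
```

The running totals are 1, 3, 6, 10, and 15; adding 6 would give 21, so the iteration breaks and the stylesheet should output 15.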

A recursive template call might look like:

<xsl:template match="/">
    <xsl:variable name="result">
        <xsl:call-template name="accumulate">
            <xsl:with-param name="total" select="0"/>
            <xsl:with-param name="range" select="1 to 10"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:sequence select="$result"/>
</xsl:template>
<xsl:template name="accumulate">
    <xsl:param name="total" as="xs:integer"/>
    <xsl:param name="range" as="xs:integer*"/>
    <xsl:choose>
        <xsl:when test="empty($range)">
            <xsl:sequence select="'done'"/>
        </xsl:when>
        <xsl:otherwise>
            <xsl:variable name="currentValue" as="xs:integer" select="$range[1]"/>
            <xsl:variable name="newTotal" as="xs:integer" select="$total + $currentValue"/>
            <xsl:value-of select="concat($total, ' + ', $currentValue, ' = ', $newTotal, '&#x0a;')"/>
            <xsl:call-template name="accumulate">
                <xsl:with-param name="total" as="xs:integer" select="$newTotal"/>
                <xsl:with-param name="range" as="xs:integer*" select="remove($range, 1)"/>
            </xsl:call-template>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

This returns a report on each step plus the word done at the end. To see just the steps, make <xsl:when> an empty element. To return just the total, remove the <xsl:value-of> from the <xsl:otherwise> element and set the value of the sequence returned inside <xsl:when> to $total.

The following recursive function (not using <xsl:iterate>) deliberately has no terminating condition: each call appends the next consecutive integer to a sequence of integers and reports the number of values, until the recursion overflows the stack (at 1896, although your mileage may vary):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
    xmlns:djb="http://www.obdurodon.org" version="3.0">
    <xsl:function name="djb:stuff" as="xs:integer*">
        <xsl:param name="in" as="xs:integer*"/>
        <xsl:message select="count($in)"/>
        <xsl:variable name="new" as="xs:integer*" select="$in, $in[last()] + 1"/>
        <xsl:sequence select="djb:stuff($new)"/>
    </xsl:function>
    <xsl:template name="xsl:initial-template">
        <xsl:sequence select="djb:stuff(1)"/>
    </xsl:template>
</xsl:stylesheet>

The following example uses <xsl:iterate> to avoid the overflow. The input sequence is capped arbitrarily at 1000000 items so that the transformation eventually stops:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
    version="3.0">
    <xsl:template name="xsl:initial-template">
        <xsl:iterate select="1 to 1000000">
            <xsl:param name="in" as="xs:integer*" select="1"/>
            <xsl:variable name="new" as="xs:integer*" select="$in, $in[last()] + 1"/>
            <xsl:message select="count($in)"/>
            <xsl:next-iteration>
                <xsl:with-param name="in" as="xs:integer*" select="$new"/>
            </xsl:next-iteration>
        </xsl:iterate>
    </xsl:template>
</xsl:stylesheet>