# <oo>→<dh> Digital humanities

## User-defined functions in XSLT: exercise 1

The task was to create a function that could compute a median value. The sample below embeds the function inside a stylesheet that tests it by generating output for both supplied input and some information extracted from Hamlet. See the discussion below the code block for details.

``````

(

)

]]>``````

### How the function works

The function itself occupies lines 10–37, half of which are comments. The function requires, as input, a single sequence of one or more integers (line 23), which we bind to the parameter name `$input`. Our strategy for finding the median value depends on sorting the input, so we do that on line 24 and assign the sorted sequence to a variable called `$sorted`. We also need to know how many values there are, so we count them and save the count in a variable called `$count` (line 25).

The value of the median is computed differently according to whether there is an odd or even number of integers in the input sequence. If the count is odd, the middle value, after sorting, is the median. If the count is even, there is no single middle value, so we take the two closest to the middle and compute their arithmetic mean, for which we use the standard XPath library function `avg()`.

We use `<xsl:choose>` (lines 26–36) to branch our processing according to whether the count is odd or even. To determine that, we use the `mod` operator (line 27), which performs integer division and returns just the remainder. If we divide an odd number, such as `5`, by `2`, the remainder is `1`, and if we divide an even number, such as `6`, by `2`, the remainder is `0`. Accordingly, we use `$count mod 2` and test whether the remainder of that division is equal to zero, which lets us treat odd and even counts appropriately.
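The odd/even test can be tried on its own. The following is a minimal sketch, not part of the original listing; it assumes an XSLT 3.0 processor that can start from the default initial template (with Saxon, the `-it` option), and it hard-codes the two sample counts from the paragraph above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="text"/>

    <!-- Entry point; no source document is needed -->
    <xsl:template name="xsl:initial-template">
        <!-- 5 and 6 stand in for the value of $count -->
        <xsl:for-each select="5, 6">
            <xsl:value-of select="."/>
            <xsl:text> mod 2 = </xsl:text>
            <!-- mod keeps only the remainder of the division -->
            <xsl:value-of select=". mod 2"/>
            <xsl:text>, so the count is </xsl:text>
            <!-- A remainder of zero means the count is even -->
            <xsl:value-of select="if (. mod 2 eq 0) then 'even' else 'odd'"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

The first output line reports a remainder of `1` (odd) for `5`, and the second reports a remainder of `0` (even) for `6`.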

If the count is even (our `<xsl:when>` element, lines 27–31), we divide the count by `2`. The result will always be an integer, and if we use it in a numerical predicate to index into the sorted sequence, it will point to the item to the immediate left of the midpoint. For example, if our sorted integers are `1, 4, 9, 16, 25, 36`, when we divide the count by `2` the result is `3`, and the third item in the sorted sequence is `9`. Since the median of an even number of items is the mean of the middle two after sorting, we retrieve the value to which half of the count points and the one after it and average them (line 30). Because we select that computed value inside `<xsl:sequence>`, that’s what our function returns.

The situation with an odd number of values is easier. Dividing an odd number by `2` would return a non-integer (e.g., `5 div 2` returns `2.5`), and a numerical predicate needs to be an integer, so instead of `div` we use `idiv`, which performs integer division and throws away the remainder (e.g., `5 idiv 2` returns `2`). To find the midpoint we add `1` to the result of the integer division and use that value to index into the sorted sequence.
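Putting the pieces together, the function described above might look something like the sketch below. This is a reconstruction from the prose, not the original listing: the function name `ex:median` and its namespace URI are hypothetical, and the use of the one-argument `sort()` function assumes a processor that supports XPath 3.1 (for example, a recent Saxon release).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.com/functions"
    exclude-result-prefixes="#all" version="3.0">
    <xsl:output method="text"/>

    <!-- Hypothetical name and namespace; returns the median of one or more integers -->
    <xsl:function name="ex:median" as="xs:decimal">
        <xsl:param name="input" as="xs:integer+"/>
        <!-- Sort first, because the median is defined on the ordered values -->
        <xsl:variable name="sorted" as="xs:integer+" select="sort($input)"/>
        <xsl:variable name="count" as="xs:integer" select="count($sorted)"/>
        <xsl:choose>
            <!-- Even count: average the two values that straddle the midpoint -->
            <xsl:when test="$count mod 2 eq 0">
                <xsl:sequence
                    select="avg(($sorted[$count div 2], $sorted[$count div 2 + 1]))"/>
            </xsl:when>
            <!-- Odd count: integer division plus one points at the middle value -->
            <xsl:otherwise>
                <xsl:sequence select="$sorted[$count idiv 2 + 1]"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>

    <!-- Small driver so the sketch can be run without a source document -->
    <xsl:template name="xsl:initial-template">
        <!-- Five values; after sorting, the middle one is 9 -->
        <xsl:value-of select="ex:median((9, 1, 25, 4, 16))"/>
        <xsl:text>&#10;</xsl:text>
        <!-- Six values; the middle two after sorting are 9 and 16 -->
        <xsl:value-of select="ex:median((36, 1, 16, 4, 25, 9))"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

With the two sample calls, the sketch should output `9` for the five-item sequence and `12.5` for the six-item one. The return type is declared as `xs:decimal` because `avg()` over integers yields a decimal, and `xs:integer` is itself derived from `xs:decimal`, so both branches satisfy the declaration.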

### Testing our function

To test our function we create a variable called `<tests>` (lines 43–50) and use the `to` operator (lines 45, 48) to generate a couple of test sequences of integers (e.g., `1 to 5` returns `1, 2, 3, 4, 5`). We use `<xsl:for-each>` (lines 52–68) to call our function to compute the median of each of the test sequences, outputting some diagnostic information along with the result: the original sequence (line 55), whether the count is odd or even (lines 57–61), and the median (line 65). Because we store our test sequences in elements, the system doesn’t know that they’re supposed to be integers; the value of an element could, for example, be a string whose characters just happen to be predominantly digits. For that reason, we explicitly split up the sequence with `tokenize()` and cast each value to an integer. We don’t have to do that with the Hamlet data because we’re using the `count()` function, and our XSLT processor knows that the output of `count()` is an integer.
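The tokenize-and-cast step is the part most likely to trip people up, so here is a minimal sketch of it in isolation. It is one plausible arrangement rather than the original test harness: the `<test>` element name is illustrative, and the sketch reports only the recovered integers and whether their count is odd or even, leaving out the call to the median function.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all" version="3.0">
    <xsl:output method="text"/>

    <!-- Illustrative element name. Atomic values written into element content
         are joined with single spaces, so the first <test> holds the text "1 2 3 4 5" -->
    <xsl:variable name="tests" as="element(test)+">
        <test><xsl:sequence select="1 to 5"/></test>
        <test><xsl:sequence select="1 to 6"/></test>
    </xsl:variable>

    <xsl:template name="xsl:initial-template">
        <xsl:for-each select="$tests">
            <!-- Split the text on spaces and cast each token to xs:integer -->
            <xsl:variable name="values" as="xs:integer+"
                select="for $token in tokenize(normalize-space(.), ' ')
                        return xs:integer($token)"/>
            <xsl:value-of select="$values" separator=", "/>
            <xsl:text> has an </xsl:text>
            <xsl:value-of select="if (count($values) mod 2 eq 0) then 'even' else 'odd'"/>
            <xsl:text> number of values&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Declaring `$values` as `xs:integer+` makes the processor raise a type error if any token fails to cast cleanly, which is a cheap way to confirm that the conversion did what was intended before the values are handed to a function that expects integers.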

### Medians and Hamlet

We output the median numbers of scenes per act (lines 74–89), speeches per act (lines 90–104), and speeches per scene (lines 105–119) for Hamlet. In each case we output the original input, the input after sorting, and the median. The procedure is to use familiar XPath to locate the items we want to count and count them, and we then use the counts as input into our function that computes the median.
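As a rough illustration of the counting step, the sketch below gathers the number of scenes in each act and prints the counts before and after sorting. The element names `act` and `scene` are assumptions about how the play is marked up, not details taken from the original stylesheet, so the paths would need adjusting to match the actual Hamlet file. The `sort()` call again assumes XPath 3.1 support.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all" version="3.0">
    <xsl:output method="text"/>

    <!-- Assumes acts and scenes are tagged as <act> and <scene> (hypothetical names) -->
    <xsl:template match="/">
        <!-- One count per act; count() returns xs:integer, so no casting is needed -->
        <xsl:variable name="scenesPerAct" as="xs:integer*" select="//act/count(scene)"/>
        <xsl:text>Scenes per act: </xsl:text>
        <xsl:value-of select="$scenesPerAct" separator=", "/>
        <xsl:text>&#10;Sorted: </xsl:text>
        <xsl:value-of select="sort($scenesPerAct)" separator=", "/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The same pattern would apply to speeches per act and speeches per scene by changing the path expression, and the resulting sequence of counts is what would be passed to the median function.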