Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified: 2021-12-08T20:52:03+0000
The task was to create a function that could compute a median value. The sample below embeds the function inside a stylesheet that tests it by generating output for both supplied input and some information extracted from Hamlet. See the discussion below the code block for details.
The function itself occupies lines 10–37, half of which are comments. The program requires, as input, a single sequence of one or more integers (line 23), which we bind to the parameter name $input. Our strategy for finding the median value depends on sorting the input, so we do that on line 24 and assign the sorted sequence to a variable called $sorted. We also need to know how many values there are, so we count them and save the count in a variable called $count (line 25).
The value of the median is computed differently according to whether there is an odd or even number of integers in the input sequence. If the count is odd, the middle value, after sorting, is the median. If the count is even, there is no single middle value, so we take the two closest to the middle and compute their arithmetic mean, for which we use the standard XPath library function avg().
We use <xsl:choose> (lines 26–36) to branch our processing according to whether the count is odd or even. To determine that we use the mod operator (line 27), which performs integer division and returns just the remainder. If we divide an odd number, such as 5, by 2 the remainder is 1, and if we divide an even number, such as 6, by 2 the remainder is 0. Accordingly, we use $count mod 2 and test whether the remainder of that division is equal to zero, which lets us treat odd and even counts appropriately.
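The parity test can be tried on its own. The stylesheet below is a minimal sketch written for illustration; the named initial template and the sample counts 5 and 6 are assumptions chosen to match the examples in the paragraph above.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="text"/>
    <!-- Entry point for running without a source document
         (for example, with the -it option in Saxon) -->
    <xsl:template name="xsl:initial-template">
        <!-- 5 and 6 are illustrative counts, one odd and one even -->
        <xsl:for-each select="5, 6">
            <xsl:variable name="count" select="."/>
            <xsl:choose>
                <!-- A remainder of 0 after dividing by 2 means the count is even -->
                <xsl:when test="$count mod 2 eq 0">
                    <xsl:value-of select="concat($count, ' is even&#10;')"/>
                </xsl:when>
                <!-- Any other remainder (here, 1) means the count is odd -->
                <xsl:otherwise>
                    <xsl:value-of select="concat($count, ' is odd&#10;')"/>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Run as written, this should report that 5 is odd and 6 is even, one per line.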
If the count is even (our <xsl:when> element, lines 27–31), we divide the count by 2. The result will always be an integer, and if we use it in a numerical predicate to index into the sorted sequence, it will point to the item to the immediate left of the midpoint. For example, if our sorted integers are 1, 4, 9, 16, 25, 36, when we divide the count by 2 the result is 3, and the third item in the sorted sequence is 9. Since the median of an even number of items is the mean of the middle two after sorting, we retrieve the value to which half of the count points and the one after it and average them (line 30). Because we select that computed value inside <xsl:sequence>, that’s what our function returns.
The situation with an odd number of values is easier. Dividing an odd number by 2 would return a non-integer (e.g., 5 div 2 returns 2.5), and a numerical predicate needs to be an integer, so instead of div we use idiv, which performs integer division and throws away the remainder (e.g., 5 idiv 2 returns 2). To find the midpoint we add 1 to the result of the integer division and use that value to index into the sorted sequence.
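The logic described in the preceding paragraphs can be gathered into a short stylesheet. What follows is a minimal sketch reconstructed from the prose, so its line numbers do not correspond to those cited above. The ex namespace, the function name ex:median, the xs:decimal return type, and the use of the XPath 3.1 sort() function are assumptions made for illustration; the variable names $input, $sorted, and $count follow the text.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.com/ns/median"
    version="3.0">
    <xsl:output method="text"/>

    <!-- Hypothetical function name and namespace; returns the median
         of one or more integers -->
    <xsl:function name="ex:median" as="xs:decimal">
        <xsl:param name="input" as="xs:integer+"/>
        <!-- sort() requires XPath 3.1 support in the processor -->
        <xsl:variable name="sorted" as="xs:integer+" select="sort($input)"/>
        <xsl:variable name="count" as="xs:integer" select="count($sorted)"/>
        <xsl:choose>
            <!-- Even count: average the two items around the midpoint -->
            <xsl:when test="$count mod 2 eq 0">
                <xsl:sequence
                    select="avg(($sorted[$count div 2], $sorted[$count div 2 + 1]))"/>
            </xsl:when>
            <!-- Odd count: integer division plus 1 points at the middle item -->
            <xsl:otherwise>
                <xsl:sequence select="$sorted[$count idiv 2 + 1]"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>

    <!-- Entry point for running without a source document -->
    <xsl:template name="xsl:initial-template">
        <!-- 1 to 5 has an odd count; the middle item is 3 -->
        <xsl:value-of select="concat('Odd count: ', ex:median(1 to 5), '&#10;')"/>
        <!-- Six squares have an even count; the mean of 9 and 16 is 12.5 -->
        <xsl:value-of
            select="concat('Even count: ', ex:median((1, 4, 9, 16, 25, 36)), '&#10;')"/>
        <!-- Unsorted input is sorted inside the function; the median is 9 -->
        <xsl:value-of select="concat('Unsorted: ', ex:median((25, 1, 9)), '&#10;')"/>
    </xsl:template>
</xsl:stylesheet>
```

Declaring the return type as xs:decimal accommodates both branches, since an integer is also a decimal and avg() over integers yields a decimal.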
To test our function we create a variable called <tests> (lines 43–50) and use the to operator (lines 45, 48) to generate a couple of test sequences of integers (e.g., 1 to 5 returns 1, 2, 3, 4, 5). We use <xsl:for-each> (lines 52–68) to call our function to compute the median of each of the test sequences, outputting some diagnostic information along with the result: the original sequence (line 55), whether its count is odd or even (lines 57–61), and the median (line 65). Because we store our test sequences in elements, the system doesn’t know that they’re supposed to be integers (the value of the element could, for example, be a string, the characters of which just happen to be predominantly digits), so we explicitly split up the sequence with tokenize() and cast each value to an integer. We don’t have to do that with the Hamlet data because we’re using the count() function, and our XSLT processor knows that the output of count() is an integer.
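The need for tokenizing and casting can be seen in a small standalone stylesheet. This is a sketch under stated assumptions: the element name test, the variable name tests, and the use of sum() as a stand-in for the median function are all hypothetical choices made to keep the example short.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="3.0">
    <xsl:output method="text"/>

    <!-- Hypothetical test data: each element ends up holding a single
         text node such as "1 2 3 4 5", not a sequence of integers -->
    <xsl:variable name="tests" as="element(test)+">
        <test><xsl:sequence select="1 to 5"/></test>
        <test><xsl:sequence select="1 to 6"/></test>
    </xsl:variable>

    <!-- Entry point for running without a source document -->
    <xsl:template name="xsl:initial-template">
        <xsl:for-each select="$tests">
            <!-- Split the text on whitespace, then cast each token to an integer -->
            <xsl:variable name="values" as="xs:integer+"
                select="for $token in tokenize(., '\s+') return xs:integer($token)"/>
            <!-- sum() stands in for any function that needs real integers -->
            <xsl:value-of
                select="concat(count($values), ' values, sum ', sum($values), '&#10;')"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Run as written, this should report 5 values with a sum of 15 and 6 values with a sum of 21. Without the cast, the tokens would remain strings and could not be passed to a function whose parameter is declared as a sequence of integers.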
We output the median number of scenes per act (lines 74–89), speeches per act (lines 90–104), and speeches per scene (lines 105–119) for Hamlet. In each case we output the original input, the input after sorting, and the median. The procedure is to use familiar XPath to locate the items we want to count, count them, and then pass the counts as input to our function that computes the median.