Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2021-12-27T22:03:57+0000


User-defined functions in XSLT: exercise 1

Context

For your last assignment you read a tutorial that introduced XSLT user-defined functions. The sample task that we used in that tutorial to illustrate how to create and use a user-defined function involved computing the arithmetic mode(s) of a sequence of integers, and then using that function to find the modal number of scenes in the acts of Hamlet.

The tutorial discussed two of the three types of average in common use in statistical overviews, the mean and the mode, but it made only passing reference to the median, which is the third type of average. The median is determined by arranging all of the values in order and selecting the one in the middle (or, when there is an even number of values, taking the mean of the two middle ones). The median is useful when the input data is skewed because, unlike the mean, it is not strongly affected by outliers. For example, suppose all 100 persons who live in a hypothetical city earn $50,000/yr each. Then Elon Musk suddenly moves in and earns $2.3 billion (in unrealized stock options rather than salary, but it’s reasonable to think of that as a type of earned wealth). The mean of the 101 persons’ earnings is $22,821,782.18 (the sum of all 101 earnings amounts divided by 101). In one sense this is the average earnings of those persons. In a different sense, however, the average person earns $50,000, which is the median value: if we arrange the 101 values in order from least to most, the middle one is $50,000. None of the three types of average tells the whole story (the whole point of statistics is to summarize at the expense of individual details), but each nonetheless reports something that is true about the data.
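The figures in the example can be checked directly. In the sketch below, x_(51) is standard order-statistic notation (not taken from the tutorial) for the 51st of the 101 values in ascending order, that is, position (101 + 1)/2. Because the first 100 values in that order are all $50,000, the middle value is $50,000:

```latex
\begin{aligned}
\text{mean}   &= \frac{100 \times 50{,}000 + 2{,}300{,}000{,}000}{101}
               = \frac{2{,}305{,}000{,}000}{101} \approx 22{,}821{,}782.18 \\
\text{median} &= x_{(51)} = 50{,}000
\end{aligned}
```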

When we constructed a function to compute a mode in the tutorial, we required the input to be sequences of one or more integers. We made that decision for the following reasons:

For this exercise we will assume that computing the median of an empty sequence is an error, but any other sequence of numerical values (integer or double; negative, positive, or zero) is acceptable.

The task

Your task for this assignment is to write a function that computes an arithmetic median and to apply it to finding the median number of scenes per act in Hamlet. The actual median of the five scene counts in Hamlet (5, 2, 4, 7, 2 for Acts 1–5, respectively) is 4, because 2 of the 5 counts are lower than 4 (two instances of 2) and 2 are higher (5 and 7).
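Sorting the five counts makes the middle value easy to see. With n = 5 values (an odd count), the median is the value in position (n + 1)/2 = 3 of the sorted sequence. The subscript notation is the same order-statistic shorthand used for any sorted list, not something defined in the tutorial:

```latex
(5, 2, 4, 7, 2) \;\xrightarrow{\text{sort}}\; (2, 2, 4, 5, 7),
\qquad n = 5,
\qquad \text{median} = x_{\left(\frac{5+1}{2}\right)} = x_{(3)} = 4
```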

Although the count of scenes in an act of Hamlet will always be a positive integer, you should write your function in a more general way that accepts one or more doubles as input. If a function requires doubles and you give it integers, it will (in most cases) convert them to doubles internally (this is called type promotion; see Kay, p. 548), so requiring doubles does not prevent you from submitting integers. Your function needs to deal with three possible types of input: a sequence that contains an odd number of values, a sequence that contains an even number of values, and an empty sequence.

How to proceed

You can copy the sample XSLT in the “Calling our user-defined function inside an XSLT transformation” section of our tutorial and add your own <xsl:function> element alongside ours. You may either use our user-defined function namespace or declare and use your own. You can then add a line to report the median number of scenes near the place where we output the mean and mode.

The logic to compute a median is easy to state in plain language, although the statements are a bit different depending on whether you have an odd or an even number of values. Here are a few code snippets that you may find helpful:
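Whatever code you write, the underlying logic can also be stated as a formula. Let n ≥ 1 values be sorted in ascending order as x_(1) ≤ x_(2) ≤ … ≤ x_(n) (standard order-statistic notation, assumed here for illustration). The median is then the single middle value when n is odd, and the mean of the two middle values when n is even:

```latex
\text{median} =
\begin{cases}
  x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\[6pt]
  \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}
\end{cases}
```

When n = 0 there is no middle value, which is why this exercise treats an empty sequence as an error. XPath positions are 1-based, so the subscripts above correspond directly to positions in a sorted XPath sequence.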

To test your function and ensure that it works with different types of input, you can insert lines where you supply alternative input. For example, if you add to the XSLT something like:


near the area where you are reporting descriptive statistics for Hamlet, it will compute the mode of the direct input. For your median function, you should test odd numbers of values, even numbers of values, integers, doubles, and an empty sequence (the empty sequence should raise an error).
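As an illustration, the hypothetical test inputs below cover each of the cases just listed. They were chosen only as examples, and any values will do. Each line shows the median your function should return for that input:

```latex
\begin{aligned}
\operatorname{median}(3, 1, 2) &= 2
  && \text{odd count, integers} \\
\operatorname{median}(4, 1, 3, 2) &= \tfrac{2 + 3}{2} = 2.5
  && \text{even count, integers} \\
\operatorname{median}(1.5, -2, 0, 7.25, -0.5) &= 0
  && \text{odd count, doubles and negatives} \\
\operatorname{median}(0.5, 2.5) &= \tfrac{0.5 + 2.5}{2} = 1.5
  && \text{even count, doubles} \\
\operatorname{median}(\,) &= \text{error}
  && \text{empty sequence}
\end{aligned}
```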

You can supply an empty sequence for testing purposes as djb:median(()), with inner parentheses for reasons explained in the tutorial (in the first box in the “Calling our user-defined function inside an XSLT transformation” section). Alternatively, you can ask for the median of counts from Hamlet that don’t exist, along the lines of:


This expression will return an empty sequence because there is no element called <play> in this markup.

What to submit

Submit only your XSLT, which we will run against Hamlet and against test data to verify your function for computing the median.