Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Creative Commons BY-NC-SA 3.0 Unported License
Last modified: 2024-03-19T16:59:54+0000
This is an abridged version of our comprehensive XSLT user-defined functions tutorial. The full tutorial illustrates the development of a function to compute the mode of a sequence of integers, and is intended as preparation for XSLT functions assignment #1, which asks learners to develop their own function to compute a median. This abridged tutorial uses the median as an example, and is intended for a context where user-defined functions are introduced as a drive-by topic, that is, one that is practiced in class but has no accompanying homework assignment.
We can think of a function as a bit of code that accepts zero or more input items (called the function parameters) and returns a result (including, possibly, an empty sequence).
You already have experience using built-in XPath functions, but once you begin to
                develop any substantial XSLT transformations you are likely to discover a need for
                additional functions that you can use in your stylesheets similarly to the way you
                use standard XPath functions. For example, XPath has a built-in function to compute
                an arithmetic mean (avg()) but not to compute a
                median, and in this activity we’ll create a user-defined function to remedy that
                lack. We’ll then practice using it inside an XSLT stylesheet.
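For comparison, here is a minimal sketch that calls the built-in avg() function on the values used in the median examples below. It assumes an XSLT 3.0 processor such as Saxon, run without a source document (for example, with Saxon's -it option, which starts at the template named xsl:initial-template).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
    <xsl:output method="text"/>
    <!-- Entry point when the processor starts without a source document -->
    <xsl:template name="xsl:initial-template">
        <!-- Built-in arithmetic mean: (3 + 0 + 3 + 7 + 8) div 5 -->
        <xsl:value-of select="avg((3, 0, 3, 7, 8))"/>
    </xsl:template>
</xsl:stylesheet>
```

This should print 4.2, whereas the median of the same five values is 3.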
Statisticians work with three basic types of averages: the arithmetic mean, the median, and the mode. The median is the value in the middle of a sorted sequence of values. For example, the median of (3, 0, 3, 7, 8) is 3: sorted, the values are (0, 3, 3, 7, 8), and the 3 in the middle position has 2 values before it and 2 after it. If there’s an even number of values there isn’t a single one in the middle, and in that case the median is defined as the mean of the two middle values. For example, (3, 0, 6, 5, 3, 4) has 6 values, and when we sort them as (0, 3, 3, 4, 5, 6), the middle 2 (the third and fourth) are 3 and 4, so the median of the 6 values is 3.5, which is the mean of 3 and 4.
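The same definition can be written as a formula. The notation x_{(i)}, meaning the i-th smallest value, is standard in statistics but is our addition here; the second display checks the formula against the two examples above.

```latex
% x_{(1)} \le x_{(2)} \le \dots \le x_{(n)} are the n input values in sorted order
\[
\operatorname{median}(x_1, \dots, x_n) =
\begin{cases}
x_{((n+1)/2)} & \text{if } n \text{ is odd},\\[4pt]
\dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even}.
\end{cases}
\]
\[
n = 5:\ (0, 3, 3, 7, 8) \Rightarrow x_{(3)} = 3,
\qquad
n = 6:\ (0, 3, 3, 4, 5, 6) \Rightarrow \frac{x_{(3)} + x_{(4)}}{2} = \frac{3 + 4}{2} = 3.5
\]
```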
A user may invoke a function with unusual input (often called edge cases). For example, what should a median function do if the input is an empty sequence? Should it accept only integers (whole numbers), or also numbers with a fractional part, like 3.14159?
                What should it do if the input sequence includes non-numeric values? A function will
                always do something no matter what the input, even if that something is to raise an
                error, and it’s up to the developer to decide what the something should be.
In this activity we’re going to require our median function to accept a single parameter, which must be a sequence of one or more doubles (values of the XPath type xs:double, that is, double-precision floating-point numbers). This means that it will raise an error if we call it with an empty sequence, with a sequence that includes non-numeric values, or with more than one argument. Integers and decimals are also acceptable, because XPath automatically promotes them to xs:double when they are passed to a parameter declared as a double. If we call it with valid input, it will return a single numeric value, an xs:double that represents the median of the input sequence.
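A sketch of a small test stylesheet for this contract. It repeats the djb:median() function that this tutorial develops below, so that it runs on its own with an XSLT 3.0 processor such as Saxon (started with -it). The variable names $sorted-input and $input-count are illustrative, and the calls that the contract rejects are kept inside a comment, because any one of them would stop the transformation with an error.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:djb="http://www.obdurodon.org"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="text"/>

    <!-- Copy of the median function; variable names are illustrative -->
    <xsl:function name="djb:median" as="xs:double">
        <xsl:param name="input" as="xs:double+"/>
        <xsl:variable name="sorted-input" as="xs:double+" select="sort($input)"/>
        <xsl:variable name="input-count" as="xs:integer" select="count($input)"/>
        <xsl:choose>
            <xsl:when test="$input-count mod 2 eq 0">
                <xsl:variable name="half" select="$input-count div 2"/>
                <xsl:sequence select="avg(($sorted-input[$half], $sorted-input[$half + 1]))"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:sequence select="$sorted-input[$input-count idiv 2 + 1]"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>

    <xsl:template name="xsl:initial-template">
        <!-- Integers are promoted to xs:double: prints 3 -->
        <xsl:value-of select="djb:median((3, 0, 3, 7, 8))"/>
        <xsl:text>&#x0a;</xsl:text>
        <!-- Even count, mean of the two middle values: prints 3.5 -->
        <xsl:value-of select="djb:median((3, 0, 6, 5, 3, 4))"/>
        <xsl:text>&#x0a;</xsl:text>
        <!-- Decimals are promoted too: prints 0.875 -->
        <xsl:value-of select="djb:median((1.5, 0.25))"/>
        <xsl:text>&#x0a;</xsl:text>
        <!-- Calls the contract rejects (each would raise an error):
             djb:median(())              empty sequence, but xs:double+ needs at least one item
             djb:median(('a', 'b'))      strings are not converted to xs:double
             djb:median((1, 2), (3, 4))  no djb:median function takes two arguments -->
    </xsl:template>
</xsl:stylesheet>
```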
The XSLT element <xsl:function> creates a
                user-defined function. A user-defined function, just like a standard XPath function,
                has a signature, which consists of its name (in a namespace),
                the input it accepts (parameters), and the result it returns.
                The rest of the function (the body) constructs the result, which is
                returned when the function is called. These parts are described in more detail
                below.
The name of a user-defined function must be in a user-defined namespace. It is customary to use a URI as a namespace value, although that is not required, and the URI need not be a URL, which is to say that it does not have to point to an existing resource on the Internet. Below we use
                    djb: as the namespace prefix and
                    http://www.obdurodon.org as the associated
                    namespace value. For your own projects you might want to use a short version of
                    your project name as the namespace prefix and the main URL of your project as
                    the associated namespace value.
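Because a function name is identified by its namespace value rather than by its prefix, the prefix is only a local abbreviation. A minimal sketch to illustrate this, assuming an XSLT 3.0 processor such as Saxon started with -it; the helper function djb:half() and the second prefix obd: are hypothetical, invented only for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:djb="http://www.obdurodon.org"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="text"/>

    <!-- Hypothetical helper, used only to illustrate namespaced function names -->
    <xsl:function name="djb:half" as="xs:double">
        <xsl:param name="n" as="xs:double"/>
        <xsl:sequence select="$n div 2"/>
    </xsl:function>

    <!-- obd: is bound to the same namespace value as djb:, so obd:half names the same function -->
    <xsl:template name="xsl:initial-template" xmlns:obd="http://www.obdurodon.org">
        <xsl:value-of select="djb:half(7), obd:half(7)" separator=" "/>
    </xsl:template>
</xsl:stylesheet>
```

Both calls reach the same function, so this should print 3.5 3.5.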
The skeleton of a stylesheet that includes a user-defined function looks like the following:
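```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:djb="http://www.obdurodon.org"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:function name="djb:median">
        <!-- parameters and function body go here -->
    </xsl:function>
</xsl:stylesheet>
```

A function definition specifies zero or more parameters by including zero or more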
                    empty <xsl:param> elements as children
                    of the <xsl:function> element. Our function
                    requires one argument, which must be a sequence of one or more doubles, so we
need to augment the function definition above by adding an <xsl:param> child to our <xsl:function> element: <xsl:param name="input" as="xs:double+"/>. Datatype specifications in the @as attribute use the same occurrence indicators as Relax NG, so the plus sign in xs:double+ means that the input must be a sequence of one or more doubles.
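The occurrence indicators also work in XPath instance of tests, which offers a quick way to see what each one accepts. A minimal sketch, assuming an XSLT 3.0 processor started at xsl:initial-template (for example, Saxon with -it). The literals 1e0 and 2e0 are doubles, because instance of, unlike a function call, does not promote integers to doubles.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="text"/>
    <xsl:template name="xsl:initial-template">
        <!-- One result per test, separated by spaces -->
        <xsl:value-of select="
            1e0 instance of xs:double,          (: no indicator, exactly one :)
            () instance of xs:double?,          (: ? zero or one :)
            () instance of xs:double*,          (: * zero or more :)
            () instance of xs:double+,          (: + one or more, so empty fails :)
            (1e0, 2e0) instance of xs:double+   (: + one or more :)
            "/>
    </xsl:template>
</xsl:stylesheet>
```

This should print true true true false true: of these tests, only the + indicator rejects the empty sequence.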
When it comes time to use our parameters within the function body, we refer to
                    them by name, prefixed with a dollar sign, so that, for example, the parameter
                    we declare above can be referenced as $input.
                    Parameters are thus similar to XSLT variables: they are declared with a
                    @name attribute and given a name that does not
                    begin with a dollar sign, but when they are referenced subsequently, the dollar
                    sign is prepended to them. We’ll see an example of that use below.
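A minimal sketch of that parallel, assuming an XSLT 3.0 processor started at xsl:initial-template. The function djb:describe() and its message are hypothetical, invented only to show a parameter and a variable being declared without a dollar sign and then referenced with one.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:djb="http://www.obdurodon.org"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="text"/>

    <!-- Hypothetical function: reports how many values it received -->
    <xsl:function name="djb:describe" as="xs:string">
        <!-- The parameter is declared without a dollar sign ... -->
        <xsl:param name="input" as="xs:double+"/>
        <!-- ... and so is this local variable ... -->
        <xsl:variable name="how-many" as="xs:integer" select="count($input)"/>
        <!-- ... but both are referenced with one -->
        <xsl:sequence select="'Received ' || $how-many || ' values'"/>
    </xsl:function>

    <xsl:template name="xsl:initial-template">
        <xsl:value-of select="djb:describe((3, 0, 3, 7, 8))"/>
    </xsl:template>
</xsl:stylesheet>
```

This should print Received 5 values.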
A function returns the result of evaluating the code inside the function body.
                    Just as you should specify the datatype of input parameters by using the
                    @as attribute on
                    <xsl:param> elements, you should specify
the datatype of the result of a function by adding the @as attribute to the <xsl:function> element, whose start tag then becomes <xsl:function name="djb:median" as="xs:double">. Unlike in some other programming languages, there is no explicit return statement; the value that a function returns is simply the result of evaluating the function body. We implement the logic that computes the
                    median inside the function body, after any
                    <xsl:param> elements, as follows:
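```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:djb="http://www.obdurodon.org"
    exclude-result-prefixes="#all"
    version="3.0">
    <!-- ================================================== -->
    <!-- djb:median(): median of a sequence of doubles      -->
    <!--   $input: one or more doubles (xs:double+)         -->
    <!--   returns: the median of $input (xs:double)        -->
    <!-- ================================================== -->
    <xsl:function name="djb:median" as="xs:double">
        <xsl:param name="input" as="xs:double+"/>
        <!-- Sort and count the input (these variable names are illustrative) -->
        <xsl:variable name="sorted-input" as="xs:double+" select="sort($input)"/>
        <xsl:variable name="input-count" as="xs:integer" select="count($input)"/>
        <xsl:choose>
            <xsl:when test="$input-count mod 2 eq 0">
                <!-- Even count: average the two middle values -->
                <xsl:variable name="half" select="$input-count div 2"/>
                <xsl:sequence select="avg(($sorted-input[$half], $sorted-input[$half + 1]))"/>
            </xsl:when>
            <xsl:otherwise>
                <!-- Odd count: return the single middle value -->
                <xsl:sequence select="$sorted-input[$input-count idiv 2 + 1]"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>
</xsl:stylesheet>
```

We start by sorting the input values (line 15) and counting them (line 16), and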
                    we bind the results of those operations to variables so that we can
                    reuse them below. Since the median is computed differently for an odd vs even
                    number of input values, we use
                    <xsl:choose> (lines 17–27) to manage the
branching. The XPath mod operator returns the remainder of integer division, so the count mod 2 is 0 if there is an even number of input items and 1 if there is an odd number.
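A quick check of the mod test, as a minimal sketch for an XSLT 3.0 processor started at xsl:initial-template: it prints the remainder after dividing each count from 5 through 8 by 2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
    <xsl:output method="text"/>
    <xsl:template name="xsl:initial-template">
        <xsl:for-each select="5 to 8">
            <!-- A remainder of 0 means an even count, 1 an odd count -->
            <xsl:value-of select=". || ' mod 2 = ' || . mod 2 || '&#x0a;'"/>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

This should print four lines: 5 mod 2 = 1, 6 mod 2 = 0, 7 mod 2 = 1, and 8 mod 2 = 0.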
The median of an even number of sorted input values (lines 18–22) is the mean of
                    the middle two values. We know that the count is an even number, which means
                    when we divide by 2 the result will be an integer that corresponds to the
                    position in the sorted input just before the midpoint. For example, if there are
                    6 input values, 6 div 2 equals
                    3 and the third and fourth items straddle
                    the midpoint. We assign the value of our division to the variable
                    $half and then select the value at that
                    offset into the sorted input sequence plus the one after it, which gives us a
                    sequence of two values (in this case, the third and fourth items in the sorted
                    sequence of 6 items). We then use the standard XPath library function
                    avg() to average them.
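A sketch that traces the even branch step by step for the six-value example from the beginning of this tutorial, assuming an XSLT 3.0 processor started at xsl:initial-template. Apart from $half, the variable names are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
    <xsl:output method="text"/>
    <xsl:template name="xsl:initial-template">
        <xsl:variable name="sorted-input" select="sort((3, 0, 6, 5, 3, 4))"/>
        <xsl:variable name="half" select="count($sorted-input) div 2"/>
        <!-- The item at position $half and the one after it straddle the midpoint -->
        <xsl:variable name="middle-pair" select="$sorted-input[$half], $sorted-input[$half + 1]"/>
        <xsl:value-of select="'sorted:', $sorted-input"/>
        <xsl:text>&#x0a;</xsl:text>
        <xsl:value-of select="'half:', $half"/>
        <xsl:text>&#x0a;</xsl:text>
        <xsl:value-of select="'middle pair:', $middle-pair"/>
        <xsl:text>&#x0a;</xsl:text>
        <xsl:value-of select="'median:', avg($middle-pair)"/>
    </xsl:template>
</xsl:stylesheet>
```

This should print sorted: 0 3 3 4 5 6, then half: 3, then middle pair: 3 4, and finally median: 3.5.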
The median of an odd number of values (lines 23–26) is the middle value in the
                    sorted sequence. We find that offset by performing integer division
                    (idiv, not
                    div; integer division ignores any
remainder) on the count and adding 1. For example, if there are 5 items in the sequence, 5 idiv 2 equals 2, and adding 1 gives 3, the middle position in a sequence of 5 items.
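A sketch contrasting div and idiv on the count from the five-value example and then using the idiv result to pick the middle item, assuming an XSLT 3.0 processor started at xsl:initial-template; the variable names are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
    <xsl:output method="text"/>
    <xsl:template name="xsl:initial-template">
        <xsl:variable name="sorted-input" select="sort((3, 0, 3, 7, 8))"/>
        <xsl:variable name="input-count" select="count($sorted-input)"/>
        <!-- div keeps the fractional part: 2.5 -->
        <xsl:value-of select="'div:', $input-count div 2"/>
        <xsl:text>&#x0a;</xsl:text>
        <!-- idiv discards it: 2 -->
        <xsl:value-of select="'idiv:', $input-count idiv 2"/>
        <xsl:text>&#x0a;</xsl:text>
        <!-- Adding 1 gives position 3, which holds the median -->
        <xsl:value-of select="'median:', $sorted-input[$input-count idiv 2 + 1]"/>
    </xsl:template>
</xsl:stylesheet>
```

This should print div: 2.5, idiv: 2, and median: 3 on separate lines. Using div here would not work: 5 div 2 + 1 is 3.5, and a predicate such as [3.5] selects nothing, because no item has a fractional position.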
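You can call a user-defined function the same way you call a standard library function, so djb:median((1, 3, 2, 4, 3)) returns the value 3; the inner parentheses make the five values a single sequence, which is passed as the function’s one argument. The five acts of Hamlet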
                have 5, 2, 4, 7, and 2 scenes, respectively, so the median number of scenes is 4.
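You can confirm that by transforming bad-hamlet.xml with a stylesheet that includes the djb:median() function and outputs an HTML page titled “Scenes per act”, containing a table that lists the number of scenes in each act and ends with a Median row. The result of that transformation is: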
Scenes per act

| Act | Scenes |
|---|---|
| 1 | 5 |
| 2 | 2 |
| 3 | 4 |
| 4 | 7 |
| 5 | 2 |
| Median | 4 |
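A sketch of a stylesheet that could produce a result like the one shown above. It assumes, without having checked the file, that bad-hamlet.xml is in no namespace and marks acts as <act> elements with <scene> children; if the markup differs, only the XPath expressions that mention act and scene need to change. It includes a copy of djb:median() (with illustrative variable names) so that it runs on its own with an XSLT 3.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:djb="http://www.obdurodon.org"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="xhtml" indent="yes"/>

    <!-- Copy of djb:median(); variable names are illustrative -->
    <xsl:function name="djb:median" as="xs:double">
        <xsl:param name="input" as="xs:double+"/>
        <xsl:variable name="sorted-input" as="xs:double+" select="sort($input)"/>
        <xsl:variable name="input-count" as="xs:integer" select="count($input)"/>
        <xsl:choose>
            <xsl:when test="$input-count mod 2 eq 0">
                <xsl:variable name="half" select="$input-count div 2"/>
                <xsl:sequence select="avg(($sorted-input[$half], $sorted-input[$half + 1]))"/>
            </xsl:when>
            <xsl:otherwise>
                <xsl:sequence select="$sorted-input[$input-count idiv 2 + 1]"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:function>

    <!-- ASSUMPTION: acts are <act> elements in no namespace, with <scene> children -->
    <xsl:template match="/">
        <html>
            <head>
                <title>Scenes per act</title>
            </head>
            <body>
                <h1>Scenes per act</h1>
                <table>
                    <tr>
                        <th>Act</th>
                        <th>Scenes</th>
                    </tr>
                    <xsl:apply-templates select="//act"/>
                    <tr>
                        <th>Median</th>
                        <!-- One scene count per act, passed as a single sequence -->
                        <td><xsl:value-of select="djb:median(//act ! count(scene))"/></td>
                    </tr>
                </table>
            </body>
        </html>
    </xsl:template>

    <!-- One table row per act; position() numbers the acts in document order -->
    <xsl:template match="act">
        <tr>
            <td><xsl:value-of select="position()"/></td>
            <td><xsl:value-of select="count(scene)"/></td>
        </tr>
    </xsl:template>
</xsl:stylesheet>
```

The simple map operator (!) in djb:median(//act ! count(scene)) evaluates count(scene) once for each act, producing the sequence of per-act scene counts that the function expects as its single argument.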