Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2022-04-13T21:48:52+0000


SVG assignment #3 answers

Overview

For this assignment, we asked you to create a bubble chart for US presidential elections from 1900 to 1912, representing:

  1. Election year along the X axis
  2. Electoral votes along the Y axis
  3. Popular vote percentage by circle area
  4. Candidate’s party by circle color

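Sample solution

[The XSLT source of the sample solution was not preserved in this copy; the line numbers cited in the discussion below refer to that stylesheet. The text that survives shows a chart titled “Candidates’ electoral and popular votes”, an X-axis label “Election Year”, a Y-axis label “Electoral college votes”, and a legend with entries for “Republican”, “Democrat”, and “Third party” plus an item labeled “50% of popular vote”.]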

Discussion

Comments and variable names

The way you style your comments is up to you, but every XSLT stylesheet should include comments that help you find your way quickly to the part of the code that you need to work on at the moment.

You may have read Other People’s Code that uses variable names like $a, $b, or $c. This type of meaningless variable naming is Bad Practice; what you should do instead is use variable names that are self-documenting. Not only do self-documenting variable names make your code easier to understand in general, but they relieve you of having to write as many comments. For example, if you call the position along the X axis something like $xpos, you’ll understand what it represents just from the name, so you won’t have to write a comment to explain it.

Stylesheet (global) variables

We begin by declaring a few stylesheet variables, that is, variables that can be used anywhere in the transformation. We’ve let XPath compute the maximum width according to the number of elections (line 11), which means that the value will adjust itself if we use a smaller or larger data set. We hard-coded the values for the other stylesheet variables (lines 9–10), although in Real Life we might compute them according to what the data tells us, as we do for the maximum width. For example, the $max_height variable (line 9), which represents the maximal value that will need to be registered on the Y axis, depends on the maximal electoral votes received by any candidate in our data. Instead of hard-coding a value of 500, we could have used max(//@electoral_votes) to let XPath tell us that the maximum number of electoral votes earned by any candidate in these particular elections is 435.

If we want the Y position of the top ruling line to be the smallest multiple of 100 that is at or above the largest number of electoral votes attained (in this case 500, because the maximum is 435), we could use:

(max(//@electoral_votes) div 100) => ceiling() * 100

The XPath ceiling() function rounds a numerical value up to the nearest integer, leaving a value that is already an integer unchanged. This means that we can start by finding the largest electoral vote value (435), divide by 100 (4.35), round up (5), and then multiply by 100 (500).
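As a minimal sketch of how such computed stylesheet variables might be declared, the following stylesheet reports the two values as plain text. The names $x_spacing and $max_width and the spacing value of 150 are our own assumptions for illustration; only $max_height and @electoral_votes come from the discussion above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="text"/>
    <!-- Hypothetical horizontal distance between elections, in pixels -->
    <xsl:variable name="x_spacing" select="150"/>
    <!-- Width grows or shrinks with the number of elections in the input -->
    <xsl:variable name="max_width" select="count(//election) * $x_spacing"/>
    <!-- Smallest multiple of 100 at or above the largest electoral vote total -->
    <xsl:variable name="max_height"
        select="(max(//@electoral_votes) div 100) => ceiling() * 100"/>
    <xsl:template match="/">
        <!-- Report both computed values, separated by a space -->
        <xsl:value-of select="$max_width, $max_height"/>
    </xsl:template>
</xsl:stylesheet>
```

The arrow operator (=>) binds more tightly than multiplication, so the rounding happens before the result is multiplied by 100. With data whose largest electoral vote total is 435, $max_height comes out as 500.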

Templates

Our stylesheet has three templates, one that matches the document node and creates the SVG output (including horizontal ruling lines, axis labels, and legend; lines 15–66), one that processes each election (including vertical ruling lines; lines 67–86), and one that processes each candidate (bubble, bubble label, black center dot; lines 87–118). Where possible, we prefer to let the data tell us how to position the different components of the chart, so that, for example, we used the maximum width (a stylesheet variable computed from the number of elections) to determine the X position of the legend. Whether to hard-code a value or compute it on the basis of the data is up to you, but in general it’s better to compute, since with a hard-coded value you won’t remember what it means or why you chose the value you did if you need to change it to accommodate new data later. (Hard-coded values that are not self-documenting are called magic numbers, and are generally considered poor practice.)

Template variables (that is, variables created inside a template) are computed afresh each time the template is called. In our solution, position() inside the template that matches <election> elements (lines 72–73) returns the ordinal position of the election in the sequence of elections that we are processing; for example, the value of the position() function when we process the first election is 1, the value when we process the second election is 2, etc. We use these different position() values to compute where on the X axis to locate the information for each election. We bind that computed value to a variable called $xpos within the template that processes elections and use it there to position the vertical ruling line (line 82) and its label (lines 83–85).
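The following sketch shows one way a template variable like $xpos might be computed from position() and then used for both the ruling line and its label. It is an illustration rather than the sample solution itself: the @year attribute, the $x_spacing variable, the canvas size, and the translate() offsets are all assumptions, and the input is assumed to be in no namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg" version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <!-- Hypothetical values, not taken from the original stylesheet -->
    <xsl:variable name="x_spacing" select="150"/>
    <xsl:variable name="max_height" select="500"/>
    <xsl:template match="/">
        <svg width="800" height="600">
            <!-- Move the origin to the lower left so that negative Y values plot upward -->
            <g transform="translate(50, 550)">
                <!-- Selecting only elections makes position() count elections alone -->
                <xsl:apply-templates select="//election"/>
            </g>
        </svg>
    </xsl:template>
    <xsl:template match="election">
        <!-- Recomputed for each election: 150, 300, 450, ... -->
        <xsl:variable name="xpos" select="position() * $x_spacing"/>
        <!-- Vertical ruling line for this election -->
        <line x1="{$xpos}" y1="0" x2="{$xpos}" y2="-{$max_height}" stroke="gray"/>
        <!-- X-axis label; @year is an assumed attribute name -->
        <text x="{$xpos}" y="20" text-anchor="middle">
            <xsl:value-of select="@year"/>
        </text>
    </xsl:template>
</xsl:stylesheet>
```

Note that position() counts whatever sequence <xsl:apply-templates> selects, so selecting the elections explicitly keeps whitespace text nodes from being counted.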

We compute $xpos in the template that matches <election> elements because the vertical ruling line and the X-axis label are properties of the election. But that same horizontal position would be useful for plotting the circles for each candidate in the election, and those get drawn in a different template, the one that matches <candidate> elements. This poses a challenge because template variables (that is, variables computed inside a template, like the $xpos variable computed inside the template that matches <election> elements) are available only inside the template where they are created, and this means that $xpos is not automatically available inside the template that matches <candidate> elements. We could just recompute it inside that template, but a better alternative is to pass it from the template that matches <election> elements into the one that matches <candidate> elements by using <xsl:with-param> and <xsl:param>.

Using <xsl:with-param> and <xsl:param>

You don't have to use <xsl:with-param> and <xsl:param> to complete this task because you can instead compute the horizontal position of the circles inside the template that matches <candidate> elements by using count(preceding::election) + 1. That strategy works because candidates in the first election have no preceding elections (your parent does not precede you because it contains you; to precede you it would have to start and finish before you, but although your parent starts before you, it finishes after you), candidates in the second election have one preceding election, etc. If, though, we instead use <xsl:with-param> and <xsl:param> we can avoid having to compute the same election-specific values more than once.
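A sketch of this recomputing strategy might look like the following. The $x_spacing variable, the canvas size, and the small fixed radius are assumptions for illustration, and we assume that <candidate> elements are children of <election> elements in a document with no namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg" version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <!-- Hypothetical horizontal distance between elections, in pixels -->
    <xsl:variable name="x_spacing" select="150"/>
    <xsl:template match="/">
        <svg width="800" height="600">
            <g transform="translate(50, 550)">
                <xsl:apply-templates select="//candidate"/>
            </g>
        </svg>
    </xsl:template>
    <xsl:template match="candidate">
        <!-- Elections that ended before this candidate began, plus one for
             the election that contains the candidate -->
        <xsl:variable name="xpos"
            select="(count(preceding::election) + 1) * $x_spacing"/>
        <!-- Small dot at the candidate's electoral vote count -->
        <circle cx="{$xpos}" cy="-{@electoral_votes}" r="5" fill="black"/>
    </xsl:template>
</xsl:stylesheet>
```

Here every candidate repeats the same count for its election, which is the duplicated work that passing a parameter avoids.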

We process the candidates for each election in a template that matches <candidate> elements (lines 87–119), and since all candidates have the same horizontal position as the vertical ruling line for the election, for which we’ve already computed $xpos, instead of recomputing it we can pass it from the template that matches <election> elements into the template that matches <candidate> elements. This strategy lets us compute the horizontal position just once and then use it for all of the items that depend on it for their position, both those that are election-specific and those that are candidate-specific. When we pass a variable with <xsl:with-param> inside <xsl:apply-templates> (lines 76–78), the called template has to retrieve that value with <xsl:param> (line 91). The <xsl:with-param> (in the caller) and <xsl:param> elements (in the called template) thus work together to make variables declared inside one template accessible inside another.
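The interplay between the two elements can be sketched as follows. This is a pared-down illustration under our own assumptions ($x_spacing, the canvas size, and the fixed radius are hypothetical), not the sample solution itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg" version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <!-- Hypothetical horizontal distance between elections, in pixels -->
    <xsl:variable name="x_spacing" select="150"/>
    <xsl:template match="/">
        <svg width="800" height="600">
            <g transform="translate(50, 550)">
                <xsl:apply-templates select="//election"/>
            </g>
        </svg>
    </xsl:template>
    <xsl:template match="election">
        <!-- Computed once per election -->
        <xsl:variable name="xpos" select="position() * $x_spacing"/>
        <xsl:apply-templates select="candidate">
            <!-- Caller side: hand the value to the template that is about to run -->
            <xsl:with-param name="xpos" select="$xpos"/>
        </xsl:apply-templates>
    </xsl:template>
    <xsl:template match="candidate">
        <!-- Called side: receive the value under the same name -->
        <xsl:param name="xpos" required="yes"/>
        <circle cx="{$xpos}" cy="-{@electoral_votes}" r="5" fill="black"/>
    </xsl:template>
</xsl:stylesheet>
```

The name on <xsl:with-param> must match the name on <xsl:param>. Marking the parameter as required is optional, but it makes the processor raise an error if the template is ever invoked without the value.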

Rendering order and Z axis

You may not have encountered the term Z axis before, but just as the X and Y axes describe a location in two-dimensional space, the Z axis adds a third dimension and represents which items are on top of others, not in the sense of vertically above, but in the sense of covering them up, the way a tablecloth is on top of a table and masks its surface. SVG lets you specify the X and Y position directly for most objects, so, for example, an object with a larger X value is to the right of an object with a smaller one. SVG does not let you specify a Z position in the same way, though. Instead, it renders objects in the order in which they appear, from top to bottom, in the angle-bracketed SVG source. This means that if you draw two overlapping fully opaque circles, the one you draw second will mask the one you draw first in the area where they overlap. For example, the code:

[SVG example not preserved in this copy: a red <circle> element followed by an overlapping blue <circle> element]

draws the blue circle after it draws the red one, so the blue masks the red where they overlap. If you change the order of the two <circle> elements in the raw SVG, the red will instead mask the blue.
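A minimal sketch of two such overlapping circles follows. The coordinates, radii, and canvas size are our own illustrative choices, not values taken from the original example.

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="300" height="200">
    <!-- Drawn first, so it sits underneath -->
    <circle cx="100" cy="100" r="75" fill="red"/>
    <!-- Drawn second, so it covers the red circle where the two overlap -->
    <circle cx="200" cy="100" r="75" fill="blue"/>
</svg>
```

The centers are 100 units apart and each radius is 75, so the circles overlap in the middle of the image, and that shared region appears blue.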

The order of the code that creates the SVG objects matters because you have to decide which should be on top. In our experience Z order is especially important with ruling lines, which are common on graphs; sometimes we want the data objects (rectangles, circles, etc.) to mask the ruling lines, and at other times we want the ruling lines to cross or run through the shapes.

Coloring bubbles using a variable and conditional expressions

You may have approached the question of bubble color in various ways, such as through using multiple templates or <xsl:choose>. In order to draw circles of different colors within the same template we created a variable named $bubble_color, which we used as the value of the @fill attribute for our circle. The @select value of $bubble_color uses a conditional XPath expression, which you can read about on page 551 of Michael Kay. The syntax is generally self-explanatory: if establishes a condition that must be true for the subsequent then value to be used. else states what happens in all other cases, which can be either a specific value (e.g., if @party is neither 'Republican' nor 'Democrat', then the value of $bubble_color will be 'green') or another test (if the value of @party isn’t 'Republican', we test whether it’s 'Democrat'). Note that in XPath, unlike in some other programming languages, the condition (after the if keyword) must be enclosed in parentheses and the else clause must be present, even if you want to do nothing when the condition is not true. The way to say that the else option is to do nothing is to specify the value as (), that is, as an empty sequence.

Our conditional expression means that $bubble_color will take different values under different conditions. When we call $bubble_color in the @fill value, then, the value used will have been set in a way that reflects the party of the candidate the XSLT is currently processing.
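A conditional expression of this kind might be sketched as follows. The value 'green' for third parties comes from the discussion above, but 'red' and 'blue' for the two major parties, the fixed @cx and @r values, and the canvas size are assumptions made only so that the example is complete.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg" version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <svg width="800" height="600">
            <g transform="translate(50, 550)">
                <xsl:apply-templates select="//candidate"/>
            </g>
        </svg>
    </xsl:template>
    <xsl:template match="candidate">
        <!-- The else branch holds a second test, and its own else is the fallback -->
        <xsl:variable name="bubble_color" select="
            if (@party = 'Republican') then 'red'
            else if (@party = 'Democrat') then 'blue'
            else 'green'"/>
        <!-- Placeholder position and size; only the fill is the point here -->
        <circle cx="100" cy="-{@electoral_votes}" r="20" fill="{$bubble_color}"/>
    </xsl:template>
</xsl:stylesheet>
```

Because the variable is declared inside the template, it is recomputed for each candidate, so each circle receives the color that matches its own @party value.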

Nested if/then/else expressions can become difficult to read beyond a couple of levels, and you can use an XPath map as an alternative way of looking up a color according to the candidate’s party. Maps are not covered in Michael Kay’s book because they were introduced with XPath 3.1 and XSLT 3.0, and his book covers XPath only through version 2.0, but they are addressed in the documentation for Saxon (the XPath engine built into <oXygen/>).
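As a sketch of the map alternative, the lookup could be written along the following lines. The variable name $party_colors, the colors for the two major parties, and the placeholder circle geometry are our own assumptions; the fallback to 'green' mirrors the conditional version described above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg" version="3.0">
    <xsl:output method="xml" indent="yes"/>
    <!-- Hypothetical lookup table from party name to fill color -->
    <xsl:variable name="party_colors" select="
        map {
            'Republican': 'red',
            'Democrat': 'blue'
        }"/>
    <xsl:template match="/">
        <svg width="800" height="600">
            <g transform="translate(50, 550)">
                <xsl:apply-templates select="//candidate"/>
            </g>
        </svg>
    </xsl:template>
    <xsl:template match="candidate">
        <!-- Look up the party; if the map has no such key the lookup returns
             an empty sequence, so the first item of the pair is 'green' -->
        <xsl:variable name="bubble_color"
            select="($party_colors(string(@party)), 'green')[1]"/>
        <!-- Placeholder position and size; only the fill is the point here -->
        <circle cx="100" cy="-{@electoral_votes}" r="20" fill="{$bubble_color}"/>
    </xsl:template>
</xsl:stylesheet>
```

Adding a party then means adding one entry to the map rather than another level of nesting. Wrapping @party in string() keeps the lookup from raising an error if a candidate has no @party attribute.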