Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2023-03-20T16:27:13+0000


XSLT assignment #2 answers

The assignment

Write an XSLT stylesheet that will transform the XML input document into an HTML document that consists entirely of tables of characters and factions. You can see the desired output at http://dh.obdurodon.org/skyrim-02.xhtml.

Our solution

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no" include-content-type="no"
        indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Skyrim</title>
                <style>
                    table { border-collapse: collapse; }
                    table, th, td { border: 1px solid black; }
                </style>
            </head>
            <body>
                <h1>Skyrim</h1>
                <h2>Cast of characters</h2>
                <table>
                    <tr>
                        <th>Name</th>
                        <th>Faction</th>
                        <th>Alignment</th>
                    </tr>
                    <xsl:apply-templates select="//cast/character"/>
                </table>
                <h2>Factions</h2>
                <table>
                    <tr>
                        <th>Name</th>
                        <th>Alignment</th>
                    </tr>
                    <xsl:apply-templates select="//cast/faction"/>
                </table>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="character">
        <tr>
            <td>
                <xsl:apply-templates select="@id"/>
            </td>
            <td>
                <xsl:apply-templates select="@loyalty"/>
            </td>
            <td>
                <xsl:apply-templates select="@alignment"/>
            </td>
        </tr>
    </xsl:template>
    <xsl:template match="faction">
        <tr>
            <td>
                <xsl:apply-templates select="@id"/>
            </td>
            <td>
                <xsl:apply-templates select="@alignment"/>
            </td>
        </tr>
    </xsl:template>
</xsl:stylesheet>

Preliminaries

Before anything else: the Skyrim XML is not in a namespace. This means that you should not include an @xpath-default-namespace attribute in your XSLT. If you make the mistake of specifying, say, the TEI namespace as the default, your templates will match only TEI elements, and since there aren’t any TEI elements in Skyrim, your templates will match nothing. That isn’t an error, but it is a mistake and it isn’t what you want.
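To make the namespace pitfall concrete, here is a minimal sketch of the mistake, not part of our solution. The TEI namespace URI is the real one, but the stylesheet itself is only an illustration. With @xpath-default-namespace set, the unprefixed name character in the @match value means "a <character> element in the TEI namespace", so the template never fires on the no-namespace <character> elements in Skyrim.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustration of the mistake: do NOT copy the xpath-default-namespace attribute -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="3.0">
    <!-- Matches only <character> elements in the TEI namespace,
         so it matches nothing in the Skyrim document -->
    <xsl:template match="character">
        <tr>
            <td>
                <xsl:apply-templates select="@id"/>
            </td>
        </tr>
    </xsl:template>
</xsl:stylesheet>
```

Removing the @xpath-default-namespace attribute makes the same template match the Skyrim <character> elements again.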

To keep everything in one place for this answer key, so that you don’t have to look at multiple files simultaneously, we’ve used a <style> element in the <head> to hold the CSS that specifies a border for the table. However, in your projects you’ll want to declare all CSS rules in a separate CSS stylesheet, so that you can assign the same styles to multiple HTML documents without having to repeat the same CSS instructions inside each of those files. The old @border attribute on <table> elements is officially non-conforming in HTML5, which means that although it may work in your browser, it should be avoided; best practice for specifying a table border is now to use CSS.
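As a sketch of the separate-stylesheet approach for your own projects, the <head> could link to an external CSS file instead of holding a <style> element. The filename skyrim.css is hypothetical; it would contain the same two table rules that appear in our <style> element.

```xml
<head>
    <title>Skyrim</title>
    <!-- skyrim.css is a hypothetical filename, assumed to sit next to the HTML output -->
    <link rel="stylesheet" type="text/css" href="skyrim.css"/>
</head>
```

Any other HTML page that links to the same file picks up the same table borders without repeating the rules.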

Template rules

Before writing any code to extract the information that is going to populate the rows of our tables, we begin by creating the superstructure of our HTML document, as we did in XSLT assignment #1. As always, the HTML document that we create has an <html> root element with two children, <head> (with <title> and, in this case, although not in your projects, <style>) and <body>. Inside the body, we create a prominent <h1> element to title our page as Skyrim, and then an <h2> subtitle followed by a <table> for the Cast of characters and another <h2> and <table> for Factions. You can read more about HTML tables at http://www.w3schools.com/html/html_tables.asp.

The tables that we are creating need to be filled. HTML tables are constructed row by row, and rows are defined by <tr>. Rows contain cells, of which there are two types: <th> (table header) is used to label a row or column and <td> is used to represent actual data. When, inside our template rule for the document node, we create the start- and end-tags for the two tables themselves, we also create the header rows for each table (<tr> containing <th>), since we want to create the <table> tags and the header rows just once per table. Since every document has exactly one document node, the template that matches the document node will fire exactly once, so by creating the tables and their header rows in that template, we ensure that we create each table, with its single header row, just once.

Although we want just a single header row per table, we want a separate data row for each character or faction. That means we need to create the data rows in a template that fires once per character or faction.

Our goal in the first table is to create a row for every character, and that row should contain three cells, the first with the character’s name, the second with the character’s faction, and the third with the character’s alignment. Every character and every faction is listed in the <cast> element in our XML document, and so we want to look there to retrieve the data that we’re going to insert into our table rows. In other words, we need to apply templates to each of the <character> and <faction> elements that we find inside <cast> to create rows for the HTML tables we are creating. Since we want to create rows with information about characters after the header row for that table, we put the <xsl:apply-templates select="//cast/character"> element right there, immediately after the first table row (the header row) inside the table. The value of the @select attribute tells our stylesheet to navigate directly to the <cast>, to find all of its <character> children, and then to apply templates to those <character> elements, putting content exactly where the <xsl:apply-templates> element was. That tells the stylesheet where we want to process <character> children of <cast> (not other <character> elements!), but it doesn’t say how. In order to communicate to the stylesheet how we want to process characters, we need to define a template for <character> elements that tells our stylesheet specifically what to do.

The template we use has a @match attribute with the value character, which will match any <character> it sees, which in this case means the ones that are thrown at it when the <xsl:apply-templates select="//cast/character"/> element fires inside the first table, the one we create in the template rule for the document node. (The template that matches those characters would also match any other <character> elements it might see, but it never sees any others, since this stylesheet never applies templates to anything outside the <cast> element.) Inside the template rule for <character> we create a <tr>, which will be inserted into the table we’re creating in exactly the place where the <xsl:apply-templates select="//cast/character"/> element was located, that is, right after the header row. This new row has three <td> elements to hold the data for the character we’re processing at the moment. That data is retrieved because each of the data cells contains its own <xsl:apply-templates> element, which selects the @id, @loyalty, and @alignment attributes of the particular <character>, respectively. Since all we want from those attributes is their textual value, we don’t have to define our own templates to handle them; we can rely instead on the behavior of the built-in default template, which just outputs the textual value of any attribute that does not have its own template rule. As a result, each data cell will come to be populated with the string value of the targeted attribute.
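The built-in rule for attributes behaves roughly like the explicit template below. You do not need to add it to your stylesheet; the sketch only spells out what the processor does on its own when no template of ours matches an attribute (or a text node).

```xml
<!-- Approximately what the built-in rule does for attributes and text nodes:
     output the string value of the node -->
<xsl:template match="@* | text()">
    <xsl:value-of select="."/>
</xsl:template>
```

So <xsl:apply-templates select="@id"/> inside a <td> ends up writing the value of the @id attribute into that cell.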

We follow a similar process for creating the table listing each faction. We call <xsl:apply-templates select="//cast/faction"/>, define a template for factions (<xsl:template match="faction">), and create a table row inside the template with data cells applying templates to @id and @alignment. Keep in mind that <faction> elements have only two attributes, so our table rows for factions will have only two data cells.

The result

Skyrim

Cast of characters

Name            Faction              Alignment
UrielSeptim     empire blades        good
hero            neutral              neutral
Jauffre         empire blades        good
MartinSeptim    empire blades        good
MehrunesDagon   daedra               evil
MankarCamoran   daedra MythicDawn    evil
MankarCamoran daedra MythicDawn evil

Factions

Name               Alignment
MythicDawn         evil
blades             good
daedra             evil
empire             good
DarkBrotherhood    neutral

The XML document includes <character> and <faction> elements in different contexts: some are children of <cast> and others are descendants of <body>. Our templates that match <character> and <faction> elements don’t specify a context, so they would match elements of those types both in the <cast> and in the <body>. Why, then, isn’t our output cluttered with unwanted <character> and <faction> elements from within the <body>?

The answer is that we never apply templates to anything inside the <body>. In order for a template to fire on an element, two things have to happen:

  1. We have to apply templates to the element.

  2. A template has to match the element.

Since we never apply templates to <character> and <faction> descendants of <body>, the first requirement is never met and those instances of <character> and <faction> elements are not processed. If we were processing both the <cast> and the <body>, though, we would want <character> and <faction> elements in those two contexts to be processed differently, so we would need separate templates that match the element types in each context. This is similar to the way we wrote separate templates to match act and scene <div> elements in the first XSLT assignment, since we needed to process those <div> elements differently in different contexts.
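If we did process both contexts, the separate templates might look something like the sketch below. The treatment of in-text characters is hypothetical: the <span> output and the class value are our own illustration, and we assume that <character> elements inside <body> wrap text.

```xml
<!-- Table rows: fires only for <character> children of <cast> -->
<xsl:template match="cast/character">
    <tr>
        <td>
            <xsl:apply-templates select="@id"/>
        </td>
        <td>
            <xsl:apply-templates select="@loyalty"/>
        </td>
        <td>
            <xsl:apply-templates select="@alignment"/>
        </td>
    </tr>
</xsl:template>
<!-- Hypothetical inline treatment: fires only for <character> descendants of <body> -->
<xsl:template match="body//character">
    <span class="character">
        <xsl:apply-templates/>
    </span>
</xsl:template>
```

Because each @match value names a context, a given <character> is matched by only one of the two templates.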

Streamlining the XSLT

You can stop reading here if you’d like, and something like the code above is a fine solution for this assignment. At the same time, that XSLT has a lot of fragmentation and repetition. The character table and the faction table have a lot in common, yet we create them separately, and we treat all attributes pretty much the same way, yet we create a <td> to hold each type of attribute separately. We can make our XSLT more concise, and therefore easier to maintain, in the following ways:

Process all attributes in a single template

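Instead of our current templates for creating character and faction rows, we can write:

<xsl:template match="character">
    <tr>
        <xsl:apply-templates select="@id, @loyalty, @alignment"/>
    </tr>
</xsl:template>
<xsl:template match="faction">
    <tr>
        <xsl:apply-templates select="@id, @alignment"/>
    </tr>
</xsl:template>
<xsl:template match="@*">
    <td>
        <xsl:value-of select="."/>
    </td>
</xsl:template>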

The XPath expression @* matches any attribute. When we apply templates to the attributes for the two element types we have to use the comma operator to combine them because the comma operator specifies the order. Although attributes inside a start-tag may look ordered to us, The Real XML is a tree, and not tags, and the order in which attributes are listed inside the start-tag is not part of the information available in the tree. Were we to apply templates to all of the attributes of the current context node with <xsl:apply-templates select="@*"/> there would be no guarantee that they would be output in the order in which they appear inside the start-tag. By specifying their order with the comma operator, though, we tell XSLT to apply templates to them in the order listed in the XSLT.
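The difference between the two ways of selecting attributes can be sketched as follows. The commented-out line is the version to avoid when order matters; the live line fixes the order explicitly.

```xml
<xsl:template match="character">
    <tr>
        <!-- Order left to the processor: may or may not follow the start-tag -->
        <!-- <xsl:apply-templates select="@*"/> -->

        <!-- Order fixed by the sequence: id, then loyalty, then alignment -->
        <xsl:apply-templates select="@id, @loyalty, @alignment"/>
    </tr>
</xsl:template>
```

Rewriting the @select value as "@alignment, @loyalty, @id" would reverse the cells, regardless of how the attributes are written in the XML.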

Process characters and factions together

As a result of this consolidation our XSLT has only one location where it creates <td> elements, removing a lot of repetition. We can consolidate further, though. Currently we create rows for characters in one template and rows for factions in a different template, but the processing is otherwise the same: we create a <tr> element and, inside it, apply templates to the attributes of the element we’re processing in a specific order. We can take advantage of the fact that although the attributes on characters and factions differ, those that are present on both types of element observe the same order. This means that we can further refactor the code above as:

<xsl:template match="character | faction">
    <tr>
        <xsl:apply-templates select="@id, @loyalty, @alignment"/>
    </tr>
</xsl:template>
<xsl:template match="@*">
    <td>
        <xsl:value-of select="."/>
    </td>
</xsl:template>

The first template matches anything that is either a <character> or a <faction> element, so we no longer need separate templates to process those two types of elements. This approach works even though characters and factions have different attributes because applying templates to something that doesn’t exist (<faction> elements do not have @loyalty attributes) is not an error. What the code says is apply templates to all of my @id, @loyalty, and @alignment attributes in that order. What happens with characters is straightforward: we get three cells per character, populated with information from those attributes in the specified order. What happens with factions is that it applies templates first to all of a faction’s @id attributes (there is always exactly one), then to all of its @loyalty attributes (there are never any, so it applies templates to all zero of them—that is, it does nothing), and then it applies templates to all the faction’s @alignment attributes (there is always exactly one).
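As a worked illustration of the faction case, consider a single faction. The attribute values are taken from the results table above; the exact shape of the element in the source XML is an assumption, and the indentation of the output may vary by processor.

```xml
<!-- Input (assumed shape; values from the results table) -->
<faction id="blades" alignment="good"/>

<!-- @id selects one attribute, @loyalty selects none, @alignment selects one,
     so the row gets two cells -->
<tr>
    <td>blades</td>
    <td>good</td>
</tr>
```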

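Our revised XSLT now looks like:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no" include-content-type="no"
        indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Skyrim</title>
                <style>
                    table { border-collapse: collapse; }
                    table, th, td { border: 1px solid black; }
                </style>
            </head>
            <body>
                <h1>Skyrim</h1>
                <h2>Cast of characters</h2>
                <table>
                    <tr>
                        <th>Name</th>
                        <th>Faction</th>
                        <th>Alignment</th>
                    </tr>
                    <xsl:apply-templates select="//cast/character"/>
                </table>
                <h2>Factions</h2>
                <table>
                    <tr>
                        <th>Name</th>
                        <th>Alignment</th>
                    </tr>
                    <xsl:apply-templates select="//cast/faction"/>
                </table>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="character | faction">
        <tr>
            <xsl:apply-templates select="@id, @loyalty, @alignment"/>
        </tr>
    </xsl:template>
    <xsl:template match="@*">
        <td>
            <xsl:value-of select="."/>
        </td>
    </xsl:template>
</xsl:stylesheet>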

But that’s not all …

There’s one more conspicuous repetition: for both characters and factions we create an <h2> header and a table with a row of labels above the actual table data. There is the further complication that the column labels are title-cased versions of the attribute names (e.g., the attribute @alignment goes in a column labeled Alignment), except that we want the first column to be headed Name, and not Id, and we have to specify that because it can’t be computed without more information. We also cannot compute the content of the <h2> headers automatically because there is no natural way for XSLT to know, unless we specify it, that <character> elements go in a table labeled Cast of characters and <faction> elements go in a table labeled Factions. Finally, there isn’t a particularly natural way for XSLT to look at the contents of the <cast> element and know to create one table for all <character> elements and one for all <faction> elements.
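To illustrate the column-label problem on its own, here is a minimal sketch of one way to compute a header cell from an attribute name. It is not the approach taken in our solution; the mode name header is hypothetical, and the special case for @id is exactly the extra information that we have to supply by hand.

```xml
<!-- Hypothetical mode "header": turns an attribute into a column label -->
<xsl:template match="@*" mode="header">
    <th>
        <!-- @id is labeled Name; any other attribute name is title-cased -->
        <xsl:value-of select="
            if (name() eq 'id') then 'Name'
            else upper-case(substring(name(), 1, 1)) || substring(name(), 2)"/>
    </th>
</xsl:template>
```

Applying templates in this mode to the attributes of one representative element, for example <xsl:apply-templates select="@id, @loyalty, @alignment" mode="header"/>, would produce the labels Name, Loyalty, and Alignment.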

In Real Life we would probably stop here because the overhead of trying to combine the character and faction processing further would offset any savings realized through the consolidation. If, though, we had to create hundreds of tables instead of just two, the consolidation would pay off by removing a lot of repetition, and therefore a lot of unnecessary opportunity for error. Here’s what that might look like:

[XSLT listing not preserved here. The surviving fragments are the title Skyrim, the table headings Cast of characters and Factions, and the column labels Name, Loyalty, and Alignment.]

The XSLT above recruits some advanced features that we haven’t seen before: