Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2023-03-17T15:06:20+0000


XSLT assignment #1 answers

The assignment

Your assignment is to create an XSLT stylesheet that will transform Bad Hamlet into a hierarchical outline of the titles of acts and scenes in HTML. This isn’t very interesting on its own, of course, but if you were transforming the entire document into HTML for publication on the web, this might serve as the skeleton. It might also stand on its own as a table of contents at the top of such a publication, so that the reader could click on the title of a scene to jump to that location in the file.

Our solution

There are different ways to do this, but the details we found most important were:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" 
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" 
    exclude-result-prefixes="#all"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0" 
    version="3.0">
    <xsl:output method="xhtml" html-version="5" 
        omit-xml-declaration="no" 
        include-content-type="no"
        indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//body/div"/>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <xsl:apply-templates select="div"/>
            </ul>
        </li>
    </xsl:template>
    <xsl:template match="div">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>
</xsl:stylesheet>

Housekeeping

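We modify the new XSLT skeletal document that <oXygen/> creates for us (the steps are copied from our initial XSLT tutorial). As the stylesheet above shows, the modifications include declaring the XHTML namespace as the default namespace for the output (the xmlns attribute), setting @xpath-default-namespace to the TEI namespace so that our XPath expressions can find the elements of the input, and adding an <xsl:output> element to specify how the result is serialized.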

Template rules

Our solution made use of three template rules: for the document node (/), acts (body/div), and scenes (div).

The document template creates the basic structure of the HTML document we want to produce. It has the <html> root element along with the required <head> and <body> elements inside it. Inside the body, we create an unordered (that is, bulleted) list, denoted by the <ul> element, that will eventually hold five items, one for each act. Unordered lists in HTML are required to have content in the form of <li> (list item) elements, and we’ll create those later, when we actually process the acts. For now, we just create the wrapper <ul> element that will hold them. Between the start and end <ul> tags (since that’s where we eventually want our <li> elements to be created), we apply templates to (that is, process) all of the acts, which we point to with the XPath expression //body/div as the value of the @select attribute of the <xsl:apply-templates> element.

The XPath expression //body/div, above, selects a sequence of all of the acts (all five act <div> elements), and the <xsl:apply-templates> element says to get them processed, but it doesn’t specify how to process them. So how do the acts get processed? What happens is that all of the templates in the stylesheet are constantly watching for something to do, and when that <xsl:apply-templates> element fires, they all see that some <div> elements need to be processed. The first template (the one for the document node) doesn’t match <div> elements, so it doesn’t do anything. The other two templates both match <div> elements; the second template matches <div> elements that have a <body> parent and the third matches all <div> elements. So how is the conflict or competition resolved? Which one gets to handle the acts? Whenever there’s an ambiguity like that, the more specific match rule wins, so the second template rule, the one that matches body/div, takes charge of each of the five acts. (Alternatively, we could have had the last template rule match not just div, but specifically div/div, which would have removed the potential ambiguity. Whether you deal with those situations by writing only unambiguous and specific rules or by letting the “more specific match rule wins” principle make the decision, you’ll get the same result here.)
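The unambiguous alternative mentioned above can be sketched as follows. This is a minimal variant of our solution, not a replacement for it: the only substantive change is that the scene rule matches div/div instead of div, so no <div> can match both rules. (In the original version the conflict is settled by default priorities: a multi-step pattern such as body/div has a default priority of 0.5, while a bare element name such as div has 0.) The trimmed namespace declarations and <xsl:output> attributes are a simplification made for this sketch.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="3.0">
    <xsl:output method="xhtml" html-version="5" indent="yes"/>
    <!-- Document node: build the HTML shell and send the acts off to be processed -->
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//body/div"/>
                </ul>
            </body>
        </html>
    </xsl:template>
    <!-- Acts: <div> elements whose parent is <body> -->
    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <xsl:apply-templates select="div"/>
            </ul>
        </li>
    </xsl:template>
    <!-- Scenes: <div> elements whose parent is another <div>, so this rule can never match an act -->
    <xsl:template match="div/div">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

Because an act has a <body> parent and a scene has a <div> parent, each <div> now matches exactly one rule and the priority mechanism never has to intervene.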

Each act returned by the path expression, then, triggers the second template, which creates an <li> element, and it creates that <li> element between the start- and end-tags for the <ul> that we created earlier. That’s the effect of putting the <xsl:apply-templates> element between those start- and end-tags; whatever happens as a result of that instruction will be written in the location where the instruction stood. Within that new <li> element, we tell the system to process (apply templates to) the <head> child of the current context (that is, the <head> child of the act we’re processing at the moment) and then to create a new <ul>, an embedded list. Within that new, embedded list it applies templates to another set of nodes, selected by the path expression div, which are the <div> children of the current act, that is, its scenes. Note that the <xsl:apply-templates> here executes XPath from the current context, which is the act we are processing at the moment, so it gets only the scenes of that act. Remember that the default axis is the child axis, so the expression div as the value of the @select attribute says to find all of the child <div> elements inside the current act.

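When XSLT finds the <head> elements (just one for each act), it tries to apply templates, but we haven’t defined a template for <head> elements. How does the stylesheet know what to do? XSLT includes built-in (default) templates, which say that when no template in the stylesheet matches an element, its tags are discarded and templates are applied to its children, and when no template matches a text node, its text is written to the output.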

Note that this happens in two steps: first the built-in rule for elements processes the <head> element, so it throws away the tags and automatically applies templates to its only child, which is a text node. Since we don’t have a template that matches text nodes, the built-in template then writes the textual value of the text node into the output.

We can write templates that match text nodes if we need to, but it’s relatively uncommon. In this case we don’t have a template that matches text nodes, so the built-in template that matches text nodes processes them.

Since what we want in this case is to output the textual content of the text-node child of the <head> elements, we let the built-in rules do it for us. The effect is to print something like Act 1 before we list all of the scenes in the first act.
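To make the built-in behavior concrete, here is a rough sketch of what the two relevant built-in rules would look like if we wrote them out ourselves. It is for illustration only and is not meant to be added to the solution; the real built-in rules also cover the document node and attributes and always have the lowest precedence, which these explicit versions only approximate.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">
    <!-- Approximates the built-in rule for elements:
         write no tags, just apply templates to the children -->
    <xsl:template match="*">
        <xsl:apply-templates/>
    </xsl:template>
    <!-- Approximates the built-in rule for text nodes:
         write the text of the node into the output -->
    <xsl:template match="text()">
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>
```

For a <head> element, the first rule fires and drops the tags, and the second rule then outputs the text node inside it, which is the two-step process described above.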

Let the computer do the work and don’t make unnecessary assumptions

Don’t make any assumptions in your XSLT that you don’t have to make. If the prose description of the task is “process all of the acts”, aim for a strategy that does not require you to know in advance how many acts there are.
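As a hedged illustration of this advice, the sketch below contrasts a brittle strategy (described in a comment) with the general one our solution uses. The numbered predicates in the comment are a hypothetical example of what to avoid, and the act rule is simplified to print only the act title.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="3.0">
    <xsl:output method="xhtml" html-version="5" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <!-- Brittle (avoid): five separate instructions with
                         select="//body/div[1]" through select="//body/div[5]".
                         This hard-codes the number of acts. -->
                    <!-- General: select every act, however many there are -->
                    <xsl:apply-templates select="//body/div"/>
                </ul>
            </body>
        </html>
    </xsl:template>
    <!-- Simplified act rule for this sketch: one list item per act title -->
    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

The general version keeps working unchanged if it is run against a play with a different number of acts.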

After printing the header, while still inside the list item for the act we’re working on at the moment, we create another unordered list and apply templates to the results of the XPath expression div. Since our current context is a <div> element that represents an act, this finds all of the <div> children of the act that we are in. We know from studying the document that these elements must be scenes.

Use the child axis to process children of the current context

A common beginner error is to write select="//div" instead of select="div". The incorrect XPath expression starts with a double slash, so it starts at the document node and retrieves all <div> elements in the entire play. This means that you’ll include the same information from all 26 <div> elements five times, once per act. The correct XPath expression uses the default child axis to include information only from the <div> children of the current context item (the current act), that is, only from the scenes of one act at a time.
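The difference can be seen in the act rule itself. In this sketch only that rule is shown (the document-node and scene rules are assumed to be unchanged from our solution), and the child axis is spelled out as child::div to make explicit what select="div" abbreviates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="3.0">
    <!-- Document-node and scene rules omitted here; they stay as in the solution -->
    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <!-- Wrong: select="//div" starts at the document node and
                     returns every <div> in the play, for every act -->
                <!-- Right: a relative path that starts at the current act;
                     select="div" is shorthand for select="child::div" -->
                <xsl:apply-templates select="child::div"/>
            </ul>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

Writing the axis out in full is optional; it is used here only to show that the path is evaluated relative to the act currently being processed.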

When the new <xsl:apply-templates select="div"/> element fires, then, the @select attribute rounds up the <div> children of the current context, which are the scenes in the current act. It doesn’t say how to process them though. As above, the template rules are always watching for new content, and once again the call has gone out to process some <div> elements. The first template rule, the one for the document node, doesn’t match, so it doesn’t do anything. The second template matches <div> elements only when they are the children of a <body> element. Since these new <div> elements are the children of a <div> element, the second template rule won’t touch them either. The third template rule, though, matches <div> elements without looking at their parents or anything else about them, so it takes charge in this case, and generates the list items for the scenes, inserting the values of the <head> children of the scenes, one by one, as they’re processed.

The result

Take-aways