Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2019-03-06T03:23:07+0000


XSLT assignment #1 answers

The assignment

Your assignment is to create an XSLT stylesheet that will transform Bad Hamlet into a hierarchical outline of the titles of acts and scenes in HTML. This isn’t very interesting on its own, of course, but if you were transforming the entire document into HTML for publication on the web, this might serve as the skeleton. It might also stand on its own as a table of contents at the top of such a publication, so that the reader could click on the title of a scene to jump to that location in the file.

Our solution

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="3.0">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//body/div"/>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <xsl:apply-templates select="div"/>
            </ul>
        </li>
    </xsl:template>
    <xsl:template match="div">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>
</xsl:stylesheet>

Template rules

Our solution made use of three template rules: for the document node (/), acts (body/div), and scenes (div). (Instead of a template for the document node, we could have used a template for the root element (<TEI>), since either of those would have provided an opportunity for us to create the HTML superstructure in the output.)
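The alternative mentioned in the parenthesis can be sketched as follows. This is a hedged illustration, not part of our solution: it is a drop-in replacement for the first template rule, and it assumes the namespace declarations (including xpath-default-namespace) on the <xsl:stylesheet> element shown above.

```xml
<!-- Sketch: match the root element instead of the document node.
     Replaces the template with match="/" in the stylesheet above;
     it is a fragment and relies on that stylesheet's namespace declarations. -->
<xsl:template match="TEI">
    <html>
        <head>
            <title>Hamlet</title>
        </head>
        <body>
            <ul>
                <!-- Same selection as before: every act is a <div> child of <body> -->
                <xsl:apply-templates select="//body/div"/>
            </ul>
        </body>
    </html>
</xsl:template>
```

With this version nothing in the stylesheet matches the document node explicitly, so the built-in rule handles it by applying templates to its children, which is how the <TEI> element reaches this template.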

The document template creates the basic structure of the HTML document we want to produce. It has the <html> root element, and the necessary <head> and <body> elements inside it. Inside the body, we create an unordered list, denoted by the <ul> element, that will eventually hold five items, one for each act, in a bulleted list (without numbering). Unordered lists in HTML are required to have content in the form of <li> (list item) elements, and we’ll create those later, when we actually process the acts. For now, we just create the wrapper <ul> element that will hold them. Between the start and end <ul> tags (since that’s where we eventually want our <li> elements to be created), we apply templates to (that is, process) all of the acts, which we point to with the XPath expression //body/div as the value of the @select attribute of the <xsl:apply-templates> element.

The XPath expression //body/div, above, is essentially an instruction to create a sequence of all of the acts (all five act <div> elements), and the <xsl:apply-templates> element says to get them processed, but it doesn’t specify how to process them. So how do the acts get processed? What happens is that all of the templates in the stylesheet are constantly watching for something to do, and when that <xsl:apply-templates> element fires, they all see that some <div> elements need to be processed. The first template (the one for the document node) doesn’t match <div> elements, so it doesn’t do anything. The other two templates both match <div> elements; the second template matches <div> elements that have a <body> parent and the third matches all <div> elements. So how is the conflict or competition resolved? Which one gets to handle the acts? Whenever there’s an ambiguity like that, the more specific match rule wins, so the second template rule, the one that matches body/div, takes charge of each of the five acts. (Alternatively, we could have had the last template rule match not just div, but specifically div/div, which would have removed the potential ambiguity. Whether you deal with those situations by writing only unambiguous and specific rules or by letting the “more specific match rule wins” principle make the decision, you’ll get the same result here.)
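The unambiguous alternative could look like the sketch below. It is a drop-in replacement for the third template rule and, like the original, depends on the namespace declarations of the enclosing stylesheet. For reference, the “more specific wins” behavior comes from default priorities: a pattern with a path step, such as body/div, has a higher default priority than a bare element name, such as div.

```xml
<!-- Sketch: match scenes explicitly as <div> children of a <div>.
     Replaces the template with match="div" in the stylesheet above. -->
<xsl:template match="div/div">
    <li>
        <xsl:apply-templates select="head"/>
    </li>
</xsl:template>
```

Because no <div> can have both a <body> parent and a <div> parent, the patterns body/div and div/div never compete for the same node.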

Each act returned by the path expression, then, triggers the second template, which creates an <li> element, and it creates that <li> element between the start and end tags for the <ul> that we created earlier. That’s the effect of putting the <xsl:apply-templates> element between those start and end tags; whatever happens as a result of that instruction will be written in the location where the instruction stood. Within that new <li> element, we tell the system to process (apply templates to) the <head> child of the current context (that is, the <head> child of the act we’re processing at the moment) and then to create a new <ul>, an embedded list. Within that new, embedded list it applies templates to another set of nodes, selected by the path expression div, which are the <div> children of the current act, that is, its scenes. Note that the <xsl:apply-templates> here executes XPath from the current context, which is the act we are processing at the moment, so it gets only the scenes of that act. Remember that the default axis is the child axis, so the expression div as the value of the @select attribute says to find all of the child <div> elements inside the current act.

When XSLT finds the <head> elements (just one for each act), it tries to apply templates, but we haven’t defined a template for <head> elements. How does the stylesheet know what to do? XSLT includes built-in, or default, templates, which say that if we’re processing an element that has only textual content and there is no template rule telling us what to do with it, we should just output its text. Since that’s what we want in this case, we let the built-in rule do it for us. The effect is to print something like Act 1 before we list all of the scenes in the first act. Note that we don’t type the word Act or the number 1; that gets retrieved from the <head> element in the input XML document.
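As a rough illustration of why this works, the processor behaves approximately as if the stylesheet also contained the two rules sketched below. This is a simplified rendering of the built-in rules, shown only for explanation; you do not need to (and normally should not) type them into your stylesheet.

```xml
<!-- Sketch of the built-in rules (simplified, for illustration only). -->

<!-- Elements and the document node with no matching rule:
     keep going by processing the children. -->
<xsl:template match="*|/">
    <xsl:apply-templates/>
</xsl:template>

<!-- Text nodes and attributes with no matching rule: output the text. -->
<xsl:template match="text()|@*">
    <xsl:value-of select="."/>
</xsl:template>
```

For a <head> element, the first rule applies templates to its children, and the second rule then outputs the text node it contains, which is how Act 1 ends up in the list item.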

After printing the header, while still inside the list item for the act we’re working on at the moment, we create another unordered list and apply templates to the results of the XPath expression div. Since our current context is still an act, this finds all of the <div> children of the act that we are in. We know from studying the document that these elements must be scenes because they occur inside acts. When the new <xsl:apply-templates select="div"/> element fires, then, the @select attribute rounds up the <div> children of the current context, which are the scenes in the current act. It doesn’t say how to process them, though. As above, the template rules are always watching for new content, and once again the call has gone out to process some <div> elements. The first template rule, the one for the document node, doesn’t match, so it doesn’t do anything. The second template matches <div> elements only when they are the children of a <body> element. Since these new <div> elements are the children of a <div> element, the second template rule won’t touch them either. The third template rule, though, matches <div> elements without looking at their parents or anything else about them, so it takes charge in this case, and generates the list items for the scenes, inserting the values of the <head> children of the scenes, one by one, as they’re processed.

The result

Further discussion

XSLT operates in a step-by-step manner. This means that, for example, when we apply templates to our acts, XSLT processes each of them individually by firing the template that knows how to handle acts separately for each act. And this means that if we’re processing an act and we get a call to process its scene children, we do that where the <xsl:apply-templates> instruction tells us to. This ensures that the output renders the scenes within their parent act.
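To make the nesting concrete, here is a sketch that uses a hypothetical, heavily abbreviated input. The heading texts below are invented for illustration and are not taken from Bad Hamlet; only the element structure follows the description above.

```xml
<!-- Hypothetical input fragment: two acts, with speeches and the TEI header omitted -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <text>
        <body>
            <div>
                <head>Act 1</head>
                <div>
                    <head>Scene 1</head>
                </div>
                <div>
                    <head>Scene 2</head>
                </div>
            </div>
            <div>
                <head>Act 2</head>
                <div>
                    <head>Scene 1</head>
                </div>
            </div>
        </body>
    </text>
</TEI>
```

Run against that input, our stylesheet should produce output shaped roughly like the following. The XML declaration and DOCTYPE are left out, and the exact whitespace depends on the processor.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Hamlet</title>
    </head>
    <body>
        <ul>
            <li>Act 1
                <ul>
                    <li>Scene 1</li>
                    <li>Scene 2</li>
                </ul>
            </li>
            <li>Act 2
                <ul>
                    <li>Scene 1</li>
                </ul>
            </li>
        </ul>
    </body>
</html>
```

Each act’s scenes land inside that act’s own <li>, because the inner <xsl:apply-templates> is evaluated once per act, with that act as the context.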

We begin by creating an <html> element as the root node of our tree, and we then create its child nodes (<head> and <body>), the <title> child of <head> (with its textual content), and the <ul> child of <body>. These are nodes on the tree. The <xsl:apply-templates> between the <ul> start and end tags says “at this location in the HTML output, create some new nodes as children of the <ul> context by applying templates to acts.” The <xsl:apply-templates> instruction uses XPath to identify the nodes in the input tree to be processed (the acts) and asserts that the results of the processing should create child nodes of the <ul>, but it says nothing about how to process the acts.

So how should the acts be processed? What the template that matches acts (that is, that matches body/div) does for each act is create an <li> child element, and once that output <li> element has been created, it follows the instructions between the start and end <li> tags and processes all the way down. In this case, that involves processing the <head> child of the act that we’re processing at the moment (there’s always exactly one) and creating a new <ul> element after the text of the <head>. The second <xsl:apply-templates> instruction, the one inside that new <ul>, says that new nodes need to be created there, within the list item for the act that we’re processing at the moment, and that they should be created by processing the <div> children of the current context, that is, the scenes of the current act.

The point of this walk-through is that: