Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2015-10-14T14:34:15+0000


XSLT assignment #1 answers

The assignment

Your assignment is to create an XSLT stylesheet that will transform Bad Hamlet into a hierarchical outline of the titles of acts and scenes in HTML. This isn’t very interesting on its own, of course, but if you were transforming the entire document into HTML for publication on the web, this might serve as the skeleton. It might also stand on its own as a table of contents at the top of such a publication, so that the reader could click on the title of a scene to jump to that location in the file.

Our solution

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    <xsl:output method="xml" indent="yes"
        doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//body/div"/>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <xsl:apply-templates select="div"/>
            </ul>
        </li>
    </xsl:template>
    <xsl:template match="div">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
        

Template rules

Our solution made use of three template rules: for the document (/), acts (body/div), and scenes (div). (Instead of a template for the document node, we could have used a template for the root element (<TEI>), since either of those would have provided an opportunity for us to create the HTML superstructure in the output.)
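As a minimal sketch of the alternative mentioned in the parenthesis above, the first template rule could match the root element instead of the document node. This assumes that the root element of Bad Hamlet is <TEI> in the TEI namespace, as the xpath-default-namespace declaration in our stylesheet suggests. The rest of the stylesheet would stay unchanged.

```xml
<!-- Alternative to match="/": fires on the root element rather than on the document node. -->
<!-- The built-in rule for the document node applies templates to its children, -->
<!-- which is how the <TEI> element gets reached. -->
<xsl:template match="TEI">
    <html>
        <head>
            <title>Hamlet</title>
        </head>
        <body>
            <ul>
                <xsl:apply-templates select="//body/div"/>
            </ul>
        </body>
    </html>
</xsl:template>
```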

The document template creates the basic structure of the HTML document we want to produce. It has the <html> root element and the necessary <head> and <body> elements inside it. Inside the body, we create an unordered list, denoted by the <ul> element, that will eventually hold five items, one for each act, in a bulleted list (without numbering). Unordered lists in HTML are required to have content in the form of <li> (list item) elements, and we’ll create those later, when we actually process the acts. For now, we just create the wrapper <ul> element that will hold them. Between the start and end <ul> tags (since that’s where we eventually want our <li> elements to be created), we apply templates to (that is, process) all of the acts, which we point to with the XPath expression //body/div as the value of the @select attribute of the <xsl:apply-templates> element.

The XPath //body/div, above, is essentially an instruction to collect all of the acts, and the <xsl:apply-templates> element says to get them processed, but it doesn’t specify how to process them. So how do the acts get processed? What happens is that all of the templates in the stylesheet are constantly watching for something to do, and when that <xsl:apply-templates> element fires, they all see that some <div> elements need to be processed. The first template (the one for the document node) doesn’t match <div> elements, so it doesn’t do anything. The other two templates both match <div> elements; the second template matches <div> elements that have a <body> parent and the third matches all <div> elements. So how is the conflict or competition resolved? Which one gets to handle the acts? Whenever there’s an ambiguity like that, the more specific match rule wins (in XSLT terms, a multi-step pattern like body/div has a higher default priority, 0.5, than a bare element name like div, which has priority 0), so the second template rule, the one that matches body/div, takes charge of each of the five acts. (Alternatively, we could have had the last template rule match not just div, but specifically div/div, which would have removed the potential ambiguity. Whether you deal with those situations by writing only unambiguous and specific rules or by letting the “more specific match rule wins” principle make the decision, you’ll get the same result here.)
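The unambiguous alternative described above might look like the following sketch, in which only the match pattern of the third rule changes. It assumes, as is the case in this document, that scenes are always <div> children of an act <div>, so acts match only body/div and scenes match only div/div.

```xml
<!-- Matches only <div> elements whose parent is also a <div>, that is, scenes. -->
<xsl:template match="div/div">
    <li>
        <xsl:apply-templates select="head"/>
    </li>
</xsl:template>
```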

Each act returned by the path expression, then, triggers the second template, which creates an <li> element, and it creates that <li> element between the start and end tags for the <ul> that we created earlier. That’s the effect of putting the <xsl:apply-templates> element between those start and end tags; whatever happens as a result of that instruction will be written in the location where the instruction stood. Within that new <li> element, we tell the system to process (apply templates to) the <head> child of the current context (that is, the <head> child of the act we’re processing at the moment) and then to create a new <ul>, an embedded list. Within that new list it applies templates to another set of nodes, selected by the path expression div, which are the <div> children of the current act, that is, its scenes. Note that the <xsl:apply-templates> here executes XPath from the current context, which is the act we are processing at the moment, so it gets only the scenes of that act. Remember that the default axis is the child axis, so the expression head as the value of the @select attribute simply says to find all of the child <head> elements inside the current act.
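To make the role of the default axis visible, here is a sketch of the same act template with the child axis written out in full. It is intended to behave identically to the version in our solution. The longer spelling is only for illustration.

```xml
<xsl:template match="body/div">
    <li>
        <!-- child::head is the unabbreviated form of head -->
        <xsl:apply-templates select="child::head"/>
        <ul>
            <!-- child::div selects only the <div> children of the current act, -->
            <!-- not every <div> in the document -->
            <xsl:apply-templates select="child::div"/>
        </ul>
    </li>
</xsl:template>
```

By contrast, a path that begins with //, such as //div, is evaluated from the document node and would select every <div> in the document, regardless of which act is the current context.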

When XSLT finds those <head> elements (just one for each act), it tries to apply templates, but we haven’t defined a template for <head> elements. How does the stylesheet know what to do? XSLT includes built-in, or default, templates, which say that if we’re processing an element that has only textual content and there is no template rule telling us what to do with it, we should just output its text. Since that’s what we want in this case, we let the built-in rule do it for us. The effect is to print something like Act 1 before we list all of the scenes in the first act. Note that we don’t type the word Act or the number 1; that gets retrieved from the <head> element in the input XML document.
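For reference, the built-in rules that matter here behave roughly as if the following templates were present in every stylesheet. This is a sketch of the behavior described in the XSLT specification, not something you need to type, and any template you write yourself takes precedence over these.

```xml
<!-- Built-in rule for the document node and for elements: keep descending into the children. -->
<xsl:template match="*|/">
    <xsl:apply-templates/>
</xsl:template>

<!-- Built-in rule for text nodes and attributes: copy the text to the output. -->
<xsl:template match="text()|@*">
    <xsl:value-of select="."/>
</xsl:template>
```

Applying templates to a <head> element therefore descends to its text node, and the text node is then written to the output.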

After printing the header, while still inside the list item for the act we’re working on at the moment, we create another unordered list (HTML permits nested lists) and apply templates to the results of the XPath expression div. Since our current context is still an act, this finds all of the <div> children of the act that we are in. We know from studying the document that these elements must be scenes, because they occur inside acts. When the new <xsl:apply-templates select="div"/> element fires, then, the @select attribute rounds up the <div> children of the current context, which are the scenes in the current act. It doesn’t say how to process them, though. As above, the template rules are always watching for new content, and once again the call has gone out to process some <div> elements. The first template rule, the one for the document node, doesn’t match, so it doesn’t do anything. The second template matches <div> elements only when they are the children of a <body> element. Since these new <div> elements are the children of a <div> element, the second template rule won’t touch them either. The third template rule, though, matches <div> elements without looking at their parents or anything else about them, so it takes charge in this case, and generates the list items for the scenes, inserting the values of the <head> children of the scenes, one by one, as they’re processed.
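Putting the three rules together, the <body> of the output should be shaped roughly like the sketch below. The comments stand in for text that the stylesheet copies from the <head> elements of the input, since the actual wording depends on the source document, and the exact whitespace will depend on the processor.

```xml
<body>
    <ul>
        <li><!-- text of the <head> of the first act, something like Act 1 -->
            <ul>
                <li><!-- text of the <head> of the first scene of that act --></li>
                <!-- one more <li> for each remaining scene of that act -->
            </ul>
        </li>
        <!-- one more <li> of the same shape for each remaining act -->
    </ul>
</body>
```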

The result

Further discussion

XSLT operates in a step-by-step and depth-first manner. The step-by-step part means that, for example, when we apply templates to our acts, XSLT walks through all of them. The depth-first part means that if we’re processing an act and we get a call to process its scenes, we go do that where the <xsl:apply-templates> instruction tells us to, and we then resume where we left off to finish processing the act. Here’s how this stylesheet works:

We begin by creating an <html> element as the root element of our output tree, and we then create its child nodes (<head> and <body>), the <title> child of <head> (with its textual content), and the <ul> child of <body>. These are nodes on the tree. The <xsl:apply-templates> between the <ul> start and end tags says, in effect, “now create some new nodes as children of the current <ul> context by applying templates to acts.” The <xsl:apply-templates> instruction uses XPath to identify the nodes in the input tree to be processed (the acts) and asserts that the results of the processing should create child nodes of the <ul>, but it says nothing about how to process the acts.

So how should the acts be processed? The <xsl:apply-templates> that we just executed found five acts, so the template that knows what to do with acts fires five separate times, once for each act. It doesn’t process all five acts at once (remember depth-first?); it processes each one to completion before it starts on the next. What it does for each act is create an <li> child element, and once that output <li> element has been created, it follows the instructions between the start and end <li> tags and processes all the way down (in this case, it processes its scenes) before it goes on to process the next act. That’s the depth-first part of the processing model. For each act, in turn, it uses <xsl:apply-templates> to specify what to do to create children of the new <li> element. In this case, that involves processing the <head> children of the act that we’re processing at the moment and creating a new <ul> element after the text of the <head>. The <xsl:apply-templates> instruction between the start and end tags of that new <ul> says that new nodes need to be created as children of that <ul>, and that they should be created by processing the <div> children of the current context, that is, the scenes of the current act. The scenes of an act are processed fully before the system looks at the next act; the template that matches scenes will fire once for each scene in the act being processed at the moment. Processing all of the scenes in the first act finishes what we have to do for that act, so we now move on to the next act.
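One way to watch this order for yourself is to add trace messages to the act and scene templates, as in the sketch below. The <xsl:message> instruction is standard XSLT, but where the messages appear (typically the console or a messages pane) depends on your processor, and the message wording here is only illustrative.

```xml
<xsl:template match="body/div">
    <!-- Report when processing of an act begins -->
    <xsl:message>Start act: <xsl:value-of select="head"/></xsl:message>
    <li>
        <xsl:apply-templates select="head"/>
        <ul>
            <xsl:apply-templates select="div"/>
        </ul>
    </li>
    <!-- Report when the act, including all of its scenes, is finished -->
    <xsl:message>End act: <xsl:value-of select="head"/></xsl:message>
</xsl:template>
<xsl:template match="div">
    <!-- Report each scene as it is processed -->
    <xsl:message>Scene: <xsl:value-of select="head"/></xsl:message>
    <li>
        <xsl:apply-templates select="head"/>
    </li>
</xsl:template>
```

If processing is depth-first as described, the messages for all of the scenes in an act should appear between that act’s start and end messages, before the start message for the next act.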

The point of this walk-through is that: