Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2017-03-03T03:16:38+0000


XSLT assignment #4: answers

The assignment

Produce an HTML version of the sonnets with a table of contents at the top. The table of contents should have one entry for each sonnet, which gives the number of the sonnet and the first line. Below the full table of contents (one line for each sonnet) you should render the complete text of all of the sonnets. You can see our output at http://dh.obdurodon.org/shakespeare-sonnets.xhtml.

Our solution

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns="http://www.w3.org/1999/xhtml" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    version="2.0">
    <xsl:output indent="yes" method="xml" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespeare Sonnets</title>
            </head>
            <body>
                <h1>Shakespearean Sonnets</h1>
                <h2>Contents</h2>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc"/>
                </ul>
                <hr/>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:apply-templates select="@number"/>
            <xsl:text>. </xsl:text>
            <xsl:apply-templates select="line[1]" mode="toc"/>
        </li>
    </xsl:template>
    <xsl:template match="sonnet">
        <h2><xsl:apply-templates select="@number"/></h2>
        <p><xsl:apply-templates/></p>
    </xsl:template>
    <xsl:template match="line" mode="toc">
        <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="line">
        <xsl:apply-templates/>
        <xsl:if test="following-sibling::line">
            <br/>
            <xsl:text>&#x0a;</xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

We begin with boilerplate configuration: we specify that output will be in the HTML namespace with xmlns="http://www.w3.org/1999/xhtml" (line 2), and we specify that we want HTML5 output using XML syntax with <xsl:output indent="yes" method="xml" doctype-system="about:legacy-compat"/> (line 5). Note that, counterintuitively, for HTML5 the value of the @method attribute should be xml (not html or xhtml).

The stylesheet we have written accomplishes the task of processing the same element (<sonnet>) in more than one way. The first template that matches <sonnet> has a @mode attribute, whose value we define arbitrarily as toc, indicating that this is the template we want to use when generating our table of contents. We could have given any other name to this mode, as long as the value of @mode in our template matched the value of @mode where we applied templates. The <xsl:apply-templates> that triggers this template sits inside an HTML unordered, or bulleted, list (<ul>), and the template itself creates a list item (<li>) for each sonnet. Each of those list items contains the sonnet’s number in roman numerals (retrieved from the XML by applying templates to the @number attribute of the <sonnet> we’re processing at the moment), followed by the text ". " (that is, a period and a space character), followed by the first <line> child of that sonnet, which we retrieve by applying templates to that <line>, but in the toc mode.
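To make the table-of-contents step concrete, here is a minimal sketch of the kind of input the stylesheet expects. It assumes only the structure described in this answer key (a <sonnets> root, <sonnet> children with a @number attribute in roman numerals, and <line> children that contain plain text); the real file may contain more than this.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical, abbreviated input: structure inferred from the discussion above -->
<sonnets>
    <sonnet number="I">
        <line>From fairest creatures we desire increase,</line>
        <line>That thereby beauty's rose might never die,</line>
        <!-- remaining lines omitted from this sketch -->
    </sonnet>
    <!-- further <sonnet> elements would follow here -->
</sonnets>
```

For this <sonnet>, the toc-mode templates would produce a single list item whose content is the number, the period and space, and the first line: I. From fairest creatures we desire increase,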

We output the period and space between the roman numeral and the line of text by using an <xsl:text> element. We could have output the bare text, and in most cases the results will be the same, but outputting a mixture of bare text and elements inside a template can sometimes lead to funny white-space handling. To avoid problems, whenever we have a template that needs to output raw text and that also includes elements (such as <xsl:apply-templates> here), we always wrap the raw text in <xsl:text>. (You don’t have to do that when the entire content you’re outputting in an element is raw text. For example, when we create an <h1> on line 12, we just type the text directly. It’s only when an element will contain raw text plus other types of content that we have to wrap the raw text in <xsl:text> tags.)
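The white-space issue is easier to see side by side. The sketch below is for comparison only and is not part of our solution; the mode name toc-bare is hypothetical. White-space-only text nodes in a stylesheet are discarded, but a text node that contains any other character is kept in full, including the line breaks and indentation around it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <!-- Variant A (as in our solution): the separator is wrapped in <xsl:text>,
         so exactly a period and a space are output between number and line. -->
    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:apply-templates select="@number"/>
            <xsl:text>. </xsl:text>
            <xsl:apply-templates select="line[1]" mode="toc"/>
        </li>
    </xsl:template>

    <!-- Variant B (hypothetical mode "toc-bare"): the period is bare text.
         The text node holding it also contains the surrounding new lines and
         indentation, and all of that white space is copied to the output. -->
    <xsl:template match="sonnet" mode="toc-bare">
        <li>
            <xsl:apply-templates select="@number"/>
            .
            <xsl:apply-templates select="line[1]" mode="toc"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

With Variant B a browser would collapse the extra white space, but a space would still appear between the roman numeral and the period.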

Note that when we apply templates to the first <line> in the process of creating the table of contents entries (line 26), we do so in a special toc mode, and we create a template for <line> elements where @mode="toc" as well (lines 33–35). This lets us process the first <line> in the table of contents as we want, without affecting the way we handle <line> elements elsewhere in our transformation, when we render the full poem. The important difference is that when we render the full poem, we will need to insert an HTML line break (<br/>) element after most lines, so that they don’t all run together. In the table of contents, though, we don’t want a line break because there is nothing to break; each line in the table of contents is already being output inside its own <li> element, and <li> elements already start on new lines. Our template that processes the first <line> in toc mode outputs the contents of the <line> (by applying templates to that content, and applying templates to plain text means printing the text) and doesn’t output any <br/> after it. A different template, which processes <line> elements when we render the full poem, does output the <br/> where needed. See the discussion below.

The second time we apply templates to <sonnet> elements occurs after our list (line 18), and this time we want to output the full text of each sonnet.

Note that we’re processing the sonnets a little differently here than we did when we processed them for the table of contents. This time we’re applying templates without a @select attribute, which means that we’re applying templates to all children of the current context. The current context is the document node and it has only one child, which is a <sonnets> root element. There is no template that matches a <sonnets> element, so the built-in behavior kicks in, which amounts to "process my children." The children of the <sonnets> element are all individual <sonnet> elements, so templates get applied to them.
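The built-in behavior mentioned here can be pictured as if the processor supplied the following template rules itself. This is only an approximation of the built-in rules for nodes processed without a mode, written out for illustration; you never need to add these to a stylesheet, and they are not meant to be pasted into our solution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- The document node, and any element with no template of its own:
         output nothing for the node itself, but process its children. -->
    <xsl:template match="*|/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Text nodes and attributes: output the string value. -->
    <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>
```

The first rule is why <sonnets>, which has no template of its own, simply passes processing along to its <sonnet> children. The second is why applying templates to @number or to the text inside a <line> prints the value.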

Whether you select the <sonnet> elements specifically (as we did for the table of contents) or let the built-in behavior take you there (as we do for the full text output) is up to you, and the results are the same. We selected them specifically for modal processing of the table of contents because it was convenient to specify the mode as we did that. We let the built-in behavior handle it for rendering the full sonnets because we didn’t need to specify a mode. That isn’t a rule, but we find it more legible than the alternatives.

The <xsl:apply-templates> instruction that is intended to lead to the output of the full text of each sonnet (line 18) has no @mode attribute, which means that it will be seen only by templates that do not specify a mode. The values for @mode on <xsl:apply-templates> and <xsl:template> must be identical for a template to be triggered, and an <xsl:apply-templates> that doesn’t have a @mode value specified will cause the nodes it selects to be processed only by template rules that also don’t specify a @mode (or, of course, if there is no applicable template at all, by the built-in default). In this case we define a template without a @mode value that matches <sonnet> elements, so this is what gets triggered here. According to this template rule, each sonnet produces a header (<h2>) that contains its number, followed by a paragraph that will contain the text of the sonnet. If we don’t do anything else, the built-in rules will kick in and cause the entire text of the sonnet in question, all fourteen lines, to be output in one continuous paragraph. To get each line to be rendered separately, we need to create a template rule (also with no @mode value!) that will process the individual lines correctly.

This modeless template that matches <line> elements first applies templates to the children of each <line> element, and since the only child of each <line> element is a single text node, the built-in default causes the text to be output as is. After each line, though, we want an HTML line break (<br/>) element, so that the lines won’t all run together. The exception is the last line, since there’s nothing to break there.

We follow the <br/> with an <xsl:text> element that contains &#x0a;, which is a numerical character reference that tells the system to output a new line. All characters can be represented by numerical character references, but because most characters are (obviously!) easier to type and read and understand if you just type them literally, numerical character references are most commonly used for characters that are difficult to type or print or see, such as the new-line character here. If you have a need for numerical character references in your project, ask your project mentor or one of the instructors for more information. If you omit the new line here, the output will nonetheless be valid HTML and it will be rendered properly in the browser because in the browser it’s just the <br/> element that makes the line break happen. The new-line character we’re creating here is entirely cosmetic and optional, but we find that it makes the raw HTML easier for a human to read.

We use the conditional <xsl:if> element to enable us to treat the last line of each sonnet (which shouldn’t be followed by a <br/>) differently from the others. Code inside an <xsl:if> element is executed only if the expression specified by the @test can be evaluated as true. The test that we’ve written, following-sibling::line, tests whether the line we’re processing at the moment has another <line> after it on the following-sibling axis. If it does, it isn’t the last line, so we output the <br/>. If it doesn’t, we don’t want the <br/>, and we don’t get it because the test fails. The purpose of this test is to generate line breaks between the lines in our HTML output, without creating an extra line break following the last line of a sonnet. The HTML <p> element containing these lines is block styled, meaning the browser already renders it as separate from other elements by displaying a line break at the end. We would be adding a semantically unnecessary line break if we generated a <br/> at the end of each paragraph, because it wouldn’t function to separate any lines.

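The following-sibling test isn’t the only way to identify the last line. You could, alternatively, have written your @test as position() ne last(), which will output the <br/> element whenever the position of the line being processed at the moment is not last. Be aware that position() and last() count every node selected by the <xsl:apply-templates> that triggered the template, not just <line> elements, so this alternative works as intended only if the <line> elements are the only nodes being processed (for example, if you apply templates with select="line", or if the input has no white-space-only text nodes between the lines).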
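Here is a minimal sketch of the position() ne last() alternative. It differs from our solution in one assumed detail: the <sonnet> template selects its <line> children explicitly, so that position() and last() count only lines and not any white-space text nodes that may sit between them in the input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
    <xsl:template match="sonnet">
        <h2><xsl:apply-templates select="@number"/></h2>
        <!-- select="line" (an assumption of this sketch) restricts the nodes
             being processed to <line> elements only -->
        <p><xsl:apply-templates select="line"/></p>
    </xsl:template>

    <xsl:template match="line">
        <xsl:apply-templates/>
        <!-- true for every selected line except the last one -->
        <xsl:if test="position() ne last()">
            <br/>
            <xsl:text>&#x0a;</xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

The following-sibling::line test in our solution does not depend on which nodes were selected, which is one reason to prefer it when templates are applied without a @select attribute.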