Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2021-04-02T15:13:54+0000


XSLT assignment #4: answers

The assignment

Produce an HTML version of the sonnets with a table of contents at the top. The table of contents should have one entry for each sonnet, which gives the number of the sonnet and the first line. Below the full table of contents (one line for each sonnet) you should render the complete text of all of the sonnets. You can see our output at http://dh.obdurodon.org/shakespeare-sonnets.xhtml.

Our solution

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/1999/xhtml"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no" 
        include-content-type="no" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespeare Sonnets</title>
            </head>
            <body>
                <h1>Shakespearean Sonnets</h1>
                <h2>Contents</h2>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc"/>
                </ul>
                <hr/>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:apply-templates select="@number"/>
            <xsl:text>. </xsl:text>
            <xsl:apply-templates select="line[1]" mode="toc"/>
        </li>
    </xsl:template>
    <xsl:template match="sonnet">
        <h2><xsl:apply-templates select="@number"/></h2>
        <p><xsl:apply-templates/></p>
    </xsl:template>
    <xsl:template match="line" mode="toc">
        <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="line">
        <xsl:apply-templates/>
        <xsl:if test="following-sibling::line">
            <br/>
            <xsl:text>&#x0a;</xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

We begin with boilerplate configuration: we specify that output will be in the HTML namespace with xmlns="http://www.w3.org/1999/xhtml" (line 2), and we specify that we want HTML5 output using XML syntax with the <xsl:output> element on lines 5–6.

The stylesheet processes the same element (<sonnet>) in more than one way. The first template that matches <sonnet> has a @mode attribute, whose value we set arbitrarily to toc, indicating that this is the template we want to use when generating our table of contents. We could have given the mode any other name, as long as the value of @mode on our template matched the value of @mode where we applied templates. This template is called from inside an HTML unordered (bulleted) list (<ul>), and it creates a list item (<li>) for each sonnet. Each list item contains the sonnet’s number in roman numerals (retrieved from the XML by applying templates to the @number attribute of the <sonnet> we’re processing at the moment), followed by the text ". " (that is, a period and a space character), followed by the first <line> child of that sonnet, which we retrieve by applying templates to that <line>, also in toc mode.
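To illustrate that the mode name is arbitrary, here is a minimal sketch of a stylesheet that builds only the table of contents and uses a hypothetical mode name, first-lines, in place of toc. It assumes the same input as above: a <sonnets> root whose <sonnet> children carry a @number attribute and contain <line> elements. The sketch deliberately supplies no template for <line> in that mode, so the built-in rules output the text of the first line.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no"
        include-content-type="no" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Contents only</title>
            </head>
            <body>
                <ul>
                    <!-- "first-lines" is an arbitrary mode name chosen for this sketch -->
                    <xsl:apply-templates select="//sonnet" mode="first-lines"/>
                </ul>
            </body>
        </html>
    </xsl:template>
    <!-- Fires only when templates are applied in the identically named mode -->
    <xsl:template match="sonnet" mode="first-lines">
        <li>
            <xsl:apply-templates select="@number"/>
            <xsl:text>. </xsl:text>
            <!-- No template matches line in this mode, so the built-in rules
                 output the text content of the first line -->
            <xsl:apply-templates select="line[1]" mode="first-lines"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

If the two mode values did not match, the <sonnet> template would never fire, and the built-in rules would output the full text of every sonnet inside the list instead.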

We output the period and space between the roman numeral and the line of text by using an <xsl:text> element. We could have output the bare text, and in most cases the results would be the same, but mixing bare text and elements inside a template can lead to unexpected white-space handling, because any line breaks and indentation adjacent to the bare text become part of the output. To avoid problems, whenever a template needs to output raw text alongside elements (such as <xsl:apply-templates> here), we always wrap the raw text in <xsl:text>. You don’t need to (and shouldn’t) use <xsl:text> when the entire content you’re outputting in an element is raw text. For example, when we create an <h1> on line 13, we just type the text directly. It’s only when an element will contain raw text plus other types of content that we wrap the raw text in <xsl:text> tags.
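The difference between bare text and <xsl:text> can be seen in the following sketch, which outputs the table of contents twice. The mode names bare and wrapped are hypothetical, chosen only for this comparison, and indent is set to no so that the serializer does not add white space of its own. Both lists should look alike in a browser, which collapses white space, but the raw output of the first list contains a line break and indentation after each period.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xhtml" html-version="5" indent="no"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Bare text versus xsl:text</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="bare"/>
                </ul>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="wrapped"/>
                </ul>
            </body>
        </html>
    </xsl:template>
    <!-- Bare text: the text node holds the period, the space, and also the
         line break and indentation that follow them in the stylesheet;
         all of it is copied to the output -->
    <xsl:template match="sonnet" mode="bare">
        <li>
            <xsl:apply-templates select="@number"/>. 
            <xsl:apply-templates select="line[1]"/>
        </li>
    </xsl:template>
    <!-- Wrapped text: only the two characters inside xsl:text are output;
         the white-space-only text nodes around the instructions are dropped -->
    <xsl:template match="sonnet" mode="wrapped">
        <li>
            <xsl:apply-templates select="@number"/>
            <xsl:text>. </xsl:text>
            <xsl:apply-templates select="line[1]"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```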

When we apply templates to the first <line> in the process of creating the table of contents entries (line 27), we do so in a special toc mode, and we create a template for <line> elements where @mode="toc" as well (lines 34–36). This lets us process the first <line> in the table of contents as we want, without affecting the way we handle <line> elements elsewhere in our transformation, when we render the full poem. The important difference is that when we render the full poem, we will need to insert an HTML line break (<br/>) element after most lines, so that they don’t all run together. In the table of contents, though, we don’t want a line break because there is nothing to break; each line in the table of contents is already being output inside its own <li> element, and <li> elements already start on new lines. Our template that processes the first <line> in toc mode outputs the contents of the <line> (by applying templates to that content, and applying templates to plain text means printing the text) and doesn’t output any <br/> after it. A different template, which processes <line> elements when we render the full poem, does output the <br/> where needed. See the discussion below.

The second time we apply templates to <sonnet> elements occurs after our list (line 19), and this time we want to output the full text of each sonnet.

We’re processing the sonnets a little differently here than we did when we processed them for the table of contents. This time we’re applying templates without a @select attribute, which means that we’re applying templates to all children of the current context. The current context is the document node and it has only one child, which is a <sonnets> root element. There is no template that matches a <sonnets> element, so the built-in behavior kicks in, which amounts to “process my children.” The children of the <sonnets> element are all individual <sonnet> elements, so templates get applied to them.

Whether you select the <sonnet> elements specifically (as we did for the table of contents) or let the built-in behavior take you there (as we do for the full text output) is up to you, and the results are the same. We selected them specifically for modal processing of the table of contents because it was convenient to specify the mode as we did that. We let the built-in behavior handle it for rendering the full sonnets because we didn’t need to specify a mode. That isn’t a rule, but we find it more legible than the alternatives.
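For comparison, here is a sketch of the explicit alternative for the full-text output. It is not the solution above, only a variant in which the document-node template selects the <sonnet> elements directly instead of relying on the built-in rule for <sonnets>. The table of contents is omitted to keep the sketch short. The output should differ from the built-in route at most in insignificant white space between sonnets.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no"
        include-content-type="no" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespeare Sonnets</title>
            </head>
            <body>
                <h1>Shakespearean Sonnets</h1>
                <!-- Explicit selection: go straight to every sonnet, bypassing
                     the built-in rule that would otherwise handle the root element -->
                <xsl:apply-templates select="//sonnet"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sonnet">
        <h2><xsl:apply-templates select="@number"/></h2>
        <p><xsl:apply-templates/></p>
    </xsl:template>
    <xsl:template match="line">
        <xsl:apply-templates/>
        <xsl:if test="following-sibling::line">
            <br/>
            <xsl:text>&#x0a;</xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```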

The <xsl:apply-templates> instruction that is intended to lead to the output of the full text of each sonnet (line 19) has no @mode attribute, which means that it will be seen only by templates that do not specify a mode. The values for @mode on <xsl:apply-templates> and <xsl:template> must be identical for a template to be triggered, and an <xsl:apply-templates> that doesn’t have a @mode value specified will cause the nodes it selects to be processed only by template rules that also don’t specify a @mode (or, of course, if there is no applicable template at all, by the built-in default). In this case we define a template without a @mode value that matches <sonnet> elements, so this is what gets triggered here. According to this template rule, each sonnet produces a header (<h2>) that contains its number, followed by a paragraph that will contain the text of the sonnet. If we don’t do anything else, the built-in rules will kick in and cause the entire text of the sonnet in question, all fourteen lines, to be output in one continuous paragraph. To get each line to be rendered separately, we need to create a template rule (also with no @mode value!) that will process the individual lines correctly.

This modeless template that matches <line> elements first applies templates to the children of each <line> element, and since the only child of each <line> element is a single text node, the built-in default causes the text to be output as is. After each line though, we want an HTML line break (<br/>) element, so that the lines won’t all run together—except after the last, since there’s nothing to break there.

To get slightly more legible output than in the version we presented online, we follow the <br/> with an <xsl:text> element that contains the string &#x0a;, which is a numerical character reference that tells the system to output a new line. All characters can be represented by numerical character references, but because most characters are easier to type, read, and understand if you just type them literally, numerical character references are most commonly used for characters that are difficult to type or print or see, such as the new-line character here. If you have a need for numerical character references in your project, ask your project mentor or one of the instructors for more information. We omitted the new line when we presented this online, and the output was nonetheless valid HTML that was rendered properly in the browser, because in the browser it’s just the <br/> element that makes the line break happen. The new-line character we’re creating here is entirely cosmetic and optional; all it does is cause the output to wrap when we look at it inside <oXygen/>, which makes it easier for a human to read.

We use the conditional <xsl:if> element to enable us to treat the last line of each sonnet (which shouldn’t be followed by a <br/>) differently from the others. Code inside an <xsl:if> element is executed only if the expression specified by the @test can be evaluated as True. The test that we’ve written, following-sibling::line, tests whether the line we’re processing at the moment has another <line> after it on the following-sibling:: axis. If it does, it isn’t the last line of the sonnet, so we output the <br/>. If it doesn’t, we don’t want the <br/>, and we don’t get it because the test fails. The purpose of this test is to generate line breaks between the lines in our HTML output, without creating an extra line break following the last line of a sonnet. The HTML <p> element containing these lines is block styled, meaning the browser already renders it as separate from other elements by displaying a line break at the end. We would be adding a semantically unnecessary line break if we generated a <br/> at the end of each paragraph, because it wouldn’t function to separate any lines.

Sorting

You weren’t required to sort the first lines for this assignment, but since we did it in class, here’s a discussion of that process.

An index of first lines in a collection of poems is usually alphabetized, because that’s how humans look things up in that kind of list. To learn how to sort your table of contents before you output it, start by looking up <xsl:sort> at https://www.w3schools.com/xml/xsl_sort.asp or in Michael Kay. So far, when we’ve wanted to output, say, our table-of-contents entries in the order in which they occur in the document, we’ve used a self-closing empty element to select them, with something like <xsl:apply-templates select="//sonnet"/>. We’ve also said, though, that the self-closing empty-element tag is informationally identical to writing the start- and end-tags separately with nothing between them, that is, <xsl:apply-templates select="//sonnet"></xsl:apply-templates>. To cause the elements being processed to be sorted first, you need to use this alternative notation, with separate start- and end-tags, because you need to put the <xsl:sort> element between them. If you use the first notation, the one with a single self-closing tag, there’s no “between” in which to put the <xsl:sort> element. In other words, you want something like:

<xsl:apply-templates select="//sonnet">
  <xsl:sort/>
</xsl:apply-templates>

As written, the preceding will sort the <sonnet> elements alphabetically by their text value. As you’ll see at the sites mentioned above, though, it’s also possible to use the @select attribute on <xsl:sort> to sort a set of items by properties other than alphabetic order of their textual content.
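As a minimal sketch of sorting with @select, the following stylesheet builds only the table of contents and sorts the entries by the text of each sonnet’s first line. The choice of line[1] as the sort key is our assumption about what a reader of the index would expect. The rest mirrors the toc-mode templates of the solution.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no"
        include-content-type="no" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Index of first lines</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc">
                        <!-- Sort key: the string value of each sonnet's first line -->
                        <xsl:sort select="line[1]"/>
                    </xsl:apply-templates>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:apply-templates select="@number"/>
            <xsl:text>. </xsl:text>
            <xsl:apply-templates select="line[1]" mode="toc"/>
        </li>
    </xsl:template>
    <xsl:template match="line" mode="toc">
        <xsl:apply-templates/>
    </xsl:template>
</xsl:stylesheet>
```

The sort affects only the order in which the selected <sonnet> elements are processed; the templates that render each entry are unchanged.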

Using translate() to fix the sort order

If you sort the first lines alphabetically according to their textual value, there will be one error. The first line of Sonnet #121, 'Tis better to be vile than vile esteem'd,, will show up first because in the internal representation of characters in the computer, the single straight apostrophe is alphabetically earlier than all of the letters. We can fix this by using translate() to strip the apostrophe for sorting purposes, but not for rendering. That is, we can sort as if there were no apostrophe, while still printing the apostrophe when we render the line.

We can’t easily translate away an apostrophe, though, because quotation marks have special meaning in XPath: they delimit string literals. For the purpose of this assignment, you can ignore this one missorted line. If you’re feeling ambitious, though, read Michael Kay’s answer at http://p2p.wrox.com/xslt/50152-how-do-you-translate-apostrophe.html and see whether you can apply it to fixing this problem.
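One possible approach, which is not necessarily the one described in the linked answer, relies on the XPath 2.0 and later rule that an apostrophe inside an apostrophe-delimited string literal is written by doubling it. The sketch below assumes a processor that supports that rule (the stylesheet declares version 3.0) and assumes that the apostrophe in the source is the straight one. It strips the apostrophe only from the sort key, so the rendered line is unchanged.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no"
        include-content-type="no" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Index of first lines</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc">
                        <!-- The second argument is a string literal that contains one
                             apostrophe, written as a doubled apostrophe inside
                             apostrophe delimiters; the empty third argument means
                             that the apostrophe is deleted from the sort key -->
                        <xsl:sort select="translate(line[1], '''', '')"/>
                    </xsl:apply-templates>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:apply-templates select="@number"/>
            <xsl:text>. </xsl:text>
            <!-- The line is rendered from the source, apostrophe included -->
            <xsl:apply-templates select="line[1]" mode="toc"/>
        </li>
    </xsl:template>
    <xsl:template match="line" mode="toc">
        <xsl:apply-templates/>
    </xsl:template>
</xsl:stylesheet>
```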

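The following-sibling::line test isn’t the only way to identify the last line. You could, alternatively, have written your @test as position() ne last(), which will output the <br/> element whenever the line being processed at the moment is not in the last position. Note that position() and last() count all of the nodes selected by the <xsl:apply-templates> that triggered the template, so this test identifies the last line reliably only if the <line> elements are the only nodes selected (for example, with select="line"); if white-space-only text nodes between the lines are also selected, the final <line> will not be in the last position.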
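A minimal sketch of the position() ne last() alternative follows, with the table of contents omitted for brevity. The one change beyond the test itself is our addition: the <sonnet> template applies templates with select="line", so that position() and last() count only <line> elements and not any white-space text nodes between them.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no"
        include-content-type="no" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespeare Sonnets</title>
            </head>
            <body>
                <h1>Shakespearean Sonnets</h1>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sonnet">
        <h2><xsl:apply-templates select="@number"/></h2>
        <!-- Select only the lines, so that the last line is in the last position -->
        <p><xsl:apply-templates select="line"/></p>
    </xsl:template>
    <xsl:template match="line">
        <xsl:apply-templates/>
        <!-- True for every line except the last one selected above -->
        <xsl:if test="position() ne last()">
            <br/>
            <xsl:text>&#x0a;</xsl:text>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```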