Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2016-10-27T21:10:59+0000


XSLT assignment #5: answers

The assignment

Enhance your output from the last assignment in the following ways:

Our solution

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="2.0">
    <xsl:output indent="yes" method="xml" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespearean sonnets</title>
            </head>
            <body>
                <h1>Shakespearean sonnets</h1>
                <h2>Contents</h2>
                <ul>
                    <!-- Sort on the first line, ignoring straight apostrophes (as in 'Tis) -->
                    <xsl:apply-templates select="//sonnet" mode="toc">
                        <xsl:sort select='translate(line[1], "&apos;", "")'/>
                    </xsl:apply-templates>
                </ul>
                <hr/>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sonnet" mode="toc">
        <li>
            <!-- AVT: href is "#sonnet" plus the roman numeral, e.g., "#sonnetII" -->
            <a href="#sonnet{@number}">
                <xsl:apply-templates select="line[1]" mode="toc"/>
                <xsl:text> (</xsl:text>
                <xsl:apply-templates select="@number"/>
                <xsl:text>)</xsl:text>
            </a>
        </li>
    </xsl:template>
    <xsl:template match="line" mode="toc">
        <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="sonnet">
        <!-- Link target: same value as the href in the table of contents, but without the # -->
        <h2 id="sonnet{@number}">
            <xsl:apply-templates select="@number"/>
        </h2>
        <p>
            <xsl:apply-templates/>
        </p>
    </xsl:template>
    <xsl:template match="line">
        <xsl:apply-templates/>
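        <!-- Line break after every line except the last one -->
        <xsl:if test="following-sibling::line">
            <br/>
            <!-- Newline character, so that the HTML source is easier to read -->
            <xsl:text>&#x0A;</xsl:text>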
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

The description of our solution to this assignment assumes that you have read our previous answer sheet for XSLT assignment #4, because this assignment builds largely upon the work done in that one. If you haven’t studied that solution, you can do so here: http://dh.obdurodon.org/xslt-assignment-04-answers.xhtml.

To observe the effects of the changes we’ve made to our stylesheet since the last assignment, we’ll compare it to that solution. For the moment, we’ll set aside the changes in the template that matches the document node (/); those affect the sorting of our table of contents, and we return to them below. Instead, take a look at the template that matches <sonnet> in toc mode (the one with mode="toc"). We’ve done two things there.

Creating clickable links

The first was to wrap the content of our <li> in an <a> element. An HTML <a> element defines a clickable link. Since this template processes sonnets for the table of contents, we want to turn our output items (the first lines in the table of contents) into links pointing to the corresponding sonnets later in our output, so that when a user clicks on a first line in the table of contents, the page scrolls to that sonnet. For this to work, we need a consistent strategy for creating @href values, so that every link points unambiguously to exactly one sonnet. In order for the links to have somewhere to land, each @href value (minus its leading hash mark) must exactly match the @id value of exactly one element in the document, which we will generate later.

We chose to give every @href the form #sonnet followed by that sonnet’s roman numeral, so that the link to the first sonnet starts as <a href="#sonnetI">, the link to the second starts as <a href="#sonnetII">, and so on. Remember (from the description of the original assignment) that every <a> element that links to somewhere else on the same page must begin its @href value with a hash mark (#), but that character must not be part of the @id value of the link target. For example, a link that points to Sonnet II will have a start tag that reads <a href="#sonnetII"> (with a hash mark), and the <h2> header for that sonnet will have a start tag that reads <h2 id="sonnetII"> (without a hash mark).
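For concreteness, here is a sketch of the two pieces of generated XHTML that have to agree for Sonnet II: the link in the table of contents and its target further down the page. The link text is left as a placeholder comment, and the exact indentation depends on the serializer.

```xml
<!-- In the table of contents: the href value begins with # -->
<li>
    <a href="#sonnetII"><!-- first line of Sonnet II --> (II)</a>
</li>

<!-- Later on the same page: the matching id has no # -->
<h2 id="sonnetII">II</h2>
```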

To generate a different @href value for every sonnet’s link (and, later, the matching @id to which each @href points), we need an XPath expression inside the attribute value. We describe how to do this on our page about attribute value templates (AVTs), which also explains why, to mark a path expression inside a literal HTML attribute value, we need to enclose it in curly braces ({}). We can retrieve the roman numeral of the sonnet we are currently processing from the @number attribute of that <sonnet>, and we combine the literal string #sonnet with that value as follows:

<a href="#sonnet{@number}">
    ...
</a>
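The curly braces are a shorthand. For comparison only (our solution uses the AVT), here is a longhand version that builds the same attribute with <xsl:attribute> and the XPath concat() function. Note that <xsl:attribute> has to come before anything else is added to the content of the <a> element.

```xml
<!-- Longhand equivalent of <a href="#sonnet{@number}"> -->
<a>
    <xsl:attribute name="href">
        <xsl:value-of select="concat('#sonnet', @number)"/>
    </xsl:attribute>
    <!-- the content of the link goes here -->
</a>
```

Both produce the same output; the AVT is simply more compact and easier to read.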

We also need to reorder the content inside our list item. By default, we would process the sonnets in document order, which is numerical order because they happen to be arranged that way in the input XML document. Since the assignment asks us to sort the table of contents entries alphabetically, we want not only to sort the lines but also to render the text of each first line before the sonnet number (roman numeral). If we don’t, the lines will be in alphabetical order, but users won’t be able to see that easily, because each entry will begin with a roman numeral that isn’t part of that sort order. What we really want is for users to be able to glance down the left margin of the list of first lines and find the one they want easily. To make the alphabetical order easier to see, we move the <xsl:apply-templates> that selects the first line to the beginning of the output and move the roman numeral to the end, wrapped in parentheses. As always, instead of writing raw text into our templates, we wrap it in <xsl:text> elements, which helps prevent quirky white-space handling.
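To make the reordering concrete, assume (hypothetically; the real input file may be punctuated or structured differently) that a sonnet is encoded as shown below. The toc-mode templates would then produce a list item with the first line first and the roman numeral in parentheses at the end.

```xml
<!-- Hypothetical input -->
<sonnet number="II">
    <line>When forty winters shall besiege thy brow,</line>
    <line>And dig deep trenches in thy beauty's field,</line>
    <!-- remaining lines omitted -->
</sonnet>

<!-- Resulting table-of-contents entry (indentation may differ) -->
<li>
    <a href="#sonnetII">When forty winters shall besiege thy brow, (II)</a>
</li>
```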

The next change we need in order to link our table of contents to our sonnets is to create @id attributes on the targets of our links, so that when the user clicks on a link, there will be somewhere to scroll to. Remember that the @href values we’ve created for our links must start with a hash mark (#), but the values of the identifiers (@id attributes) they point to must not. We chose to put the identifiers on the <h2> elements that we generate; this way, when users click on a link to a sonnet, the browser scrolls down so that they can see the number of the sonnet above its text, confirming that they made the correct selection.
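Because each @id must be unique, the whole scheme depends on no two sonnets sharing a @number value. As a hedged sketch (not part of our solution), a template like the following could be added to the stylesheet to report any duplicate numbers while still producing the normal output. It relies on XSLT 2.0’s <xsl:next-match/>, which our version="2.0" stylesheet allows.

```xml
<!-- Hypothetical addition: fires only for a sonnet whose @number was already used earlier -->
<xsl:template match="sonnet[@number = preceding::sonnet/@number]" priority="2">
    <xsl:message>
        <xsl:text>Duplicate sonnet number: </xsl:text>
        <xsl:value-of select="@number"/>
    </xsl:message>
    <!-- Then hand the sonnet on to the regular template that matches sonnet -->
    <xsl:next-match/>
</xsl:template>
```

The explicit priority makes this template win over the plain match="sonnet" template; it applies only in the default mode, so the toc-mode processing is unaffected.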

Alphabetizing the first lines

Finally, we need to sort the table of contents alphabetically by first line. We can do that by putting <xsl:sort/> between the start and end tags of <xsl:apply-templates>, but if that’s all we do, one sonnet will be out of place. Sonnet VI begins with 'Tis, and the straight apostrophe at the beginning of that line causes it to sort before all the other sonnets, because the apostrophe character sorts ahead of the letters. This happens because a bare <xsl:sort/> compares every character of the text it is given, and it has no way of knowing that the leading apostrophe should be ignored.
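For comparison, the first attempt described here would look like the sketch below. With no @select, the sort key defaults to the string value of each <sonnet> (all of its text), so the sonnet whose text begins with an apostrophe comes out first.

```xml
<!-- First attempt: no @select, so the sort key is the full string value of each sonnet -->
<xsl:apply-templates select="//sonnet" mode="toc">
    <xsl:sort/>
</xsl:apply-templates>
```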

We can fix this problem by using the XPath translate() function inside the @select attribute of <xsl:sort/> to modify the sort keys before sorting. Because we use translate() only for sorting, we aren’t changing what we write into the output, so the apostrophe will still be rendered where it occurred in the original input. What we are doing is telling the system to ignore the apostrophe for sorting purposes, by translating it into nothing (an empty string) inside the value of the @select attribute of the <xsl:sort> element.
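To see that translate() affects only the sort key, you could run a small diagnostic stylesheet like the following (hypothetical, not part of the assignment) against the same input. It assumes the sort key is the first line of each sonnet (line[1]) and prints each key next to the first line as it would be output, in sorted order.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="text"/>
    <xsl:template match="/">
        <!-- Sort with the same key we use for the table of contents -->
        <xsl:for-each select="//sonnet">
            <xsl:sort select='translate(line[1], "&apos;", "")'/>
            <!-- The sort key, with apostrophes removed -->
            <xsl:value-of select='translate(line[1], "&apos;", "")'/>
            <xsl:text> | </xsl:text>
            <!-- The text we actually output, apostrophe intact -->
            <xsl:value-of select="line[1]"/>
            <xsl:text>&#x0A;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```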

The problem we run into while trying to use this function is how to nest our quotation marks so that our document remains well-formed. We have only two kinds of quotation marks available: single (the apostrophe) and double. If we need both, we can nest one inside the other, but here we need three levels: the value of the @select attribute must be inside quotation marks, the second and third arguments to translate() are strings that must be inside quotation marks, and the character we want to translate away is itself a single quotation mark (an apostrophe). Our solution encloses the attribute value in single quotation marks, encloses the string arguments of translate() in double quotation marks, and uses the &apos; character entity to represent the apostrophe we want to remove for sorting. There are other ways to accomplish this as well.
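One such alternative (a sketch, not the method our solution uses) is to store the apostrophe in a variable, so that the XPath expression never has to contain a literal apostrophe at all. A raw apostrophe in element content needs no escaping. The variable name apos is our own choice, and the example again assumes the sort key is line[1].

```xml
<!-- Top-level variable (a child of xsl:stylesheet) holding one straight apostrophe -->
<xsl:variable name="apos">'</xsl:variable>

<!-- Then, inside the template that builds the table of contents: -->
<xsl:apply-templates select="//sonnet" mode="toc">
    <xsl:sort select="translate(line[1], $apos, '')"/>
</xsl:apply-templates>
```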

So how does &apos; work? We’ve been using three built-in XML character entities so far: &amp; (ampersand, &), &lt; (less than, <), and &gt; (greater than, >). XML has two more built-in character entities: &apos; for the straight apostrophe (') and &quot; for the straight double quotation mark ("). Here we exploit the fact that when our XSLT stylesheet is checked for well-formedness, the &apos; character entity has not yet been replaced by an apostrophe, so the parser doesn’t see an unmatched quotation mark. If you copy and paste our code, above, into <oXygen/> and try replacing the character entity with a raw apostrophe, a squiggly red line will appear to tell you that that’s forbidden. The character entity lets us get away with one more level of quotation marks.
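The two stages can be pictured side by side. The first line is what we write in the stylesheet; the comment shows the XPath expression that the XSLT processor receives after the XML parser has replaced the entity (shown here with line[1] as the sort key). A numeric character reference, &#39;, would work the same way as &apos;.

```xml
<!-- As written: well-formed, because the apostrophe-delimited attribute
     contains no raw apostrophe -->
<xsl:sort select='translate(line[1], "&apos;", "")'/>

<!-- As received by XPath after parsing:
         translate(line[1], "'", "")
     The apostrophe now sits safely inside a double-quoted XPath string. -->
```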

XPath magic

There’s an alternative way to keep the straight apostrophe from causing an error:

<xsl:sort select="translate(line[1], '''', '')"/>

As Michael Kay puts it, the delimiter of a string literal can be included in the string literal by doubling it (http://stackoverflow.com/questions/13482352/xquery-looking-for-text-with-single-quote). In other words, you can include an apostrophe inside an apostrophe-delimited string by doubling it, which is why the second argument to translate() is written as a sequence of four apostrophes rather than the three that correspond to what you mean.
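This doubling convention belongs to XPath 2.0 and later, which our version="2.0" stylesheet uses; XPath 1.0 string literals have no escape mechanism, so an XSLT 1.0 stylesheet would need one of the other techniques, such as &apos;. The same rule applies to double quotation marks inside a double-quoted literal. The two lines below are separate alternatives (the second strips double quotation marks instead of apostrophes), not a pair to use together; both assume line[1] as the sort key.

```xml
<!-- Apostrophe-delimited XPath literal: '' stands for one apostrophe,
     so '''' is a string containing a single apostrophe -->
<xsl:sort select="translate(line[1], '''', '')"/>

<!-- Double-quote-delimited XPath literal: "" stands for one double quotation mark.
     The XML attribute is delimited by apostrophes, so the double quotes need no escaping. -->
<xsl:sort select='translate(line[1], """", "")'/>
```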