Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2021-04-05T19:24:51+0000


XSLT assignment #5: answers

The assignment

Enhance your output from the last assignment in the following ways:

Our solution

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/1999/xhtml"
  xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
  version="3.0">
  <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no" include-content-type="no"
    indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Shakespearean sonnets</title>
      </head>
      <body>
        <h1>Shakespearean sonnets</h1>
        <h2>Contents</h2>
        <ul>
          <xsl:apply-templates select="//sonnet" mode="toc">
            <xsl:sort select='translate(., "&apos;", "")'/>
          </xsl:apply-templates>
        </ul>
        <hr/>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <!-- ============================================== -->
  <!-- Table of contents mode                         -->
  <!-- ============================================== -->
  <xsl:template match="sonnet" mode="toc">
    <li>
      <a href="#sonnet{@number}">
        <xsl:apply-templates select="line[1]" mode="toc"/>
        <xsl:text> (</xsl:text>
        <xsl:apply-templates select="@number"/>
        <xsl:text>)</xsl:text>
      </a>
    </li>
  </xsl:template>
  <xsl:template match="line" mode="toc">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- ============================================== -->
  <!-- Reading view                                   -->
  <!-- ============================================== -->
  <xsl:template match="sonnet">
    <section id="sonnet{@number}">
      <h2>
        <xsl:apply-templates select="@number"/>
      </h2>
      <p>
        <xsl:apply-templates select="line"/>
      </p>
    </section>
  </xsl:template>
  <xsl:template match="line">
    <xsl:apply-templates/>
    <xsl:if test="following-sibling::line">
      <br/>
      <xsl:text>&#x0A;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

The description of our solution to this assignment assumes that you have read our previous answer sheet for XSLT assignment #4, because this assignment builds largely upon the work done in that one. If you haven’t studied that solution, you can do so here: http://dh.obdurodon.org/xslt-assignment-04-answers.xhtml.

To observe the effects of the changes we’ve made to our stylesheet from the last assignment, we’re going to make comparisons to that solution. For the moment, we will ignore the changes in the template that matches the document node (/). The changes there affect the sorting of our table of contents. Instead, take a look at the template that matches <sonnet> where @mode = "toc". We’ve done two things there.

Creating clickable links

Within our list of first lines we need to wrap the content of each <li> element in an <a> element with an @href attribute, which is how HTML defines a clickable link. We need a consistent strategy for creating @href values, so that every link points unambiguously to the text of exactly one sonnet. For the links to work properly (that is, to have somewhere to land), each one has to match exactly the @id value of exactly one element in the document, which we will generate later. We chose to give every @href the form of the string sonnet followed by that sonnet’s roman numeral, so that the first sonnet’s link looks like <a href="#sonnetI">, the second sonnet’s link looks like <a href="#sonnetII">, etc. Remember (from the description of the original assignment) that the @href value of any <a> element that links to somewhere else on the same page must begin with a hash mark (#), but that character must not be part of the corresponding @id value of the target of the link. For example, a clickable link that points to Sonnet #2 will have a start tag that reads <a href="#sonnetII"> (with a hash mark), and the <section> element wrapping that sonnet will have a start tag that reads <section id="sonnetII"> (without a hash mark).

To generate a different value of @href for every sonnet’s link (and, later, for the @id to which the @href points), we need to use XPath. We describe how to do this in our page about Attribute value templates (AVT). For reasons explained at that page, to delimit our XPath expression within the otherwise literal attribute value we need to encase it in curly braces ({}). We can retrieve the unique roman numeral for the sonnet we are processing at the moment from the @number attribute of that <sonnet>. We concatenate the literal string #sonnet with the retrieved value of the @number attribute as follows:

<a href="#sonnet{@number}">
    …
</a>
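
For comparison, an AVT is shorthand for building the attribute with an explicit instruction. The sketch below shows what we believe is an equivalent long form that uses <xsl:attribute>; it is an illustration only and not part of our solution, and the ellipsis stands for the same content as above.

```xml
<a>
  <!-- Long-form equivalent of href="#sonnet{@number}" -->
  <xsl:attribute name="href" select="concat('#sonnet', @number)"/>
  …
</a>
```

The AVT version is shorter and keeps the attribute visible on the start tag, which is why we prefer it here.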

We also need to sort the first lines. Unless we specify otherwise, XSLT will process the sonnets in document order, which is numerical order because they happen to be arranged that way in the input XML document. Besides sorting the first lines, we want to render the textual content of the line before the sonnet number (roman numeral). If we don’t do that, although our lines will be in alphabetical order, users won’t be able to see that easily because each line will begin with a roman numeral that isn’t part of that sort order. What we really want is for users to glance down the left margin of the list of first lines and find the ones they want easily. To make the alphabetic order easier to see, we move the <xsl:apply-templates> that selects our first line to the beginning of the output, and move the roman numeral to the end, wrapped in parentheses. As always, instead of outputting raw text when we’re creating mixed content, we wrap all raw text in <xsl:text> elements, which helps prevent quirky white-space handling.

The next change we have to make in order to get our table of contents to link to our sonnets is to create @id attributes for the targets of our links, so that when the user clicks on a link, there will be somewhere to scroll to. Remember that the @href values that we’ve created for our clickable links must start with hash marks (#), but the values of the identifiers (@id attributes) they point to must not. We chose to put the identifiers on the <section> elements that we generate; this way, when users click on a link to a sonnet, the browser scrolls to the start of the section that contains it.
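
Side by side, the two halves of each link are built from the same @number value. The fragments below are taken from the stylesheet above, with the contents of the elements elided:

```xml
<!-- Table of contents: the link (note the leading #) -->
<a href="#sonnet{@number}">…</a>

<!-- Reading view: the target of the link (no #) -->
<section id="sonnet{@number}">…</section>
```

For a sonnet whose @number is II, these produce href="#sonnetII" and id="sonnetII", respectively.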

Alphabetizing the first lines

Finally, we need to sort the table of contents alphabetically by first line. We can do that by putting <xsl:sort> between the start and end tags of <xsl:apply-templates>, but if that’s all we do, one line will be out of place: Sonnet CXXI begins with 'Tis, and the apostrophe at the beginning of that line causes it to sort before all of the other sonnets. This happens because if all we do is tell the system to sort, it uses every character in the first line, and it doesn’t know that the leading apostrophe should be ignored.
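
A sketch of that first, naive attempt, assuming the same <xsl:apply-templates> as in the stylesheet above. With no @select attribute on <xsl:sort>, the sort key defaults to the item being processed (here, each <sonnet>):

```xml
<xsl:apply-templates select="//sonnet" mode="toc">
  <!-- No @select: sorts on the string value of each sonnet, apostrophes included -->
  <xsl:sort/>
</xsl:apply-templates>
```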

We can fix this problem by using the XPath translate() function inside the @select attribute of <xsl:sort> to remove the apostrophe from the sort key, that is, to translate it into nothing (an empty string). Because we use translate() only for sorting purposes, we aren’t changing what we write into the output, so although the apostrophe is ignored during sorting, it is still rendered where it belongs, that is, where it occurred in the original input.

The problem we run into when using this function is nesting our quotation marks so that our XSLT document remains well-formed. We have only two kinds of quotation marks available in XPath: single and double. If we need both, we can nest them, but here we need three levels: the value of the @select attribute must be inside quotation marks, the second and third arguments to translate() are strings that must be inside quotation marks, and the character we want to translate away is also a quotation mark. Our solution encloses the attribute value in single quotation marks, encloses the arguments of translate() in double quotation marks, and uses the &apos; character entity to represent the apostrophe we want to remove from our text for sorting. There are other ways of accomplishing this as well (see below).

So how does &apos; work? We’ve been using three built-in XML character entities so far: &amp; (ampersand, or &), &lt; (less than, or <), and &gt; (greater than, or >). XML builds in two more character entities: &apos; for the straight apostrophe (') and &quot; for the straight double quotation mark ("). Here we exploit the fact that when our XSLT stylesheet is checked for well-formedness, the &apos; character entity has not yet been transformed into an apostrophe, so we don’t have an unmatched quotation mark. By the time the XPath expression is evaluated, the entity has been replaced by a real apostrophe, but there it sits safely inside double quotation marks. If you copy and paste our code, above, into <oXygen/> and try replacing the character entity with a raw single apostrophe character, the squiggly red line will appear to tell you that that’s forbidden. The character entity lets us get away with having one more level of quotation marks.
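
The same trick works with the roles of the quotation marks swapped. In the sketch below (an illustration, not part of our solution), the attribute value is delimited by double quotation marks, so it is the double quotation marks around the second argument that have to be written as &quot; character entities, while the apostrophe itself can be typed literally:

```xml
<!-- After the XML parser resolves the entities, XPath sees: translate(., "'", '') -->
<xsl:sort select="translate(., &quot;'&quot;, '')"/>
```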

XPath magic

We can think of at least two alternative ways of protecting the straight apostrophe from raising an error. One is that we can declare a variable, the value of which is just the single apostrophe, and then refer to the variable inside the translate() function:

<xsl:variable name="apostrophe" as="xs:string">'</xsl:variable>

We take advantage of the fact that a variable value can be specified in two different ways in XSLT, either as the value of a @select attribute on the <xsl:variable> element or as the content of that element, between the start and end tags. Using the @select attribute wouldn’t address our problem of running out of distinct types of quotation marks, but using a literal string between the start- and end-tags is safe. We can then refer to our variable inside the translate() function:

<xsl:sort select="translate(., $apostrophe, '')"/>
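
Put together, the two pieces might be placed as follows. This is a sketch that assumes the rest of the stylesheet above is unchanged; declaring the variable at the top level is our choice for the illustration, and declaring it inside the template before the <xsl:apply-templates> would also work.

```xml
<!-- Declared as a child of <xsl:stylesheet>, so it is visible in every template -->
<xsl:variable name="apostrophe" as="xs:string">'</xsl:variable>

<!-- Inside the template that matches the document node -->
<xsl:apply-templates select="//sonnet" mode="toc">
  <xsl:sort select="translate(., $apostrophe, '')"/>
</xsl:apply-templates>
```

Note that there must be no white space around the apostrophe between the start and end tags of <xsl:variable>, because any such white space would become part of the value.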

Alternatively, we can escape the apostrophe (that is, tell it not to have its regular meaning of string-delimiter) inside quotation marks by doubling it:

<xsl:sort select="translate(.,'''','')"/>

As Michael Kay puts it, the delimiter of a string literal can be included in the string literal by doubling it (http://stackoverflow.com/questions/13482352/xquery-looking-for-text-with-single-quote). In other words, you can include an apostrophe inside apostrophes by doubling it, so a string that consists of a single apostrophe is written as a sequence of four apostrophes, rather than the three that correspond to what you mean.
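
To convince yourself that the three techniques are interchangeable, you can run a small test stylesheet like the one below against any well-formed XML document. It is a sketch for experimentation only: the variable $sample and its value are made up for the illustration and do not come from the sonnet file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all"
  version="3.0">
  <xsl:output method="text"/>
  <!-- Hypothetical sample string with a leading straight apostrophe -->
  <xsl:variable name="sample" as="xs:string">'Tis better</xsl:variable>
  <xsl:variable name="apostrophe" as="xs:string">'</xsl:variable>
  <xsl:template match="/">
    <!-- 1. Character entity inside a single-quoted attribute value -->
    <xsl:value-of select='translate($sample, "&apos;", "")'/>
    <xsl:text>&#x0A;</xsl:text>
    <!-- 2. Variable that holds the apostrophe -->
    <xsl:value-of select="translate($sample, $apostrophe, '')"/>
    <xsl:text>&#x0A;</xsl:text>
    <!-- 3. Doubled apostrophe inside an apostrophe-delimited string literal -->
    <xsl:value-of select="translate($sample, '''', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

Each of the three expressions should write the same line, Tis better, with the leading apostrophe removed.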