Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2023-03-28T01:11:02+0000


XSLT assignment #5: answers

The assignment

Enhance your output from the last assignment in the following ways:

Our solution

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/1999/xhtml"
  xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
  version="3.0">
  <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no" include-content-type="no"
    indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Shakespearean sonnets</title>
      </head>
      <body>
        <h1>Shakespearean sonnets</h1>
        <h2>Contents</h2>
        <ul>
          <xsl:apply-templates select="//sonnet" mode="toc">
            <xsl:sort select='translate(., "&apos;", "")'/>
          </xsl:apply-templates>
        </ul>
        <hr/>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <!-- ============================================== -->
  <!-- Table of contents mode                         -->
  <!-- ============================================== -->
  <xsl:template match="sonnet" mode="toc">
    <li>
      <a href="#sonnet{@number}">
        <xsl:apply-templates select="line[1]" mode="toc"/>
        <xsl:text> (</xsl:text>
        <xsl:apply-templates select="@number"/>
        <xsl:text>)</xsl:text>
      </a>
    </li>
  </xsl:template>
  <xsl:template match="line" mode="toc">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- ============================================== -->
  <!-- Reading view                                   -->
  <!-- ============================================== -->
  <xsl:template match="sonnet">
    <section id="sonnet{@number}">
      <h2>
        <xsl:apply-templates select="@number"/>
      </h2>
      <p>
        <xsl:apply-templates select="line"/>
      </p>
    </section>
  </xsl:template>
  <xsl:template match="line">
    <xsl:apply-templates/>
    <xsl:if test="following-sibling::line">
      <br/>
      <xsl:text>&#x0A;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

The description of our solution to this assignment assumes that you have read our previous answer sheet for XSLT assignment #4, because this assignment builds largely upon the work done in that one. If you haven’t studied that solution, you can do so at http://dh.obdurodon.org/xslt-assignment-04-answers.xhtml.

To observe the effects of the changes we’ve made to our stylesheet from the last assignment, we’re going to compare it to that solution. For the moment, we will ignore the changes in the template that matches the document node (/), which affect the sorting of our table of contents. Instead, take a look at the template that matches <sonnet> in toc mode (mode="toc"). We’ve done two things there: we’ve made each entry a clickable link, and we’ve moved the first line ahead of the roman numeral.

Creating clickable links

Within our list of first lines we need to wrap the content of each <li> element in an <a> element with an @href attribute, which is how HTML defines a clickable link. We need a consistent strategy for creating @href values, so that every link points unambiguously to the text of exactly one sonnet. For the links to work properly (that is, to have somewhere to land), each @href value (minus its leading hash mark) must match exactly the @id value of exactly one element in the document, which we will generate later. We chose to make each @href value consist of the string sonnet followed by that sonnet’s roman numeral, so that the first sonnet’s link, for example, looks like <a href="#sonnetI">, the second sonnet’s link looks like <a href="#sonnetII">, etc. Remember (from the description of the original assignment) that all <a> elements that link to somewhere else on the same page must begin their @href values with a hash mark (#), but that character must not be part of the @id value of the target of the link. For example, a clickable link that points to Sonnet #2 will have a start tag that reads <a href="#sonnetII"> (with a hash mark), and the <section> element wrapping that sonnet will have a start tag that reads <section id="sonnetII"> (without a hash mark).

To generate a different value of @href for every sonnet’s link (and, later, for the @id to which the @href points) we need to use XPath. We describe how to do this in our page about Attribute value templates (AVT). For reasons explained on that page, to delimit our XPath expression within the otherwise literal attribute value we need to enclose it in curly braces ({}). We can retrieve the unique roman numeral for the sonnet we are processing at the moment from the @number attribute of that <sonnet>, and we concatenate the literal string sonnet with that value as follows:

<a href="#sonnet{@number}">
    …
</a>
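For comparison, the same @href value can be built without an AVT by using <xsl:attribute> together with the XPath concat() function. The sketch below should produce the same output as the AVT above; its greater length is one reason AVTs are convenient for simple cases like this one.

```xml
<!-- Same result as <a href="#sonnet{@number}">, built without an AVT -->
<a>
  <!-- An attribute must be added before any child nodes of <a> are created -->
  <xsl:attribute name="href" select="concat('#sonnet', @number)"/>
  <!-- link content as before -->
</a>
```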

We also need to sort the first lines. Unless we specify otherwise, XSLT processes the sonnets in document order, which is numerical order only because they happen to be arranged that way in the input XML document. In addition to sorting the first lines, we want to render the text of the line before the sonnet number (roman numeral). If we don’t do that, although our lines will be in alphabetical order, users won’t be able to see that easily, because each line will begin with a roman numeral that isn’t part of the sort order. What we really want is for users to be able to glance down the left margin of the list of first lines and find the ones they want easily. To make the alphabetical order easier to see, we move the <xsl:apply-templates> that selects our first line to the beginning of the output and move the roman numeral to the end, wrapped in parentheses. As always, instead of outputting raw text when we’re creating mixed content, we wrap all raw text in <xsl:text> elements, which helps prevent quirky white-space handling.
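To see the kind of white-space quirk that <xsl:text> prevents, compare the two versions of the link content below. This is an illustrative sketch, not part of our solution. In the first version the parentheses are literal text in the stylesheet, and because a text node in a stylesheet that contains any non-whitespace character is copied to the output in full, the line break after the closing parenthesis ends up inside the <a> element.

```xml
<!-- Without <xsl:text>: the text node ")" plus the line break that
     follows it is copied to the output -->
<a href="#sonnet{@number}">
  <xsl:apply-templates select="line[1]" mode="toc"/> (<xsl:apply-templates select="@number"/>)
</a>

<!-- With <xsl:text>: only the characters inside <xsl:text> are output;
     whitespace-only text nodes between instructions are ignored -->
<a href="#sonnet{@number}">
  <xsl:apply-templates select="line[1]" mode="toc"/>
  <xsl:text> (</xsl:text>
  <xsl:apply-templates select="@number"/>
  <xsl:text>)</xsl:text>
</a>
```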

The next change we have to make in order to get our table of contents to link to our sonnets is to create @id attributes for the targets of our links, so that when the user clicks on a link, there will be somewhere to scroll to. Remember that the @href values that we’ve created for our clickable links must start with hash marks (#), but the values of the identifiers (@id attributes) they point to must not. We chose to put the identifiers on the <section> elements that we generate; this way, when users click on a link to a sonnet, the browser scrolls to the start of that sonnet’s <section>, including its heading.

We chose to create our @id values, and the links that point to them, in a deliberate way that ensured that they would be legible to humans, since that makes it easier for us to recognize mistakes during development. Constructing our own values becomes more challenging and less appealing as our need for links (in other projects) becomes more intricate, and XPath provides, as an alternative to building your own unique identifier, the generate-id() function, which can be used to construct links automatically. You can read about how to use generate-id() in Michael Kay, but here is a summary:
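As an illustration (a sketch assuming the same input and templates as our solution, not code taken from Kay), the two templates that create a link and its target could both call generate-id() on the same <sonnet> node. Within a single transformation, generate-id() returns the same string every time it is applied to the same node and different strings for different nodes, but the values are not human-readable and are not guaranteed to be the same from one run or processor to the next.

```xml
<!-- Table of contents: the link target is derived from the <sonnet> node itself -->
<xsl:template match="sonnet" mode="toc">
  <li>
    <a href="#{generate-id()}">
      <xsl:apply-templates select="line[1]" mode="toc"/>
    </a>
  </li>
</xsl:template>
<!-- Reading view: the same <sonnet> node yields the same identifier -->
<xsl:template match="sonnet">
  <section id="{generate-id()}">
    <xsl:apply-templates select="line"/>
  </section>
</xsl:template>
```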

Alphabetizing the first lines

Finally, we need to sort the entries in the table of contents alphabetically by first line. We can do that by putting <xsl:sort> between the start and end tags of <xsl:apply-templates>, but if that’s all we do, one sonnet will be out of place. Sonnet CXXI begins with 'Tis, and the apostrophe at the beginning of that line causes it to be sorted before all other sonnets. This happens because, if all we do is tell the system to sort, it sorts by all of the characters in the sonnet (starting with the first line), and it doesn’t know that the leading apostrophe should be ignored.
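For reference, the naive version described above might look like the sketch below. The sort key is the string value of each <sonnet>, which begins with the text of its first line; since select="." is also the default, a bare <xsl:sort/> would behave the same way.

```xml
<!-- Naive sort: Sonnet CXXI, whose first line begins with an apostrophe,
     ends up ahead of all the others -->
<xsl:apply-templates select="//sonnet" mode="toc">
  <xsl:sort select="."/>
</xsl:apply-templates>
```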

We can fix this problem by using the XPath translate() function inside the @select attribute of <xsl:sort> to modify the sort key, that is, the value used for sorting. Because we use translate() only for sorting, we aren’t changing what we write into the output: the apostrophe is ignored during sorting but rendered where it belongs, that is, where it occurred in the original input. We tell translate() to translate the apostrophe into nothing: when the third argument to translate() is an empty string, every character listed in the second argument is removed.

The challenge we run into when using this function involves nesting our quotation marks so that our XSLT document remains well-formed. We have only two kinds of quotation marks available: single and double. If we need both, we can nest them, but here we need three levels: the value of the @select attribute must be inside quotation marks, the second and third arguments to translate() are strings that must be inside quotation marks, and the character we want to translate away is itself a quotation mark. Our solution encloses the attribute value in single quotation marks, encloses the string arguments of translate() in double quotation marks, and uses the &apos; character entity to represent the apostrophe we want to remove from our text for sorting. There are other ways of accomplishing this as well (see below).

So how does &apos; work? We’ve been using three built-in XML character entities so far: &amp; (ampersand, or &), &lt; (less than, or <), and &gt; (greater than, or >). XML builds in two more character entities: &apos; for the straight apostrophe (') and &quot; for the straight double quotation mark ("). Here we exploit the fact that when our XSLT stylesheet is checked for well-formedness, the &apos; character entity has not yet been transformed into an apostrophe, so we don’t have an unmatched quotation mark. If you copy and paste our code, above, into <oXygen/> and try replacing the character entity with a raw apostrophe, a squiggly red line will appear to tell you that that’s forbidden. The character entity lets us get away with having one more level of quotation marks.
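The same reasoning explains why the roles of the two kinds of quotation marks can be swapped, and why the entity does not help if the XPath string is itself delimited by apostrophes: the XML parser expands the entity before the XPath expression is parsed. A sketch of both cases:

```xml
<!-- Works: the attribute is delimited by double quotation marks, and after
     the XML parser expands &quot; the XPath expression is translate(., "'", '') -->
<xsl:sort select="translate(., &quot;'&quot;, '')"/>

<!-- Does not work: after &apos; is expanded, XPath sees translate(., ''', '')
     and the string literal is never closed -->
<!-- <xsl:sort select="translate(., '&apos;', '')"/> -->
```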

XPath magic

We can think of at least two alternative ways of protecting the straight apostrophe from raising an error. One is to declare a variable whose value is just the apostrophe and then refer to the variable inside the translate() function. We can declare the variable with either:

<xsl:variable name="apostrophe" as="xs:string">'</xsl:variable>

or

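<!-- @select holds an XPath expression, so the apostrophe must be written as an XPath string literal, "'" -->
<xsl:variable name="apostrophe" as="xs:string" select="&quot;'&quot;"/>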

A variable value can be specified in two different ways in XSLT, either as the value of a @select attribute on the <xsl:variable> element or as the content of that element, between the start and end tags. We can use either method here, and we can then refer to our variable inside the translate() function:

<xsl:sort select="translate(., $apostrophe, '')"/>

Alternatively, we can escape the apostrophe (that is, tell it not to have its regular meaning of string delimiter) inside a string literal that is delimited by apostrophes by doubling it:

<xsl:sort select="translate(.,'''','')"/>

As Michael Kay puts it, “the delimiter of a string literal can be included in the string literal by doubling it” (http://stackoverflow.com/questions/13482352/xquery-looking-for-text-with-single-quote). In other words, you can include an apostrophe inside apostrophes by doubling it, which is why the second argument above is a sequence of four apostrophes rather than the three that correspond to what you mean: one at each end to delimit the string, and a doubled one in the middle that stands for the apostrophe itself.
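The doubling rule also gives a third way of declaring the $apostrophe variable from the previous section. In the sketch below the variable is declared as a child of <xsl:stylesheet> (an assumption about placement), which makes it available in every template, including the one that contains the <xsl:sort>.

```xml
<!-- '''' is an XPath string literal that contains a single apostrophe -->
<xsl:variable name="apostrophe" as="xs:string" select="''''"/>
```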


Further enhancements

In Real Life we might want to take into account that Shakespearean sonnets normally have fourteen lines, structured (through the rhyme scheme) as three quatrains followed by a couplet. There are two exceptions: sonnet 99 contains fifteen lines (it begins with a quintain rather than a quatrain), and sonnet 126 contains twelve lines that, furthermore, are structured, through the rhyme scheme, as six couplets. To deal with those exceptions we might make the following enhancements:

  1. In the reading view, group the lines according to the substructure instead of running them all together with no internal vertical spacing. For example, we might add vertical space after each quatrain. Because sonnets 99 and 126 are irregular, we have to handle those individually.
  2. Number the lines of the sonnets, but not every line, because that would look cluttered. We opted for numbering the first line of each of the subgroups above.
  3. Add an alert in both the index of first lines and the reading view about the two irregular sonnets.
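Item 3 might be implemented in the reading view along the following lines. This is our own hypothetical sketch, not the code behind the enhanced view: it extends the reading-view template from our solution, and it assumes that any count of <line> children other than 14 should trigger the alert and that the warning carries the class line-count-warning, the class name used in the CSS for the enhanced view.

```xml
<xsl:template match="sonnet">
  <section id="sonnet{@number}">
    <h2>
      <xsl:apply-templates select="@number"/>
    </h2>
    <!-- Report irregular sonnets (99 and 126 in this collection) -->
    <xsl:if test="count(line) ne 14">
      <p class="line-count-warning">
        <xsl:text>Contains </xsl:text>
        <xsl:value-of select="count(line)"/>
        <xsl:text> lines</xsl:text>
      </p>
    </xsl:if>
    <p>
      <xsl:apply-templates select="line"/>
    </p>
  </section>
</xsl:template>
```

A similar test in the toc-mode template could append the line count to the entries for the irregular sonnets in the index of first lines.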

You can see our enhanced view at http://dh.obdurodon.org/enhanced-sonnets.xhtml. If you view the source you’ll notice that the line numbers and the spacing between parts of the sonnet are not encoded explicitly in the HTML. In this way we reserve the HTML for representing the structure and we use CSS to control the appearance. You’ll notice also that we’ve used @class attributes very sparsely as a way of maintaining the legibility of our HTML. The @id attributes on the two main sections (index of first lines, reading view), combined with the CSS descendant combinator pattern, let us fine-tune our selectors without the clutter that pervasive @class attributes would introduce.

Placing classes (and especially classes with multiple values) on most of your elements is a rookie mistake that makes your HTML illegible, and therefore hard to debug and maintain, and you aren’t rookies anymore. Learn to use the CSS combinators to address locations in your HTML that don’t have their own @class (or @id) attributes. See the W3Schools CSS Combinators or MDN Combinators pages for introductions with examples. You can also select elements in your CSS according to their attributes; see the MDN Attribute selectors page for an introduction with examples.

The XSLT that produces our enhanced view is below, with embedded comments to explain what each part does. We use one feature that we haven’t introduced yet, a user-defined function, and you can peek ahead to learn about those at http://dh.obdurodon.org/xslt-functions.xhtml.
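To give a sense of what a user-defined function looks like, here is a hypothetical sketch (not the function in our enhanced stylesheet) that packages the apostrophe-stripping sort key from earlier on this page as a function. The prefix my and its namespace URI are placeholders; a stylesheet function must be in a namespace, and any namespace that is not reserved (such as the XSLT or XPath function namespaces) will do.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.com/hypothetical-functions"
  exclude-result-prefixes="#all" version="3.0">
  <!-- Returns its argument with all straight apostrophes removed -->
  <xsl:function name="my:sort-key" as="xs:string">
    <xsl:param name="input" as="xs:string"/>
    <xsl:sequence select="translate($input, &quot;'&quot;, '')"/>
  </xsl:function>
  <!-- Usage inside <xsl:apply-templates>: <xsl:sort select="my:sort-key(.)"/>
       The <sonnet> passed as . is atomized to its string value -->
</xsl:stylesheet>
```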

In our earlier approaches we tagged sonnets in the output as paragraphs and used line breaks to separate the lines. We now want to use CSS to number the lines, and that isn’t easy if the individual lines are not individual elements in the HTML. For that reason we change strategies: we now tag each sonnet section (e.g., quatrain, couplet) as an unordered list and each line as a list item in that list.
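One way to produce that structure for a regular fourteen-line sonnet is XSLT 3.0 grouping, sketched below. This is our own hypothetical illustration, not the enhanced stylesheet: the grouping key (position() - 1) idiv 4 is 0 for lines 1–4, 1 for lines 5–8, 2 for lines 9–12, and 3 for lines 13–14, so the lines fall into three quatrains and a couplet. Sonnets 99 and 126 do not follow this pattern and would need their own handling.

```xml
<xsl:template match="sonnet">
  <section id="sonnet{@number}">
    <h2>
      <xsl:apply-templates select="@number"/>
    </h2>
    <!-- position() here is the line's position among the selected <line> elements -->
    <xsl:for-each-group select="line" group-adjacent="(position() - 1) idiv 4">
      <ul>
        <xsl:apply-templates select="current-group()"/>
      </ul>
    </xsl:for-each-group>
  </section>
</xsl:template>
<!-- Each line becomes a list item instead of text followed by <br/> -->
<xsl:template match="line">
  <li>
    <xsl:apply-templates/>
  </li>
</xsl:template>
```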


    The line numbering and the general layout are controlled with the following CSS (see the embedded comments for explanation):

     instead of border because of semantic break */
      width: 95%;
      border: none;
      height: 1px;
      background-color: gray;
    }
    #reading > section {
      /* Line counting restarts for each <section>
       * element that wraps a sonnet */
      counter-reset: lineno;
    }
    #reading ul {
      /* Turn off bullets, use variable defined at top
       * to set left padding */
      display: flex;
      padding-left: var(--sonnet-indent);
      flex-direction: column;
      list-style-type: none;
      margin: 0 0 1em 0;
    }
    #reading li {
      /* Flex to center smaller line number (see :before, below) vertically */
      counter-increment: lineno;
      margin-left: 1rem;
      display: flex;
      align-items: center;
    }
    #reading li:first-of-type:before {
      /* Number first line of each part of sonnet
       * Absolute position to remove from horizontal alignment of line content
       * Outdent to place before line
       * Width is needed to make text-align work
       * Flex (set on li) needed to center smaller number vertically */
      content: counter(lineno);
      margin-left: -2em;
      position: absolute;
      width: 1.5em;
      text-align: right;
      color: gray;
      font-size: smaller;
    }
    .line-count-warning {
      /* Line counts other than 14 are reported and highlighted */
      color: red;
    }
    p.line-count-warning {
      /* Align with sonnet number and start of lines */
      margin-left: var(--sonnet-indent);
    }