Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2023-01-08T19:22:20+0000


XSLT assignment #5

The input text

For this assignment you will be working with Shakespearean sonnets, which you can download from http://dh.obdurodon.org/shakespeare-sonnets.xml. You should right-click on this link, download the file, and open it in <oXygen/>. You will be building on our XSLT assignment #4, and you can take your stylesheet from that assignment and modify it for this one.

Overview of the assignment

For your last assignment you used the XSLT @mode attribute to create a table of contents for the Shakespearean sonnets, using the first line of each sonnet as a surrogate for the title (since they don’t have real titles). Our output is at http://dh.obdurodon.org/shakespeare-sonnets.xhtml.

What’s a table of contents good for anyway?

In a digital edition, we can just do a full-text search and scroll in the browser, so we don’t really need a table of contents at all. We can search for a roman numeral, we can search for the text of the first line of a sonnet, or we can search for a memorable phrase. But suppose we want to produce a paper edition, where the only organized access our users will get is the organization we decide to give them. What would be a useful table of contents or index?

A table of contents in the same order as the full text (numerical order), which is what we produced in the last assignment, duplicates ordering information. How useful is that? If we want to find a sonnet with a low number, we already know without a table of contents that we should look near the beginning. On the other hand, it’s very common in published poetry collections to include an index of first lines, sorted in alphabetical order, so that a user who remembers just the first line of a poem can find it easily.

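For this assignment we’re going to enhance our output from the last assignment by sorting the table of contents alphabetically by first line, moving the roman numeral after the text of each first line, and turning each entry into a clickable link to its sonnet in the full text.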

Our HTML output is at http://dh.obdurodon.org/shakespeare-sonnets-sorted.xhtml.

The tools we need

To create links between the first lines in the table of contents and the sonnets in the full-text section of the page below, we’re going to use attribute value templates (AVTs). If you haven’t done so already, you should read about AVTs at http://dh.obdurodon.org/avt.xhtml.

To sort the table of contents we’re going to use <xsl:sort>.

When we sort the first lines, they won’t sort correctly for a quirky reason. We’re going to fix that using the XPath translate() function, which we discuss below.

How HTML linking works

The <li> items in the table of contents should include <a> (anchor) elements, which is how HTML identifies a clickable link. An anchor that is a clickable link has an @href attribute, which points to the target to which you want to move when you click on the link. For example, the table of contents might contain the following list item for Sonnet VI:

<li><a href="#sonnetVI">Then let not winter's ragged hand deface, (VI)</a></li>

HTML <a> elements that have @href attributes normally appear blue and underlined in the browser, to advertise that they are links. The target of a link can be a different web page (on the same site or anywhere on the Internet), but it can also be any element in the same document that has an @id attribute. If you click on this line in the browser, the window will scroll to the element elsewhere in the document that has an @id attribute with the value sonnetVI. In our case, we’ve assigned that @id attribute value to the <h2> for that sonnet in the main body:

<h2 id="sonnetVI">VI</h2>

Note that the value of the @href attribute on the <a> element begins with a hash mark (#), but the value of the @id attribute on the target <h2> doesn’t. We explain below why those two values differ in this way.

One reason HTML @id attributes must be unique in a document (this is an HTML validation requirement) is that they can be link targets, and if there were duplicate values, there would be no way for the system to know which one to select.

An @href value that is just a URL (e.g., <a href="http://dh.obdurodon.org">) points to a page somewhere on the Internet, and a value that begins with a hash mark (e.g., <a href="#sonnet6">) points to an element in the same document with a matching @id value. These can be combined, so that, for example, <a href="http://dh.obdurodon.org/#xslt"> loads the page at dh.obdurodon.org and scrolls to the section in it that has an @id value of xslt.

Adding links to your output

You should first read our page on Attribute value templates (AVT), which describes a strategy you can use to create a unique @id attribute for each sonnet. When we approached this task we gave the sonnets @id values that were a concatenation of the string sonnet and the roman numeral of the sonnet, e.g., sonnetVI for Sonnet #6. We attached those @id attributes to the <h2> elements that we used as titles for each sonnet in the body of our page, e.g., <h2 id="sonnetVI">. Meanwhile, in the table of contents at the top we created <a> elements with @href attributes that point to these @id values. The value of the @href attribute must begin with a leading # character, but that # must not be part of the value of the @id attribute to which it points. For example,

<li><a href="#sonnetVI">Then let not winter's ragged hand deface, (VI)</a></li>

means if the user clicks on this line, the browser will scroll to the line that reads <h2 id="sonnetVI"> in the main body of the page. Remember: the value of the @href attribute begins with #, but the value of the corresponding @id attribute on the <h2> element you want to scroll to doesn’t.

You can use any @id values that make sense to you except that they are subject to the same restrictions as the names of XML elements and attributes. That means, for example, that they cannot contain space characters and although they can contain digits, they cannot begin with a digit.
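The linking strategy described above can be sketched as a small stylesheet. This is a minimal sketch under stated assumptions, not the course solution: it assumes that each <sonnet> element carries its roman numeral in a @number attribute and holds its text in <line> children, and the mode name toc is an arbitrary choice. Adjust the paths to match the actual input file.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="2.0">
    <xsl:output method="xhtml" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespearean sonnets</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc"/>
                </ul>
                <xsl:apply-templates select="//sonnet"/>
            </body>
        </html>
    </xsl:template>
    <!-- Table of contents entry. The AVT {@number} is evaluated as XPath,
         so @href becomes "#sonnetVI" for a sonnet whose @number is VI.
         ASSUMPTION: @number and line are guesses at the input markup. -->
    <xsl:template match="sonnet" mode="toc">
        <li>
            <a href="#sonnet{@number}">
                <xsl:value-of select="line[1]"/>
                <xsl:text> (</xsl:text>
                <xsl:value-of select="@number"/>
                <xsl:text>)</xsl:text>
            </a>
        </li>
    </xsl:template>
    <!-- Link target. Same AVT, but without the leading # character. -->
    <xsl:template match="sonnet">
        <h2 id="sonnet{@number}">
            <xsl:value-of select="@number"/>
        </h2>
        <p>
            <xsl:apply-templates select="line"/>
        </p>
    </xsl:template>
    <!-- Each line of verse is followed by a line break. -->
    <xsl:template match="line">
        <xsl:value-of select="."/>
        <br/>
    </xsl:template>
</xsl:stylesheet>
```

Because the same expression, sonnet{@number}, builds both the @href and the @id, the two values cannot drift apart; the only difference is the # that appears in the @href alone.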

For a task like linking the sonnets we prefer to create human-readable @id values using the method described above because human-readable values are relatively easy to debug. That approach may not be practical with other types of data, though, where there may be no easy way to create a formula for generating a unique human-readable value. In situations like that you can use the XPath generate-id() function, which is guaranteed, within a single run of a transformation, to 1) create a unique value for any node in an XML document and 2) always create the same value for the same node. That second property means that if you create @id attributes inside one template in your XSLT and corresponding @href values inside a different template, as long as they refer to the same node in the XML, they will always yield the same generate-id() value. The values are not guaranteed to stay the same from one run to the next, so you should not rely on them in links from other documents. You can read more about generate-id() in Kay, pp. 797–800 (with an example of using generate-id() to create links on pp. 798–99).
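For comparison, here is a minimal sketch of the generate-id() alternative. As with any illustration that does not have the input file in front of it, the element and attribute names (sonnet, line, @number) and the mode name toc are assumptions rather than confirmed facts about the data.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="2.0">
    <xsl:output method="xhtml" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespearean sonnets</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc"/>
                </ul>
                <xsl:apply-templates select="//sonnet"/>
            </body>
        </html>
    </xsl:template>
    <!-- With no argument, generate-id() returns an identifier for the
         context node, which here is the current sonnet element. -->
    <xsl:template match="sonnet" mode="toc">
        <li>
            <a href="#{generate-id()}">
                <xsl:value-of select="line[1]"/>
            </a>
        </li>
    </xsl:template>
    <!-- A different template, but the same sonnet node is the context,
         so generate-id() returns the same string as in the toc entry. -->
    <xsl:template match="sonnet">
        <h2 id="{generate-id()}">
            <xsl:value-of select="@number"/>
        </h2>
    </xsl:template>
</xsl:stylesheet>
```

The generated values are opaque strings chosen by the processor, which is why they are harder to debug by eye than a value like sonnetVI.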

Armed with that information, you can first complete all of the assignment tasks except creating the clickable links and then, using AVTs, modify your XSLT to create the <a> elements with the @href attributes and the @id attributes for the targets. We use this incremental development method in our own work, too; develop one bit of functionality at a time and make sure it’s correct before you move on to the next one. If you do several things at once and you don’t get correct results, you’ll have to puzzle out where you made a mistake. If you do one thing at a time and your code goes from working to not working, that one thing is the obvious first place to look for an error.

Sorting

An index of first lines in a collection of poems is usually alphabetized because that’s how humans look things up in that kind of list. To learn how to sort your table of contents before you output it, start by looking up <xsl:sort> in Michael Kay. So far, if we’ve wanted to output, say, the entries in our table of contents in the order in which the sonnets occur in the document, we’ve used a self-closing empty element to select them with something like:

<xsl:apply-templates select="//sonnet"/>

We’ve also said, though, that the self-closing empty element tag is informationally identical to writing the start- and end-tags separately with nothing between them, that is:

<xsl:apply-templates select="//sonnet"></xsl:apply-templates>

To cause the elements being processed to be sorted first, you need to use this alternative notation, with separate start- and end-tags, because you need to put the <xsl:sort> element between the start- and end-tags. If you use the first notation, the one with a single self-closing tag, there’s no between in which to put the <xsl:sort> element. In other words, you want something like:

<xsl:apply-templates select="//sonnet">
  <xsl:sort/>
</xsl:apply-templates>

As written, the preceding will sort the <sonnet> elements alphabetically by their text value. As you’ll see in the description of <xsl:sort> in Michael Kay, though, it’s also possible to use the @select attribute on <xsl:sort> to sort a set of items by properties other than alphabetic order of their textual content, and we discuss below why we need to do that here.
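As an illustration of the @select attribute on <xsl:sort>, the following minimal sketch sorts the sonnets by the text of their first line only. It assumes, without having checked the input file, that the lines of each sonnet are <line> children of <sonnet>; the mode name toc is likewise an arbitrary choice.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="2.0">
    <xsl:output method="xhtml" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Index of first lines</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc">
                        <!-- The sort key is evaluated relative to each
                             selected sonnet, so line[1] means the first
                             line child of that sonnet (assumed markup). -->
                        <xsl:sort select="line[1]"/>
                    </xsl:apply-templates>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:value-of select="line[1]"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

The sort affects only the order in which the selected nodes are processed; the input document itself is not changed, and a second, unsorted <xsl:apply-templates> elsewhere in the stylesheet would still see the sonnets in document order.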

After the sort

At this point we’d make other adjustments in the output. The original table of contents begins with a roman numeral, but if you’re going to sort the table of contents, you want the text of the first line of the poem at the left side of the line, not preceded by the roman numeral, so that you can see the alphabetic order easily. Putting the roman numeral first would make it harder to discern the alphabetization, since the user wouldn’t be able to see it by just glancing down the left margin. For that reason, you should now adjust the output to put the roman numeral after the text of the line, in parentheses.

Using translate() to fix the sort order

If you sort the first lines alphabetically according to their textual value, there will be one error. The first line of Sonnet #121 ('Tis better to be vile than vile esteem'd,) will show up first because, in the computer’s internal representation of characters, the single straight apostrophe sorts before all of the letters. We can fix this by using translate() to strip the apostrophe for sorting purposes, but not for rendering. That is, we can sort as if there were no apostrophe, while still printing the apostrophe when we render the line.

We can’t easily translate away an apostrophe, though, because quotation marks have special meaning in XPath. XPath gives you two types of quotation marks, single and double, but here you need three: you need quotation marks around attribute values, you need quotation marks around strings, and the string value for this task includes a quotation mark, since that’s what you need to translate away. See Michael Kay’s answer at http://p2p.wrox.com/xslt/50152-how-do-you-translate-apostrophe.html; either method there will work, but we recommend the XSLT 2.0 one because we find it easier to read.
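One common way to write the apostrophe in XPath 2.0 is to double it inside a string that is itself delimited by apostrophes, so that '''' is a one-character string containing a single apostrophe. The sketch below uses that notation in the sort key; it is an illustration of the quoting technique, not necessarily identical to the linked answer, and the element names sonnet and line and the mode name toc are assumptions about the input.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="2.0">
    <xsl:output method="xhtml" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Index of first lines</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc">
                        <!-- Second argument: four apostrophes, that is, a
                             string holding one apostrophe. Third argument:
                             the empty string, so the apostrophe is deleted
                             from the sort key only. -->
                        <xsl:sort select="translate(line[1], '''', '')"/>
                    </xsl:apply-templates>
                </ul>
            </body>
        </html>
    </xsl:template>
    <!-- The rendered text is taken from the untouched line, so the
         apostrophe still appears in the output. -->
    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:value-of select="line[1]"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

Doubling is not available in XPath 1.0. A common workaround there is to declare a variable whose content is a single apostrophe, for example <xsl:variable name="apos">'</xsl:variable>, and then write translate(line[1], $apos, '').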

Optional finishing touches

Some lists of first lines of poetry put quotation marks around the lines. We haven’t done the following in our solution, but if you’d like to add it, you should use the HTML <q> (quoted text) element, instead of outputting the raw quotation marks as plain text. By default text inside HTML <q> tags will be rendered with straight quotation marks around it, but you really want curly quotation marks, and you can use CSS to control that (see Sara Cope’s quotes for details).

You can use CSS not only to control the shape of (optional) quotation marks around first lines, but also, more generally, to make your HTML look more interesting than what you get by default in a web browser. If you do that, write the CSS in a separate file and let your XSLT create the <link> (don’t add it manually afterwards).
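Letting the XSLT create the <link> amounts to writing it as a literal result element inside the <head> that the stylesheet outputs. In this minimal sketch the filename sonnets.css is hypothetical; use whatever name you give your own CSS file.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="2.0">
    <xsl:output method="xhtml" indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespearean sonnets</title>
                <!-- Literal result element: copied to the output on every
                     run. The href value is a hypothetical filename. -->
                <link rel="stylesheet" type="text/css" href="sonnets.css"/>
            </head>
            <body>
                <!-- The rest of the page would be generated here. -->
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

Because the <link> is part of the stylesheet, it survives every rerun of the transformation, whereas a manual edit to the output file would be overwritten the next time you transform.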

What to submit

You should upload the XSLT stylesheet you created to run the transformation, the HTML it produced, and your CSS (if you used it). The XSLT must include meaningful comments, along the lines described in our first XSLT assignment. The HTML must be valid; if it isn’t, the XSLT must include, as properly formatted code comments, information about how you tried to debug your transformation.