Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2023-01-08T19:22:20+0000
For this assignment you will be working with Shakespearean sonnets, which you can download from http://dh.obdurodon.org/shakespeare-sonnets.xml. You should right-click on this link, download the file, and open it in <oXygen/>. You will be building on our XSLT assignment #4, and you can take your stylesheet from that assignment and modify it for this one.
For your last assignment you used the XSLT @mode
attribute to create a table of contents for the Shakespearean sonnets, using the first
line of each sonnet as a surrogate for the title (since they don’t have real titles).
Our output is at http://dh.obdurodon.org/shakespeare-sonnets.xhtml.
In a digital edition, we can just do a full-text search and scroll in the browser, so we don’t really need a table of contents at all. We can search for a roman numeral, we can search for the text of the first line of a sonnet, or we can search for a memorable phrase. But suppose we want to produce a paper edition, where the only organized access our users will get is the organization we decide to give them. What would be a useful table of contents or index?
A table of contents in the same order as the full text (numerical order), which is what we produced in the last assignment, duplicates ordering information. How useful is that? If we want to find a sonnet with a low number, we already know without a table of contents that we should look near the beginning. On the other hand, it’s very common in published poetry collections to include an index of first lines, sorted in alphabetical order, so that a user who remembers just the first line of a poem can find it easily.
For this assignment we’re going to enhance our output from the last assignment in the following ways:
In the last assignment each entry in the table of contents began with the sonnet’s roman numeral, e.g., “VI. Then let not winter’s ragged hand deface,”. If we’re now going to sort by the first line of text, though, having those roman numerals at the far left edge of the entry will be disorienting, since they’ll obscure the fact that we’re using alphabetical order. We do want to retain the roman numerals (after all, the sonnet numbers are meaningful for Shakespeare scholars), but we’re going to move them to the end of the line, so that when the user reads down the left edge of the table of contents, scanning for a particular first line, the alphabetic order will be immediately accessible.
Our HTML output is at http://dh.obdurodon.org/shakespeare-sonnets-sorted.xhtml.
To create links between the first lines in the table of contents and the sonnets in the full text section of the page below we’re going to use attribute value templates (AVT). If you haven’t done so already, you should read about AVTs at http://dh.obdurodon.org/avt.xhtml.
To sort the table of contents we’re going to use
<xsl:sort>
.
When we sort the first lines, they won’t sort correctly for a quirky reason. We’re going
to fix that using the XPath translate()
function,
which we discuss below.
The <li>
items in the table of contents should
include <a>
(anchor
) elements, which is
how HTML identifies a clickable link. An anchor that is a clickable link has an
@href
attribute, which points to the target to
which you want to move when you click on the link. For example, the table of contents
might contain the following list item for Sonnet VI:
<li><a href="#sonnetVI">Then let not winter's ragged hand deface, (VI)</a></li>
HTML <a>
elements that have
@href
attributes normally appear blue and
underlined in the browser, to advertise that they are links. The target of a
link can be a different web page (on the same site or anywhere on the Internet), but it
can also be any element in the same document that has an
@id
attribute. If you click on this line in the
browser, the window will scroll to the element elsewhere in the document that has an
@id
attribute with the value sonnetVI
. In
our case, we’ve assigned that @id
attribute value
to the <h2>
for that sonnet in the main
body:
<h2 id="sonnetVI">VI</h2>
Note that the value of the @href
attribute on the
<a>
element begins with a hash mark
(#
), but the value of the
@id
attribute on the target
<h2>
doesn’t. We explain below why those two
values differ in this way.
One reason HTML @id
attributes must be unique in
a document (this is an HTML validation requirement) is that they can be link
targets, and if there were duplicate values, there would be no way for the system to
know which one to select.
An @href
value that is just a URL (e.g.,
<a href="http://dh.obdurodon.org">
) points
to a page somewhere on the Internet, and a value that begins with a hash mark (e.g.,
<a href="#sonnet6">
) points to an element in
the same document with a matching @id
value.
These can be combined, so that, for example,
<a href="http://dh.obdurodon.org/#xslt">
loads the page at dh.obdurodon.org
and
scrolls to the section in it that has an @id
value of xslt
.
You should first read our page on Attribute value templates
(AVT), which describes a strategy you can use to create a unique
@id
attribute for each sonnet. When we approached
this task we gave the sonnets @id
values that were
a concatenation of the string sonnet
and the roman numeral of the sonnet, e.g.,
sonnetVI
for Sonnet #6. We attached those
@id
attributes to the
<h2>
elements that we used as titles for each
sonnet in the body of our page, e.g.,
<h2 id="sonnetVI">
. Meanwhile, in the table of
contents at the top we created <a>
elements with
@href
attributes that point to these
@id
values. The value of the
@href
attribute must begin with a leading
#
character, but that #
must not be part of the value of the
@id
attribute to which it points. For
example,
<li><a href="#sonnetVI">Then let not winter's ragged hand deface, (VI)</a></li>
means that if the user clicks on this line, the browser will scroll to the element that reads
<h2 id="sonnetVI">
in the main body of the page.
Remember: the value of the @href
attribute
begins with #
, but the value of the corresponding
@id
attribute on the
<h2>
element you want to scroll to
doesn’t.
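The linking strategy described above can be sketched as a small stylesheet. This is only an illustration, not our solution: it assumes that each <sonnet> element carries its roman numeral in a @number attribute and contains <line> children (check the actual names in your copy of the XML), and that you are using an XSLT 2.0 or later processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>Sonnets</title>
            </head>
            <body>
                <!-- Table of contents: process each sonnet in "toc" mode -->
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc"/>
                </ul>
                <!-- Full text: process each sonnet in the default mode -->
                <xsl:apply-templates select="//sonnet"/>
            </body>
        </html>
    </xsl:template>

    <!-- Link: the AVT {@number} is evaluated as XPath, so the @href
         value becomes, e.g., #sonnetVI (note the leading hash mark).
         @number and line are assumed names, not confirmed by the text. -->
    <xsl:template match="sonnet" mode="toc">
        <li>
            <a href="#sonnet{@number}">
                <xsl:value-of select="line[1]"/>
                <xsl:text> (</xsl:text>
                <xsl:value-of select="@number"/>
                <xsl:text>)</xsl:text>
            </a>
        </li>
    </xsl:template>

    <!-- Target: the same AVT without the hash mark, e.g., id="sonnetVI" -->
    <xsl:template match="sonnet">
        <h2 id="sonnet{@number}">
            <xsl:value-of select="@number"/>
        </h2>
        <p>
            <xsl:apply-templates select="line"/>
        </p>
    </xsl:template>

    <!-- Each line of verse is followed by a line break, except the last -->
    <xsl:template match="line">
        <xsl:apply-templates/>
        <xsl:if test="position() ne last()">
            <br/>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
```

Because both templates build their values from the same @number attribute, the @href in the table of contents and the @id on the heading match for every sonnet.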
You can use any @id
values that make sense to
you except that they are subject to the same restrictions as the names of XML
elements and attributes. That means, for example, that they cannot contain space
characters and although they can contain digits, they cannot begin with a digit.
For a task like linking the sonnets we prefer to create human-readable
@id
values using the method described above
because human-readable values are relatively easy to debug. That approach may not be
practical with other types of data, though, where there may be no easy way to create
a formula for generating a unique human-readable value. In situations like that you
can use the XPath generate-id()
function, which
is guaranteed, within a single transformation, to 1) create a unique value for any node in an XML document and 2)
always create the same value for the same node. That second property means that if
you create @id
attributes inside one template
in your XSLT and corresponding @href
values
inside a different template, as long as they refer to the same node in the XML, they
will always yield the same generate-id()
value.
You can read more about generate-id()
in Kay,
pp. 797–800 (with an example of using
generate-id()
to create links on pp.
798–99).
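A minimal sketch of the generate-id() alternative follows. It assumes the same hypothetical structure as before (<sonnet> elements with a @number attribute and <line> children). The generated values are processor-dependent and not human-readable, so you should not expect them to be the same from one run to the next.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>Sonnets</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//sonnet" mode="toc"/>
                </ul>
                <xsl:apply-templates select="//sonnet"/>
            </body>
        </html>
    </xsl:template>

    <!-- With no argument, generate-id() applies to the context node,
         which here is the <sonnet> element being processed -->
    <xsl:template match="sonnet" mode="toc">
        <li>
            <a href="#{generate-id()}">
                <xsl:value-of select="line[1]"/>
            </a>
        </li>
    </xsl:template>

    <!-- Same context node in a different template, so the same value
         is generated and the link target matches -->
    <xsl:template match="sonnet">
        <h2 id="{generate-id()}">
            <xsl:value-of select="@number"/>
        </h2>
    </xsl:template>
</xsl:stylesheet>
```

The trade-off is the one noted above: the links work, but the values tell you nothing when you are trying to debug the output by eye.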
Armed with that information, you can first complete all of the assignment tasks except
creating the clickable links and then, using AVTs, modify your XSLT to create the
<a>
elements with the
@href
attributes and the
@id
attributes for the targets. We use this
incremental development method in our own work, too; develop one bit of functionality at
a time and make sure it’s correct before you move on to the next one. If you do several
things at once and you don’t get correct results, you’ll have to puzzle out where you
made a mistake. If you do one thing at a time and your code goes from working to not
working, that one thing is the obvious first place to look for an error.
An index of first lines in a collection of poems is usually alphabetized because that’s
how humans look things up in that kind of list. To learn how to sort your table of
contents before you output it, start by looking up
<xsl:sort>
in Michael Kay. So far, if we’ve
wanted to output, say, our table of contents entries in the order in which the sonnets occur in the
document, we’ve used a self-closing empty element to select them with something
like:
<xsl:apply-templates select="//sonnet"/>
We’ve also said, though, that the self-closing empty element tag is informationally identical to writing the start- and end-tags separately with nothing between them, that is:
<xsl:apply-templates select="//sonnet"></xsl:apply-templates>
To cause the elements being processed to be sorted first, you need to use this
alternative notation, with separate start- and end-tags, because you need to put the
<xsl:sort>
element between the start- and
end-tags. If you use the first notation, the one with a single self-closing tag, there’s
no between
in which to put the <xsl:sort>
element. In other words, you want something like:
<xsl:apply-templates select="//sonnet">
    <xsl:sort/>
</xsl:apply-templates>
As written, the preceding will sort the <sonnet>
elements alphabetically by their text value. As you’ll see in the description of
<xsl:sort>
in Michael Kay, though, it’s also
possible to use the @select
attribute on
<xsl:sort>
to sort a set of items by properties
other than alphabetic order of their textual content, and we discuss below why we need
to do that here.
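As a hedged illustration of the @select attribute on <xsl:sort>, the following sketch sorts the sonnets by the text of their first line rather than by the text of the whole sonnet. The element name line is an assumption about the input; adjust it to whatever the XML actually uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <ul>
            <xsl:apply-templates select="//sonnet" mode="toc">
                <!-- The sort key is evaluated relative to each <sonnet>,
                     so line[1] means the first line of that sonnet -->
                <xsl:sort select="line[1]"/>
            </xsl:apply-templates>
        </ul>
    </xsl:template>

    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:value-of select="line[1]"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

The sort changes only the order in which the <sonnet> elements are processed; the template that handles each one does not need to know that any sorting happened.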
At this point we’d make other adjustments in the output. The original table of contents begins with a roman numeral, but if you’re going to sort the table of contents, you want the text of the first line of the poem at the left side of the line, not preceded by the roman numeral, so that you can see the alphabetic order easily. Putting the roman numeral first would make it harder to discern the alphabetization, since the user wouldn’t be able to see it by just glancing down the left margin. For that reason, you should now adjust the output to put the roman numeral after the text of the line, in parentheses.
translate()
to fix the sort order
If you sort the first lines alphabetically according to their textual value, there will
be one error. The first line of Sonnet #121, 'Tis better to be vile than vile
esteem'd,
, will show up first because in the internal representation of
characters in the computer, the single straight apostrophe sorts
earlier than all of the letters. We can fix this by using
translate()
to strip the apostrophe for sorting
purposes, but not for rendering. That is, we can sort as if there were no apostrophe,
while still printing the apostrophe when we render the line.
We can’t easily translate away an apostrophe, though, because quotation marks have special meaning in XPath. XPath gives you two types of quotation marks, single and double, but here you need three: you need quotation marks around attribute values, you need quotation marks around strings, and the string value for this task includes a quotation mark, since that’s what you need to translate away. See Michael Kay’s answer at http://p2p.wrox.com/xslt/50152-how-do-you-translate-apostrophe.html; either method there will work, but we recommend the XSLT 2.0 one because we find it easier to read.
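Two common ways around the quotation-mark problem are sketched below; we do not claim that they reproduce the linked answer word for word. The first relies on the XPath 2.0 rule that a doubled apostrophe inside a single-quoted string literal stands for one apostrophe. The second, which also works in XSLT 1.0, stores the apostrophe in a variable (the name apos is our own choice). As before, line is an assumed element name.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xml" indent="yes"/>

    <!-- XSLT 1.0-compatible alternative: keep the apostrophe in a variable
         so that it never appears inside an XPath string literal -->
    <xsl:variable name="apos">'</xsl:variable>

    <xsl:template match="/">
        <ul>
            <xsl:apply-templates select="//sonnet" mode="toc">
                <!-- XPath 2.0 or later: '''' is a string literal that
                     contains exactly one apostrophe; translating it to
                     the empty string removes it from the sort key only -->
                <xsl:sort select="translate(line[1], '''', '')"/>
                <!-- Equivalent key using the variable:
                     translate(line[1], $apos, '') -->
            </xsl:apply-templates>
        </ul>
    </xsl:template>

    <!-- The rendered text is untouched, so the apostrophe still prints -->
    <xsl:template match="sonnet" mode="toc">
        <li>
            <xsl:value-of select="line[1]"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

Note that translate() appears only in the sort key. The <li> still outputs the original first line, which is what lets us sort as if there were no apostrophe while printing it unchanged.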
Some lists of first lines of poetry put quotation marks around the lines. We haven’t done
the following in our solution, but if you’d like to add it, you should use the HTML
<q>
(quoted text
) element, instead of
outputting the raw quotation marks as plain text. By default text inside HTML
<q>
tags will be rendered with straight
quotation marks around it, but you really want curly quotation marks, and you can use
CSS to control that (see Sara Cope’s quotes for
details).
You can use CSS not only to control the shape of (optional) quotation marks around first
lines, but also, more generally, to make your HTML look more interesting than what you
get by default in a web browser. If you do that, write the CSS in a separate file and let
your XSLT create the <link>
(don’t add it
manually afterwards).
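One way to have the XSLT create the stylesheet link is to write it as a literal result element inside the template that builds the HTML <head>. In this sketch the filename sonnets.css is hypothetical; substitute the name of your own CSS file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="3.0">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>Shakespearean sonnets</title>
                <!-- Copied to the output as is; sonnets.css is a
                     placeholder name for your own CSS file -->
                <link rel="stylesheet" type="text/css" href="sonnets.css"/>
            </head>
            <body>
                <xsl:apply-templates select="//sonnet"/>
            </body>
        </html>
    </xsl:template>

    <!-- Minimal body content so that the sketch runs on its own;
         @number is an assumed attribute name -->
    <xsl:template match="sonnet">
        <h2>
            <xsl:value-of select="@number"/>
        </h2>
    </xsl:template>
</xsl:stylesheet>
```

Because the <link> is part of the transformation, it survives every time you rerun the stylesheet, which would not be the case if you edited the HTML output by hand.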
You should upload the XSLT stylesheet you created to run the transformation, the HTML it produced, and your CSS (if you used it). The XSLT must include meaningful comments, along the lines described in our first XSLT assignment. The HTML must be valid; if it isn’t, the XSLT must include, as properly formatted code comments, information about how you tried to debug your transformation.