Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified:
2023-03-28T01:11:02+0000
Enhance your output from the last assignment in the following ways:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3.org/1999/xhtml"
xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
version="3.0">
<xsl:output method="xhtml" html-version="5" omit-xml-declaration="no" include-content-type="no"
indent="yes"/>
<xsl:template match="/">
<html>
<head>
<title>Shakespearean sonnets</title>
</head>
<body>
<h1>Shakespearean sonnets</h1>
<h2>Contents</h2>
<ul>
<xsl:apply-templates select="//sonnet" mode="toc">
<xsl:sort select='translate(., "&apos;", "")'/>
</xsl:apply-templates>
</ul>
<hr/>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<!-- ============================================== -->
<!-- Table of contents mode -->
<!-- ============================================== -->
<xsl:template match="sonnet" mode="toc">
<li>
<a href="#sonnet{@number}">
<xsl:apply-templates select="line[1]" mode="toc"/>
<xsl:text> (</xsl:text>
<xsl:apply-templates select="@number"/>
<xsl:text>)</xsl:text>
</a>
</li>
</xsl:template>
<xsl:template match="line" mode="toc">
<xsl:apply-templates/>
</xsl:template>
<!-- ============================================== -->
<!-- Reading view -->
<!-- ============================================== -->
<xsl:template match="sonnet">
<section id="sonnet{@number}">
<h2>
<xsl:apply-templates select="@number"/>
</h2>
<p>
<xsl:apply-templates select="line"/>
</p>
</section>
</xsl:template>
<xsl:template match="line">
<xsl:apply-templates/>
<xsl:if test="following-sibling::line">
<br/>
<xsl:text>
</xsl:text>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
The description of our solution to this assignment assumes that you have read our previous answer sheet for XSLT assignment #4, because this assignment builds largely upon the work done in that one. If you haven’t studied that solution, you can do so at http://dh.obdurodon.org/xslt-assignment-04-answers.xhtml.
To observe the effects of the changes we’ve made to our stylesheet from the last
assignment, we’re going to make comparisons to that solution. For the moment, we will ignore
the changes in the template that matches the document node
(/
). The changes there affect the sorting of our table
of contents. Instead, take a look at the template that matches
<sonnet>
where
@mode = "toc"
. We’ve done two things there.
Within our list of first lines we need to wrap the content of each of our
<li>
elements in an
<a>
tag with an
@href
attribute, which is how HTML defines a clickable
link. We need a consistent strategy to create @href
values for the links, so that every link will point unambiguously to the text of exactly one
sonnet. In order for the links to work properly (that is, to have somewhere to land), each
of them has to match perfectly the @id
value of exactly
one element in the document, which we will be generating later. We chose to make all of our
@href
attributes have the form of the string
sonnet
followed by that sonnet’s specific roman numeral, so that the first sonnet’s
link, for example, would look like
<a href="#sonnetI">
and the second sonnet’s link
would look like <a href="#sonnetII">
, etc. Remember (from
the description of the original assignment) that all
<a>
elements linking to somewhere else on the same
page must begin their @href
values with a hashmark
(#
), but that character should not be part of the corresponding
@id
value of the target of the link. For example, a
clickable link that points to Sonnet #2 will have a start tag that reads
<a href="#sonnetII">
(with a hash mark
[#
]), and the
<section>
element wrapping that sonnet will have a
start tag that reads <section id="sonnetII">
(without a hash mark).
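For illustration, the pairing of link and target in the generated HTML might look roughly like the following for Sonnet II. This is a hand-written sketch, not actual output: the line text and punctuation depend on the input XML, and the only parts that matter here are the matching href and id values.

```xml
<!-- In the table of contents: the href value begins with a hash mark -->
<li>
    <a href="#sonnetII">When forty winters shall besiege thy brow, (II)</a>
</li>

<!-- In the reading view: the id value has no hash mark -->
<section id="sonnetII">
    <h2>II</h2>
    <p>When forty winters shall besiege thy brow,<br/>
        And dig deep trenches in thy beauty's field,<br/>
        <!-- remaining lines omitted from this sketch -->
    </p>
</section>
```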
To generate a different value of @href
for every
sonnet’s link (and, later, for the @id
to which the
@href
points) we need to use XPath. We describe how to
do this in our page about Attribute value templates (AVT). For
reasons explained at that page, to delimit our XPath expression within the otherwise literal
attribute value we need to encase it in curly braces
({}
). We can retrieve the unique roman numeral for the
sonnet we are processing at the moment by looking for it in the
@number
attribute of that
<sonnet>
. We concatenate the raw string
sonnet
with the retrieved value of the @number
attribute as follows (see our documentation on AVTs for an
explanation of the role of the curly braces):
<a href="#sonnet{@number}">
…
</a>
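If it helps to see what the AVT is doing, the same attribute could be built in the long way with <xsl:attribute> and the XPath concat() function. This is only a sketch for comparison, and the AVT version is the more compact of the two:

```xml
<!-- Sketch: produces the same href as <a href="#sonnet{@number}"> -->
<a>
    <!-- xsl:attribute has to come before any child content of <a> -->
    <xsl:attribute name="href" select="concat('#sonnet', @number)"/>
    <xsl:apply-templates select="line[1]" mode="toc"/>
</a>
```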
We also need to sort the first lines. Unless we specify otherwise, XSLT will process the
sonnets in document order, which is numerical order because they happen to be arranged that
way in the input XML document. In addition to sorting the first lines, we also want to
render the textual content of the line before the sonnet number (roman numeral). If
we don’t do that, although our lines will be in alphabetical order, users won’t be able to see
that easily because each line will begin with a roman numeral that isn’t part of that sort
order. What we really want is for users to glance down the left margin of the list of first
lines and find the ones they want easily. To make the alphabetic order easier to see, we
move the <xsl:apply-templates>
selecting our first
line to the beginning of the output, and move the roman numeral to the end, wrapped in
parentheses. As always, instead of outputting raw text when we’re creating mixed content, we
wrap all raw text in <xsl:text>
elements, which
helps prevent quirky white-space handling.
The next change we have to make in order to get our table of contents to link to our
sonnets is to create @id
attributes for the targets of
our links, so that when the user clicks on a link, there will be somewhere to scroll to.
Remember that the @href
values that we’ve created for
our clickable links must start with hashmarks
(#
), but the values of the identifiers
(@id
attributes) they point to must not start
with hashmarks (#
). We chose to put the identifiers on
the <section>
elements that we generate; this way, when
users click on a link to a sonnet, the browser scrolls to the start of that sonnet’s section.
We chose to create our @id
values, and the links
that point to them, in a deliberate way that ensured that they would be legible to humans,
since that makes it easier for us to recognize mistakes during development. Constructing
our own values becomes more challenging and less appealing as our need for links (in other
projects) becomes more intricate, and XPath provides, as an alternative to building your
own unique identifier, the generate-id()
function,
which can be used to construct links automatically. You can read about how to use
generate-id()
in Michael Kay, but here is a
summary:
generate-id()
is guaranteed to create the same
value for the same node in the input tree each time it is evaluated during an XSLT
transformation. This means that if you apply it to sonnets in two different templates
(as we do for this exercise), it will create the same value, and that means that you
can use the value to construct a link. But:
You still have to prepend the hash mark to the clickable link yourself, but not
to the target @id
.
To get the same value you need to evaluate exactly the same node from the input tree. For example, you’ll get a different value for a sonnet than you will for its first line.
The value created for a particular node in the input XML is guaranteed to be unique and repeatable only within a transformation, which means that if you run the transformation twice you could get different values. This isn’t an issue as long as the links are internal, but it limits the possibility of using this method to create links from one file to a specific location in a different file.
The advantages are that you don’t have to construct a complex value manually and
the values are guaranteed to be unique (which is a requirement for
@id
attributes in HTML) and repeatable (which is
needed to ensure that your clickable links and targets will agree).
The disadvantage is that the value is not human-readable, which can make this
approach more difficult to debug than working with values like sonnetI
,
sonnetII
, etc.
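As a minimal sketch of the points above, here is how our two sonnet templates might look if we used generate-id() instead of constructing the values ourselves. This is not what our solution does; it only illustrates the alternative. Both templates match the same <sonnet> node, so calling generate-id() without an argument (which means the context node) returns the same value in both places.

```xml
<!-- Table of contents: we still prepend the hash mark ourselves -->
<xsl:template match="sonnet" mode="toc">
    <li>
        <a href="#{generate-id()}">
            <xsl:apply-templates select="line[1]" mode="toc"/>
        </a>
    </li>
</xsl:template>

<!-- Reading view: same context node, so same generated value, no hash mark -->
<xsl:template match="sonnet">
    <section id="{generate-id()}">
        <h2>
            <xsl:apply-templates select="@number"/>
        </h2>
        <p>
            <xsl:apply-templates select="line"/>
        </p>
    </section>
</xsl:template>
```

Had the first template called generate-id(line[1]) instead, the link would no longer match the target, because the first line is a different node from the sonnet that contains it.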
Finally, we need to sort the table of contents in alphabetical order of their first lines.
We can do that by putting <xsl:sort>
between the
start and end tags of <xsl:apply-templates>
, but if
that’s all we do, one sonnet will be out of place. The exception is that Sonnet CXXI begins
with 'Tis
, and the apostrophe at the beginning of that line will cause that line to
be sorted alphabetically before all other sonnets. This happens because if all we do is tell
the system to sort, it uses all characters in the sonnet (starting with the first line) to
sort, and it doesn’t know that the leading apostrophe should be ignored.
We can fix this problem by using the XPath translate()
function inside the @select
attribute of
<xsl:sort>
to modify the sonnets before sorting
them. Because we use translate()
only for sorting
purposes, we aren’t changing what we write into the output, so although the apostrophe will
be ignored during sorting, it will be rendered where it belongs, that is, where it occurred
in the original input. We do that by putting the instruction to translate it into nothing
(an empty string, using the translate()
function)
inside the value of the @select
attribute of the
<xsl:sort>
element.
The challenge we run into while trying to use this function involves nesting our quotation
marks so that our XSLT document remains well-formed. We have only two sets of quotation
marks available in XPath: single and double. If we need both, we can nest them, but here we
need three: the value of the @select
attribute must be
inside quotation marks, the second and third arguments to
translate()
are strings that must be inside quotation
marks, and the character we want to translate away is also a quotation mark. Our solution
here encloses the attribute value with single quotation marks, it encloses the arguments of
translate()
in double quotation marks, and it uses the
&apos;
character entity to represent the apostrophe
we want to remove from our text for sorting. There are other methods of accomplishing this,
as well (see below).
So how does &apos; work? We’ve been using three built-in XML character entities so far:
&amp; (ampersand, or &), &lt; (less than, or <), and &gt; (greater than, or >).
XML builds in two more character entities: &apos; for the straight apostrophe (')
and &quot; for the straight double quotation mark ("). Here we exploit the fact that
when our XSLT stylesheet is checked for well-formedness, the &apos; character entity has
not yet been transformed into an apostrophe, so we don’t have an unmatched quotation mark.
If you copy and paste our code, above, into <oXygen/> and try replacing the character
entity with a raw single apostrophe character, the squiggly red line will appear to tell you
that that’s forbidden. The character entity lets us get away with having one more level of
quotation marks.
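The same trick should work the other way around with the double quotation mark entity. In the sketch below (an alternative we offer for illustration, not the form used in our solution) the attribute value is delimited by double quotation marks, the empty string is written with apostrophes, and the string that contains the apostrophe is delimited with the entity for the double quotation mark:

```xml
<!-- After the XML parser expands the entities, XPath sees:
     translate(., "'", '') -->
<xsl:sort select="translate(., &quot;'&quot;, '')"/>
```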
We can think of at least two alternative ways of protecting the straight apostrophe from
raising an error. One is that we can declare a variable, the value of which is just the
single apostrophe, and then refer to the variable inside the
translate()
function, and we can use either:
<xsl:variable name="apostrophe" as="xs:string">'</xsl:variable>
or
<xsl:variable name="apostrophe" as="xs:string" select="&quot;'&quot;"/>
A variable value can be specified in two different ways in XSLT, either as the value of a
@select
attribute on the
<xsl:variable>
element or as the content of that
element, between the start and end tags. We can use either method here, and we can then
refer to our variable inside the translate()
function:
<xsl:sort select="translate(., $apostrophe, '')"/>
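Putting the pieces together, a sketch of the variable approach in context might look like the following. We assume here that the variable is declared at the top level of the stylesheet (as a child of <xsl:stylesheet>), which makes it visible inside every template; the surrounding template is abbreviated.

```xml
<!-- Top-level variable: its value is a single straight apostrophe -->
<xsl:variable name="apostrophe" as="xs:string">'</xsl:variable>

<xsl:template match="/">
    <!-- html, head, and body wrappers omitted from this sketch -->
    <ul>
        <xsl:apply-templates select="//sonnet" mode="toc">
            <!-- The apostrophe is removed only from the sort key, not from the output -->
            <xsl:sort select="translate(., $apostrophe, '')"/>
        </xsl:apply-templates>
    </ul>
</xsl:template>
```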
Alternatively, we can escape the apostrophe (that is, tell it not to have its regular meaning of string-delimiter) inside quotation marks by doubling it:
<xsl:sort select="translate(.,'''','')"/>
As Michael Kay puts it, “the delimiter of a string literal can be included in the string
literal by doubling it” (http://stackoverflow.com/questions/13482352/xquery-looking-for-text-with-single-quote).
In other words, you can include an apostrophe inside apostrophes by doubling it, that is, by
using a sequence of four apostrophes, rather than the three that correspond to what you
mean.
In Real Life we might want to take into account that Shakespearean sonnets normally have fourteen lines, structured (through the rhyme scheme) as three quatrains followed by a couplet. There are two exceptions: sonnet 99 contains fifteen lines (it begins with a quintain, rather than a quatrain) and sonnet 126 contains twelve lines that, furthermore, are structured, through the rhyme scheme, as six couplets. To deal with those exceptions we might make the following enhancements:
You can see our enhanced view at http://dh.obdurodon.org/enhanced-sonnets.xhtml. If you view the source you’ll notice
that the line numbers and the spacing between parts of the sonnet are not encoded explicitly
in the HTML. In this way we reserve the HTML for representing the structure and we use CSS
to control the appearance. You’ll notice also that we’ve used
@class
attributes very sparsely as a way of maintaining
the legibility of our HTML. The @id
attributes on the
two main sections (index of first lines, reading view), combined with the CSS descendant
combinator pattern, let us fine-tune our selectors without the clutter that pervasive
@class
attributes would introduce.
Placing classes (and especially classes with multiple values) on most of your elements is
a rookie mistake that makes your HTML illegible, and therefore hard to debug and maintain,
and you aren’t rookies anymore. Learn to use the CSS combinators to address locations in
your HTML that don’t have their own @class
(or
@id
) attributes. See the W3Schools CSS Combinators or MDN Combinators pages for introductions with examples. You can also select elements in
your CSS according to their attributes; see the MDN Attribute
selectors page for an introduction with examples.
The XSLT that produces our enhanced view is below, with embedded comments to explain what each part does. We use one feature that we haven’t introduced yet, a user-defined function, and you can peek ahead to learn about those at http://dh.obdurodon.org/xslt-functions.xhtml.
In our earlier approaches we tagged sonnets in the output as paragraph and used line breaks to separate the lines. We now want to use CSS to number the lines, and that isn’t easy if the individual lines are not individual elements in the HTML. For that reason we change strategies: we now tag each sonnet section (e.g., quatrain, couplet) as an unordered list and each line as a list item in those lists.
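As a rough sketch of that change in strategy (a simplified illustration, not our enhanced stylesheet), the two reading-view templates might be rewritten along the following lines. We assume the regular pattern of three quatrains and a couplet, and we group the lines by position with <xsl:for-each-group>; the grouping expression is our own choice for this illustration, and it does not handle the irregular sonnets 99 and 126 correctly.

```xml
<xsl:template match="sonnet">
    <section id="sonnet{@number}">
        <h2>
            <xsl:apply-templates select="@number"/>
        </h2>
        <!-- Assumption: lines 1-4, 5-8, 9-12 form quatrains and 13-14 the couplet.
             (position() - 1) idiv 4 yields 0, 1, 2, 3 for those four groups. -->
        <xsl:for-each-group select="line" group-adjacent="(position() - 1) idiv 4">
            <ul>
                <xsl:apply-templates select="current-group()"/>
            </ul>
        </xsl:for-each-group>
    </section>
</xsl:template>

<!-- Each line is now its own element, so CSS counters can number it -->
<xsl:template match="line">
    <li>
        <xsl:apply-templates/>
    </li>
</xsl:template>
```

Because every line becomes an <li>, the <br/> elements from the earlier solution are no longer needed.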
The line numbering and the general layout are controlled with the following CSS (see the embedded comments for explanation):
/* … instead of border because of semantic break */
width: 95%;
border: none;
height: 1px;
background-color: gray;
}
#reading > section {
/* Line counting restarts for each element that wraps a sonnet */
counter-reset: lineno;
}
#reading ul {
/* Turn off bullets, use variable defined at top
* to set left padding */
display: flex;
padding-left: var(--sonnet-indent);
flex-direction: column;
list-style-type: none;
margin: 0 0 1em 0;
}
#reading li {
/* Flex to center smaller line number (see :before, below) vertically */
counter-increment: lineno;
margin-left: 1rem;
display: flex;
align-items: center;
}
#reading li:first-of-type:before {
/* Number first line of each part of sonnet
* Absolute position to remove from horizontal alignment of line content
* Outdent to place before line
* Width is needed to make text-align work
* Flex (set on li) needed to center smaller number vertically */
content: counter(lineno);
margin-left: -2em;
position: absolute;
width: 1.5em;
text-align: right;
color: gray;
font-size: smaller;
}
.line-count-warning {
/* Line counts other than 14 are reported and highlighted */
color: red;
}
p.line-count-warning {
/* Align with sonnet number and start of lines */
margin-left: var(--sonnet-indent);
}