Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified: 2023-03-17T15:06:20+0000
Your assignment is to create an XSLT stylesheet that will transform Bad Hamlet into a hierarchical outline of the titles of acts and scenes in HTML. This isn’t very interesting on its own, of course, but if you were transforming the entire document into HTML for publication on the web, this might serve as the skeleton. It might also stand on its own as a table of contents at the top of such a publication, so that the reader could click on the title of a scene to jump to that location in the file.
There are different ways to do this, but the details we found most important were:
Know what the output HTML should look like. If you don’t know how to create HTML by hand that looks like the output you want, you won’t be able to create XSLT that produces that HTML. This may mean looking up some of the HTML elements in a reference until you become familiar with them through experience.
Apply templates not directly to the <head> elements, but to the acts and scenes. Applying templates to the acts (<div> elements) makes it easy, inside a template that matches acts, to do one thing with its <head> child and then create an HTML <ul> wrapper inside which you can apply templates to the scenes. If you apply templates directly to the <head> children of acts, you make it harder (not impossible, but more awkward) to navigate to the scenes.
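As a point of reference for the first item, the target output might have roughly the following shape. This is a hand-written sketch, not actual output: the titles “Act 1”, “Scene 1”, and so on are placeholders for whatever the <head> elements in Bad Hamlet contain, and only two acts are shown.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Hamlet</title>
  </head>
  <body>
    <!-- one outer list for the whole play -->
    <ul>
      <!-- one list item per act -->
      <li>Act 1
        <!-- one nested list per act, one item per scene -->
        <ul>
          <li>Scene 1</li>
          <li>Scene 2</li>
        </ul>
      </li>
      <li>Act 2
        <ul>
          <li>Scene 1</li>
        </ul>
      </li>
    </ul>
  </body>
</html>
```

Being able to write this by hand makes it easier to see which XSLT template should be responsible for which piece of the HTML.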
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="#all"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="3.0">
    <xsl:output method="xhtml" html-version="5"
        omit-xml-declaration="no"
        include-content-type="no"
        indent="yes"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//body/div"/>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <xsl:apply-templates select="div"/>
            </ul>
        </li>
    </xsl:template>
    <xsl:template match="div">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
We modify the new XSLT skeletal document that <oXygen/> creates for us in the following ways (copied from our initial XSLT tutorial):
We add xmlns="http://www.w3.org/1999/xhtml" inside the start-tag of the root element to declare that elements created in the output will be in the HTML namespace (unless we say otherwise).
We add xpath-default-namespace="http://www.tei-c.org/ns/1.0" inside the start-tag of the root element to declare that XPath expressions that refer to the input document will look for elements in the TEI namespace (unless we say otherwise).
We set exclude-result-prefixes="#all". This isn’t strictly necessary with this particular document because we don’t declare any additional namespace prefixes, but it does no harm, so we always do it when we write XSLT.
We add an <xsl:output> element that specifies that we are creating XHTML 5 output using the XML syntax, that we want it to include the initial XML declaration, that we do not want it to insert a <meta> element into the <head> (we only need that if we use HTML syntax), and that we want the output to be pretty-printed.
Our solution made use of three template rules: for the document node (/), acts (body/div), and scenes (div).
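To see which nodes each of the three rules is meant to handle, it can help to picture a stripped-down input. The fragment below is only an assumption about the general shape of the markup (acts as <div> children of <body>, scenes as <div> children of acts, each with a <head>), with placeholder titles. The real Bad Hamlet contains much more.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <!-- an act: matched by body/div -->
      <div>
        <head>Act 1</head>
        <!-- a scene: matched by div, but not by body/div -->
        <div>
          <head>Scene 1</head>
          <!-- speeches and stage directions omitted -->
        </div>
        <div>
          <head>Scene 2</head>
        </div>
      </div>
      <div>
        <head>Act 2</head>
        <div>
          <head>Scene 1</head>
        </div>
      </div>
    </body>
  </text>
</TEI>
```

Running the stylesheet against a small file like this is a quick way to check the nesting of the lists before trying it on the full play.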
The document template creates the basic structure of the HTML document we want to
produce. It has the <html>
root element
along with the required <head>
and
<body>
elements inside it. Inside the body,
we create an unordered
(that is, bulleted) list, denoted by the
<ul>
element, that will eventually hold five
items, one for each act. Unordered lists in HTML are required to have content in the
form of <li>
(list item) elements, and we’ll
create those later, when we actually process the acts. For now, we just create the
wrapper <ul>
element that will hold them.
Between the <ul> start- and end-tags (since that’s where we eventually want our <li> elements to be created), we apply templates to (that is, process) all of the acts, which we point to with the XPath expression //body/div as the value of the @select attribute of the <xsl:apply-templates> element.
The XPath expression //body/div, above, selects a sequence of all of the acts (all five act <div> elements), and the <xsl:apply-templates> element says to get them processed, but it doesn’t specify how to process them. So how do the
acts get processed? What happens is that all of the templates in the stylesheet are
constantly watching for something to do, and when that
<xsl:apply-templates>
element fires, they
all see that some <div>
elements need to be
processed. The first template (the one for the document node) doesn’t match
<div>
elements, so it doesn’t do anything.
The other two templates both match <div>
elements; the second template matches <div>
elements that have a <body>
parent and the
third matches all <div>
elements. So how is
the conflict or competition resolved? Which one gets to handle the acts? Whenever
there’s an ambiguity like that, the more specific match rule wins, so the
second template rule, the one that matches
body/div
, takes charge of each of the five
acts. (Alternatively, we could have had the last template rule match not just div, but specifically div/div, which would have removed the potential ambiguity. Whether you deal with those situations by writing only unambiguous and specific rules or by letting the “more specific match rule wins” principle make the decision, you’ll get the same result here.)
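The alternative mentioned in the parenthesis could look like the sketch below. It is the same stylesheet with only the match pattern of the last template changed to div/div; the comments are ours, and the unused xs and math namespace declarations are left out for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no"
        include-content-type="no" indent="yes"/>

    <!-- Document node: build the HTML shell and the outer list -->
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//body/div"/>
                </ul>
            </body>
        </html>
    </xsl:template>

    <!-- Acts: div elements whose parent is body -->
    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <xsl:apply-templates select="div"/>
            </ul>
        </li>
    </xsl:template>

    <!-- Scenes: div elements whose parent is another div.
         An act has a body parent, so it can never match this pattern. -->
    <xsl:template match="div/div">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

With this version no <div> is matched by more than one of our templates, so nothing has to be resolved by specificity. The output should be the same as with the original match="div" rule.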
Each act returned by the path expression, then, triggers the second template, which
creates an <li>
element, and it creates that
<li>
element between the start- and end-tags
for the <ul>
that we created earlier. That’s
the effect of putting the
<xsl:apply-templates>
element between those
start- and end-tags; whatever happens as a result of that instruction will be
written in the location where the instruction stood. Within that new
<li>
element, we tell the system to process
(apply templates to) the <head>
child of the
current context (that is, the <head>
child
of the act we’re processing at the moment) and then to create a new
<ul>
, an embedded list. Within that new,
embedded list it applies templates to another set of nodes, selected by the path
expression div
, which are the <div>
children of the current act, that is, its scenes. Note that the
<xsl:apply-templates>
here executes XPath
from the current context, which is the act we are processing at the moment,
so it gets only the scenes of that act. Remember that the default axis is
the child axis, so the expression div
as the
value of the @select
attribute says to find all
of the child <div>
elements inside the
current act.
When XSLT finds the <head>
elements (just one
for each act), it tries to apply templates, but we haven’t defined a template for
<head>
elements. How does the stylesheet
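know what to do? XSLT includes built-in, or default, templates, which say that:
When no template matches an element, the processor throws away the tags and applies templates to the element’s children.
When no template matches a text node, the processor writes the text into the output.
Our <head> elements each have one child node, which is a text node.
Note that this happens in two steps: first the built-in rule for elements processes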
the <head>
element, so it throws away the
tags and automatically applies templates to its only child, which is a text node.
Since we don’t have a template that matches text nodes, the built-in template then
writes the textual value of the text node into the output.
We can write templates that match text nodes if we need to, but it’s relatively uncommon. In this case we don’t have a template that matches text nodes, so the built-in template that matches text nodes processes them.
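If it helps to see the built-in behavior spelled out, the two default rules that matter here behave roughly like the explicit templates below. This is an illustrative approximation, not something to add to the solution, and it covers only elements, the document node, and text nodes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <xsl:output method="text"/>

    <!-- Rough equivalent of the built-in rule for the document node and for elements:
         write no tags, just apply templates to the children -->
    <xsl:template match="/ | *">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Rough equivalent of the built-in rule for text nodes:
         copy the text into the output -->
    <xsl:template match="text()">
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>
```

Run on its own against any XML document, a stylesheet like this outputs all of the text and none of the markup, which is exactly what happens to each <head> in our solution.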
Since what we want in this case is to output the textual content of the text-node
child of the <head>
elements, we let the
built-in rules do it for us. The effect is to print something like “Act 1” before we list all of the scenes in the first act.
We don’t type the word “Act” or the number “1”; that gets retrieved from the <head> element in the input XML document.
We don’t have separate instructions to process Act 1, Act 2, Act 3, etc., since 1) that would prevent us from reusing this code on a play that didn’t have exactly five acts; 2) it would require us to know something that we don’t have to know in advance, that is, the number of acts in the play; and 3) it would make us write the same code multiple times, which creates unnecessary opportunities for error.
Don’t make any assumptions in your XSLT that you don’t have to make. If the prose description of the task is “process all of the acts”, aim for a strategy that does not require you to know how many acts there are in advance.
After printing the header, while still inside the list item for the act we’re working
on at the moment, we create another unordered list and apply templates to the
results of the XPath expression div
. Since our
current context is a <div>
element that
represents an act, this finds all of the
<div>
children of the act that we are
in. We know from studying the document that these elements must be
scenes.
A common beginner error is to write select="//div" instead of select="div". The incorrect XPath expression starts with a double slash, so it starts at the document node and retrieves all <div> elements in the entire play. This means that every act would try to process all 26 <div> elements, acts included, rather than just its own scenes. The correct XPath expression uses the default child axis to include information only from the <div> children of the current context item (the current act), that is, only from the scenes of one act at a time.
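The difference is easiest to see in place. The sketch below repeats the solution with comments of our own marking the relative path in the act template (the unused xs and math namespace declarations and the <xsl:output> element are left out for brevity).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <!-- Absolute path: here we really do want every act in the play -->
                    <xsl:apply-templates select="//body/div"/>
                </ul>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="body/div">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <!-- Relative path: short for child::div, evaluated from the current act,
                     so it finds only the scenes of this one act -->
                <xsl:apply-templates select="div"/>
                <!-- Not select="//div": the leading // starts from the document node
                     and reaches every div in the play, acts included -->
            </ul>
        </li>
    </xsl:template>

    <xsl:template match="div">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

A useful habit is to ask, for each @select value, what the current context is at that point in the stylesheet and whether the path should start from there or from the top of the document.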
When the new
<xsl:apply-templates select="div"/>
element
fires, then, the @select
attribute rounds up
the <div>
children of the current context,
which are the scenes in the current act. It doesn’t say how to process them though.
As above, the template rules are always watching for new content, and once again the
call has gone out to process some <div>
elements. The first template rule, the one for the document node, doesn’t match, so
it doesn’t do anything. The second template matches
<div>
elements only when they are the
children of a <body>
element. Since these
new <div>
elements are the children of a
<div>
element, the second template rule
won’t touch them either. The third template rule, though, matches
<div>
elements without looking at their
parents or anything else about them, so it takes charge in this case, and generates
the list items for the scenes, inserting the values of the
<head>
children of the scenes, one by one,
as they’re processed.
If you apply templates to <div> elements inside a template that matches acts, you will process only the scenes in that specific act. This feature is what lets you construct separate lists for the scenes in each act.
Create the <ul> wrapper inside the <body> because you want one list. Don’t create any list items yet, though! Inside that wrapper apply templates to the acts, and create an <li> element inside the template that matches acts. That combination creates what you want: one list wrapper for all acts, one list item for each act.
<xsl:apply-templates> determines which nodes in the input tree should be processed and where the result of processing them should be inserted into the output, but it doesn’t say anything about how to process those nodes. <xsl:template> then does the processing. It may be helpful to think of this as a matter of cooperative throwing and catching: <xsl:apply-templates> rounds up nodes to be processed and tosses them into the air, and the various template rules then reach up and catch only the elements they know how to process and take care of them. If you haven’t written a template to process some type of node, it will be caught (processed) by a built-in template, that is, by default handling rules.
Templates fire in response to <xsl:apply-templates> elements, and the order of processing within a sequence follows document order.