Author: Andrew Nitz (acn23@pitt.edu) Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2023-03-18T18:28:11+0000
This tutorial was written when XSLT 2.0 was the most recent version, and the XSLT stylesheets therefore contain boilerplate that was correct for XSLT 2.0, but that can be improved for XSLT 3.0, which is now the most recent version. In particular, the version number should now be specified as 3.0 and the <xsl:output> element should follow the guidelines in our first XSLT tutorial.
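As a rough illustration of the updated boilerplate, an XSLT 3.0 skeleton might look something like the following. This is a sketch only: the attribute values on <xsl:output> are our assumption about a reasonable HTML5 configuration and are not copied from the first XSLT tutorial, so check them against that tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- Assumed XSLT 3.0 serialization settings for HTML5 output in XHTML syntax;
         check them against the first XSLT tutorial before relying on them -->
    <xsl:output method="xhtml" html-version="5" omit-xml-declaration="no"
        include-content-type="no" indent="yes"/>
    <!-- Template rules go here -->
</xsl:stylesheet>
```

The stylesheets in the rest of this tutorial keep their original XSLT 2.0 boilerplate.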
While developing an XSLT stylesheet, we observe the following guidelines, which keep our overall goals in mind as we develop working components piece by piece. This helps us to write more concise and accurate code, while also reducing time spent troubleshooting. We recommend re-reading this page as a reminder before beginning to write an XSLT stylesheet until you feel that you’ve internalized the model.
Start by making sure that your XSLT can read your input file and generate output, even if the output is just a placeholder. As you continue to run your transformation each time you add a new step, it should continue to read input and generate output (even if the output continues to include placeholder text). Any time you can’t read input and generate output, fix the problem before you do anything else. It’s easier to fix a problem as soon as you learn about it than to try to track it down in an ocean of new code.
This is the most important guideline for avoiding tedious, confusing, and unproductive troubleshooting. When writing XSLT, you should add one bit of functionality at a time, and then run the transformation to verify that the new code works before moving on to the next step. You need to watch out for two types of problems:
The point of the coding, testing, and debugging in small cycles is that it’s easiest to find and fix mistakes when you’ve written only a few new lines of code since the last cycle. For example, if you are trying to apply templates to all <p> elements in an XML document and format them in some way, it’s a good idea first, before you think about applying the formatting, to ensure that you are actually finding the <p> elements correctly. You can do that by writing a template that matches <p> elements and just outputs some placeholder text, and once that works, you can replace the placeholder with more refined code that processes them the way you want. And when you do that formatting, you need to test each feature as you add it, instead of writing the entire block and checking only then. If you do the latter and something doesn’t work, you’ll have set yourself up for painfully confusing debugging.
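A placeholder check of this kind might look like the following sketch. It assumes a generic input document whose <p> elements are in no namespace; the title text and the placeholder wording are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <xsl:template match="/">
        <html>
            <head>
                <title>Placeholder test</title>
            </head>
            <body>
                <!-- Visit only the <p> elements so that the check is easy to read -->
                <xsl:apply-templates select="//p"/>
            </body>
        </html>
    </xsl:template>
    <!-- Temporary: confirm that <p> elements are matched before formatting them -->
    <xsl:template match="p">
        <p>Paragraph found</p>
    </xsl:template>
</xsl:stylesheet>
```

If the output contains one placeholder paragraph for each input paragraph, the match is working, and the placeholder text can be replaced with real processing, one feature at a time.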
A stub in coding terminology is a snippet of code used to stand in for something that will be developed later (basically a placeholder for functionality that has not been written yet). You want to be coding only one piece of functionality at a time, but sometimes you’ll want to use stubs to help keep your overall goals in mind while working in different sections. For example, if you have a template that will eventually output a table of contents, initially it might just output plain text that says “Table of contents to go here”. This lets you verify, in the output, that you’re calling the template and it’s returning output.
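A table-of-contents stub along these lines could be sketched as follows. The named template toc is a hypothetical name chosen for illustration, and a named template is only one of several ways to set this up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <xsl:template match="/">
        <html>
            <head>
                <title>Stub example</title>
            </head>
            <body>
                <!-- Call the stub where the real table of contents will eventually go -->
                <xsl:call-template name="toc"/>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <!-- Stub (hypothetical name): replace the placeholder text with real code later -->
    <xsl:template name="toc">
        <p>Table of contents to go here</p>
    </xsl:template>
</xsl:stylesheet>
```

Seeing the placeholder sentence in the output confirms that the template is being called, even though it does nothing useful yet.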
Unless the purpose of a piece of code is self-documenting (obvious, self-explanatory), describe its purpose inside an XML comment. (XML comments start with <!-- and end with -->, and they can contain anything, including markup, except two consecutive hyphens.) This helps to keep your code organized for your own use and makes it easier to collaborate with others while working on XSLT for your projects. Your project teammates need to be able to read and understand your XSLT without feeling as if they’re solving a puzzle.
To demonstrate the use of these guidelines in practice, we’ve traced through the steps of creating an XSLT to convert an XML file to HTML5. For this example we’ve used one of Anton Chekhov’s letters, which you may remember from your first XML assignments. We’ve marked up the letter as simple TEI-compliant XML:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>To his Brother Mihail</title>
<author>Anton Chekhov</author>
</titleStmt>
<publicationStmt>
<publisher>Project Gutenberg</publisher>
</publicationStmt>
<sourceDesc>
<ab>This would be more thoroughly researched for a non-tutorial XML file</ab>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<div>
<opener>
<dateline>
<name key="taganrog" type="place">TAGANROG</name>
<date when="1876-07-01">July 1, 1876.</date>
</dateline>
<salute>DEAR <name key="misha" type="person">BROTHER MISHA</name>,</salute>
</opener>
<p>I got your letter when I was fearfully bored and was sitting at the gate yawning,
and so you can judge how welcome that immense letter was. Your writing is good,
and in the whole letter I have not found one mistake in spelling. But one thing
I don't like: why do you style yourself "your worthless and insignificant
brother"? You recognize your insignificance? … Recognize it before God; perhaps,
too, in the presence of beauty, intelligence, nature, but not before men. Among
men you must be conscious of your dignity. Why, you are not a rascal, you are an
honest man, aren't you? Well, respect yourself as an honest man and know that an
honest man is not something worthless. Don't confound "being humble" with
"recognizing one's worthlessness." …</p>
<p>It is a good thing that you read. Acquire the habit of doing so. In time you will
come to value that habit. <name key="beecherStowe" type="person">Madame
Beecher-Stowe</name> has wrung tears from your eyes? I read her once, and
six months ago read her again with the object of studying her—and after reading
I had an unpleasant sensation which mortals feel after eating too many raisins
or currants…. Read <title key="donQ" type="lit">"Don Quixote."</title> It is a
fine thing. It is by <name key="cervantes" type="person">Cervantes</name>, who
is said to be almost on a level with <name key="shakespeare" type="person">Shakespeare</name>.
I advise my <name key="brothersChekhov" type="person">brothers</name> to read—if
they haven't already done so—<name key="turgenev" type="person">Turgenev's</name>
<title key="hamletAndDonQ" type="lit">"Hamlet and Don Quixote."</title> You
won't understand it, my dear. If you want to read a book of travel that won't
bore you, read <name key="goncharov" type="person">Gontcharov's</name>
<title key="frigatePallada" type="lit">"The Frigate Pallada."</title></p>
<p>… I am going to bring with me a boarder who will pay twenty roubles a month and
live under our general supervision. Though even twenty roubles is not enough if
one considers the price of food in <name key="moscow" type="place">Moscow</name>
and <name key="mamaChekhova" type="person">mother's</name> weakness for feeding
boarders with righteous zeal. <note>[Footnote: This letter was written by
<name key="chekhov" type="person">Chekhov</name> when he was in the fifth
class of the <name key="taganrog" type="place">Taganrog high school</name>.]</note>
</p>
</div>
</body>
</text>
</TEI>
Our first step is to create a new XSLT file and to verify that we are creating some output. We begin by creating a new XSLT file, adjusting the boilerplate information at the top to specify that we’ll be outputting HTML5, creating a template rule to match our document node, and applying templates. We can also add the most basic structural components of HTML. We’ve deliberately made an error here, and we’ll discuss below how to fix it.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml"
version="2.0">
<xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
<xsl:template match="/">
<html>
<head>
<title>Chekhov to Mihail</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
At this point we select our two files from the dropdowns in <oXygen/>’s XSLT debugger interface and run the transformation to make sure we are getting what we want. Sure enough, the stylesheet is reading the letter as input and outputting (still very basic) HTML.
Let’s begin by adding a template that will write HTML <p> tags around our paragraphs. To do this, we create a new template that will match the TEI <p> elements in the input and apply templates, creating an HTML <p> element in the output:
<xsl:template match="p">
<p><xsl:apply-templates/></p>
</xsl:template>
When we run the transformation, we expect our input paragraphs to be output with HTML <p> tags around them. Since that doesn’t happen, we know that there’s a problem that the new template is revealing, and it turns out to be a namespace error. Our first template matched the document node (/), which isn’t an element and isn’t in a namespace, and now that we try and fail to match our first TEI element, we discover that we have forgotten to include the @xpath-default-namespace attribute of the <xsl:stylesheet> element. By testing our code piece by piece, we’ve narrowed the places we need to check for the error. Note in this case that the error isn’t in the new template, but the new template made it visible; we have to recognize that the new template is our first attempt to match an element from the input document, and that tips us off to look for a namespace error. After adding the attribute our entire program now looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml"
xpath-default-namespace="http://www.tei-c.org/ns/1.0"
version="2.0">
<xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
<xsl:template match="/">
<html>
<head>
<title>Chekhov to Mihail</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="p">
<p><xsl:apply-templates/></p>
</xsl:template>
</xsl:stylesheet>
When we rerun the transformation, our paragraphs are now wrapped properly in HTML <p> tags.
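The @xpath-default-namespace attribute is not the only way to match namespaced input. As a sketch of one alternative, we could bind the TEI namespace to a prefix and use that prefix in every match pattern and XPath expression. The prefix tei is our own choice here, and @exclude-result-prefixes keeps the unused TEI namespace declaration out of the HTML output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei"
    version="3.0">
    <xsl:template match="/">
        <html>
            <head>
                <title>Chekhov to Mihail</title>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <!-- The tei: prefix puts the element name in the TEI namespace -->
    <xsl:template match="tei:p">
        <p><xsl:apply-templates/></p>
    </xsl:template>
</xsl:stylesheet>
```

The rest of this tutorial continues to use @xpath-default-namespace, which avoids having to repeat the prefix on every element name.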
After verifying our code, we can move on to the next functionality. Our original XML file contains many references to literary works and figures, which we’ll want to style somehow in our HTML. To do this, we’ll need two new templates, one to match <name> elements, and another to match <title> elements, along the lines of:
<xsl:template match="name">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="title">
<xsl:apply-templates/>
</xsl:template>
So how do we verify that we’re finding and processing the <name> and <title> elements? As a quick check, we can wrap our <xsl:apply-templates/> elements in visual display tags to italicize or bold or color or otherwise highlight our text. Although those tags probably won’t be a part of our real output, we can use them as stubs to verify that we’re matching correctly and to remind us that we need to write real functionality later. If we wrap the output of processing <name> elements in <b> tags and the output of processing <title> elements in <i> tags, we can see that templates are firing properly in, for example, the following output snippet:
I advise my <b>brothers</b> to read—if they haven't already done so—<b>Turgenev's</b> <i>"Hamlet and Don Quixote."</i> You won't understand it, my dear. If you want to read a book of travel that won't bore you, read <b>Gontcharov's</b> <i>"The Frigate Pallada."</i>
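The two stub templates just described might look like the following sketch, shown here inside a minimal stylesheet so that it can be run on its own. The title text in the HTML <head> is made up for illustration, and the <b> and <i> wrappers are temporary.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="3.0">
    <xsl:template match="/">
        <html>
            <head>
                <title>Stub check</title>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="p">
        <p><xsl:apply-templates/></p>
    </xsl:template>
    <!-- Stub: bold shows that <name> elements are being matched -->
    <xsl:template match="name">
        <b><xsl:apply-templates/></b>
    </xsl:template>
    <!-- Stub: italics show that <title> elements are being matched -->
    <xsl:template match="title">
        <i><xsl:apply-templates/></i>
    </xsl:template>
</xsl:stylesheet>
```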
With that confirmed, this would also be a good time to add some documentation to remind ourselves that we’ll need to edit these later. We can start working with our <name> template immediately, so we can remove the <b> tags from there, and so that we don’t forget that we also need to add real functionality for processing titles, we can add a comment, so that the code block will now look something like:
<xsl:template match="name">
<xsl:apply-templates/>
</xsl:template>
<!-- To do: format titles -->
<xsl:template match="title">
<i><xsl:apply-templates/></i>
</xsl:template>
Now that we’ve removed our placeholder <b> tags, let’s turn our <name> elements into <span> elements by wrapping our <xsl:apply-templates/> in <span> tags, and to have some more specificity, let’s preserve their original @type attribute value as the value of the HTML @class attribute. Our new template rule looks like:
<xsl:template match="name">
<span class="@type"><xsl:apply-templates/></span>
</xsl:template>
Oops! Running the transformation again shows that although our <span> elements are being created properly, the value of the @class attribute is the literal string @type, instead of the value of the @type attribute on the TEI element. Since we need an attribute value template (AVT) here, we can fix the error by wrapping curly braces ({ }) around the XPath expression, along the lines of:
<xsl:template match="name">
<span class="{@type}"><xsl:apply-templates/></span>
</xsl:template>
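For comparison, the same attribute can be created without an AVT by using the <xsl:attribute> instruction, which must come before any child content of the <span>. The following is a sketch of that more verbose alternative inside a minimal stylesheet; we use the AVT form in this tutorial because it is shorter.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="3.0">
    <xsl:template match="name">
        <span>
            <!-- Copy the value of the TEI @type attribute into an HTML @class attribute -->
            <xsl:attribute name="class" select="@type"/>
            <xsl:apply-templates/>
        </span>
    </xsl:template>
</xsl:stylesheet>
```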
Now that we’re getting good output, we add a quick comment above this block of code documenting what it does:
<!-- Convert <name> tags to spans, preserving their @type value -->
<xsl:template match="name">
<span class="{@type}"><xsl:apply-templates/></span>
</xsl:template>
Returning to our <title> elements, we can replace the <i> tags with <cite> elements, which provide a richer semantic meaning, and adjust our comment accordingly, to get something like this:
<!-- Add <cite> tags around titles -->
<xsl:template match="title">
<cite>
<xsl:apply-templates/>
</cite>
</xsl:template>
Finally, for practice, let’s make a list of all of the individuals referenced or addressed in the letter using a modal template rule for our <name> elements. We can begin as in the Modal XSLT tutorial by creating an <h2> header and an <xsl:apply-templates/> element with a @mode attribute value of toc inside a <ul> element in the body of the document, telling it to select all <name> elements with a @type attribute that has the value person:
<body>
<h2>Referenced individuals:</h2>
<ul>
<xsl:apply-templates select="//name[@type='person']" mode="toc"/>
</ul>
<xsl:apply-templates/>
</body>
When we run it, our program spits out all of the names of people at the top of the page inside of a set of <ul> tags, so we know that we’re selecting them correctly, and we can move on to formatting them. We create a new template rule with @match="name" and @mode="toc" attributes to correspond to the <xsl:apply-templates/> element above. Inside this rule, we create an <li> element for each name in our list and apply templates inside it:
<xsl:template match="name" mode="toc">
<li>
<xsl:apply-templates/>
</li>
</xsl:template>
The output will look like:
<h2>Referenced individuals:</h2>
<ul>
<li>BROTHER MISHA</li>
<li>Madame Beecher-Stowe</li>
<li>Cervantes</li>
<li>Shakespeare</li>
<li>brothers</li>
<li>Turgenev's</li>
<li>Gontcharov's</li>
<li>mother's</li>
<li>Chekhov</li>
</ul>
This is close to what we want, but we can improve it. For consistency, let’s convert all of the names to lower-case characters (in Real Life we would capitalize just the first letter of each part of a person’s title and name, and we’d get rid of the possessive endings where they occur). We can do this by replacing the <xsl:apply-templates/> element in our modal template rule with <xsl:value-of select="lower-case(.)"/>. Let’s also sort the list by adding an <xsl:sort select="lower-case(.)"/> element inside the <xsl:apply-templates> element in the body. We can now run the code again to get a more attractive list in alphabetical order.
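As a partial step toward the Real Life clean-up mentioned above, the following sketch normalizes white space inside each name and strips a trailing possessive ending, while still lower-casing the result. It assumes that possessives are written with a straight apostrophe followed by s at the very end of the name, as in our sample letter; it does not attempt the capitalization.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="3.0">
    <xsl:template match="/">
        <html>
            <head>
                <title>Referenced individuals</title>
            </head>
            <body>
                <ul>
                    <xsl:apply-templates select="//name[@type = 'person']" mode="toc">
                        <!-- Sort on the cleaned-up form so that line breaks do not affect the order -->
                        <xsl:sort select="lower-case(normalize-space(.))"/>
                    </xsl:apply-templates>
                </ul>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="name" mode="toc">
        <li>
            <!-- In XPath, a doubled apostrophe inside a string literal stands for one
                 apostrophe, so the regular expression below matches 's at the end -->
            <xsl:value-of select="replace(lower-case(normalize-space(.)), '''s$', '')"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

With this rule a name such as Turgenev's is listed as turgenev, and a name that is split across lines in the input is listed on a single line. The complete stylesheet below keeps the simpler lower-case(.) version.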
We can now take stock of our entire stylesheet, which looks something like the following (a few extra features not discussed in the tutorial have been added for formatting purposes):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml" xpath-default-namespace="http://www.tei-c.org/ns/1.0"
version="2.0">
<xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<html>
<head>
<title>Chekhov to Mihail</title>
</head>
<body>
<h2>Referenced Individuals:</h2>
<ul>
<!-- Modal template to create a sorted list of individuals referenced in the letter -->
<xsl:apply-templates select="//name[@type = 'person']" mode="toc">
<xsl:sort select="lower-case(.)"/>
</xsl:apply-templates>
</ul>
<h2>Contents</h2>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<!-- Convert metadata to <p> elements -->
<xsl:template match="titleStmt/title">
<p>
<cite>
<xsl:apply-templates/>
</cite>
</p>
</xsl:template>
<xsl:template match="author">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
<xsl:template match="publisher">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
<xsl:template match="ab"/>
<xsl:template match="dateline">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
<xsl:template match="salute">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
<!-- Preserve paragraphs -->
<xsl:template match="p">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
<!-- Convert <name> tags to spans, preserving their @type attribute values -->
<xsl:template match="name">
<span class="{@type}">
<xsl:apply-templates/>
</span>
</xsl:template>
<!-- Tag titles as <cite> elements -->
<xsl:template match="title">
<cite>
<xsl:apply-templates/>
</cite>
</xsl:template>
<!-- Convert name to lower case for list of referenced persons -->
<xsl:template match="name" mode="toc">
<li>
<xsl:value-of select="lower-case(.)"/>
</li>
</xsl:template>
</xsl:stylesheet>
Our output is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
SYSTEM "about:legacy-compat">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Chekhov to Mihail</title>
</head>
<body>
<h2>Referenced Individuals:</h2>
<ul>
<li>brother misha</li>
<li>brothers</li>
<li>cervantes</li>
<li>chekhov</li>
<li>gontcharov's</li>
<li>madame
beecher-stowe</li>
<li>mother's</li>
<li>shakespeare</li>
<li>turgenev's</li>
</ul>
<h2>Contents</h2>
<p>
<cite>To his Brother Mihail</cite>
</p>
<p>Anton Chekhov</p>
<p>Project Gutenberg</p>
<p>
<span class="place">TAGANROG</span>July 1, 1876.</p>
<p>DEAR <span class="person">BROTHER MISHA</span>,</p>
<p>I got your letter when I was fearfully bored and was sitting at the gate yawning,
and so you can judge how welcome that immense letter was. Your writing is good,
and in the whole letter I have not found one mistake in spelling. But one thing
I don't like: why do you style yourself "your worthless and insignificant
brother"? You recognize your insignificance? … Recognize it before God; perhaps,
too, in the presence of beauty, intelligence, nature, but not before men. Among
men you must be conscious of your dignity. Why, you are not a rascal, you are an
honest man, aren't you? Well, respect yourself as an honest man and know that an
honest man is not something worthless. Don't confound "being humble" with
"recognizing one's worthlessness." …</p>
<p>It is a good thing that you read. Acquire the habit of doing so. In time you will
come to value that habit. <span class="person">Madame
Beecher-Stowe</span> has wrung tears from your eyes? I read her once, and
six months ago read her again with the object of studying her—and after reading
I had an unpleasant sensation which mortals feel after eating too many raisins
or currants…. Read <cite>"Don Quixote."</cite> It is a
fine thing. It is by <span class="person">Cervantes</span>, who
is said to be almost on a level with <span class="person">Shakespeare</span>.
I advise my <span class="person">brothers</span> to read—if they haven't already done
so—<span class="person">Turgenev's</span> <cite>"Hamlet and Don Quixote."</cite> You
won't understand it, my dear. If you want to read a book of travel that won't
bore you, read <span class="person">Gontcharov's</span>
<cite>"The Frigate Pallada."</cite>
</p>
<p>… I am going to bring with me a boarder who will pay twenty roubles a month and
live under our general supervision. Though even twenty roubles is not enough if
one considers the price of food in <span class="place">Moscow</span>
and <span class="person">mother's</span> weakness for feeding
boarders with righteous zeal. [Footnote: This letter was written by
<span class="person">Chekhov</span> when he was in the fifth
class of the <span class="place">Taganrog high school</span>.]</p>
</body>
</html>
The errors that we made deliberately in this tutorial are similar to those that we make by accident during real development. By building one small component at a time and testing frequently, we were able to find and correct our errors quickly. Testing frequently may seem like extra work, but that’s true only if you never make a mistake, and in our experience, your development process will be more robust and productive if you 1) make sure you can always read input and write output; 2) add functionality one step at a time, developing and testing in small cycles; 3) use stubs; and 4) document your code.