Digital humanities


Author: Andrew Nitz (acn23@pitt.edu) Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2016-03-05T19:31:02+0000


Developing an XSLT Stylesheet

Introduction

While developing an XSLT stylesheet, we observe the following guidelines, which keep our overall goals in mind as we develop working components piece by piece. This helps us to write more concise and accurate code, while also reducing time spent troubleshooting. We recommend re-reading this page as a reminder before you begin to write an XSLT stylesheet, until you feel that you’ve internalized the model.

Input in, output out

Start by making sure that your XSLT can read your input file and generate output, even if the output is just a placeholder. As you continue to run your transformation each time you add a new step, it should continue to read input and generate output (even if the output continues to include placeholder text). Any time you can’t read input and generate output, fix the problem before you do anything else. It’s easier to fix a problem as soon as you learn about it than to try to track it down in an ocean of new code.
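
A minimal sketch of such a starting point, assuming XSLT 2.0 and HTML output (the title and the placeholder wording are our own illustration, not part of any real project): the single template matches the document node and writes a fixed page, so any well-formed input file should produce the same output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  version="2.0">
  <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
  <!-- Match the document node and emit a fixed page; nothing from the input is used yet -->
  <xsl:template match="/">
    <html>
      <head>
        <title>Placeholder title</title>
      </head>
      <body>
        <p>Output to go here</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Once this runs cleanly, each later step replaces a little of the placeholder with real processing.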

Add functionality one step at a time

This is the most important guideline for avoiding tedious, confusing, and unproductive troubleshooting. When writing XSLT, you should add one bit of functionality at a time, and then run the transformation to verify that the new code works before moving on to the next step. You need to watch out for two types of problems:

  1. Your new code doesn’t do what you think it should do.
  2. Something unrelated to the new code that used to work stops working when you add the new code. This is called a regression.

The point of the coding, testing, and debugging in small cycles is that it’s easiest to find and fix mistakes when you’ve written only a few new lines of code since the last cycle. For example, if you are trying to apply templates to all <p> tags in an XML document and format them in some way, it’s a good idea first, before you think about applying the formatting, to ensure that you are actually finding the <p> tags correctly. You can do that by writing a template that matches <p> elements and just outputs some placeholder text, and once that works, you can replace the placeholder with more refined code that processes them the way you want. And when you do that formatting, you need to test each feature as you add it, instead of writing the entire block and checking only then. If you do the latter and something doesn’t work, you’ll have set yourself up for painfully confusing debugging.
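
As a sketch of that first check (the placeholder wording is hypothetical), a template that only proves the match might look like the following. It assumes that the rest of the stylesheet applies templates so that the <p> elements are reached.

```xml
<!-- Stub: confirm that the template matches <p> before adding any formatting -->
<xsl:template match="p">
  <p>Paragraph found</p>
</xsl:template>
```

If the placeholder appears once for each input paragraph, the match works and the formatting can be added one feature at a time. If it does not appear, the match pattern, or the path by which templates are applied, is the place to look.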

Use stubs

A stub, in coding terminology, is a snippet of code that stands in for something that will be developed later; it is basically a placeholder for functionality that has not been written yet. You want to be coding only one piece of functionality at a time, but sometimes you’ll want to use stubs to help keep your overall goals in mind while working in different sections. For example, if you have a template that will eventually output a table of contents, initially it might just output plain text that says “Table of contents to go here”. This lets you verify, in the output, that you’re calling the template and that it’s returning output.
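
One way to sketch the table-of-contents stub is with a named template; the name toc and the page title are assumptions made for illustration. Both templates belong inside an <xsl:stylesheet> element.

```xml
<xsl:template match="/">
  <html>
    <head>
      <title>Stub example</title>
    </head>
    <body>
      <!-- Call the stub so that we can see in the output that it runs -->
      <xsl:call-template name="toc"/>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>
<!-- Stub: replace the placeholder paragraph with a real table of contents later -->
<xsl:template name="toc">
  <p>Table of contents to go here</p>
</xsl:template>
```

Because the call is already in place, replacing the stub later changes only the body of the named template.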

Document your code

Unless the purpose of a piece of code is self-documenting (obvious, self-explanatory), describe its purpose inside an XML comment. (XML comments start with <!-- and end with -->, and they can contain anything, including markup, except two consecutive hyphens.) This helps to keep your code organized for your own use and makes it easier to collaborate with others while working on XSLT for your projects. Your project teammates need to be able to read and understand your XSLT without feeling as if they’re solving a puzzle.
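
For illustration, a commented template rule might look like the following sketch (the wording of the comments is our own):

```xml
<!-- Wrap each input <p> in an HTML <p>; markup such as <p> is allowed inside a comment -->
<xsl:template match="p">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>
<!-- Two consecutive hyphens are not allowed inside a comment, so a phrase such as
     "to do, then test" must be punctuated with something other than a double hyphen -->
```

A short statement of intent is usually enough; the comment should say why the rule exists, not repeat what the code already shows.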

In practice

To demonstrate the use of these guidelines in practice, we’ve traced through the steps of creating an XSLT to convert an XML file to HTML5. For this example we’ve used one of Anton Chekhov’s letters, which you may remember from your first XML assignments. We’ve marked up the letter as simple TEI-compliant XML:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>To his Brother Mihail</title>
        <author>Anton Chekhov</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Project Gutenberg</publisher>
      </publicationStmt>
      <sourceDesc>
        <ab>This would be more thoroughly researched for a non-tutorial XML file</ab>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <opener>
          <dateline>
            <name key="taganrog" type="place">TAGANROG</name>
            <date when="1876-07-01">July 1, 1876.</date>
          </dateline>
          <salute>DEAR <name key="misha" type="person">BROTHER MISHA</name>,</salute>
        </opener>
        <p>I got your letter when I was fearfully bored and was sitting at the gate yawning,
          and so you can judge how welcome that immense letter was. Your writing is good,
          and in the whole letter I have not found one mistake in spelling. But one thing
          I don't like: why do you style yourself "your worthless and insignificant
          brother"? You recognize your insignificance? … Recognize it before God; perhaps,
          too, in the presence of beauty, intelligence, nature, but not before men. Among
          men you must be conscious of your dignity. Why, you are not a rascal, you are an
          honest man, aren't you? Well, respect yourself as an honest man and know that an
          honest man is not something worthless. Don't confound "being humble" with
          "recognizing one's worthlessness." …</p>
          
        <p>It is a good thing that you read. Acquire the habit of doing so. In time you will
          come to value that habit. <name key="beecherStowe" type="person">Madame
            Beecher-Stowe</name> has wrung tears from your eyes? I read her once, and
          six months ago read her again with the object of studying her—and after reading
          I had an unpleasant sensation which mortals feel after eating too many raisins
          or currants…. Read <title key="donQ" type="lit">"Don Quixote."</title> It is a
          fine thing. It is by <name key="cervantes" type="person">Cervantes</name>, who
          is said to be almost on a level with <name key="shakespeare" type="person">Shakespeare</name>.
          I advise my <name key="brothersChekhov" type="person">brothers</name> to read—if 
          they haven't already done so—<name key="turgenev" type="person">Turgenev's</name>
          <title key="hamletAndDonQ" type="lit">"Hamlet and Don Quixote."</title> You
          won't understand it, my dear. If you want to read a book of travel that won't
          bore you, read <name key="goncharov" type="person">Gontcharov's</name>
          <title key="frigatePallada" type="lit">"The Frigate Pallada."</title></p>
          
        <p>… I am going to bring with me a boarder who will pay twenty roubles a month and
          live under our general supervision. Though even twenty roubles is not enough if
          one considers the price of food in <name key="moscow" type="place">Moscow</name>
          and <name key="mamaChekhova" type="person">mother's</name> weakness for feeding
          boarders with righteous zeal. <note>[Footnote: This letter was written by 
          <name key="chekhov" type="person">Chekhov</name> when he was in the fifth
          class of the <name key="taganrog" type="place">Taganrog high school</name>.]</note>
        </p>
      </div>
    </body>
  </text>
</TEI>

How to begin

Our first step is to create a new XSLT file and verify that it produces some output. We adjust the boilerplate information at the top to specify that we’ll be outputting HTML5, create a template rule that matches the document node, and apply templates inside the most basic structural components of an HTML page. We’ve deliberately made an error here, and we’ll discuss below how to fix it.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  version="2.0">
  <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>  
  <xsl:template match="/">
    <html>
      <head>
        <title>Chekhov to Mihail</title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

At this point we select our two files from the dropdowns in <oXygen/>’s XSLT debugger interface and run the transformation to make sure we are getting what we want. Sure enough, the stylesheet is reading the letter as input and outputting (still very basic) HTML.

Beginning to add functionality

Let’s begin by adding a template that will write HTML <p> tags around our paragraphs. To do this, we create a new template that will match the TEI <p> elements in the input and apply templates, creating an HTML <p> element in the output:

<xsl:template match="p">
  <p><xsl:apply-templates/></p>
</xsl:template>

When we run the transformation, we expect our input paragraphs to be output with HTML <p> tags around them. That doesn’t happen, which tells us that the new template has revealed a problem, and it turns out to be a namespace error. Our first template matched the document node (/), which isn’t an element and isn’t in a namespace. Now that we try, and fail, to match our first TEI element, we discover that we forgot to include the @xpath-default-namespace attribute on the <xsl:stylesheet> element. By testing our code piece by piece, we’ve narrowed the places we need to check for the error. Note that in this case the error isn’t in the new template, but the new template made it visible: because the new template is our first attempt to match an element from the input document, its failure tips us off to look for a namespace error. After adding the attribute, our entire program now looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="2.0">
  <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Chekhov to Mihail</title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>

When we rerun the transformation, our paragraphs are now wrapped properly in HTML <p> tags.
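
An alternative sketch, which is not the approach used in this tutorial: instead of @xpath-default-namespace, we could bind the TEI namespace to a prefix (tei here, an arbitrary choice) and use that prefix in every match pattern and XPath expression that refers to a TEI element. The @exclude-result-prefixes attribute keeps the TEI namespace declaration out of the HTML output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:tei="http://www.tei-c.org/ns/1.0"
  exclude-result-prefixes="tei"
  version="2.0">
  <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Chekhov to Mihail</title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <!-- The tei: prefix makes the pattern match <p> elements in the TEI namespace -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

The prefix approach is more verbose, but it makes the namespace of each matched element explicit. The rest of this tutorial continues to use @xpath-default-namespace.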

Continuing the development

After verifying our code, we can move on to the next functionality. Our original XML file contains many references to literary works and figures, which we’ll want to style somehow in our HTML. To do this, we’ll need two new templates, one to match <name> elements, and another to match <title> elements, along the lines of:

<xsl:template match="name">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="title">
  <xsl:apply-templates/>
</xsl:template>

So how do we verify that we’re finding and processing the <name> and <title> elements? As a quick check, we can wrap our <xsl:apply-templates/> elements in presentational HTML tags that bold, italicize, color, or otherwise highlight our text. Although those tags probably won’t be part of our real output, we can use them as stubs that both verify that we’re matching correctly and remind us that we need to write real functionality later. If we wrap the output of processing <name> elements in <b> tags and the output of processing <title> elements in <i> tags, we can see that the templates are firing properly in, for example, the following output snippet:

I advise my <b>brothers</b> to read—if they haven't already done so—<b>Turgenev's</b> <i>"Hamlet and Don
Quixote."</i> You won't understand it, my dear. If you want to read a book of travel that won't
bore you, read <b>Gontcharov's</b> <i>"The Frigate Pallada."</i>
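
The stub templates behind this check, with <b> around names and <i> around titles as described above, might look like this (the comments are our own):

```xml
<!-- Stub: bold shows where <name> elements are matched -->
<xsl:template match="name">
  <b><xsl:apply-templates/></b>
</xsl:template>
<!-- Stub: italics show where <title> elements are matched -->
<xsl:template match="title">
  <i><xsl:apply-templates/></i>
</xsl:template>
```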

With that confirmed, this is a good time to add some documentation to remind ourselves that we’ll need to edit these templates later. We can start working on our <name> template immediately, so we remove the <b> tags from it. So that we don’t forget to add real functionality for processing titles, we add a comment above the <title> template, and the code block now looks something like:

<xsl:template match="name">
  <xsl:apply-templates/>
</xsl:template>
<!-- To do: format titles -->
<xsl:template match="title">
  <i><xsl:apply-templates/></i>
</xsl:template>

Now that we’ve removed our placeholder <b> tags, let’s turn our <name> elements into <span> tags by wrapping our <xsl:apply-templates/> in <span> tags, and to have some more specificity, let’s preserve their original @type attribute value as the value of the HTML @class attribute. Our new template rule looks like:

<xsl:template match="name">
  <span class="@type"><xsl:apply-templates/></span>
</xsl:template>

Oops! Running the transformation again shows that although our <span> elements are being created properly, the value of the @class attribute is the literal string @type instead of the value of the @type attribute on the TEI <name> element. To have the XPath expression evaluated, we need an attribute value template (AVT), which we create by wrapping curly braces ({ }) around the expression, along the lines of:

<xsl:template match="name">
  <span class="{@type}"><xsl:apply-templates/></span>
</xsl:template>

Now that we’re getting good output, we add a quick comment above this block of code documenting what it does:

<!-- Convert <name> tags to spans, preserving their @type value -->        
<xsl:template match="name">
  <span class="{@type}"><xsl:apply-templates/></span>
</xsl:template>
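
For comparison, here is a sketch of a more verbose way to build the same attribute, using <xsl:attribute> instead of an AVT. The @select attribute on <xsl:attribute> requires XSLT 2.0 or later, which matches the version declared in our stylesheet.

```xml
<xsl:template match="name">
  <span>
    <!-- Same result as class="{@type}": the select expression is evaluated as XPath -->
    <xsl:attribute name="class" select="@type"/>
    <xsl:apply-templates/>
  </span>
</xsl:template>
```

The AVT is shorter and is what we use in this tutorial; <xsl:attribute> tends to be more useful when the attribute should be created only under some condition.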

Returning to our <title> tags, we can replace the <i> tag with <cite> elements, which provide a richer semantic meaning, and adjust our comment accordingly, to get something like this:

<!-- Add <cite> tags around titles -->
<xsl:template match="title">
  <cite>
    <xsl:apply-templates/>
  </cite>
</xsl:template>

More practice

Finally, for practice, let’s make a list of all of the individuals referenced or addressed in the letter using a modal template rule for our <name> elements. We can begin as in the Modal XSLT tutorial by creating an <h2> header and an <xsl:apply-templates/> tag with a @mode attribute value of toc inside a <ul> element in the body of the document, telling it to select all <name> elements with a @type attribute that has the value person:

<body>
  <h2>Referenced individuals:</h2>
  <ul>
    <xsl:apply-templates select="//name[@type='person']" mode="toc"/>
  </ul>
  <xsl:apply-templates/>
</body>

When we run it, our program spits out all of the names of people at the top of the page inside of a set of <ul> tags, so we know that we’re selecting them correctly, and we can move on to formatting them. We create a new template rule with @match="name" and @mode="toc" attributes to correspond to the <xsl:apply-templates/> element above. Inside this rule, we create an <li> element for each name in our list and apply templates inside it:

<xsl:template match="name" mode="toc">
  <li>
    <xsl:apply-templates/>
  </li>
</xsl:template>

The output will look like:

<h2>Referenced individuals:</h2>
<ul>
  <li>BROTHER MISHA</li>
  <li>Madame Beecher-Stowe</li>
  <li>Cervantes</li>
  <li>Shakespeare</li>
  <li>brothers</li>
  <li>Turgenev's</li>
  <li>Gontcharov's</li>
  <li>mother's</li>
  <li>Chekhov</li>
</ul>

This is close to what we want, but we can improve it. For consistency, let’s convert all of the names to lower case (in Real Life we would capitalize just the first letter of each part of a person’s title and name, and we’d get rid of the possessive endings where they occur). We can do this by replacing the <xsl:apply-templates/> element in our modal template rule with <xsl:value-of select="lower-case(.)"/>. Let’s also sort the list by adding an <xsl:sort select="lower-case(.)"/> element inside the <xsl:apply-templates> element in the body, which means replacing the empty <xsl:apply-templates/> tag with a start tag and an end tag that surround the <xsl:sort> element. We can now run the code again to get a more consistent list in alphabetical order.
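
The two changes just described can be sketched as follows (the comments are our own):

```xml
<!-- In the template for the document node: sort the selected names alphabetically -->
<ul>
  <xsl:apply-templates select="//name[@type='person']" mode="toc">
    <xsl:sort select="lower-case(.)"/>
  </xsl:apply-templates>
</ul>

<!-- Modal rule: output each name in lower case as a list item -->
<xsl:template match="name" mode="toc">
  <li>
    <xsl:value-of select="lower-case(.)"/>
  </li>
</xsl:template>
```

Sorting on lower-case(.) rather than on the raw value keeps the order independent of capitalization in the input.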

We can now take stock of our entire stylesheet, which looks something like the following (a few extra features not discussed in the tutorial have been added for formatting purposes):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml" xpath-default-namespace="http://www.tei-c.org/ns/1.0"
  version="2.0">
  <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Chekhov to Mihail</title>
      </head>
      <body>
        <h2>Referenced Individuals:</h2>
        <ul>
          <!-- Apply templates in toc mode to create a sorted list of individuals referenced in the letter -->
          <xsl:apply-templates select="//name[@type = 'person']" mode="toc">
            <xsl:sort select="lower-case(.)"/>
          </xsl:apply-templates>
        </ul>
        <h2>Contents</h2>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <!-- Convert metadata to <p> elements -->
  <xsl:template match="titleStmt/title">
    <p>
      <cite>
        <xsl:apply-templates/>
      </cite>
    </p>
  </xsl:template>
  <xsl:template match="author">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <xsl:template match="publisher">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <xsl:template match="ab"/>
  <xsl:template match="dateline">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <xsl:template match="salute">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <!-- Preserve paragraphs -->
  <xsl:template match="p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>
  <!-- Convert <name> tags to spans, preserving their @type attribute values -->
  <xsl:template match="name">
    <span class="{@type}">
      <xsl:apply-templates/>
    </span>
  </xsl:template>
  <!-- Tag titles as <cite> elements -->
  <xsl:template match="title">
    <cite>
      <xsl:apply-templates/>
    </cite>
  </xsl:template>
  <!-- Convert name to lower case for list of referenced persons -->
  <xsl:template match="name" mode="toc">
    <li>
      <xsl:value-of select="lower-case(.)"/>
    </li>
  </xsl:template>
</xsl:stylesheet>

Our output is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  SYSTEM "about:legacy-compat">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Chekhov to Mihail</title>
    </head>
    <body>
        <h2>Referenced Individuals:</h2>
        <ul>
            <li>brother misha</li>
            <li>brothers</li>
            <li>cervantes</li>
            <li>chekhov</li>
            <li>gontcharov's</li>
            <li>madame
                beecher-stowe</li>
            <li>mother's</li>
            <li>shakespeare</li>
            <li>turgenev's</li>
        </ul>
        <h2>Contents</h2>
        <p>
            <cite>To his Brother Mihail</cite>
        </p>
        <p>Anton Chekhov</p>
        <p>Project Gutenberg</p>
        <p>
            <span class="place">TAGANROG</span>July 1, 1876.</p>
        <p>DEAR <span class="person">BROTHER MISHA</span>,</p>
        <p>I got your letter when I was fearfully bored and was sitting at the gate yawning,
            and so you can judge how welcome that immense letter was. Your writing is good,
            and in the whole letter I have not found one mistake in spelling. But one thing
            I don't like: why do you style yourself "your worthless and insignificant
            brother"? You recognize your insignificance? … Recognize it before God; perhaps,
            too, in the presence of beauty, intelligence, nature, but not before men. Among
            men you must be conscious of your dignity. Why, you are not a rascal, you are an
            honest man, aren't you? Well, respect yourself as an honest man and know that an
            honest man is not something worthless. Don't confound "being humble" with
            "recognizing one's worthlessness." …</p>
        <p>It is a good thing that you read. Acquire the habit of doing so. In time you will
            come to value that habit. <span class="person">Madame
                Beecher-Stowe</span> has wrung tears from your eyes? I read her once, and
            six months ago read her again with the object of studying her—and after reading
            I had an unpleasant sensation which mortals feel after eating too many raisins
            or currants…. Read <cite>"Don Quixote."</cite> It is a
            fine thing. It is by <span class="person">Cervantes</span>, who
            is said to be almost on a level with <span class="person">Shakespeare</span>. 
            I advise my <span class="person">brothers</span> to read—if they haven't already done 
            so—<span class="person">Turgenev's</span> <cite>"Hamlet and Don Quixote."</cite> You
            won't understand it, my dear. If you want to read a book of travel that won't
            bore you, read <span class="person">Gontcharov's</span>
            <cite>"The Frigate Pallada."</cite>
        </p>
        <p>… I am going to bring with me a boarder who will pay twenty roubles a month and
            live under our general supervision. Though even twenty roubles is not enough if
            one considers the price of food in <span class="place">Moscow</span>
            and <span class="person">mother's</span> weakness for feeding
            boarders with righteous zeal. [Footnote: This letter was written by 
            <span class="person">Chekhov</span> when he was in the fifth
            class of the <span class="place">Taganrog high school</span>.]</p>
    </body>
</html>
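
One detail in the list deserves a note: the entry for madame beecher-stowe is split across two lines, which appears to come from the line break inside the corresponding <name> element in the input. A possible refinement, offered as a sketch and not as part of the stylesheet above, is to wrap the value in normalize-space(), which collapses each run of whitespace into a single space:

```xml
<!-- Convert name to lower case and collapse internal whitespace for the list -->
<xsl:template match="name" mode="toc">
  <li>
    <xsl:value-of select="lower-case(normalize-space(.))"/>
  </li>
</xsl:template>
```

The same expression could be used as the sort key in <xsl:sort>, so that names are ordered by the value that is actually displayed.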

In conclusion

The errors that we made deliberately in this tutorial are similar to those that we make by accident during real development. By building one small component at a time and testing frequently, we were able to find and correct our errors quickly. Testing frequently may seem like extra work, but that’s true only if you never make a mistake, and in our experience, your development process will be more robust and productive if you 1) make sure you can always read input and write output; 2) add functionality one step at a time, developing and testing in small cycles; 3) use stubs; and 4) document your code.