Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2016-10-11T21:14:50+0000


XSLT assignment #1

The assignment

Your assignment is to create an XSLT stylesheet that will transform Bad Hamlet into a hierarchical outline of the titles of acts and scenes in HTML. This isn’t very interesting on its own, of course, but if you were transforming the entire document into HTML for publication on the web, this might serve as the skeleton. It might also stand on its own as a table of contents at the top of such a publication, so that the reader could click on the title of a scene to jump to that location in the file.

If you’re feeling adventurous, you’re welcome to include more information, whether of a publication-oriented sort (e.g., speakers, speeches, stage directions, etc., as if you were publishing the entire play) or as a foray into exploration and analysis (e.g., a list of the characters who speak in each scene, perhaps with a count of their speeches, the length of their speeches, etc.). The only required content of your homework, though, is the HTML outline of act and scene titles, which a browser would render as a nested bulleted list, with the scenes listed under their acts.

The underlying HTML, which we generated using XSLT, is:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Hamlet</title>
   </head>
   <body>
      <ul>
         <li>Act 1
            <ul>
               <li>Act 1, Scene 1</li>
               <li>Act 1, Scene 2</li>
               <li>Act 1, Scene 3</li>
               <li>Act 1, Scene 4</li>
               <li>Act 1, Scene 5</li>
            </ul>
         </li>
         <li>Act 2
            <ul>
               <li>Act 2, Scene 1</li>
               <li>Act 2, Scene 2</li>
            </ul>
         </li>
         <li>Act 3
            <ul>
               <li>Act 3, Scene 1</li>
               <li>Act 3, Scene 2</li>
               <li>Act 3, Scene 3</li>
               <li>Act 3, Scene 4</li>
            </ul>
         </li>
         <li>Act 4
            <ul>
               <li>Act 4, Scene 1</li>
               <li>Act 4, Scene 2</li>
               <li>Act 4, Scene 3</li>
               <li>Act 4, Scene 4</li>
               <li>Act 4, Scene 5</li>
               <li>Act 4, Scene 6</li>
               <li>Act 4, Scene 7</li>
            </ul>
         </li>
         <li>Act 5
            <ul>
               <li>Act 5, Scene 1</li>
               <li>Act 5, Scene 2</li>
            </ul>
         </li>
      </ul>
   </body>
</html>

We’ve used HTML unordered list (<ul>) elements. The only content allowed inside a <ul> element is list items (<li>), and we’ve nested them, so that each list item that represents an act contains the title of that act followed by an embedded <ul> that contains, in turn, a separate list item for the title of each scene. This isn’t the only way to format this type of outline, and you’re welcome to take a different approach. For example, if you’d like to include the full text of the play, that is, the stage directions and speeches, the embedded list format isn’t really appropriate. In that case, we would use the HTML header elements (<h1> through <h6>) to create hierarchical headers. You can read more about HTML lists and headers at http://www.w3schools.com.
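As a rough illustration of the header-based alternative, the output might begin something like the fragment below. The mapping of levels (<h1> for the play, <h2> for acts, <h3> for scenes) is our assumption for this sketch, not a requirement, and the comments mark where the content of each scene would go:

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title>Hamlet</title>
   </head>
   <body>
      <!-- Assumed mapping: h1 = play, h2 = act, h3 = scene -->
      <h1>Hamlet</h1>
      <h2>Act 1</h2>
      <h3>Act 1, Scene 1</h3>
      <!-- stage directions and speeches of Act 1, Scene 1 would go here -->
      <h3>Act 1, Scene 2</h3>
      <!-- stage directions and speeches of Act 1, Scene 2 would go here -->
   </body>
</html>
```

Unlike the nested lists, the headers here are siblings of the content they introduce, so the hierarchy is expressed by the header level rather than by nesting.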

Before you begin

Bad Hamlet is in the TEI namespace, which means that your XSLT stylesheet must include an instruction specifying that when it tries to match elements, it needs to match them in that namespace. When you create a new XSLT document in <oXygen/> it won’t contain that instruction, so you need to add it (the @xpath-default-namespace attribute, in blue below). To ensure that the output would be in the XHTML namespace, we added a default namespace declaration (the @xmlns attribute, in fuchsia below). To output the required DOCTYPE declaration, we also created an <xsl:output> element as the first child of our root <xsl:stylesheet> element (in green below). Our modified skeleton looks like the following:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" 
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" doctype-system="about:legacy-compat"/>
    
</xsl:stylesheet>
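As a point of comparison (this is not part of our solution), the same namespace matching could be achieved without @xpath-default-namespace by binding a prefix to the TEI namespace and repeating it on every element name in your XPath expressions. The prefix tei: below is our own choice, and the template rule is included only to show the prefix in use:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">
    <xsl:output method="xml" doctype-system="about:legacy-compat"/>

    <!-- Illustration only: every element name in a match or select
         expression must carry the tei: prefix -->
    <xsl:template match="tei:head">
        <xsl:apply-templates/>
    </xsl:template>

</xsl:stylesheet>
```

Because the prefix has to be written on every element name, the @xpath-default-namespace approach in the skeleton above is usually less error-prone when all of the input is in a single namespace.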

Guide to approaching the problem

Our XSLT transformation (after all this housekeeping) has three template rules:

  1. We have a template rule for the document node (<xsl:template match="/">), in which we create the basic HTML output: the <html> element, <head> and its contents, and <body>. Inside the <body> element that we’re creating, we create the outer <ul> element, and inside that we use <xsl:apply-templates> and select the acts (using an XPath expression as the value of the @select attribute).
  2. We have a separate template rule that matches acts, so it will be invoked as a result of the preceding <xsl:apply-templates> instruction, and will fire once for each act. Inside that template rule we create a new list item (<li>) for the act being processed and inside the tags for that new list item we do two things. First, we apply templates to the <head> for the act, which will eventually cause its title to be output. Second, we create wrapper <ul> tags for the nested list that will contain the titles of the scenes. Inside that new <ul> element, we use an <xsl:apply-templates> instruction to apply templates to (that is, to process) the scenes of that act.
  3. We have a separate template rule that matches scenes. It creates a list item (<li>) for the scene and, inside it, just applies templates to the <head> element of that scene, which ultimately causes the textual content of the <head> element to be output. This rule will fire once for each scene in the play, and it will be called separately for the scenes of each act, so that the scenes will be rendered properly under their acts.
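The three rules above can be sketched roughly as follows. This is one possible shape, not the only correct answer. In particular, the patterns div[@type='act'] and div[@type='scene'] are assumptions made for this sketch: check the markup of Bad Hamlet itself and substitute whatever elements and attributes it actually uses for acts and scenes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" doctype-system="about:legacy-compat"/>

    <!-- Rule 1: the document node creates the HTML superstructure -->
    <xsl:template match="/">
        <html>
            <head>
                <title>Hamlet</title>
            </head>
            <body>
                <ul>
                    <!-- ASSUMPTION: acts are encoded as div type="act" -->
                    <xsl:apply-templates select="//div[@type='act']"/>
                </ul>
            </body>
        </html>
    </xsl:template>

    <!-- Rule 2: fires once per act; outputs the act title and a nested list -->
    <xsl:template match="div[@type='act']">
        <li>
            <xsl:apply-templates select="head"/>
            <ul>
                <!-- ASSUMPTION: scenes are div type="scene" children of the act -->
                <xsl:apply-templates select="div[@type='scene']"/>
            </ul>
        </li>
    </xsl:template>

    <!-- Rule 3: fires once per scene; outputs the scene title as a list item -->
    <xsl:template match="div[@type='scene']">
        <li>
            <xsl:apply-templates select="head"/>
        </li>
    </xsl:template>

</xsl:stylesheet>
```

Note that the <head> written inside <html> is a literal XHTML element in the output, while the head in the @select attributes is an XPath step that matches the TEI <head> element in the input. The two never collide, because @xpath-default-namespace affects only XPath expressions and patterns, and the default namespace declaration affects only the elements we create.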

We don’t need a template rule for the <head> elements themselves because the built-in (default) template rule in XSLT for an element that doesn’t have an explicit, specified rule is just to apply templates to its children. The only child of the <head> elements is a text node, and the built-in rule for text nodes is to output them literally. In other words, if you apply templates to <head> and you don’t have a template rule that matches that element, ultimately the transformation will just output the textual content of the head, that is, the title that you want.
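For reference, the built-in rules described above behave approximately as if the following templates were present. We show them in a standalone stylesheet purely for illustration; you would not normally write them out, and they are not meant to be added to your solution:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

    <!-- Built-in behavior for the document node and for elements:
         output nothing for the node itself, just process its children -->
    <xsl:template match="*|/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Built-in behavior for text nodes (and attributes):
         copy the string value to the output -->
    <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
    </xsl:template>

</xsl:stylesheet>
```

Applied to a <head> element whose only child is a text node, the first rule passes processing down to that text node and the second rule outputs it, which is why the title appears without any rule of your own for <head>.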

Important