Digital humanities

Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2021-12-27T22:03:43+0000

Count speeches per act in Hamlet

The task

In class we began to develop an XSLT stylesheet to convert an XML version of Hamlet into a table listing the number of speeches in each act in the play.

In order to create an HTML table, you need to know that a table in HTML is a <table> element that contains one <tr> (table row) element for each row of the table. Each cell in the row is a <td> (table data) element for regular rows and a <th> (table header) element for the header row. We specify that we want a thin border around each cell in the table by creating a @border attribute on the <table> element and setting its value to 1. (This isn’t the best way to specify this formatting feature, and in Real Life we would use CSS. We’ve taken a shortcut here to avoid the overhead of introducing CSS at a time when we want you to concentrate on learning XSLT.) You can read more about HTML tables at http://www.w3schools.com/html/html_tables.asp.

The desired output will look like:

Act	Speeches
Act 1	251
Act 2	201
Act 3	249
Act 4	179
Act 5	257

and the underlying raw HTML looks like:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      <title>Speeches per act in Hamlet</title>
   </head>
   <body>
      <table border="1">
         <tr>
            <th>Act</th>
            <th>Speeches</th>
         </tr>
         <tr>
            <td>Act 1</td>
            <td>251</td>
         </tr>
         <tr>
            <td>Act 2</td>
            <td>201</td>
         </tr>
         <tr>
            <td>Act 3</td>
            <td>249</td>
         </tr>
         <tr>
            <td>Act 4</td>
            <td>179</td>
         </tr>
         <tr>
            <td>Act 5</td>
            <td>257</td>
         </tr>
      </table>
   </body>
</html>

Here is the completed XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0" xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xhtml" indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Speeches per act in Hamlet</title>
      </head>
      <body>
        <table border="1">
          <tr>
            <th>Act</th>
            <th>Speeches</th>
          </tr>
          <xsl:apply-templates select="//body/div"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="div">
    <tr>
      <td>
        <xsl:apply-templates select="head"/>
      </td>
      <td>
        <xsl:value-of select="count(.//sp)"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>

The superstructure for creating an XSLT stylesheet to transform a TEI document, such as our version of Hamlet, into XHTML, is described in XSLT Assignment #1. Once that’s all in place here, the heavy lifting is done by two template rules, one that matches the document node (/) and one that matches acts (div).

XSLT transformations always begin at the document node, so the template for that is the first to fire. It builds the HTML document with the <table> element inside the <body>, and inside the <table> tags it creates the header row. In the eventual output, you want to insert one table row for each act in the play, so you tell the stylesheet where to put those rows by inserting an instruction in that place, just below the header row, that reads <xsl:apply-templates select="//body/div"/>. The <xsl:apply-templates> instruction means “go look for a template rule to take care of whatever I’m selecting here,” and the @select attribute selects the acts (<div> elements directly under the <body>). The rows for the acts are created wherever you put this instruction, so you need to put it where you want those rows to appear.

<xsl:apply-templates> tells the system to round up everything specified by the value of the @select attribute, which in this case is the five acts, and then process them. The stylesheet processes them by looking for a template that knows what to do with them. Since we have a template that knows what to do with a <div> element (it’s the template that says match="div"), it will do the processing. That template rule will fire five times, once for each of the acts that were collected and passed to it by the <xsl:apply-templates> element in the template rule above.
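To make the selection concrete, here is a heavily abbreviated sketch of the kind of input the stylesheet expects. It is a hypothetical stand-in, not the course's actual Hamlet file: the <speaker> and <l> markup and the nesting of scenes inside acts are assumptions, and the <teiHeader> is omitted, so this is not valid TEI.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical, abbreviated stand-in for the Hamlet file (assumed markup). -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <!-- An act: a div directly under body, so //body/div selects it -->
      <div>
        <head>Act 1</head>
        <!-- A scene: a div nested inside the act, so //body/div skips it -->
        <div>
          <head>Scene 1</head>
          <sp><speaker>Bernardo</speaker><l>Who's there?</l></sp>
          <sp><speaker>Francisco</speaker><l>Nay, answer me: stand, and unfold yourself.</l></sp>
        </div>
        <div>
          <head>Scene 2</head>
          <sp><speaker>King</speaker><l>Though yet of Hamlet our dear brother's death</l></sp>
        </div>
      </div>
      <div>
        <head>Act 2</head>
        <div>
          <head>Scene 1</head>
          <sp><speaker>Polonius</speaker><l>Give him this money and these notes, Reynaldo.</l></sp>
        </div>
      </div>
    </body>
  </text>
</TEI>
```

Against this toy input, //body/div selects two <div> elements (the acts), so the template rule for <div> fires twice. The first act contains three <sp> descendants and the second contains one, so the rows would report 3 and 1 speeches respectively.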

The template rule for <div> elements fires five times, once for each act, and creates a table row for that act. Inside the row it creates two cells. The first cell holds the title of the act, which the system retrieves by applying templates to (that is, processing) the <head> child of the act. The second cell holds a count of the speeches in that act. The current context (for XPath purposes) inside a template rule is the node that the rule is processing at that moment. That means that each time this template rule fires, the current context will be the particular act that is being processed. As a result, <xsl:apply-templates select="head"/> will process the <head> child of that particular act (a different act each time the rule fires), and in the count(.//sp) function the dot (.) refers to the current context, that is, the current act, so the function counts only the speeches that are descendants of that individual act.
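One way to see why the dot matters is to run a variant that puts both path expressions side by side. The sketch below is the same stylesheet with an extra column added purely for illustration (the third column and its header text are not part of the assignment). Because //sp starts from the document node rather than from the current act, that column should show the same play-wide total in every row.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0" xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xhtml" indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Relative and absolute speech counts</title>
      </head>
      <body>
        <table border="1">
          <tr>
            <th>Act</th>
            <th>count(.//sp)</th>
            <th>count(//sp)</th>
          </tr>
          <xsl:apply-templates select="//body/div"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="div">
    <tr>
      <td>
        <xsl:apply-templates select="head"/>
      </td>
      <td>
        <!-- Starts at the current act: counts only its descendant speeches -->
        <xsl:value-of select="count(.//sp)"/>
      </td>
      <td>
        <!-- Starts at the document node: counts every speech in the file -->
        <xsl:value-of select="count(//sp)"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```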

Note that we apply templates to <head> elements, but we don’t have a template rule that matches those elements. XSLT has built-in rules that take over when there’s no explicit template rule: for an element, the built-in rule applies templates to its children, and for a text node, the built-in rule outputs the text. For an element that contains only plain text, the net effect is that by default you just output the text. Since that’s just what we want to do with <head> elements, we don’t have to write a rule to do it for us, and we can just rely on the built-in behavior.
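If you would rather not rely on the built-in behavior, you can spell it out. The sketch below is the same stylesheet with one extra, explicit rule for <head> elements. It assumes that the string value of each <head> is all you want in the cell, in which case it should produce the same table as the original.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0" xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xhtml" indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Speeches per act in Hamlet</title>
      </head>
      <body>
        <table border="1">
          <tr>
            <th>Act</th>
            <th>Speeches</th>
          </tr>
          <xsl:apply-templates select="//body/div"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="div">
    <tr>
      <td>
        <xsl:apply-templates select="head"/>
      </td>
      <td>
        <xsl:value-of select="count(.//sp)"/>
      </td>
    </tr>
  </xsl:template>
  <!-- Explicit replacement for the built-in behavior: output the text of the head -->
  <xsl:template match="head">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Note that the match pattern head is resolved in the TEI namespace because of @xpath-default-namespace, so it matches the act titles in the input and not the <head> element of the XHTML output.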

How come the template rule for <div> elements doesn’t process scenes? It would match scenes because it matches any <div> elements, but it never even knows there are any scenes because the program flow makes sure that it never sees them. The program grabs control with the template for the document node, and specifies that only acts should be processed. The template for <div> elements processes acts, but it never touches the scenes. How would you modify this stylesheet to count by scene, and not just by act?
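If you want to check your answer, here is one possible sketch, not the only solution. It assumes that each scene is a <div> child of an act <div> and has its own <head> child; neither assumption is confirmed above, so check them against the actual file. The only structural changes are the @select value, which now reaches one level deeper, and an extra cell that looks up to the parent act for its title.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
  xpath-default-namespace="http://www.tei-c.org/ns/1.0" xmlns="http://www.w3.org/1999/xhtml">
  <xsl:output method="xhtml" indent="yes"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Speeches per scene in Hamlet</title>
      </head>
      <body>
        <table border="1">
          <tr>
            <th>Act</th>
            <th>Scene</th>
            <th>Speeches</th>
          </tr>
          <!-- Assumption: scenes are div children of the act divs -->
          <xsl:apply-templates select="//body/div/div"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <!-- Fires only for scenes now, because only scenes are selected above -->
  <xsl:template match="div">
    <tr>
      <td>
        <!-- Title of the parent act -->
        <xsl:apply-templates select="../head"/>
      </td>
      <td>
        <!-- Title of the current scene -->
        <xsl:apply-templates select="head"/>
      </td>
      <td>
        <xsl:value-of select="count(.//sp)"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

The template rule itself still matches any <div>; it is the program flow, through the @select value, that decides whether it sees acts or scenes.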