Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2013-10-05T19:48:20+0000


Finding the characters in Shakespearean plays

The problem

You want to generate a list of the characters in the three Shakespearean plays available in the XML database running on Obdurodon. The output should be in HTML, and for each play it should give the title of the play followed by an alphabetized list of characters.

A simple solution

for $play in collection('/db/shakespeare/plays')//PLAY
return
    <div>
        <h2>{$play/TITLE/string()}</h2>
        <ul>{
            for $char in $play//PERSONA
            order by $char
            return
                <li>{$char/string()}</li>
        }</ul>
    </div>

The first line uses the collection() function to retrieve the documents stored in /db/shakespeare/plays, and the path step //PLAY selects the <PLAY> element of each one, giving a sequence of three <PLAY> elements, one for each play in the corpus. The for clause binds each of them in turn to the variable $play. For each of those three values, the query returns an HTML <div> element containing the title of the play and the list of characters. In the original XML the title is found in the <TITLE> child of the root <PLAY> element, that is, at /PLAY/TITLE. Since $play is bound to the <PLAY> element of the current play, the path expression $play/TITLE retrieves the <TITLE> element of the play being processed at the moment. We add /string() to the path expression to convert the <TITLE> element to its string value, so that we retrieve just the text of the title, without the <TITLE> start and end tags (which would be invalid in the HTML output).
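The difference between an element and its string value is easy to see without the database. The sketch below is a minimal illustration under one assumption: a small, invented <PLAY> fragment (the title text is made up) stands in for one of the elements returned by collection('/db/shakespeare/plays')//PLAY.

```xquery
(: Invented stand-in for one <PLAY> element from the collection. :)
let $play :=
  <PLAY>
    <TITLE>A Sample Play</TITLE>
  </PLAY>
return (
  (: The element itself, start and end tags included. :)
  $play/TITLE,
  (: Only its string value, that is, the text of the title. :)
  $play/TITLE/string()
)
```

The first item returned is the <TITLE> element, tags and all; the second is the plain string A Sample Play, which is the form the <h2> in the query needs.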

After the title, we create an unordered list to hold the cast of characters. We put a second FLWOR expression inside the <ul> tags, and we have to wrap it in curly braces so that it is interpreted as XQuery; otherwise the output would contain the literal text of the FLWOR expression rather than the result of evaluating it (try removing the curly braces and running the XQuery to see the difference). Since $play is still bound to the play we are processing, we look on its descendant axis (the // in $play//PERSONA) and retrieve all of the characters, which are encoded as <PERSONA> elements. We sort them alphabetically with an order by clause and return each one wrapped in <li> tags. Because we want the value of the <PERSONA> element in the output, and not the literal characters $char/string(), we also wrap the expression inside the <li> tags in curly braces. As above, curly braces switch from HTML mode (where the text between tags is output literally) to XQuery mode (where it is evaluated, so that the output is the result of the expression rather than the literal expression itself).
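Following the suggestion to remove the curly braces, here is a minimal sketch that builds the same <li> both ways. The <PERSONA> element and its text are invented stand-ins for the character data in the database.

```xquery
(: Invented stand-in for one <PERSONA> element. :)
let $char := <PERSONA>HAMLET</PERSONA>
return (
  (: Without braces the expression is copied into the output as text. :)
  <li>$char/string()</li>,
  (: With braces it is evaluated and its result is inserted. :)
  <li>{$char/string()}</li>
)
```

The first <li> contains the literal text $char/string(); the second contains HAMLET.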

Enhancements

Alphabetizing the casts

Alphabetic sorting in XQuery normally compares strings by Unicode code point, which puts all upper-case letters before all lower-case letters. In Hamlet, for example, GUILDENSTERN sorts before Ghost of Hamlet's Father because upper-case U sorts before lower-case h. We can make the sort case-insensitive, the way human readers expect, by using the lower-case() function inside the order by clause:

order by lower-case($char)

The output still uses the original capitalization: lower-case() only supplies the sort key behind the scenes, and the lower-cased version is never output.
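The sketch below sorts three strings taken from the Hamlet example, first as they are and then with lower-case() as the sort key. It declares the codepoint collation explicitly (a standard XQuery prolog declaration) so that the result does not depend on how a particular processor is configured; that the database uses this collation by default is an assumption based on the behavior described above.

```xquery
(: Make codepoint comparison explicit rather than relying on the
   processor's configured default collation. :)
declare default collation "http://www.w3.org/2005/xpath-functions/collation/codepoint";

let $names := ("HAMLET", "Ghost of Hamlet's Father", "GUILDENSTERN")
return (
  (: Case-sensitive: "U" (U+0055) sorts before "h" (U+0068). :)
  string-join((for $n in $names order by $n return $n), " | "),
  (: Case-insensitive: lower-case() supplies only the sort key;
     the original strings are what get returned. :)
  string-join((for $n in $names order by lower-case($n) return $n), " | ")
)
```

The first string is GUILDENSTERN | Ghost of Hamlet's Father | HAMLET and the second is Ghost of Hamlet's Father | GUILDENSTERN | HAMLET. Both keep the original capitalization.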

Alphabetizing the plays

We can add an order by clause to the outer FLWOR, the one that iterates over the three plays, and sort the plays case-insensitively by title. That requires us to write the same XPath expression ($play/TITLE) twice, though, once for sorting and once for output:

for $play in collection('/db/shakespeare/plays')//PLAY
order by lower-case($play/TITLE)
return
    <div>
        <h2>{$play/TITLE/string()}</h2>
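The fragment above shows only the opening lines of the outer FLWOR. As a self-contained sketch of what its order by clause does, the following replaces collection() with three invented <PLAY> elements (the titles are placeholders, not the titles in the database) and returns just the <h2> headings.

```xquery
(: Invented stand-ins for the <PLAY> elements in the collection. :)
let $plays := (
  <PLAY><TITLE>The Third Play</TITLE></PLAY>,
  <PLAY><TITLE>a first play</TITLE></PLAY>,
  <PLAY><TITLE>Second Play</TITLE></PLAY>
)
for $play in $plays
(: Case-insensitive sort on the title; the original title is output. :)
order by lower-case($play/TITLE)
return <h2>{$play/TITLE/string()}</h2>
```

The headings come out as a first play, Second Play, The Third Play. Sorting on $play/TITLE without lower-case() would instead put Second Play and The Third Play ahead of a first play.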

We can avoid this duplication by binding the title to a variable with a let clause and then using the variable twice, once for sorting and once for output:

for $play in collection('/db/shakespeare/plays')//PLAY
let $title := $play/TITLE
order by lower-case($title)
return
    <div>
        <h2>{$title/string()}</h2>
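To check that binding the title with let does not change the result, this sketch runs both versions over the same invented stand-in plays (placeholder titles, not the database contents) and compares their output with deep-equal(). It should return true.

```xquery
(: Invented stand-ins for the <PLAY> elements in the collection. :)
let $plays := (
  <PLAY><TITLE>The Third Play</TITLE></PLAY>,
  <PLAY><TITLE>a first play</TITLE></PLAY>,
  <PLAY><TITLE>Second Play</TITLE></PLAY>
)
(: Version 1: the path expression $play/TITLE is written twice. :)
let $twice := (
  for $play in $plays
  order by lower-case($play/TITLE)
  return $play/TITLE/string()
)
(: Version 2: the path expression is bound once to $title and reused. :)
let $once := (
  for $play in $plays
  let $title := $play/TITLE
  order by lower-case($title)
  return $title/string()
)
return deep-equal($twice, $once)
```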

Note that binding a value to a variable in a let clause requires a colon followed by an equal sign (:=), not a plain equal sign:

let $title := $play/TITLE
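The := belongs to let; a for clause uses the keyword in instead. The two also bind differently, as this minimal sketch with a literal sequence (unrelated to the plays) shows: let binds $x once to the whole sequence, while for binds it once per item.

```xquery
(
  (: let: $x is bound once, to all three items. :)
  let $x := ("a", "b", "c")
  return count($x),
  (: for: $x is bound three times, to one item each time. :)
  for $x in ("a", "b", "c")
  return count($x)
)
```

The result is the sequence 3, 1, 1, 1.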

The final version is:

for $play in collection('/db/shakespeare/plays')//PLAY
let $title := $play/TITLE
order by lower-case($title)
return
    <div>
        <h2>{$title/string()}</h2>
        <ul>{
            for $char in $play//PERSONA
            order by lower-case($char)
            return
                <li>{$char/string()}</li>
        }</ul>
    </div>
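To try the final query without access to the Obdurodon database, one option is to replace the collection() call with a few inline <PLAY> elements. In this sketch the titles and character names are invented; apart from the let clause that supplies the data and the for clause that reads it, the query is the same as the final version above.

```xquery
(: Invented miniature collection standing in for
   collection('/db/shakespeare/plays')//PLAY. :)
let $plays := (
  <PLAY>
    <TITLE>Second Play</TITLE>
    <PERSONAE>
      <PERSONA>SERVANT</PERSONA>
      <PERSONA>a Messenger</PERSONA>
    </PERSONAE>
  </PLAY>,
  <PLAY>
    <TITLE>a first play</TITLE>
    <PERSONAE>
      <PERSONA>Nurse</PERSONA>
      <PERSONA>CHORUS</PERSONA>
    </PERSONAE>
  </PLAY>
)
for $play in $plays
let $title := $play/TITLE
order by lower-case($title)
return
    <div>
        <h2>{$title/string()}</h2>
        <ul>{
            (: Case-insensitive sort of the cast, as in the final version. :)
            for $char in $play//PERSONA
            order by lower-case($char)
            return
                <li>{$char/string()}</li>
        }</ul>
    </div>
```

It should return two <div> elements: a first play, listing CHORUS and Nurse, followed by Second Play, listing a Messenger and SERVANT.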