Digital humanities

Maintained by: David J. Birnbaum. [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2021-10-29T02:31:12+0000

Test #4: XPath

The task

For the XPath test, you will use XPath to take a closer look at individual characters in Shakespeare’s Hamlet, and specifically at Hamlet himself. We recommend working in the XPath/XQuery Builder view instead of the small XPath toolbar at the top of your <oXygen/> window because some of these expressions can get long, and it is easier to stay organized if you can see your entire XPath expression as you type. Feel free to consult other resources, such as notes and information published on the Internet, as you complete the test, but you are not allowed to ask another person for help. Please submit your answers (the full XPath expressions, not just what they return) in a well-formatted Markdown document.

Using the version of Hamlet that we have been using for all the previous XPath assignments (Bad Hamlet), complete the following tasks. Some of them may have multiple solutions.


  1. Write an XPath expression that selects all the lines (<l> or <ab> elements) in which Hamlet mentions Ophelia by name. That is, your expression should select the <l> or <ab> elements that are spoken by Hamlet and contain the string Ophelia.

  2. What is Hamlet’s last line in the play? Your expression should select only the last line that Hamlet utters, and not his entire final speech. The last line could be either an <l> or an <ab> element, and your XPath expression should find it without knowing in advance whether it is of type <l> or of type <ab>.

  3. How many times is the word madness uttered in Hamlet (by any character)? For this task you need to find and count all instances of madness together, regardless of whether it is spelled Madness or madness.

  4. Which characters utter the word madness (regardless of capitalization, as in the question above) in the play? Your XPath expression should return a sequence of the names of the characters (in the <speaker> element of the speech where the word appears) without duplicates.
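As a reminder of the building blocks that tasks 3 and 4 call for, here is a minimal sketch that runs on made-up literal strings rather than on the play, so it is not a solution to either task. It assumes an XPath 2.0 or later processor and shows two ways to ignore capitalization (lower-case() and the i flag of matches()) plus distinct-values() for removing duplicates:

```xpath
(: Both tests ignore capitalization; the input string is an invented example :)
contains(lower-case('Madness in great ones'), 'madness'),
matches('Madness in great ones', 'madness', 'i'),
(: distinct-values() keeps one copy of each repeated string :)
distinct-values(('Hamlet', 'Polonius', 'Hamlet'))
```

Evaluated as a whole, this returns two true values followed by the two distinct names. The order in which distinct-values() returns its items is implementation-dependent, so do not rely on it.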


Bonus tasks

Don’t hesitate to read up on unfamiliar functions in order to complete these tasks. That’s what we do all the time, and you’ll be doing it for your projects anyway, so the bonus tasks here are good practice.

  1. Alphabetize the list of characters who utter the word madness. We used the sort() function, which is new in XPath 3.1, so it isn’t in Michael Kay’s book (which was written when XPath 2.0 was the latest version), but it is described at some of the other resources listed in the XPath section of our main course page.

  2. How many of Hamlet's sentences are questions? You can’t just count the lines (<l> or <ab> elements) that contain question marks because some lines may contain more than one question, e.g.:

    Ah, ha, boy! say'st thou so? art thou there, truepenny?
    You can approach this task with either tokenize() or string-to-codepoints() (which is easiest to use if you combine it with codepoints-to-string()).
  3. Write an XPath expression that selects all the lines (<l> or <ab> elements) that contain hyphenated words, but excludes any line whose only hyphenated words are inside a <stage> element rather than in the spoken text.

    As it happens, there is no line that contains both a hyphenated spoken word and a hyphenated word in an embedded stage direction, but you can’t be certain of that in advance, so your XPath expression should also match a hypothetical line like:

    <l>The king doth wake to-night<stage>Re-enter</stage></l>

    You want to match this line because although it contains a hyphenated word in a stage direction, that is not the only hyphen in the line; there is also a hyphen in the spoken word to-night.
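The functions named in bonus tasks 1 and 2 can be tried out on literal values before you apply them to the play. The following is a hedged sketch rather than a solution: it counts question marks in part of the sample line quoted above in three ways, then sorts an invented sequence of names. It assumes an XPath 3.1 processor, because sort() is not available in earlier versions.

```xpath
(: Three ways to count "?" in one literal string; each yields 2 :)
count(tokenize("say'st thou so? art thou there, truepenny?", '\?')) - 1,
count(string-to-codepoints("say'st thou so? art thou there, truepenny?")[. = 63]),
count(
  for $c in string-to-codepoints("say'st thou so? art thou there, truepenny?")
  return codepoints-to-string($c)[. = '?']
),
(: sort() returns the strings in ascending order: Gertrude, Hamlet, Polonius :)
sort(('Polonius', 'Hamlet', 'Gertrude'))
```

The tokenize() version subtracts 1 because splitting on a separator always yields one more piece than there are separators, including an empty string after a question mark at the very end. The second version compares code points directly (63 is the code point of the question mark), and the third converts each code point back into a one-character string so that it can be compared with '?'.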