Digital humanities

Maintained by: David J. Birnbaum [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2019-03-01T05:01:34+0000

Test #4: XPath

The task

Using Bad Hamlet, provide an XPath expression that retrieves the following items. You must use XPath to find the answers without using any information in parentheses at the ends of the questions. (That information in parentheses is there so that you’ll know when you’ve gotten the right answer, but you can’t use that information directly in your XPath expressions.)

  1. All speeches (<sp>) by Ophelia that contain Hamlet’s name. Requires, at least in our solution, contains(). (There are two such speeches.)
  2. A semicolon-separated list of all speakers (<speaker>) in Act IV, without duplicates. Requires, at least in our solution, string-join() and distinct-values(). (Your list will include, among others, Rosencrantz, Guildenstern, and, because they sometimes speak together, Rosencrantz and Guildenstern. For the purpose of this test, you don’t have to get rid of the last of these, but see the bonus question, below.)
  3. The number of speeches (<sp>) in each act (//body/div). Our solution requires count(). (You should find 251 speeches for Act 1, 201 for Act 2, 249 for Act 3, 179 for Act 4, and 257 for Act 5.)
  4. The speaker elements (<speaker>) for all speeches (<sp>) that are more than 4000 characters long. Requires, at least in our solution, string-length(). Hint: approach this in stages: 1) find all speeches; 2) filter them to keep just the ones that are more than 4000 characters long; 3) find the speakers of those speeches. (There are two such speeches, one by Hamlet and one by Ghost.)
  5. This question has four parts, which build incrementally on one another:
    1. The number of lines (<l> elements) in each speech (<sp> element). Here and in the other parts of this question, count the descendants of the speech, not just the children.
    2. The number of lines in the longest speech (measured in <l> elements). (The answer is 58.)
    3. The longest speech (<sp>) itself. (It’s the speech by Hamlet that begins with the line that has the @xml:id value sha-ham202553.)
    4. The <head> of the scene that contains that speech. (The answer is Act 2, Scene 2.)
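
The staged approach suggested in the hint for #4 (find, then filter, then step to a related element) can be illustrated outside XPath. The Python sketch below is only an illustration of that pattern and is not our solution: the miniature document, the function name speakers_of_long_speeches, and the 40-character threshold are all invented for the example, and it uses Python's standard xml.etree.ElementTree rather than an XPath 2.0 processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document. The element names follow the task,
# but the content is invented and is NOT taken from Bad Hamlet.
SAMPLE = """<body>
  <div>
    <sp><speaker>A</speaker><l>Short line.</l></sp>
    <sp><speaker>B</speaker><l>A much longer line of verse that goes on,</l><l>and then continues onto a second line.</l></sp>
  </div>
</body>"""


def speakers_of_long_speeches(root, threshold):
    """Return the speaker names of speeches longer than `threshold` characters."""
    # Stage 1: find all speeches, at any depth.
    speeches = root.iter("sp")
    # Stage 2: keep only the speeches whose full text content (the text of
    # the element and all of its descendants) is longer than the threshold.
    long_speeches = [
        sp for sp in speeches
        if len("".join(sp.itertext())) > threshold
    ]
    # Stage 3: step from each surviving speech to its <speaker> children.
    return [spk.text for sp in long_speeches for spk in sp.findall("speaker")]


root = ET.fromstring(SAMPLE)
result = speakers_of_long_speeches(root, 40)  # 40 is an arbitrary demo threshold
print(result)
```

In XPath the same three stages usually map onto a path step that selects the candidates, a predicate that filters them, and a further path step that moves from the survivors to the elements you actually want to return.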


Bonus question

How can you use XPath to remove the semi-spurious Rosencrantz and Guildenstern from the answer to #2? Your answer should cater to the following possibilities: