Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2020-02-27T22:21:50+0000


Test #4: XPath

The task

Using Bad Hamlet, provide a single XPath expression that retrieves each of the following items.

  1. All speeches (<sp>) by Hamlet. (There are 357 such speeches.)

    Either //sp[@who eq 'Hamlet'] or //sp[speaker eq 'Hamlet'] works. To test for equality you can use either general comparison (=) or value comparison (eq).

  2. All scenes that contain speeches by Hamlet. (There are 13 such scenes. You can see which ones they are in the table below. Your XPath expression has to find them without knowing what they are in advance, that is, without asking for them individually or by number.)

    //div/div[descendant::sp[@who eq 'Hamlet']]. As in the example above, you can use either general comparison or value comparison, and you can use either the @who attribute or the <speaker> element to identify the speaker.

  3. The titles (<head> elements) of all scenes that contain speeches by Hamlet. Don’t ask for each scene individually; you want a single XPath expression that returns these titles without your having to know in advance which scenes contain speeches by Hamlet. (There are 13, one for each scene in which Hamlet speaks.)

    Use the answer to the previous question, but add another path step to move from the scenes to their <head> children: //div/div[descendant::sp[@who eq 'Hamlet']]/head

  4. The number of speeches (<sp>) by Hamlet in each scene. Don’t ask for each scene individually; you want a single XPath expression that finds the count of Hamlet’s speeches for all of the scenes, without your having to know in advance anything about the number of acts or scenes or the ones in which Hamlet speaks. The numbers you should find are:

    Location          Speeches by Hamlet
    Act 1, Scene 1     0
    Act 1, Scene 2    33
    Act 1, Scene 3     0
    Act 1, Scene 4    10
    Act 1, Scene 5    29
    Act 2, Scene 1     0
    Act 2, Scene 2    59
    Act 3, Scene 1    12
    Act 3, Scene 2    65
    Act 3, Scene 3     1
    Act 3, Scene 4    26
    Act 4, Scene 1     0
    Act 4, Scene 2     9
    Act 4, Scene 3    10
    Act 4, Scene 4     7
    Act 4, Scene 5     0
    Act 4, Scene 6     0
    Act 4, Scene 7     0
    Act 5, Scene 1    38
    Act 5, Scene 2    58

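    //div/div/count(descendant::sp[@who eq 'Hamlet']). Because count() is the final path step, it is evaluated once for each scene, so the expression returns a sequence of 20 numbers, one per scene, in document order.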
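
To see the logic of the four answers above outside an XPath processor, here is a minimal Python sketch that runs equivalent queries on a tiny made-up document. The sample markup, the variable names, and the helper hamlet_speeches() are assumptions for illustration only; the sample is not Bad Hamlet, and the real file may be organized differently. Python's standard xml.etree.ElementTree module supports only a small subset of XPath 1.0, so the sketch uses = rather than eq and does the scene filtering and counting in Python instead of in a predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-scene sample, NOT the real Bad Hamlet file.
SAMPLE = """
<play>
  <div>
    <head>Act 1</head>
    <div>
      <head>Act 1, Scene 1</head>
      <sp who="Bernardo"><speaker>Bernardo</speaker><l>Placeholder line.</l></sp>
    </div>
    <div>
      <head>Act 1, Scene 2</head>
      <sp who="Hamlet"><speaker>Hamlet</speaker><l>Placeholder line.</l></sp>
      <sp who="Claudius"><speaker>Claudius</speaker><l>Placeholder line.</l></sp>
      <sp who="Hamlet"><speaker>Hamlet</speaker><l>Placeholder line.</l></sp>
    </div>
  </div>
  <div>
    <head>Act 2</head>
    <div>
      <head>Act 2, Scene 1</head>
      <sp who="Hamlet"><speaker>Hamlet</speaker><l>Placeholder line.</l></sp>
    </div>
  </div>
</play>
"""

root = ET.fromstring(SAMPLE)

# Task 1: corresponds to //sp[@who eq 'Hamlet'] and //sp[speaker eq 'Hamlet'].
by_attribute = root.findall(".//sp[@who='Hamlet']")
by_speaker = root.findall(".//sp[speaker='Hamlet']")

# Every scene, corresponding to //div/div (a <div> that is a child of a <div>).
scenes = root.findall(".//div/div")


def hamlet_speeches(scene):
    """Return the Hamlet speeches anywhere inside one scene element."""
    return scene.findall(".//sp[@who='Hamlet']")


# Task 2: keep only the scenes that contain at least one Hamlet speech.
scenes_with_hamlet = [scene for scene in scenes if hamlet_speeches(scene)]

# Task 3: one more step, from each of those scenes to its <head> child.
titles = [scene.findtext("head") for scene in scenes_with_hamlet]

# Task 4: one count per scene, including the scenes where the count is 0.
counts = [len(hamlet_speeches(scene)) for scene in scenes]

print(len(by_attribute), titles, counts)
```

Note that tasks 2 and 3 filter the list of scenes, while task 4 maps over every scene, which is why scenes with no Hamlet speeches still contribute a 0 to the result.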

Bonus

These questions are optional.

  1. All scenes in which Hamlet does not speak, but in which his name nonetheless appears. (There are 6 such scenes.)

    //div/div[not(descendant::sp[@who eq 'Hamlet'])][contains(.,'Hamlet')]. We find all scenes, filter them to keep only the ones that don’t have a speech by Hamlet (using the not() function), and then filter those to keep only the ones that contain the string Hamlet.

  2. All speeches by Hamlet that contain an odd number of line (<l>) child elements. (There are 106 such speeches. You can use the mod operator, which returns the remainder of integer division, to determine when the line count is odd or even. For example, 5 mod 2 returns 1 because when you use integer division to divide 5 by 2, the remainder is 1.)

    //div/div/sp[@who eq 'Hamlet'][count(l) mod 2 eq 1]. We find all speeches by Hamlet and then filter them by counting their line children, dividing the count by 2, and keeping only the ones where the remainder after integer division is 1. Since only odd numbers have a remainder of 1 when you divide them by 2, this finds the speeches with an odd number of lines.
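
The two bonus answers chain predicates, which can be sketched in Python as successive filters. As before, this is only an illustration under stated assumptions: the sample document, the scene titles, and the helper has_hamlet_speech() are made up, and the standard xml.etree.ElementTree module cannot evaluate not(), contains(), count(), or mod, so those parts are written in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, NOT the real Bad Hamlet file.
SAMPLE = """
<play>
  <div>
    <div>
      <head>Scene A</head>
      <sp who="Horatio"><speaker>Horatio</speaker><l>A line naming Hamlet.</l></sp>
    </div>
    <div>
      <head>Scene B</head>
      <sp who="Hamlet"><speaker>Hamlet</speaker><l>one</l></sp>
      <sp who="Hamlet"><speaker>Hamlet</speaker><l>one</l><l>two</l></sp>
      <sp who="Hamlet"><speaker>Hamlet</speaker><l>one</l><l>two</l><l>three</l></sp>
    </div>
    <div>
      <head>Scene C</head>
      <sp who="Polonius"><speaker>Polonius</speaker><l>A line about someone else.</l></sp>
    </div>
  </div>
</play>
"""

root = ET.fromstring(SAMPLE)
scenes = root.findall(".//div/div")


def has_hamlet_speech(scene):
    """True if the scene contains at least one speech by Hamlet."""
    return bool(scene.findall(".//sp[@who='Hamlet']"))


# Bonus 1: first filter mirrors not(descendant::sp[@who eq 'Hamlet']);
# second filter mirrors contains(., 'Hamlet'), which tests the text content
# of the scene (attribute values are not part of an element's string value).
silent_but_named = [
    scene.findtext("head")
    for scene in scenes
    if not has_hamlet_speech(scene)
    and "Hamlet" in "".join(scene.itertext())
]

# Bonus 2: Hamlet speeches whose number of <l> children is odd,
# mirroring [count(l) mod 2 eq 1].
odd_line_counts = [
    len(sp.findall("l"))
    for sp in root.findall(".//div/div/sp[@who='Hamlet']")
    if len(sp.findall("l")) % 2 == 1
]

print(silent_but_named, odd_line_counts)
```

In the sample, Scene B is excluded from the first result because Hamlet speaks there, and Scene C is excluded because his name never appears in its text.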