Digital humanities

Maintained by: David J. Birnbaum [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2023-01-08T17:47:32+0000

XPath assignment #1

You can find an XML (TEI) version of Shakespeare’s Hamlet at the link provided for this assignment. We’ve deliberately damaged some of the markup in this edition to introduce some inconsistencies, but the file is still well-formed XML, which means that you can use XPath to explore it. You should download this file to your computer (typically that means right-clicking on the link and selecting “Save as”) and open it in <oXygen/>.

Prepare your answers to the following questions in a markdown file and upload it to Canvas as an attachment. As always, code snippets (including XPath snippets) in markdown must be surrounded with backticks.

Some of these tasks are thought-provoking, and even difficult. If you get stuck, do the best you can, and if you can’t get a working answer, give the answers you tried and explain where they failed to get the results you wanted. As always, you are encouraged to ask questions in the #xpath channel in Slack, but because you want to make progress in learning to debug your own code, your questions should tell us what you tried, what you expected, exactly what you got instead (not just “didn’t work” or “got an error”), and what you think the source of the problem is. Sometimes writing that sort of request for advice will help you figure out what’s wrong on your own (see Rubber duck debugging), and even when it doesn’t, it will help us identify the difficult moments.

These tasks require the use of path expressions, predicates, and the functions count() and not(), but they should not require any other XPath functions. There may be more than one possible answer.
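If you have not yet combined path expressions, predicates, not(), and count(), the sketch below shows how the pieces fit together. It deliberately uses a hypothetical toy document, `<library><book year="1603"/><book/></library>`, rather than Hamlet, so the element name `book` and the attribute `year` are illustrative assumptions that do not come from the Bad Hamlet file. Each expression would be entered on its own in the XPath browser window; the `(: ... :)` comments use XPath 2.0+ comment syntax.

```xpath
(: Toy input, NOT Hamlet - <library><book year="1603"/><book/></library> :)

(: Path expression - every <book> element anywhere; here, both books :)
//book

(: Predicate in square brackets - only books that have a @year attribute; here, the first book :)
//book[@year]

(: not() inside a predicate - only books that lack a @year attribute; here, the second book :)
//book[not(@year)]

(: count() wraps a path expression and returns a number instead of nodes; here, 1 :)
count(//book[not(@year)])
```

A predicate filters the nodes selected by the step it follows, so `//book[not(@year)]` still returns `<book>` elements; only wrapping the whole expression in count() turns the result into a number.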

Using the Bad Hamlet document and the XPath browser window in <oXygen/>, construct XPath expressions that will do the following. Give the full XPath expressions in your answers, and not just the results:

  1. Hamlet, like a typical Shakespearean tragedy, contains five acts, each of which contains scenes. Both acts and scenes are encoded as division (<div>) elements.
    a. How can XPath tell them apart?

    b. What XPath would find just the acts?

    c. What XPath would find just the scenes?

    d. What XPath would find just the scenes in Act III?

  2. Stage directions (<stage>) occur in a variety of contexts.
    a. What XPath would find all of the stage directions that are inside a metrical line (<l>), that is, between the starting <l> and the ending </l>? How many are there?

    b. What XPath would find all of the stage directions that are directly inside a speech (<sp>), that is, inside a speech but not inside a line within that speech?

    c. What XPath would find all of the stage directions that are not directly inside a speech or a line? How many are there?

      A correct answer cannot rely on knowing ahead of time what the parent of those stage directions is; the only information you have is that you want stage directions whose parents are not speeches or lines, so those are the only conditions you can use to find the results you want.

    d. For the stage directions you identified in #2c, above, write an XPath expression that will return not the <stage> elements themselves, but their parent elements, whatever they might be. What are those parent elements? (You haven’t yet learned the XPath to return just the names of the parent elements [rather than the elements themselves], but you can locate them, click on each one in the list <oXygen/> returns, and look at it directly.)

What to submit

You should turn in your answers to the above questions in a markdown file, that is, a file with the extension .md.