Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2023-03-03T15:57:47+0000


Test #4: XPath

Instructions

This test has two required parts plus an optional bonus (extra credit) section. The first part asks questions about your understanding of XPath and the second asks you to create XPath expressions and use them to learn about a Bad Hamlet file similar to the one you’ve been using for practice. You’ll find the file at http://dh.obdurodon.org/even-worse-hamlet.xml. This file contains altered content that is different from the Bad Hamlet version that you’ve been using in your XPath assignments, so be sure to work with this new file.

Don’t forget to set the XPath version in the <oXygen/> XPath toolbar or XPath builder to 3.1. You may also want to revisit our “XPath functions we use most” tutorial.

Part 1: Questions about XPath

  1. Define nodes, sequences and atomic values. Give an example of how each of those concepts might arise when you use XPath to explore Hamlet in <oXygen/>. Your examples of these three concepts might involve either XPath expressions themselves or the results that XPath expressions return.

  2. What is the difference between an axis and a predicate in a path expression? To answer this question, give an example of each within an XPath expression, explain how they are distinguished syntactically (that is, how each is spelled when used in an XPath expression), and explain what each contributes to the overall meaning of the XPath expression you use to illustrate them.

  3. Explain the difference between the simple map operator ! and the arrow operator =>. For example, consider the two expressions //sp ! count(.) and //sp => count() and how they return different results. Give one example each of a reasonable way you might use these operators to explore Hamlet.

Part 2: Creating and using XPath expressions

The functions we used to answer the following questions include contains(), count(), distinct-values(), not(), sort(), and string-join(). All of these are described in Michael Kay except sort(), which was introduced in XPath 3.1; Mike’s book was written when 2.0 was the most recent version. The sort() function returns its input sequence in sorted order, which for a sequence of strings means alphabetical order. There may be more than one correct answer to some of the questions.
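As a small illustration of sort(), the expressions below use made-up literal sequences rather than anything from the test file; each one would be evaluated separately in the <oXygen/> XPath toolbar. The ordering of strings depends on the default collation of your XPath processor, so treat the first comment as what you would typically expect rather than as a guarantee.

```xpath
(: a sequence of three strings, sorted alphabetically :)
sort(('Horatio', 'Ophelia', 'Hamlet'))
(: typically returns 'Hamlet', 'Horatio', 'Ophelia' :)

(: numbers are sorted by numeric value, not alphabetically :)
sort((10, 9, 2))
(: returns 2, 9, 10 :)
```

Note the doubled parentheses: the inner pair builds the sequence and the outer pair belongs to the function call, so the whole sequence is passed as a single argument.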

Questions 5–9 build on one another. If you get stuck at some point, you can still receive partial credit for the following questions by explaining and illustrating how you would answer them if you had the requisite input. For example, if you can’t get the 77 lines you want for question 5, select some alternative lines as input into question 6 and describe and illustrate how you would find the speakers of speeches that contain those lines.

  1. All line elements (<l>) in the play are supposed to have an @n attribute, but some don’t, which is a markup mistake. What XPath expression will select the lines that don’t have @n attributes? (Hint: There are five such lines.)

  2. Building on the preceding question, what XPath expression will tell you how many such lines there are? Your expression must return a single integer value; that is, XPath needs to do the counting, instead of returning the lines and leaving you to find the answer with your human eyeballs by looking next to the Description.

  3. The Ghost (referred to as Ghost), although it does not appear often, is an important figure in the play because it represents Hamlet’s dead father. What XPath expression finds the scenes where Ghost is featured as a speaker? (Hint: There are 2 such scenes.)

  4. What XPath expression finds all speeches spoken by Ghost? Your XPath expression must select the speeches themselves, and not just the speakers. (Hint: There are 14 such speeches.)

  5. What XPath expression will find every line (<l> or <ab> element) in which the name Hamlet is spoken? Caution: There are lines that contain stage direction (<stage>) elements that mention Hamlet’s name, but being mentioned inside a stage direction isn’t the same as being spoken. Your XPath expression must include only lines where the name Hamlet is spoken within a speech. (Hint: There are 77 such lines, 10 instances of <l> and 67 of <ab>.)

  6. What XPath expression will return the speakers of each speech that contains a line (<l> or <ab> element) that mentions Hamlet? (Hint: There are 68 such speakers because some speeches contain more than one line that mentions Hamlet. Some of the speaker names are repeats because the same person may have multiple speeches that mention Hamlet by name.)

  7. What XPath expression would deduplicate the results of the preceding expression? In other words, you should return a sequence of strings where each name is listed only once. (Hint: There are 13 such speaker names.)

  8. What XPath expression will sort the sequence in alphabetical order?

  9. What XPath expression will return the sequence as a comma-separated list?

Part 3: Optional extra-credit questions

  1. What XPath expression will return a deduplicated list of all element names within the document? (Hint: You’ll need the name() function, which you can look up in Michael Kay. There are 28 distinct element names.)

  2. What XPath expression will select all speech <sp> elements that have both <l> and <ab> children? (Hint: There are 7 such speeches.)

  3. What XPath expression will return the ratio of <l> to <ab> children for each of the speeches selected in the previous step and sort them from lowest to highest? (Hint: There are 7 such ratios, ranging from a low of 0.117 to a high of 6, and the number 1 appears twice in that list because two of the speeches in question have the same number of elements of both types.)

  4. Given the 7 values in the preceding question, what XPath expressions will return just the lowest value, just the highest value, and just the average (arithmetic mean) of all 7 values? (Hint: You’ll want to look up the appropriate functions in Michael Kay.)

What to submit

Write your answers in a properly formatted markdown file whose filename conforms to our usual filenaming conventions and has an .md filename extension, and upload it to Canvas. You can remind yourself about markdown syntax at the GitHub three-minute guide to Mastering markdown that you read earlier. The test is open book and you can use any references you’d like, except that you cannot receive help from another person.

Should you have any questions, please ask in the #xpath channel in our Slack workspace. We can’t give you the answer, but we’ll do whatever we can short of that to help.