Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified: 2022-03-18T16:20:27+0000
For the XPath test, you will be using XPath to take a closer look at individual characters in Shakespeare’s Hamlet. We recommend working in the XPath/XQuery Builder view instead of the small XPath toolbar at the top of your <oXygen/> window because some of these expressions can get long, and it is easier to stay organized if you can see the entirety of your XPath expression as you are typing. Like all of our tests, this one is open-book, which means that you can consult notes, books, the Internet and other resources, except that you cannot receive any assistance from another person. Should you get stuck in a way that does not respond to your best rubber duck debugging efforts, feel free to post an inquiry in our Slack workspace and we’ll try to point you in the right direction. You should submit your answers (the full XPath expressions, not just the result of evaluating them) in a properly formatted markdown document. That includes surrounding XPath expressions in backticks.
Using the version of Hamlet that we have been using for all the previous XPath assignments (http://dh.obdurodon.org/bad-hamlet.xml), complete the following tasks. There are alternative good solutions for some of them, so you will not need to use all of the following functions, but some that we used include avg(), distinct-values(), matches(), normalize-space(), round(), sort(), string-join(), tokenize(), and translate(). Don’t guess at how these work; if you aren’t already familiar with them, look up the number of arguments they require and what each argument means in Michael Kay or an alternative reference.
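As a quick orientation (not a substitute for looking the functions up), the calls below sketch the argument order of these functions. The literal inputs are made up for illustration and are not taken from the play; evaluate each expression separately.

```xpath
avg((2, 4, 9))                       (: one argument, a sequence of numbers; returns 5 :)
round(4.5)                           (: ties round toward positive infinity; returns 5 :)
distinct-values(('a', 'b', 'a'))     (: returns 'a' and 'b'; the order is implementation-dependent :)
matches('Hamlet', '^Ham')            (: input string, then a regular expression; returns true() :)
normalize-space('  to   be  ')       (: trims and collapses whitespace; returns 'to be' :)
string-join(('to', 'be'), '-')       (: sequence of strings, then a separator; returns 'to-be' :)
tokenize('to be or not', '\s+')      (: input string, then a regex separator; returns 4 strings :)
translate('abc', 'ab', 'AB')         (: character-by-character mapping; returns 'ABc' :)
sort((3, 1, 2))                      (: requires XPath 3.1; returns 1, 2, 3 :)
```

In the tasks below the input will usually come from a path expression rather than a literal, so check whether each function accepts a sequence or only a single item.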
Here are two details to keep in mind:
Some lines in the XML begin with spurious space characters. For example:
    <l> I say, away! Go on; I'll follow thee.
        <stage>Exeunt Ghost and Hamlet.</stage></l>
begins with a space even though the first word spoken is the word I. Whether those spaces require special handling depends on how you approach the tasks below, but whatever approach you take, you’ll want to verify that these lines are being treated properly.
You may need to select the text node children of an element that contains mixed content. For example, the line above contains mixed content, in this case consisting of a single text node (that is, plain text) followed by a single <stage> element. Much as you can select the <stage> element children of all lines with //l/stage, you can select the text node children of all lines with //l/text(). The path step spelled text() is not a function (even though it looks like one with its trailing parentheses); it’s the way to say, in this path expression, “select the text nodes on the child axis of each of the nodes selected by the previous path step”. Technically, text() is a node test for text nodes; that is, it tests for and selects text nodes. This is similar to the way that * in //l/* tests for and selects all element children of each <l> element (but not any children that are not element nodes, like text nodes or comment nodes) and stage in //l/stage tests for and selects element nodes that are of type <stage> (but not any children that are not element nodes or that are element nodes of other types). See Kay, p. 614.
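To make the distinction between these node tests concrete, here is a small sketch. The sample <l> element is an assumed reconstruction of the line quoted earlier (its exact markup in the file may differ), and each expression is meant to be evaluated separately against that single element.

```xpath
(: Assumed sample input, reconstructed for illustration:
   <l> I say, away! Go on; I'll follow thee. <stage>Exeunt Ghost and Hamlet.</stage></l>
:)

//l/stage     (: element children named stage: the one <stage> element :)
//l/*         (: all element children: here, the same <stage> element :)
//l/text()    (: text node children only: " I say, away! Go on; I'll follow thee. " :)
//l/node()    (: children of any kind: the text node, then the <stage> element :)
```

Note that the text node returned by //l/text() keeps its leading and trailing spaces, which is why the spurious initial space mentioned above can matter.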
Find all of Hamlet’s spoken lines (which may be represented by <l> or <ab> elements) in the play and select only those that begin with the word I (not just the letter I). The result should be a sequence of elements of type <l> or <ab>.
The elements you select above all contain text nodes, which represent spoken text, but some also contain stage directions (see the example above). Extend your XPath expression above to return just the spoken text from each of the lines, without any accompanying stage directions. The result will be plain text, without any markup. (Hint: See above about selecting text nodes.)
There is a lot of extra whitespace that gets included with these lines because of pretty-printing. For example, where the XML contains:
    A little more than kin, and less than
        kind.
the pretty-printing introduces a newline and some extra spaces for indentation. Extend your XPath expression above (the one that selects only the text nodes inside lines, but not stage directions) to remove this extra space, so that each spoken line of text will be continuous, with single space characters between words.
Modify the preceding XPath expression to return the length in characters of each line spoken by Hamlet that begins with I (after ignoring stage directions and removing extra whitespace). The result will be a sequence of integers, each representing the character count of a single line of speech.
Write an XPath expression to compute the average length (in character count) of the lines spoken by Hamlet that begin with I. If the value is not already an integer, round it to the nearest integer value.
Who speaks in the fifth act of the play?
I in all of Hamlet’s spoken lines that start with the word I.