The task below uses Bad Hamlet to create an alphabetized list of all of the distinct words spoken by Hamlet in Act 5, Scene 1. The steps are ordered so that each builds on the preceding one. If you get stuck on a step, whether required or extra-credit (for example, if you aren’t able to convert words to lower case or remove punctuation), it’s okay to skip it and proceed to the next step, but tell us that you’ve done that. Submit your answers in a markdown document, with backticks around the XPath expressions, specifying the XPath expression for each step (they are cumulative, so each will look like a modified version of the immediately preceding one). That is, your answer will look like a numbered list of XPath expressions, one expression for each of the steps below.
We suggest working in the XPath/XQuery Builder view (accessible from the <oXygen/> menus at Window → Show View) because the expression gets very long. (To run an expression in the Builder, click on the red sideways triangle at the top of the Builder panel. The Enter key just creates a new line; it doesn’t execute the expression.) Using the simple mapping operator (!) and the arrow operator (=>), where possible, will improve the legibility, but it is possible to complete these steps without those operators. You can split your expression into multiple lines in the Builder, which will also improve legibility.
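As a minimal sketch of how the two operators change the shape of an expression, the following uses literal strings as a stand-in for real data. The input values and the functions chosen here are assumptions for illustration only and are not taken from the assignment's solution.

```xpath
(: Hypothetical toy input: literal strings, not the Hamlet file :)

(: Nested function calls read from the inside out :)
upper-case(normalize-space('  to be  ')),

(: The arrow operator passes the left side as the first argument,
   so the same two steps read left to right :)
'  to be  ' => normalize-space() => upper-case(),

(: The simple map operator evaluates the right side once per item on the left :)
('to', 'be') ! upper-case(.)

(: returns "TO BE", "TO BE", "TO", "BE" :)
```

The arrow operator hands the entire left-hand sequence to the function in a single call, while the simple map operator evaluates the right-hand expression once per item. A function that accepts only one string at a time therefore needs ! when it is applied to a sequence of several strings.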
Required steps
- Write an XPath expression that returns all of Hamlet’s lines (<l> elements) in Act 5, Scene 1. There are 43 of them. Our solution does not use any functions.
- Modify your answer above to provide both the <l> elements and the anonymous blocks (<ab> elements). There are 91 anonymous blocks, so the total number of lines of both types is 134. Our solution does not use any functions.
- Modify your answer to the preceding step to break the text into a single list of all of the words in all of those lines. Our solution uses the tokenize() function. This word list will include words inside the four stage directions that are children of the line elements in this scene. For extra credit, exclude those from your result, so that you are returning only the words spoken by Hamlet.
- Modify your answer to return all of those words in lower case, that is, convert words with initial capital letters to all lower case. Our solution uses the lower-case() function.
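The behavior of tokenize() and lower-case() on a sequence of strings can be sketched on toy input. The two strings below are assumed stand-ins for line text, not content from Bad Hamlet, and the whitespace pattern is one possible choice of separator rather than the one our solution necessarily uses.

```xpath
(: Toy input, assumed for illustration: two short strings standing in for line text :)
('To be, or', ' not To be')
  ! tokenize(., '\s+')   (: split each string on runs of whitespace :)
  ! lower-case(.)        (: lower-case each resulting token :)

(: returns "to", "be,", "or", "", "not", "to", "be" :)
```

The zero-length string in the result appears because the second input string begins with whitespace, so the separator matches before the first word. Punctuation stays attached to the tokens, since splitting on whitespace does not remove it.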
Extra-credit steps
- Modify your answer to strip all punctuation except hyphens and apostrophes. Our solution uses the replace() function.
- Modify your answer to remove duplicate words. Our solution uses the distinct-values() function.
- Modify your answer to sort the distinct words alphabetically. Our solution uses the sort() function.
- The first word in our alphabetized list is just a blank, so remove it. Our solution uses a predicate (checking whether the word is equal to the empty string), but no additional functions.
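The four functions and the predicate named in these steps can be tried out together on a small, made-up word list before they are attached to a path expression. This is only a sketch under stated assumptions: the input sequence is invented, and the regular expression is one way to keep apostrophes and hyphens, not necessarily the pattern used in our solution.

```xpath
(: Toy input, assumed for illustration; it stands in for a lower-cased word list :)
(
  ('to', 'be,', 'or', '', 'not', 'to', 'be!', 'o''er-leap')
    ! replace(., '[^\w''\-]', '')   (: delete anything that is not a word character, apostrophe, or hyphen :)
    => distinct-values()            (: keep one copy of each string :)
    => sort()                       (: order by the default collation :)
)[. != '']                          (: drop the zero-length string :)

(: with codepoint ordering this returns "be", "not", "o'er-leap", "or", "to" :)
```

Inside an XPath string literal delimited by apostrophes, a literal apostrophe is written by doubling it, which is why two apostrophes appear in both the pattern and the last input word. The parentheses around the arrow chain are needed so that the predicate applies to the sorted sequence as a whole.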