Digital humanities

Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2021-12-27T22:03:57+0000

XQuery assignment #1

Use the 42 Shakespeare plays that have been uploaded to Obdurodon to do the following:

1. Find the titles of all of the Shakespeare plays in the corpus. Read our posting on the main course page on Obdurodon for information about how to address the collection of plays and how to retrieve the full text of a single play. Open one play and look at where its title is encoded, because you’ll need to know that in order to construct the XPath that retrieves it. The simplest answer is a single XPath expression. The output should look something like the following (there are 42 titles in all):
```
<title xmlns="http://www.tei-c.org/ns/1.0">Othello, the Moor of Venice</title>
<title xmlns="http://www.tei-c.org/ns/1.0">The Second Part of King Henry the Fourth</title>
<title xmlns="http://www.tei-c.org/ns/1.0">The Taming of the Shrew</title>
```
Here are two important issues:
- The Shakespeare texts are in the TEI namespace. To use namespaces in XQuery, you declare them as follows:
```
declare namespace tei="http://www.tei-c.org/ns/1.0";
```
  This should go on the second line of your XQuery, just below the XQuery version declaration. It declares that you will use the prefix `tei:` to refer to elements in the TEI namespace. That means, for example, that you should refer to the TEI header as `tei:teiHeader`.
- There are several <title> elements in the plays, and not all of them are titles of plays. Some, for example, may be titles of acts or scenes. You can find the titles of plays by using XPath, and you may want to examine a sample play to remind yourself of how the TEI encodes that type of title.
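
The effect of the namespace declaration can be seen in a minimal sketch. The inline `$sample` document here is a hypothetical miniature invented for illustration, not one of the course plays, and it deliberately leaves out the header so that locating the title remains your task:

```xquery
xquery version "3.1";
declare namespace tei="http://www.tei-c.org/ns/1.0";

(: Hypothetical miniature document for illustration; not one of the course plays :)
let $sample :=
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
        <text>
            <body>
                <sp><speaker>First Citizen</speaker></sp>
                <sp><speaker>Second Citizen</speaker></sp>
            </body>
        </text>
    </TEI>
return (
    count($sample//speaker),     (: unprefixed name is in no namespace, so nothing matches: 0 :)
    count($sample//tei:speaker)  (: prefixed name matches both TEI speaker elements: 2 :)
)
```

If a path that looks correct returns nothing, a missing `tei:` prefix on one of the steps is a likely cause.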
2. Modify your XPath from question #1 to return just the text of the titles, without the tags. You can do that by using `text()`, `data()`, or `string()` (which you might want to look up in Kay or at w3schools). Your answer should look something like:
```
Othello, the Moor of Venice
The Second Part of King Henry the Fourth
The Taming of the Shrew
```
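
The three functions behave differently when an element has mixed content or when you apply them to more than one node. The following sketch uses a hypothetical `<list>` of `<item>` elements (invented names, unrelated to the plays) to show the difference:

```xquery
xquery version "3.1";

(: Hypothetical sample data, invented for illustration :)
let $list :=
    <list>
        <item>Hello, <hi>world</hi></item>
        <item>Plain text</item>
    </list>
return (
    (: text() selects only text nodes that are direct children,
       so the text inside <hi> is not included :)
    $list/item/text(),
    (: data() atomizes each node, using all of its descendant text :)
    data($list/item),
    (: string() accepts at most one item, so it is applied here
       as a path step, once per <item> :)
    $list/item/string()
)
```

Calling `string()` with a sequence of several nodes as its argument raises a type error, which is why the sketch applies it as the last path step instead.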
3. Fourteen of the 42 plays have more than 40 unique speakers. Find those plays and return their titles. You will need to use `count()` and `distinct-values()` (and don’t forget the TEI namespace!). Find the collection, drill down to the `<TEI>` elements in the collection (you know there are 42 of them), and then filter them according to whether they contain more than 40 distinct `<speaker>` element values. Once you’re getting the 14 plays that meet that description, you can add a path step to retrieve their titles.
4. Modify your solution to question #3 to return just the text of the play titles, without the `<title>` tags. You can take the same approach as you did for the transition from question #1 to question #2.
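
The pattern of filtering with `count()` and `distinct-values()` inside a predicate can be sketched on unrelated toy data. The `<library>`, `<shelf>`, and `<book>` elements here are hypothetical and in no namespace, so no prefix is needed:

```xquery
xquery version "3.1";

(: Hypothetical sample data, invented for illustration :)
let $library :=
    <library>
        <shelf name="A"><book lang="en"/><book lang="fr"/><book lang="en"/></shelf>
        <shelf name="B"><book lang="en"/><book lang="de"/><book lang="fr"/></shelf>
    </library>
(: Keep only shelves with more than two distinct languages, then step to the name.
   Shelf A holds three books but only two distinct languages, so it is filtered out. :)
return $library/shelf[count(distinct-values(book/@lang)) gt 2]/@name/string()
```

Both shelves hold three books, so `count(book)` alone would not distinguish them; wrapping the values in `distinct-values()` before counting is what removes the repeated `en`.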

Copy and paste your XQuery expressions from eXide into a Markdown document and upload that document as your homework submission, remembering to tag your code examples correctly in Markdown. That means that:

An inline code snippet should be surrounded by single backtick (\`) characters, e.g., …the \`doc()\` function …

A code block should be fenced by preceding it with a blank line and then a line that contains only three backticks, and following it with a line that contains only three backticks and then a blank line. For example:

… and here is an example:

```
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $ham := doc('/db/apps/shakespeare/data/ham.xml')
let $acts := $ham/descendant::tei:body/tei:div
let $act-count := count($acts)
return "Hamlet contains " || $act-count || " acts."
```

In the preceding example …

We do not need the results returned by your query; all we need is the XQuery expression itself.