Digital humanities

Maintained by: David J. Birnbaum. Creative Commons BY-NC-SA 3.0 Unported License. Last modified: 2021-03-30T23:17:44+0000

Test #5: Schematron

The text

This test uses a poem from this semester’s Dickinson project, which you can find at schematron-instance.xml. We have modified the XML for use in this Schematron test, including introducing content and markup that was not present in the project data. The original developers are not responsible for these modifications, which were made only for testing purposes.

You may assume that Relax NG validation is ensuring that all required elements and attributes are present and in allowed contexts. For example, you do not have to validate that the document contains a <date>, that the date is where it should be in the document, that the date is a four-digit year, that the date is allowed to have a @period attribute, or that late is a permitted value for that attribute.

Required tasks

Create Schematron rules that enforce the following constraints:

  1. A <line> element cannot begin or end with a space character.
  2. A <line> element cannot be empty or contain just whitespace.
  3. A <stanza> element with a @type attribute that has the value quatrain must contain exactly four lines. Your error message should report the actual line count of stanzas that fail this test.
  4. A poem with a <date> element that has a @period attribute with the value late must be dated between 1875 and 1886, inclusive. Your error message should report the actual date that appears in the poem.
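To make the four constraints concrete, here is a minimal sketch of the same checks in Python. It is not a Schematron solution and not a substitute for one; it may be useful as an independent cross-check of what your rules should report. The <poem> root element, the sample data, the assumption that lines are direct children of their stanza, and the function name check_required are all hypothetical and are not taken from schematron-instance.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <poem> root and the exact nesting are assumptions,
# not the structure of schematron-instance.xml.
SAMPLE = """<poem>
  <metadata><date period="late">1862</date></metadata>
  <stanza type="quatrain">
    <line>A first line</line>
    <line> starts with a space</line>
    <line>   </line>
  </stanza>
</poem>"""


def check_required(root):
    """Return a list of messages, one per violation of the required constraints."""
    problems = []
    for line in root.iter("line"):
        # String value of the line, including text inside child elements.
        text = "".join(line.itertext())
        if not text.strip():
            problems.append("line is empty or whitespace-only")
        elif text.startswith(" ") or text.endswith(" "):
            problems.append(f"line begins or ends with a space: {text!r}")
    for stanza in root.iter("stanza"):
        if stanza.get("type") == "quatrain":
            # Assumes lines are direct children of the stanza.
            count = len(stanza.findall("line"))
            if count != 4:
                problems.append(f"quatrain has {count} lines, not 4")
    for date in root.iter("date"):
        if date.get("period") == "late":
            # Relax NG is assumed to guarantee a four-digit year here.
            year = int(date.text)
            if not 1875 <= year <= 1886:  # both endpoints are allowed
                problems.append(f"late poem dated {year}, outside 1875-1886")
    return problems


if __name__ == "__main__":
    for problem in check_required(ET.fromstring(SAMPLE)):
        print(problem)
```

The sketch reports a whitespace-only line once, as empty, rather than also flagging its leading and trailing spaces. In Schematron the first two constraints are separate tests, so both may fire on such a line unless you arrange your rules to prevent it.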

Optional, extra-credit tasks

  1. The first two types of constraints described above for <line> elements also apply to all element descendants of the <metadata> element. That is, none of those elements can begin or end with a space character and they cannot be blank or consist entirely of whitespace. You should write one rule that tests all of these elements, and not a separate rule for each of them.
  2. The real first line of the poem must match the text of the first line as given in the metadata section, except that there may be whitespace differences. (This is a common real-life exception because pretty-printing could introduce whitespace differences.)
  3. Any theme mentioned in the <poem_themes> metadata element must correspond to the name of at least one child element inside a <line>. For example, if there were a <theme> element with the value obdurodon, there would have to be an element called <obdurodon> in a line of the poem.* No fair checking by specific names; this rule has to work with any element type, including those not present in this particular poem. Your error message should report the value of the spurious <theme> element.
  4. All child elements of <line> must have names that correspond to <theme> element children of the <poem_themes> metadata element. This is the mirror image of the preceding rule, and your error message should report the name of the element type that appears in the body but is not listed among the metadata themes.
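Extra-credit tasks 2 through 4 come down to a whitespace-insensitive string comparison and a two-way comparison of names. The Python sketch below illustrates that logic only; it is not Schematron and is meant as a way to reason about what your rules should catch. The <poem> root, the <first_line> metadata element, the sample data, and the function names normalize and check_extra are hypothetical. Only <metadata>, <poem_themes>, <theme>, and <line> come from the task description.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <poem>, <first_line>, and the nesting are assumptions
# made for illustration only.
SAMPLE = """<poem>
  <metadata>
    <first_line>The wombat   sleeps</first_line>
    <poem_themes><theme>animal</theme><theme>obdurodon</theme></poem_themes>
  </metadata>
  <stanza>
    <line>The <animal>wombat</animal>
      sleeps</line>
    <line>beneath the <nature>gum tree</nature></line>
  </stanza>
</poem>"""


def normalize(text):
    """Collapse runs of whitespace and trim, roughly like XPath normalize-space()."""
    return " ".join(text.split())


def check_extra(root):
    """Return messages for first-line mismatches and theme/element mismatches."""
    problems = []
    lines = list(root.iter("line"))

    # Task 2: compare first lines after normalizing whitespace on both sides.
    declared = normalize("".join(root.find(".//metadata/first_line").itertext()))
    actual = normalize("".join(lines[0].itertext()))
    if declared != actual:
        problems.append(f"first line {actual!r} does not match metadata {declared!r}")

    # Tasks 3 and 4: compare theme values with names of child elements of <line>.
    themes = {normalize(t.text or "") for t in root.findall(".//poem_themes/theme")}
    used = {child.tag for line in lines for child in line}
    for name in sorted(themes - used):
        problems.append(f"theme {name!r} is not used in any line")
    for name in sorted(used - themes):
        problems.append(f"element <{name}> is not listed among the themes")
    return problems


if __name__ == "__main__":
    for problem in check_extra(ET.fromstring(SAMPLE)):
        print(problem)
```

Because the comparison works on sets of names rather than on specific element types, it behaves the same for any theme, including ones that do not occur in this particular poem.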

After you’re done

You should associate your schema with your XML document instance in <oXygen/> and verify that it works by changing some of the values to ensure that it raises errors when it should and does not raise errors when it shouldn’t. When you are satisfied with your answer, please upload just the Schematron file (not the XML) to Canvas.

* Poetry about Australian animals is not as rare as one might think. Dante Gabriel Rossetti and Christina Rossetti, sibling Pre-Raphaelite rock stars, both published poems about wombats. See Angus Trumble’s “O Uommibatto: How the Pre-Raphaelites became obsessed with the wombat.”