Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2017-04-19T18:19:49+0000


Schematron test

The XML Document

This test uses an XML document containing the total reference counts from the U.S. presidential inaugural addresses: So help me God project, which you can download by right-clicking on http://dh.obdurodon.org/schematron-test-2174-xml-instance.xml

The root element of this document, <speeches>, contains data for 57 inaugural speeches (<speech>), each of which records a count of 0 or more references. Each <speech> also has a @year attribute, which gives the year in which the speech was delivered.
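To make the structure concrete, here is a minimal sketch of what such a document might look like. It assumes that the reference count is the text content of each <speech>, which is consistent with the number(.) test in our answer. The counts shown are invented for illustration and are not taken from the project data.

```xml
<!-- Hypothetical excerpt: the reference counts here are invented for illustration -->
<speeches>
    <speech year="1789">3</speech>
    <speech year="1793">0</speech>
    <!-- ... 55 more <speech> elements, one for each remaining inaugural address ... -->
</speeches>
```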

The task

Your task is to write a Schematron schema to validate the following:

  1. There are exactly 57 speeches.
  2. No speech has a negative reference count.
  3. The year number (the value of the @year attribute) for each speech is not less than 1789 and not greater than 2013, since the project only contains speech data up until 2013 and no inaugural speeches were given prior to 1789.
  4. (Optional bonus question) Inaugural addresses are given every four years, even though presidents may succeed to office at other times when the incumbent dies. Verify that sequential years of the inaugural addresses differ by exactly 4.

You should associate your schema with your XML document instance in <oXygen/> and verify that it works by changing the total number of <speech> elements, changing some of the reference counts inside of the <speech> elements, and changing some of the @year attribute values on the <speech> elements. When you are satisfied with your answer, please upload just the Schematron file (not the XML).
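One common way to make the association is an xml-model processing instruction at the top of the XML instance, which <oXygen/> can insert for you. The sketch below assumes a hypothetical schema filename, schematron-test.sch, saved in the same directory as the XML; substitute the name of your own file.

```xml
<!-- The href value is a hypothetical filename; point it at your own Schematron file -->
<?xml-model href="schematron-test.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<speeches>
    <!-- <speech> elements as in the downloaded document -->
</speeches>
```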

Our Answer

The Schematron file used to validate the XML can be seen below.

<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <pattern>
        <rule context="speeches">
            <assert test="count(speech) eq 57">There should be exactly 57 speeches.</assert>
        </rule>
        <rule context="speech">
            <assert test="number(.) ge 0">The number of references must be greater than or equal to
                0.</assert>
            <assert test="number(@year) ge 1789 and number(@year) le 2013">The year must be no
                earlier than 1789 and no later than 2013.</assert>
            <assert test="number(@year) = (1789, number(preceding-sibling::speech[1]/@year) + 4)"
                >This year is not exactly four years later than the preceding year.</assert>
        </rule>
    </pattern>
</schema>

The first rule has a @context attribute value of speeches which means this rule will fire on the root <speeches> element. Our <assert> tests whether the total number of speech elements is equal to 57. We use the count() function to count speeches on the child axis, and compare that value to 57.

This rule could also have been written with a <report> instead of an <assert>. Because a <report> fires when its test is true, the comparison would have to be inverted, with ne in place of eq (or != in place of = in general comparison), so that the message is reported whenever the count is not 57.
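As a sketch of that alternative, the first rule might look like the following with a <report>. The message wording is our own choice here. An <assert> complains when its test is false, while a <report> complains when its test is true, so the test uses ne, the value-comparison counterpart of eq.

```xml
<!-- Alternative to the first rule; it belongs inside a <pattern> like the original -->
<rule xmlns="http://purl.oclc.org/dsdl/schematron" context="speeches">
    <!-- Fires when the number of child <speech> elements is anything other than 57 -->
    <report test="count(speech) ne 57">There should be exactly 57 speeches.</report>
</rule>
```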

The second rule is responsible for checking whether the reference counts are valid, non-negative numbers. Since this is something that applies separately to each individual <speech>, and not the <speeches> element as a whole, we created a new <rule> where the value of the @context attribute is now speech. This means it will fire once for each <speech> element, and that it will check the value for that individual speech. Inside that rule, we used an <assert> and tested whether the reference count for the speech being examined is greater than or equal to 0.

Note: Value comparison (eq, ne, lt, le, gt, ge) is fussy about data types: it won’t compare two values as numbers unless it knows that both are numbers. It knows that the zero is a number because digits in XPath expressions that aren’t in quotation marks are numbers, but when it plucks the reference count from the XML, it doesn’t know whether it’s a number or a string. We wrap the XPath number() function around the value from the XML to instruct the comparison to treat it as a number. It’s a peculiarity of XPath comparison that you don’t need the number() function in this context for general comparison (=, !=, <, <=, >, >=), which will convert the value from the XML to a number automatically. Value comparison, however, requires you to specify the type.
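The contrast between the two kinds of comparison can be sketched as follows. This assumes validation without an XML Schema, so that the content of each <speech> is untyped. The two assertions are alternatives that test the same condition, and you would normally keep only one of them.

```xml
<!-- Illustration only; this rule belongs inside a <pattern> -->
<rule xmlns="http://purl.oclc.org/dsdl/schematron" context="speech">
    <!-- Value comparison: the untyped content is converted explicitly with number() -->
    <assert test="number(.) ge 0">The number of references must be greater than or equal to 0.</assert>
    <!-- General comparison: the untyped content is converted to a number automatically -->
    <assert test=". >= 0">The number of references must be greater than or equal to 0.</assert>
    <!-- By contrast, the test ". ge 0" (value comparison without number()) raises a type error -->
</rule>
```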

The third task is to confirm that for any given speech, the year is between 1789 and 2013, inclusive. This also applies separately to each individual <speech>, so we can place this new <assert> inside the same <rule> as the last one we created, since it will have the same context. The test for this <assert> is whether the @year attribute for the speech being examined has a value greater than or equal to 1789 and less than or equal to 2013. We join these two conditions with the XPath logical operator and, so the test succeeds only if both are true.

The last (optional, bonus) task was to verify that inaugural addresses happened at regular four-year intervals. Our rule uses general equality with a sequence of two items on the right, and the test succeeds if the year is equal to either item. The first is the value 1789, the year of the first inaugural address. The second is the year of the immediately preceding speech incremented by 4. We list 1789 separately because the first inaugural address doesn’t have a preceding sibling <speech>, so the second item cannot match in that case. Note that we used a numerical predicate ([1]) to run the comparison against only the immediately preceding speech, and not against all of the preceding speeches, going back to the beginning.
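One consequence of listing 1789 as an alternative is that the test would accept a year of 1789 on any speech, not only the first. A possible variation, offered as a sketch and not as part of our answer, is to restrict the rule context to speeches that have a preceding sibling, so that the first speech is skipped and no special value is needed. Within a single pattern a node is handled only by the first rule that matches it, so this rule would need its own <pattern> in order to fire alongside the existing speech rule.

```xml
<!-- Alternative sketch: add this as a second <pattern> inside the <schema> element -->
<pattern xmlns="http://purl.oclc.org/dsdl/schematron">
    <!-- Matches every <speech> except the first, which has no preceding sibling -->
    <rule context="speech[preceding-sibling::speech]">
        <assert test="number(@year) eq number(preceding-sibling::speech[1]/@year) + 4"
            >This year is not exactly four years later than the preceding year.</assert>
    </rule>
</pattern>
```

With this variation, the first speech is checked only by the range test (1789 to 2013) in the original rule.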