Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2020-04-27T21:36:43+0000


Test #5: Schematron

The text

This test uses a selection of 5 of 15 of Petrarch's poems in Italian, which you should open from http://dh.obdurodon.org/schematron-test-Petrarch.xml. (Our file is based on one we found on Project Gutenberg, but you’ll need to use our version for the test because we’ve introduced some changes.)

The XML document has a root element <book> with three main child elements: <title> for the title of the book, <toc> for the table of contents, and <text> for the body of the text. In the table of contents, there are <sonnet> elements that have an @n attribute containing the sonnet number. The <text> element also contains <sonnet> child elements with @n attributes, and the child elements of each of those sonnets are <l> elements, which correspond to individual lines. Pause now and look at that markup to see how sonnets and sonnet numbers are represented in the table of contents and in the main body.
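
For orientation, the structure described above can be sketched as follows. This is a hypothetical skeleton, not an excerpt from the test file: the title, the line text, and the content of the table-of-contents entries are placeholders, and only the element and attribute names come from the description.

```xml
<book>
    <title>Placeholder title</title>
    <toc>
        <!-- one entry per sonnet; the entry content is a placeholder -->
        <sonnet n="1">Placeholder entry</sonnet>
        <sonnet n="2">Placeholder entry</sonnet>
    </toc>
    <text>
        <sonnet n="1">
            <l>Placeholder first line</l>
            <l>Placeholder second line</l>
        </sonnet>
        <sonnet n="2">
            <l>Placeholder first line</l>
            <l>Placeholder second line</l>
        </sonnet>
    </text>
</book>
```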

The task

Your task is to write a Schematron schema that will validate the following things in both the table of contents and the body of the document:

  1. The first sonnet number (the numerical value of the @n attribute) in both the table of contents and the body of the document is 1.
  2. The numbers are consecutive and in the correct order, that is, the number on each sonnet after the first is greater by 1 than the number on the preceding sonnet.
  3. The table of contents and the text have the same number of sonnets. Since you are verifying that the @n attributes in both the table of contents and the <text> begin with 1 and proceed consecutively, if you now verify that the two sequences of <sonnet> elements have the same length, that will confirm that every entry in the table of contents corresponds to a value in the <text>, and vice versa.

Our Answer

Our Schematron schema used to validate the document is below:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
    xmlns="http://purl.oclc.org/dsdl/schematron">
    <pattern>
        <!-- Rule 1: the first <sonnet> under each parent (<toc> and <text>) must be numbered 1 -->
        <rule context="sonnet[1]">
            <assert test="@n = 1">The first sonnet is not numbered with the @n value of 1, it is
                numbered with <value-of select="@n"/>.</assert>
        </rule>
        <!-- Rule 2: every later <sonnet> must be numbered one higher than its nearest preceding sibling -->
        <rule context="sonnet[preceding-sibling::sonnet]">
            <assert test="@n - preceding-sibling::sonnet[1]/@n eq 1">Sonnets are not in the correct
                order, or not numbered correctly.</assert>
        </rule>
        <!-- Rule 3: <toc> and <text> must contain the same number of <sonnet> children -->
        <rule context="book">
            <assert test="count(toc/sonnet) eq count(text/sonnet)">The number of sonnets in the
                table of contents is not equal to the number of sonnets in the text, there are
                    <value-of select="count(toc/sonnet)"/> in the Table of Contents, but <value-of
                    select="count(text/sonnet)"/> in the text.</assert>
        </rule>
    </pattern>
</sch:schema>

Explanation

The first rule fires on all first sonnets and verifies that they have an @n value of 1. There are two first sonnets: the first one in the table of contents and the first one in the text of the document. The context for this rule matches both of them, because sonnet[1] selects every <sonnet> that is the first <sonnet> child of its parent.
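
The fragment below is a hypothetical illustration (not taken from the test file) of which elements the context sonnet[1] selects. The position predicate is evaluated separately within each parent, so one <sonnet> in <toc> and one in <text> are matched.

```xml
<book>
    <title>Placeholder title</title>
    <toc>
        <sonnet n="1"/> <!-- matched by sonnet[1]: first <sonnet> child of <toc> -->
        <sonnet n="2"/> <!-- not matched by sonnet[1] -->
    </toc>
    <text>
        <sonnet n="1"/> <!-- matched by sonnet[1]: first <sonnet> child of <text> -->
        <sonnet n="2"/> <!-- not matched by sonnet[1] -->
    </text>
</book>
```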

The second rule fires on all non-first sonnets, that is, all sonnets that have a preceding sibling <sonnet> element. For each of them it finds the nearest preceding sibling sonnet (preceding-sibling::sonnet[1]) and subtracts its @n value from that of the current sonnet. The difference will be 1 as long as the values run consecutively. A human would understand that the requirement “each sonnet has a number one greater than the number of the preceding sonnet” cannot apply to the first sonnet, because it has no preceding sonnet at all (and therefore no preceding sonnet number). A computer cannot draw that real-world inference, so we have to specify it with the predicate.
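
As a worked example of the subtraction, consider the hypothetical fragment below, in which one sonnet has been dropped. The comments trace how the test @n - preceding-sibling::sonnet[1]/@n eq 1 would evaluate for each element; the fragment itself is invented for illustration.

```xml
<!-- Hypothetical fragment: sonnet 3 is missing, so the numbering jumps from 2 to 4 -->
<text>
    <sonnet n="1"/> <!-- no preceding sibling <sonnet>: not matched by the second rule -->
    <sonnet n="2"/> <!-- 2 - 1 = 1: the assertion passes -->
    <sonnet n="4"/> <!-- 4 - 2 = 2: the assertion fails and the message is reported -->
</text>
```

Because preceding-sibling is a reverse axis, the predicate [1] picks the sibling closest to the current sonnet, not the first one in document order.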

The third rule starts from the <book> element and verifies that its <toc> and <text> children both have the same number of <sonnet> children. Since we’ve verified independently that both start at 1 and run consecutively, if this condition is also met, that means that the numbers all correspond.
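
To illustrate a case that only the third rule catches, here is a hypothetical document (again invented, not the test file) in which both sequences start at 1 and run consecutively, but the table of contents is one entry short.

```xml
<book>
    <title>Placeholder title</title>
    <toc>
        <sonnet n="1"/>
        <sonnet n="2"/>
        <!-- the entry for sonnet 3 is missing here -->
    </toc>
    <text>
        <sonnet n="1"/>
        <sonnet n="2"/>
        <sonnet n="3"/>
    </text>
</book>
```

Here count(toc/sonnet) is 2 and count(text/sonnet) is 3, so the assertion on <book> fails, and the reported message should say, give or take whitespace, that there are 2 in the Table of Contents, but 3 in the text.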