Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2019-04-19T18:54:24+0000


Test #5: Schematron

The text

This test uses the text from Agatha Christie's novel The murder on the links, based on the version published by Project Gutenberg. You can download an XML version of the novel from http://dh.obdurodon.org/schematron-test-christie.xml.

This document has a root element <book> with two main child elements, <toc> for the table of contents and <text> for the body of the text. The table of contents contains <chapter> elements whose content is the chapter title and whose @n attribute holds the chapter number. The <text> element also contains <chapter> child elements with @n attributes, but the titles of these chapters are inside a child element of <chapter> called <ch-title>.
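The structure described above can be sketched roughly as follows. This is only a skeleton: the chapter titles and contents are placeholders rather than text from the novel, and the real file has many more chapters.

```xml
<!-- Skeleton only: titles and contents are placeholders, not the actual text -->
<book>
    <toc>
        <chapter n="1">Title of chapter 1</chapter>
        <chapter n="2">Title of chapter 2</chapter>
    </toc>
    <text>
        <chapter n="1">
            <ch-title>Title of chapter 1</ch-title>
            <!-- contents of chapter 1 -->
        </chapter>
        <chapter n="2">
            <ch-title>Title of chapter 2</ch-title>
            <!-- contents of chapter 2 -->
        </chapter>
    </text>
</book>
```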

The task

Your task is to write a Schematron schema that will:

  1. Verify that chapter numbers (the values of the @n attributes) in both the table of contents and the body begin with 1 and run consecutively, that is, that each @n value in both the table of contents and the body is greater by 1 than the preceding value.
  2. Verify that the number of <chapter> elements in the table of contents is the same as the number of <chapter> elements in the body.

You should associate your schema with your XML document instance in <oXygen/> and verify that it works by changing some of the values in the table of contents and in the body in the XML. When you are satisfied with your answer, please upload just the Schematron file (not the XML) to Courseweb.
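One common way to make the association is an xml-model processing instruction placed before the root element of the XML document, which is what <oXygen/> typically writes when you associate a schema through its interface. The sketch below assumes the schema is saved next to the XML under the hypothetical filename schematron-test.sch; substitute your own filename or path.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The href value is a hypothetical filename; point it at your own .sch file -->
<?xml-model href="schematron-test.sch" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<book>
    <!-- rest of the document unchanged -->
</book>
```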

Our Answer

Our Schematron schema used to validate the document is below:

<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
    xmlns="http://purl.oclc.org/dsdl/schematron">
    <pattern>
        <rule context="chapter[1]">
            <assert test="@n = 1">First chapter does not have an @n value of "1".</assert>
        </rule>
        <rule context="chapter[preceding-sibling::chapter]">
            <assert test="(number(@n) - number(preceding-sibling::chapter[1]/@n)) = 1">Chapters
                are not numbered correctly.</assert>
        </rule>
        <rule context="book">
            <assert test="count(toc/chapter) eq count(text/chapter)">The number of chapters in the
                table of contents is not equal to the number of chapters in the text.</assert>
        </rule>
    </pattern>
</sch:schema>

Explanation

The first rule fires on all first chapters and verifies that they have an @n value of 1. There are two first chapters: the first one in the table of contents and the first one in the body of the document. The XPath pattern in the rule's @context attribute, chapter[1], matches both of them, because each is the first <chapter> child of its own parent.

The second rule fires on all non-first chapters, that is, all chapters that have a preceding sibling <chapter> element. For each of them it finds the nearest preceding sibling chapter (preceding-sibling::chapter[1]) and subtracts its @n value from that of the current chapter. The difference will be 1 as long as the values run consecutively. A human would understand that the requirement "each chapter has a number one greater than the number of the preceding chapter" cannot apply to the first chapter, because it has no preceding chapter at all (and therefore no preceding chapter number). A computer cannot draw that real-world inference, so we have to exclude the first chapter explicitly with the predicate.
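To check the first two rules, you can temporarily break the numbering in a copy of the file. The fragment below is a made-up test case with placeholder titles, not part of the original document; it skips from 2 to 4 in the table of contents.

```xml
<!-- Deliberately broken test data; titles are placeholders -->
<toc>
    <chapter n="1">Title of chapter 1</chapter>
    <chapter n="2">Title of chapter 2</chapter>
    <!-- 4 - 2 = 2, so the second rule's assertion should fail here -->
    <chapter n="4">Title of chapter 4</chapter>
    <!-- 5 - 4 = 1, so this chapter should pass -->
    <chapter n="5">Title of chapter 5</chapter>
</toc>
```

Only the chapter immediately after the gap should be reported, because each chapter is compared only with its nearest preceding sibling. A missing or non-numeric @n should also be reported: number() returns NaN in that case, and a comparison of NaN with 1 is false.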

The third rule starts from the <book> element and verifies that its <toc> and <text> children both have the same number of <chapter> children. Since we’ve verified independently that both start at 1 and run consecutively, if this condition is also met, that means that the numbers all correspond.
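As an optional variation that is not part of the answer above, the message for the third rule could report the two counts, which makes a mismatch easier to track down. This is a minimal sketch that uses the standard Schematron <value-of> element and contains only the count rule; the wording of the message is our own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <sch:pattern>
        <sch:rule context="book">
            <!-- Same test as the third rule; value-of inserts the actual counts into the message -->
            <sch:assert test="count(toc/chapter) eq count(text/chapter)">The table of contents has
                    <sch:value-of select="count(toc/chapter)"/> chapters, but the text has
                    <sch:value-of select="count(text/chapter)"/>.</sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```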