Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified:
2021-04-03T17:02:07+0000
This test uses a poem from this semester’s Dickinson project, which you can find at schematron-instance.xml. We have modified the XML for use in this Schematron test, including introducing content and markup that was not present in the project data. The original developers are not responsible for these modifications, which were made only for testing purposes.
You may assume that Relax NG validation is ensuring that all required elements
and attributes are present and in allowed contexts. For example, you do not have
to validate that the document contains a <date>, that the date is where it
should be in the document, that the date is a four-digit year, that the date is
allowed to have a @period attribute, or that late is a permitted value for that
attribute.
Create Schematron rules that enforce the following constraints:
- A <line> element cannot begin or end with a space character.
- A <line> element cannot be empty or contain just whitespace.
- A <stanza> element with a @type attribute that has the value quatrain must
  contain exactly four lines. Your error message should report the actual line
  count of stanzas that fail this test.
- A <date> element that has a @period attribute with the value late must be
  dated between 1875 and 1886, inclusive. Your error message should report the
  actual date that appears in the poem.
- The whitespace constraints on <line> elements also apply to all element
  descendants of the <metadata> element. That is, none of those elements can
  begin or end with a space character and they cannot be blank or consist
  entirely of whitespace. You should write one rule that tests all of these
  elements, and not a separate rule for each of them.
- Each <theme> child of the <poem_themes> metadata element must correspond to
  the name of at least one child element inside a <line>. For example, if there
  were a <theme> element with the value obdurodon, there would have to be an
  element called <obdurodon> in a line of the poem. No fair checking by specific
  names; this rule has to work with any element type, including those not
  present in this particular poem. Your error message should report the value of
  the spurious <theme> element.
- All child elements of a <line> must have names that correspond to <theme>
  element children of the <poem_themes> metadata element. This is the mirror
  image of the preceding rule, and your error message should report the name
  of the element type that appears in the body but is not listed among the
metadata themes.
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
    <sch:pattern>
        <!-- ====================================================== -->
        <!-- Global variables                                       -->
        <sch:let name="metadata-themes" value="//theme"/>
        <sch:let name="body-themes" value="//line/* ! name()"/>
        <!-- ====================================================== -->
        <sch:rule context="stanza[@type eq 'quatrain']">
            <!-- A quatrain must contain 4 lines -->
            <sch:let name="line-count" value="count(line)"/>
            <sch:assert test="$line-count eq 4">Quatrains must contain exactly four lines. This stanza contains <sch:value-of select="$line-count"/> lines.</sch:assert>
        </sch:rule>
        <sch:rule context="line">
            <!-- A line cannot be empty or contain just whitespace -->
            <sch:report test="string-length(.) = 0">Lines cannot be empty and cannot contain just whitespace</sch:report>
            <!-- A line cannot start with a space character -->
            <sch:report test="starts-with(., ' ')">Lines should not begin with space characters</sch:report>
            <!-- A line cannot end with a space character -->
            <sch:report test="ends-with(., ' ')">Lines should not end with space characters</sch:report>
        </sch:rule>
        <sch:rule context="date[@period eq 'late']">
            <!-- Late period poems must be dated between 1875 and 1886 -->
            <sch:let name="year" value="number(.)"/>
            <sch:assert test="$year ge 1875 and $year le 1886">Late-period poems must have a date between 1875 and 1886, inclusive, and <sch:value-of select="$year"/> does not fall within that range.</sch:assert>
        </sch:rule>
        <sch:rule context="line/*">
            <!-- Inline element types must be listed among the metadata themes -->
            <sch:let name="name" value="name()"/>
            <sch:assert test="$name = $metadata-themes">The element type "<sch:value-of select="$name"/>" is not among the metadata themes: "<sch:value-of select="string-join($metadata-themes, ', ')"/>"</sch:assert>
        </sch:rule>
        <sch:rule context="theme">
            <!-- Every theme element in the metadata must appear inside a line in the body -->
            <sch:assert test=". = $body-themes">The metadata theme "<sch:value-of select="."/>" does not appear in the body themes: "<sch:value-of select="string-join($body-themes, ', ')"/>"</sch:assert>
        </sch:rule>
    </sch:pattern>
    <sch:pattern>
        <sch:rule context="metadata//*[not(self::poem_themes)]">
            <!-- Metadata entries (all descendants) cannot be empty or contain just whitespace -->
            <sch:report test="string-length(normalize-space(.)) = 0">Metadata entries cannot be empty and cannot contain just whitespace</sch:report>
            <sch:report test="starts-with(., ' ')">Metadata entries cannot begin with space characters.</sch:report>
            <sch:report test="ends-with(., ' ')">Metadata entries cannot end with space characters.</sch:report>
        </sch:rule>
        <!-- The real first line must match the metadata first line, except for whitespace -->
        <sch:rule context="body/descendant::line[1]">
            <sch:assert test="normalize-space(.) eq //first_line ! normalize-space(.)">The first line of the poem must contain the same text as the &lt;first_line&gt; element in the metadata.</sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
If multiple rules inside an <sch:pattern> element are able to match the same
thing (for example, if one specifies all lines as the context and another
specifies only first lines), only the first one (in order) will fire. The way
to work around this is to separate otherwise overlapping rules into different
<sch:pattern> elements. You did not have to do this for the required
activities, none of which required rule contexts that would overlap. As an
alternative to splitting your rules into different <sch:pattern> elements as a
way of avoiding overlapping contexts, you can write tests that take the
differences into account. We illustrate both of these strategies below.
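As a minimal sketch of the two strategies, the fragment below uses the example mentioned above (one rule for all lines and another for first lines only). The upper-case constraint, the rule contexts, and the messages are hypothetical and serve only to show the layout; they are not part of the solution.

```xml
<!-- Strategy 1: separate patterns, so both rules fire on a first line -->
<sch:pattern>
    <sch:rule context="line">
        <sch:report test="string-length(normalize-space(.)) = 0">Lines cannot be blank.</sch:report>
    </sch:rule>
</sch:pattern>
<sch:pattern>
    <sch:rule context="stanza/line[1]">
        <!-- Hypothetical constraint, for illustration only -->
        <sch:assert test="matches(., '^\p{Lu}')">The first line of a stanza should begin with an upper-case letter.</sch:assert>
    </sch:rule>
</sch:pattern>

<!-- Strategy 2: one pattern; the more specific rule comes first and repeats
     the general test, because the general rule will not fire on first lines -->
<sch:pattern>
    <sch:rule context="stanza/line[1]">
        <sch:report test="string-length(normalize-space(.)) = 0">Lines cannot be blank.</sch:report>
        <sch:assert test="matches(., '^\p{Lu}')">The first line of a stanza should begin with an upper-case letter.</sch:assert>
    </sch:rule>
    <sch:rule context="line">
        <sch:report test="string-length(normalize-space(.)) = 0">Lines cannot be blank.</sch:report>
    </sch:rule>
</sch:pattern>
```

The second layout keeps everything in one pattern at the cost of repeating the shared test, which is why splitting into patterns is often easier to maintain.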
All of these rules can go inside the same <sch:pattern>
element:
<sch:rule context="stanza[@type eq 'quatrain']">
    <!-- A quatrain must contain 4 lines -->
    <sch:let name="line-count" value="count(line)"/>
    <sch:assert test="$line-count eq 4">Quatrains must contain exactly four lines. This stanza contains <sch:value-of select="$line-count"/> lines.</sch:assert>
</sch:rule>
<sch:rule context="line">
    <!-- A line cannot be empty or contain just whitespace -->
    <sch:report test="string-length(.) = 0">Lines cannot be empty and cannot contain just whitespace</sch:report>
    <!-- A line cannot start with a space character -->
    <sch:report test="starts-with(., ' ')">Lines should not begin with space characters</sch:report>
    <!-- A line cannot end with a space character -->
    <sch:report test="ends-with(., ' ')">Lines should not end with space characters</sch:report>
</sch:rule>
<sch:rule context="date[@period eq 'late']">
    <!-- Late period poems must be dated between 1875 and 1886 -->
    <sch:let name="year" value="number(.)"/>
    <sch:assert test="$year ge 1875 and $year le 1886">Late-period poems must have a date between 1875 and 1886, inclusive, and <sch:value-of select="$year"/> does not fall within that range.</sch:assert>
</sch:rule>
To verify that quatrains contain exactly four lines, we match on <stanza>
elements with a @type value of quatrain, which means that the rule fires once
per <stanza> element if the element has a @type attribute with the value
quatrain (and ignores other <stanza> elements). We use the count() function to
count its <line> children, and we save this value in a variable because we’re
going to use it twice (for the test and the error message), and by saving it in
a variable we have to count it only once. We then assert that the count must be
equal to 4, and report an error, with the actual count, if the test fails. With
this poem the test will give the correct result even if you don’t specify the
@type in the @context because this poem happens not to contain any stanzas that
are not quatrains, but you should specify the @type attribute in the rule
anyway. The poet also wrote poems with stanzas that were not quatrains, so
specifying the attribute will protect you from spurious error reports on
tercets or sestets in other poems.
We write separate tests to verify that a line does not begin or end with a space character. Those aren’t very robust tests, though, because they check only for literal space characters, and not for other whitespace characters, like tabs and newlines. A more robust test to check whether there is any whitespace character (not just a literal space) at the beginning of a line might be:
<sch:report test="matches(., '^\s')">
Because matches() checks for a regex, rather than a string, we can use the
regex notation \s, which means any whitespace character; this saves us from
having to check for each possible whitespace character separately. If you want
to use starts-with(), you could use a more complex test, checking for each type
of whitespace character individually:
<sch:report test="starts-with(., ' ') or starts-with(., '&#x9;') or starts-with(., '&#xa;')">
Because we can’t easily read tab or newline characters, we use numerical
character references to represent them in our code (see Kay, p. 142). We
would also want to write a more robust test to check the end of the line,
using '\s$' as the regex to match.
To ensure that a <line> element isn’t empty, we measure its length with
string-length() and report an error if it is equal to 0. On its own this test
does not catch a line that contains only whitespace; wrapping the argument in
normalize-space(), as the metadata rule does, would catch that case too.
The context for these tests is line, so the rule fires once per <line> element.
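Putting the more robust tests described above together, a line rule might look like the sketch below. It is a variant of the solution, not a replacement for it: the use of normalize-space() is borrowed from the metadata rule, and the message wording is adjusted to say whitespace rather than space.

```xml
<sch:rule context="line">
    <!-- Fires for an empty line and for a line that holds only whitespace -->
    <sch:report test="string-length(normalize-space(.)) = 0">Lines cannot be empty and cannot contain just whitespace</sch:report>
    <!-- \s matches a space, tab, newline, or carriage return -->
    <sch:report test="matches(., '^\s')">Lines should not begin with whitespace</sch:report>
    <sch:report test="matches(., '\s$')">Lines should not end with whitespace</sch:report>
</sch:rule>
```

With this version a line that consists only of whitespace triggers all three reports, while a completely empty line triggers only the first.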
To test the date range we match the <date> element only if it has a @period
attribute with the value of late, and we save the numeric value of the year to
a variable so that we can reuse it easily. If you want to use general
comparison (>=, <=), you don’t need the number() function, but if you want to
use value comparison (ge, le), you do. The two types of comparison have the
same meaning in this particular context, so you can use either one; they do
have different meanings elsewhere, though, and you can remind yourself of the
difference in the Comparison section of our page at
https://dh.obdurodon.org/functions.xhtml. We assert that the year falls within
the designated range, and report an error if it doesn’t.
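For comparison, here is a sketch of the same rule written with general comparison and without number(). It assumes the document is validated without schema-aware typing, so that the value of <date> is untyped and is converted to a number automatically when it is compared with one.

```xml
<sch:rule context="date[@period eq 'late']">
    <!-- General comparison converts the untyped value of the date to a number,
         so number() is not needed. Inside an attribute value the less-than
         sign must be written as an entity reference. -->
    <sch:assert test=". >= 1875 and . &lt;= 1886">Late-period poems must have a date between 1875 and 1886, inclusive, and <sch:value-of select="."/> does not fall within that range.</sch:assert>
</sch:rule>
```

Swapping in ge and le here without number() would raise a type error, because value comparison treats the untyped date as a string and cannot compare a string to an integer.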
Child elements of <line> elements must be listed among the metadata <theme> elements
<sch:let name="metadata-themes" value="//theme"/>
<sch:rule context="line/*">
    <!-- Inline element types must be listed among the metadata themes -->
    <sch:let name="name" value="name()"/>
    <sch:assert test="$name = $metadata-themes">The element type "<sch:value-of select="$name"/>" is not among the metadata themes: "<sch:value-of select="string-join($metadata-themes, ', ')"/>"</sch:assert>
</sch:rule>
We create a variable called $metadata-themes that is equal to a sequence of all
<theme> elements in the document, which are only the ones listed within the
metadata section. In our rule, the @context value matches all children of the
<line> element it is looking at at the moment, that is, all inline elements
inside each line of the poem, one line at a time. For each element instance
that the @context matches, we store the element name in a variable so that we
can reuse it easily, once for the test and again for the error message. The
test asserts that the name of the inline element is equal to one of the <theme>
elements in the metadata section, and here we have to use general equality
because that is the most direct way to test whether any item on the left side
of the equal sign (there’s only one, the name of the element being tested) is
equal to any item on the right (the sequence of all <theme> values from the
metadata). If you try to use eq for value comparison here, you’ll raise an
error because value comparison can only compare one thing to one thing, and our
right side contains a sequence of <theme> elements, that is, more than one. If
the test fails, we report both the name of the problematic element that we
matched and the list of expected values that we extracted from the <theme>
elements in the metadata section.
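To make the "any item equals any item" meaning of general equality explicit, the sketch below spells out the same test with a quantified expression, where eq is legal because it compares one theme at a time. The range variable $theme is a name introduced here for illustration, and the shortened message is ours.

```xml
<sch:pattern>
    <sch:let name="metadata-themes" value="//theme"/>
    <sch:rule context="line/*">
        <sch:let name="name" value="name()"/>
        <!-- Same result as test="$name = $metadata-themes": true when at
             least one theme value equals the name of the current element -->
        <sch:assert test="some $theme in $metadata-themes satisfies $theme eq $name">The element type "<sch:value-of select="$name"/>" is not among the metadata themes.</sch:assert>
    </sch:rule>
</sch:pattern>
```

The general-equality version in the solution is shorter and says the same thing, which is why we prefer it there.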
Because this rule matches on children of <line> elements (context="line/*"),
and not on <line> elements themselves, it does not compete or conflict with the
earlier rule that matches on <line> elements themselves. For that reason, it
can go inside the same <sch:pattern>.
<sch:pattern>
    <sch:rule context="metadata//*[not(self::poem_themes)]">
        <!-- Metadata entries (all descendants) cannot be empty or contain just whitespace -->
        <sch:report test="string-length(normalize-space(.)) = 0">Metadata entries cannot be empty and cannot contain just whitespace</sch:report>
        <sch:report test="starts-with(., ' ')">Metadata entries cannot begin with space characters.</sch:report>
        <sch:report test="ends-with(., ' ')">Metadata entries cannot end with space characters.</sch:report>
    </sch:rule>
</sch:pattern>
We can test the whitespace behavior of all metadata elements at once by
matching on all descendants of the <metadata>
element,
except that we don't want to test <poem_themes>
in this way
because it has element content, which means that it might contain whitespace
for pretty-printing, and we care about whitespace only when it begins or
ends real text. There are two ways to exclude <poem_themes>
from the @context
value:
- Match descendants of <metadata> only if they are not of type <poem_themes>.
  We do that above by using the self axis to say “match me unless I, myself, am
  an element of type <poem_themes>”. This type of test is the principal use
  case for the self axis.
- Use the XPath except operator, which can be used to exclude items from a
  sequence. If you specify the value of @context as
  metadata//(* except poem_themes), that will match all element descendants of
  <metadata> except descendant <poem_themes> elements.
We have to put this <sch:rule> into a different <sch:pattern> than the rule
that checks the date. The reason is that the @context for the date checking
rule is date[@period eq 'late'] and the context for checking all descendants of
<metadata> is metadata//*[not(self::poem_themes)], and a late <date> element
matches both of those XPath patterns. If a component of the document we are
validating matches more than one @context value inside the same <sch:pattern>,
only the first <sch:rule> will fire. We could have worked around this
limitation with more complex predicates, but we find our code easier to write,
read, debug, and maintain if we keep our @context XPath patterns as simple as
possible, even when that requires us to create additional <sch:pattern>
elements.
To test whether the first line of the real poem matches the
<first_line>
element in the metadata we could use either
of those elements as the @context
value. In both cases, though,
we have to ensure that we don’t wind up with two @context
attributes within a single <sch:pattern>
element that match
the same item in the document. We chose to match the real first line, and we
normalize the whitespace on both it and the <first_line>
element and then check whether they are the same.
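For illustration, the other choice of context might look like the sketch below, which matches <first_line> instead of the real first line. Because <first_line> is a descendant of <metadata>, it also matches the metadata whitespace rule, so this version needs a pattern of its own. The variable name real-first-line and the message wording are ours, and the sketch assumes the poem has a single <body>.

```xml
<sch:pattern>
    <sch:rule context="first_line">
        <!-- The first <line> anywhere inside the body, in document order -->
        <sch:let name="real-first-line" value="(//body//line)[1]"/>
        <sch:assert test="normalize-space(.) eq normalize-space($real-first-line)">The &lt;first_line&gt; element in the metadata must contain the same text as the first line of the poem.</sch:assert>
    </sch:rule>
</sch:pattern>
```

Either context works; what matters is that the rule does not share a pattern with another rule whose context matches the same element.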