Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified: 2023-04-12T16:22:12+0000
In a three-way election for Best Stooge Ever, each candidate (Curly, Larry, Moe) wins between 0% and 100% of the votes. Assume that all votes are cast for one of the three candidates (no abstentions, write-ins, invalid ballots, etc.), which means that when you add the percentages for the three candidates, the result must be exactly 100%. Assume also that we’re recording percentage of the vote, not raw votes, and that the percentages are all integer values. (In Real Life we’d probably record the raw count and calculate the percentages, but in real life we wouldn’t be voting for Best Stooge Ever in the first place!) Here’s a Relax NG schema for the results of the election:
start = results
results = element results { election+ }
election = element election { year, stooge+ }
year = attribute year { xsd:gYear }
stooge = element stooge { name, xsd:int }
name = attribute name { "Curly" | "Larry" | "Moe" }
Here’s a sample XML document that is valid against the preceding schema:
50
35
15
53
33
14
We could have written a better Relax NG schema, but we didn’t, and although our sloppy schema works with the results above, it also allows erroneous results like the following:
<results>
<stooge name="Curly">55</stooge>
<stooge name="Larry">38</stooge>
<stooge name="Moe">11</stooge>
</results>
The problem here is that the three percentage
values total 104%, and no matter
how good our coding, it is not possible to prevent this type of error by using Relax
NG alone. Your assignment is to write a Schematron schema that verifies that the
three percentages always total exactly 100%. Test your results by creating the Relax
NG schema, your Schematron schema, and a sample XML document that you can validate
against both schemas in <oXygen/>. Enter correct and incorrect values and
verify that the Schematron schema is working correctly. For homework, upload only
your Schematron schema.
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
<pattern>
<rule context="election">
<assert test="sum(stooge) eq 100">
The sum of the vote percentages does not equal 100%.
</assert>
</rule>
</pattern>
</schema>
We’ve set the Schematron namespace as the default namespace with xmlns="http://purl.oclc.org/dsdl/schematron". Notice that there is no namespace prefix in this statement; when we assign a value to the @xmlns attribute, we are declaring a default namespace, which will apply to the element on which the declaration occurs (the root <schema> element) and all of its descendants. We could, alternatively, have bound the Schematron namespace to the prefix sch:, which is what <oXygen/> does by default. In that case our root element might have looked like:
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
This version binds the prefix sch: to the Schematron namespace, which means that all elements that begin with this prefix are in that namespace. In this case no default namespace is declared, so every Schematron instruction will have to be preceded by the sch: namespace prefix. It is possible to do both, that is, to bind a prefix to a namespace and to declare that namespace as the default, although we don’t find that very useful, since either method alone will do the job.
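As an illustration of the prefixed alternative, here is a sketch of the same sample solution with every Schematron element written with the sch: prefix. It assumes only the namespace binding shown above (without the unused sqf: declaration) and should behave the same way as the default-namespace version:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same rule as the sample solution, but each Schematron element carries
     the sch: prefix instead of relying on a default namespace -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <sch:pattern>
        <sch:rule context="election">
            <sch:assert test="sum(stooge) eq 100">
                The sum of the vote percentages does not equal 100%.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

Note that the prefix applies only to the Schematron elements; the names inside @context and @test (election, stooge) refer to elements in the XML document being validated, which are in no namespace.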
These two ways of ensuring that Schematron instruction elements are in the
Schematron namespace are equivalent, so you can use either one. They do have
different implications that may matter in other Schematron applications, though,
and we’ll discuss those when they come up. In our sample solution above we
removed the declaration of the Schematron Quick Fix namespace, which
<oXygen/> binds to the prefix sqf:, because we aren’t using it. We removed it
just to simplify the display, but if
you prefer to leave it in and ignore it, that does no harm, and it will be
available if you later decide to use it. You can learn about SQF, which we have
found useful in real projects, in a demo
video from the <oXygen/> team.
The Schematron file that we wrote uses only one <rule> inside one <pattern>, and we defined the value of the @context attribute of our <rule> element (equivalent to the @match attribute in <xsl:template> elements in XSLT) as election, which is an XPath pattern (not a full XPath path expression). Any <election> element in our document will be submitted to any tests we define inside this <rule>. The <assert> element inside this <rule> uses the XPath sum() function to total the values of all <stooge> elements located on the child axis of our current context, a single <election> element, and compares that value to 100. It asserts that the sum will equal 100, and therefore raises an error (using the error message that we wrote as the content of the <assert> element) if it doesn’t.
Inside the <assert>, we write an error message that Schematron will generate when this test fails, that is, when the XML document breaks the rule. We wrote “The sum of the vote percentages does not equal 100%,” but you could have written anything that you feel would be informative to someone trying to correct the error.
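One way to make the message more informative is to include values computed from the document. The following is a minimal sketch, not part of the sample solution, that uses the Schematron <value-of> element to report the year of the offending election and the total that was actually found:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <pattern>
        <rule context="election">
            <assert test="sum(stooge) eq 100">
                <!-- value-of evaluates its XPath relative to the current
                     election and inserts the result into the message -->
                The vote percentages for <value-of select="@year"/> total
                <value-of select="sum(stooge)"/>%, but they should total 100%.
            </assert>
        </rule>
    </pattern>
</schema>
```

The XPath expressions in @select are evaluated with the same context as the @test, so sum(stooge) here yields the same total that the assertion tested.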
The value of the @context attribute on a Schematron <rule> element is an XPath pattern. Like the value of the @match attribute on an <xsl:template> element, which is also an XPath pattern, the value of @context should be just enough XPath to match the node where we want our Schematron rules to be applied. We don’t need (= should not write) a full XPath expression because we don’t have to navigate to the location; we just have to describe how to match it. This means, among other things, that it is always a mistake to begin the value of a @context attribute with a double slash. A leading double slash won’t prevent your code from working, but it’s nonetheless a mistake because it makes it harder to read and harder to understand.
Our rule fires once for each <election> element in the document. There are different elections in different years, each with its own <election> element, and they are all inside a single <results> root element. The XPath pattern that we specify as the value of the @context attribute ensures that the rules fire separately for each election, which is what we want, since if there is an error in the values for one election, we want the validation to tell us which election is the source of the error.
The XPath expressions used in the asserts and reports are relative to the current context, so when we ask for the sum of <stooge> elements, we mean the sum of <stooge> element children (because the child axis is the default XPath axis) of the <election> element being processed at the moment. A common mistake is to write sum(//stooge) instead of sum(stooge). The reason this is a mistake is that if you have multiple elections you’ll be summing all of the <stooge> values in the entire document, and not just in an individual <election> element. If you want to sum the <stooge> values that are children of a specific <election> element, you want to use the child axis to restrict yourself to only those <stooge> elements.
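To see the difference concretely, consider the following sketch of a document with two elections. The percentages are the ones from the valid sample earlier; the year values are invented purely for illustration:

```xml
<!-- Hypothetical instance: the year values are assumptions -->
<results>
    <election year="2001">
        <!-- With this election as context, sum(stooge) is 50 + 35 + 15 = 100 -->
        <stooge name="Curly">50</stooge>
        <stooge name="Larry">35</stooge>
        <stooge name="Moe">15</stooge>
    </election>
    <election year="2002">
        <!-- With this election as context, sum(stooge) is 53 + 33 + 14 = 100 -->
        <stooge name="Curly">53</stooge>
        <stooge name="Larry">33</stooge>
        <stooge name="Moe">14</stooge>
    </election>
</results>
<!-- From either context, sum(//stooge) totals all six values: 200 -->
```

Both elections are correct, but a test of sum(//stooge) eq 100 would fail for each of them, because it totals every <stooge> in the document rather than only the children of the current <election>.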
We find it most helpful to read XPath path expressions from the left, path step by path step, because each step specifies the current context(s) for the next step. An XPath expression like //body/div, then, means start at the document node, find all <body> elements on its descendant axis, and then, for each <body> element, find all <div> elements on its child axis.
We find it most helpful to read XPath patterns from the right. For example, an XPath pattern like body/div means find all <div> elements that are children of <body> elements. Reading from the right helps us avoid thinking that we have to navigate to the leftmost component of the pattern, and we don’t have to do that because XPath patterns match, but they don’t traverse.
Where we use XPath expressions and where we use XPath patterns is specified by the languages that use XPath, and is not up to us. In Schematron, the value of the @context attribute is defined as an XPath pattern and the value of the @test attribute is defined as an XPath expression, for which the current context is the node that the @context attribute matched. If @context matches multiple nodes (for example, if there are multiple <election> elements in the document, as is the case here), the rule fires once for each of them, so only one of them will be the current context at a given moment in the validation process.
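You can stop here and consider the assignment complete, but for more Schematron practice, you’re welcome to add rules that check for other types of error. The following types of errors could have been controlled by writing a better Relax NG schema, but for the purpose of learning Schematron, let’s do it in Schematron: an election with more or fewer than three stooges, a stooge name that occurs more than once in the same election, and a vote percentage below 0 or above 100. An expanded schema that checks for all of these follows: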
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
<pattern>
<rule context="election">
<assert test="sum(stooge) eq 100">
The sum of the vote percentages should equal 100%.
</assert>
<assert test="count(stooge) eq 3">
There should be exactly 3 stooges.
</assert>
<assert test="count(stooge) eq count(distinct-values(stooge/@name))">
No two stooges should have the same name.
</assert>
</rule>
<rule context="stooge">
<report test="number(.) lt 0 or number(.) gt 100">
Vote percentages must be between 0 and 100.
</report>
</rule>
</pattern>
</schema>
To specify that there should be three stooges, we added a second <assert> within the same rule, since the context is the same; that is, we want this new assertion to fire once for each <election> element. This time we use the count() function to count stooges on the child axis, and compare that value to 3. To test that no stooges are repeated, we take advantage of the @name attribute and compare the count of all stooges against the count of the distinct values of stooge names. If all of the names are distinct from one another, the count of stooges will be equal to the count of distinct stooge names.
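As a worked example of the duplicate-name test, here is a hypothetical <election> (the year and values are invented for illustration) that satisfies the first two assertions but not the third:

```xml
<!-- Hypothetical election: year and percentages are assumptions -->
<election year="2003">
    <stooge name="Curly">50</stooge>
    <stooge name="Curly">35</stooge>
    <stooge name="Moe">15</stooge>
</election>
<!-- sum(stooge) is 100 and count(stooge) is 3, so the first two assertions hold.
     distinct-values(stooge/@name) is ("Curly", "Moe"), whose count is 2,
     so count(stooge) eq count(distinct-values(stooge/@name)) is false
     and the third assertion fails. -->
```

This also shows why the count and the uniqueness checks are separate assertions: an election can have exactly three <stooge> elements and still omit one of the candidates.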
Finally, we want to test that for any given stooge, the vote percentage is within the range from 0 to 100. Since this is something that applies separately to each individual <stooge>, and not the <election> element as a whole, we created a new <rule> where the value of the @context attribute is now stooge. This means that it will fire once for each <stooge> element, and that it will check the value for that individual stooge. Inside that rule, we used a <report>, which outputs its message when the test inside it is true (because it is reporting that the real situation matches what the test requires), as opposed to <assert>, which triggers when false (because it is informing the developer that something asserted has failed to be satisfied). The test here is whether the percentage of votes for the stooge being examined is less than 0 or greater than 100, and we separate these using the XPath or logical operator.
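Because <report> and <assert> are mirror images of each other, the same range check could also be phrased as an assertion with the test logic inverted. The following is a sketch of that alternative, not the version used in the sample solution:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <pattern>
        <rule context="stooge">
            <!-- Assert what must be true (inside the range) instead of
                 reporting what must not be true (outside the range) -->
            <assert test="number(.) ge 0 and number(.) le 100">
                Vote percentages must be between 0 and 100.
            </assert>
        </rule>
    </pattern>
</schema>
```

The two forms agree for any numeric content. One difference worth knowing: if the content were not numeric, number(.) would return NaN, every comparison with NaN is false, and so the assertion would fire while the original report would stay silent. With the Relax NG schema above that case should not arise, since it already requires the content to be of type xsd:int.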
eq, =, and number()
To test the value of each stooge’s content we used the XPath number() function, which converts the content of a <stooge> element (a string of characters) into a number. The reason we have to do this is that value comparison (using operators like eq, lt, and gt) requires not only that there be exactly one item on each side of the comparison operator, but also that they be of the same datatype. It looks to a human as if the stooge votes in our XML are numbers, and therefore comparable to numbers in the XPath expression inside the @test attribute, but they could just as easily be understood as strings of characters that happen to be digits. Since XPath cannot know whether they represent a number or a string in the XML, if we try to compare one of those values with a number in our XPath, we’ll raise an error about unmatched datatypes: Cannot compare xs:untypedAtomic to xs:integer. xs:untypedAtomic means that our Schematron knows that the value inside a <stooge> element is an atomic value, but it cannot know whether it is a string or a number or any other specific type of atomic value. Using the number() function inside our @test to cast (the technical term for “convert”) the value to a number lets our XPath comparison proceed.
Value comparison (like eq) requires that the datatypes on both sides of the comparison operator be the same, but general comparison (like =) does not. General comparison will automatically treat the value in the XML as a number if we are comparing it to a number, so we don’t have to cast it explicitly. We might think that it would be better to use general comparison so that we don’t have to fuss with the datatype ourselves, but because value comparison is stricter, it provides more protection against coding errors, and our goal is not to reduce error messages, but to reduce errors. Using general comparison instead of value comparison here is not a mistake, but using value comparison to compare one thing to one thing is better because it provides more protection against error, and we use general comparison primarily when one of the comparands must be a sequence.
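For comparison, here is a sketch of how the range check might look with general comparison operators. It is shown only to illustrate the difference; the value-comparison version with number() remains the one we recommend:

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <pattern>
        <rule context="stooge">
            <!-- General comparison: the untyped content of the stooge element
                 is converted to a number automatically, so number() is omitted.
                 The operators are escaped because they occur inside an
                 XML attribute value. -->
            <report test=". &lt; 0 or . &gt; 100">
                Vote percentages must be between 0 and 100.
            </report>
        </rule>
    </pattern>
</schema>
```

The less-than sign must be written as the entity reference for it to be legal inside an attribute value; escaping the greater-than sign is optional, but doing both keeps the test easier to read.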