Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Licensed under a Creative Commons BY-NC-SA 3.0 Unported License. Last modified: 2022-03-30T00:38:28+0000
    
This activity uses Skyrim (http://dh.obdurodon.org/skyrim.xml),
            a small text originally prepared by a former participant in our course. The file has a
            <cast> element at the top that contains, as
            child elements, a list of characters
            (<character> elements) and factions
            (<faction> elements) that are mentioned in the
            <body> section, below the
            <cast> element. For this activity we are going
            to ignore characters initially and concentrate only on factions.
An entry for a faction in the cast list looks like:

<faction id="MythicDawn" alignment="evil"/>

Meanwhile, a reference to a faction in the body looks like:

The <faction ref="MythicDawn">assassins</faction> first attacked …

Note that we use the <faction> element
            differently inside the <cast> element (where it
            has a unique @id attribute, plus other attributes
            we will ignore for now) and inside the <body>
            element (where it has a @ref attribute). Any
            mention of a <faction> element in the body must
            point to a matching <faction> element in the
            cast list. The way the pointing happens is that a
            <faction> element in the body always has a
            @ref attribute, and the value of that
            @ref attribute should point to (= match the value
            of) the @id attribute of some
            <faction> element in the cast list. In other
            words, 1) there should be no <faction> element
            in the body that does not point to a <faction>
            element in the cast list, and 2) there should be no
            <faction> element in the cast list that is not
            pointed to by at least one <faction> element in
            the body. The attribute that does the pointing is the
            @ref on the
            <faction> element in the
            <body>; the target of the pointing is an
            @id attribute on a
            <faction> element in the cast list.
You can assume that the developer has used a Relax NG schema and declared that the
            @id attribute is of type
            xsd:ID, that is, that it is an XML ID. That means
            that it has certain properties, including constraints on the characters that it can
            contain and the fact that it is unique in the document. You can also assume that the
            Relax NG schema verifies that all <faction>
            elements in the cast list have an @id attribute and
            all <faction> elements in the
            <body> have a
            @ref attribute.
There are at least two ways a developer could mangle these cross-references:

1. A <faction> element in the body with a @ref attribute that doesn’t
                point to (that is, correspond to) the @id
                attribute of a <faction> element in the cast
                list. For example, the corresponding @id
                attribute might be on a <character> element
                in the cast list, instead of on a <faction>
                element, or there might be no corresponding @id
                attribute at all.
2. A <faction> element in the cast list that is not pointed to by the
                @ref attribute of any
                <faction> element in the body. Since the
                inventory of factions in the cast list is supposed to summarize the factions that
                occur in the body, such an error would bring the list out of sync with the reality
                of the body.

We want to write a Schematron schema that will guard against the types of error described above by checking for consistency in two ways:
1. A check on all <faction>
                elements in the cast list to verify that all factions mentioned there also occur in
                the body. That is, there should be no faction in the cast list that is not also
                present in the body.
2. A check on all <faction>
                elements in the body that verifies that they have a
                @ref attribute that points to an
                @id attribute on a
                <faction> element in the cast list. Note
                that it isn’t enough to check for the existence of a corresponding
                @id attribute, since there are
                @id attributes on
                <character> elements in the cast list, and
                not only on <faction> elements. Not only
                must the @id exist, but it must be associated
                specifically with a <faction> element in the
                cast list, and not with a <character>
                element.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process"
    xmlns="http://purl.oclc.org/dsdl/schematron">
    <pattern>
        <let name="cast-factions" value="//cast/faction/@id"/>
        <let name="body-factions" value="//body//faction/@ref"/>
        <rule context="cast/faction">
            <assert test="@id = $body-factions">The @id "<value-of select="@id"/>" occurs in the cast
                list, but not in the body.</assert>
        </rule>
        <rule context="body//faction">
            <assert test="@ref = $cast-factions">The @ref "<value-of select="@ref"/>" occurs in the
                body, but not in the cast list.</assert>
        </rule>
    </pattern>
</schema>

We begin by setting some convenience variables. By “convenience variables” we mean that
            we don’t have to use them (we could have put the XPath expressions
            directly in the @test attributes), but they make
            our code more legible. $cast-factions is a sequence
            of all @id attributes on all
            <faction> elements in the cast list.
            $body-factions is a sequence of all
            @ref attributes on all
            <faction> elements in the
            <body>. (We could have used
            distinct-values() to get rid of the duplicates in
            the list of @ref values, but they do no harm in the
            tests we’re running, so we just left them in. There cannot be any duplicates in the list
            of @id values because we have declared them as type
            xsd:ID in our Relax NG schema.)
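As a rough cross-check of what these two variables contain, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree. It is not part of the Schematron workflow: the sample document is a hypothetical miniature of skyrim.xml, the Python names merely mirror $cast-factions and $body-factions, and ElementTree supports only a small subset of XPath, so the paths are rewritten accordingly.

```python
import xml.etree.ElementTree as ET

# A hypothetical miniature of skyrim.xml (not the real file).
SAMPLE = """<skyrim>
  <cast>
    <character id="hero"/>
    <faction id="MythicDawn" alignment="evil"/>
    <faction id="Blades"/>
  </cast>
  <body>
    <p>The <faction ref="MythicDawn">assassins</faction> attacked the
       <faction ref="Blades">Blades</faction>, and then the
       <faction ref="MythicDawn">cultists</faction> fled.</p>
  </body>
</skyrim>"""

root = ET.fromstring(SAMPLE)

# Counterpart of //cast/faction/@id: one value per <faction> in the cast list.
cast_factions = [f.get("id") for f in root.iterfind("./cast/faction")]

# Counterpart of //body//faction/@ref: every <faction> anywhere under <body>,
# in document order, duplicates included.
body_factions = [f.get("ref") for f in root.find("./body").iter("faction")]

# Counterpart of distinct-values(): drop repeats, keeping first occurrences.
distinct_body_factions = list(dict.fromkeys(body_factions))

print(cast_factions)           # ['MythicDawn', 'Blades']
print(body_factions)           # ['MythicDawn', 'Blades', 'MythicDawn']
print(distinct_body_factions)  # ['MythicDawn', 'Blades']
```

Keeping the duplicate in body_factions changes nothing for a membership test, which is the same reason the schema can leave them in.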
The first rule fires on each <faction> element in
            the cast list. It checks whether the @id attribute
            value for that element matches the value of any of the
            @ref attributes on
            <faction> elements in the
            <body>. If not, it reports an error. To make the
            report more informative, we use the Schematron element
            <value-of> to print the offending value.
We use general equals (=) for this test,
            rather than value comparison (eq).
            General equals takes two sequences of any length and tests whether any item in one is
            equal to any item in the other. If so, the test succeeds, no matter how many other
            items don’t match! We have a sequence of one item on the left (the
            @id of the
            <faction> in the cast list that we’re testing at
            the moment, and it qualifies as a sequence in the XPath sense even if it’s a sequence of
            one item) and a sequence of many items on the right (all
            @ref values on all
            <faction> elements in the
            <body>), so this test will succeed whenever the
            @id we’re examining has a matching
            @ref. This type of one-to-many general comparison
            is very common in digital humanities coding. (Value comparison with
            eq does only one-to-one comparison, so if you try
            eq here, you’ll get an error message because there
            is a sequence of more than one item on the right side.)
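The difference between the two kinds of comparison can be modeled in a few lines of Python. This is a simplified sketch, not an XPath implementation: the helper names general_eq and value_eq are hypothetical, Python lists stand in for XPath sequences, and atomization and type promotion are ignored.

```python
def general_eq(left, right):
    """Rough model of XPath general comparison (=): true if ANY item on the
    left equals ANY item on the right; an empty side never matches."""
    return any(a == b for a in left for b in right)


def value_eq(left, right):
    """Rough model of XPath value comparison (eq). XPath raises a type error
    (XPTY0004) if a side has more than one item and returns an empty result
    if a side is empty; this sketch simply rejects anything but one item each."""
    if len(left) != 1 or len(right) != 1:
        raise TypeError("eq needs exactly one item on each side")
    return left[0] == right[0]


body_factions = ["MythicDawn", "Blades", "MythicDawn"]

print(general_eq(["Blades"], body_factions))   # True: one match is enough
print(general_eq(["Thalmor"], body_factions))  # False: nothing matches
try:
    value_eq(["Blades"], body_factions)
except TypeError as err:
    print("error:", err)  # error: eq needs exactly one item on each side
```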
The second rule does the reverse. It fires on every
            <faction> element in the
            <body> and checks whether the value of the
            @ref attribute matches the value of an
            @id attribute on a
            <faction> element in the cast list.
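Putting both rules together, the following Python sketch mimics what the schema reports for a small, deliberately broken document. The document, the function name check_factions, and the use of Python sets for membership tests (playing the role of general = with a single item on the left) are assumptions for illustration only; in practice the Schematron schema does this work inside <oXygen/>.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with one error of each kind:
#   - the cast-list faction "Thalmor" is never referenced in the body;
#   - a body <faction> points at "hero", which is a <character> id.
SAMPLE = """<skyrim>
  <cast>
    <character id="hero"/>
    <faction id="MythicDawn" alignment="evil"/>
    <faction id="Thalmor"/>
  </cast>
  <body>
    <p>The <faction ref="MythicDawn">assassins</faction> attacked
       <faction ref="hero">the wrong target</faction>.</p>
  </body>
</skyrim>"""


def check_factions(xml_text):
    """Return one message per offending <faction>, like the two rules above."""
    root = ET.fromstring(xml_text)
    cast_elems = root.findall("./cast/faction")
    body_elems = list(root.find("./body").iter("faction"))

    # Only ids that sit on <faction> elements in the cast list count,
    # so the <character> id "hero" is deliberately left out.
    cast_factions = {el.get("id") for el in cast_elems}
    body_factions = {el.get("ref") for el in body_elems}

    messages = []
    for el in cast_elems:  # like rule context="cast/faction"
        fid = el.get("id")
        if fid not in body_factions:
            messages.append(f'The @id "{fid}" occurs in the cast list, but not in the body.')
    for el in body_elems:  # like rule context="body//faction"
        ref = el.get("ref")
        if ref not in cast_factions:
            messages.append(f'The @ref "{ref}" occurs in the body, but not in the cast list.')
    return messages


for message in check_factions(SAMPLE):
    print(message)
```

Because each message is tied to one element, this mirrors the advantage of firing rules on individual <faction> elements rather than on the whole document.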
We could have hung the rules on the @id and
            @ref attribute values instead of on the
            <faction> elements that are their parents
            (making the necessary modifications to the paths in the code), and there’s no particular
            reason to favor one of these strategies over the other.
It’s possible to write rules that fire on <cast>
            or even on <skyrim> and that run one check of
            the entire document. This is much harder to code (although it can be done), and it’s also
            less informative, since it doesn’t associate the error message with a specific offending
            element. Even if you manage to poke the offending value into the error report, the red
            squiggly line will show up on <cast> or
            <skyrim>, so you’ll have to work harder to find
            the element you need to fix.
Some students in past semesters tried to run a single, global test on
            <cast> or
            <skyrim> just to count the number of
            @id values on
            <faction> elements in the cast list and compare
            that number to the count of distinct values of @ref
            attributes on <faction> elements in the
            <body>. That’s not a tenable strategy. If, say, the factions in the cast list are
            “A”, “B”, and “C” and the ones in the <body> are “X”, “Y”, and “Z”, there are three of
            each, but they don’t correspond. You would want that to be reported as an error, and
            you can’t detect it just by counting.
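A short Python sketch, using the hypothetical values A, B, C and X, Y, Z from the example above, shows why counting is not enough while a membership test in each direction catches the mismatch:

```python
# Hypothetical values from the example in the text.
cast_ids = ["A", "B", "C"]
body_refs = ["X", "Y", "Z", "X"]  # duplicates are possible in the body

# The counting strategy: number of ids vs. number of distinct refs.
counts_match = len(cast_ids) == len(set(body_refs))

# The membership strategy: values on one side that are missing on the other.
unused_in_body = sorted(set(cast_ids) - set(body_refs))
unknown_in_cast = sorted(set(body_refs) - set(cast_ids))

print(counts_match)     # True, even though nothing corresponds
print(unused_in_body)   # ['A', 'B', 'C']
print(unknown_in_cast)  # ['X', 'Y', 'Z']
```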
What about <character> elements?

One might think one could use the same type of validation to check for cross-references
            on <character> elements: is every character
            mentioned in the cast list also encountered in the
            <body>, and does every character mentioned in the
            <body> have a
            @ref attribute that points to the
            @id attribute of a
            <character> element in the cast list? This turns
            out to be harder than with factions because there are elements in the body like:
… the <character ref="hero Jauffre MartinSeptim">three of them</character> made their way …

The problem here is that there is no <character>
            element in the cast list with an @id attribute
            whose value is “hero Jauffre MartinSeptim”. Instead, this is a pointer to three
            separate characters in the cast list. The strategy for checking coreference therefore has
            to involve breaking apart the @ref attribute and
            checking each of the three pointers separately. This is the sort of task for which the
            XPath tokenize() function was created.
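In Python terms, the idea looks roughly like this. The cast ids are assumed from the example above, and str.split() with no argument is used as a stand-in for XPath’s one-argument tokenize(), which likewise splits on runs of whitespace and ignores leading and trailing whitespace (the exact set of whitespace characters differs slightly).

```python
# Cast ids assumed from the example above; the real cast list may differ.
cast_characters = {"hero", "Jauffre", "MartinSeptim"}
ref = "hero Jauffre MartinSeptim"

# Comparing the whole string fails: no single @id has this value.
print(ref in cast_characters)  # False

# split() with no argument breaks on runs of whitespace and ignores
# leading/trailing whitespace, much like one-argument tokenize().
tokens = ref.split()
print(tokens)  # ['hero', 'Jauffre', 'MartinSeptim']

# Check each piece separately.
print(all(token in cast_characters for token in tokens))  # True
```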
There is a hypothetical parallel problem concerning the other half of the assignment.
            Suppose there is a
            <character id="Alex" loyalty="empire" alignment="neutral"/>
            element in the cast list, but the only time Alex occurs in the body is in combination with
            another character, e.g.,
            <character ref="Alex Alathia">. We can’t just
            check whether there is a @ref attribute in the body
            that matches the string “Alex” because there isn’t; here, too, we have to break
            apart the value of the @ref and check each part
            separately. This situation doesn’t happen to occur in our text, but it could, and it is
            therefore something against which a well-designed development environment
            would protect the user.
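The same point can be sketched in Python with the hypothetical Alex example: a whole-string comparison misses Alex, while pooling the whitespace-separated pieces of every @ref finds him. The @ref values below are invented for illustration.

```python
# Hypothetical @ref values from the body; "Alex" appears only in combination.
body_refs = ["Alathia", "Alex Alathia"]
cast_id = "Alex"

# Naive check: does any whole @ref value equal "Alex"? No.
naive = cast_id in body_refs

# Tokenized check: split every @ref, pool the pieces, then look for "Alex".
body_tokens = {token for ref in body_refs for token in ref.split()}
tokenized = cast_id in body_tokens

print(naive)      # False: would wrongly report Alex as unused
print(tokenized)  # True
```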
To avoid cluttering the screen, the Schematron below checks only
            <character> elements. In real life, you’d
            combine it with the one above.
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt3"
    xmlns:sqf="http://www.schematron-quickfix.com/validator/process">
    <sch:pattern>
        <sch:let name="cast-characters" value="//cast/character/@id"/>
        <!-- Split each @ref on whitespace (one-argument tokenize(), XPath 3.1), then deduplicate -->
        <sch:let name="body-characters"
            value="distinct-values(//body//character/@ref ! tokenize(.))"/>
        <sch:rule context="cast/character">
            <!-- Compare against the values found in the body, not against the cast list itself -->
            <sch:assert test="@id = $body-characters">The cast item <sch:value-of select="@id"/>
                does not appear in the body (<sch:value-of select="$body-characters"/>)</sch:assert>
        </sch:rule>
        <sch:rule context="body//character">
            <sch:assert test="string-length(normalize-space(@ref)) gt 0">The @ref attribute on
                    <sch:value-of select="."/> is missing or empty.</sch:assert>
            <sch:assert
                test="
                    every $i in tokenize(@ref) satisfies $i = $cast-characters"
                >There is a character entry for "<sch:value-of select="."/>" (@ref = <sch:value-of
                    select="@ref"/>) that doesn’t match an @id value in the cast list (<sch:value-of
                    select="$cast-characters"/>)</sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>

We start by creating two variables. The first holds the @id values of the
            <character> elements in the cast list, which are already unique because they are of
            type xsd:ID. The second holds a deduplicated list of the values of the
            @ref attributes on
            <character> elements in the body. Because the
            @ref attributes may contain multiple,
            whitespace-separated values, we tokenize those values on whitespace (the default with
            the one-argument form of the tokenize() function) before removing the
            duplicates.
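For comparison, here is a Python sketch of the two variables and the first rule, run against a hypothetical miniature document. The ids and text are invented for illustration; the document includes one body <character> with no @ref and one cast character that is never mentioned. The generator expression plays the role of //body//character/@ref ! tokenize(.), and dict.fromkeys() plays the role of distinct-values().

```python
import xml.etree.ElementTree as ET

# Hypothetical document; ids and text are made up for illustration.
SAMPLE = """<skyrim>
  <cast>
    <character id="hero"/>
    <character id="Jauffre"/>
    <character id="MartinSeptim"/>
    <character id="Alex"/>
  </cast>
  <body>
    <p><character ref="hero">You</character> met
       <character ref="Jauffre">Jauffre</character>, and then
       <character ref="hero Jauffre MartinSeptim">three of them</character>
       left with <character>a stranger</character>.</p>
  </body>
</skyrim>"""

root = ET.fromstring(SAMPLE)

# Counterpart of //cast/character/@id (already unique, as xsd:ID values).
cast_characters = [c.get("id") for c in root.iterfind("./cast/character")]

# Split each @ref on whitespace, flatten, then drop duplicates.
# A <character> with no @ref contributes nothing, just as in XPath.
body_characters = list(dict.fromkeys(
    token
    for c in root.find("./body").iter("character")
    for token in (c.get("ref") or "").split()
))

# First rule: every cast id must occur among the body tokens.
unused = [cid for cid in cast_characters if cid not in body_characters]

print(body_characters)  # ['hero', 'Jauffre', 'MartinSeptim']
print(unused)           # ['Alex']
```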
In our first rule, which fires on <character>
            elements that are children of <cast>, we check
            the @id attribute on every
            <character> element in the cast list to verify
            that it is pointed to by at least one of the whitespace-separated values in a @ref
            attribute on a <character> element in the
            <body>.
Our second rule verifies first that every
            <character> element in the
            <body> has a
            @ref attribute that contains at least one non-whitespace character. The
            developer could have checked this in Relax NG by making the
            @ref attribute obligatory, but she didn’t. To our
            surprise, although we had used this document for other exercises previously, until we
            wrote this Schematron rule we had never noticed that there are two
            <character> elements in the
            <body> that don’t have any
            @ref attribute! This is a real, inadvertent error, and
            had we already learned Schematron when the original developer encoded this file, she
            would have been able to use it to catch this error and fix it.
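Once we’ve confirmed that there is a @ref attribute
            on the <character> element in the
            <body> that we’re currently examining, we
            use the XPath tokenize() function to break it apart
            into pieces. We then use the
            every $x in Y satisfies construction (Kay, p. 646
            ff.) to check each one individually. The two assertions are evaluated independently:
            if the @ref is missing or empty, tokenize() returns an empty sequence and the
            every test is trivially true, so the only error reported is the one from the first
            assertion.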
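The interplay between the two assertions can be sketched in Python. The helper names are hypothetical, all() stands in for every … satisfies, and None stands for a missing @ref. Like every over an empty sequence, all() over an empty sequence is True, which is why the separate non-empty check matters.

```python
# Hypothetical cast ids; "Lucien" below is an invented name with no cast entry.
cast_characters = {"hero", "Jauffre", "MartinSeptim"}


def has_usable_ref(ref):
    """Like string-length(normalize-space(@ref)) gt 0; None means no @ref."""
    return len((ref or "").strip()) > 0


def all_refs_known(ref):
    """Like: every $i in tokenize(@ref) satisfies $i = $cast-characters."""
    return all(token in cast_characters for token in (ref or "").split())


for ref in ["hero Jauffre", "hero Lucien", "   ", None]:
    print(repr(ref), has_usable_ref(ref), all_refs_known(ref))
# 'hero Jauffre' True True
# 'hero Lucien' True False
# '   ' False True
# None False True
```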
The Relax NG xsd:ID datatype is guaranteed to be
            unique in the document, and it is also guaranteed to conform to the lexical
            specification (= spelling rules) for an XML non-colonized name, abbreviated
            NCName. NCNames must begin with a letter or an underscore and can otherwise contain
            letters, digits, and a few punctuation characters, such as the hyphen and the period. They cannot contain most punctuation
            characters and they cannot contain whitespace characters. If you declare an attribute as
            being of type xsd:ID in your schema and try to use
            an illegal character, <oXygen/> will notify you of the error. You can read a more
            precise, human-readable description of what is and is not permitted in an NCName (and
            therefore in an attribute value of type xsd:ID) at
                http://stackoverflow.com/questions/1631396/what-is-an-xsncname-type-and-when-should-it-be-used.
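The ASCII core of these rules can be approximated with a regular expression. This Python sketch is only an approximation under stated assumptions: it accepts ASCII letters, digits, underscore, hyphen, and period, and it ignores the many non-ASCII characters that the real NCName production also allows, so it is no substitute for schema validation.

```python
import re

# ASCII-only approximation of NCName: start with a letter or "_",
# then letters, digits, ".", "-", or "_". Non-ASCII name characters are ignored.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def looks_like_ncname(value):
    """True if the whole value matches the ASCII approximation."""
    return NCNAME_ASCII.fullmatch(value) is not None


for candidate in ["MythicDawn", "_temp", "hero Jauffre", "3rdLegion", "a:b"]:
    print(candidate, looks_like_ncname(candidate))
# MythicDawn True
# _temp True
# hero Jauffre False
# 3rdLegion False
# a:b False
```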
Attributes declared as the Relax NG datatype
            xsd:IDREF must have values that match an item of
            type xsd:ID in the same document. This means that
            they have the same requirements about legal and illegal characters, and they also have
            to match (that is, refer to) a real declared xsd:ID
            value in the same document. There is also an
            xsd:IDREFS datatype which refers to a
            white-space-delimited set of one or more xsd:ID
            values, which means, for example, that you can tag the word “they” in the body of
            your text and have it refer to the Three Stooges with:

… and then <character ref="curly larry moe">they</character> said …

In the preceding example, if the @ref attribute were
            declared as having type xsd:IDREFS, <oXygen/>
            would verify that there were values of type xsd:ID
            for “curly”, “larry”, and “moe” in the document, and it would report an
            error if it couldn’t find all three.
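A rough Python emulation of what an xsd:IDREFS check does may make the behavior concrete. It is a sketch under simplifying assumptions: the document is hypothetical (with an invented “shemp” token that has no matching id), any attribute named id is treated as an ID, and any attribute named ref is treated as IDREFS, whereas a real validator relies on the datatypes declared in the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document based on the Three Stooges example; "shemp" has no id.
SAMPLE = """<doc>
  <cast>
    <character id="curly"/>
    <character id="larry"/>
    <character id="moe"/>
  </cast>
  <body>
    <p>And then <character ref="curly larry moe">they</character> spoke.</p>
    <p>Later <character ref="curly shemp">two of them</character> left.</p>
  </body>
</doc>"""


def idrefs_problems(root):
    """Map each @ref value to the tokens that match no id anywhere in the document."""
    # Like ID/IDREFS validation, this ignores which element carries the id.
    all_ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    problems = {}
    for el in root.iter():
        ref = el.get("ref")
        if ref is None:
            continue
        missing = [token for token in ref.split() if token not in all_ids]
        if missing:
            problems[ref] = missing
    return problems


print(idrefs_problems(ET.fromstring(SAMPLE)))  # {'curly shemp': ['shemp']}
```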
The trouble with ID/IDREF is that it can only compare an
            xsd:IDREF value to all
            xsd:ID values in the document. This imposes two
            important limitations on its utility:
1. ID/IDREF validation can verify that a @ref
                attribute on a <faction> element in the
                <body> points to an
                xsd:ID somewhere, but it cannot determine
                whether that xsd:ID is specifically on a
                <faction> element in the
                <cast>. This means that ID/IDREF validation
                cannot tell whether an attribute of type
                xsd:IDREF or
                xsd:IDREFS points to an attribute of type
                xsd:ID on a
                <character> or on a
                <faction>.
2. ID/IDREF validation cannot check an xsd:IDREF attribute
                value in one document against an xsd:ID
                attribute value in a different document. ID/IDREF validation happens only within a
                single document.

The Schematron strategy described here does not suffer from either of these limitations, and the faction example above shows how it overcomes the first of them.
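The first limitation can be demonstrated with a small Python sketch. The document is hypothetical: a body <faction> mistakenly points at the id of a <character>. An ID/IDREF-style check, which accepts any id in the document, lets the error through, while a check restricted to <faction> elements in the cast list, like the Schematron rule for factions, catches it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a body <faction> mistakenly points at a character id.
SAMPLE = """<skyrim>
  <cast>
    <character id="hero"/>
    <faction id="MythicDawn"/>
  </cast>
  <body>
    <p><faction ref="MythicDawn">assassins</faction> hunted
       <faction ref="hero">the hero</faction>.</p>
  </body>
</skyrim>"""

root = ET.fromstring(SAMPLE)
body_refs = [f.get("ref") for f in root.find("./body").iter("faction")]

# ID/IDREF-style check: any id anywhere in the document will do.
any_ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
idref_errors = [r for r in body_refs if r not in any_ids]

# Schematron-style check: the id must be on a <faction> in the cast list.
faction_ids = {f.get("id") for f in root.iterfind("./cast/faction")}
typed_errors = [r for r in body_refs if r not in faction_ids]

print(idref_errors)  # []
print(typed_errors)  # ['hero']
```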