Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified: 2021-02-15T18:19:48+0000
Your task is to create a Relax NG schema for our XML version of the first 34 lines of a translation of Homer’s Iliad, available at http://dh.obdurodon.org/relax-ng_test_2214.xml.
Your schema should constrain the XML while also allowing for the reasonable
integration of new material, and there may be more than one way to do that. Hint:
you will want to use repeatable or-groups when you model mixed content, and
you can read about those under Mixed content
in our Relax NG content models posting.
You are not permitted to change the XML, so your schema has to be valid against the XML as given. Make sure that you associate your schema with the XML and use it to validate the XML file. If the XML is not valid against the schema, you’ll want to find the problem and adjust the schema.
Should you have any questions about the test, please post them in the Relax NG
channel of our Slack discussion board (and when you respond to someone’s query,
which you are encouraged to do, you can nudge them in the right direction, but don’t
give away the answer). You may use any reference material you would like while
creating your schema (books, Internet, etc.), except that you cannot receive help
from another person, and your work needs to be your own. When you are finished,
upload your schema to Canvas, where we have created an assignment for it (do not
upload the XML).
There are multiple ways to model this document type effectively in Relax NG, so your solution need not have matched ours exactly. Among other things, Relax NG schemas are typically written to model not just a single document, but a document type—in this case, perhaps this poem and others that may be similar to it. For that reason, you will want to construct a schema that is not overly permissive, but also one that is not overly restrictive. Below is one solution:
start = epic_poem
epic_poem = element epic_poem { metadata, body }
#
# #####
# Metadata
# #####
#
metadata = element metadata {
    title, author, translator, translation_language, translation_date, source
}
title = element title { text }
author = element author { text }
translator = element translator { text }
translation_language = element translation_language { text }
translation_date = element translation_date { xsd:int }
source = element source { text }
#
# #####
# Body
# #####
#
body = element body { book?, line+ }
book = element book { text }
line = element line {
    number,
    mixed { (location | character | epithet | emotion | character_description)* }
}
number = attribute number { xsd:int }
#
# #####
# Inline content
# #####
#
location = element location { modern_place, type, text }
modern_place = attribute modern_place { text }
type = attribute type { text }
character = element character { status, text }
status = attribute status { text }
epithet = element epithet { (who | where)?, mixed { character* } }
who = attribute who { text }
where = attribute where { text }
emotion = element emotion { type, text }
character_description = element character_description { who, text }
We used Relax NG comments, which begin with hash marks, to make it easier to find the different sections of our schema: metadata, body, and inline elements (elements contained in mixed content). You don’t have to add comments to your Relax NG, especially if it is as short as this schema, but we normally would.
Relax NG doesn’t care about the order of your declarations, but we find that grouping and labeling them this way makes life easier for the developer. A line that begins with a single hash mark in Relax NG is a comment, but a line that begins with two consecutive hash marks is a documentation annotation rather than an ordinary comment, so we insert a space after the first hash mark in our divider lines. By the way, comments do not have to begin at the start of a line; you can include a comment at the end of a line of schema code, e.g.:
who = attribute who { text } # person to whom an epithet refers
We used a repeatable or-group to model mixed content in our definition of <line>
elements. Because attributes are not mixed in with plain text the way child
elements are (attributes are sequestered inside the start tag), we don’t include
them inside the mixed portion of the content model. Relax NG won’t care if you
write them inside the mixed portion, but it’s best to make your schema as
self-documenting as possible. Additionally, attributes are not repeatable, so
putting them inside a repeatable or-group would misrepresent them as repeatable.
Well-formedness ensures that you won’t be allowed to repeat them, but that’s all
the more reason not to let your schema say that you can.
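As a sketch of what the mixed keyword does in our line pattern: in Relax NG compact syntax, mixed { p } interleaves plain text with p, so a repeatable or-group inside mixed accepts the same content as a repeatable or-group that lists text as one of its alternatives. The pattern name line_alt below is hypothetical, invented only for this comparison, and both declarations rely on the other patterns defined in the schema above.

```rnc
# Form used in our solution: the attribute first, then the mixed content
line = element line {
    number,
    mixed { (location | character | epithet | emotion | character_description)* }
}

# Equivalent spelling (hypothetical name line_alt, not part of our solution),
# with text written out as one alternative in the repeatable or-group
line_alt = element line {
    number,
    (text | location | character | epithet | emotion | character_description)*
}
```

In both versions the number attribute stays outside the repeatable group, which keeps the schema from suggesting that the attribute could repeat.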
Epithets in this text may be about persons, in which case they have a who
attribute, or about places, in which case they have a where attribute, or not
about persons or places. Since a single epithet is not simultaneously about
persons and places, we said an epithet could have either one who attribute, one
where attribute, or neither, but it couldn’t have both a who and a where.
Our working assumption that who and where are mutually exclusive may prove
incorrect once we have more text.
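If that assumption does turn out to be wrong, one possible adjustment (a sketch, not something the current XML requires) is to make each attribute independently optional, so that an epithet may carry who, where, both, or neither. The first declaration repeats our solution, commented out for comparison, because a pattern name cannot be defined twice in the same schema without a combine operator.

```rnc
# Current model: at most one of the two attributes
# epithet = element epithet { (who | where)?, mixed { character* } }

# Looser alternative: each attribute is optional on its own,
# so an epithet may have who, where, both, or neither
epithet = element epithet { who?, where?, mixed { character* } }
who = attribute who { text }
where = attribute where { text }
```

The looser version still validates every epithet that the stricter one accepts, so switching to it later would not invalidate existing markup.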
Years in this document are positive integer values, so we used the xsd:int
datatype to constrain them to integer values. There is also a datatype
specifically for years, xsd:gYear
(see http://books.xmlschemata.org/relaxng/ch19-77127.html for discussion).
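For comparison, here is a sketch of two tighter alternatives for the year. Note that xsd:int accepts any 32-bit signed integer, including zero and negative numbers, while xsd:positiveInteger accepts only integers greater than zero. The xsd:gYear lexical form requires at least four digits, so the second alternative assumes that the year in the XML is written that way; check it against the document before adopting it.

```rnc
# Alternative 1: only integers greater than zero
translation_date = element translation_date { xsd:positiveInteger }

# Alternative 2 (assumes a year written with at least four digits):
# translation_date = element translation_date { xsd:gYear }
```

Only one of the two declarations can be active at a time, which is why the second is commented out.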
Whether you describe a value with the keyword text or an or-group of string
values depends on the types of values permitted. For example, in this document
the only values that occur for the status attribute are god, goddess, and
mortal royalty. If you think those are the only values you will encounter in
your project, it would be better to model the status attribute as:
status = attribute status { "god" | "goddess" | "mortal royalty" }
If, though, you think that other values are likely and unpredictable, text is
more appropriate. We sometimes start development with one of these options and
then switch to the other as we accumulate more information.