Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2021-02-15T18:19:48+0000


Relax NG test answer key

The task

Your task is to create a Relax NG schema for our XML version of the first 34 lines of a translation of Homer’s Iliad, available at http://dh.obdurodon.org/relax-ng_test_2214.xml.

Your schema should constrain the XML while also allowing for the reasonable integration of new material, and there may be more than one way to do that. Hint: you will want to use repeatable or-groups when you model mixed content, and you can read about those under Mixed content in our Relax NG content models posting.

You are not permitted to change the XML, so the XML as given has to be valid against your schema. Make sure that you associate your schema with the XML and use it to validate the XML file. If the XML is not valid against the schema, you’ll want to find the problem and adjust the schema.

Should you have any questions about the test, please post them in the Relax NG channel of our Slack discussion board (and when you respond to someone’s query, which you are encouraged to do, you can nudge them in the right direction, but don’t give away the answer). You may use any reference material you would like while creating your schema (books, Internet, etc.), except that you cannot receive help from another person, and your work needs to be your own. When you are finished, upload your schema to Canvas, where we have created an assignment for it (do not upload the XML).

Solution

There are multiple ways to model this document type effectively in Relax NG, so your solution need not have matched ours exactly. Among other things, Relax NG schemas are typically written to model not just a single document, but a document type—in this case, perhaps this poem and others that may be similar to it. For that reason, you will want to construct a schema that is not overly permissive, but also one that is not overly restrictive. Below is one solution:

start = epic_poem
epic_poem = element epic_poem { metadata, body }
#
# #####
# Metadata
# #####
#
metadata =
    element metadata { title, author, translator, translation_language, translation_date, source }
title = element title { text }
author = element author { text }
translator = element translator { text }
translation_language = element translation_language { text }
translation_date = element translation_date { xsd:int }
source = element source { text }
#
# #####
# Body
# #####
#
body = element body { book?, line+ }
book = element book { text }
line =
    element line {
        number,
        mixed { (location | character | epithet | emotion | character_description)* }
    }
number = attribute number { xsd:int }
#
# #####
# Inline content
# #####
#
location = element location { modern_place, type, text }
modern_place = attribute modern_place { text }
type = attribute type { text }
character = element character { status, text }
status = attribute status { text }
epithet =
    element epithet {
        (who | where)?,
        mixed { character* }
    }
who = attribute who { text }
where = attribute where { text }
emotion = element emotion { type, text }
character_description = element character_description { who, text }

Discussion

Comments

We used Relax NG comments, which begin with hash marks, to make it easier to find the different sections of our schema: metadata, body, and inline elements (elements contained in mixed content). You don’t have to add comments to your Relax NG, especially if it is as short as this schema, but we normally would.

Relax NG doesn’t care about the order of your declarations, but we find that grouping and labeling them this way makes life easier for the developer. A line that begins with a single hash mark in Relax NG compact syntax is a comment, but a line that begins with two consecutive hash marks is not an ordinary comment: it is a documentation annotation, which is allowed only in certain positions. For that reason we insert a space after the first hash mark in our divider lines. By the way, comments do not have to start at the beginning of a line; you can include a comment at the end of a line of schema code, e.g.:

who = attribute who { text } # person to whom an epithet refers
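For comparison, here is a minimal sketch of the two-hash form, placed immediately before the definition it documents. The wording of the annotation is our own illustration and is not part of the solution above; tools that convert compact syntax to XML syntax carry such a line over as an a:documentation annotation rather than discarding it the way they discard ordinary comments.

```rnc
# An ordinary comment: ignored by the validator
start = epithet
epithet = element epithet { who?, text }

## Person to whom an epithet refers (a documentation annotation)
who = attribute who { text }
```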

Mixed content

We used a repeatable or-group to model mixed content in our definition of <line> elements. Because attributes are not mixed in with plain text the way child elements are (attributes are sequestered inside the start tag), we don’t include them inside the mixed portion of the content model. Relax NG won’t care if you write them inside the mixed portion, but it’s best to make your schema as self-documenting as possible. Additionally, attributes are not repeatable, so putting them inside a repeatable or-group would misrepresent them as repeatable. Well-formedness ensures that you won’t be allowed to repeat them, but that’s all the more reason not to let your schema say that you can.
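The placement described above can be sketched in a cut-down, self-contained schema. This is a simplification for illustration only: it keeps just two of the inline elements and omits their attributes, so it is not a replacement for the full solution.

```rnc
start = line
line =
    element line {
        # The attribute is declared once, outside the mixed portion
        number,
        # Plain text interspersed with zero or more of these elements, in any order
        mixed { (location | character)* }
    }
number = attribute number { xsd:int }
# Simplified: the real solution also requires attributes on these elements
location = element location { text }
character = element character { text }
```

A made-up instance such as <line number="1">some text <character>a name</character> more text</line> would be valid against this sketch.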

Epithets

Epithets in this text may be about persons, in which case they have a who attribute; about places, in which case they have a where attribute; or about neither. Since a single epithet is not simultaneously about persons and places, we said an epithet could have either one who attribute, one where attribute, or neither, but it couldn’t have both a who and a where. Our working assumption that who and where are mutually exclusive may prove incorrect once we have more text.
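If that assumption did prove incorrect, one possible revision would be to make the two attributes independently optional. The following is only a sketch of that alternative, not something the current text requires; the supporting definitions are copied from the solution so that the fragment stands on its own.

```rnc
start = epithet
epithet =
    element epithet {
        # Each attribute is optional on its own, so both may now co-occur
        who?,
        where?,
        mixed { character* }
    }
who = attribute who { text }
where = attribute where { text }
character = element character { status, text }
status = attribute status { text }
```

The trade-off is that this version can no longer catch an epithet that carries both attributes by mistake, which the original (who | where)? would reject.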

Years

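Years in this document are positive integer values, so we used the xsd:int datatype to constrain them to integers. Note that xsd:int also accepts zero and negative integers; xsd:positiveInteger would additionally require a value greater than zero. There is also a datatype specifically for years, xsd:gYear (see http://books.xmlschemata.org/relaxng/ch19-77127.html for discussion).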
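As a sketch of what the stricter alternatives might look like, either of the following could stand in for the xsd:int version of translation_date. We are assuming here that the year in the XML is written with four digits, which xsd:gYear requires at a minimum; check the actual value before switching.

```rnc
start = translation_date

# Option 1: a calendar year, written with at least four digits
translation_date = element translation_date { xsd:gYear }

# Option 2 (use instead of Option 1): any integer greater than zero
# translation_date = element translation_date { xsd:positiveInteger }
```

Option 2 is commented out because a schema may define translation_date only once without a combine operator.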

Text vs or-groups

Whether you describe a value with the keyword text or an or-group of string values depends on the types of values permitted. For example, in this document the only values that occur for the status attribute are god, goddess, and mortal royalty. If you think those are the only values you will encounter in your project, it would be better to model the status attribute as:

status = attribute status { "god" | "goddess" | "mortal royalty" }

If, though, you think that other values are likely and unpredictable, text is more appropriate. We sometimes start development with one of these options and then switch to the other as we accumulate more information.