Digital humanities

Maintained by: David J. Birnbaum. [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2019-04-16T15:54:14+0000

Test #2: Relax NG: Answers

Your task for this test was to create a schema for an XML document that found a balance between constraint and flexibility. Our solution (not the only possible one) is below:

start = poem
poem = element poem { title, author, date, stanza+ }
title = attribute title { text }
author = attribute author { text }
date = attribute date { text }
stanza = element stanza { n, line+ }
n = attribute n { text }
line = element line { mixed { ( ref | quote )* } }
ref = element ref { ( person | relig | school | astron )?, text }
quote = element quote { speaker, text }
person = attribute person { text }
relig = attribute relig { text }
school = attribute school { text }
astron = attribute astron { text }
speaker = attribute speaker { text }

Some of you defined <poem> elements using aspects of Russian doll syntax, which is also fine. That approach might begin with something like:

poem = element poem { metadata, stanza+ }
metadata = attribute title { text }, attribute author { text }, attribute date { text }

We’ve aimed here for a balance in our schema between constraining the content as much as possible (to prevent errors) and allowing the same schema to be used for other (hypothetical) documents of this type. You may have struck this balance in a different place than we did, which is fine; the important thing is to consider the competing needs for control, on the one hand, and flexibility, on the other.
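To make the trade-off concrete, here is a minimal sketch of how two of the attribute definitions could be tightened with the XML Schema datatypes that Relax NG compact syntax makes available under the xsd prefix. These are assumptions for illustration, not part of our solution: we are guessing that stanza numbers are positive integers and that a date is a bare year.

```rnc
# Tighter alternatives to the definitions in our solution (illustrative assumptions)
n = attribute n { xsd:positiveInteger }   # accepts 1, 2, 3, ... and rejects "one" or "0"
date = attribute date { xsd:gYear }       # accepts a year such as 1900 and rejects "c. 1900"
```

The stricter versions catch more typing errors, but they would also reject a hypothetical poem that is dated only approximately or whose stanzas are labeled with letters, which is the kind of flexibility that plain text preserves.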

The <poem> element can be defined in two different ways: one using a more traditional style, and another using the Russian doll style of Relax NG. In Russian doll syntax, you define all three of the attributes of your <poem> element on another line called metadata. The metadata label is not an element or an attribute; it is simply a place for you to list and define all of the attributes of the <poem> element together. Using the Russian doll structure was by no means required, but we wanted to include it as a possibility should you have chosen to use it. You could, by the way, bypass the metadata label with something like:

poem = element poem {
    attribute title { text },
    attribute author { text },
    attribute date { text },
    stanza+
}

Our <ref> element allows both for a choice among the four listed attributes and for no attribute at all. Since this is one poem, you may have kept it this way or you may have chosen to open up the possibility of different attributes. However, a single object in the text is never tagged as referring to more than one thing, nor could it be; for this reason, you want to limit the attributes on each <ref> element to one (or perhaps zero or one).
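The difference between these options can be sketched as three alternative definitions of ref. Only one of them would appear in a real schema, and the third is shown only as a contrast; it is not something we recommend.

```rnc
# Alternative 1: exactly one of the four attributes is required
ref = element ref { ( person | relig | school | astron ), text }

# Alternative 2: zero or one attribute (the form used in our solution)
ref = element ref { ( person | relig | school | astron )?, text }

# Alternative 3 (too permissive): each attribute is independently optional,
# so a single <ref> could carry both person and astron at the same time
ref = element ref { person?, relig?, school?, astron?, text }
```

Wrapping the choice in parentheses before applying the question mark is what keeps the attributes mutually exclusive.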

Some of you used the keyword mixed with attributes, for example, with a statement like:

quote = element quote { mixed { speaker } }

This works, but it’s misleading because attributes are not mixed into content the way elements are; they are stashed away inside the start tag. For that reason, it is more idiomatic to use the mixed keyword only for situations where elements and plain text are mixed together in the content of an element, separating out attributes and listing them first.
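As a short sketch of the more idiomatic pattern: the first definition below accepts the same documents as the mixed { speaker } version, and the second shows where mixed would earn its place. The second is hypothetical, since it assumes a quote could contain <ref> elements, which our solution does not allow.

```rnc
# Same effect as mixed { speaker }, but clearer: attribute first, then text
quote = element quote { speaker, text }

# Hypothetical variant: attribute listed first, mixed reserved for text plus child elements
quote = element quote { speaker, mixed { ref* } }
```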