Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2014-09-02T16:30:48+0000


Relax NG content models

Mixed content

The general guide to Relax NG at http://dh.obdurodon.org/relaxng.html describes how to model elements that contain only other elements and elements that contain only text. To refresh your memories:
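A minimal sketch of both kinds of model, using hypothetical element names (book, heading, chapter) that are illustrative only and not taken from the general guide:

```rnc
# Element content only: a book contains one heading followed by one or more chapters
book = element book { heading, chapter+ }

# Text content only: each of these elements contains nothing but plain text
heading = element heading { text }
chapter = element chapter { text }
```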

In addition to the preceding, Relax NG provides a special notation for elements with mixed content, that is, with a combination of plain text and elements. If you have a paragraph that can contain, say, a mixture of plain text, title elements, and emphasis elements, this can be described in Relax NG as

paragraph = element paragraph { mixed { ( title | emphasis )* } }

Note that one set of curly braces is nested inside another. Reading from the outside in, this means that an element of type <paragraph> contains mixed content, that is, plain text mixed in with whatever is described inside the embedded curly braces. What is embedded is an or-group: choose either an element of type <title> or an element of type <emphasis>, and make that choice zero or more times. The vertical bar (pipe) means make a choice; the asterisk means do it zero or more times. Altogether, the model means that zero or more elements of type <title> and <emphasis>, in any order, may be mixed into plain text.
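To see what the model accepts, it may help to read it alongside a few sample instances. The fragment below is a sketch, not a complete schema from the course: the start line and the text-only definitions of title and emphasis are assumptions added so that the paragraph pattern has something to refer to, and the <note> element in the last instance is hypothetical.

```rnc
start = paragraph
paragraph = element paragraph { mixed { ( title | emphasis )* } }
# mixed { p } is shorthand for interleaving text with p, that is, text & ( p )

# Assumed for illustration: both embedded elements contain only text
title = element title { text }
emphasis = element emphasis { text }

# Valid instances:
#   <paragraph>I read <title>Oliver Twist</title> <emphasis>twice</emphasis>.</paragraph>
#   <paragraph>Plain text only.</paragraph>
#   <paragraph/>
# Invalid instance (the model does not allow a <note> element):
#   <paragraph>See <note>this</note>.</paragraph>
```

The second and third instances are valid because the choice may be made zero times and the text may itself be empty.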

Empty elements

As we discussed earlier, elements may also be empty. An example of the Relax NG syntax for describing an empty element is

lineBreak = element lineBreak { empty }

empty is also a reserved word, so it means that the element has no content; it does not mean that <lineBreak> contains an element called <empty>. (You can have an element called <empty>, but you have to describe it differently in Relax NG.)
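One possible reading of the parenthetical remark, offered as a sketch: in the compact syntax the word that follows the keyword element is treated as an element name, so empty in that position names an element rather than describing its content. The pattern name flag is hypothetical.

```rnc
# The first "empty" is the element name; the second is the reserved word meaning "no content"
flag = element empty { empty }

# Matches: <empty/>
```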

An empty element, that is, one with no text or data content, may nonetheless have attributes. For example:

character = element character { id, type, gender, empty }
id = attribute xml:id { xsd:ID }
type = attribute type { text }
gender = attribute gender { "m" | "f" }

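The preceding Relax NG snippet means that a <character> element has four components: an xml:id attribute, a type attribute, a gender attribute, and no data content. These Relax NG statements can be understood as follows: id defines an xml:id attribute whose value is of datatype xsd:ID, so it must be a valid XML name and must be unique within the document; type defines a type attribute whose value may be any text; and gender defines a gender attribute whose value must be either the string "m" or the string "f".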

This construction might be used as follows in an XML document:

<character xml:id="oliverTwist" type="orphan" gender="m"/>
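As a rough check on how the model behaves, the fragment below repeats the definitions with a start line (an assumption, added only to make the fragment complete) and lists some instances that should pass or fail. The attribute values fagin and thief are invented for illustration.

```rnc
start = character
character = element character { id, type, gender, empty }
id = attribute xml:id { xsd:ID }
type = attribute type { text }
gender = attribute gender { "m" | "f" }

# Valid (attributes may appear in any order in the instance):
#   <character xml:id="oliverTwist" type="orphan" gender="m"/>
#   <character gender="m" type="orphan" xml:id="oliverTwist"/>
# Invalid (gender must be "m" or "f"):
#   <character xml:id="fagin" type="thief" gender="x"/>
# Invalid (xml:id is required, because it is not marked optional):
#   <character type="orphan" gender="m"/>
# Invalid (the element must be empty):
#   <character xml:id="oliverTwist" type="orphan" gender="m">Oliver</character>
```

Although the commas separate the attributes in the model, Relax NG does not constrain the order in which attributes are written in the XML, which is why the second instance is also valid.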