Digital humanities

Last modified: 2014-09-02T16:30:48+0000

Relax NG content models

Mixed content

The general guide to Relax NG describes how to model elements that contain only other elements and elements that contain only text.
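As a quick refresher, those two content models might look like the following in compact syntax. This is a minimal sketch. The element names poem, stanza, line, and title are hypothetical and chosen only for illustration.

```rnc
# Hypothetical schema, for illustration only.
start = poem
# Elements that contain only other elements:
poem = element poem { title, stanza+ }
stanza = element stanza { line+ }
# Elements that contain only text:
title = element title { text }
line = element line { text }
```

In this sketch the comma means that the components occur in the order listed, and the plus sign means one or more.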

In addition to the preceding, Relax NG provides a special notation for elements with mixed content, that is, with a combination of plain text and elements. If you have a paragraph that can contain, say, a mixture of plain text, title elements, and emphasis elements, this can be described in Relax NG as

paragraph = element paragraph { mixed { ( title | emphasis )* } }

Note that there’s a set of curly braces nested inside another set of curly braces. Reading from the outside in, this means that an element of type <paragraph> contains mixed content, that is, plain text mixed in with whatever is described inside the inner curly braces. (Like element and text, mixed is a reserved word in Relax NG.) Inside the inner braces is an or-group: a choice between an element of type <title> and an element of type <emphasis>, a choice that you may make zero or more times. The vertical bar (pipe) means make a choice, the parentheses group the alternatives, and the asterisk means do it zero or more times. All together, the model means that you can have any number (including none) of <title> and <emphasis> elements, in any order, mixed into plain text.
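For experimentation, here is a minimal complete schema built around the paragraph pattern. The text above does not say what <title> and <emphasis> contain, so this sketch assumes that both contain only plain text. The sample paragraph in the final comment is invented.

```rnc
start = paragraph
# Plain text may appear anywhere among zero or more <title> and <emphasis> children.
paragraph = element paragraph { mixed { ( title | emphasis )* } }
# Assumption: both child elements contain only plain text.
title = element title { text }
emphasis = element emphasis { text }
# An equivalent way to write the paragraph model without the mixed keyword:
#   paragraph = element paragraph { ( text | title | emphasis )* }
# A sample instance this schema accepts:
#   <paragraph>The novel <title>Oliver Twist</title> was <emphasis>very</emphasis> popular.</paragraph>
```

The commented-out alternative accepts the same documents. The mixed keyword is a convenient shorthand for allowing text anywhere among the listed elements.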

Empty elements

As we discussed earlier, elements may also be empty. An example of the Relax NG syntax for describing an empty element is

lineBreak = element lineBreak { empty }
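Here is a minimal sketch of which instances the lineBreak pattern accepts and which it rejects. The instance snippets in the comments are made up for illustration.

```rnc
start = lineBreak
lineBreak = element lineBreak { empty }
# Accepted: <lineBreak/>
# Accepted: <lineBreak></lineBreak>  (the same element, written with start and end tags)
# Rejected: <lineBreak>text</lineBreak>  (text content is not allowed)
# Rejected: <lineBreak><lineBreak/></lineBreak>  (child elements are not allowed)
```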

The word empty is also a reserved word, so it means that the element is empty. It does not mean that a <lineBreak> contains an element called <empty>. (You can have an element called <empty>, but you have to describe it differently in Relax NG.)
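The following sketch shows one way to describe an element that is actually named <empty>. It assumes the compact syntax's backslash escape, which tells the parser to read a reserved word as an ordinary name. The pattern name emptyElement and its text content model are hypothetical.

```rnc
start = emptyElement
# The backslash makes "empty" an element name here, not the reserved word.
# Contrast: in element lineBreak { empty }, empty is the reserved word.
emptyElement = element \empty { text }
```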

An empty element, that is, one with no text and no child elements, may nonetheless have attributes. For example:

character = element character { id, type, gender, empty }
id = attribute xml:id { xsd:ID }
type = attribute type { text }
gender = attribute gender { "m" | "f" }

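The preceding Relax NG snippet means that a <character> element has four components: an xml:id attribute, a type attribute, a gender attribute, and no content (neither text nor child elements). These Relax NG statements can be understood as follows. The first defines the <character> element and lists its components, referring to the attribute patterns by the names id, type, and gender. The second says that the value of the xml:id attribute must be of the datatype xsd:ID, that is, an XML name (so, for example, it cannot contain spaces) that must be unique within the document. The third allows any text as the value of the type attribute. The fourth requires the value of the gender attribute to be exactly "m" or "f".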

This construction might be used as follows in an XML document:

<character xml:id="oliverTwist" type="orphan" gender="m"/>
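Here the character example is assembled into a complete schema. The start pattern is an addition, so that the schema can be tried in a validator. The schema accepts the instance above and rejects variants like those in the comments, which are invented for illustration.

```rnc
start = character
character = element character { id, type, gender, empty }
id = attribute xml:id { xsd:ID }
type = attribute type { text }
gender = attribute gender { "m" | "f" }
# Accepted: <character xml:id="oliverTwist" type="orphan" gender="m"/>
# Rejected: <character xml:id="oliverTwist" type="orphan" gender="male"/>
#   (only "m" or "f" is allowed)
# Rejected: <character type="orphan" gender="m"/>
#   (the xml:id attribute is required)
# Rejected: <character xml:id="oliverTwist" type="orphan" gender="m">Oliver</character>
#   (text content is not allowed)
```

Because a pattern made up only of attributes already permits no content, writing element character { id, type, gender } without the trailing empty would accept the same documents. The explicit empty simply makes the intent visible.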