Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2021-12-27T22:03:54+0000


Examples of Schematron from our projects

Interlinear glossing

Linguistic corpora often record transcriptions in multiple tiers, such as a transcription of the original utterance, a word-by-word gloss with grammatical information, and a more fluid, natural-language translation. The set of notational conventions most commonly used for this purpose by corpus linguists has been codified in the Leipzig Glossing Rules (http://www.eva.mpg.de/lingua/resources/glossing-rules.php). Here is a Russian example from that document (modified to add the original orthography tier):

Orth      Мы   с     Марко  поеха-л-и  автобус-ом  в    Переделкино
Translit  My   s     Marko  poexa-l-i  avtobus-om  v    Peredelkino.
Gram      1PL  COM   Marko  go-PST-PL  bus-INS     ALL  Peredelkino.
ILG       we   with  Marko  go-PST-PL  bus-by      to   Peredelkino.
Free      'Marko and I went to Peredelkino by bus.'

(COM = comitative; ALL = allative)

Other tiers might include a transcription in the International Phonetic Alphabet (IPA), or interlinear glosses or free translations into other languages.

What Schematron can validate: Each of the computationally tractable tiers should have the same number of words, and each word should have the same number of hyphens as the corresponding word in the other tiers.
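
One way such a check might be written is sketched below. The markup is hypothetical, not taken from any of our projects: it assumes that each example is an <example> element whose aligned tiers are <tier> children, and that the free translation lives in a different element so that it is not compared. The sketch uses the XSLT 2.0 query binding and compares every later tier with the first one.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <!-- Hypothetical markup: <example> holds the aligned <tier> children.
         Every tier after the first is compared with the first. -->
    <sch:rule context="example/tier[preceding-sibling::tier]">
      <sch:let name="first" value="tokenize(normalize-space(../tier[1]), ' ')"/>
      <sch:let name="words" value="tokenize(normalize-space(.), ' ')"/>
      <sch:let name="shared" value="min((count($first), count($words)))"/>
      <sch:assert test="count($words) eq count($first)">
        This tier has <sch:value-of select="count($words)"/> words, but the first tier has <sch:value-of select="count($first)"/>.
      </sch:assert>
      <!-- Hyphen count of a word: its length minus its length with hyphens removed. -->
      <sch:assert test="every $i in 1 to $shared satisfies
          string-length($words[$i]) - string-length(translate($words[$i], '-', ''))
          eq
          string-length($first[$i]) - string-length(translate($first[$i], '-', ''))">
        At least one word in this tier has a different number of hyphens from the corresponding word in the first tier.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

If the Russian example above were marked up this way, both assertions would pass: all four aligned tiers have seven words, and in each tier the fourth word has two hyphens and the fifth has one.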

Database referential integrity

The Rusian genealogy project at http://genealogy.obdurodon.org/ is an XML database. There are entries for people and for marriages, where a marriage contains pointers to the participants and to their offspring.

What Schematron can validate: The targets of all pointers must exist: husbands, wives, children, parents.
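
A rule of this kind might look like the following sketch. The element and attribute names are assumptions made for illustration, not the actual markup of the genealogy database: each <person> is assumed to carry a unique id attribute, and the children of <marriage> (for example <husband>, <wife>, <child>) are assumed to point to people with a ref attribute. The sketch also assumes that people and marriages are in the same document.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <!-- Hypothetical markup: each <person> has a unique @id, and the children
         of <marriage> point to people with @ref. -->
    <sch:rule context="marriage/*[@ref]">
      <!-- True if the @ref value equals at least one person/@id value. -->
      <sch:assert test="@ref = //person/@id">
        The <sch:name/> pointer "<sch:value-of select="@ref"/>" does not match the id of any person.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

If the people and the marriages were stored in separate files, the comparison would have to reach into the other documents rather than search only the current one.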

Sequential numbering

In the Annotated Afanas′ev Library (http://aal.obdurodon.org), page ranges are entered as follows:

<text-pages-r>
    <start>36</start>
    <end>37</end>
</text-pages-r>

What Schematron can validate: The numeric value of the <end> element must be greater than or equal to the numeric value of the <start> element.
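
A minimal sketch of such a rule, assuming the XSLT 2.0 query binding, follows. The explicit number() conversion matters: without schema validation the element contents are untyped, and XPath 2.0 compares two untyped values as strings, so that a range from 36 to 100 would wrongly be reported as an error.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <sch:rule context="text-pages-r">
      <!-- number() forces a numeric comparison; non-numeric or missing
           content becomes NaN, which also makes the assertion fail. -->
      <sch:assert test="number(end) ge number(start)">
        The end page (<sch:value-of select="end"/>) must not precede the start page (<sch:value-of select="start"/>).
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The page range shown above, from 36 to 37, would satisfy this assertion, while a range from 37 to 36 would be reported.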