Digital humanities

Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2021-12-27T22:03:54+0000

Examples of Schematron from our projects

Interlinear glossing

Linguistic corpora often record transcriptions in multiple tiers, such as a transcription of the original utterance, a word-by-word gloss with grammatical information, and a more fluid, natural-language translation. The set of notational conventions most commonly used for this purpose by corpus linguists has been codified in the Leipzig Glossing Rules (http://www.eva.mpg.de/lingua/resources/glossing-rules.php). Here is a Russian example from that document (modified to add the original orthography tier):

Orth	Мы	с	Марко	поеха-л-и	автобус-ом	в	Переделкино.
Translit	My	s	Marko	poexa-l-i	avtobus-om	v	Peredelkino.
Gram	1PL	COM	Marko	go-PST-PL	bus-INS	ALL	Peredelkino.
ILG	we	with	Marko	go-PST-PL	bus-by	to	Peredelkino.
Free	'Marko and I went to Peredelkino by bus.'

(COM = comitative; ALL = allative)

Other tiers might include an International Phonetic Alphabet (IPA) transcription, as well as interlinear glossing or free translation into other languages.

What Schematron can validate: Each of the computationally tractable tiers (every tier except the free translation) should have the same number of words, and each word should have the same number of hyphens as the word in the same position in the other tiers.
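
A minimal sketch of such a check follows. The markup it assumes is hypothetical and is not taken from any of our projects: each <sentence> holds one <tier> element per line of glossing, the words within a tier are separated by whitespace, the free translation is marked with type="free", and the elements are in no namespace. It also assumes a Schematron processor that supports the XPath 2.0 query binding.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <!-- Hypothetical markup: sentence/tier, with the free translation as type="free" -->
    <sch:rule context="sentence/tier[not(@type = 'free')]">
      <!-- The first aligned tier in the sentence serves as the reference -->
      <sch:let name="reference" value="../tier[not(@type = 'free')][1]"/>
      <!-- Split both tiers into words on whitespace -->
      <sch:let name="words" value="tokenize(normalize-space(.), ' ')"/>
      <sch:let name="refWords" value="tokenize(normalize-space($reference), ' ')"/>
      <!-- Same number of words in every aligned tier -->
      <sch:assert test="count($words) eq count($refWords)">
        Tier "<sch:value-of select="@type"/>" has <sch:value-of select="count($words)"/> words,
        but the first tier has <sch:value-of select="count($refWords)"/>.
      </sch:assert>
      <!-- Same number of hyphens in each pair of corresponding words;
           hyphens are counted as the length lost when they are stripped out -->
      <sch:assert test="every $i in 1 to min((count($words), count($refWords))) satisfies
                        (string-length($words[$i]) - string-length(translate($words[$i], '-', '')))
                        eq
                        (string-length($refWords[$i]) - string-length(translate($refWords[$i], '-', '')))">
        Tier "<sch:value-of select="@type"/>" contains a word whose hyphen count differs
        from that of the corresponding word in the first tier.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The rule compares every aligned tier to the first one rather than to all of the others, which is enough to establish that they all agree. The reference tier is trivially compared to itself, which keeps the rule context simple.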

Database referential integrity

The Rusian genealogy project at http://genealogy.obdurodon.org/ is an XML database. There are entries for people and for marriages, where a marriage contains pointers to the participants and to their offspring.

What Schematron can validate: The targets of all pointers must exist: husbands, wives, children, parents.
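
One way such a constraint might be expressed is sketched below. The element and attribute names are assumptions made for illustration, not the actual markup of the genealogy project: each <person> is assumed to carry an id attribute, and each child of <marriage> that points to a person (for example, hypothetical <husband>, <wife>, and <child> elements) is assumed to do so through a ref attribute whose value is that id. The elements are assumed to be in no namespace.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <!-- Hypothetical markup: any child of marriage that carries a ref attribute is a pointer -->
    <sch:rule context="marriage/*[@ref]">
      <!-- The general comparison succeeds if at least one person has a matching id -->
      <sch:assert test="@ref = //person/@id">
        The <sch:name/> pointer "<sch:value-of select="@ref"/>" does not match the id of any person.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Because the rule matches every pointer-bearing child of <marriage> regardless of its name, a single assertion covers all of the pointer types. For a large database, looking up the targets through a key would likely be faster than scanning //person for each pointer.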

Sequential numbering

In the Annotated Afanas′ev Library (http://aal.obdurodon.org), page ranges are entered as follows:

<text-pages-r>
    <start>36</start>
    <end>37</end>
</text-pages-r>

What Schematron can validate: The numeric value of the <end> element must be greater than or equal to the numeric value of the <start> element.
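
This constraint might be written as in the following sketch, which uses the element names from the sample above and assumes that they are in no namespace and that the processor supports the XPath 2.0 query binding.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <sch:rule context="text-pages-r">
      <!-- number() forces a numeric comparison; without it, untyped values
           are compared as strings in XPath 2.0, so "100" would sort before "99" -->
      <sch:assert test="number(end) ge number(start)">
        The end page (<sch:value-of select="end"/>) must not be lower than
        the start page (<sch:value-of select="start"/>).
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

A side effect of this design is that a non-numeric value in either element converts to NaN, which makes the comparison false, so the same assertion also reports page numbers that are not numbers at all.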