Digital humanities

Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2025-03-29T17:57:00+0000

XProc basics

The following list describes basic features of XProc, simplified to focus on the type of XProc that we will write in our course. See also Part A of Martin Kraetke’s XProc 3.0 Tutorial.

XProc pipelines consist of steps.
Steps have input and output ports. The ports for each type of step are predefined. If you don't already know how a step's ports work, look them up as you need them at https://xprocref.org or in the specification. Here are some examples:
- An XSLT step (<p:xslt>) has two input ports. One, called source, is for the XML being transformed. The other, called stylesheet, is for the XSLT stylesheet used to perform the transformation.
- An XSLT step has two output ports. One, called result, is for the transformed output document. The other, called secondary, is for any additional output documents. The secondary port matters only when our XSLT uses <xsl:result-document>, and we'll ignore it in this unit.
- A validation step (e.g., <p:validate-with-relax-ng>, <p:validate-with-schematron>) has two input ports. One, called source, is for the XML being validated. The other, called schema, is for the schema used to perform the validation.
- A validation step has two output ports. One, called result, is for the original XML document, which, if it is valid, emerges unchanged and is passed along the pipeline to whatever the next step might be. The other, called report, is for the validation results. We usually ignore the report port: an invalid document raises an error, so we don't need to inspect the report to know whether the document was valid.
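To see where these ports appear in a pipeline, here is a minimal sketch, assuming XProc 3.0, of a pipeline that validates its input and then transforms it. The file names schema.rng and transform.xsl are hypothetical, and we assume the schema uses Relax NG XML syntax. Only the non-primary input ports (schema and stylesheet) are bound explicitly with <p:with-input>; each step's source port receives its document automatically from whatever precedes it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Validation step: "source" (primary) receives the pipeline input automatically;
       "schema" is bound to a hypothetical Relax NG schema file. -->
  <p:validate-with-relax-ng>
    <p:with-input port="schema" href="schema.rng"/>
  </p:validate-with-relax-ng>
  <!-- Its "result" port emits the unchanged document; "report" is left unconnected. -->

  <!-- XSLT step: "source" (primary) receives the validated document automatically;
       "stylesheet" is bound to a hypothetical XSLT file. -->
  <p:xslt>
    <p:with-input port="stylesheet" href="transform.xsl"/>
  </p:xslt>
  <!-- Its "result" port feeds the pipeline's "result" output; "secondary" is ignored. -->
</p:declare-step>
```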
The pipeline as a whole most often (and in all examples in this unit) defines a starting point with a <p:input> port declaration that ingests a single source document.
The pipeline as a whole most often (and in all examples in this unit) defines a final output with a single <p:output> port declaration. In some examples in this unit we'll define this output as empty because we're going to ignore it, writing the output that we care about to disk with <p:store> instead.
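A skeleton of this pipeline shape might look like the following sketch, assuming XProc 3.0; transform.xsl and result.xml are hypothetical file names, and serialization options are omitted. The <p:output> is bound to <p:empty/>, which requires sequence="true" because an empty output is a sequence of zero documents, while <p:store> writes the transformed document to disk.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <!-- Starting point: a single source document -->
  <p:input port="source"/>

  <!-- Final output declared but deliberately empty, because we ignore it -->
  <p:output port="result" sequence="true">
    <p:empty/>
  </p:output>

  <!-- Transform the source with a hypothetical stylesheet -->
  <p:xslt>
    <p:with-input port="stylesheet" href="transform.xsl"/>
  </p:xslt>

  <!-- Write the output we care about to disk (hypothetical file name) -->
  <p:store href="result.xml"/>
</p:declare-step>
```

Relative href values are resolved against the location of the pipeline file.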
In a typical pipeline with one input document, the pipeline passes the document through a series of steps, some of which transform or modify it, until it reaches the end, whereupon the pipeline emits the document in its final state.
Each step is connected automatically to the one that follows it in the XProc file: the primary output of one step becomes the primary input of the next. This means that unless you say otherwise, the steps are applied in the order listed. This is called an implicit connection.
You can tell a step to take its primary input from a source other than the immediately preceding step. This is called an explicit connection, and it overrides any implicit connection. Explicit connections are specified only on the step that receives the input; that is, a step may pull its input explicitly from a specified other step, but a step never pushes (sends) its output anywhere explicitly.

A common use of explicit connections involves doing multiple things to the same source. For example, if you want to use XSLT to transform XML to both HTML and SVG, you might use an implicit connection between consecutive steps to get from the XML to the HTML, but once you’ve done that, you can’t make a second implicit connection from the XML because implicit connections are only between adjacent steps. You can, though, make an explicit connection to pull the XML input into a step that creates SVG.
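The HTML-and-SVG scenario can be sketched as follows, assuming XProc 3.0 and hypothetical stylesheets to-html.xsl and to-svg.xsl (serialization options omitted). Naming the pipeline (name="main") lets the SVG step pull the original XML from the pipeline's source port with pipe="source@main". Without that explicit connection, the SVG step would implicitly receive the output of the preceding <p:store>, which is the HTML.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" name="main">
  <p:input port="source"/>
  <p:output port="result" sequence="true">
    <p:empty/>
  </p:output>

  <!-- Implicit connection: this step's source is the pipeline's source input -->
  <p:xslt name="to-html">
    <p:with-input port="stylesheet" href="to-html.xsl"/>
  </p:xslt>
  <!-- Implicit connection: stores the HTML produced by the previous step -->
  <p:store href="output.html"/>

  <!-- Explicit connection: pull the original XML from the pipeline's "source" port
       instead of taking the output of the p:store step immediately above -->
  <p:xslt name="to-svg">
    <p:with-input port="source" pipe="source@main"/>
    <p:with-input port="stylesheet" href="to-svg.xsl"/>
  </p:xslt>
  <!-- Implicit connection: stores the SVG produced by the previous step -->
  <p:store href="output.svg"/>
</p:declare-step>
```

The explicit connection appears only on the step that receives the input (to-svg); neither the first <p:xslt> nor the first <p:store> says anything about where its output goes.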