Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified: 2025-02-28T21:00:37+0000
The following list describes basic features of XProc, simplified to focus on the type of XProc that we will write in our course. See also Part A of Martin Kraetke’s XProc 3.0 Tutorial.
XProc pipelines consist of steps.
Steps have input and output ports. The ports for each type of step are predefined, and you’ll need to look them up, as you need them, at https://xprocref.org or in the specs if you don’t already know how they work. Here are some examples:
An XSLT step (<p:xslt>) has two input ports. One, called source, is for the XML being transformed. The other, called stylesheet, is for the XSLT stylesheet used to perform the transformation.
An XSLT step has two output ports. One, called result, is for the transformed output document. The other, called secondary, is for other output. The secondary port matters only when our XSLT uses <xsl:result-document>, and we’ll ignore it in this unit.
A validation step has two input ports. One, called source, is for the XML being validated. The other, called schema, is for the schema used to perform the validation.
A validation step has two output ports. One, called result, is for the original XML document, which (unless it is invalid) emerges unchanged and is passed along the pipeline to whatever the next step might be. The other, called report, is for the validation results. We usually ignore the report port because an invalid document raises an error, so we don’t need to look at the report to know whether the document was valid.
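The ports described above can be sketched in a small pipeline. This is a minimal illustration, not a course example: the choice of <p:validate-with-relax-ng> as the validation step and the file names schema.rng and to-html.xsl are assumptions made for the sketch.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <!-- Validation step: the source port is filled from the pipeline input;
       the schema port is bound to a hypothetical Relax NG schema file. -->
  <p:validate-with-relax-ng>
    <p:with-input port="schema" href="schema.rng"/>
  </p:validate-with-relax-ng>

  <!-- XSLT step: the source port receives the validated document from the
       result port of the step above; the stylesheet port is bound to a
       hypothetical stylesheet file. The report and secondary ports are
       simply left unread. -->
  <p:xslt>
    <p:with-input port="stylesheet" href="to-html.xsl"/>
  </p:xslt>
</p:declare-step>
```

Only the ports that need a value from somewhere other than the preceding step (schema and stylesheet) are bound by hand here; the source ports are filled without any extra markup.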
The pipeline as a whole most often (and in all examples in this unit) defines a starting point with a <p:input> step that ingests a single source document.
The pipeline as a whole most often (and in all examples in this unit) defines a final output with a single <p:output> step. In some examples in this unit we’ll define this step as empty because we’re going to ignore it, instead writing the output that we care about to disk with <p:store>.
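A pipeline skeleton of this shape might look like the following sketch. The file names (input.xml, to-html.xsl, output.html) are hypothetical, and sequence="true" on the output is an assumption added so that an empty output is permitted.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <!-- Starting point: a single source document (hypothetical file name). -->
  <p:input port="source" primary="true">
    <p:document href="input.xml"/>
  </p:input>

  <!-- Final output: deliberately empty, because the result we care about
       is written to disk below. -->
  <p:output port="result" primary="true" sequence="true">
    <p:empty/>
  </p:output>

  <p:xslt>
    <p:with-input port="stylesheet" href="to-html.xsl"/>
  </p:xslt>

  <!-- Write the transformed document to disk. -->
  <p:store href="output.html"/>
</p:declare-step>
```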
In a typical pipeline with one input document, the processing passes the document through a series of steps, some of which transform or modify it, until it reaches the end, whereupon it emits the document in its final state.
A step is connected automatically to the following one in the order in which they appear in the XProc file, which means that unless you say otherwise, the steps are applied in the order listed. This is called an implicit connection.
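As a rough sketch of implicit connections, the two XSLT steps below are chained purely by their order in the file. The stylesheet names normalize.xsl and to-html.xsl are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <!-- First step: reads the pipeline's source document implicitly. -->
  <p:xslt>
    <p:with-input port="stylesheet" href="normalize.xsl"/>
  </p:xslt>

  <!-- Second step: reads the result of the first step implicitly,
       only because it comes next in the file. -->
  <p:xslt>
    <p:with-input port="stylesheet" href="to-html.xsl"/>
  </p:xslt>

  <!-- The pipeline's result port receives the output of the last step. -->
</p:declare-step>
```

Swapping the order of the two <p:xslt> elements would swap the order in which the transformations are applied.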
You can tell a step to take its primary input from a source other than the immediately preceding step. This is called an explicit connection, and it overrides any implicit connection. Explicit connections are specified only on the step that receives the input; that is, a step may pull its input explicitly from a specified other step, but a step does not push (send) its output anywhere explicitly.
A common use of explicit connections involves doing multiple things to the same source. For example, if you want to use XSLT to transform XML to both HTML and SVG, you might use an implicit connection between consecutive steps to get from the XML to the HTML. Once you’ve done that, though, you can’t make a second implicit connection from the XML, because implicit connections are only between adjacent steps. You can instead make an explicit connection to pull the XML input into a step that creates SVG.
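The HTML-and-SVG scenario could be sketched as follows. The step name main, the stylesheet names, and the output file names are all hypothetical. The explicit connection is written here with the pipe attribute on <p:with-input>, which is one of several ways to express it.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="3.0" name="main">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true" sequence="true">
    <p:empty/>
  </p:output>

  <!-- Implicit connection: this step reads the pipeline's source XML. -->
  <p:xslt>
    <p:with-input port="stylesheet" href="to-html.xsl"/>
  </p:xslt>
  <p:store href="output.html"/>

  <!-- Explicit connection: without the first with-input below, this step
       would receive the HTML from the preceding step. Instead it pulls
       the original XML from the source port of the step named "main". -->
  <p:xslt>
    <p:with-input port="source" pipe="source@main"/>
    <p:with-input port="stylesheet" href="to-svg.xsl"/>
  </p:xslt>
  <p:store href="output.svg"/>
</p:declare-step>
```

Note that the connection is declared on the receiving step only; nothing in the pipeline input or the first <p:xslt> says where its output will go.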