Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified: 2025-03-05T18:02:02+0000
See the Assignment page for a discussion of the data and the task. Your solution does not have to match ours as long as it does what you want; in ours we included steps for all of the tasks suggested in the assignment. Our XProc is below, followed by line comments:
Here’s how it works:
Lines 2–5: Note the namespace declarations inside the start-tag. The cx: namespace prefix is used for XProc extension steps, that is, those that are not part of the standard or optional step libraries. The xs: namespace prefix is used for standard datatypes, such as the Boolean (true/false) type in Line 11.
Line 11: A static parameter is a variable that does not depend on input data. In this case we declare a $debug parameter that governs 1) whether to save movies.xml (in addition to movies.xhtml and movies.svg, which are always created) and 2) whether to display progress messages to the screen as the pipeline is processed. The default value is set to false() (no progress messages), and we can override that and set it to true() when we run the pipeline. To run with debug output (note the different syntax for specifying a true value for XML Calabash and MorganaXProc-IIIse):
xmlcalabash moves.xpl debug="?true()"
morgana moves.xpl -static:debug=true
Without debug information:
xmlcalabash moves.xpl
morgana moves.xpl
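A minimal sketch of the kind of declaration described for Line 11, assuming XProc 3.0, where a static parameter of this sort is written as a p:option with static="true". The name debug, the xs:boolean type, and the false() default come from the discussion above; the surrounding pipeline (an empty default input and a single p:identity step) is scaffolding assumed here only to make the example complete, and is not taken from the original listing.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="3.0">
    <!-- Static option: its value is fixed before the pipeline runs,
         so it can be consulted in @use-when expressions -->
    <p:option name="debug" as="xs:boolean" static="true" select="false()"/>
    <!-- Scaffolding (assumed, not from the original pipeline) -->
    <p:input port="source" sequence="true">
        <p:empty/>
    </p:input>
    <p:output port="result" sequence="true"/>
    <p:identity/>
</p:declare-step>
```

Because the option is static, a value supplied on the command line is available when the processor decides which steps to keep.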
Lines 15–16: Read input over the Internet. Setting the value of @sequence to false means that the input must be exactly one document. Setting it to true allows any number of documents, including zero.
Lines 20–22: We set the output as empty because we’re going to write the results that we care about to disk using <p:store>. Since we don’t have exactly one output document (we have none), we have to set @sequence to true, since a value of false would require exactly one output document (see above concerning <p:input>).
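The two port declarations discussed above might look roughly like the fragment below. This is a sketch under assumptions: the URL is a hypothetical placeholder (the real address is in the pipeline listing), and using the @href shortcut on the input is only one of several ways to supply a default document.

```xml
<!-- Fragment: these declarations sit near the top of a p:declare-step -->

<!-- Exactly one input document; the URL is a hypothetical placeholder -->
<p:input port="source" sequence="false"
    href="https://example.com/movieData.txt"/>

<!-- Explicitly empty output; zero documents is not "exactly one",
     so @sequence has to be true -->
<p:output port="result" sequence="true">
    <p:empty/>
</p:output>
```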
Line 23: Here and elsewhere (lines 34, 39, 46, 55, 66, 77, 88, 98, 107) we use <p:identity>, which passes through its input unchanged, just as a place to hang a @message attribute. The value of a @message attribute is written to the screen when the step is executed, and the @use-when attribute value controls whether the step should be executed or not. If $debug is true(), the messages are displayed; if it is false() (the default), they aren’t. The messages are just for our convenience, so that we can track the progress of the pipeline as it is executed.
We could put a @message attribute on any step, but we want the working steps to be executed regardless of the value of the $debug parameter, so we can’t put a @use-when attribute on them. Putting the message on a separate <p:identity> step lets us control whether it is rendered; the working step is always executed.
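The pattern of a debug-only message step followed by an always-executed working step can be sketched as follows. This is an illustration rather than an excerpt from the pipeline: the inline <doc/> input, the message text, and the p:add-attribute working step are all assumptions chosen to keep the example self-contained.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="3.0">
    <p:option name="debug" as="xs:boolean" static="true" select="false()"/>
    <!-- Hypothetical inline input, so the sketch needs no external file -->
    <p:input port="source">
        <p:inline><doc/></p:inline>
    </p:input>
    <p:output port="result"/>

    <!-- Message step: kept only when $debug is true -->
    <p:identity use-when="$debug" message="About to add an attribute"/>

    <!-- Working step: no @use-when, so it always runs -->
    <p:add-attribute attribute-name="status" attribute-value="processed"/>
</p:declare-step>
```

When debug is false, the p:identity step is dropped and the input flows straight into the working step, so the result is the same with or without the message.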
Lines 29–34: Add basic XML markup according to the ixml grammar specified on Line 31.
You need to use Markup Blitz, rather than CoffeePot, for this step because the source file is too large for CoffeePot to process comfortably. You can use the abridged movieData-short.txt input with CoffeePot (see the link in the assignment), but not the full movieData.txt file.
If you’re running with MorganaXProc-IIIse configured according to our configuration instructions, you’ll automatically use Markup Blitz, which is what you want. XML Calabash added support for Markup Blitz only in version alpha21, so if you’re running an earlier version of XML Calabash (you can check your version with xmlcalabash version) you can either upgrade (see §2.1 Configuring XML Calabash (XProc) of our Configuring XProc and ixml processors for information about configuring XML Calabash alpha21 to use Markup Blitz) or just use MorganaXProc-IIIse.
If you’re running XML Calabash version alpha21 or later according to those configuration instructions, XML Calabash would normally default to using CoffeePot for ixml. We override that default behavior by using the @cx:processor attribute to tell XML Calabash to use Markup Blitz instead of CoffeePot.
Line 38: We use an XProc step to remove the first
]]>
element, which contains
header labels and not real film data.
Lines 43–45: We use a <p:xslt> step with our movies-countries.xsl XSLT stylesheet to modify the country and runtime information as described in the assignment.
Lines 50–54: We created movies.rnc to model the XML we intend to create within our pipeline. Since we’ve now reached that stage, we validate the XML against the Relax NG schema.
Lines 61–65: We use movies.sch to
validate the XML against a Schematron schema. Counting the number of countries
is sort of silly, but validating the content of the
]]>
element is useful
because we’re going to rely on the shape of the information there when we create
SVG later in the pipeline.
Line 70: We don’t normally save the XML because it’s just a step on the way to the output we really want, which is the HTML and SVG. For debugging purposes, though, we might want to see it, so we use a <p:store> step (which writes the input to disk with the specified filename and then passes that same input along unchanged to the next step, like a <p:identity> step) that runs only in debug mode. We can put our progress message directly on this step because, unlike our other working steps, this whole step runs only in debug mode.
Lines 74–76: The XML flows by implicit binding into this next step, which uses movies-to-html.xsl to create an HTML reading view of our data. We created movies.css, for styling the HTML, separately, and our XSLT transformation writes a <link> element into the HTML that points to it.
Lines 81–87: The output of the XSLT transformation flows by implicit binding into a <p:store> step, which writes the HTML to the specified filename. We set the serialization parameters for this step to use HTML 5 with XML syntax, to include the optional XML declaration, and to omit the HTML content type specification, which should not be used when creating HTML 5 with XML syntax. We turn on pretty-printing to make the raw, angle-bracketed HTML easier to read.
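One way the serialization settings described above could be expressed is with a serialization map on the store step. This is a hedged sketch, not the pipeline's actual lines 81–87: the filename movies.xhtml comes from the discussion of the $debug parameter, while the specific map keys and values are our assumption about how those settings translate into standard serialization parameters.

```xml
<!-- Fragment: assumes the HTML arrives on the source port by implicit binding -->
<p:store href="movies.xhtml"
    serialization="map{
        'method': 'xhtml',
        'html-version': 5,
        'omit-xml-declaration': false(),
        'include-content-type': false(),
        'indent': true()
    }"/>
```

Here 'method': 'xhtml' together with 'html-version': 5 requests HTML 5 in XML syntax, and 'indent': true() is the pretty-printing switch.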
The input into the source port of a <p:store> step flows out of its result port unchanged, just as with a <p:identity> step. However, as described immediately below, we’re going to ignore that HTML going forward, so this <p:store> step represents the end of this branch of the pipeline.
Lines 92–97: We run a new <p:xslt> step that transforms the XML (not the HTML that we just created!) into SVG with movies-to-svg.xsl. If we didn’t say otherwise, the output of the last step (the HTML that we created earlier) would flow into the source port of this <p:xslt> step through implicit binding. That isn’t the input we want, though, so we override it and pull the input instead from the step where we performed our Schematron validation. This is why we had to put a @name attribute on the <p:validate-with-schematron> step; the @name lets us point to it on Line 94, where the @step attribute value on the <p:pipe> element says to use the step called finalize-xml and the @port attribute value says to use the result port of that step.
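The explicit binding described above can be sketched as the fragment below. The step name finalize-xml, the filenames movies.sch and movies-to-svg.xsl, and the @step and @port attributes come from the discussion; the exact layout is an assumption, and the fragment is meant to be read inside the pipeline rather than run on its own.

```xml
<!-- Fragment: assumes it sits inside the p:declare-step discussed above -->

<!-- The name gives later steps something to point to -->
<p:validate-with-schematron name="finalize-xml">
    <p:with-input port="schema" href="movies.sch"/>
</p:validate-with-schematron>

<!-- ... steps that create and store the HTML go here ... -->

<p:xslt>
    <!-- Explicit binding: read from the named step, not from the step before -->
    <p:with-input port="source">
        <p:pipe step="finalize-xml" port="result"/>
    </p:with-input>
    <p:with-input port="stylesheet" href="movies-to-svg.xsl"/>
</p:xslt>
```

Without the p:with-input for the source port, this p:xslt step would receive whatever the immediately preceding step emitted.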
This step creates a branch or fork in the pipeline. The flow of information has been linear (from each step to the next one by implicit binding) until now, but at this stage we reach backward and establish a second flow from the result port of an earlier step. Because a <p:validate-with-schematron> step passes its input through unchanged, we could have attached the name to either of the two steps before it, the <p:xslt> step that finalizes the XML or the <p:validate-with-relax-ng> step that validates it. The result would be the same with any of these three choices; we opted for the last one because it corresponds most directly to our understanding of the processing, and, specifically, that we don’t want to create the HTML and SVG output until we’ve fully validated the XML.
Lines 102–106: We use a <p:store> step to write the SVG to disk with the specified filename. The serialization parameters are different from those for the HTML output because SVG is a different type of XML document than HTML 5 with XML syntax.
Unless we say otherwise, the output on the result port of the final step in a pipeline passes its information implicitly into the main output port, which we declared on Lines 20–22. However, because we specified in that declaration that the output should be empty, that specification overrides the implicit binding, which means that nothing flows out of the pipeline.
Below is a graph of the pipeline as created by XML Calabash, except that we added the coloring manually. We turned off debug mode, which means that the pipeline omits the <p:identity> steps that are used for messaging and the step that saves movies.xml only when we run the pipeline in debug mode. See also the notes below the graph:
Steps are implicitly connected or bound to the steps that follow them immediately in the pipeline. Here are some details:
The primary input and primary output ports are declared before anything else. The steps that follow them in this pipeline are all what we call working steps, that is, those that process the document according to our instructions. The primary input has only one port, called source, and it flows implicitly into the source port of the first working step. The primary output has only one port, called result, and the result port of the last working step flows implicitly into it. In this pipeline we override that implicit connection by specifying that the primary output port should ingest not from the implicit connection, but from an empty document.
Most of the time we let the implicit connections control the flow of information, but sometimes we overrule it with an explicit connection or binding. See the discussion of the branching that creates both HTML and SVG output.
The pink represents the primary flow of information, most often from the result port of one step into the source port of the next.
The blue represents the end of the pipeline. In this project the last thing we do is use <p:store> steps to write the HTML and SVG output to disk. Because we configured the primary output port to be empty (lines 20–22), that explicit inline empty value overrides the implicit connection from the last step in the path to the primary output. The implicit connection is still there, but the primary output port ignores it because we’ve told it that it should be empty.
You can see in the graph above that the <p:store> step has two output ports: result and result-uri. In addition to writing to disk, a <p:store> step copies the input from its source input port to its output result port unchanged, just like <p:identity>, and it writes the path to the saved file to the result-uri output port. We ignore both of those, with the result that they disappear without a trace.
The diamonds near the top represent the primary input and output steps. The house-shaped polygon at the top represents the plain text document that the primary input port loads. The house-shaped polygon on the far right represents the output of the primary output port, which is empty because that’s how we configured it.
Steps (other than <p:input> and <p:output>) are represented by rectangles with three rows, as follows:
The middle row holds the type of the step; for an example see the step near the upper left of the graph. If the step has a name, it is displayed after a slash character following the type; for an example see the name finalize-xml for the <p:validate-with-schematron> step.
The top row represents the input ports for the step. The primary input into each step is the source port. Some steps have additional inputs, e.g., a grammar port for a <p:invisible-xml> step or a stylesheet port for a <p:xslt> step.
The bottom row represents the output ports for the step. The primary output from each step is the result port. Some steps have additional outputs, e.g., a validation step such as <p:validate-with-schematron> emits a report on its report output port, which we ignore.
Where steps require secondary input (e.g., grammars for ixml, stylesheets for XSLT, schemas for validation), those enter through a secondary input port. Those inputs come from documents, also represented by rectangles with three rows, as follows:
Because we read this input from existing resources, the middle row has an href value that gives the source filename.
All output ports emit output, but in this pipeline we ignore that output, except for the primary output on the result port (and sometimes even then). In the graph this is represented by an arrow from the output port to a small black circle, which means that the output from that port disappears without a trace.