Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2022-03-06T22:50:28+0000
Definitions of the term algorithm vary, but what they all have in common is that an algorithm is a formal series of steps that can be used to accomplish a task. Before you write a line of code, you’ll want to think about whether you can come up with an algorithm for solving a programming task. Suppose your task is to create a table of contents for Hamlet, along the lines of:
Forgetting about XSLT for the moment, how can you think about this task? Here are two very different perspectives:
Act 4, Scene 2, for example, you type those exact characters into your XSLT, and you do that manually for all of the acts and all of the scenes.
Here’s another way to think about the difference between those two models. The first says
(we’ll simplify the problem for the moment and talk just about the acts, as if there weren’t
any scenes) print a title for Act 1, then for Act 2, then for Act 3, then for Act 4, then
for Act 5, and then stop.
The second says find all the acts, no matter how many
there are, and process them in the order in which they occur. When you run out of acts,
you’re done.
The first approach requires you to do all the work. If you’re going to type up, manually, each of the headers, you might as well just write the HTML manually. Furthermore, the first approach doesn’t even need to look at the play; the human has done that already and created the output, instead of letting the XSLT create the output in response to what it finds in the play. A big problem with having the human write the output separately and explicitly for each act and scene is that when that’s done, all you have is the table of contents for that one play. If you want to get a table of contents for another play (which may have a different number of acts, with a different number of scenes in each act), you have to build that manually, as well. The second approach, on the other hand, will work for any play, no matter how many acts or scenes it has, and it (the program, not you) has to look at the play to determine how many acts and scenes there are.
The preceding two strategies can both be considered algorithms, or step-by-step
instructions for completing the task, although the first is so brute force
that some
programmers would consider it a non-algorithmic solution because it really says
don’t even look at the play; just print this stuff that I’ve typed up manually
. The
second algorithm is clearly superior for two reasons: 1) it lets the computer do more of the
work (the human doesn’t even have to know the act and scene structure of the play) and 2) it
works for any play with any number of acts or scenes.
Here’s a solution that requires the human to do all the work:
This creates the correct output, but it doesn’t extract any information at all from the
input XML document. The human has written the entire output HTML by hand, and the XSLT just
produces it all at once. The XSLT doesn’t figure out how many acts or scenes there are (the
human had to do that manually), and it also doesn’t know what the
<head>
elements say because it never looks at them.
The human has counted the acts and scenes, hard-coded where each one should be cited, and
has manually duplicated the information that could have been extracted automatically from
the <head>
elements in the input XML document.
Here’s a compromise approach:
This solution also requires the human to figure out exactly how many acts and scenes there
are, and it manually places each one where has to be. If the programmer mistypes a number in
a predicate or miscounts the number of scenes or types something out of order, it gets the
wrong answer. It also can’t create a table of contents for any play that has a different
number of acts or scenes. Unlike the preceding example, which doesn’t even look into the
input XML document, this one does require access to the input XML version of the play
because it is extracting the value of the <head>
elements from there. Nonetheless, this version requires the human, rather than the computer,
to determine how many acts and scenes there are and where they should be placed in the
output HTML.
Here is one solution based on the second approach, that is, a solution that 1) lets the computer figure out how many acts and scenes there are and 2) will work with any play with any number of acts and scenes:
The first template rule, for the document node (the entire document) says create an HTML
document for the output, create an unordered list
(
The second template processes each
act by creating a list item (<ul>
) inside the body of that HTML document, and
then do something for every act in the play no matter how many there are
(//body/div
).<li>
) for it and,
inside that item, outputting the value of the <head>
for that act (Act 1
, Act 2
, etc.), creating a new
<ul>
to list the scenes in that act, and then
processing each scene. The third template creates a list item for each scene.
Note that this program doesn’t have to know how many acts and scenes there are, and it
doesn’t have to know exactly what each <head>
says.
It follows the second algorithm, above, and lets the computer count the acts and scenes,
copy the output text from the input XML document, and know when to stop because it has run
out of acts and scenes.
XSLT is a transformation language, and it is designed to take an XML input document and transform it into something else. An XSLT stylesheet that doesn’t extract any information at all from the input XML isn’t really performing a transformation. And an XSLT stylesheet that requires the human to do things that the computer can do much more easily and reliably, such as count the acts and scenes and get them in the right order, isn’t taking advantage of the algorithmic facilities that computational tools provide.
When you approach an XSLT transformation problem, before you write a line of code, think
about how much of the answer is already latent in your input XML. In this case, the number
of act and scenes, the order in which they appear, and the value of their
<head>
elements were all already present in the
input XML. All that information should be extracted from the input XML by the XSLT
stylesheet, and not recreated manually by the human. What the human has to do is supply
information not present in the input XML (such as, for example, the fact that the output
should be in the form of HTML unordered lists) and specify when specific bits of information
in the input XML should be fetched and where they should be written in the output. As a rule
of thumb, if the computer can do it for you, don’t do it yourself.
If you think back to the first XML encoding assignments you did in this course, a lot of you created numbered output, along the lines of the following hypothetical recipe for preparing a cup of coffee:
We told you then that it was usually a Bad Idea to number items because you could mistype and make a mistake, and because the computer can count for you. Now that you know XPath, you know that you could have written:
and you could then use the XPath position()
function
to tell you the number (ordinal position) of each step in the series. For example,
transforming that XML with the following XLST:
will produce
The steps were not numbered explicitly in the input XML. Instead, the numbers were added by
the XSLT stylesheet during the transformation. The big payoff for letting the computer do
the numbering is that if you later decide to add a step (e.g., Grind coffee beans
),
you don’t have to change the step numbers manually.
Development strategies that can deal gracefully with different types of input are considered robust, which means that they don’t break easily (the opposite of robust is fragile or brittle). Manual numbering, where the computer could have handled it for you, is fragile because you can mistype, and also because if you modify your content, you may have to modify all of the numbering, including the numbers you’ve already entered for other items. There are times when you might need to enter numbers manually, though. For example, if you’re transcribing a document that was originally misnumbered and you want to record that original numbering, with the error, you’ll have to enter that yourself. In most cases, though, for more robust results, you’ll want to rely on XPath and XSLT to take care of this type of task for you.