Maintained by: David J. Birnbaum (djbpitt@gmail.com)
Last modified:
2025-11-02T16:57:17+0000
XQuery is a programming language designed primarily to work with XML databases, that is, databases that engage with XML content in an XML-idiomatic way, such as by using XPath path expressions and functions. We introduce XML databases in an appendix below, but because XQuery can also be used with XML that has not been stored into an XML database, most of this tutorial is about XQuery in general, that is, without any dependency on a database.
In Real Life we often use XQuery for exploratory data analysis, where we run queries over our XML to learn how it is structured. You’ve already practiced performing exploratory data analysis with XPath, but, as we’ll see below, XQuery is more powerful than XPath, which means, among other things, that it may be easier to ask some questions about XML in XQuery than in XPath.
XQuery, like XSLT, is built on top of XPath, but unlike XSLT, which is written in XML syntax, XQuery syntax is similar to
XPath syntax and every XPath expression is also a valid XQuery expression. For
example, the expression 1 + 1 is a complete
and valid XPath expression and also a complete and valid XQuery expression that
evaluates to the value 2. Similarly, if
http://dh.obdurodon.org/wilde-testimony.xml points to an XML
document, you can use the standard XPath
doc() function to select it, so the
standard XPath expression:
doc("http://dh.obdurodon.org/wilde-testimony.xml")//speech => count()
will select all <speech>
descendants of the document and return a count of them. Because this is a
complete and valid XPath expression, it is also a complete and valid XQuery
expression.
XQuery supports FLWOR (pronounced "flower") expressions, where the letters stand for:
for
let
where
order by
return
A FLWOR expression must start with either a
for or
let statement, must end with a
single return statement, and may
include any of the other statements (including more instances of
for and
let, but not of
return) as needed.
Pure XPath supports some of these statement types, but not all of them, and not as many ways of combining them as XQuery does. There are a few additional statement types that are also allowed within FLWOR expressions, but in this tutorial we concentrate on the ones above.
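As a minimal sketch of how these statements can be combined, the following FLWOR expression uses two for statements and a let before its single return. The values are invented for illustration and have nothing to do with the Wilde document used below:

```xquery
(: two for statements produce every pairing of $x and $y :)
for $x in (1, 2)
for $y in ("a", "b")
(: concat() converts the integer to a string before joining :)
let $pair as xs:string := concat($x, $y)
return $pair
```

This returns the sequence ("1a", "1b", "2a", "2b"): the inner for runs to completion for each item of the outer one.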
The following is a FLWOR expression that returns a sequence of all speeches not by Oscar Wilde himself in the document, sorted by speaker name:
let $testimony as document-node() := doc("http://dh.obdurodon.org/wilde-testimony.xml")
for $speech as element(speech) in $testimony//speech
where $speech/speaker ne "Wilde"
order by $speech/speaker
return $speech
The FLWOR expression correctly starts with a
for or
let statement (in this case a
let), ends with a single
return, and includes
for,
where, and
order by statements in between. Here’s what
each line means:
let $testimony as document-node() := doc("http://dh.obdurodon.org/wilde-testimony.xml")
This expression binds the result of evaluating the expression on
the right to the variable name
$testimony. That expression uses
the standard XPath doc() function
to select a document, and the function is defined as returning the
document node (which is the parent of the root element).
The as document-node() phrase
specifies that the value retrieved by the expression on the right must
be a document node. Datatyping is not required by the XQuery spec, but
it is considered good practice because if the expression somehow returns
something other than a document node, you want to raise an error
immediately so that you can find and fix the problem. We strongly
recommend using as phrases whenever
you define a variable in XQuery.
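To see what an as phrase buys us, here is a small sketch that uses made-up values rather than the Wilde document. The count() function returns an integer, so the declared type matches and the expression evaluates normally:

```xquery
(: the declared type matches the value, so this evaluates to 4 :)
let $count as xs:integer := count((1, 2, 3))
return $count + 1
```

If we changed as xs:integer to as xs:string, the processor should report a type error instead of continuing silently with a value of the wrong type, which is exactly the early warning we want.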
Note the following differences between XQuery and XSLT:
In XQuery we always precede a variable name with a
dollar sign—both when we define the variable and when we use it.
In XSLT we precede a variable name with a dollar sign only when
we use the variable, but not when we define it. For example, in
XSLT we can bind the value
10 to the variable
$x with
<xsl:variable name="x" select="10"/>
(no dollar sign inside the value of the
@name attribute) and then
refer to it with, e.g.,
<xsl:value-of select="$x"/>
(using the dollar sign when we refer to the variable). In the
XQuery above, however, we use the dollar sign both when we
define $testimony (line 1)
and when we use it (line 2).
The operator that binds a value to a variable in XQuery is
:=. This is sometimes
informally (that is, it isn’t an official term and you won’t
find it in the XQuery specification) called the walrus
operator because it looks like the eyes and tusks of a
walrus lying on its side—at least if you have a lively
imagination!
for $speech as element(speech) in $testimony//speech
The part of this expression that follows the
in keyword is a standard XPath path
expression that selects all
<speech> descendants of
the document node (which is what the variable
$testimony represents because
that’s how we defined it in the first statement). The expression as a
whole binds each of those elements, one at a time, to the variable
$speech. The
as element(speech) phrase says that
the values processed in this for
statement must be elements of type
<speech>. As with the
as clause in line 1, this part of
line 2 is optional, but we strongly recommend including it. The code
that follows a for expression (in
this case, the last three lines of the XQuery script) will fire once for
each item, which, in this case, means once for each
<speech> element.
There are technical reasons that a
for expression is not
officially a loop (ask us about this if you’re curious). What
it has in common with a loop, though, is that it does something once
for each item.
It is common to refer to the XPath expression after the keyword
in as the sequence variable (the XQuery specification calls its value the binding sequence)
because it identifies the sequence of items to which the following
statements will be applied. The variable after the
for keyword is commonly called the
range variable because it ranges over the items in
the sequence that it processes.
The sequence of items to be processed doesn’t have to be a variable. The FLWOR expression:
for $x in (1, 2, 3)
return $x * 2
returns the sequence (2, 4, 6).
Here the sequence to be processed is not a variable; it is a
sequence of literal integer values.
where $speech/speaker ne "Wilde"
This expression filters the items
(<speech> elements)
selected by the for expression and
keeps only those that have a child
<speaker> element that is
not equal (using the standard XPath
ne value-comparison operator to
mean "not equal to") to the string value
Wilde.
We could, alternatively, have omitted the
where expression and used an
XPath predicate to do the filtering:
for $speech in $testimony//speech[speaker ne "Wilde"]
These expressions are synonymous and will return the same results, although one or the other may be more efficient when used with an XML database (see below).
order by $speech/speaker
This expression sorts the speeches (the ones that survived the filtering)
by the value of the
<speaker> child of the
<speech> element. The
speeches by each distinct speaker are typically returned in document order,
although XQuery guarantees that only if we write stable order by. We could
also subsort them (for example from shortest to longest) if we wanted to do that.
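The subsorting mentioned above can be sketched by adding a second sort key after a comma. Here we assume that the string length of the whole speech element is an acceptable measure of its length; that count includes the speaker name and any whitespace, so treat it as an approximation rather than a definitive measure:

```xquery
let $testimony as document-node() := doc("http://dh.obdurodon.org/wilde-testimony.xml")
for $speech as element(speech) in $testimony//speech
where $speech/speaker ne "Wilde"
(: primary key: speaker name; secondary key: length of the speech in characters :)
order by $speech/speaker, string-length($speech)
return $speech
```

The second key matters only when two speeches have the same speaker, so the overall grouping by speaker is unchanged.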
return $speech
This expression returns each speech that has undergone the processing above, that is, that has survived the filtering and been sorted by speaker name.
It is possible to obtain the same result with a pure XPath expression instead of an XQuery FLWOR expression:
doc("http://dh.obdurodon.org/wilde-testimony.xml")//speech[speaker ne "Wilde"] =>
sort((), function($x) {$x/speaker})
We’ve broken this over multiple lines for legibility, but it could have been
written on a single line. The path expression selects speeches (mimicking,
in this example, the XQuery for
expression), we use a predicate (instead of XQuery
where) to filter out Wilde’s speeches,
and we use the standard XPath sort()
function (instead of XQuery order by)
to sort the speeches by speaker. Although the meaning of the two versions is
the same and they return the same results, we find the XQuery version easier
to read and understand.
To practice using XQuery inside <oXygen/> do the following:
Open <oXygen/>.
In the menu bar, select Window → Show View → XPath/XQuery Builder. This opens a panel (usually on the right side of the screen).
At the upper left corner inside that panel click on the dropdown to the immediate left of the red rightward-pointing triangle (see the image below). From the dropdown list select Saxon-HE XQuery 12.5 (your version number might be different).
In the image of the XPath/XQuery builder below, the dropdown list where you select the XQuery version is circled in red and the red rightward-pointing triangle that you click to evaluate the XQuery is circled in blue:
![[Image of XPath/XQuery builder interface]](images/xquery-builder.png)
You can use the XPath/XQuery builder to apply XQuery to a document that is open in
<oXygen/> or to a document that you select using the standard XPath
doc() function (or to a sequence of documents
that you select with the standard XPath
collection() function, about which see below).
Let’s illustrate those two variants:
You can practice applying XQuery to a remote document (addressed by using the
standard XPath doc() function) by copying
and pasting the following XQuery expression into the XPath/XQuery builder panel
and evaluating it by clicking on the red rightward-pointing triangle:
let $testimony as document-node() := doc("http://dh.obdurodon.org/wilde-testimony.xml")
for $speech as element(speech) in $testimony//speech
where $speech/speaker ne "Wilde"
order by $speech/speaker
return $speech
<oXygen/> should open a panel at the bottom where it displays the 74 results.
You can scroll to verify that the results are a sequence of
<speech> elements, that none of
the speeches are by Oscar Wilde, and that the speeches are sorted according to
alphabetic order of the speaker name.
You can practice applying XQuery to a document that is open in <oXygen/>
(instead of one that you access with the XPath
doc() function) as follows:
To open the Wilde testimony document in <oXygen/>, start by typing
Command-u (Mac) or Control-u (Windows), which
opens an "Open URL"
dialog. Paste the URL for the document
(http://dh.obdurodon.org/wilde-testimony.xml)
into the box and click OK. The document should open in
<oXygen/>. If you have other tabs open in <oXygen/>, select
the one that contains the Wilde testimony document, which makes it the
active document for the XPath/XQuery builder.
Click inside the XPath/XQuery builder panel, remove any content that is already there, and type the following:
for $speech as element(speech) in //speech
where $speech/speaker ne "Wilde"
order by $speech/speaker
return $speech
Use the red triangle to evaluate the XQuery against the open document.
The results should be the same as above. We changed the XQuery to remove the
$testimony variable (and its binding to the
result of evaluating the doc() function)
because <oXygen/> will use the current active document as the context for
XQuery processing unless we specify something else. That means that the first
line of the revised XQuery means "select all of the
<speech> descendants of the
document node of the active document and proceed from there".
The following topics are important where you need them, but you can do a lot with XQuery using just the information above. When you begin learning XQuery we recommend:
Reading the sections above carefully and practicing them
Skimming over the sections below to acquaint yourself with their contents, and then returning and reading them carefully when you have a need for them in your own work
If your XQuery selects or creates nodes that are in a namespace you must make the namespace information available to your XQuery script. There are two ways to do this:
You can set a default element namespace, which will then apply to all elements except where you say otherwise explicitly.
You can bind a prefix to a namespace and then use the prefix when you refer to the node.
We didn’t have to think about namespaces with the Wilde testimony example above because that XML document does not use any namespaces, but the example below is in the TEI namespace, and therefore requires that we declare and use that namespace in our XQuery.
A default element namespace in XQuery applies only to elements, and not to attributes. There is no way to declare a default attribute namespace, which means that if you need to refer to attributes that are in a namespace, you must use a namespace prefix.
You can declare a default namespace that will apply to all elements (but not attributes; see the note above) mentioned in your XQuery (both those you select from the XML you’re processing and those you might create for your output) by including a declaration like the following at the beginning of your XQuery:
declare default element namespace "http://www.tei-c.org/ns/1.0";
The part in quotation marks needs to match the namespace used in your document. Since the bad-hamlet.xml document that we process below is in the TEI namespace, we’ve made the TEI namespace the default.
XQuery statements that begin with the keyword
declare go at the beginning of the
XQuery document and each declare
statement must end with a semicolon.
You can find all 357 speeches by Hamlet in the play with the following XQuery:
declare default element namespace "http://www.tei-c.org/ns/1.0";
let $play as document-node() := doc("http://dh.obdurodon.org/bad-hamlet.xml")
for $speech as element(sp) in $play//sp
where $speech/@who eq "Hamlet"
return $speech
If you omit the XML namespace declaration you will get no results because
you will be asking for <sp>
elements in no namespace instead of
<sp> elements in the TEI
namespace. Asking for something that does not exist is not an error; your
XQuery will find all zero instances of
<sp> elements in no namespace,
keep only those spoken by Hamlet (there are none, of course), and return all
zero of them. This is, as they say, probably not what you want. The takeaway
is that namespace mistakes can be difficult to find and debug because they
may not raise errors, so if your XQuery is expected to return results and it
doesn’t, a common reason is that you’ve failed to specify a namespace
correctly.
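When a query unexpectedly returns nothing, one way to check for a namespace problem is to ask the document which namespaces its elements use. The sketch below is a diagnostic aid of our own, not part of the queries above; it relies only on the standard namespace-uri() function and the *: wildcard prefix:

```xquery
let $play as document-node() := doc("http://dh.obdurodon.org/bad-hamlet.xml")
return (
    (: every distinct namespace URI used by an element in the document :)
    distinct-values($play//* ! namespace-uri(.)),
    (: *:sp matches sp elements in any namespace, or in none :)
    count($play//*:sp)
)
```

If the first result lists a namespace URI that your query does not declare, or the count is greater than zero while your own path expression finds nothing, a missing or mistyped namespace declaration is the likely culprit.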
You can bind a namespace URL to a prefix with a statement like:
declare namespace tei="http://www.tei-c.org/ns/1.0";
You can then refer to elements in the TEI namespace by using the prefix, as in line 3 of the following example (in two places in that line):
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $play as document-node() := doc("http://dh.obdurodon.org/bad-hamlet.xml")
for $speech as element(tei:sp) in $play//tei:sp
where $speech/@who eq "Hamlet"
return $speech
If you omit the tei: prefix in line 3
you’ll get no results because you’ll be asking for speeches in no
namespace, and all speeches in this document are in the TEI namespace.
What happens when the input XML is in one namespace (such as TEI) and the output you are creating is in a different namespace (such as HTML or SVG)? The XQuery default element namespace declaration applies to all elements, both those read from input XML and those created in the output as literal result elements, and there is no way to declare different default namespaces for input and output. You could make one of those namespaces the default and use a prefix for the other, but, for what it’s worth, we sometimes use namespace prefixes for both input and output, so that we don’t have to remember which one we’ve made the default. We’ll illustrate below how to manage namespaces in XQuery when we read XML in one namespace and create XML in a different namespace.
Unlike XQuery, XSLT is able to declare default namespaces for input and
output separately. An
xpath-default-namespace="http://www.tei-c.org/ns/1.0"
attribute setting on the root
<xsl:stylesheet> element
specifies that the TEI namespace is the default namespace for elements
read from the input document. An
xmlns="http://www.w3.org/1999/xhtml"
attribute setting on the same root element specifies that elements
created in the output will be in the HTML namespace unless you say
otherwise explicitly.
Below we illustrate a transformation from TEI XML to HTML in two steps. First we create plain-text output; we then modify the query to create HTML output. The goal of this exercise is to illustrate how to read input in one namespace (TEI) and create output in a different namespace (HTML), and we create the plain-text output first just to separate the general query logic from the code used to manage the namespaced HTML output.
The following XQuery creates a deduplicated list of speakers in each act of Hamlet:
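A query along the following lines produces that list. It is a sketch assembled from the step-by-step description below and from the HTML version later in this section; the exact strings passed to concat() are our assumption, chosen to match the output shown below:

```xquery
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $play as document-node() := doc("http://dh.obdurodon.org/bad-hamlet.xml")
(: $pos holds the position of each act within the sequence of acts :)
for $act as element(tei:div) at $pos in $play//tei:body/tei:div
let $act-speakers as element(tei:speaker)+ := $act/descendant::tei:speaker
let $distinct-act-speakers as xs:string+ := distinct-values($act-speakers) ! string()
let $sorted-act-speakers as xs:string+ := sort($distinct-act-speakers)
let $sorted-act-speakers-string as xs:string := string-join($sorted-act-speakers, ", ")
(: assumed output wording; "&#x0a;" is a newline character :)
return concat("Act ", $pos, ": ", $sorted-act-speakers-string, "&#x0a;")
```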
The output looks like:
<?xml version="1.0" encoding="UTF-8"?>Act 1: All, Bernardo, Cornelius and Voltimand, Francisco, Gertrude, Ghost, Hamlet, Horatio, King, Laertes, Marcellus, Marcellus and Bernardo, Marcellus and Horatio, Ophelia, Polonius
Act 2: First Player, Gertrude, Guildenstern, Hamlet, King, Ophelia, Polonius, Reynaldo, Rosencrantz, Rosencrantz and Guildenstern, Voltimand
Act 3: All, First Player, Gertrude, Ghost, Guildenstern, Hamlet, Horatio, King, Lucianus, Ophelia, Player King, Player Queen, Polonius, Prologue, Rosencrantz, Rosencrantz and Guildenstern
Act 4: Captain, Danes, Fortinbras, Gentleman, Gertrude, Guildenstern, Hamlet, Horatio, King, Laertes, Messenger, Ophelia, Rosencrantz, Rosencrantz and Guildenstern, Sailor, Servant
Act 5: All, First Ambassador, First Clown, Fortinbras, Gertrude, Hamlet, Horatio, King, Laertes, Lord, Osric, Priest, Second Clown
The XML declaration appears at the beginning of the output because we didn’t specify that we were creating plain-text output, and insofar as the default output type is XML, the transformation prepends the XML declaration to the actual output. In Real Life we would tell XQuery about the output type, which would cause the XML declaration to be omitted automatically, but since plain-text output is just an interim step toward our ultimate goal of producing HTML 5 output that uses XML syntax, we’ll leave it in place for now.
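One standard way to tell XQuery about the output type is a serialization declaration in the prolog. The sketch below assumes a processor that supports XQuery 3.0 or later; the output prefix is our own choice and has to be bound to the serialization namespace, and the sample values are invented:

```xquery
(: bind a prefix to the standard serialization namespace :)
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
(: request plain-text output, which has no XML declaration :)
declare option output:method "text";

for $n in (1, 2, 3)
return concat("Item ", $n, "&#x0a;")
```

Whether a particular tool honors the declaration can depend on how it runs the query, so it is worth checking the result in your own environment.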
Here’s how each line of the XQuery works:
Because the document is in the TEI namespace, we declare that
namespace and bind it to the prefix
tei:. We have to prepend this
prefix to every element in the TEI input document that we mention.
(We could, alternatively, have used a default element namespace and
omitted the prefix.)
We use the standard XPath doc()
function to bind the document node of the play document to the
variable $play.
We use a for statement to
process each act in the play, binding the
<div> elements that
represent each individual act, in turn, to the variable
$act. The
at $pos clause sets the value
of the variable $pos to the
position of each act, in turn, within the sequence of acts being
processed. You can call the position variable whatever you want; we
use $pos as a mnemonic for
"position".
We use the <speaker>
descendants of each act to represent the speakers. We could,
alternatively, have used the
@who attribute, but the
<speaker> element is
more user-friendly.
We use the standard XPath
distinct-values() function to
deduplicate the sequence of speakers for each act.
Although human readers know that the associated values are
strings, from an XQuery perspective they have the datatype
xs:untypedAtomic, which is
the default datatype for element and attribute values read from
an XML document unless we say otherwise. If we include the
as xs:string+ phrase and
omit the ! string() phrase
we’ll be notified of a datatype error because we’ve specified
that the value must be one or more strings and XQuery thinks
it’s one or more untyped atomic values. If we omit the
as xs:string+ phrase we’ll
accept any datatype, and in that case we don’t need to use the
! string() phrase to
convert the untyped atomic values to string values. We
nonetheless recommend always specifying datatypes so that we’ll
be notified if we’ve made a mistake that causes our code to
select items of an unexpected type.
We use the standard XPath sort()
function to sort the distinct speaker names alphabetically.
We use the standard XPath
string-join() function to form
the deduplicated sequence into a string of comma-separated
values.
We use the standard XPath
concat() function to construct
an output line for each act that combines the act number and the
list of speaker names into a human-friendly statement. The string
at the end (a character reference such as "&#x0a;")
is a newline character; its presence causes the information for each
act to be printed on a separate line.
In Real Life we might have combined some of the XPath functions within a single line, but we’ve separated them here for didactic reasons, so that we could focus on each one individually.
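The datatype note above, about xs:untypedAtomic and the ! string() phrase, can be checked directly. The following sketch is our own test, not part of the query being explained; it asks whether the deduplicated values are untyped atomic before conversion and strings after it:

```xquery
declare namespace tei="http://www.tei-c.org/ns/1.0";
let $play as document-node() := doc("http://dh.obdurodon.org/bad-hamlet.xml")
let $raw as xs:anyAtomicType* := distinct-values($play//tei:speaker)
let $strings as xs:string* := $raw ! string()
return (
    (: are the deduplicated values untyped atomic? :)
    every $v in $raw satisfies $v instance of xs:untypedAtomic,
    (: are the converted values strings? :)
    every $v in $strings satisfies $v instance of xs:string
)
```

With a processor that does not validate the input against a schema we would expect both tests to report true, which is why the as xs:string+ phrase needs the conversion.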
The preceding XQuery constructs plain-text output, but what if we want to construct HTML output? Let’s try constructing a two-column table, with the act number in the first column and the comma-separated list of speakers in the second column. The output should look like the following:
| Act | Speakers |
|---|---|
| 1 | All, Bernardo, Cornelius and Voltimand, Francisco, Gertrude, Ghost, Hamlet, Horatio, King, Laertes, Marcellus, Marcellus and Bernardo, Marcellus and Horatio, Ophelia, Polonius |
| 2 | First Player, Gertrude, Guildenstern, Hamlet, King, Ophelia, Polonius, Reynaldo, Rosencrantz, Rosencrantz and Guildenstern, Voltimand |
| 3 | All, First Player, Gertrude, Ghost, Guildenstern, Hamlet, Horatio, King, Lucianus, Ophelia, Player King, Player Queen, Polonius, Prologue, Rosencrantz, Rosencrantz and Guildenstern |
| 4 | Captain, Danes, Fortinbras, Gentleman, Gertrude, Guildenstern, Hamlet, Horatio, King, Laertes, Messenger, Ophelia, Rosencrantz, Rosencrantz and Guildenstern, Sailor, Servant |
| 5 | All, First Ambassador, First Clown, Fortinbras, Gertrude, Hamlet, Horatio, King, Laertes, Lord, Osric, Priest, Second Clown |
Below is one way to create this output (an explanation follows the code):
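(: TEI input is addressed with the tei: prefix; the HTML output uses a default
   namespace set on the html element, which is one possible arrangement :)
declare namespace tei="http://www.tei-c.org/ns/1.0";
<html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Hamlet</title></head>
    <body>
        <h1>Hamlet</h1>
        <table>
            <tr>
                <th>Act</th>
                <th>Speakers</th>
            </tr>
            {
            let $play as document-node() := doc("http://dh.obdurodon.org/bad-hamlet.xml")
            for $act as element(tei:div) at $pos in $play//tei:body/tei:div
            let $act-speakers as element(tei:speaker)+ := $act/descendant::tei:speaker
            let $distinct-act-speakers as xs:string+ := distinct-values($act-speakers) ! string()
            let $sorted-act-speakers as xs:string+ := sort($distinct-act-speakers)
            let $sorted-act-speakers-string as xs:string := string-join($sorted-act-speakers, ", ")
            return
                <tr>
                    <td>{$pos}</td>
                    <td>{$sorted-act-speakers-string}</td>
                </tr>
            }
        </table>
    </body>
</html>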