Maintained by: David J. Birnbaum (djbpitt@gmail.com) Last modified: 2021-12-27T22:03:53+0000
Since version 2.0 of the eXist XML database it has been possible to build self-contained, modular applications entirely within eXist, which facilitates their deployment into any database instance using a standardized packaging format. For projects that may prefer to minimize the dependency on a particular framework, though, it is also possible to build a site with plain ol’ HTML and call on eXist only to retrieve information that needs to be queried within the database. This approach makes it easier to switch among XML databases, should there be a reason to do so, since the developer can reuse all existing code except for the actual database call and any processing performed within the database itself.
This tutorial illustrates the use of HTML with PHP to develop a site on a traditional Apache HTTP server, making calls to eXist to extract and return information from the database as needed. In this model, eXist is used as an XML database, rather than as an integrated application platform. The example is simplified, and does not necessarily represent the sort of design we would employ in production, but the PHP-mediated communication between the Apache HTTP server and eXist illustrates general principles that can be employed elsewhere as needed.
The GitHub repo for this tutorial is https://github.com/ebeshero/phpStuff and the files with which it interacts are part of the Digital Mitford: The Mary Russell Mitford archive. The repo contains the HTML/PHP files and the XSLT, but not the XML data files or the XQuery. All PHP and XQuery files are reproduced below. The workflow we illustrate is as follows:
This workflow is implemented with three PHP pages, as follows:
We recommend developing XML in an XML-aware editor, such as <oXygen/>. <oXygen/> is a commercial product, but non-commercial academic licenses are available at a reduced cost. It is possible to develop XML in other editors, but our experience has been that <oXygen/> provides a better XML development environment than the alternatives.
We recommend developing PHP in a PHP-aware editor, and we usually use JetBrains PHP Storm. PHP Storm is a commercial product, but non-commercial academic licenses are available at no cost. It is possible to develop PHP in other editors, including <oXygen/>, but the development environment is better in an editor that emphasizes PHP support.
We developed our XQuery inside eXide, the IDE built into eXist, and it is also possible to use <oXygen/> to develop code inside eXist. The advantage of developing directly within eXist is that it removes the need to upload XQuery into eXist before running it there. The disadvantage is that because the code is stored within eXist, it is not part of our project GitHub repo.
Our general approach to getting our PHP script to request and receive data from eXist is to develop and save the query inside eXist ahead of time and address it with a REST call. We specify the base URL for our eXist installation in a separate file called config.php, which we import into the PHP files that perform the actual queries. The import statement looks like:
<?php define("REST_PATH", "http://dxcvm10.psc.edu:8080/exist/rest"); ?>
We then prepend this path to our actual query (see below). The point of defining this variable separately is that should we move our eXist server to a new location, we would have to update only config.php, and all of our queries would be performed at the new location. A skeletal query performed from within PHP looks like:
<?php require_once("config.php"); $contents = REST_PATH . "/db/queries/letterDates.xql"; $result = file_get_contents($contents); echo $result; ?>
The first line imports the REST_PATH variable and the second line prepends that to the rest of the URL to run the letterDates.xql query that we have stored inside eXist. The third line receives the results of the query and the fourth line inserts those results into the HTML page created by the code wrapped around the query. We return all results from eXist with HTML markup but in no namespace because when we insert the result of the query into the HTML being created by our PHP, it will inherit the HTML namespace that we define there.
The preceding example runs a stored query without any input parameters. Where we do need to pass parameters into the query (see pages 2 and 3 below), we usually use an HTTP GET request, which is simpler than POST to develop (it requires less code) and debug (because we can bypass any input form and just edit the query string directly in the browser address bar). The examples below use GET requests; at the bottom of this page we illustrate how to use POST instead.
The PHP file that displays a list of years looks as follows:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>View the years</title> </head> <body> <h1>Years</h1> <hr/> <p>Select a year to view of list of letters from that year</p> <ul> <?php require_once("config.php"); $contents = REST_PATH . "/db/queries/letterDates.xql"; $result = file_get_contents($contents); echo $result; ?> </ul> </body> </html>
The PHP code in the middle runs the query and inserts the results into the HTML. As
mentioned above, the namespace defined on the root <html>
element is
inherited by the inserted content. The letterDates.xql query looks as follows:
xquery version "3.0"; declare default element namespace "http://www.tei-c.org/ns/1.0"; let $mitfordColl:=collection('/db/mitford/letters')/* let $letterYears := $mitfordColl//sourceDesc/msDesc//head/date/substring-before(@when, '-') let $dletterYrs := distinct-values($letterYears) for $dly in $dletterYrs where string-length(normalize-space($dly)) gt 0 order by $dly return <li xmlns=""><a href="getLetterList.php?year={$dly}">{$dly}</a></li>
In the preceding XQuery and the following ones, we’ve declared the TEI namespace as the
default, which saves us from having to prepend a namespace prefix to every TEI element
in our XPath expressions. The price for that convenience is that in XQuery (unlike XSLT)
the default namespace applies equivalently to the input and the output, which means that
we now have to specify that the <li>
elements in our output are in no
namepsace, because if we don’t, the default will cause them to come out (incorrectly) in
the TEI namespace. Some developers prefer to do the reverse, that is, to omit the
default namespace declaration (so that no namespace is the default), bind the TEI
namespace to a prefix, and use that prefix before all element names in the XPath that
refers to the input XML.
The XQuery returns a sequence of <li>
elements, each of which
contains a year, and the <li>
elements are inserted between the
<ul>
tags that are present in the PHP. What gets returned looks
like:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>View the years</title> </head> <body> <h1>Years</h1> <hr/> <p>Select a year to view of list of letters from that year</p> <ul> <li> <a href="getLetterList.php?year=1818">1818</a> </li> <li> <a href="getLetterList.php?year=1819">1819</a> </li> <!-- more years --> </ul> </body> </html>
Each year is wrapped in an <a>
element with an @href
attribute that points toward getLetterList.php and passes the year as a parameter called
year
.
When the user clicks on one of the links above, the system runs getLetterList.php,
passing in a year as a parameter called year
. The PHP file looks as follows:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Choose a document</title> </head> <body> <h1>Choose a document to read from the list below</h1> <hr/> <ul> <?php require_once("config.php"); $year = htmlspecialchars($_GET["year"]); $contents = REST_PATH . "/db/queries/letterList.xql?year=$year"; $result = file_get_contents($contents); echo $result; ?> </ul> </body> </html>
The most significant difference between this PHP script and the first is that this one
retrieves the GET parameter year
and assigns it to a PHP variable called
$year
, and when it calls on the letterList.xql query inside eXist, it
passes that value in as a GET parameter also called year
. We use the htmlspecialchars()
function to sanitize the input by removing
possibly malicious characters that should not occur naturally in a year. In a real
production environment, before issuing the REST call to eXist we would trap erroneous
entry (input that is not a year, input that is a year but that is not represented in the
database) and return an informative error message.
The letterList.xql query looks as follows:
xquery version "3.0"; declare default element namespace "http://www.tei-c.org/ns/1.0"; declare variable $year := request:get-parameter('year', '1819'); let $mitfordLetters:=collection('/db/mitford/letters')/* let $letterTitles := $mitfordLetters//sourceDesc/msDesc//head[date[normalize-space(substring-before(@when, '-'))[string-length(.) gt 0] = $year]] for $LT in $letterTitles let $ld := $LT/ancestor::teiHeader//sourceDesc/msDesc//head/date/normalize-space(@when) order by $ld return <li xmlns=""><a href="getLetterText.php?uri={tokenize($LT/base-uri(), '/')[last()]}">{$LT/replace(.,'\W+$','')}</a></li>
We declare a variable called $year
and populate it by using
request:get-parameter()
to retrieve the GET parameter called
year
that was passed in as part of the REST call issued by the PHP script. We
specify 1819
as a default year in case no year parameter was entered; in a real
production environment we would instead trap that sort of error and return an
informative error message. Our $letterTitles
variable selects the titles of
all letters with a year that matches the value of the $year
variable, we
use $ld
(the full date of each of those letters) to sort them
chronologically, and we output the titles as HTML links. The output HTML looks as
follows:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Choose a document</title> </head> <body> <h1>Choose a document to read from the list below</h1> <hr/> <ul> <li> <a href="getLetterText.php?uri=1819-01-09-Elford.xml">Letter from Mary Russell Mitford to Sir William Elford, 1819 January 9</a> </li> <li> <a href="getLetterText.php?uri=1819-01-12_Elford.xml">Letter from Mary Russell Mitford to Sir William Elford, 1819 January 12</a> </li> <!-- more letters --> </ul> </body> </html>
Each letter title is a link to our third PHP script, getLetterText.php, and passes a GET
parameter called uri
, which is the unique name of the XML file in the eXist
database.
The third PHP script looks as follows:
<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>Read a document</title> </head> <body> <h1>Read a document</h1> <hr/> <?php require_once("config.php"); $uri = htmlspecialchars($_GET["uri"]); $contents = REST_PATH . "/db/queries/letterText.xql?uri=$uri"; $result = file_get_contents($contents); echo $result; ?> </body> </html>
It is identical in functionality to the second script, getLetterList.php, except that
instead of passing a GET parameter called year
with the year as a value, it
passes a GET parameter called uri
with a filename as a parameter. As with
getLetterList.php, in a real production environment we would perform error trapping,
such as checking the value of the uri
parameter against a whitelist of all
filenames in the database, so that we could return an informative error message in case
of an erroneous value.
The getLetterList.php PHP script runs the letterText.xql query, which looks as follows:
xquery version "3.0"; declare default element namespace "http://www.tei-c.org/ns/1.0"; declare variable $uri := request:get-parameter("uri", "1819-05-30_Elford.xml"); declare variable $letterFileName := concat('/db/mitford/letters/',$uri); let $letter := doc($letterFileName)/* return transform:transform($letter, doc('/db/xslt/MitfordLetterTransformPHP.xsl'), ())
This query retrieves the uri
GET parameter, using an existing filename as a
default if no parameter value is supplied. In a real production environment we would
trap erroneous or missing input and return an informative error message, instead of
serving an arbitrary file. The script then finds the letter in question and performs an
XSLT transformation (using the transform:transform()
function) to convert
it to an HTML fragment. The transformation does not return an entire HTML document
because the PHP script is already providing the HTML superstructure; eXist returns only
the HTML that needs to be inserted between the <body>
tags created by
the PHP. As above, we return that in no namespace because it will inherit the HTML
namespace from the PHP wrapper before the results are returned to the user.
We don’t provide the XSLT here because it is long and of minimal illustrative value, since it is customized to the particular markup conventions of the Digital Mitford project, and it is available in our GitHub repo. The important details are that we return results in no namespace and we use:
<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
to set the return method to xml
and suppress the XML declaration.
We generally prefer GET to POST in our own projects, but because GET strings can cause a URL to exceed the allowed limit (which varies by server and browser), some applications my need to use POST as a way of passing more information into eXist than would be possible with GET. Here is a sample POST query (from a different project, which runs a bibliographic query), based on the explanation at http://stackoverflow.com/questions/5647461/how-do-i-send-a-post-request-with-php
<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head><title>Test</title></head> <h1>PHP test</h1> <form action="test1.php" method="post"> <label for="target">Keyword:</label> <input name="target" type="text"/> </form> </html>
When the user types in a keyword and hits the enter
key, the system runs
test1.php, passing a parameter called target
with the value supplied by the user.
The test1.php script looks as follows:
<html> <head><title>Hi, Mom!</title></head> <body> <h1>Stuff</h1> <?php $target = htmlspecialchars($_POST['target']); $url = 'http://localhost:8080/rest/db/repertorium/xquery/searchBibl.xql'; $data = array('type' => 'input', 'target' => $target); // use key 'http' even if you send the request to https://... $options = array( 'http' => array( 'header' => "Content-type: application/x-www-form-urlencoded\r\n", 'method' => 'POST', 'content' => http_build_query($data) ) ); $context = stream_context_create($options); $result = file_get_contents($url, false, $context); if ($result === FALSE) { /* Handle error */ } //var_dump($result); echo $result; ?> </body> </html>
In a real production environment we would use config.php to set a REST_PATH variable, as
above, and we would trap errors and return informative error messages. We’ve commented
out the var_dump()
function, which we used for development and debugging.
More information about HTTP request parameters is available at the StackOverflow posting from which we copied this code.