Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2016-05-24T01:20:38+0000


PHP and the eXist XML database

Communicating between PHP and eXist

Since version 2.0 of the eXist XML database it has been possible to build self-contained, modular applications entirely within eXist, which facilitates their deployment into any database instance using a standardized packaging format. For projects that may prefer to minimize the dependency on a particular framework, though, it is also possible to build a site with plain ol’ HTML and call on eXist only to retrieve information that needs to be queried within the database. This approach makes it easier to switch among XML databases, should there be a reason to do so, since the developer can reuse all existing code except for the actual database call and any processing performed within the database itself.

This tutorial illustrates the use of HTML with PHP to develop a site on a traditional Apache HTTP server, making calls to eXist to extract and return information from the database as needed. In this model, eXist is used as an XML database, rather than as an integrated application platform. The example is simplified, and does not necessarily represent the sort of design we would employ in production, but the PHP-mediated communication between the Apache HTTP server and eXist illustrates general principles that can be employed elsewhere as needed.

About this tutorial

The GitHub repo for this tutorial is https://github.com/ebeshero/phpStuff and the files with which it interacts are part of the Digital Mitford: The Mary Russell Mitford archive. The repo contains the HTML/PHP files and the XSLT, but not the XML data files or the XQuery. All PHP and XQuery files are reproduced below. The workflow we illustrate is as follows:

  1. The system displays a list of years, each one of which represents a year for which the corpus contains letters written by Mitford.
  2. The user selects a year from the list of years, with the ultimate intention of viewing the text of a letter from that year.
  3. The system displays a list of letters from the selected year, showing the date and recipient of each letter.
  4. The user selects a letter from the list of letters for that year.
  5. The system displays a formatted view of the selected letter.

This workflow is implemented with three PHP pages, as follows:

  1. Page 1, getYears.php, displays the list of years and lets the user choose a year in order to navigate to page 2 to select an individual letter.
  2. Page 2, getLetterList.php, displays a list of letters for the year selected on page 1 and lets the user choose a letter in order to navigate to page 3 and read the text of the letter.
  3. Page 3, getLetterText.php, displays a reading view of the text of the letter selected on page 2.

Development platforms

We recommend developing XML in an XML-aware editor, such as <oXygen/>. <oXygen/> is a commercial product, but non-commercial academic licenses are available at a reduced cost. It is possible to develop XML in other editors, but our experience has been that <oXygen/> provides a better XML development environment than the alternatives.

We recommend developing PHP in a PHP-aware editor, and we usually use JetBrains PHP Storm. PHP Storm is a commercial product, but non-commercial academic licenses are available at no cost. It is possible to develop PHP in other editors, including <oXygen/>, but the development environment is better in an editor that emphasizes PHP support.

We developed our XQuery inside eXide, the IDE built into eXist, and it is also possible to use <oXygen/> to develop code inside eXist. The advantage of developing directly within eXist is that it removes the need to upload XQuery into eXist before running it there. The disadvantage is that because the code is stored within eXist, it is not part of our project GitHub repo.

How PHP talks to eXist

Our general approach to getting our PHP script to request and receive data from eXist is to develop and save the query inside eXist ahead of time and address it with a REST call. We specify the base URL for our eXist installation in a separate file called config.php, which we import into the PHP files that perform the actual queries. The import statement looks like:

<?php
    define("REST_PATH", "http://dxcvm10.psc.edu:8080/exist/rest");
?>

We then prepend this path to our actual query (see below). The point of defining this variable separately is that should we move our eXist server to a new location, we would have to update only config.php, and all of our queries would be performed at the new location. A skeletal query performed from within PHP looks like:

<?php
    require_once("config.php");
    $contents = REST_PATH . "/db/queries/letterDates.xql";
    $result = file_get_contents($contents);
    echo $result;
?>

The first line imports the REST_PATH variable and the second line prepends that to the rest of the URL to run the letterDates.xql query that we have stored inside eXist. The third line receives the results of the query and the fourth line inserts those results into the HTML page created by the code wrapped around the query. We return all results from eXist with HTML markup but in no namespace because when we insert the result of the query into the HTML being created by our PHP, it will inherit the HTML namespace that we define there.

The preceding example runs a stored query without any input parameters. Where we do need to pass parameters into the query (see pages 2 and 3 below), we usually use an HTTP GET request, which is simpler than POST to develop (it requires less code) and debug (because we can bypass any input form and just edit the query string directly in the browser address bar). The examples below use GET requests; at the bottom of this page we illustrate how to use POST instead.

Page 1: Display a list of all years in the collection

The PHP file that displays a list of years looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>View the years</title>
</head>
<body>
<h1>Years</h1>
<hr/>
<p>Select a year to view of list of letters from that year</p>
<ul>
<?php
    require_once("config.php");
    $contents = REST_PATH . "/db/queries/letterDates.xql";
    $result = file_get_contents($contents);
    echo $result;
?>
</ul>
</body>
</html>

The PHP code in the middle runs the query and inserts the results into the HTML. As mentioned above, the namespace defined on the root <html> element is inherited by the inserted content. The letterDates.xql query looks as follows:

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
let $mitfordColl:=collection('/db/mitford/letters')/*
let $letterYears := $mitfordColl//sourceDesc/msDesc//head/date/substring-before(@when, '-')
let $dletterYrs := distinct-values($letterYears)
for $dly in $dletterYrs
where string-length(normalize-space($dly)) gt 0 
order by $dly
return <li xmlns=""><a href="getLetterList.php?year={$dly}">{$dly}</a></li>

In the preceding XQuery and the following ones, we’ve declared the TEI namespace as the default, which saves us from having to prepend a namespace prefix to every TEI element in our XPath expressions. The price for that convenience is that in XQuery (unlike XSLT) the default namespace applies equivalently to the input and the output, which means that we now have to specify that the <li> elements in our output are in no namepsace, because if we don’t, the default will cause them to come out (incorrectly) in the TEI namespace. Some developers prefer to do the reverse, that is, to omit the default namespace declaration (so that no namespace is the default), bind the TEI namespace to a prefix, and use that prefix before all element names in the XPath that refers to the input XML.

The XQuery returns a sequence of <li> elements, each of which contains a year, and the <li> elements are inserted between the <ul> tags that are present in the PHP. What gets returned looks like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>View the years</title>
</head>
<body>
<h1>Years</h1>
<hr/>
<p>Select a year to view of list of letters from that year</p>
<ul>
<li>
    <a href="getLetterList.php?year=1818">1818</a>
</li>
<li>
    <a href="getLetterList.php?year=1819">1819</a>
</li>
<!-- more years -->
</ul>
</body>
</html>

Each year is wrapped in an <a> element with an @href attribute that points toward getLetterList.php and passes the year as a parameter called year.

Page 2: Display a list of letters for the selected year

When the user clicks on one of the links above, the system runs getLetterList.php, passing in a year as a parameter called year. The PHP file looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Choose a document</title>
</head>
<body>
<h1>Choose a document to read from the list below</h1>
<hr/>
<ul>
<?php
    require_once("config.php");
    $year = htmlspecialchars($_GET["year"]);
    $contents = REST_PATH . "/db/queries/letterList.xql?year=$year";
    $result = file_get_contents($contents);
    echo $result;
?>
</ul>
</body>
</html>

The most significant difference between this PHP script and the first is that this one retrieves the GET parameter year and assigns it to a PHP variable called $year, and when it calls on the letterList.xql query inside eXist, it passes that value in as a GET parameter also called year. We use the htmlspecialchars() function to sanitize the input by removing possibly malicious characters that should not occur naturally in a year. In a real production environment, before issuing the REST call to eXist we would trap erroneous entry (input that is not a year, input that is a year but that is not represented in the database) and return an informative error message.

The letterList.xql query looks as follows:

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare variable $year := request:get-parameter('year', '1819');
let $mitfordLetters:=collection('/db/mitford/letters')/*
let $letterTitles := 
    $mitfordLetters//sourceDesc/msDesc//head[date[normalize-space(substring-before(@when, '-'))[string-length(.) gt 0] = $year]]
for $LT in $letterTitles
let $ld := $LT/ancestor::teiHeader//sourceDesc/msDesc//head/date/normalize-space(@when)
order by $ld
return <li xmlns=""><a href="getLetterText.php?uri={tokenize($LT/base-uri(), '/')[last()]}">{$LT/replace(.,'\W+$','')}</a></li>

We declare a variable called $year and populate it by using request:get-parameter() to retrieve the GET parameter called year that was passed in as part of the REST call issued by the PHP script. We specify 1819 as a default year in case no year parameter was entered; in a real production environment we would instead trap that sort of error and return an informative error message. Our $letterTitles variable selects the titles of all letters with a year that matches the value of the $year variable, we use $ld (the full date of each of those letters) to sort them chronologically, and we output the titles as HTML links. The output HTML looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Choose a document</title>
</head>
<body>
<h1>Choose a document to read from the list below</h1>
<hr/>
<ul>
<li>
    <a href="getLetterText.php?uri=1819-01-09-Elford.xml">Letter from Mary Russell Mitford to Sir William Elford, 1819 January 9</a>
</li>
<li>
    <a href="getLetterText.php?uri=1819-01-12_Elford.xml">Letter from Mary Russell Mitford to Sir William Elford, 1819 January 12</a>
</li>
<!-- more letters -->
</ul>
</body>
</html>

Each letter title is a link to our third PHP script, getLetterText.php, and passes a GET parameter called uri, which is the unique name of the XML file in the eXist database.

Page 3: Display a reading view of the selected letter

The third PHP script looks as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Read a document</title>
</head>
<body>
<h1>Read a document</h1>
<hr/>
<?php
    require_once("config.php");
    $uri = htmlspecialchars($_GET["uri"]);
    $contents = REST_PATH . "/db/queries/letterText.xql?uri=$uri";
    $result = file_get_contents($contents);
    echo $result;
?>
</body>
</html>

It is identical in functionality to the second script, getLetterList.php, except that instead of passing a GET parameter called year with the year as a value, it passes a GET parameter called uri with a filename as a parameter. As with getLetterList.php, in a real production environment we would perform error trapping, such as checking the value of the uri parameter against a whitelist of all filenames in the database, so that we could return an informative error message in case of an erroneous value.

The getLetterList.php PHP script runs the letterText.xql query, which looks as follows:

xquery version "3.0";
declare default element namespace "http://www.tei-c.org/ns/1.0";
declare variable $uri := request:get-parameter("uri", "1819-05-30_Elford.xml");
declare variable $letterFileName := concat('/db/mitford/letters/',$uri);
let $letter := doc($letterFileName)/*
return transform:transform($letter, doc('/db/xslt/MitfordLetterTransformPHP.xsl'), ())

This query retrieves the uri GET parameter, using an existing filename as a default if no parameter value is supplied. In a real production environment we would trap erroneous or missing input and return an informative error message, instead of serving an arbitrary file. The script then finds the letter in question and performs an XSLT transformation (using the transform:transform() function) to convert it to an HTML fragment. The transformation does not return an entire HTML document because the PHP script is already providing the HTML superstructure; eXist returns only the HTML that needs to be inserted between the <body> tags created by the PHP. As above, we return that in no namespace because it will inherit the HTML namespace from the PHP wrapper before the results are returned to the user.

We don’t provide the XSLT here because it is long and of minimal illustrative value, since it is customized to the particular markup conventions of the Digital Mitford project, and it is available in our GitHub repo. The important details are that we return results in no namespace and we use:

<xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>

to set the return method to xml and suppress the XML declaration.

POST queries

We generally prefer GET to POST in our own projects, but because GET strings can cause a URL to exceed the allowed limit (which varies by server and browser), some applications my need to use POST as a way of passing more information into eXist than would be possible with GET. Here is a sample POST query (from a different project, which runs a bibliographic query), based on the explanation at http://stackoverflow.com/questions/5647461/how-do-i-send-a-post-request-with-php

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Test</title></head>
    <h1>PHP test</h1>
    <form action="test1.php" method="post">
        <label for="target">Keyword:</label> <input name="target" type="text"/>
    </form>
</html>

When the user types in a keyword and hits the enter key, the system runs test1.php, passing a parameter called target with the value supplied by the user. The test1.php script looks as follows:

<html>
<head><title>Hi, Mom!</title></head>
<body>
<h1>Stuff</h1>
<?php
$target = htmlspecialchars($_POST['target']);
$url = 'http://localhost:8090/rest/db/repertorium/xquery/searchBibl.xql';
$data = array('type' => 'input', 'target' => $target);

// use key 'http' even if you send the request to https://...
$options = array(
    'http' => array(
        'header'  => "Content-type: application/x-www-form-urlencoded\r\n",
        'method'  => 'POST',
        'content' => http_build_query($data)
    )
);
$context  = stream_context_create($options);
$result = file_get_contents($url, false, $context);
if ($result === FALSE) { /* Handle error */ }

//var_dump($result);
echo $result;
?>
</body>
</html>

In a real production environment we would use config.php to set a REST_PATH variable, as above, and we would trap errors and return informative error messages. We’ve commented out the var_dump() function, which we used for development and debugging. More information about HTTP request parameters is available at the StackOverflow posting from which we copied this code.