Digital humanities

Maintained by: David J. Birnbaum. Creative Commons BY-NC-SA 3.0 Unported License. Last modified: 2018-04-13T23:24:19+0000

SVG test

The task

Your first SVG homework assignment asked you to create, by hand, an SVG bar chart of the results of a Best stooge ever contest, and your second and third SVG homework assignments asked you to write XSLT to generate a bar graph of presidential election results. You may have figured out that we faked the data for the Best stooge ever contest, but this one is for real: recently IMDb polled its readers about their favorite movie of all time. Some of the poll results can be represented in a bar graph, like the data from the stooge contest or the US presidential election. Your task for this test is to write an XSLT stylesheet to convert the XML results of the movie poll (below) into an SVG bar graph. At a minimum, your graph should include labeled X and Y axes (with whatever labels you consider appropriate), as well as a bar for each movie showing its percentage of the top-ten votes. You should upload only your XSLT to CourseWeb; we do not need either the XML or the SVG that your XSLT generates.

In the XML below, each <movie> element contains the raw number of votes that movie received.

    <movie name="the_godfather">281</movie>
    <movie name="pulp_fiction">197</movie>
    <movie name="gladiator">113</movie>
    <movie name="apocalypse_now">85</movie>
    <movie name="goodfellas">85</movie>
    <movie name="the_godfather_part_2">78</movie>
    <movie name="jaws">73</movie>
    <movie name="taxi_driver">69</movie>
    <movie name="wizard_of_oz">66</movie>
    <movie name="life_is_beautiful">55</movie>

Our solution

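<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="3.0"
    xmlns="http://www.w3.org/2000/svg">
    <xsl:output method="xml" indent="yes"/>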

    <xsl:variable name="total" as="xs:double" select="sum(//movie)"/>
    <xsl:variable name="barWidth" select="30" as="xs:double"/>
    <xsl:variable name="interbarSpace" select="$barWidth div 2" as="xs:double"/>
    <xsl:variable name="yScale" as="xs:double" select="3"/>
    <xsl:variable name="color" as="xs:string+"
        select="
            'red', 'CornflowerBlue', 'green', 'LemonChiffon', 'Lavender', 'LightCyan',
            'LightGreen', 'pink', 'grey', 'Violet', 'LemonChiffon', 'Lavender', 'LightCyan'"/>
    <xsl:variable name="maxHeight" select="max(//movie) div $total * 100" as="xs:double"/>
    <xsl:variable name="XLength" select="($interbarSpace + $barWidth) * (count(//movie) + 1)"
        as="xs:double"/>
    <xsl:template match="/">
        <svg width="{$XLength}" height="{$maxHeight + 200}">
            <g transform="translate(40,{$maxHeight + 100})">
                <!-- x-axis -->
                <line x1="0" y1="0" x2="{$XLength}" y2="0" stroke="black" stroke-width="2"/>
                <!-- y-axis -->
                <line x1="0" y1="0" x2="0" y2="-{$maxHeight * $yScale + 30}" stroke="black"
                    stroke-width="2"/>
                <!-- 10% gray line -->
                <line x1="0" y1="-{10 * $yScale}" x2="{$XLength}" y2="-{10 * $yScale}" stroke="lightgray"/>
                <text x="-10" y="-{10 * $yScale}" text-anchor="middle" font-size="50%">10%</text>
                <!-- 20% gray line -->
                <line x1="0" y1="-{20 * $yScale}" x2="{$XLength}" y2="-{20 * $yScale}" stroke="lightgray"/>
                <text x="-10" y="-{20 * $yScale}" text-anchor="middle" font-size="50%">20%</text>
                <!-- y-axis label -->
                <text x="-20" y="-{$maxHeight div 2 * $yScale}" font-size="65%" text-anchor="middle"
                    transform="rotate(270 -20 -{$maxHeight div 2 * $yScale})">Percentage of
                    Votes</text>
                <!-- x-axis label -->
                <text x="{$XLength div 2}" y="45" text-anchor="middle" font-size="75%">Movies</text>
                <!-- title -->
                <text x="{$XLength div 2}" y="-{$maxHeight * $yScale + 25}" text-anchor="middle"
                    >Poll of Greatest Movies of all Time</text>
                <xsl:apply-templates select="//movie"/>
            </g>
        </svg>
    </xsl:template>
    <xsl:template match="movie">
        <xsl:variable name="XPos" as="xs:double"
            select="(position() - 1) * ($barWidth + $interbarSpace) + $interbarSpace"/>
        <xsl:variable name="votes" as="xs:double" select="."/>
        <xsl:variable name="votePercent" as="xs:double" select="$votes div $total * 100"/>
        <rect x="{$XPos}" y="-{$votePercent * $yScale}"
            height="{$votePercent * $yScale}" width="{$barWidth}" stroke="black"
            fill="{$color[count(current()/preceding-sibling::movie) + 1]}"/>
        <!--percentage of votes text-->
        <text x="{$XPos + ($barWidth div 2)}" y="-{($votePercent * $yScale) + 5}"
            font-size="75%" text-anchor="middle"><xsl:value-of
                select="round-half-to-even(. div sum(//movie) * 100)"/>%</text>
        <!--movie name-->
        <text x="{$XPos + ($barWidth div 2) - 15}" y="15"
            transform="rotate(-50 {$XPos + ($barWidth div 2) - 15} 0)" text-anchor="end">
            <xsl:value-of select="translate(@name, '_', ' ')"/>
        </text>
    </xsl:template>
</xsl:stylesheet>

The SVG output of our transformation looks like:

[SVG bar graph titled "Poll of Greatest Movies of all Time". X axis: Movies. Y axis: Percentage of Votes, with gray ruling lines at 10% and 20%. Bars: the godfather 25%, pulp fiction 18%, gladiator 10%, apocalypse now 8%, goodfellas 8%, the godfather part 2 7%, jaws 7%, taxi driver 6%, wizard of oz 6%, life is beautiful 5%.]


A robust solution to this task uses variables so that the XSLT will continue to work should the data set change. After the next IMDb movie poll, the XSLT above should produce an SVG graph with little or no adjustment, even if movies are added. The exception is that if any movie were to receive more than 25% of the vote, we would no longer be producing enough gray ruling lines to make the graph look balanced, and the tops of very tall bars could be cut off. Based on the public’s divided opinions about movies, we think our choice not to include more lines as the maximum height grows is a safe one. If we needed bullet-proof code, we could have found the maximum percentage and let the data tell us how high our ruling lines needed to go (as well as how far we should translate our graph downward).
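The data-driven approach described in the last sentence might look something like the following minimal sketch. It is not part of our solution: the variable names maxPercent and lineCount are our own inventions for illustration, the fixed XLength of 500 is a placeholder, and the sketch draws only the ruling lines, not the bars or axes. It assumes one ruling line per 10%, up to the first multiple of 10 at or above the tallest bar.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.w3.org/2000/svg"
    exclude-result-prefixes="xs" version="3.0">
    <xsl:output method="xml" indent="yes"/>

    <xsl:variable name="total" as="xs:double" select="sum(//movie)"/>
    <xsl:variable name="yScale" as="xs:double" select="3"/>
    <!-- placeholder width for this sketch only -->
    <xsl:variable name="XLength" as="xs:double" select="500"/>
    <!-- largest percentage in the data (hypothetical variable name) -->
    <xsl:variable name="maxPercent" as="xs:double" select="max(//movie) div $total * 100"/>
    <!-- number of 10% ruling lines needed to reach or pass the tallest bar -->
    <xsl:variable name="lineCount" as="xs:integer"
        select="xs:integer(ceiling($maxPercent div 10))"/>

    <xsl:template match="/">
        <!-- height and downward translation now depend on the data -->
        <svg width="{$XLength + 40}" height="{$lineCount * 10 * $yScale + 200}">
            <g transform="translate(40,{$lineCount * 10 * $yScale + 100})">
                <!-- one gray line and one label per 10% step -->
                <xsl:for-each select="1 to $lineCount">
                    <xsl:variable name="percent" as="xs:integer" select=". * 10"/>
                    <line x1="0" y1="-{$percent * $yScale}" x2="{$XLength}"
                        y2="-{$percent * $yScale}" stroke="lightgray"/>
                    <text x="-10" y="-{$percent * $yScale}" text-anchor="middle"
                        font-size="50%"><xsl:value-of select="$percent"/>%</text>
                </xsl:for-each>
            </g>
        </svg>
    </xsl:template>
</xsl:stylesheet>
```

With the poll data above, the largest share is about 25.5%, so this sketch would draw lines at 10%, 20%, and 30%. Inside the xsl:for-each the context item is an integer rather than a node, which is why the totals are computed in global variables beforehand.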

We chose to color our bars based on the position of the bar, although we could also have made them all the same color, or chosen a color according to the value, or simply assigned colors to movies arbitrarily. We allowed for 13 colors, sort of arbitrarily, even though we have only the top 10 films, which means that our code could break if we needed to graph, say, the top 15. We could have protected ourselves from this risk by using the mod operator to cycle through the colors again when we get to the end. Can you figure out how?
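If you want to check your answer, here is one possible approach, offered as a sketch rather than the only way to do it. The colorIndex variable name is our own, the three-color list is shortened for illustration, and the bar geometry is a placeholder.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.w3.org/2000/svg"
    exclude-result-prefixes="xs" version="3.0">
    <xsl:output method="xml" indent="yes"/>

    <!-- deliberately short list, so that ten movies force the colors to repeat -->
    <xsl:variable name="color" as="xs:string+" select="'red', 'CornflowerBlue', 'green'"/>

    <xsl:template match="/">
        <svg>
            <xsl:apply-templates select="//movie"/>
        </svg>
    </xsl:template>

    <xsl:template match="movie">
        <!-- zero-based position mod the number of colors, shifted back to one-based -->
        <xsl:variable name="colorIndex" as="xs:integer"
            select="count(preceding-sibling::movie) mod count($color) + 1"/>
        <!-- placeholder geometry; only the fill matters here -->
        <rect x="{(position() - 1) * 45}" y="0" width="30" height="30" stroke="black"
            fill="{$color[$colorIndex]}"/>
    </xsl:template>
</xsl:stylesheet>
```

Computing the index in a variable first avoids a common trap: inside the predicate on $color, position() would refer to the position within the color sequence, not to the position of the movie.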

Finally, we used the translate() function to replace the underscores in the movie names with spaces. In Real Life we would want to fix the capitalization, in which case Wizard of Oz poses a unique challenge. (The official title of this film is The Wizard of Oz, but the problem is with “of”, and not “the”.) The most robust solution to capitalization is to include the capitalized value in the XML, since if you need versions with and without capitalization, it’s easier to strip capitalization globally than to introduce it selectively. But sometimes we don’t have control of the XML. Robust algorithmic capitalization is a challenging problem, but how would you solve it for just the data in this poll?
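One way to approach the question for just this data set is sketched below, with several assumptions stated up front: the my:titleCase function, its namespace URI, and the smallWords list are all hypothetical names of our own, and the list contains only “of” because that is the only word in these ten titles that should stay lowercase. The sketch outputs plain text, one title per line, rather than SVG.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my-functions"
    exclude-result-prefixes="xs my" version="3.0">
    <xsl:output method="text"/>

    <!-- words kept lowercase unless they begin the title (covers this poll only) -->
    <xsl:variable name="smallWords" as="xs:string+" select="'of'"/>

    <!-- hypothetical helper: 'wizard_of_oz' becomes 'Wizard of Oz' -->
    <xsl:function name="my:titleCase" as="xs:string">
        <xsl:param name="name" as="xs:string"/>
        <xsl:variable name="words" as="xs:string*" select="tokenize($name, '_')"/>
        <xsl:sequence select="
            string-join(
                for $i in 1 to count($words)
                return
                    if ($i gt 1 and $words[$i] = $smallWords)
                    then $words[$i]
                    else concat(upper-case(substring($words[$i], 1, 1)),
                                substring($words[$i], 2)),
                ' ')"/>
    </xsl:function>

    <xsl:template match="/">
        <xsl:for-each select="//movie">
            <xsl:value-of select="my:titleCase(@name)"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

This does not restore the missing “The” in The Wizard of Oz or turn “part 2” into a Roman numeral. Those are facts about the titles that the data does not contain, which is one more reason to prefer storing the capitalized value in the XML.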