Digital humanities


Maintained by: David J. Birnbaum (djbpitt@gmail.com) [Creative Commons BY-NC-SA 3.0 Unported License] Last modified: 2022-09-05T21:31:13+0000


Test #1: XML

Instructions

Answer the following questions and upload your answers to Canvas. All tests in this course are open book, so you can look things up and you can try out your code in <oXygen/>, but you cannot receive help from another person.

Create your answers in <oXygen/> as a plain-text document: choose File → New and type "text" into the "Type filter text" box to find the plain-text document type. Do not use a word processor (like Microsoft Word), because word processors silently change your straight quotation marks to curly ones, and curly quotation marks are not valid XML delimiters. The file you upload must follow our file-naming conventions and use .txt as the filename extension (to indicate that you are submitting a plain-text file).
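As an illustration of why curly quotation marks matter, here is a minimal sketch in Python (not part of the course tooling; the helper `is_well_formed` and the `item` element with its `type` attribute are hypothetical names chosen for this example). It uses the standard library's `xml.etree.ElementTree` parser to compare the same one-line document written with straight and with curly quotation marks around the attribute value.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # The parser raises ParseError on any well-formedness violation.
        return False


# Hypothetical sample: same markup, different quotation marks.
straight = '<item type="fruit">Apples</item>'
curly = '<item type=\u201cfruit\u201d>Apples</item>'  # U+201C and U+201D

print(is_well_formed(straight))  # True
print(is_well_formed(curly))     # False
```

The second string fails because XML requires attribute values to be delimited by straight single or double quotation marks, which is the kind of silent substitution a word processor can introduce.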

Required questions

  1. What is the difference between descriptive and presentational markup? Which one do we prioritize in the digital humanities and why?

  2. XML elements can have four types of content, listed below. For each one provide a brief description and an example (in XML):

    1. Element content

    2. Text content

    3. Mixed content

    4. Empty element

  3. If you paste the following document into <oXygen/> you’ll get a red square because your document won’t be well formed. What’s wrong with it and how can you fix it? (There may be more than one thing to fix.)

    
    Shopping list
    Apples
    Yogurt
    Ice cream

  4. If you copy and paste the following document into <oXygen/> you’ll get a red square because your document won’t be well formed. What’s wrong with it and how can you fix it? (There may be more than one thing to fix.)

    
    1 < 2

Optional extra-credit questions

  1. What is the difference between a) well-formed, b) valid, and c) well-balanced XML?

  2. If Curly, Larry, and Moe all speak in unison and you try to represent it as:

    Nyuk, nyuk nyuk!

    you’ll raise an error. Why is this not well formed and how can you fix it in a way that is amenable to machine processing?

  3. One of the questions above notes that:

    
    1 < 2

    is not well formed. Yet:

    
    1 > 2

    is well formed. Why might an XML parser (the part of <oXygen/> that has to translate our character-by-character XML into a tree structure) treat these differently?