Practicum #1: Markup and metadata

One thing that our group foregrounded when completing this practicum was the end goal. We tried to imagine why a scholar would want to access a corpus of reviews of a particular short story. Beyond simply wanting to gauge the critical reception of the work, we thought it would be useful to encode a few things: other works and authors that are discussed, religious references, place names (real and fictional), and one instance of a reference to laws (“Jim Crow”). In a lot of ways, these were very obvious things (i.e., mostly proper nouns) that one would expect an encoded corpus to include. Then we started to think about what we could say about these things. What research questions could we imagine having about the corpus? How could we leverage the specificity and granularity of encoding to think about the relations among the references, critics, publications, and object of analysis?
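
To give a rough sense of what this kind of markup looks like, here is a minimal sketch using standard TEI elements. The sentence is invented for illustration and is not taken from any of the reviews, and the @type values are hypothetical labels of my own rather than a TEI-defined vocabulary or necessarily what our group used.

```xml
<!-- Invented sample sentence; not from the corpus -->
<p>
  The critic sets <persName>Hawthorne</persName> beside
  <persName>Bunyan</persName> and his <title>Pilgrim's Progress</title>,
  follows the train out of the
  <placeName type="fictional">City of Destruction</placeName>,
  and invokes <name type="law">Jim Crow</name>.
</p>
```

Tagging the legal reference with the general-purpose <name> element plus a @type value is only one option; <rs> with a @type would be another.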

This led us to try to encode parts of speech when they were in reference either to the work (“The Celestial Railroad”), or to other works or references discussed in the review. We intended this as a kind of sentiment analysis, so we focused particularly on adjectives and verbs surrounding discussions of the principal work and other works referenced. While one can imagine an interesting sentiment map emerging from an entire corpus of reviews (perhaps even mapped to the political affiliations of the publications in which the reviews were published), the sheer amount of labor required to accomplish this may be prohibitive. Further, it would be difficult, assuming a collaborative approach, to formalize the encoding, as determining sentiment is heavily interpretive (both in the specific case, and in deciding which words warrant encoding). So perhaps this was a misguided trial, but it did lead to some interesting conversations amongst the group.
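
For the curious, one way this sort of encoding could be expressed in TEI is sketched below. This is an assumption about a workable approach, not a record of our actual markup: the sentence is invented, and the xml:id values and the "adjective" label are hypothetical. It wraps the word in <w> and uses the global @ana attribute to point at an interpretive category declared with <interp>.

```xml
<!-- Hypothetical sentiment categories, declared once in the document -->
<interpGrp type="sentiment">
  <interp xml:id="pos">positive</interp>
  <interp xml:id="neg">negative</interp>
</interpGrp>

<!-- Invented sample sentence; the adjective points at a category via @ana -->
<p>
  <title>The Celestial Railroad</title> is a
  <w type="adjective" ana="#pos">masterly</w> allegory.
</p>
```

Even in this tiny case the interpretive problem is visible: someone has to decide that the word is worth tagging and that it counts as positive.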

From working on the Viral Texts project, I think I have a particularly first-hand view of the way the end user (or imagined end user) shapes the interpretive process of applying metadata. Certain moments in the project raised questions that caused radical revisions of our tagging taxonomy (notably when Prof. Cordell actually wrote an article using the database interface). These were moments that helped us think about how scholars would actually interact with the site. Though textual encoding is a totally different method, this focus on the end user was similarly foregrounded in the practicum.

And I’m still mucking around with TEI Boilerplate for fun. I’ll update this post once I get a working version hosted.

UPDATE 10/04/2014:

I’ve got the TEI document hosted on my site and styled with TEI Boilerplate. It allows for very easy styling: you declare <rendition> elements containing your desired styling, then reference them through the @rendition attribute on any element in your document. For instance, I declared

<rendition xml:id="b" n="teibp:bold" scheme="css">
font-weight:bold;
</rendition>

then applied it to every <persName> element through the @rendition attribute, like so:

<persName xml:id="Hawthorne" rendition="#b">Hawthorne</persName>

Simple as that. I stole most of the styles from their demo document, but also modified some for my purposes. One other handy thing is that TEI Boilerplate has a built-in mechanism for including facsimile pages: you point the @facs attribute on a <pb /> element at the page image. Super easy!
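
A page break with a facsimile might look something like the sketch below. The page number and image path are hypothetical placeholders, and I'm assuming the path is resolved relative to the TEI file.

```xml
<!-- Hypothetical page number and image path -->
<pb n="1" facs="images/page-001.jpg"/>
```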

Our lightly styled TEI document can be viewed here: http://kevingeraldsmith.com/TEI-Boilerplate-master/content/Non-Slaveholder.xml