Practicum #1: Markup and metadata

One thing that our group foregrounded when completing this practicum was the end goal. We tried to imagine why a scholar would want to access a corpus of reviews of a particular short story. Beyond simply wanting to gauge the critical reception of the work, we thought it would be useful to encode a few things: other works and authors that are discussed, religious references, place names (real and fictional), and one instance of a reference to laws (“Jim Crow”). In a lot of ways, these were very obvious things (i.e. mostly proper nouns) that one would expect an encoded corpus to include. Then we started to think about what we could say about these things. What research questions could we imagine having about the corpus? How could we leverage the specificity and granularity of encoding to think about the relation of the references, critics, publications, and object of analysis?
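To make this concrete, here is a rough sketch of what that entity markup might look like (the element and attribute choices below are illustrative guesses rather than our finalized tagging scheme, and the sentence is invented, not quoted from any review):

<!-- hypothetical sample, not an excerpt from the corpus -->
<p>The reviewer sets <title ref="#celestial-railroad">The Celestial Railroad</title>
  beside <persName ref="#bunyan">Bunyan</persName>'s
  <title ref="#pilgrims-progress">Pilgrim's Progress</title>, names the
  <placeName type="fictional">Celestial City</placeName>, glances at the
  <rs type="religious">Slough of Despond</rs>, and closes with a swipe at
  <rs type="law">Jim Crow</rs>.</p>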

This led us to try to encode parts of speech when they were in reference either to the work (“The Celestial Railroad”) or to other works and references discussed in the review. We intended this as a kind of sentiment analysis, so we focused particularly on adjectives and verbs surrounding discussions of the principal work and other works referenced. While one can imagine an interesting sentiment map emerging from an entire corpus of reviews–perhaps even mapped to the political affiliations of the publications in which the reviews appeared–the sheer amount of labor required to accomplish this may be prohibitive. Further, it would be difficult, assuming a collaborative approach, to formalize the encoding, as determining sentiment is heavily interpretive (both in the specific case and in deciding which words warrant encoding). So perhaps this was a misguided trial, but it did lead to some interesting conversations amongst the group.
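For what it’s worth, the sentiment-oriented markup we toyed with might look something like this (again an invented sentence; the @pos values follow TEI’s linguistic attributes, while the #positive/#negative pointers are my own shorthand, not a standard vocabulary):

<!-- hypothetical sample: evaluative words near the principal work -->
<p><title ref="#celestial-railroad">The Celestial Railroad</title> is an
  <w pos="adj" ana="#positive">ingenious</w> sketch, though it
  <w pos="verb" ana="#negative">falters</w> where Bunyan
  <w pos="verb" ana="#positive">soars</w>.</p>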

From working on the Viral Texts project, I think I have a first-hand view of the way the end user (or imagined end user) shapes the interpretive process of applying metadata. Certain moments in the project raised questions that caused radical revisions of our tagging taxonomy (notably when Prof. Cordell actually wrote an article using the database interface). These were moments that helped us think about how scholars would actually interact with the site. Though textual encoding is a totally different method, this focus on the end user was similarly foregrounded in the practicum.

And I’m still mucking around with TEI Boilerplate for fun. I’ll update this post once I get a working version hosted.

UPDATE 10/04/2014:

I’ve got the TEI document hosted on my site and styled with the TEI Boilerplate. It allows for very easy styling: you declare <rendition> elements containing your desired CSS, which can then be referenced from the @rendition attribute on any element in your document. For instance, I declared

<rendition xml:id="b" n="teibp:bold" scheme="css">
font-weight:bold;
</rendition>
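For context, that declaration sits inside <tagsDecl> under <encodingDesc> in the teiHeader (at least, that’s where the Boilerplate demo puts its renditions); in outline:

<teiHeader>
  <encodingDesc>
    <tagsDecl>
      <rendition xml:id="b" n="teibp:bold" scheme="css">font-weight:bold;</rendition>
    </tagsDecl>
  </encodingDesc>
</teiHeader>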

then applied this attribute to every <persName> element like so:

<persName xml:id="Hawthorne" rendition="#b">Hawthorne</persName>

Simple as that. I stole most of the styles from their demo document, but also modified some for my purposes. One other handy thing is that TEI Boilerplate has a built-in mechanism for displaying facsimile page images: you point the @facs attribute on your <pb/> elements at the image files. Super easy!
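For example, a page break pointing at its page image looks roughly like this (the @n value and image path are placeholders, not the actual files I used):

<pb n="1" facs="images/non-slaveholder-page1.jpg"/>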

Our lightly styled TEI document can be viewed here: http://kevingeraldsmith.com/TEI-Boilerplate-master/content/Non-Slaveholder.xml

Pointing-to: On genealogies, ‘what is DH?’, and the generative encounters of stating rules

I’ve been thinking about disciplinarity, and how that maps onto the DH genealogy approach we took at the start of the class. Specifically, why is this approach novel? And why are ‘What is DH?’ articles so ubiquitous that to not include any of them on a DH course syllabus is worthy of a tweet? And, if the backlash to this genre is in full swing, as I think it is, what are more productive ways of talking about disciplinarity in DH?

As Tom Scheinfeldt argues, focusing on the diversity of DH (rather than our connectedness) can be generative,

I believe the time has come to re-engage with what makes us different. One potentially profitable step in this direction would be a continued exploration of our very different genealogies … In the end, I believe an examination of our different disciplinary histories will advance even our interdisciplinary purposes: understanding what makes us distinctive will help us better see what in our practices may be of use to our colleagues in other disciplines and to see more clearly what they have to offer us.

For Scheinfeldt, that diversity can be traced through a genealogical approach. In “Imagining the New Media Encounter,” Alan Liu describes the messy and generative encounters between new media/methods and old media/methods,

We thought we knew what “writing” means, but now “encoding” makes us wonder (and vice versa). So, too, “reading” and “browsing” (as well as related activities like searching, data-mining, and data-visualization) destabilize each other.

Can we think of this destabilization not just in encounters of new and old media, but also in encounters among the disparate disciplines and intellectual traditions colliding together in the amalgam we call DH? This is similar to the ‘productive unease’ that Flanders seeks to highlight: “productive: not of forward motion but of that same oscillating, dialectical pulsation that is the scholarly mind at work.” Flanders sees this unease also around the creation and manipulation of digital models for literary study:

The word verification stands out here, sounding very cut and dried, threateningly technical, a mental straitjacket, but in fact the key phrase there is “the rules we have stated”: it is the act of stating rules that requires the discipline of methodological self-scrutiny … [A]s our tools for manipulating digital models improve, the model stops marking loss and takes on a clearer role as a strategic representation, one which deliberately omits and exaggerates and distorts the scale so that we can work with the parts that matter to us.

I want to focus on this idea of “stating rules” in a way that one can then point to them–these are the rules, this is my methodology. It seems to me that the ubiquity of “What is DH?” articles comes from an impulse to circumscribe the field so that one can point to something and say, “This is DH,” and, of course, “This is not DH.” I think that the project of these articles–circumscribing the field in a particular way, which is itself a way of self-consciously stating rules–is helpful in staging the encounters Scheinfeldt, Liu, and Flanders imagine.

I am reminded of Stephen North’s iconic 1987 book, The Making of Knowledge in Composition: Portrait of an Emerging Field. Controversial upon its release–and no less controversial today–North’s portrait of Composition is a legitimizing text. The groupings North enforces (with all the authority of Theuth displaying his inventions to Thamus) serve to tie together disparate intellectual traditions and especially methodologies–from the critical, historical, and philosophical scholars, to the experimentalists drawing from research in cognitive psychology, to the ethnographers appropriating anthropological methods–into a single field, big ‘C’ Composition. The benefit of North’s text wasn’t so much that he was right about everything, but rather that he concretely circumscribed the field and opened a decades-long debate over the genealogy and geography of the field.

Unsurprisingly, North’s portrait is spatial, organized as a map, complete with bustling cities and lonely frontiers. One can imagine a similar map of DH–with the manifold genealogical threads of DH leading to countries of encoders, network theorists, archivists, media archaeologists, etc. Indeed, this seems to be what the “What is DH?” genre attempts. And like North’s map of Composition, the spatial distinctions between the territories would elide the overlap between methods and the interdisciplinarity of actually-existing DH projects. But it would also state rules, and thus self-consciously adhere to some methodology for stating those rules. It would then be open to (hopefully) generative critique.

And pointing-to is important when we think of how the field is perceived by the academy at large. This seems to be part of the impetus behind the more radical restructuring of the university imagined in Liu’s “theses” to an emerging DH center, or in Hayles and Pressman’s introduction to Comparative Textual Media, “Making, Critique: A Media Framework.” Any restructuring of the academy requires capital–both material and cultural/political–and that capital depends on a legitimacy and concreteness that come from pointing-to.

As a brief aside, I’d also like to think about how we can go in the other direction–mapping a discipline with DH methods. Two interesting projects from Jim Ridolfo (assistant professor of Writing, Rhetoric, and Digital Studies at the University of Kentucky) come to mind. The first is his super-useful Rhetmap, which started as a simple mapping of PhD programs but has expanded to map all Rhet/Comp job listings for the year (those published in the MLA JIL). Ridolfo also uses the MLA JIL in the second mini-project I want to mention, #MLAJIL Firsts. Here, Ridolfo uses OCR to look at the history of MLA job listings and track the first instances of some disciplinary keywords: rhetoric (1965), rhetoric and composition (1971), computer (1968), computers and writing (1990), humanities computing (1995), digital rhetoric (2000), and digital humanities (2000). Ridolfo’s analysis here doesn’t extend much beyond this list, but he does make the OCR data available on his site in hopes that it will be “useful for learning more about job market histories in English studies.” Indeed, I think there is real potential to use this kind of concrete historical data to think more about those diverse genealogies–the trunks of the DH tree.