Key project: TAPAS as application and as community

I made a late switch for my key project reflection. I was originally planning to review the “Mapping Texts” project, which is interesting to me in part from my work on Viral Texts, as both projects make use of data from the Chronicling America project’s set of digitized newspapers. Instead, I’ve chosen TAPAS, which I identify as a possible repository to host a collection of TEI texts produced in a hypothetical composition course that may be my dissertation project (see my post here on the perils of DH pedagogy). With this in mind, I’d like to lay out the scope of TAPAS as it currently stands. Keeping in mind that the platform was only officially released on October 7th (a week ago!), I will also spend some time laying out what TAPAS aims to become. Finally, in the spirit of self-interestedness, I’ll talk a bit about how TAPAS does (and doesn’t) fit the needs of my potential project.


TAPAS is a TEI publishing and repository service that aims to provide that oft-forgotten element of DH projects: infrastructure. This is not the case at Northeastern (due in large part to the growth of the Digital Scholarship Group), but many institutions do not have the infrastructure to support DH projects (server space, IT personnel to maintain that server space, etc.). This is especially true if those projects require not only server space, but also that the materials remain accessible for long periods of time. As a result, there are countless well-conceived TEI projects that, after their initial publication, have disappeared or are no longer accessible in their intended form. Whether this stems from a lack of resources or simply a lack of time to maintain a site, it is a discouraging factor for adopters of the TEI. TAPAS seeks to address this issue by providing the infrastructure and expertise required to publish and maintain TEI projects.

I’ve already used TAPAS, in a sense, to teach a TEI workshop to Prof. Cordell’s undergraduate Technologies of Text course. To be more precise, I used a collection of TEI-encoded letters that is published and stored on TAPAS, called the Dorr Rebellion Project, as the basis for the in-class workshop. I had the students transcribe and encode a selection of Dorr’s correspondence, working from printed facsimiles of the original letters. Then, to underscore the particular choices the students made as editors, I had them compare their own transcriptions with those on the Dorr Rebellion Project. [For those interested in learning more about a fascinating time in Rhode Island history, I recommend you check out the project’s external site.] It was this experience that inspired me to examine TAPAS further through the lens of pedagogical potential. I argue that TAPAS has the potential to make a meaningful contribution to the field of DH in two primary ways: as an application, and as a site of community. I stress potential here because it seems to me that a platform like TAPAS is only as good as its users, and as its responsiveness to the needs of early adopters. I hope this will become clear as I get into more detail.

TAPAS as an application

As briefly mentioned above, infrastructural and technical support are the primary benefits a particular project can expect by using the TAPAS platform. First, TAPAS provides the infrastructure for people to publish their TEI. This includes all-important server space. The platform will recognize the TEI All schema, and provide very basic visualization of valid TEI. Importantly, and in addition to the infrastructure, a key feature of TAPAS is

to provide data curation services … includ[ing] automated migration of TEI data to future versions of the TEI, and basic format migration of any image files. We will also offer the option of more detailed hand curation for data that cannot be automatically migrated, probably for an additional fee. (According to their FAQ page)

To me, this is the primary technical contribution of TAPAS. Using TAPAS to publish your TEI ensures that this work will be accessible for the long term. Changes in the TEI guidelines, HTML markup, or any other issues which normally cause a web project to become inaccessible will be dealt with by TAPAS curators. It is important to note here that TAPAS is a paid service (through paid membership in the TEI Consortium). Here, they hint at a yet-to-be-defined premium fee structure for special projects requiring additional curation.

TAPAS as a community

The impact of TAPAS should not be measured through technical features alone. Rather, TAPAS is meant to function as a community space on and through which to build, share, and collaborate on TEI projects. TAPAS enables this community in several ways. First, the TAPAS Commons provides a space for those without membership in the TEI to try out TAPAS before making the initial investment. These collections are publicly accessible. Second, any platform is only as good as its documentation and forums (see WordPress and Omeka for excellent examples of what a forum community can look like). The Forums are made up of the community of people using TAPAS, where questions can be posed and expertise shared. Finally, the TEI project visualizations provide spatial (see map for an example) and temporal ways of accessing the growing repository of TAPAS. A major goal of TAPAS is the sharing of TEI, aimed at fostering the growth of textual encoding as a method of scholarly production. The geographic and temporal visualizations are ways for entrants and established scholars to find relevant projects as models or sources of collaboration. Obviously, in this early stage these visualizations are a bit sparse, but they will grow in size and usefulness as TAPAS is adopted by more projects.

Personally, I am interested in how TAPAS could function as a repository for publishing and hosting TEI files created by students. As it stands, I don’t think TAPAS would work, but there are encouraging developments in the works for subsequent releases (all of this information is available on the FAQ page). First, future iterations of TAPAS intend to allow for the use of custom schemas, something I think will be a necessity for my project (as I mentioned in a previous post, the schema would ideally be designed in conjunction with the students in the class). Second, the TAPAS interface is significantly limited in terms of visualization options, and it does not support user-generated stylesheets. These are both aspects of the platform that the developers plan to remedy in future iterations. An important aspect of teaching a class centered on TEI as a tool for composition is the ability for students to design the ultimate presentation of the text. This is a significant sticking point, as I believe XSLT is too large a topic to broach in a course that is already introducing XML/TEI. A publishing repository like TAPAS that incorporated WYSIWYG transformation tools would be an amazing boon to this hypothetical course. At the bare minimum, I would think that user-generated stylesheets would need to be supported (and probably produced by me). As a final entry on what is shaping up to be my wish list, it would be ideal for students to actually compose the encoded texts on the site. Some kind of integrated text editor would eliminate the need for Oxygen subscriptions (which are only free for 30 days), and mitigate the potential costs of a TEI-centered course.

I hope that TAPAS develops some of this functionality sooner rather than later. And since I happen to be a student at Northeastern (which houses TAPAS), I may just be able to influence development of those features that would meet my needs. I’ve talked to some of the people working on the project (Julia Flanders and Ben Doyle), and I have a meeting set up to talk about my potential project and whether TAPAS could serve as my principal platform. Regardless, it is an exciting project that bodes well for the adoption and proliferation of the TEI. As the initial temporal and monetary investments required to begin a TEI project are reduced, one can imagine a (relative) boom in text encoding. Lowering the barriers to entry should help graduate students, junior scholars, and scholars at smaller universities lacking infrastructure; this should lead to better and more diverse scholarship.


Attending to complexity; or, how do I say anything about Twitter?

(This is part 1, read part 2 here)

When Moya Bailey visited our class, I asked a question about her emerging research using Twitter data. I asked about her object of inquiry, and she gave me a thoughtful response about a kind of shifting lens–between analyzing the network as such and more traditional ethnographic methods like interviews. I’d like to tease out a few points here about where my question was coming from, and some follow-up thoughts that pertain to my own research.

My interest in this question of the object of inquiry was partly due to having attended Katherine Bode’s talk titled “Digital Humanities & Digitized Newspapers: The Australian Story” (you can view the slides and read the transcript here). After a brief overview of the DH scene in Australia, Bode proceeded to talk a bit about her project. The real contribution, I thought, was the set of four epistemological principles she described for those doing digital archival research:

  1. Literary works are processes, not singular objects, in time and space
  2. The archive contains multiple systems of meaning
  3. Digital methods for accessing the archive increase the potential for unrealized mismatches between the access we intend and the access we achieve
  4. All data must be published

Bode was talking about her research into serialized novels, but these principles apply to digital archival research writ large. I will try to say something about how these principles may apply to Bailey’s project (I won’t say anything about publishing data, though I do believe that is important and may present issues for Twitter data). First, the hashtag that Bailey is investigating (#girlslikeus) is not a singular object, but a process in space and time. This is something I think Twitter makes readily apparent in ways that printed material does not. But thinking more about the temporality of tweets, what does it mean to remove a tweet from its original spatial/temporal context and present it in some other space? Each individual tweet is complexly situated in a moment, but the hashtag as an entity is also complexly situated temporally. That is, if the mass of previously hashtagged tweets that make up a hashtag right now can be described as the context for an entry into this conversation, the addition of a tweet to this hashtag changes the context. The next tweet that uses the hashtag is then responding to an altered context. Another way to say this is that the meaning/context of a hashtag is constituted by every element (tweet) that uses it. This might be obvious. But it also might be important to think about how we can possibly research certain kinds of data. How do you attend to the multiple systems of meaning in an archive (or Twitter)? How can you make interpretive claims about one or the other? This is where, I think, Bailey is right to employ a zooming lens, focusing on individual actors in the assemblage of meanings with shifting methods of scale.

But then, what about the third principle? What are some potentially unrealized mismatches between what one intends to access (by downloading 1% of all Twitter data to examine a hashtag), and the access one actually achieves? What does the network science aspect of Bailey’s project actually afford her? I think it is important here, as Ben Schmidt has argued, to think about the type of data that certain digital methods allow and what kinds of claims can/should be made:

when we look at digital sources, we should exploit them in the direction that they can best expand the conversation. As a rule, there’s reason to think digital sources tend to be worse than the evidence we’re already using for understanding individual stories and individual motivations, but much better than the evidence we’re using for describing large-scale, structural changes. If you want to know what it felt like on a whaling ship, you should go to an archive.* But if you want to understand the patterns of resource depletion that the whaleships engaged in, you will get a better understanding by looking at statistical aggregates than trying to interpret individual results. (Schmidt)

So I guess this is all to say that I’m not sure what kinds of claims Bailey will make based on the Twitter data. This isn’t a criticism; she hasn’t even seen the data yet.

(TOTAL ASIDE: I was also thinking about Emily Apter’s Against World Literature, in which she invokes “untranslatability as a deflationary gesture toward the expansionism and gargantuan scale of world-literary endeavors” (3). While Apter does “endorse World Literature’s deprovincialization of the canon and the way in which, at its best, it draws on translation to deliver surprising cognitive landscapes hailing from inaccessible linguistic folds,” she claims to “have been left uneasy in the face of the entrepreneurial, bulimic drive to anthologize and curricularize the world’s cultural resources, as evinced in projects sponsored by some proponents of World Literature … [which] fall prey to the tendency to zoom over the speed bumps of untranslatability in the rush to cover ground” (2-3). Here Apter is responding (in part) to Moretti’s distant reading (along with Wallerstein’s world-systems theory and the Annales School of history, from which Moretti derives much of his theory in “Conjectures on World Literature”). Apter’s brilliant book then leverages the tensions in moments of untranslatability–in literature, art, architecture–to slow down and attend to the often intractable complexity that comes with translation. As such, she resists Moretti’s totalizing call to read “the great unread.” How does this work in a DH context? In the case of Bailey’s project, I am interested to see (as the project develops) the extent to which she attends to the complexity of the individual tweet and member in the #girlslikeus conversation/assemblage, while also trying to say something about the macro-scale of #girlslikeus as an artifact. Fascinating!)

I’ve thought a lot about this object of inquiry question with regard to Twitter, principally in conjunction with a paper I wrote last semester that tried to develop a methodology for analyzing the curation of born-digital artifacts in a digital archive. This curation was done in the form of student exhibits created for an advanced writing class working in conjunction with Our Marathon (you can check out some of the exhibits here). I was looking at a single exhibit which consists of a series of tweets with explanatory text added by the student. I wanted to get at the question: does the tweet fundamentally change in meaning as it is pulled from someone’s Twitter and added to a digital archive? In order to attend to the complexity of the objects of analysis I tried to combine Actor-Network Theory (from Bruno Latour’s Reassembling the Social) and a social semiotic approach to multimodal compositions (drawn primarily from new media theorists Gunther Kress and Lev Manovich). My idea was to try to attend to the materiality of the tweets in order to simply describe the social context of the communicative act, the assemblage of actors for the tweet-as-artifact, as it originally appeared and then as it was embedded in a digital archive. I then tried to make interpretive claims (via semiotic analysis) based on the shifting assemblages as the tweet moved into the archive. While the project wasn’t a total success (in particular, the way I employed ANT was super selective and ended up feeling rather arbitrary), it did get me thinking a lot about how we think about using DH methods in the writing classroom. This is something I talk about more in part 2 of this post, as I plan to employ DH methods in a class as part of my dissertation project.

Critical interventions & critical literacies: The perils of DH pedagogy

(This is part 2, read part 1 here)

While I am no longer sure I want to explore the marriage of ANT and social semiotics, I am thinking about ways that we can think productively about the theories of digital methods (be they archival, encoding, or algorithmic). McPherson’s call to action strikes close to my heart as a compositionist interested in critical literacies:

In extending our critical methodologies, we must have at least a passing familiarity with code languages, operating systems, algorithmic thinking, and systems design. We need database literacies, algorithmic literacies, computational literacies, interface literacies. We need new hybrid practitioners: artist-theorists, programming humanists, activist-scholars; theoretical archivists, critical race coders. We need new forms of graduate and undergraduate education that hone both critical and digital literacies.

As I move towards a dissertation project that increasingly seems like it will be a research study using DH methods in a composition class, I wonder which of these literacies are important for my students to understand. I love the idea of having students encode their own writing in TEI. The main benefit I see in this is that it makes explicit the often implicit conventions of academic discourse. As Trey Conatser argues:

Requiring students to compose and tag their writing within an XML editor ensured that they explicitly and deliberately identified their own rhetorical and compositional choices.

I agree, but I wonder how the critical interventions made by scholars like McPherson, Koh, Bailey, and Cecire need to inform this practice. Can I have students compose in TEI/XML without addressing the hierarchical structure of XML tagging? Indeed, McPherson critiques the history of DH, as work “proceeded as if technologies from XML to databases were neutral tools.” If I want my students to curate multimodal exhibits, how do we talk about the politics of the interface, let alone delve into critical code studies? This becomes a logistical problem as much as it is a theoretical problem. How will students have time to meaningfully engage these methods while also taking the time to critically examine them?

One answer to this (on the TEI encoding side) is to have the class collectively create a schema for marking up texts. Conatser describes personally creating a schema for each of the four assignments in his XML-based composition class:

For each assignment I specified a markup scheme in line with its most urgent goals.

For all the promise I see in the idea of an XML-based composition class, this decision (most likely made with logistical constraints in mind) seems like a pretty prescriptive way to approach teaching composition. In the same way that I have student-sourced the creation of grading criteria in the past, I think a student-sourced markup scheme would allow a more complex conversation to arise around the structure of XML, writing conventions, and issues of authority. But even with a more student-centered approach, how do we get to the level of engagement that the scholars above (rightly) call for? This is a difficult issue for me, and one that I hope my research will help me address with some practical pedagogical strategies.

Practicum #1: Markup and metadata

One thing that our group foregrounded when completing this practicum was the end goal. We tried to imagine why a scholar would want to access a corpus of reviews of a particular short story. Beyond simply wanting to gauge the critical reception of the work, we thought it would be useful to encode a few things: other works and authors that are discussed, religious references, place names (real and fictional), and one instance of a reference to laws (“Jim Crow”). In a lot of ways, these were very obvious things (i.e. mostly proper nouns) that one would expect an encoded corpus to include. Then we started to think about what we could say about these things. What research questions could we imagine having about the corpus? How could we leverage the specificity and granularity of encoding to think about the relation of the references, critics, publications, and object of analysis?

This led us to try to encode parts of speech when they were in reference either to the work (“The Celestial Railroad”), or to other works or references discussed in the review. We intended this as a kind of sentiment analysis, so we focused particularly on adjectives and verbs surrounding discussions of the principal work and other works referenced. While one can imagine an interesting sentiment map emerging from an entire corpus of reviews–perhaps even mapped to the political affiliations of the publications in which the reviews were published–the sheer amount of labor required to accomplish this may be prohibitive. Further, it would be difficult, assuming a collaborative approach, to formalize the encoding, as determining sentiment is heavily interpretive (both in the specific case, and in deciding which words warrant encoding). So perhaps this was a misguided trial, but it did lead to some interesting conversations amongst the group.
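To make the two layers of encoding concrete, here is a rough sketch of what a single sentence might look like with both the references and a sentiment-bearing adjective tagged. This is not our group's actual markup: the sentence is invented for illustration, the xml:id values and the #-pointers are hypothetical, and using type and ana on <w> is just one of several ways the TEI would let you record this.

```xml
<!-- Hypothetical sentence, invented for illustration; not quoted from any review. -->
<!-- The interpretive categories are declared once, then pointed to with @ana. -->
<interpGrp type="sentiment">
  <interp xml:id="positive">favorable judgment</interp>
  <interp xml:id="negative">unfavorable judgment</interp>
</interpGrp>

<p>The reviewer finds <title ref="#celestialRailroad">The Celestial Railroad</title>
  a <w type="adjective" ana="#positive">clever</w> companion to
  <persName ref="#bunyan">Bunyan</persName>'s
  <title ref="#pilgrimsProgress">Pilgrim's Progress</title>.</p>
```

Even in a sketch this small, the interpretive problem is visible: someone has to decide that "clever" counts as favorable, and that it is worth tagging at all.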

From working on the Viral Texts project, I think I have a particularly first-hand view of the way the end user (or, imagined end user) shapes the interpretive process of applying metadata. Certain moments in the project raised questions that caused radical revisions of our tagging taxonomy (notably when Prof. Cordell actually wrote an article using the database interface). These were moments that helped us think about how scholars would actually interact with the site. Though textual encoding is a totally different method, this focus on end user was similarly foregrounded in the practicum.

And I’m still mucking around with TEI Boilerplate for fun. I’ll update this post once I get a working version hosted.

UPDATE 10/04/2014:

I’ve got the TEI document hosted on my site and styled with TEI Boilerplate. It allows for very easy styling: you declare <rendition> elements with your desired styling, then reference them from the rendition attribute of any element in your document. For instance, I declared

<rendition xml:id="b" n="teibp:bold" scheme="css">

then applied this attribute to every <persName> element like so:

<persName xml:id="Hawthorne" rendition="#b">Hawthorne</persName>

Simple as that. I stole most of the styles from their demo document, but also modified some for my purposes. One other handy thing is that TEI Boilerplate has a built-in mechanism for including facsimile pages in your <pb /> elements using the @facs attribute. Super easy!
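For anyone who wants to see how the pieces fit together, here is a minimal sketch of the pattern as a fragment rather than a complete TEI file. The CSS inside <rendition>, the image path, and the page number are my own placeholders, so treat them as assumptions rather than what is in our document or the TEI Boilerplate demo.

```xml
<!-- In the teiHeader: declare a reusable style (the CSS body is an assumed example). -->
<encodingDesc>
  <tagsDecl>
    <rendition xml:id="b" n="teibp:bold" scheme="css">font-weight: bold;</rendition>
  </tagsDecl>
</encodingDesc>

<!-- In the text: point to the style with @rendition, and to a page image with @facs. -->
<!-- "images/page1.jpg" is a hypothetical path. -->
<pb n="1" facs="images/page1.jpg"/>
<p>... <persName rendition="#b">Hawthorne</persName> ...</p>
```

The nice part of this design is that the styling lives in the header, so changing the look of every name means editing one <rendition> rather than touching each element.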

Our lightly styled TEI document can be viewed here:

Pointing-to: On genealogies, ‘what is DH?’, and the generative encounters of stating rules

I’ve been thinking about disciplinarity, and how that maps onto the DH genealogy approach we took to the start of the class. Specifically, why is this approach novel? And why are ‘What is DH?’ articles so ubiquitous that to not include any of them on a DH course syllabus is worthy of a tweet? And, if the backlash to this genre is in full swing, as I think it is, what are more productive ways of talking about disciplinarity in DH?

As Tom Scheinfeldt argues, focusing on the diversity of DH (rather than our connectedness) can be generative,

I believe the time has come to re-engage with what make us different. One potentially profitable step in this direction would be a continued exploration of our very different genealogies … In the end, I believe an examination of our different disciplinary histories will advance even our interdisciplinary purposes: understanding what makes us distinctive will help us better see what in our practices may be of use to our colleagues in other disciplines and to see more clearly what they have to offer us.

For Scheinfeldt, the diversity can be traced through a genealogical approach. In Alan Liu’s article “Imagining the New Media Encounter,” he describes the messy and generative encounters of new media/methods and old media/methods,

We thought we knew what “writing” means, but now “encoding” makes us wonder (and vice versa). So, too, “reading” and “browsing” (as well as related activities like searching, data-mining, and data-visualization) destabilize each other.

Can we think of this destabilization not just in encounters of new and old media, but also in encounters of the disparate disciplines and intellectual traditions colliding in an amalgam we call DH? This is similar to the ‘productive unease’ which Flanders seeks to highlight: “productive: not of forward motion but of that same oscillating, dialectical pulsation that is the scholarly mind at work.” Flanders also sees this unease around the creation and manipulation of digital models for literary study:

The word verification stands out here, sounding very cut and dried, threateningly technical, a mental straitjacket, but in fact the key phrase there is “the rules we have stated”: it is the act of stating rules that requires the discipline of methodological self-scrutiny … [A]s our tools for manipulating digital models improve, the model stops marking loss and takes on a clearer role as a strategic representation, one which deliberately omits and exaggerates and distorts the scale so that we can work with the parts that matter to us.

I want to focus on this idea of “stating rules” in a way that one can then point to them–these are the rules, this is my methodology. It seems to me that the ubiquity of “What is DH?” articles comes from an impulse to circumscribe the field in a way that one can then point to something and say, “This is DH,” and, of course, “This is not DH.” I think that the project of these articles (circumscribing the field in a particular way, which is itself a way of self-consciously stating rules) is helpful in staging the encounters Scheinfeldt, Liu, and Flanders imagine.

I am reminded of Stephen North’s iconic 1987 book, The Making of Knowledge in Composition: Portrait of an Emerging Field. Controversial upon its release–and no less controversial today–North’s portrait of Composition is a legitimizing text. The groupings North enforces (with all the authority of Theuth displaying his inventions to Thamus) serve to tie together disparate intellectual traditions and especially methodologies–from the critical, historical and philosophical scholars; the experimentalists drawing from research in cognitive psychology; to the ethnographers appropriating anthropological methods–into a single field, big ‘C’ Composition. The benefit of North’s text wasn’t so much that he was right about everything, but rather he concretely circumscribed the field and opened a decades-long debate over the genealogy and geography of the field.

Unsurprisingly, North’s portrait is spatial, organized as a map, complete with bustling cities and lonely frontiers. One can imagine a similar map of DH–with the manifold genealogical threads of DH leading to countries of encoders, network theorists, archivists, media archaeologists, etc. Indeed, this seems to be what the “What is DH?” genre attempts. And like North’s map of Composition, the spatial distinctions between the territories would elide the overlap between methods and the interdisciplinarity of actually-existing DH projects. But it would also state rules, and thus self-consciously adhere to some methodology for stating those rules. It would then be open to (hopefully) generative critique.

And pointing-to is important when we think of how the field is perceived by the academy at large. This seems to be part of the impetus behind the more radical restructuring of the university imagined in Liu’s “theses” to an emerging DH center, or Hayles’s and Pressman’s Introduction to Comparative Textual Media, “Making, Critique: A Media Framework.” Any restructuring of the academy requires capital–both material and cultural/political–that requires a legitimacy and concreteness that comes from pointing-to.

As a brief aside, I’d also like to think about how we can go in the other direction–mapping a discipline with DH methods. Two interesting projects from Jim Ridolfo (assistant professor of Writing, Rhetoric, and Digital Studies at the University of Kentucky) come to mind. The first is his super-useful Rhetmap, which started as a simple mapping of PhD programs but has been extended to map all Rhet/Comp job listings for the year (those published in the MLA JIL). Ridolfo also uses the MLA JIL in the second mini-project I want to mention, #MLAJIL Firsts. Here, Ridolfo uses OCR to look at the history of MLA job listings to track the first instances of some disciplinary keywords: rhetoric (1965), rhetoric and composition (1971), computer (1968), computers and writing (1990), humanities computing (1995), digital rhetoric (2000), and digital humanities (2000). Ridolfo’s analysis here doesn’t extend much beyond this list, but he does make the OCR data available on his site in hopes that the data will be “useful for learning more about job market histories in English studies.” Indeed, I think there is the possibility to use this kind of concrete historical data to think more about the diverse genealogies–the trunks of the DH tree.