Posts Tagged ‘computinghumanities’

Day Three (Turning into a Test Match)

Grumpy post.

The mediocrity of some thinking is just, well, either shameful, depressing, or lazy. Perhaps all three. A great question from the floor, drawing on Winnicott, to those who had talked about virtual worlds and avatars: if we blur the real and this other thing, are there ethical questions? The answer is obviously yes (does anyone remember the rape in LambdaMOO?), and not only about data and so on. Conversation has picked up a bit, but the ethical issues aren’t just about the use of my data; it’s my body in there.

Middle Data

This is a literal lift from a piece I wrote as part of the Living Archive project I am involved in:

big data

If you do anything that falls into the current catch-all that is the digital humanities, it is hard to avoid it at the moment. Big data means, more or less, humanities projects that want to use data sets that are big? Already, notice the change: humanities scholars and something called a ‘data set’, not some books, films, paintings. And then this data set is to be big. How big? Science big (well, not really, as their big data sets make humanities data sets look small).

The thumbnail sketch of this is often via Moretti’s ‘distant reading’, where instead of closely reading a small set of texts (the novels of Dickens, the films of Ford, or even a sample of romance novels or westerns) you develop methods to analyse enormous numbers of novels. As a consequence the questions you ask are different, and also, once appropriately digitised, various pattern matching exercises can be performed simply to see what there is. So, instead of looking at 20 novels in your research, you study, say, 20 million.

This requires infrastructure – bandwidth, storage, tools, labs, boffins and geeks. Heady stuff. And most of all, it needs money. Politically, in some sectors of the research/university universe this is doubly attractive, simply because our performance is routinely measured not just by publications and successful grants, but by the size of these grants. Therefore a single big data humanities funded project becomes particularly attractive, simply because, measured against your institution’s research criteria, you are plainly being more successful – for better or worse, more research income always trumps less, regardless of outcomes (well, up to a point).

Manovich’s lab is into big data, and he’s written some useful pointers and case studies from his lab (for example, his Vertov essay and the chapter in Berry’s anthology). Because of the scale of things, visualisation becomes significant in this domain, and so there is a lot of highly skilled and interesting work that lies at the intersection of computing, digital humanities, visualisation, and then what questions to ask.

middle theory

This was Bordwell and Carroll’s shot across the bows of big theory cinema studies. Grand theory, in their critique, was continentally inspired work that presumed to have an all-encompassing theoretical framework, which was then wheeled out and placed over cinema (films, audiences, institutions, it didn’t really matter) while disregarding the specificity of the objects under study and of historical practice. At times their description of ‘grand theory’ risked caricature (Sinnerbrink’s New Philosophies of Film covers this material well), and as a card-carrying member of some forms of grand theory I am unwilling to dismiss it in quite this manner. However, what they do offer is what they describe as ‘middle theory’, which is specifically grounded in smaller, localised samples that look at (well, so they’d like to think) what is, and then theorise about that. (Actually I don’t think Bordwell and Carroll are nearly as elegantly hermeneutic as that, but that’s by the by here.) What I like, though, is the intent and inclination here. You look closely, not naively or without some opening questions or gambits you want to wonder about, but it is a much more open process than that proffered by high theory, where the risk, in terms of the object of study, is that the theoretical framework becomes tautologically validated using the film as evidence, rather than the other way around. It is a method that in some ways can be thought to put the (romantic?) desire to understand the films (in the case of Bordwell and Carroll) first, rather than the doing of theory as an end.

middle data

So between the enormity of big data, and the traditional intensely close reading of what we might as well describe as small data (the westerns of Ford, the tragedies of Shakespeare), we can see the rise of what I would like to describe as middle data. What is middle data? It is a methodological field, not an algorithmic process (for those that recognise it, this is from Barthes’ “From Work to Text” as a nod to the importance of grand theory to my own work) which means that it lies between the sorts of new practices that emerge when we apply novel digital techniques to things that can be treated as data using what I’d like to think of as more traditional propositions.

The Living Archive project is, pretty much, a middle data project. The data set is small and focussed, though larger than what you’d ordinarily look at in depth for the usual close reading. It is not large enough to be capital ‘B’ big data, yet because it is thoroughly digital in its methods, and has been conceived of as an inherently digital system, it has been made in a way that enables and facilitates a rich variety of other research methods or propositions. By inherently digital I mean the project, from inception, was never conceived of as merely the digitisation and dissemination of an analogue collection onto the web, but looked to use the affordances of the digital in concert with the network to enable, invent, test and play with other sorts of possibilities.

Some of these methods are quite simple, for example the ability to curate clips (acts) from performances into novel series, outside the shows in which they appeared, through informal tags and curation into collections. This makes it reasonably simple for a scholar to collect all acts of a certain type into their own collection, perhaps to compare changes or even similarities in performance across the history of the circus, or to look at the history of costume, performance style, gender and performance, or public sphere politics, physical circus and performance.
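A minimal sketch of the kind of tag-based curation described above, in Python. It is not the Living Archive’s actual code or data model, which aren’t described here: the `Clip` record, its fields, and the `curate` function are hypothetical, and the sample clips are invented placeholders rather than real Circus Oz acts.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """A hypothetical clip (act) cut from a recorded show."""
    show: str
    year: int
    title: str
    tags: set[str] = field(default_factory=set)  # informal, free-form tags


def curate(clips: list[Clip], tag: str) -> list[Clip]:
    """Collect every clip carrying `tag`, across all shows, ordered by year."""
    return sorted((c for c in clips if tag in c.tags), key=lambda c: c.year)


if __name__ == "__main__":
    # Invented placeholder data, for illustration only.
    archive = [
        Clip("Show A", 1985, "Trapeze duet", {"trapeze", "aerial"}),
        Clip("Show B", 1979, "Opening trapeze", {"trapeze"}),
        Clip("Show B", 1979, "Band intro", {"music"}),
    ]
    for clip in curate(archive, "trapeze"):
        print(clip.year, clip.show, clip.title)
```

Because the tags live on the clips rather than on the shows, one clip can sit in any number of collections without being copied out of its original performance.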

However, it is more interesting than being able to curate parts into collections, because the overall digital system has been built as a platform, which means it provides APIs that let new and different interfaces to the data be easily written. This means it is a relatively lightweight (and agile) task to write different interfaces to interrogate the data in different ways and to present the outcomes of this in different ways. For instance, PhD candidate Reuben Stanton spent only a few hours developing an interface to retrieve and then visualise how many shows are currently public versus how many are private. This visualisation, which is currently known within the project team as ‘the iceberg’, dynamically draws a timeline and places public shows above the ‘waterline’ and private ones below. While it is pragmatically useful simply to indicate, and encourage, further marking up of the non-public shows so that Circus Oz can make them public, it also indirectly reveals other things.

[Image: Circus Oz timeline]

For instance, there are few shows in the early years, then a peak of recorded shows, and then a more recent decline. The recent decline can be partly explained by a backlog of digitisation and upload, but it could also be speculated that the preserved record looks like this because video became an available (though expensive) technology, with an early spike of interest and use. It was not, though, very simple or straightforward to use, and so was not used regularly after the initial enthusiasm. (There is also the obvious point that, since the original media have had to be preserved for so long in now obsolescent formats, less of that material will be usable.) The rise could respond to the development of new formats like DV, where tape length is now much longer and better suited to recording a show, and storage is cheaper and simpler (tape costs per hour plummeted at this time), so it becomes easier, cheaper and less disruptive to record a show. The tapes are physically much smaller and it is literally a simple task to film and then stick the unedited tape into a drawer. Finally, the more recent decline might reflect the move to a completely digital mode where, to begin with, it was trivial to record a lot (and we all did), but storage of non-tape media (we could call it non-physical but that isn’t really accurate) on hard drives was informal, so recordings may have been stored willy-nilly and then erased, lost, forgotten, deleted and otherwise, if indirectly, treated as ephemeral.

This is speculative, and none of it may be correct. However, the ability to interrogate the data through different questions and to visualise the results simply and quickly (in this case: how many shows are there? from when? which are public? and then draw this) means the system and research project support the easy development of novel questions or propositions using the data. There is, then, an agility to how the system can be used, in addition to its explicit role of giving the company and the public access to the record of Circus Oz’s performances.
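The “how many shows, from when, which are public, draw this” question can be sketched in a few lines of Python. This is a stand-in under stated assumptions, not Reuben Stanton’s interface: it assumes the archive’s API can be reduced to a list of `(year, is_public)` pairs, and the function names `iceberg_counts` and `render_iceberg` are hypothetical. It draws a plain-text iceberg, with public shows stacked above a `~` waterline and private shows hanging below.

```python
from collections import Counter


def iceberg_counts(shows):
    """Count public and private shows per year.

    `shows` is an iterable of (year, is_public) pairs. In a real system these
    would come from the archive's API, whose shape is assumed here.
    Years with no recorded shows are included with (0, 0).
    """
    shows = list(shows)
    if not shows:
        return {}
    public = Counter(year for year, is_public in shows if is_public)
    private = Counter(year for year, is_public in shows if not is_public)
    first = min(year for year, _ in shows)
    last = max(year for year, _ in shows)
    return {y: (public[y], private[y]) for y in range(first, last + 1)}


def render_iceberg(counts):
    """Draw one column per year: public shows above the '~' waterline,
    private shows below it."""
    years = sorted(counts)
    top = max((counts[y][0] for y in years), default=0)
    bottom = max((counts[y][1] for y in years), default=0)
    rows = []
    # Above the waterline: tallest public count first, working down.
    for level in range(top, 0, -1):
        rows.append("".join("#" if counts[y][0] >= level else " " for y in years))
    rows.append("~" * len(years))
    # Below the waterline: private counts hang downwards.
    for level in range(1, bottom + 1):
        rows.append("".join("#" if counts[y][1] >= level else " " for y in years))
    return "\n".join(rows)


if __name__ == "__main__":
    # Invented sample data, not the actual Circus Oz record.
    sample = [(1990, True), (1990, False), (1991, True), (1991, True), (1992, False)]
    print(render_iceberg(iceberg_counts(sample)))
```

Filling in years with no recordings as empty columns matters here, since gaps in the record (few shows early, a recent decline) are exactly what this kind of view is meant to surface.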

At the moment I like to conceive of this as ‘middle data’, as it uses computational analysis, a constrained data set, and visualisation to facilitate varieties of close analysis (small data?). It is a method that implicitly requires and relies upon the digital and computational (it is not just digitisation but also the varieties of analysis and calculation the computer enables) to discover patterns that then allow closer analyses of our object of study.


For example, in 2000 a colleague and I developed a film annotation system (the gloriously named Smil Annotation Film Engine, aka SMAFE) which allowed me to add in and out markers (timecode) throughout an entire film and then to extract these shots and sequences in various ways and show the results. The film was John Ford’s 1956 western The Searchers, and I marked up the film manually around the presence of doorways. This was based on an intuitive hunch: in Ford’s work a clear distinction is made and used between the inside and outside of the home, and so I believed doorways might be a significant though unrealised element of the film’s mise-en-scène. Simply viewing the film and trying to notice doors is one method. However, once the film was marked up, the system allowed me to search on specific criteria (camera outside, inside, looking in, looking out, and so on). A strong poetic pattern around doorways was immediately apparent and visible. The point here is that the tool allowed the exploration of questions normally associated with close reading using methodologies that relied upon the affordances of the computational. This is a method that worked in concert with, and so supported, a more developed close reading of the film – just how were doorways part of the film’s mise-en-scène? why? and what might this contribute to our understanding of what the film might be about?
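A hypothetical reconstruction of the idea in Python, to show how in/out markers plus a few attributes turn “notice the doorways” into a query. The `Doorway` record, its `camera` and `looking` fields, and the `search` helper are assumptions made for illustration, not SMAFE’s actual design, and the timecodes below are invented rather than taken from The Searchers.

```python
from dataclasses import dataclass


def to_seconds(timecode: str) -> int:
    """Convert an 'HH:MM:SS' timecode to seconds (frames ignored for simplicity)."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class Doorway:
    """One hypothetical marked-up shot involving a doorway."""
    tc_in: str    # in marker
    tc_out: str   # out marker
    camera: str   # "inside" or "outside" the home
    looking: str  # "in" or "out" through the doorway

    @property
    def duration(self) -> int:
        return to_seconds(self.tc_out) - to_seconds(self.tc_in)


def search(shots, **criteria):
    """Return shots whose attributes match every keyword criterion, in film order."""
    hits = [s for s in shots if all(getattr(s, k) == v for k, v in criteria.items())]
    return sorted(hits, key=lambda s: to_seconds(s.tc_in))


if __name__ == "__main__":
    # Invented markers, for illustration only.
    shots = [
        Doorway("00:01:10", "00:01:40", "inside", "out"),
        Doorway("00:00:05", "00:00:30", "inside", "out"),
        Doorway("00:50:00", "00:50:12", "outside", "in"),
    ]
    for shot in search(shots, camera="inside", looking="out"):
        print(shot.tc_in, shot.tc_out, f"{shot.duration}s")
```

The marking up stays manual, as in the original project; the computational part is only the re-sorting and filtering that makes a pattern visible across the whole film at once.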



Barthes, Roland. “From Work to Text.” Image-Music-Text. Trans. Stephen Heath. London: Flamingo, 1977. 155–64. Print.

Bordwell, David, and Noël Carroll. Post-Theory: Reconstructing Film Studies. Madison: University of Wisconsin Press, 1996. Print.

Manovich, Lev. “How to Compare One Million Images?” Understanding Digital Humanities. Ed. David M. Berry. London: Palgrave Macmillan, 2012. 249–78. Print.

Manovich, Lev. “Visualizing Vertov.” Russian Journal of Communication 5.1 (2013): 44–55. Taylor and Francis. Web. 24 May 2013.

Moretti, Franco. Distant Reading. 1st ed. Verso, 2013. Print.

Moretti, Franco. Graphs, Maps, Trees: Abstract Models for Literary History. Verso, 2007. Print.

Sinnerbrink, Robert. New Philosophies of Film: Thinking Images. 1st ed. Bloomsbury Academic, 2011. Print.

Archaries and Libchives

Libraries as ‘museums of marginalia’. A fascinating presentation by David Pearson, Director of Culture, Heritage and Libraries for the City of London Corporation, about how marginalia shift something that is more or less anonymous (my words) into an artefact of value. (It reminds me of an old project by computer scientist and ethnographer Cathy Marshall, who made a prototype annotation program for a laptop; her method involved buying heavily annotated second-hand textbooks at university bookstores, and interviewing their purchasers, to model existing annotation practices. It was a seriously cool prototype which, of course, went nowhere back then.) In relation to the Circus Oz Living Archive project this, for me, is highly suggestive, as it is what we have been building. Which then raises the question of to what extent it is an archive, a library, or some intriguing emergent hybrid: an archary or libchive perhaps?

What is Social Video?

Another possible essay coming out of the Circus Oz archive project that I’m nutting and fidgeting over:

The major impediments to the creation and use of video online have, until recently, been understood to be technical, involving the high computational demands of video and its bandwidth requirements. These are no longer issues, and we have seen, in the last five years, an exponential explosion in the presence, role and use of video online. However, the majority of the services, platforms, protocols and uses of video online have retained an industrial model of what video is as a format and media object. This has meant that, in relation to video, what has been emphasised is the making, dissemination and sharing of whole, singular video objects. In this context Web 2.0 and social media systems and services have been built that tend to place these services ‘outside’ of and around whole videos, without troubling how or what we think video, as a practice, form or object, might be as a genuinely digitally defined social media thing. This has possibly hindered the development of new media forms that might emerge if the qualities of social media and Web 2.0 became part of the ‘inside’ of video. This paper will be a speculative investigation into what such a ‘social video’ might be, proposing theoretical prototypes that can then inform the critical development of online video services and archives.

Archive Relations

Abstract for a possible paper that is coming out of the Circus Oz Living Archive project:

Recent work in digital archival practice and theory has had two major trajectories. The first has emphasised the digitisation of existing physical collections and so has been concerned with the development and application of appropriate protocols, including technical standards, metadata schemas, and appropriate preservation regimes. More recently this has evolved into an interest in the archival problems posed by ‘born digital’ artefacts and the development of relevant protocols for the preservation of these things. However, broadly within the field of digital archiving theory the concept of the artefact as a relatively autonomous object remains paramount: archives are collections of things, and it is their thingness which the archivist labours to preserve. Things in an archive, though, are and must be mute. In an archive it is the brute thingness of the objects that is to be preserved above all else. What they might mean, one day, is kept as the promise for why these things need to be preserved as things, but as a promise this always lies as a future before an archive’s things. In this way an archive can be considered to be the virtual (in Lévy’s sense), and what its objects come to be as the actualisation of this virtual. This means that archives, unlike other collections, are flat, all objects being equal in their muteness and possible future significance. This change, from mute thing to significance, that is, the actualisation of its virtual promise, is always and can only be the result of the thing entering into external relations with what lies outside of itself. They are simply put into different contexts. The terms of these relations, what these contexts are, are always outside of the object and effect what Deleuze and Guattari have characterised as an ‘incorporeal transformation’, where the thing has changed as a consequence of these relations, but the object in itself has not.
This means that we can think about an archive as being about not its objects but the relations that they come to exist within. It also means we can think of these relations as objects in their own right, and so pose the question and problem of what an archive of relations would be, and what a relation is when considered as an archival object. Finally, it means we can speculate about a new, virtual archive, which treats the relations that happen between its objects as the subject and object of its archival and curatorial practices and regime, and about what, if anything, the implications of this are.

Open Humanities Alliance

The Open Humanities Alliance is, as the name suggests, a coalition pushing, advocating and supporting the open humanities, stepping out of the dollar-based publishing models that have defined the humanities towards richer, greener pastures. They have a journal incubator: if you want to make an existing journal (one that you or your organisation owns) open, or create a new open journal, they help you set it up and so on. This sort of support is a great idea. Which segues nicely into Peter Suber’s new book on Open Access, out of MIT Press. It comes from a commercial academic publisher, it is for sale (the Kindle edition is ten bucks), and it is a timely publication.

Open publishing is a big deal. I’ve regularly raised the contradictions involved here: humanities academics are quick to critique ideology left, right and centre, yet our relation to our own academic ideologies, like all good ideology, remains naturalised and invisible. Of course we do the writing, the editing, and then pay for the journal that is the product of our largely donated labour. And of course the publisher makes money from this. Because today, with the internet thingie, we really do need access to a printing press, paper, typographers, ink, delivery trucks and a subscription office.

The open also matters in terms of access to collections, archives and the collections of services that we now have online. These days it is no longer just a question of having a pile of stuff that people might look at, but having an API that lets other services use these things, and, increasingly, let people make new stuff with this stuff. Which, when you think about it, isn’t so very far from the scholarly really, is it?

Publishers Spam

So unless you’ve been under the academic equivalent of a rock lately (aka admin, attending innovation workshops, or just, well, teaching), you can’t have missed that we are finally getting up in arms over the fucked up nature of academic publishing. The problem is very simple, though apparently deeply intractable.

As academics we do research which we publish (most of us because we want to, but it is also part of the job description). Since we then need somewhere to publish, we publish in specialised journals, though these journals are often owned by large academic publishing companies who very rarely, if ever, pay for what we write and they publish. Oh, we also do the reviewing of the work submitted, and also the editing. Then we or our institutions pay a lot of money to buy our labour back again (the journal). My personal favourite is when you find an article via a publisher’s site, you or your library don’t subscribe, and they want, usually, anything from $20 to $50 for one copy of one article – and it’s already digital!

Once upon a time not very long ago we needed to do this because to disseminate your work you needed a printing press and a distribution network. Both were costly. Then public research money invented the internet, and we don’t need that private infrastructure anymore.

(I’ve written elsewhere here about how this does my head in. I go to meetings where I am surrounded by people who can tell me that we’re subject to patriarchal, neoliberal and colonialist assumptions, but the backwards, obsequious kowtowing that this represents just goes through to the keeper. Call me old-fashioned, but there is something about getting one’s own house in order that might be relevant here.)

Anyway, back to the trail. So a Fields Medal winning mathematician wrote a blog post pointing this out and indicating they would boycott a particularly rapacious publisher. It has snowballed. But I’m holding my breath. In the meantime Sage sent me this invitation:

Dear Adrian Miles,

If you are interested in reviewing journal manuscripts, we would like to invite your participation in SAGE Open, an innovative peer reviewed open access journal from SAGE. Manuscript reviews are an important part of the publication process. Reviewers gain valuable academic and publishing experience.

If you are interested in reviewing for SAGE Open please click here.

Potential reviewers should have published in a peer-review journal and should have current knowledge in their area of expertise. If you accept an invitation to review for SAGE Open you should be prepared to promptly return your review.

SAGE Open has received over 1,000 manuscripts in the last year. We also encourage you to submit your manuscript through SAGE Track, SAGE’s web-based peer review and submission system. Submitting your manuscript is free.

Only if your manuscript is accepted will you pay the author acceptance fee of $395 (discounted from the regular price of $695)! For more information, view the SAGE Open manuscript submission guidelines.

Now, some disciplines have journals that you do pay to be in, but not mine. So here we have a publisher wanting me to donate my labour to them for their publication, donate further labour through reviewing, and then pay $400 for the privilege. Um, this is so arse about as to be just embarrassing for all concerned. It is sort of dressed up as crowdsourced academia meets social media (though this has always been the modern academic model), except its purpose is precisely the income stream. Two words. Fuck off.

Sophie, an Ebook Sort of Thing

From the site:

Sophie is software for writing and reading rich media documents in a networked environment. Initially designed and developed under the auspices of the Institute for the Future of the Book, Sophie is currently being significantly revised and improved, thanks to a generous grant from the Mellon Foundation in the fall of 2008. Sophie 2.0, with added features and improved stability, will debut October 15, 2009.

Version 1, as currently available, reminds me of the Voyager Expanded Book Toolkit (which I think I’ve still got in a box somewhere), though of course it is now open source, network enabled and so on. It will be nice to see what version two does, and it is the sort of thing that honours students could make use of.

Day of Digital Humanities

I’ve always toyed with a possible ‘day in the life’ project where you use digital tools to let a community document itself for a day. Capture the local paper and radio, set up places where people can scan whatever they like, make cameras available. And provide a simple front end so the community itself can upload, annotate, tag, and self-describe their own stuff. A snapshot. So it is with pleasure that I read an email from Geoffrey Rockwell describing the third annual Day of Digital Humanities.

International Handbook (2)

My brief post last week got comments from Andrew and the inestimable Jeremy H (one of the editors of the Handbook). Given how rarely I have been writing here, and how little the writing that has happened has amounted to, I’m sort of surprised that anyone noticed. Which reminds me, I should just turn off comments. Yes, they can massage you. Yes, you can even get some sort of conversation started. But I’m all for flat rhizomes. Comments make little toy peaks, not links.

So, Jeremy pointed out that the model is that of a reference book, so not really for personal consumption or purchase. The business model (which, after all, is surely what we have to call this) is that your university library basically buys a licence, and then you can freely pull what you want into your courses. (That probably won’t work here without quite a lot of detailed correspondence, as copyright is very strictly enforced via our online delivery engine. If person X chooses a chapter from an anthology I can also use it, but if I want a different chapter from the same anthology the answer is no.) Andrew also raises the point that there is not enough information there to actually tell what your class would get access to. Is it a PDF of the chapter? Or, as he wonders, “will [it] be one of those annoyingly impossible to read ebooks online that publishers seem keen to foist on libraries”.

The bit I struggle with (though this Handbook might actually be different) is:

  • academics write the content (generally for free)
  • academics have to review the work to ensure it is of a sufficient standard (generally for free)
  • academics then have to purchase the work to actually use it

Now, when we didn’t have electronic networks and books were expensive to make (capital costs of access to printing machines and technologies, proofing required manual changes to things, and so on), publishers were essential intermediaries. Much like the role of TV and radio stations, or cinemas, once upon a time. But this book does not need to be printed; in fact it is premised on not being printed. We also know that we could host this material (all peer reviewed and so on, so the same quality of material) in something as simple as HTML or PDF on a web site and let anyone use it. For free. So what does a publisher add here? The answer is simple, and sad. It adds an external metric of validation so that we, as academics, can claim publication by an academic publisher. This imprimatur makes it count, as if the publisher actually establishes a legitimate benchmark. But it doesn’t. It is the academics who do the review that do this. Publishers are in the business of making money. If they think the title will sell, they publish. That comes first, above all else (which is why education texts, i.e. textbooks, matter for them). It is venture capitalism and so has little, at root, to do with research value, impact or quality. There are so many other ways this could be done today, yet we remain mired in the hegemony not of the book (this is not, really, a ‘book’) but of publishers. We do this voluntarily, willingly, slavishly, yet so many of us will critique the hegemonic sway of [insert preferred ideological title here] in [insert preferred genre, style or media form here], while remaining blind and indifferent to our own, similar practices. Maybe it is my middle-aged realisation that my career is not what I thought it would be, and that this is it, that makes a difference for me. I intend to write and publish where my ideas are best suited as themselves, whether that be a minor non-ranked journal or the ERA ranked kick-ass most important journal in the universe.

It is a great looking collection, that Handbook of Internet Research.