Middle Data

This is a literal lift from a piece I wrote as part of the Living Archive project I am involved in:

big data

If you do anything that falls into the current catch-all that is the digital humanities, big data is hard to avoid at the moment. Big data means, more or less, humanities projects that want to use data sets that are big. Already notice the change: humanities scholars and something called a ‘data set’. Not some books, films, paintings. And then this data set is to be big. How big? Science big (well, not really, as their big data sets make humanities data sets look small).

The thumbnail sketch of this is often via Moretti’s ‘distant reading’, where instead of closely reading a small set of texts (the novels of Dickens, the films of Ford, or even a sample of romance novels or westerns) you develop methods to analyse enormous numbers of novels. As a consequence the questions you ask are different, and also, once appropriately digitised, various pattern matching exercises can be performed simply to see what there is. So, instead of looking at 20 novels in your research, you study, say, 20 million.

This requires infrastructure – bandwidth, storage, tools, labs, boffins and geeks. Heady stuff. And most of all, it needs money. Politically, in some sectors of the research/university universe this is doubly attractive, simply because our performance is routinely measured not just by publications and successful grants, but by the size of these grants. A single funded big data humanities project therefore becomes particularly attractive because, measured against your institution’s research criteria, you are plainly being more successful – for better or worse, more research income always trumps less, regardless of outcomes (well, up to a point).

Manovich’s lab is into big data, and he’s written some useful pointers and case studies from his lab (for example his Vertov essay, and the chapter in Berry’s anthology). Because of the scale of things, visualisation becomes significant in this domain, and so there is a lot of highly skilled and interesting work that lies at the intersection of computing, digital humanities, visualisation, and then what questions to ask.

middle theory

This was Bordwell and Carroll’s shot across the bows of big theory cinema studies. Grand theory, in their critique, was continental inspired work that presumed to have a theoretical framework that was all encompassing, which was then wheeled out and placed over cinema (films, audiences, institutions, it didn’t really matter) but disregarded the specificity of the objects under study and historical practice. At times their description of ‘grand theory’ risked caricature (Sinnerbrink’s New Philosophies of Film covers this material well), and as a card carrying member of some forms of grand theory I am unwilling to dismiss it in quite this manner. However, what they do offer is what they describe as ‘middle theory’, which is specifically grounded in smaller, localised samples, looks at (well, so they’d like to think) what is, and then theorises about that. (Actually I don’t think Bordwell and Carroll are nearly as elegantly hermeneutic as that, but that’s by the by here.) What I like though is the intent and inclination here. You look closely, not naively or without some opening questions or gambits you’re wanting to wonder about, but it is a much more open process than that proffered by high theory where the risk, in terms of the object of study, is that the theoretical framework becomes tautologically validated using the film as evidence, rather than the other way around. It is a method that in some ways can be thought to put the (romantic?) desire to understand the films (in the case of Bordwell and Carroll) first, rather than the doing of theory as an end.

middle data

So between the enormity of big data, and the traditional intensely close reading of what we might as well describe as small data (the westerns of Ford, the tragedies of Shakespeare), we can see the rise of what I would like to describe as middle data. What is middle data? It is a methodological field, not an algorithmic process (for those that recognise it, this is from Barthes’ “From Work to Text” as a nod to the importance of grand theory to my own work) which means that it lies between the sorts of new practices that emerge when we apply novel digital techniques to things that can be treated as data using what I’d like to think of as more traditional propositions.

The living archive project is, pretty much, a middle data project. The data set is small, and focussed, though larger than what you’d ordinarily look at in depth for the usual close reading. It is not large enough to be capital ‘B’ big data, yet since it is thoroughly digital in its methods and how it has been thought of in itself as an inherently digital system it has been made in a way that enables and facilitates a rich variety of other research methods or propositions. By inherently digital I mean the project, from inception, was never conceived of as merely the digitisation and dissemination of an analogue collection onto the web, but looked to use the affordances of the digital in concert with the network to enable, invent, test, play with, other sorts of possibilities.

Some of these methods are quite simple, for example the ability to curate clips (acts) from performances into novel series outside of the shows in which they appeared through informal tags and curation into collections. This makes it reasonably simple for a scholar to collect all acts of a certain type into their own collection, to perhaps compare changes or even similarities in performance across the history of the circus, or perhaps to look at the history of costume, performance style, gender and performance, or public sphere politics, physical circus and performance.

However, it is more interesting than being able to curate parts into collections because the overall digital system has been built as a platform, which means it provides APIs that let new and different interfaces to the data be easily written. This means it is a relatively lightweight (and agile) task to write different interfaces to interrogate the data in different ways and to present the outcomes of this in different ways. For instance, PhD candidate Reuben Stanton spent only a few hours developing an interface to retrieve and then visualise how many shows are currently public versus how many are private. This visualisation, which is currently known within the project team as ‘the iceberg’, dynamically draws a timeline and places public shows above the ‘waterline’ and private below. While pragmatically useful to simply indicate, and encourage, further marking up of the nonpublic shows so that they can be made public by Circus Oz, it also indirectly reveals other things.

circusoz Timeline 01.jpg


For instance, there are few shows in the early years, then a peak of recorded shows, and then a more recent decline. The recent decline can be partly explained by a backlog of digitisation and upload, but it could also be speculated that the preserved record appears like this because video becomes an available (though expensive) technology, with an early spike of interest and use. It is not, though, very simple or straightforward to use, and so is not used regularly after initial enthusiasm. (Of course there is also the obvious point that trying to preserve the original media for so long in now obsolescent formats means that less will be usable.) The rise could correspond to the development of new formats like Digital DV, where tape length is now much longer and more suited to recording a show, storage is cheaper and simpler (tape costs per hour plummeted at this time), and so it becomes easier, cheaper, and less disruptive to record a show. The tapes are physically much smaller and it is literally a simple task to film and then stick the unedited tape into a drawer. Finally the more recent decline might reflect the move to a completely digital mode where, to begin with, it was trivial to record a lot (and we all did) but then storage of non–tape media (we could call it non physical but that isn’t really accurate) on hard drives was informal, and recordings may have been stored willy nilly then erased, lost, forgotten, deleted and otherwise indirectly treated as ephemeral.

This is speculative, and none of it may be correct. However, the ability to interrogate the data through different questions and to visualise this, simply and quickly (in this case how many shows are there?, from when? which are public? and draw this) means the system and research project supports the ability to easily develop novel questions or propositions utilising the data. This means there is an agility to how the system can be utilised in addition to its explicit role to provide access to the company and the public to the record of performance of Circus Oz.
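As a rough sketch of how little is needed to ask those questions (how many shows, from when, which are public, and draw this), here is a small Python example. Everything in it is an assumption made for illustration: the record fields `year` and `public`, the sample data, and the text rendering. It is not the project's API and not Reuben Stanton's interface.

```python
from collections import Counter

# Hypothetical show records; the real archive's API and fields are not
# described here, so these names and values are invented.
SHOWS = [
    {"year": 1979, "public": True},
    {"year": 1979, "public": False},
    {"year": 1985, "public": False},
    {"year": 1985, "public": False},
    {"year": 1985, "public": True},
    {"year": 1992, "public": False},
]


def count_by_year(shows):
    """Return {year: (public_count, private_count)}, ordered by year."""
    public = Counter(s["year"] for s in shows if s["public"])
    private = Counter(s["year"] for s in shows if not s["public"])
    years = sorted(set(public) | set(private))
    return {y: (public[y], private[y]) for y in years}


def render_iceberg(counts):
    """Draw a text timeline: public shows above the waterline, private below."""
    years = list(counts)
    top = max((pub for pub, _ in counts.values()), default=0)
    bottom = max((priv for _, priv in counts.values()), default=0)

    def row(level, index):
        # One four-character block per year if that year reaches this level.
        return " ".join(
            "####" if counts[y][index] >= level else "    " for y in years
        )

    lines = [row(level, 0) for level in range(top, 0, -1)]
    lines.append("~" * (5 * len(years) - 1))  # the waterline
    lines += [row(level, 1) for level in range(1, bottom + 1)]
    lines.append(" ".join(str(y) for y in years))
    return "\n".join(lines)


print(render_iceberg(count_by_year(SHOWS)))
```

The counting and the drawing are kept apart on purpose, so that the same counts could feed a different question or a different picture.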

At the moment I like to conceive of this as ‘middle data’ as it utilises computational analysis, a constrained data set, and visualisation to facilitate varieties of close analysis (small data?). It is a method that implicitly requires and relies upon the digital and computational (it is not just digitisation but is also varieties of analysis and calculation enabled by the computer) to discover patterns and so undertake closer analyses of our object of study.

searching

For example, in 2000 a colleague and I developed a film annotation system (the gloriously named Smil Annotation Film Engine, aka SMAFE) which allowed me to add in and out markers (timecode) to an entire film and to then extract these shots and sequences in various ways and to show the results. The film was John Ford’s 1956 western The Searchers and I marked up the film manually around the presence of doorways. This was based on an intuitive hunch I had, where in Ford’s work there is a clear distinction made and used about the inside and outside of the home and so I believed doorways may be a significant though unrealised element of the film’s mise-en-scène. Simply viewing the film, and trying to notice doors, is one method. However, once marked up the system then allowed me to search on specific criteria (camera outside, inside, looking in, looking out, and so on). A strong poetic pattern around doorways was immediately apparent, and visible. The point here is that the tool allowed the exploration of questions normally associated with close reading using methodologies that relied upon the affordances of the computational. This is a method that worked in concert, and so then supported a more developed close reading of the film – just how were doorways part of the film’s mise-en-scène? why? and what might this contribute to our understanding of what the film might be about?

TheSearchers.jpg
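To make the mark-up-then-search idea concrete, here is a minimal Python sketch. It is not SMAFE (which was built around SMIL). The `Shot` structure, the tag names and every timecode below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    start: float      # in point, seconds from the start of the film
    end: float        # out point, seconds
    tags: frozenset   # hand-applied labels for this shot

    @property
    def duration(self):
        return self.end - self.start


# Invented timecodes and labels; not the original mark-up of The Searchers.
SHOTS = [
    Shot(0.0, 42.0, frozenset({"doorway", "camera-inside", "looking-out"})),
    Shot(42.0, 60.5, frozenset({"landscape"})),
    Shot(310.0, 325.0, frozenset({"doorway", "camera-outside", "looking-in"})),
    Shot(7100.0, 7140.0, frozenset({"doorway", "camera-inside", "looking-out"})),
]


def search(shots, *required):
    """Return the shots that carry every required tag, in film order."""
    wanted = set(required)
    return sorted((s for s in shots if wanted <= s.tags), key=lambda s: s.start)


hits = search(SHOTS, "doorway", "camera-inside")
for shot in hits:
    print(f"{shot.start:>8.1f}s to {shot.end:>8.1f}s  {sorted(shot.tags)}")
```

The looking still has to be done by hand. What the computer adds is the ability to pull every shot that matches a hunch back out, in order, and view them side by side.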

references

Barthes, Roland. “From Work to Text.” Image–Music–Text. Trans. Stephen Heath. London: Flamingo, 1977. 155–64. Print.

Bordwell, David, and Noël Carroll. Post–Theory: Reconstructing Film Studies. Madison: University of Wisconsin Press, 1996. Print.

Moretti, Franco. Distant Reading. 1st ed. Verso, 2013. Print.

Moretti, Franco. Graphs, Maps, Trees: Abstract Models for Literary History. Verso, 2007. Print.

Sinnerbrink, Robert. New Philosophies of Film: Thinking Images. 1st ed. Bloomsbury Academic, 2011. Print.

Manovich, Lev. “How to Compare One Million Images?” Understanding Digital Humanities. Ed. David M. Berry. London: Palgrave Macmillan, 2012. 249–278. Print.

Manovich, Lev. “Visualizing Vertov.” Russian Journal of Communication 5.1 (2013): 44–55. Taylor and Francis. Web. 24 May 2013.


Archaries and Libchives

Libraries as ‘museums of marginalia’. Fascinating presentation by David Pearson, Director of Culture, Heritage and Libraries for the City of London Corporation, about how marginalia shifts something that is more or less anonymous (my words) into an artefact of value. (Reminds me of an old project by computer scientist and ethnographer Cathy Marshall where she made a prototype annotation program for a laptop; her method involved buying heavily annotated second hand editions of text books at university book stores, and interviewing their purchasers, to model existing annotation practices. Was a seriously cool prototype, which, of course, went nowhere back then.) In relation to the Circus Oz living archive project this, for me, is highly suggestive as it is what we have been building. Which then raises the question of to what extent it is an archive, a library, or some intriguing emergent hybrid: an archary or libchive perhaps?


Sometime Around 7am

dawn.jpg

Recording media, all recording media, as sampling machines. A camera takes a single sample, of a more or less contracted instant. A film camera takes 24 visual samples per second. An analog sound recorder makes a continuous sample of a microphone’s diaphragm, a digital sound recorder samples 44,100 times per second (that just does my head in), while with digital video we measure sampling usually, like the film camera, in terms of frame rate.

In all cases the technology of recording is indifferent to what it records. A camera, microphone, film stock, SD card, or lens doesn’t get more interested because something exciting is happening. The operator might, or indeed does, and that is why the recording machine is turned on to sample in the first place. But the technology itself, the machine, just samples, usually strictly and regularly.

I am using this as a basis for a new small speculative project. Each morning, somewhere around 7am, I stand at the same point in my front yard and film the ridge over the way. I am using Vine because while it records H.264 compliant video at 30 frames per second (so a specified sampling rate) it imposes a second order sampling constraint where each clip is limited to six seconds. I am, in this project, turning myself + Vine into a sampling machine where the sample happens at a specific time – 7am daily, and is geo constrained (if I’m not at home a sample will not be made).

The project, tentatively and imaginatively called “sometime around 7am”, is a digital video materialist poetics where I become a sampling relay and instantiate the same sampling role as a media recording machine. The time frame is enlarged (six seconds every 24 hours or so), but that just shifts it from the mechanical, to the digital, to the human.
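The two orders of sampling can be put side by side with some quick arithmetic. This is only a sketch: the rates are the ones mentioned above (24 and 30 frames per second, 44,100 audio samples per second), and it assumes, more strictly than the project does, that the clips fall exactly 24 hours apart.

```python
from fractions import Fraction

SECONDS_PER_DAY = 24 * 60 * 60  # assumes samples are exactly 24 hours apart


def samples(rate_per_second, seconds):
    """Number of samples a machine takes at a fixed rate over a duration."""
    return rate_per_second * seconds


# First order: each machine's own sampling rate over one six-second clip.
film_frames = samples(24, 6)        # film camera
vine_frames = samples(30, 6)        # Vine video
audio_samples = samples(44_100, 6)  # digital sound recorder

# Second order: one six-second clip per day, as a share of the whole day.
clip_share_of_day = Fraction(6, SECONDS_PER_DAY)

print(film_frames, vine_frames, audio_samples, clip_share_of_day)
```

So each morning's clip is 180 frames, and the human-scale sampling machine is switched on for one 14,400th of the day.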


The Questions

As we near the end of another semester of a subject that revolves around network literacies, online video, multilinearity, and, well, making as strange as I can the world for some students, I read the usual (and inevitable) complaints about Korsakow. Why use it? Why use something that isn’t ‘industry standard’? Why use something we won’t use again?

Some Quick Answers

  • In an emerging field (new media, internet practice, network specific media practice – that isn’t merely naive) that is being invented and debated while we teach, what, exactly, could ‘industry standard’ actually mean?
  • If you want merely industry standard (an industry by the way that is throwing money every which way as it tries to figure out how to save itself in the face of fundamental change to media making, consumption, use, the audience, advertising, that is, the way it was) then you’re confusing technical education with university (this is as much the fault of the university as anybody else’s as we trumpet ‘industry ready’, ‘real world relevance’, and ‘work integrated learning’ (which regularly risks being a fancy term for work experience like you might have done in High School) as our features).
  • Experimental practice often uses experimental methods, which often need experimental tools
  • It’s cheap
  • It’s really easy to learn so we can spend weeks thinking about multilinear structure, design, and experience, and not weeks learning how to write a script so a bloody button can change colour because someone clicked on it
  • Network specific practice, that is making in the network rather than making off it and using it merely as a mute publishing vehicle, is about relational media
  • And Korsakow is very good for learning about relational media
  • Ever tried a new food? Drink? Experience? Didn’t like it? Does that mean you won’t try a new taste, drink, experience ever again? I’m serious, what’s with a culture that on the one hand embraces the ephemeral and transitory, yet can’t see value in just playing with something just for the experiment of playing with it?

That’s enough for now. This year though, for the first time, I’ve realised that by the end of semester most of these questions dressed up as complaints (well, no, they’re complaints masquerading as questions) come from the students who haven’t come to the lectures and often not the labs. For those that come, the questions stop. (This could be because they just give up in the face of my stubbornness, which could well be the case.)


Documentary, Innovation, Futures

Documentary, like design, is a future orientated practice. Its intent, even when dealing with ‘history’, is to effect change ahead of itself. As a result of this, documentary as form (what it looks like) and practice (how it is made) has in general always been more innovative and experimental than fiction. I think for this reason, as film making catches up to what we can now do online, all the big changes are happening in documentary rather than fiction. For example there is the idoc project out of Britain, the Open Documentary Lab at MIT, and the IDFA Documentary Lab (Netherlands). Then there is the recent rise of new tools, including new versions of Korsakow in the offing, as well as Popcorn, Klynt, Zeega, and W3Doc. These are all new, but they definitely show that this field is about to take off, so something small scale and personal, such as Korsakow, is a good entrée to this stuff. This is also why we’ve worked predominantly in nonfiction. Nonfiction (documentary) is where this stuff is really gaining purchase.


Starting, Started

scrivenertarget.jpg

I have a book chapter that is due at the end of this month. I had originally proposed to write it around wondering what ‘social video’ is, something I’ve been wanting and meaning to write for a couple of years, and this was an opportunity to do it. Then things changed a bit, partly because the chapter was to be a conference presentation and so on, and so I wanted to write it around another idea I’ve been mulling over, around relationality, archives, and assemblage. In other words I got bored with one and wanted to do the other. I hadn’t written the first, so really, until I got into it, it was a mistake to think there wasn’t much there. (Often the sense that it is boring conceals the actual difficulty and significance of the work to be done.)

So late in the piece I sent through the new proposal, and it got the nod. Except it is late May and the chapter is due at the end of May. At times like this the stress and anxiety of having to start, and finish, work that is going to end up in a book is very high. So I then invent other ways of not starting. It doesn’t take much. Then, at some moment, I realise that I have to do it, that after years of repeating this cycle (where I also know that once I start writing it will generally go quite well) I really have to take some responsibility for myself. So I start. I just begin writing, and then in [Scrivener](http://literatureandlatte.com/) put in the target (7000 words), and give myself until the end of the first week of June to finish, and let Scrivener work out the rest. 400 words a day, that’s it. The relief is enormous, particularly when I realise that I’ve already written 500 words in a couple of sessions.

On the other hand, once the editors said yes to my new proposal, all of a sudden the original one looked interesting again. Something here perhaps about the grass always being greener? Also about one of the methods I use to not actually do something? Perhaps. Probably.

Things to notice here.

* I granted myself a short extension. Because I know the editors will nod their heads, I know that others will be late (academics who submit work on time are less common than those who will be late), I know that if I ask for the time they will give it as editors much prefer knowing that it is coming rather than wondering where it has got to, and that they also said yes quite late in the piece.
* That I procrastinated, deferred, and that the longer this went on the more extreme the anxiety (and guilt) became. It is easy for this to become debilitating (in my own case I once did this for a major anthology to the point where at the 11th hour I apologised and withdrew; I haven’t been asked again).
* That this anxiety is quite normal, and healthy, but if you leave it to grow it is not.
* That just breaking it down into smaller parts makes something big (bloody hell, 7000 word chapter, where to start, how), manageable and approachable.
* That you can imagine something isn’t worth doing, but the test is in doing it.
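The breaking down into smaller parts is only division. A sketch of the arithmetic in Python, using the numbers from this post (a 7000 word target, 500 words already written, 400 words a day). The function names are mine and the 17 day window is a hypothetical figure; this is not how Scrivener itself is implemented.

```python
import math


def days_needed(target_words, words_written, words_per_day):
    """Whole days of writing left at a fixed daily pace."""
    remaining = max(0, target_words - words_written)
    return math.ceil(remaining / words_per_day)


def daily_target(target_words, words_written, days_left):
    """Words per day needed to reach the target in the days that are left."""
    remaining = max(0, target_words - words_written)
    return math.ceil(remaining / days_left)


print(days_needed(7000, 500, 400))   # 17
print(daily_target(7000, 500, 17))   # 17 days is an assumed window
```

Seen this way the chapter stops being 7000 words and becomes a small, repeatable number.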


Archives, Relations, the Sensory Motor Schema

An archive is a collection policed by archivists. An archive is a collection haunted by the rigours of integrity. An archive is, at heart, a closed institution.

An archive is usually thought to be made up of things, the objects that it is an archive of. It is the presence of these things that constitutes the archive as an archive. However, archives secretly aspire to be more than just this lump of things: access and use come to matter. To be usable the things in an archive need to be thought of as empty or mute so that they can come to be used. That is, a minimal amount of constraining context is provided, always loosely, so that the things in the archive can more easily be placed in other contexts. This is not what has happened online.

Things in an archive could have any number of possible relations to other things and those that are deemed to matter (whether historical, political, social, cultural, contextual or merely contingent) will express a reduction or lessening of these relations amongst all those possible. This is Deleuze and Guattari’s rhizomatic rule of n − 1.

This means the attribution of relation to the things in an archive is always a reduction, not an addition, to what it could be.

Relations are of interest to archival thought because relations are, by definition, external to or outside of the things themselves. This means they are not properties of the thing, but are brought to bear upon the thing. This also suggests an archive can be thought to be less about the things it contains than about the possible relations that can be facilitated around these things.

Online the model that has developed is different to the usual conception of the archive because it is user, not artefact, centred (YouTube, Flickr, Cowbird). Here user centred means the archive is conceived of as a system to let individuals archive their practice (through their use of media, which is the vehicle to document practice; that it involves media is secondary, not primary). This rapidly evolved into collecting, curating, cataloguing and collaborating on content. Here the archive demonstrates the key networked attributes of granularity, porousness, and facets. So, can we conceive of the archive, in general, as consisting of open and flat things (a flat ontology) and the archive in itself as the system of relations it enables? Something like lego bricks? An archive as then an architecture for possible relations?

As a system of relations, and even possibly systems of systems of relations, online archives as web services are less an archive of what was than a performing of the everyday through their media traces. This also means they have qualities of the factual and the nonfiction as informal documentary trails.

For example, a system such as Cowbird offers nonfiction tableau. When each is machine connected they enter into emerging, variable and fuzzy series. These series are not intentional in an authorial sense, at best it is a programmatic intentioning.

Platforms such as these (and they offer a compelling template for the sorts of archives that are network based) let small pieces be crafted into other things. These series that they form are not stories. At best they can be a constellation of stories, though I think that is being generous, as there is nothing intrinsic to these procedures to mean that they are first of all narrative. Instead, narrative is a consequence of programmatic procedures, not the other way round (so small parts can be collected from people and these can be assembled into stories, but the small parts themselves do not need to be narrative).

They are then ergodic and cybertextual assemblages. As Anderson and McFarlane argue:

Assemblage is a term often used to emphasise emergence, multiplicity and indeterminacy, and connects to a wider redefinition of the sociospatial in terms of the composition of diverse elements into varieties of provisional sociospatial formation. To be more specific, assemblages are composed of heterogeneous elements that may be human and nonhuman, organic and inorganic, technical and natural. (Anderson and McFarlane)

How can we think critically and theoretically about these sorts of things? By going sideways. Deleuze describes the cinema as a particular system of archival assemblage engines: thinking. This system of thought relies on Bergson’s sensory motor schema where some things are understood to perceive, decide, and act. These perceiving things constitute themselves as a centre from which some things get noticed, and others don’t, and what gets noticed becomes the source of what is decided to be acted upon, what is decided to be done. There is a gap, or interval, between what is noticed and what is decided to be done in response to this noticing, and this gap, because it introduces variability and choice in relation to what could be done, is thought of as a centre of indetermination.

In the case of the cinema this gives rise to the perception, affect, and action images. These three terms provide an elegant framework by which to understand online works as you and/or the system notices, decides, and then does. Even more so, online projects such as the living archive demand this, as what is presented on screen is literally a centre of indetermination in terms of what is to be noticed and then done.

From this we can see that narrative, interactivity, and database aesthetics are a consequence of the sensory motor schema, not its cause. Furthermore, noticing and doing fall within the realm of experience and interaction design, and as Deleuze’s schema indicates, it is the distance between these terms, of noticing and doing, that comes to matter. The more closely aligned they are, the more instrumentalised the interface and the experience. The further apart they are, the greater the centre of indetermination, then the more affective the work becomes.

Affect is then about indetermination, uncertainty, and interruption, and from this we can see that systems such as the living archive aspire to be affective assemblages. It is this that makes them living systems, constituted by their ability to allow new varieties and densities of relations to be formed amongst their parts.

Anderson, Ben, and Colin McFarlane. “Assemblage and Geography.” Area 43.2 (2011): 124–127.


Lists

Lists. An alternative to narrative.

poems, lyrics, dance, paintings, photographs collected under a theme, family albums, my record collection, our constrained tasks, the collection of someone’s Vine clips, the Vietnam war memorial wall in Washington, Williams’ Changed the Locks, Eco’s The Infinity of Lists, A poetry of lists: heuristic approaches to complexity and ambivalence, Latour Litanizer, list of my favourite films.

Lists don’t have ends, they open up connections and possibilities. They celebrate that things are actually and always densely connected rather than pretending they aren’t. This is why they are common with monuments as to narrate is to categorise and separate and claim to be able to know that which can’t be known. So an ethics to the list.

And my favourite, “things that make the heart beat faster” from The Pillow Book of Sei Shonagon:

Sparrows feeding their young. To pass a place where babies are playing. To sleep in a room where some fine incense has been burnt. To notice that one’s elegant Chinese mirror has become a little cloudy. To see a gentleman stop his carriage before one’s gate and instruct his attendants to announce his arrival. To wash one’s hair, make one’s toilet, and put on scented robes; even if not a soul sees one, these preparations still produce an inner pleasure.

It is night and one is expecting a visitor. Suddenly one is startled by the sound of rain-drops, which the wind blows against the shutters.

Which formed the basis for this experimental iBook.


Qwiki

Qwiki (there’s a video demo), which used to be a web service, has reinvented itself as an iOS app. It uses your camera roll to make movies. These increasingly automated services are the future of what we are describing here as everyday media. They are the video version of the templates that Microsoft Word includes for letters, or business cards, and the like. If it is anything like the earlier web service, it will be very good very quickly.
