Programmatic Statements for a Facetted Videography

Original Citation:

Miles, Adrian. “Programmatic Statements for a Facetted Videography.” Video Vortex Reader: Responses to YouTube. Eds. Geert Lovink and Sabine Niederer. Amsterdam: Institute of Network Cultures and X@kAll, 2008, 223-30.

What happens to editing when video moves from a hard to a soft environment? This chapter is a rough-cut sketch that explores what video editing is, and the implications of this for an emerging, network specific video practice. While this essay discusses video with some degree of specificity, the practice under consideration is not video art but those works that are, for want of a more accurate term at this historical point, representational and indexical in some manner. They’re videos of things. Such representational practices dominate internet based video practice, including commercial, populist, critical and creative uses.


Granularity is a term that is appropriated from hypertext and refers to the smallest meaningful unit within a system. In hypertext this would be a node, in a blog it would probably be a post, and in video this is the shot. Obviously what constitutes ‘smallest’ and ‘meaningful’ are sensitive to different contexts, so that in classical hypertext a node could contain a single word, a phrase, or several paragraphs, as could a blog post, and of course a shot could be of extremely brief duration through to the recent examples of 90 minute plus continuous takes. However, historically it has been the granularity of the cinematic, and now videographic, shot that has provided the basis of cinematic practice as the capacity to subdivide a shot into smaller parts, and then join them to other similarly subdivided shots, is the basis of editing which forms the keystone to cinematic narration.

From the point of view of granularity the most significant feature of the shot is that it is always and already whole. You can’t have ‘half’ a shot: if a shot is twenty seconds long and you cut it in half, you end up with two shots of ten seconds, each of which is still whole. This, of course, demonstrates that the ‘wholeness’ of a shot is qualitative, not quantitative, so that the integrity of the shot is not tied to scale or even duration. This is a significant feature of the shot, and while not unusual in the general scheme of things (our emotions provide a common enough example of something that is qualitative in the sense being discussed here) it is quite unusual in terms of a discursive and creative mode of practice, because in so many other ways of doing, to cut something in half, or into other sized bits, produces quite different things. For example, you can’t just cut a sentence in half and still have a meaningful unit, nor a book, nor a line of a poem. Yet in video the granularity of the system is such that it can be subdivided in terms of duration and still be immanently meaningful: it is still a shot of a gun, or a vase of flowers, or of someone walking.

These are the wholes that film deals with, and this attribute of wholeness is external to the shot precisely because the shot can be subdivided. If this were an internal quality then cutting the shot would qualitatively change it, but as is well documented the most significant way in which the shot can be fundamentally altered is by the relations it is placed within — where and how it is placed within a sequence. This provides evidence of the external relations that are a necessary attribute of the shot, as the meaning or value attributed to the shot is highly contextually determined by these sequences. What that image of a woman’s face is understood to mean (apart from its simple and possibly trivial denotation as a particular woman’s face) is determined by the shots it finds itself surrounded by.

However, once we recognise the importance of such external relations we can see that any shot must, by definition, exist in a multiple set of possible relations with other shots (this is what allows for editing in the first instance), and that the specific art of editing in traditional film and video practice is of course the determination of these relations into a fixed, canonical and singular linear form. (A form that in the traditions of all good modernist and romantic aesthetics will appear to make perfect good sense in and of itself.) Editing is therefore the production of relations between small wholes into larger wholes where the larger whole (the sequence, the work) appears to be self sufficiently whole. These variable wholes are possible because their constituent parts have a high level of granularity.


This granularity has been very important to the relevance and use of digital technologies in film and video editing, since non–linear editing systems offer the sorts of functionality in relation to sound and image that word processing has afforded text. In traditional film editing (as with the typewriter and traditional typesetting) sequences had to be edited manually, and there was no way to preview or visualise any transitions between shots apart from direct cuts. Older forms of video editing were even less flexible than film because they relied upon linear tape systems, so in many instances it would be impossible to insert an edit into an already cut sequence without overdubbing whatever footage was already at that point on the videotape. Computer based non–linear editing obviously does not have these limitations, and so allows for the visualisation of a wide variety of transitions and effects, and of course the insertion of new material at any point into the timeline, with the ability to shift existing edited footage to accommodate the new insertion or, if you prefer, to overwrite existing footage.

This suggests that video’s granularity (like text in word processing) has been instrumental in facilitating the development of digital editing and desktop cinema: if video were not made of small parts with loose connections then the applicability of computing to video editing would have been lessened. These systems, just as with word processing, offer all the advantages of the digital for the production of content, but remove them for the user at the point of publication. For example, while using a word processor it is trivial to move text, annotate it (with voice, image or other text), change fonts, resize the screen and so on. But as a word processor all of these tools are actually directed towards getting those words on paper (hence pagination, page numbering and so on). Once on paper, all of those things just listed (and many others) are gone. It is exactly the same with video, where the video work is malleable and fluid in quite extraordinary ways while being edited, but once committed to publication these features are removed: it becomes resolutely and immutably flat. This is what I have, elsewhere, described as the distinction between hard and softvideo, where in softvideo it is possible to imagine a video architecture and practice that is able to retain this granularity after publication, where videos can be created that consist of shots that no longer have a canonical sequence. The multiplicity of possible relations between shots, which granularity affords, can then be preserved and made available to the user or viewer as a material property of the completed video text.

Two Softvideo Systems

Two projects that achieve this, albeit through different strategies, are Videodefunct and the Korsakow System. Videodefunct currently allows the publication of clips or sequences that are individually tagged and then dynamically displayed through a triptych structure based on the user’s selection of tagged terms. By having a suitable reservoir of clips, with enough tags (so that clips share a large range of tags, many of which they have in common), the user can compose, in concert with the system architecture, individual videographic works by selecting individual tags. In Videodefunct the user selects a tag from an initial list. This generates a series of thumbnails where, again, the user makes a selection. This loads and plays a video in the central pane of the video triptych, and simultaneously generates relevant tag lists under the remaining two, empty video windows. Selecting these tag lists reveals a thumbnail index, which then allows videos to be loaded and played when selected. What may appear, and what sequences may be developed, are subject to this play of author defined, user selected tags and clips, with the sequences shown, and the relations created between sequences via the triptych video panes, always being variable and open through the ongoing aggregation of additional content (more clips) and of course by users selecting other tags or even repeating the same tags which can return other clips and sequences.
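The tag-driven selection just described can be sketched in a few lines of Python. This is only an illustration of the general idea, not Videodefunct's actual code: the `Clip` class, the clip titles, the tags, and the assumption that the 'relevant' tag lists offered under the empty panes are simply the tags of the clip now playing are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A published clip plus the tags its author attached to it (hypothetical model)."""
    title: str
    tags: frozenset


# Invented library for illustration; a real library would be far larger.
LIBRARY = [
    Clip("tram", frozenset({"city", "movement"})),
    Clip("harbour", frozenset({"water", "city"})),
    Clip("swimmer", frozenset({"water", "movement"})),
]


def clips_with_tag(library, tag):
    """Return the clips offered as thumbnails once the user selects a tag."""
    return [clip for clip in library if tag in clip.tags]


def tags_for_side_panes(playing):
    """Assumption: the tag lists under the empty panes are the playing clip's tags."""
    return sorted(playing.tags)


# The user picks a tag, then a thumbnail, and new tag lists are generated.
thumbnails = clips_with_tag(LIBRARY, "city")
centre = thumbnails[0]
side_tags = tags_for_side_panes(centre)
```

Because clips share tags, each selection opens onto further clips, which is where the variability of the resulting sequences comes from.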

Florian Thalhofer’s Director based Korsakow System achieves a very similar outcome through the use of what is in effect a tagged clip library which supports basic Boolean operations. Within this architecture a clip can have any number of text tags applied to it, including at specific points in an individual clip’s timeline, and the engine searches for matches to these tags from its library based on the authored rules. This produces very complex associations between clips in the system, which can be as open or as closed as you wish. In other words clips can have lots of possible connections to other clips or a highly constrained set, and through the use of its Boolean rules the system can make connections based on the usual criteria of ‘is’, ‘is not’, ‘else’, and ‘if’. In addition it is able to preserve rudimentary state information and utilise this as a parameter, so that the number of times a clip has been played can be used as a governing rule for clip selection (or non selection). For example, a central video plays, and as it plays the system identifies clips that meet the criteria that the author has defined. These criteria might be that at the beginning of the active clip a search is made to find other clips that match a specific term, then at twenty seconds to find clips that don’t contain a specific term, and at thirty seconds to select a clip at random. These clips are displayed as thumbnails below the central video window, and selecting any of these loads it in turn into the central window and plays it; this clip will then parse its own arguments and populate the clip pane.

This architecture is very similar to a hypertext system such as Storyspace with its use of guard fields (rule governed link structures) and provides the possibility to produce ‘tangle’ like series within a larger work that are densely interconnected (whether as shots or sequences doesn’t matter), and then narrow corridors or pathways out of such tangles into other densely connected series, or some combination of these, a structure utilised in HTML by Amerika’s Grammatron, and in Storyspace by Joyce in Afternoon: A Story.
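The rule governed selection described for the Korsakow System can be sketched roughly as follows. This is a hedged illustration in Python rather than Korsakow's own Director implementation: the `Rule` and `Clip` classes, the `candidates` function, the `max_plays` limit and the sample clips are all assumptions introduced here to show how timed ‘is’, ‘is not’ and random rules, together with a simple play count, might populate the thumbnail pane.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A search the active clip triggers at a point in its timeline (hypothetical)."""
    at_seconds: float
    kind: str          # "is", "is_not", or "random"
    term: str = ""


@dataclass
class Clip:
    title: str
    tags: set
    rules: list = field(default_factory=list)
    max_plays: int = 1  # assumed state rule: a clip is offered until played this often
    plays: int = 0


def candidates(active, library, elapsed, rng=random):
    """Collect thumbnail clips for every rule whose time point has been reached."""
    available = [c for c in library if c is not active and c.plays < c.max_plays]
    found = []
    for rule in active.rules:
        if rule.at_seconds > elapsed:
            continue  # this rule has not fired yet
        if rule.kind == "is":
            matches = [c for c in available if rule.term in c.tags]
        elif rule.kind == "is_not":
            matches = [c for c in available if rule.term not in c.tags]
        else:  # "random": pick any one available clip
            matches = [rng.choice(available)] if available else []
        for clip in matches:
            if clip not in found:
                found.append(clip)
    return found


# Invented example mirroring the rules described in the text.
library = [
    Clip("street", {"city"}, rules=[
        Rule(0, "is", "water"),
        Rule(20, "is_not", "water"),
        Rule(30, "random"),
    ]),
    Clip("river", {"water"}),
    Clip("forest", {"trees"}),
]
active = library[0]
print([c.title for c in candidates(active, library, 0)])  # ['river']
```

Loosening or tightening the rules attached to each clip is what produces the densely connected ‘tangles’ or the narrow corridors between them.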

In both of these examples we have three major levels of sequence and relation operating. The first is determined by whoever creates and selects the shots or sequences that form the basic clip library within each authoring environment. These are, strictly speaking, hard video as they are fixed in the usual and traditional way of shots and sequences. The second level operates largely through what is commonly known as spatial montage, where relations between shots and sequences are no longer only temporal within a single video window but now spatially distributed across the screen. In the case of Videodefunct this is realised through its triptych of video panes, while the Korsakow System offers a single dominant video window below which appear thumbnails of related clips. Through this collaging of video windows montage moves from being only the sequential relation of parts within a single video window (this and then this) to both the sequential relation of parts and the simultaneous relation of multiple screens to each other. Finally, a third level operates where some aspect of decision making is granted to the system itself where, much like the throw of a dice, the constraints can be quite strict but the outcome remains, and is determined, outside of the user’s or the author’s individual agency.

For Videodefunct and the Korsakow System the attachment of tags to shots in concert with rules of combination proves capable of producing complex patterns and relations amongst their respective libraries. As a consequence this larger video work, that is a single Videodefunct or Korsakow project conceived as a whole, is precisely the generation and discovery of such patterns by users. This poses significant and fascinating problems in turn for narrative practice in such softvideo environments as we move from being video makers creating specific and single video works towards designers of combinatory engines and the possible narrative, and non narrative, discourses they enable.

Relations and Facets

These systems allow us to revisit and reconsider the role of editing. As we saw, it is possible to cut a shot in any number of places and for the shot to retain its wholeness, and to then place this shot into a variety of sequences with other shots, where these sequences will have a substantial, if not a determinant, effect upon the meaning of such shots. As such we can describe the shot as a whole that has multiple possible relations to any other shot where these relations are determined by where the edit is made (an internal series of relations) and what it is then connected to (an external series of relations). I intend to describe these relations as ‘facets’ as facet has connotations of a shot being multifaceted, of having an enormous number of views, or faces, through which it looks out towards other shots, where these facets are not just internal to a shot but are constituted by the very possibility of the relations it may form with other shots. These facets are then not determined internally, as some sort of immanent given where you could catalogue all the facets of a particular shot, but rather they come to be by the interest they arouse (I can’t think of any other way to describe this at the moment) in or for other shots by the attitude or pose they offer other shots.

In practice any edit may have several such facets simultaneously and, as a shot is more or less infinitely divisible (it can be cut at any point), there is an enormous set of facets available. Remember, it is not just each frame that may provide a facet, but also those relations with other shots and edit points that might inform a decision to edit, each of which in turn can be thought of as providing or having facets. They are orientated towards each other by the possible action of an edit.

For example, a simple shot may consist of a figure walking. What I am calling the facets of this shot are all of the possible parameters that may be used (consciously or otherwise) to edit this shot with another. These might include elements of the content of the shot, for example where the figure walks to or from, or what they walk towards. It could include pacing and duration, and the speed of the walk. Shot scale, angle, lighting, graphic patterning, colour, storyline, dialogue and character action and so on all provide facets which can be used in making an editing decision. In determining an edit some facets may be more important than others, and indeed may be more ‘visible’ than others. However, such facets are always a multiplicity and can be thought of as those aspects of the shot that are made to become available to other shots by virtue of the relations established through the edit. Which facets get identified is a consequence of these possibilities of connection. This is, historically, one of the reasons why things like storyboards and shot lists have been developed in professional cinematic and televisual production as they are, if you like, a way to domesticate and industrialise (manage) this multiplicity and so an effort to predetermine and constrain these relations towards normative and narratively hegemonic models with their attendant teleological structures.

Virtual and Actual

Conceptually what I have described as facets have a strong affinity with Lévy’s concept of the virtual and the actual. Schematically, the virtual is that set of possible expanding futures that any instant has before it, where, for example, the possible futures I may have a few minutes from now are much more highly constrained in terms of what I may be doing than one year into my future. In addition, all of these possible futures are considered to be virtual, they are all present as possibilities in this future, and while some may be more likely than others, in terms of the virtual all exist. On the other hand the actual are those aspects or trajectories within the virtual that actually come to be — that are actualised. Now, Lévy makes a very substantial distinction between an almost garden variety sort of virtual and actual where what comes to be is a more or less mechanistic playing out of the consequences of the present moment, which he terms the possible. This is contrasted to a system where what comes to be actualised is a qualitative change, an act of creation. In the former what comes to be involves no creation or creativity, and so is about the production of the same rather than the new, while the latter is a response to a problem posed within the virtual. As Lévy notes:

Actualization thus appears as the solution to a problem, a solution not previously contained in its formulation. It is the creation, the invention of a form on the basis of a dynamic configuration of forces and finalities. Actualization involves more than simply assigning reality to a possible or selecting from among a predetermined range of choices. It implies the production of new qualities, a transformation of ideas, a true becoming that feeds the virtual in turn.

Editing has these qualities of actualisation precisely because editing establishes novel and external relations between parts. These relations do not reside implicitly within the shots — if they did it would not be possible to edit any shot into another — yet it is clear that what these shots do and mean is certainly as much a consequence of the relations they are established within as it is of what the content of the shot may be. (A shot of a gun firing is a gun firing, but what comes before and after that particular shot makes all the difference to what we understand that shot of the gun to mean.)

In addition editing, certainly editing that wants to move away from the simple representation of a highly descriptive storyline (which in Lévy’s terms would be editing that is subject to the possible), is a response to the problem posed by the shot and its possible relations, where this problem is a ‘knot of tendencies or forces that accompanies a situation, event, object, or entity’. Clearly in video editing these forces are never singular (which accounts for the intense promiscuity of video and film, we can and do join anything to anything), yet in traditional hard video practice this promiscuity and the qualitative possibilities immanent within every shot must be reduced to a single and fixed vector at the point of editing, and is forever hypostatised within the published work.

We can then define editing as the activity of actualising the virtual that each shot expresses. The shot poses and contains problems, where each of these problems expresses what are best thought of as vectors of force offering particular trajectories – how to narrate the story, cutting on action, colour, narrative event, shot scale, shot length, contrast, mise–en–scene, total length of the work and so on. How a work is edited becomes the actualisation of these virtualities, and in their actualisation they are not merely possible (the realisation of the same) but are the creation and invention of the new.

These actualisations, while made linear, sequential and fixed in hard video, provide a theoretical and practical point of difference for a softvideo poetics, that is, for a softvideo architecture that allows these multiple facets to remain available, in some manner, after the work is ‘published’. This is the achievement of Videodefunct and the projects created within the Korsakow System as each allows for a multiplicity of actualisations between shots and sequences after publication.


A theoretical argot is needed to make concrete the concept of these facets, and to describe how video works may acknowledge the granularity of the shot and the multiplicity of these facets after publication. Such a model implicitly requires, and accepts, that the network and computer is no longer merely a tool of production and distribution, but is integral to the possibility of being able to create and use video online.

Videodefunct and the Korsakow System are substantial steps towards a softvideo practice that is able to maintain the facetted nature of the connections between shots after publication. Each provides a system for the production of multiple relations between content and user. It is a commonplace (and naïve) error to describe systems such as Videodefunct and Korsakow as ‘interactive’; they are more accurately and productively characterised as combinatory environments which offer templates or structures that provide for the possibility of connections being formed. That is, they are not authoring or publishing systems in the traditional sense in which I author and then ‘publish’, but engines that allow content to be contributed and then ‘mixed’ (for want of a better term) on an ongoing basis. Such practices look strongly towards design and systems development as our role here moves from being content creator towards the architecture of poetic and possibly autopoietic systems. As the example of blogging demonstrates, where technical features such as trackback and a publicly available permalink for every post exist, a fine level of granularity is preserved, producing an architecture where parts can easily be loosely connected to other parts. Similarly video must maintain its granularity after publication so that it becomes porous to its own possible connections to those clips that are near to hand (those in the system’s clip library) as well as far (other clips available via HTTP requests). In this way any video shot or sequence remains available to be actualised after the moment of publication. While such an architecture is only one element towards realising a softvideo practice, it provides the affordances to develop highly granular works that allow for the multiplicity of connections between parts. This contributes to a videographic poetics that is able to look beyond internet video’s current atavistic misjudging of the merely televisual as a properly network specific videography.


The author would like to acknowledge the assistance of Argos, Brussels and the Australasian Centre for Interaction Design for their support in allowing participation in the Video Vortex conference series.


Amerika, Mark. Grammatron.

Bernstein, Mark. ‘Patterns of Hypertext’, Proceedings of the Ninth ACM Hypertext Conference. Pittsburgh: ACM, 1998. 21–9.

Deleuze, Gilles. Cinema One: The Movement–Image, Trans. Hugh Tomlinson and Barbara Habberjam. Minneapolis: University of Minnesota Press, 1986.

Deverell, Keith, Seth Keen, and David Wolf. ‘About’, Videodefunct, n.d.

Joyce, Michael. Afternoon: A Story, Watertown: Eastgate Systems, 1987.

Lévy, Pierre. Becoming Virtual: Reality in the Digital Age, Trans. Robert Bononno. New York: Plenum Trade, 1998.

Manovich, Lev. The Language of New Media, Cambridge: MIT Press, 2001.

Miles, Adrian. ‘Cinematic Paradigms for Hypertext’, Continuum: Journal of Media and Cultural Studies 13.2 July (1999): 217-26.

Miles, Adrian. ‘Softvideography’, Cybertext Yearbook 2002–2003. Eds. Markku Eskelinen and Raine Koskimaa. Vol. 77. Jyväskylä: Research Center for Contemporary Culture, 2003. 218–36.

Mitry, Jean. The Aesthetics and Psychology of the Cinema, Trans. Christopher King. Bloomington: Indiana University Press, 1997.

Schulmeister, Rolf. ‘Structural Features of Hypertext’, Hypermedia Learning Systems.

Thalhofer, Florian. [Korsakow Syndrom]. Korsakow System, n.d.
