Jonathan Cohen, Meg Withgott, and Philippe Piernot
Interval Research Corporation
1801 Page Mill Road
Palo Alto, CA 94304
This paper describes the evolution, implementation, and use of logjam, a system for video logging. The system features a game-board that senses the location and identities of pieces placed upon it. The board is the interface that enables a group of people to log video footage together. We report on some of the surprising physical and social dynamics that we have observed in multi-person logging sessions using the system.
Tangible user interfaces, TUI, CSCW, video ethnography, video logging, user experience, 2D sensing/tracking
In the summer of 1994, a group of video ethnographers from Interval Research accompanied the Lollapalooza concert tour to venues around the USA. They collected about a hundred hours of videotape of attendees visiting the "Electric Carnival," a tent on the concert site containing about sixty high technology interactive kiosks. The ethnographers were interested in the attendees' reactions to the technology, and their tapes include interviews and observations.
At the end of the summer, the group confronted a massive video logging problem. Video logging is exacting and solitary work. How could the logging process be made less tedious, and how could the ethnographers share their knowledge of the video they logged so they had the best choice of clips when they presented this material? Could a group-based logging approach facilitate this process?
Independently but simultaneously, other Interval researchers constructed an electronic game-board that could detect the 2D position and identity of a number of domino-like pieces placed upon it.
This paper describes logjam, a prototype system for video logging, which features the game-board as the interface to enable a group of people to log footage together. We report on the surprising physical and social dynamics that we have observed in multi-person logging sessions using the system.
Before describing logjam itself, this section explains the logging task, presents an example of group-logging, and describes tangible user interface (TUI) work that influenced the design of logjam.
The Logging Task
The task of the video ethnography project (VEP) was to influence researchers--technology inventors and designers--by acquainting them with future consumers' stories, lifestyles, goals, needs and desires. VEP distributed questionnaires, conducted interviews and focus groups, and visited the homes of people representing potentially interesting segments of society. VEP videotaped much of their research.
These tapes, forming a collection on the order of a hundred hours per study, were then logged and annotated in a process that familiarized several members of the VEP team with the footage. With the help of the logs, the footage was used and reused to construct differently edited videotapes aimed at technologists' various interests.
Logging the Electric Carnival footage was seen as a process of finding and marking various locations, events, and behaviors in the video, collectively called "categories." Some categories seen in the tapes included:
At any given instant in the video, several categories might apply, and the start and end times of one particular category were mostly independent of the start and end times of another category.
A timeline seemed like the best representation for this style of logging, since it could offer a visual presentation of the duration and overlap of categorized events. Of the commercial and experimental logging software available then, including CVideo(TM), FileMaker(TM), Marquee, MediaStreams, and Timelines, only the last two supported timeline representations that displayed the duration of an annotation. None of the systems supported group-logging such as that described in the next section.
Passage to Vietnam: A Group Logging Experience
Bonnie Johnson [personal communication] described the sorting and categorization process Rick Smolan and his company, Against All Odds, undertook in making their work Passage to Vietnam. She wrote that the company brought:
...70 photographers, 15 videographers, and crews from NBC and a Japanese television production company (who shot HDTV ...) to Vietnam for a week. ... In ten days, ten editors from major publications watched 200,000 pictures and reduced the candidates for publication to 1000. Each editor looks at the work of 7 photographers; then the whole group looks at everything and they vote out loud with what should be kept. ... A list of 10 or so categories such as "food" and "war" was developed out of this process... Everyone watched the 50 hours of tape. They thus arrived [at] 380 potential clips...
We found that their system works for them to the limit of the ability to remember the pictures and the clips. ... It was the process of a group of people watching footage together, no doubt spinning stories about what they were seeing that "fixed" the pictures in their minds, helped them develop the stories.
What was striking about the Passage to Vietnam editorial process, as Johnson noted, was the rapid and transitory group effort it required. The group agreed to a decision and recorded it, then moved on to the next decision. Perhaps most important, the group established a common perspective on, and categorization of, the raw input. This kind of group effort is not unusual in rapid media production efforts.
Would a similar process work for VEP's logging efforts? What kind of interface could we build to support such a group of loggers? We would not want it to replace conversation, merely to augment and focus the dialogue.
Durrell Bishop's marble answering machine prototype was the strongest influence on the tangible design of logjam. Also important was the emphasis on the mixture of the physical and the virtual found in the special issue of Communications of the ACM devoted to Computer Augmented Environments. Wellner's DigitalDesk was particularly inspiring in its aim "to go beyond so-called 'direct manipulation' with a mouse (which in fact is not direct at all) and to explore the possibilities of tactile interaction with real and electronic objects using the fingers." [p. 92] DigitalDesk and Mackay's related Video Mosaic, which was tailored to editing and controlling video, used video cameras to enable the integration between paper and data. Mitchel Resnick's article [pp. 64-71], about moving computation into small objects, was also a source of inspiration.
THE INTERFACE MEETS THE TASK
At the same time as Interval's ethnographers were coming to terms with the Electric Carnival logging task, another group at Interval, influenced by the TUI work described above, constructed an electronic game-board, which could detect the position and identity of a number of domino-like pieces, or "blocks," placed upon it. The board could send "board events" to a host computer (see figure 1).
Figure 1. The game-board with a few blocks. The hand shows scale. The board-event processor and the serial line to the host computer are visible at the top.
A further development involved the realization that a block could represent a category, and that a block's presence on the board while a videotape was playing could mean that the category applied to the video at that timecode.
Thus arose the notion of building "logjam," a system for group-logging video that employed a game-board. We imagined a scenario where loggers, watching a videotape playing at normal speed, would quickly plop different blocks on the board, representing different categories that applied to the tape at that moment. The loggers would just as quickly remove blocks from the board when that category no longer applied.
LOGJAM SYSTEM OVERVIEW
The main components of the system were a host computer (Macintosh) with a monitor, keyboard and mouse; a VTR (Sony VDeck) and video monitor; the game-board and its associated board-event processor; and a pair of footpedals for controlling the VTR (see figure 2).
The VDeck was a computer controllable, frame-searchable Hi-8 (analog) VTR. The computer sent commands to the VDeck, for example, to play, stop, or go to a particular frame.
When a logger dropped a block on the game-board, or picked it back up again, the board-event processor recorded the event. The computer continuously polled the board-event processor for new events. The computer also continuously polled the VDeck for the current timecode of the video, so it could be associated with the board events.
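The pairing of board events with video timecode can be illustrated with a short sketch. It is written in Python for readability and is not the original Lisp code. The names `poll_host`, `read_board_events`, and `read_timecode` are hypothetical, and stamping every event from one poll with a single timecode reading is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

# A board event as assumed here: (kind, block_id), where kind is "down" or "up".
BoardEvent = Tuple[str, int]
# A stamped event adds the video timecode (in frames) read during the same poll.
StampedEvent = Tuple[str, int, int]


def poll_host(
    read_board_events: Callable[[], Iterable[BoardEvent]],
    read_timecode: Callable[[], int],
) -> List[StampedEvent]:
    """Run one host polling step: read the timecode, then stamp new board events."""
    timecode = read_timecode()  # hypothetical stand-in for the VTR query
    return [(kind, block_id, timecode) for kind, block_id in read_board_events()]
```

Calling `poll_host` repeatedly in a loop would mimic the continuous polling described above. Any delay between the physical event and the poll that picks it up shows up directly as an error in the stamped timecode.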
Footpedals were used to control the VDeck. Pressing down on the pedals generated keystrokes that the Mac converted into VDeck commands.
Figure 2. Logjam system diagram. Double lines represent serial I/O; thin lines represent ADB.
The logjam board was able to detect when a block made contact with it. The blocks were designed to sit on the board like on a Scrabble(TM) tray, making contact with the board in two places--the bottom and back of each row. The board was arranged in a matrix of 4 rows and 12 columns for a total of 48 possible locations.
The blocks and the board were constructed out of a variety of hardwoods. The board was about 22 1/4 inches wide by 7 inches long by 1 inch high, and the blocks approximately 1 3/4 inches by 1 1/4 inches by 1/2 inch. There was room to put 12 blocks across a row. The rows, built of mahogany, were arranged in four parallel 45° "valleys" on the board. Thus, when a block was placed on the board it rested face-up and out (see figure 1).
Each wooden block contained a Dallas Semiconductor DS2401 "silicon serial number." Each chip held a unique 48-bit ID, and had a ground and a data line. Sending a pulsed signal with the correct protocol over the data line resulted in a return signal, along the same line, containing the 48-bit ID. A copper strip on the bottom of the block was connected to the ground line of the chip, and a second copper strip on the back of the block was connected to the data line of the chip (see figure 3).
Electrical layout on the board matched that of the blocks. Ground was at the base of the row, and data was at the back. Thus, when a block was placed on the board, the proper electrical contacts were made.
In several early tests, we found that people often placed the blocks on the board so the contacts did not quite line up. Magnets were inserted into each block and into each board location to pull a block into good alignment when it was placed on the board. The magnetic attraction also added power to the clicking sound made on contact.
The ground and data lines ran from the board to the board-event processor, a Motorola M68HC11 ("6811") microprocessor running custom firmware. Looking for block-IDs, the 6811 could poll all 48 locations on the board from five to ten times per second, depending on the number of blocks on the board. Internally, the 6811 kept a map of which block-IDs were present at which board location. Whenever a block was put down or picked up, the internal map was updated, and the 6811 added a new block down or block up event to a list.
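The map-and-event-list behavior of the board-event processor can be sketched as a comparison between two successive scans. The following Python is an illustration only, not the 6811 firmware. The function name `diff_scan`, the numbering of locations from 0 to 47, and the event tuple layout are assumptions.

```python
from typing import Dict, List, Tuple

# Map from board location (assumed to be numbered 0..47) to the block ID seen there.
BoardMap = Dict[int, int]
Event = Tuple[str, int, int]  # (kind, location, block_id)


def diff_scan(previous: BoardMap, current: BoardMap) -> List[Event]:
    """Compare two scans of the board and list the block-up and block-down events."""
    events: List[Event] = []
    for location in sorted(set(previous) | set(current)):
        old = previous.get(location)
        new = current.get(location)
        if old == new:
            continue  # nothing changed at this location
        if old is not None:
            events.append(("up", location, old))
        if new is not None:
            events.append(("down", location, new))
    return events
```

After each scan the caller would replace its stored map with the new one and append the returned events to the list that the host polls.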
Figure 3. Close-ups of a block. Counterclockwise from lower left: front with "group" category label; back with DS2401, diode, magnet and copper strip for data; bottom with copper strip for ground; hardwood top.
Unreliability and Variable Latency
We never completely solved two major problems with the logjam prototype.
First, the board was not completely reliable. We did not engineer a tight enough coupling between the physical action of dropping a block on the board (or picking it up), and the computational action that was supposed to follow. Sometimes the electrical connection did not quite occur, or was so "bouncy" that the system interpreted a single event as several quick drop-and-pick-up events.
Second, the system had a variable latency. Because there were two polling architectures involved (one on the 6811 and one on the Macintosh), and because the Lisp software took time out for garbage-collection, there were occasional delays of one to fifteen seconds before an event was registered.
We will return to these two problems later in the paper.
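The "bouncy" contact problem resembles a classic debouncing situation. The prototype did not include a fix, so the following Python is only a sketch of one conventional mitigation, applied after the fact to a recorded event list. The function name `debounce`, the event layout, and the 0.25-second window are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[float, str, int]  # (time in seconds, "down" or "up", block_id)


def debounce(events: List[Event], window: float = 0.25) -> List[Event]:
    """Remove an "up" that is followed by a "down" of the same block within `window`."""
    kept: List[Optional[Event]] = []
    stacks: Dict[int, List[int]] = {}  # block_id -> indices of its surviving events
    for event in events:
        t, kind, block = event
        stack = stacks.setdefault(block, [])
        if kind == "down" and stack:
            prev_t, prev_kind, _ = kept[stack[-1]]
            if prev_kind == "up" and t - prev_t < window:
                kept[stack.pop()] = None  # drop the spurious "up"
                continue                  # and drop this "down" with it
        stack.append(len(kept))
        kept.append(event)
    return [e for e in kept if e is not None]
```

A filter of this kind trades responsiveness for stability: a live version would have to wait out the window before reporting an "up," which would add to the latency problem rather than reduce it.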
Mapping Board Locations to Functions
With two exceptions, placing a block on any location in the front three rows of the board meant "create an annotation starting at the current video timecode in the category represented by this block." Picking up a block meant "end the annotation being created in this category at this timecode."
The two exceptions were found at the left side of the board (see figure 4). These "binding" locations gave loggers a way to create or change a binding between a block and a category. Dropping a block on these locations did not affect the log.
Figure 4. Function map of the 48 locations on the board. Legend: c = create annotation; b = bind category, v = video speed control (see that section below). Placing a block on a "c" location is a different kind of event than placing it on a "b" or "v" location.
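The function map can be expressed as a small lookup. This Python sketch is illustrative. The row numbering (0 at the front, 3 at the back) and the exact positions of the two binding locations are assumptions, since the text says only that they were at the left side of the board.

```python
ROWS, COLUMNS = 4, 12
BACK_ROW = 3  # assumption: rows are numbered 0 (front) to 3 (back)

# Hypothetical positions for the two binding locations.
BIND_LOCATIONS = {(0, 0), (1, 0)}


def location_function(row: int, column: int) -> str:
    """Return "c" (create annotation), "b" (bind category), or "v" (video speed)."""
    if not (0 <= row < ROWS and 0 <= column < COLUMNS):
        raise ValueError(f"no such board location: ({row}, {column})")
    if row == BACK_ROW:
        return "v"
    if (row, column) in BIND_LOCATIONS:
        return "b"
    return "c"
```

Under these assumptions the 48 locations divide into 12 video-speed locations, 2 binding locations, and 34 create-annotation locations.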
The logjam software, written in Macintosh Common Lisp (MCL) version 2.0.1, kept track of the bindings between blocks and categories, synchronized board events with videotape timecode, offered a screen-based palette for video control, and gave loggers an editable timeline representation of the ongoing log. At the time (1994), we chose MCL because it was a fast prototyping environment and because there was a large code base and an active community of programmers in-house.
The timeline representation was very similar to that used in MediaStreams or Timelines. Time marched along the horizontal axis, and the vertical space was broken up into rows, one per category. A single annotation or "snippet" on the timeline represented a span of time on the videotape to which a particular category applied (see figure 5).
When a block was dropped on the board, a "block-down" (falling-pitch) sound played and several graphical events took place in the timeline view. First, the row containing the category bound to that block scrolled so it was visible in the window. Second, an "open snippet" icon was displayed in the row near the left edge of the window. Third, a new snippet, represented by a small rectangle, was created ("opened") in that row. The left edge of the snippet was fixed at the starting timecode--the time that the block was dropped. As long as the block stayed on the board, the right edge of the snippet was tied to the changing timecode of the video. When a block was picked up, a "block-up" (rising-pitch) sound played, the open snippet icon in that category's row disappeared, and the open snippet in that category "closed," that is, its right edge was fixed at the current timecode.
(Note that the block event sounds were originally designed for ToonTown, another prototype that used the game-board, in this case to control a shared audio space.)
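The opening and closing of snippets described above can be modeled with a small data structure. This is a Python sketch rather than the MCL implementation. The class and method names are hypothetical, timecodes are represented as frame counts for simplicity, and ignoring a second "down" for an already open category is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snippet:
    category: str
    start: int                 # timecode in frames
    end: Optional[int] = None  # None while the snippet is still open
    text: str = ""             # optional typed annotation


@dataclass
class Timeline:
    snippets: List[Snippet] = field(default_factory=list)
    open_by_category: Dict[str, Snippet] = field(default_factory=dict)

    def block_down(self, category: str, timecode: int) -> Snippet:
        """Open a snippet whose left edge is fixed at `timecode`."""
        if category in self.open_by_category:  # already open (assumption: ignore)
            return self.open_by_category[category]
        snippet = Snippet(category, timecode)
        self.snippets.append(snippet)
        self.open_by_category[category] = snippet
        return snippet

    def block_up(self, category: str, timecode: int) -> Optional[Snippet]:
        """Close the open snippet in `category`, fixing its right edge."""
        snippet = self.open_by_category.pop(category, None)
        if snippet is not None:
            snippet.end = timecode
        return snippet
```

Because each category keeps its own open snippet, several categories can overlap in time, which matches the observation that the start and end times of different categories were mostly independent.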
Using the keyboard and mouse with the timeline window, a logger could type text beneath a snippet to explain it more fully, move snippets around, create them, delete them, or change their starting or ending times. A logger could also click in the timecode ruler area at the top of the window to get the VDeck to scan forward or back until it reached that timecode on the tape.
A logger could change the time scale of the view using a scrollbar at the bottom left of the timeline. Since text display (not text entry) was clipped to the length of a snippet, zooming in ensured that all a snippet's text could be read. Zooming out gave a good overview of the log so far.
Another important function was the ability to create a new category on the fly. If a logger dropped a new (unbound) block on one of the board's binding locations, it brought up a dialog box on screen, which allowed text entry of a new category name. This category was bound to the silicon serial number in the block.
Figure 5. Logjam timeline window. The time scale, spanning about 30 minutes, is across the top. The central black vertical line shows the current video timecode, at about 01:15:00:00. Category names run down the left side. "Where shot" is the currently selected category. The other two categories visible are "What's happening" and "What." Rectangles in the rows are the logged snippets. Using the leftmost scroll bar to zoom the view in further, clipped text annotations can be seen in full.
If a logger dropped an unbound block on one of the "create-snippet" locations, it brought up the same dialog box, but the video was also paused and the timecode of the event was saved. Thus loggers did not miss any footage while they were filling out the dialog box. Once the dialog was dismissed, the video was returned to its former playback speed, a new row was added to the timeline, and the new snippet was opened in that row at the saved timecode.
If a logger dropped a previously bound block on a binding location, the dialog box displayed the block's current binding. This could be purely for checking, or the logger could edit the information, changing the binding.
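The three binding cases above can be summarized in one dispatch function. The Python below is a sketch under stated assumptions: `handle_drop` and the action names it returns are hypothetical, and the `ask_category` callback stands in for the on-screen dialog box.

```python
from typing import Callable, Dict, Optional, Tuple


def handle_drop(
    bindings: Dict[int, str],
    block_id: int,
    function: str,
    ask_category: Callable[[Optional[str]], str],
) -> Tuple[str, str]:
    """Decide what a drop on a "b" (bind) or "c" (create) location should do.

    `ask_category` receives the current binding (or None) and returns the
    category name the logger confirms or types in.
    """
    current = bindings.get(block_id)
    if function == "b":
        # Binding location: create, check, or edit the binding; the log is untouched.
        bindings[block_id] = ask_category(current)
        return ("bound", bindings[block_id])
    if function == "c":
        if current is None:
            # Unbound block: the caller pauses the video and saves the timecode
            # before the dialog appears, then opens the snippet at that timecode.
            bindings[block_id] = ask_category(None)
            return ("open_snippet_after_pause", bindings[block_id])
        return ("open_snippet", current)
    raise ValueError(f"unexpected location function: {function!r}")
```

Keeping the bindings in a plain dictionary keyed by block ID reflects the description that a category was bound to the silicon serial number in the block.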
VIDEO SPEED CONTROL
Because loggers may wish to specify precisely the timing of annotations for later editing, it was important to provide accurate and simple video control.
The VDeck only operated at a set number of speeds (seven forward and seven reverse). These speeds did not have a clear incremental relationship, so true analog speed control (like a jog-shuttle) was not possible, and faking it would not fool loggers.
To give loggers a choice, logjam provided three different interfaces for controlling VDeck speed: blocks-and-board, a pair of footpedals, and a mouse-clickable palette on the computer screen. None of these interfaces allowed loggers to use all of the available VDeck speeds--by consensus, we left out the slower ones.
Each footpedal had two positions, "down" and "way down." With the right pedal, down meant "play forward" and way down meant "fast forward." With the left pedal, down meant "reverse play" and way down meant "rewind." Stepping off the pedals meant "stop."
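The pedal mapping is simple enough to write as a lookup. In this Python sketch the function name and the command strings are illustrative, and treating both pedals pressed at once as "stop" is an assumption, since the text does not say what happened in that case.

```python
# Pedal positions as described in the text.
UP, DOWN, WAY_DOWN = "up", "down", "way down"

_RIGHT = {DOWN: "play forward", WAY_DOWN: "fast forward"}
_LEFT = {DOWN: "reverse play", WAY_DOWN: "rewind"}


def pedal_command(left: str, right: str) -> str:
    """Translate the two pedal positions into a VTR command name."""
    if left == UP and right == UP:
        return "stop"  # stepping off the pedals
    if left == UP:
        return _RIGHT[right]
    if right == UP:
        return _LEFT[left]
    return "stop"  # both pedals pressed: not covered by the text (assumption)
```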
On the screen, the palette had nine buttons, each representing a different speed. Clicking a button set the speed of the VDeck (see figure 6).
Figure 6. The video control palette
In the blocks-and-board interface, the back row of the board was devoted to video speed control. The location at the center of the row meant "pause," and locations to the right of center signified increasing forward speed: slow, normal play, 2X, etc. To the left of center, the speed increased, but the tape played in reverse. To play the VDeck at a particular speed, a logger dropped any block onto that board location (whether the block was bound to a category or not). To stop the VDeck, the logger dropped a block on the pause location.
For loggers, dropping and picking up blocks to control video was very clumsy when compared to pressing a footpedal or clicking a mouse, particularly when the variable latency of the logjam event software was taken into account.
Multi-Person Video Speed Control
The board interface was an attempt to implement a multi-person video speed control. In practice, however, our loggers improvised their own system. First, there was no lack of feedback about speed. They had all watched enough video to tell the speed just by seeing and hearing the playback. Furthermore, the VDeck was usually set to overlay a graphic on the video representing the speed. So, any speed change effected by the person with the mouse or footpedals was immediately obvious.
Second, to change the speed, loggers just shouted. For example, "Whoa!," "Stop!," and "Go back!" The latency was comparable to that of the board, and it was physically easier. Before anyone started the VDeck playing again, they alerted the others.
We had wanted the state and control of the VDeck to be "open"--that is, public and accessible to all of the loggers simultaneously, whereas both screen palette and footpedals were only accessible to one person at a time. What we found was that just working together in the same location provided openness.
A team of VEP members agreed to use logjam to log some of the Electric Carnival footage. The VEP group quickly arranged themselves for working with the system. The board was placed on a table, with the video monitor behind it. One person used the keyboard while three or four others sat near the game-board, picking up and putting down blocks. One of these others also used the footpedals (see figure 7).
The task of the person at the keyboard was to transcribe the video, as well as clean up mistakes, control the VDeck, and occasionally reassure the others that their blocks were recognized correctly by the program.
Group members assigned disjoint sets of categories to themselves (and to each other, like trading-cards), so they could log in parallel. The board supported this style of activity because the software could handle multiple block events, and because there were plenty of block locations available. In practice, the number of blocks on the board simultaneously seldom exceeded a dozen.
The loggers had no trouble switching between parallel and group activity. The group would quickly focus to discuss the meaning of a particular category, a suggestion for a new category, or whether the event on screen was sufficiently "illustrative" to warrant categorizing with a particular block.
Figure 7. A logging session. From the right, C watches the video and waits for the right moment to drop a block; D has his blocks arrayed in front of him; J observes; and S, just out of frame at the left, handles the logjam GUI.
It proved valuable that loggers could create and bind new categories as they identified new events, locations, or behaviors in the video. In the end, they defined more than eighty categories for the Electric Carnival footage, though only around twenty were used with any frequency.
When loggers defined a new category, they attached a sticky label to the block to differentiate it. In the heat of the moment, the label was hand-written, but later they replaced these labels with printed ones. Eventually, even labels with printed text were not sufficiently distinguishable when a logger was trying to grab a block at speed. At first, icons were added to the labels, but later they were also color-coded to emphasize related categories.
An important observation is that the location of the blocks mattered when they were not on the board. All participants arranged their own sets of category blocks on the table in front of them, and loggers had individual layout styles and methods for grouping their own blocks, making it easy to find one's own blocks during logging. The blocks' constant physical presence also reminded loggers of the categories they were seeking.
Some other interactions took place with the blocks that are not easily imaginable with GUIs. For example, snatching: if one logger took too long to log an event in a category she was responsible for, a second logger might grab that block from the first logger's space on the table and drop it for her. Sweeping occurred if the group had been logging a particular scene, and there were a number of category blocks on the board. When the scene changed suddenly (or the tape ended), so that the categories no longer applied, then everyone reached out, sometimes with both hands, to sweep the blocks off the board.
In all, the group used logjam to log eight one-to-two-hour tapes, and spent around fifteen to twenty-five hours--two to three times video duration. In our original scenario, we had naively imagined that group-logging video would not take much more time than just watching it, since group members would be annotating different categories in parallel. But talking things over, or waiting for a person who needed to review the tape or type in a long description, took additional time.
Nonetheless, loggers informally reported that group-logging sessions seemed to take less time than individual logging sessions because they were more fun. Even some self-professed "logging-loathers" dropped in for a couple of the group sessions.
One person continued to use logjam for solo logging after the group-logging effort ended. This was a surprise, particularly since the system was only a prototype. Over time this person logged more tapes than the group did, and even built up entirely different sets of categories for logging different sets of tapes. However, she did not use the board and blocks. Instead, she relied on a number of keyboard-and-mouse shortcuts to accomplish the same functions. Why did she not use the logjam board?
First, there was not room on her desktop. Group logging took place on a big cleared-off table in a shared work area, not squeezed into a little space on a filled-up desk in a small cubicle. Second, her hands were already on the keyboard and it was easier to click or type a shortcut key than to reach over, pick up a block, and place it on the board. Third, the keyboard interface bypassed the board's reliability problems. Fourth, although group logging was useful, and even enjoyable, by this point she had a clear idea of how to apply the categories and just wanted to get on with the job, sans discussion.
Thus the logjam interface was not the answer to all logging problems--it only worked well in a specific context. For example, it was no improvement on other systems if a logger was only after a text transcription.
But the group-logging process had several clear benefits. First, it was typical that one of the loggers in the group had shot the tape being logged. That person could cue the others--for example when something interesting was coming up, or when the next section of tape could be skimmed. Second, the process was a way to show logging newcomers the ropes, integrating them with the whole group at once. Third, this process enforced a consensus about the meaning of particular categories. As one logger reported, since the resumption of individual logging, people use categories more idiosyncratically. This makes it difficult to use other people's logs when collecting footage for a new edit. Fourth, the group accumulated a shared knowledge of the videos they logged together, thereby gaining a common reference.
COUPLING THE PHYSICAL AND THE VIRTUAL
Fitzmaurice, et al., define "tightly coupled" systems as those that keep their "physical and virtual representations perfectly synchronized." This section addresses some of the issues of tight coupling that came up with logjam, where the configuration of the blocks on the board was coupled with the state of the video annotation.
At the lowest level, one unintended disparity between these representations made the system less useful. Because the board was occasionally unreliable, and the system had a variable latency, a block event would occur on the board but would not immediately (or sometimes not ever) register at the computer.
This confusion led loggers to doubt whether they had really marked the video when they dropped a block. It made them a little tentative, and they would look over at the computer screen to see if the snippet had appeared in the timeline window, or they would ask the person closest to the screen whether the snippet was displayed. This interrupted the flow of work--loggers were focusing their attention on the tool rather than the task. This is an old CHI response-time lesson repeated on a new platform.
At a higher level, the potential for a great disparity existed, but it never seemed troublesome in practice. The board and blocks could only represent the state of annotations as the video was being logged the first time. If the video was rewound to a section that had already been logged, there was no way for the board to reconfigure its blocks to match. So when people wanted to review or edit video annotations, they used the GUI and ignored the TUI. This shift took place naturally.
We hypothesized that group logging could solve three problems: how can we speed up the logging process, how can we make it less tedious, and how can loggers share their knowledge of the video they log? The speed-up would occur because loggers could work in parallel. Working in a group promised to be more fun and create a shared knowledge of the footage.
The logjam interface relied on a game-board that could sense the location and identity of pieces placed upon it. This interface supported group logging by being easily shared among group members. However, one unresolved engineering issue had major repercussions for the quality of the experience. Sometimes there was a latency or disconnection between a block event and the action it was supposed to trigger. As a result, loggers never completely trusted the board.
Still, logjam supported some group processes, for example, lively and probing discussions about the meaning of a category. Also, since people were logging together, they were able to build a common knowledge of the video and the categories. For those who had previously logged or transcribed recorded media with traditional tools, group logging was a welcome relief from a tedious process.
Logjam did not speed up the work, however. Although loggers worked in parallel, whatever time this saved was soon spent in discussions, or waiting on the slowest logger.
There were aspects of the interface that would be difficult to replicate with a GUI. For example, sweeping and snatching, or the individual arrangements of the category blocks on the table with the ability to select one item from 80 (without having to scroll or pop up a menu).
We did not anticipate actions like these before building the prototype; the loggers improvised them. Buxton suggests that the ability to improvise with an instrument is a mark of its worth. While snatching and sweeping actions are probably specific to logjam, we think users of other TUIs will improvise actions within those systems because one of the great strengths of TUIs is that they allow people to take advantage of all the degrees of freedom available in the physical world.
Furthermore, all the ways people make use of the physical world may not have to be mirrored in the virtual one--just as the logjam system did not need to "know" how the category blocks were arranged when they were not on the board, though that was important to the loggers. In a TUI system, this kind of capability comes without extra programming.
Another significant dimension of TUIs is the wonderful, high-resolution quality of objects in the world. Ishii and Ullmer remind us of the rich character of historical scientific instruments, and Buxton describes the material refinement and careful craft inherent in artists' tools. Though logjam did not aspire to such estimable heights, the board and blocks did express a sturdy physical presence. Hardwood and copper made a pleasing combination, and the blocks, with a nice heft and a domino-like size, felt good in the hand. There was an unmistakable sense of physical contact when a block was dropped on the board (and the electronics worked). Such physical sensations do not exist on the screen.
Logjam's method of binding a physical container to its virtual content was similar to the method used in the ToonTown prototype, where a special board location was used for binding a block to one of the people sharing the audio space. Contrast Bishop's answering machine prototype, in which one of the marbles in a reservoir was automatically assigned to an incoming phone message. In the mediaBlocks system, placing a block in the slot of a media recording device begins recording the media, and picking up the block halts the recording--the block is bound to that chunk of media data. Binding seems like a basic activity for TUIs.
The game-board interface succeeded best when physical actions were well-matched to functions but failed when they were not. Dropping and picking up blocks worked for creating snippets, but video speed was best controlled by other means. A GUI seemed better suited for the tasks of editing the logs and for solo logging because it was more efficient than the TUI.
Yet even when a TUI seems called for, some people argue that the work is doable "virtually," all within a GUI. They will say "I don't want any more stuff in my life" or "What happens if I lose the pieces?" Though we respond by pointing out their own successful real-world practice with "stuff," we suspect their arguments are not really intended to be answered. Perhaps they are simply reluctant to change interfaces, like those who balked at the introduction of the desktop metaphor.
Nevertheless, we believe logjam succeeded on two levels. 1) Transforming the logging problem from a solo labor to a group activity was powerful because it supported the emergence of consensus, allowed the task to be parallelized, and mitigated the tedium. 2) The equating of a set of categories with a set of blocks, and the direct association of logging actions with physical actions allowed a simple system to serve as a platform for a complex group activity.
Thanks to Bill Verplank for designing and building the board; Bob Alkire et al. for 6811 programming and foot pedals; Scott Wallters for wiring and magnetizing the board; Durrell Bishop for inspiration; Brian Williams and Neil Mayle for writing the timeline-scaling code; Baldo for the diagram; and Bonnie Johnson for helping to start and continuing to support the work. Special thanks to the logging crew, especially Sue Faulkner. Finally, thanks to the readers of various drafts for their helpful suggestions, including the anonymous CHI reviewers.