
Aaron Lieb 02.12.09
Feature Descriptions
This document outlines the core functionalities of the ProZeuxis system, organized into categories that directly relate each feature to a physical aspect of live performance: Space, Movement, Dynamics, Rhythm, Pitch, and Mood. Each group of features is listed in prioritized order. The first item in each group represents a necessary feature, or "core functionality", while each subsequent feature may be categorized as "would be nice to have." This list of features can be seen as the larger picture of what the final system would be capable of, even if everything listed here cannot be implemented within the scope of thesis execution. By describing each feature individually, I am also outlining the overall system architecture as modular components. These components can then be more easily understood as they pertain to their counterparts in the more technical data model.

Space
Composition Projection – The primary feature of this system is the capability of projecting composited visuals toward the performance area while also presenting the user with a completely different set of user interface (UI) graphics. Running visual elements for both of these purposes within one application would be problematic in a performance setting. A more efficient way to implement both the table UI and the projected composition would be to run each as a separate client application. In this design, a ProZeuxis server application would act as a liaison by providing a common protocol for communication between the two. Each of these components can be defined and abbreviated as:

Console Client (CC) – Runs the UI for the VJ performer; sends and receives messages with a ProZeuxis Server application via socket communication.

Presentation Client (PC) – Processes a visual composition based on messages from the ProZeuxis Server, using those messages together with a predefined Visual Bank (VB) of images, movie clips, and procedural effects; sends the final composition to a dedicated projector.

ProZeuxis Server (PS) – Sends messages between the CC and PC via a specified socket; acts as a data adapter for incoming camera and audio feeds. The PS will be able to analyze these inputs and send the results as a simple message to all clients listening on the appropriate socket.

This client-server architecture will allow the system to better process visuals through a dedicated Presentation Client. This design will allow the clients to run on separate machines, allowing for a variety of possible configurations. For a setup where the PC and CC are running on independent machines, it will be the PS application's task to make sure that each client has a copy of the same Visual Bank elements. A master copy of this Visual Bank, containing the images, clips, and procedural effects, will be located on the same machine as the PS. As a user connects to the PS via an instance of the CC, they will have the ability to access the server's file system and load the VB elements that they wish to use for that performance. The server will then make sure that both the CC and PC machines (if not the same) have a copy of the VB data. This transaction should occur before a performance begins. By prepping the system in this way, the clients can send messages to one another describing how to manipulate this media without having to send the media itself.
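The message-passing idea above can be sketched in code. The wire format below is purely hypothetical (this document does not specify the actual protocol): each control message is a small JSON record naming a target component type, a node, a parameter, and a value, so that the clients describe manipulations of the shared Visual Bank media rather than shipping the media itself.

```python
import json

# Hypothetical ProZeuxis wire format: one newline-delimited JSON record per
# control message. Field names ("target", "node", "param", "value") are
# illustrative assumptions, not part of any existing protocol.

def encode_message(target, node_id, param, value):
    """Serialize one control message for socket transmission."""
    record = {"target": target, "node": node_id, "param": param, "value": value}
    return (json.dumps(record) + "\n").encode("utf-8")

def decode_message(raw):
    """Parse a single newline-delimited JSON record back into a dict."""
    return json.loads(raw.decode("utf-8"))

def route(message, clients):
    """Toy PS relay: deliver a decoded message to every connected client
    queue registered under the message's target component type."""
    for queue in clients.get(message["target"], []):
        queue.append(message)
```

Because only small parameter records cross the socket, the bandwidth cost per message stays constant no matter how large the referenced clip or image is.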
For the prototype system that will be developed for the first test performance, these three components will most likely reside on the same physical machine. However, nothing will be lost in terms of development time by designing the system in this way. The system will still be able to perform more efficiently than if all three components were running concurrently in the same monolithic application.
The following are five possible implementations of the architecture as it supports the functionality of "Composition Projection" via one or more instances of the Presentation Client. Each example becomes more complex, but also more capable in terms of allowing the PC to process incoming messages and generate visuals.
1. Simple – Machine A runs all three applications as separate processes (fig. 1.1)

2. Distributed – Machine A runs the Console Client; Machine B runs the ProZeuxis Server and Presentation Client (fig. 1.2)

3. Complex – Machines A, B, and C each run one of the system applications (fig. 1.3)

4. Multi-Console – Machine A runs Console Client A; Machine B runs Console Client B; Machine C runs the ProZeuxis Server and Presentation Client (fig. 1.4)

5. Multi-Presentation – Machine A runs the Console Client; Machine B runs the ProZeuxis Server and Presentation Client A; Machine C runs Presentation Client B (fig. 1.5)
These examples show the potential for scaling the system's capabilities to fit the needs of differently sized performances: the larger the demand for visual output, the more equipment and visual performers may be needed.
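As a sketch, the five configurations above can be written down as simple deployment descriptions and checked against the architecture's one invariant: exactly one ProZeuxis Server, plus at least one Console Client and one Presentation Client. The machine names and the layout format are illustrative assumptions only.

```python
# Hypothetical deployment layouts matching figs. 1.1-1.5; host names are
# placeholders, not part of any real ProZeuxis configuration format.
SIMPLE = {"machine-a": ["CC", "PS", "PC"]}
DISTRIBUTED = {"machine-a": ["CC"], "machine-b": ["PS", "PC"]}
COMPLEX = {"machine-a": ["CC"], "machine-b": ["PS"], "machine-c": ["PC"]}
MULTI_CONSOLE = {"machine-a": ["CC"], "machine-b": ["CC"],
                 "machine-c": ["PS", "PC"]}
MULTI_PRESENTATION = {"machine-a": ["CC"], "machine-b": ["PS", "PC"],
                      "machine-c": ["PC"]}

def is_valid_deployment(layout):
    """A layout works only with exactly one PS and at least one CC and PC."""
    roles = [role for host in layout.values() for role in host]
    return roles.count("PS") == 1 and "CC" in roles and "PC" in roles
```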
It may be necessary to clarify the significance of the Audio and Video Input components as shown in the diagrams. They were added to emphasize a design consideration concerning the way the system will process these inputs: all audio and video inputs are sent to the PS to be analyzed once, so that the results can be distributed as simple messages to all connected CCs, saving processing power.
The exception to this, according to figs. 1.3 and 1.5, would seem to be that PCs not residing on the same physical machine as a PS receive their own split copy of the video input. The purpose of this input is not to have the video analyzed by the PC, but rather to make it more quickly accessible for visual effects that would project any portion of the video back toward the stage, such as a "jumbotron" effect.

Positive Space Guide – In many cases, performers and set designers want to be creative and have sets with elaborate or oddly shaped projection screens. For example, Steely Dan's "Think Fast Tour 2008" featured a screen comprised of large square sections pieced together in a haphazard, pixelated arrangement (fig 2.1: Steely Dan performing "Josie", 2008). A Positive Space Guide would be a tool that the VJ uses to address this issue of custom-shaped screens. It would function by allowing them to set up regions representative of these surfaces before the performance begins.
To set up the guide, the user will need to enter an initial calibration mode in which the system begins by projecting no light at the stage (fig 2.2). The user will then use a set of touch-sensitive drawing tools presented to them via the reacTable console to block out simple polygons of a desired color to represent the positive space of the projection screens. As they interactively tweak the positions of their polygons, they will be able to observe the coverage that the polygons are creating as they are projected toward the screens (fig 2.3).
This process can be repeated to block off different portions of a screen, designating them as unique regions of positive space. The user could also designate internal areas of an existing region to mark them as separate regions (fig 2.4). When the process is complete, and all screens are sufficiently covered by the guide, the calibration mode will be exited and the guide will be made accessible for the duration of the performance.
This will allow the user to see on the reacTable interface where their visuals will be projected, rather than needing to look back and forth between the console and the stage each time they want to move a visual element's position. This guide would also be a useful feature, as some generative effects for the system could be designed to use these regions as a parameter. Such effects may be able to be "snapped" to the boundary of a Positive Space Region so they appear to be contained within it. Also, for less sophisticated content such as video clips, a Positive Space Region could be applied as a master alpha channel before it is projected to the stage (fig 2.5).
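A minimal sketch of how a Positive Space Region might be applied as a master alpha channel, assuming the region is stored as a polygon in projector coordinates. A real implementation would mask full-color frames on the GPU; this toy version uses a 2D list of pixel values, but the point-in-polygon idea is the same.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting point-in-polygon test; polygon is a list of (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edge crossings of a horizontal ray extending right from (x, y).
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def apply_region_mask(frame, polygon):
    """Zero out every pixel outside the Positive Space Region, i.e. use
    the region as a master alpha channel; frame is a 2D list of values."""
    return [[value if point_in_polygon(x, y, polygon) else 0
             for x, value in enumerate(row)]
            for y, row in enumerate(frame)]
```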
The capabilities gained by sectioning off a performance space using this feature could allow for some interesting possibilities. This could be combined with the previous concept of Composition Projection, in which the notion of multiple Presentation Clients and VJ performers was outlined. Perhaps the projection screen is a normal rectangular surface, covering the entirety of the stage behind the performers. One VJ performer could section off shapes that fall behind the stage performers as positive space. The second VJ performer could then calibrate their Presentation Client, with its own projector, to use the first VJ's negative space as their region of positive space. The positive space regions of the two performers could even overlap, causing some areas of the screen to be projected on twice for live compositing of images using multiple projectors (fig 2.6).

Movement
Live Motion Tracker – A cornerstone component of any system capable of extended reality effects is the ability to track motion in real time. The system will address this by including a feature with basic color and planar image tracking capabilities. There are already existing libraries with techniques worked out for achieving this goal. The unique feature will be how the tracker is able to be configured by the user and applied to visual aspects of the composition. Possible usages of a Live Motion Tracking node via these libraries include:

Color Tracking Average-to-Point – A color tracker could be configured to detect a specific color value specified by the user. As an example, let's say the user creates a tracker to detect a specific shade of blue to match the singer's outfit. If the tracker is set to the "Average to Point" setting, it will first detect tracking "blobs" from the live camera feed based on where it sees close matches to the set color. It will then determine the largest blob and create a tracking point for its center (fig 3.1).
A tracker created in this way would then be linked to a tangible node located on the reacTable. This node would have the capability to share this (x,y) point parameter with any other node parameter that accepts points. For example, let's say we also have a second node that is configured as a generative visual that emits randomly colored bubbles from a center point (fig 3.2). By default, this type of effect would probably take its own x,y position on the reacTable to use as the emitter center point.
We could instead link this node's center point with the center point provided by the tracker. To accomplish this, the two nodes would be slid next to one another. When this is done, all compatible input and output parameters of each node will appear as small selection bubbles floating around the nodes (fig 3.3). The user could then use their finger to drag the matching node parameters toward one another in order to link them. In this case, we would be linking the Emitter Node's center position with the Tracking Node's Color Average Point.
Linking the node parameters together in this way would allow the emitter node's center point to automatically track the position of the singer (fig 3.4). The musical performers would not need to worry about being in a specific location in order to properly hit an effect cue.
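The "Average to Point" behavior might be sketched as follows. This is a toy flood-fill blob finder over a small pixel grid, not the actual tracking library's algorithm, and the tolerance parameter is an assumption.

```python
def color_match(pixel, target, tolerance):
    """True if an (r, g, b) pixel is within tolerance of the target color."""
    return all(abs(p - t) <= tolerance for p, t in zip(pixel, target))

def average_to_point(frame, target, tolerance=30):
    """Sketch of the 'Average to Point' tracker: mark pixels close to the
    target color, group them into 4-connected blobs, and return the center
    of the largest blob as (x, y), or None if nothing matched."""
    h, w = len(frame), len(frame[0])
    matched = {(x, y) for y in range(h) for x in range(w)
               if color_match(frame[y][x], target, tolerance)}
    blobs = []
    while matched:
        stack = [matched.pop()]          # flood-fill one connected blob
        blob = []
        while stack:
            x, y = stack.pop()
            blob.append((x, y))
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in matched:
                    matched.remove(nb)
                    stack.append(nb)
        blobs.append(blob)
    if not blobs:
        return None
    largest = max(blobs, key=len)
    cx = sum(x for x, _ in largest) / len(largest)
    cy = sum(y for _, y in largest) / len(largest)
    return (cx, cy)
```

The returned (x, y) is exactly the kind of point parameter the tracker node would expose for linkage to an emitter node's center position.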

Color Tracking Vector Shape – The color tracking library is also capable of turning color-tracked data into a vector shape rather than a single point. Configuring this kind of tracker would work much in the same way as the Average to Point, but it would have a vector shape as its output parameter (fig 3.5). A vector shape can be thought of as just another parameter that can be used to plug into other effects. Just like the other example, the user would slide other generative effect nodes toward a Tracking marker set to Vector Shape in order to assign any compatible parameter.
For example, you might have an effect that is capable of using a vector shape to draw a colored halo emitting from the shape's edge (fig 3.6). This effect, too, would then be able to track where the halo should be drawn based on the movements of the singer wearing bright blue clothing.

Planar Image Tracking – This tracking component is somewhat different from the previous two color-based tracking examples. It will require more effort to set up, but will also provide Z-depth parameters for position and rotation in 3D space. To accomplish this setup, the user would be required to specify an image area that they wish to track as it is seen on the stage from the tracking camera. To do the configuration, the tracked object would need to be somewhere on stage where it can be clearly seen by the tracking camera (fig 3.7). The user would then capture a still frame from this camera, and use a drawing tool on the reacTable console to draw a region of interest that they wish to detect with this tracker (fig 3.8). A user would most likely want to set up tracking markers for each image plane before the performance. These could include posters held by a performer, graphic t-shirts, logos on amps, kick drums, guitar bodies, etc.
The benefit of this type of tracking over color trackers, as mentioned, is the ability to determine x, y, and z position and rotation information from the tracked objects. The collaboration of the Planar Image Tracking node with other nodes would, again, work in the same way: by sliding the nodes near one another and connecting compatible parameters. The parameters of these trackers would be ideal to connect to generative effect nodes that have a corner-pinning capability or a 3D camera component.
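To make the kinds of parameters such a node would expose concrete, here is a heavily simplified stand-in. A real planar tracker would recover a full homography or 3D pose; this sketch only derives a center point, the in-plane rotation of the top edge, and a depth proxy from the apparent size of the tracked region, under the assumption that the region is square and its corners arrive in a known order.

```python
import math

def planar_pose_proxy(corners, reference_size):
    """Toy stand-in for planar tracking output. corners are the four
    tracked points of a square image region, ordered top-left, top-right,
    bottom-right, bottom-left; reference_size is the top-edge length (in
    pixels) measured during calibration. Returns center, in-plane rotation,
    and a depth proxy (larger on screen means closer to the camera)."""
    (tlx, tly), (trx, try_), (brx, bry), (blx, bly) = corners
    cx = (tlx + trx + brx + blx) / 4.0
    cy = (tly + try_ + bry + bly) / 4.0
    rotation = math.degrees(math.atan2(try_ - tly, trx - tlx))
    apparent = math.hypot(trx - tlx, try_ - tly)
    depth = reference_size / apparent
    return {"center": (cx, cy), "rotation_deg": rotation, "depth": depth}
```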
Imagine the band logo on a performer's shirt is set up as a tracking plane. As the performer moves and pivots in relationship to the tracking camera, a generative effect with a 3D camera could automatically react to these movements. If this effect were used with some basic game-engine-type 3D room effect, it might be possible for the performer to use his body movements to pantomime a walking action in order to navigate through this virtual space. This could be a useful visual effect to tell a story using the visuals and song simultaneously. For example, the singer could guide the audience through different spaces that contain objects that remind him of a lost love, while also singing about the things as they are seen.
These 3D rooms could even be designed to switch to a more realistic pre-rendered video clip of an interactive environment if the performance required a more intimate interaction with the objects in the virtual space. This could be a nice addition to techniques currently being employed for large arena concerts. Roger Waters' "Dark Side of the Moon Tour 2006-2008" featured a living background environment of a hyper-realistic tabletop setup (fig 3.9: Roger Waters, "Dark Side of the Moon Tour 2007"). The scene, containing memorabilia symbolic of post-WWII America, was comprised of prerendered sequences of cigarette smoke and a hand changing the radio dial. As the radio changed, it also triggered corresponding diegetic sounds that acted as an introduction to several of the songs. The Planar Image Tracking system could allow the position and movements of a performer to rudimentarily trigger certain events contained in this type of living background environment.

Gestural Interaction – Of course, there would be limitations to the Planar Image Trackers. Trackers are easily lost by the system when lighting and other conditions do not allow for a clear view of the image region. For example, tracking markers on a performer's chest would "disappear" each time they bend or turn away from the camera. Also, some visual effects may work better if manipulated using some more intelligent form of tracking. Because of these shortcomings, some situations would probably be more suited to a type of Gestural Interaction.
Implementing a Gestural Interaction node would first require a number of gesture event listeners to be defined. A gesture, in this sense, can be thought of as any unique path made by a cursor defined in 2D space. An existing gesture class listens for a cursor to move in four gestural patterns: clockwise, counterclockwise, horizontal shake, and vertical shake. Simple gestures such as these are easily disambiguated from one another, which helps prevent gestural misfires. Properly implementing just these four would already provide a fair amount of possibility for a motion-tracked performer.
In order to get this "cursor" data to feed into our Gestural Tracker, however, we would first need to define a motion capture technique that suits the needs of the performance. One way would be to use one of our two aforementioned tracking nodes to provide this x,y data. This could allow a performer with a uniquely colored glove and an aptly configured color tracker to deliver swirling or shaking gestures toward the camera.
For a situation where the performer may want to be facing away from this tracking camera, or would be in some other way obscured, an alternative live motion capture technique could be used instead. The existing gesture class, "ezGestures", also contains a WiiMote© implementation that could be used as an alternative. This setup of capturing audio, video, and also OSC data from a WiiMote© would add another layer of complexity to an already complex project description. It would most likely be outside of the scope of this thesis, but it is worth mentioning that WiiMote© data or any other alternative data source could absolutely be included if the need happened to arise.
Assuming a user has successfully set up a Gestural Tracking node, and completed the steps necessary for defining one of many possible cursor sources, it would then be ready to be used with a generative patch. The Gestural Tracking node's output parameter, in this case, would be somewhat unique compared to others previously mentioned. The Gestural Analysis Node would be capable of outputting the currently detected state of its predefined gestural listeners. For short, we can call this a Gesture State Object (gsObj). The gsObj would contain information about which gesture is currently being performed and its current state.
A generative patch could utilize this data object in many different ways. Animated sequences could start and stop based on which gesture is detected. Colors could be instructed to lighten or darken with a vertical shake gesture. A string of sequentially performed gestures could also fire off an action. If the performer gestures clockwise, vertical shakes, and stops within a short amount of time, this sequence may cause some specific, preprogrammed visual response (fig 3.10).
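The four gesture patterns named above could be classified along these lines. The heuristics here (bounding-box aspect ratio for shakes, signed path area for rotation direction) are illustrative assumptions, not ezGestures' actual method.

```python
def classify_gesture(path):
    """Toy classifier for the four gestures discussed above: clockwise,
    counterclockwise, horizontal shake, vertical shake. path is a list of
    (x, y) cursor samples in screen coordinates (y increasing downward)."""
    xs = [p[0] for p in path]
    ys = [p[1] for p in path]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    # A path much wider than tall reads as a horizontal shake, and vice versa.
    if height < 0.25 * width:
        return "horizontal_shake"
    if width < 0.25 * height:
        return "vertical_shake"
    # Otherwise use the signed (shoelace) area of the closed path: with y
    # pointing down, a positive sum corresponds to clockwise motion.
    area2 = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(path, path[1:] + path[:1]))
    return "clockwise" if area2 > 0 else "counterclockwise"
```

A gsObj could then simply carry the most recent classification plus how far along the path the cursor currently is.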

Dynamics
Audio Level Detection – For the following three aspects of performance, I will focus on their relationship to the audio signal that will be coming into the system from the stage. The first of these, dynamics, involves the techniques for interpreting decibel levels of this audio. First, however, it may be necessary to elaborate on some general facts about how the system will retrieve this signal, as it will directly relate to how the system will be used.
Since real-time capture of an audio signal on most computer systems comes in the form of one master line-in, we will assume that ProZeuxis will at least support this one input signal. This will still be able to provide a useful source of information about the audible qualities of the performance. A simple setup may derive this audio signal from the final mix coming from the venue's PA system, as provided by the audio engineer. For more control, and a broader set of capabilities, however, it would be better to assume that the VJ would be capturing two or more separate audio signals with the usage of a wireless microphone system and a mixer connected to the ProZeuxis Server (fig 4.1).

It might seem like a strange thing for a visual artist to set up their own system for controlling individual audio signals. However, this would need to be done in order to filter the audio based on which source we would like to focus our "visual" attention on. For a performance with multiple instrumentalists, it would be beneficial to have access to an unmixed input of what each is playing for a more precise system of audio detection. Because of the way audio detection algorithms function, any visual patch attempting to use the audio data would first need to convert it into a Fast Fourier Transform (FFT). The FFT is a sampling of the audio signal over time, in which the sound is represented as a spectrum of sub-bands. With the simplest usage of our Audio Detection node, with no extra parameters configured, we could send the entire FFT stream directly to any visual patch expecting this as an incoming parameter. There are countless visual patches that have their own methodology for using the entire audio FFT to perform some visual effect. This is a fine approach in its own right for specially coded effects whose results could not be otherwise achieved by linking them to our ProZeuxis-style data nodes and providing more simplistic data.
However, if every visual patch we had was using its own unique routine for breaking down the FFT into a usable form, this would quickly become costly on the processor and slow down the performance in more ways than one. Instead, it would be better to allow the user to specify some extra parameters that effectively set up beat detectors for specific ranges of frequencies. First, you would need to specify which band of the signal you wish to analyze for beats, or sudden changes in volume. We could always run this detection on the mixed signal: it would react whenever the overall audio signal exceeds a specified threshold. For honing in on a specific aspect of the song, however, this simplistic approach may not be accurate enough.
For example, let's say we would like to set up a simple Audio Level Detector that is configured to listen to a specific range of frequencies. This range could be explicitly set by the user or selected from a preset. An existing library for such audio detection, "minim", already defines preset ranges to listen for kick, snare, and hi-hat. With this, you could detect the kick drum range to feed into a generative patch that would react to the kicks. However, without first filtering the drummer's audio using the mixer, signal crosstalk from other musicians playing sounds in that same "kick drum" frequency range would make your desired visual effect perform quite miserably (fig 4.2: mixed signal with crosstalk).
With one of the microphones located directly in front of the kick drum, or even inside it, there should not be a problem isolating this signal and sending it to our desired visual effect by way of a properly configured Audio Level Detection node (fig 4.3: kick drum isolated with the mixer). The same preconditions will apply for the proper linkage of this new type of node and a visual patch. First, we can identify that this Audio Detection Node will be capable of supplying a boolean variable over time stating "a beat was detected now", and any generative patch asking for this type of "pulse" as an input parameter would be capable of linking to it.
The second way this exact node could communicate with a visualization would be to also provide the decibel value that was detected in the specified range, as opposed to a thresholded, on-off beat. This could tell the patch that there was a beat in our range, and also provide a parameter indicating how loud it was. Since the process for arriving at both of these types of data is the same, it might be best to leave it up to the user to specify which they would like to send to their visual at the time of linkage. If we are telling the visual patch that we want to send it the "pulse" data, it might say, "Great, now pick a MIDI knob that will control the Threshold for this connection." Any time the user would want to adjust the "sensitivity" of this beat detector, they would simply need to refer back to their configured knob, slider, etc. and tweak it until it seems to be reacting to their liking.
The second way of connecting the Audio Beat Detector, of course, would be to supply the numerical data of these beats to a visual patch. With this kind of connection, you would not necessarily need any other mechanism in place to control a threshold value. It would not be impossible to imagine, however, that some patches might be designed to use a threshold controller for this kind of audio data. In that situation, you could effectively control how much the detected beat data affects your patch by tweaking the threshold.
At first glance, this approach might not seem like it would be capable of handling multiple Beat Detection Nodes operating at the same time. This is not entirely true, however. Although the crosstalk issue that we mentioned does put a damper on any chance of absolute accuracy, the proposed setup would have a built-in way of combating this. Given that each of our four separate microphone signals is being run through a mixer, the VJ could equalize the individual channels to limit each signal's dynamic frequency range (fig 4.4). By limiting the frequencies that are capable of being produced by each mixer channel through equalization, you can manipulate each signal's chances of being picked up by this or that Beat Detection node, even after the signal has been mixed. In this way, you would be able to limit crosstalk, and achieve a separation between frequencies for each instrument, or other audio signal. Once this has been successfully accomplished, a multitude of Beat Detection Nodes, checking beats on different ranges, could be configured and working independently of one another.
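The band-limited, thresholded detector described in this section might look something like the sketch below. The band indexing and the simple upward-crossing rule are illustrative assumptions, not minim's actual beat-detection algorithm; the threshold field is exactly the value the user's MIDI knob would adjust.

```python
class AudioLevelDetector:
    """Sketch of the Audio Level Detection node: it watches one frequency
    band of an incoming FFT stream and outputs both a boolean 'pulse' and
    the raw level for that band, matching the two linkage modes above."""

    def __init__(self, band_range, threshold):
        self.band_range = band_range   # (low_bin, high_bin) of the FFT
        self.threshold = threshold     # adjusted live via a MIDI knob/slider
        self.was_above = False

    def feed(self, fft_frame):
        """fft_frame is one spectrum: a list of magnitudes per FFT bin.
        Returns (pulse, level); pulse fires only on an upward threshold
        crossing, so a sustained loud signal counts as one beat, not many."""
        lo, hi = self.band_range
        level = sum(fft_frame[lo:hi]) / max(1, hi - lo)
        pulse = level > self.threshold and not self.was_above
        self.was_above = level > self.threshold
        return pulse, level
```

Several such detectors, each pointed at a different band range, could run side by side over the same FFT stream, which is the multi-node scenario discussed above.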

Rhythm
Tempo Detection – The function of the previous Audio Detection Node is an essential step toward doing any other type of further audio analysis, such as Tempo Detection. Typically, detecting the tempo of a mixed composition is an exceedingly difficult task, again due to issues involving crosstalk. Since we will have the ability to somewhat isolate certain key musical signals, however, it might be possible to achieve a satisfactory tempo detection.
A good place to start would be to isolate a kick drum type node, or some other element of the composition that is dictating the overall pulse of the music. This initially configured Beat Detection node could then be used to feed into a secondary Tempo Detection Node. Essentially, its entire purpose would be to keep count of the beats provided to it through this connection. Based on its counting, it could calculate a relative Beats Per Minute (bpm) value that could be used in any visual patch requiring this type of "speed" parameter.
As a note, it would also be possible for a visual patch to have its own method for determining a speed or bpm directly from a given Beat Detection Node connection. However, sticking with my concerns about this type of design, it would always be better to do this analysis once and have the ability to share it many times over.
As a very simplistic example, this type of parameter could be shared with a patch that displays a video clip. The bpm could manipulate the overall speed at which the video plays back.
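The beat-counting behavior described above can be sketched directly: the node records the timestamps of incoming pulses and derives a relative bpm from the average interval between them. The sliding-window size is an illustrative choice, not part of any specification.

```python
class TempoDetector:
    """Sketch of the Tempo Detection node: it consumes pulses from an
    upstream Beat Detection node and keeps a short window of beat
    timestamps from which it derives a relative bpm."""

    def __init__(self, window=8):
        self.window = window
        self.timestamps = []

    def on_beat(self, t):
        """Record one beat pulse at time t (seconds)."""
        self.timestamps.append(t)
        self.timestamps = self.timestamps[-self.window:]

    def bpm(self):
        """Average beats per minute over the recent window, or None until
        at least two beats have been counted."""
        if len(self.timestamps) < 2:
            return None
        span = self.timestamps[-1] - self.timestamps[0]
        intervals = len(self.timestamps) - 1
        return 60.0 * intervals / span
```

A video-clip patch could then scale its playback rate by, say, bpm() / 120.0 to stay locked to the music's pulse.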

Pitch
Tonality Detection –
Mood
Visual Categorization –
