
A Fuzzy-Logic Mapper for Audiovisual Media

Rodrigo F. Cádiz
Music Technology Program
School of Music, Northwestern University
711 Elgin Road
Evanston, Illinois 60208-1200 USA
rcadiz@northwestern.edu

Computer Music Journal, 30:1, pp. 67-82, Spring 2006. © 2006 Massachusetts Institute of Technology.

Recent technological developments have enabled us to synthesize images and sounds concurrently within single computers, even in real time, giving birth to novel and genuinely integrated audiovisual art forms (Hunt et al. 1998). But how should we organize and compose such works? Given a certain soundscape, what would form an appropriate sequence of images to that soundscape? Given a certain sequence of images, what soundscape is appropriate to it? If the image sequence and the soundscape are being created concurrently, how should we compose them?

Authors have proposed different approaches to these questions (Whitney 1980; Hunt et al. 1998; Lokki et al. 1998; Rudi 1998; Kim and Lipscomb 2003; Gerhard and Hepting 2004; Yeo et al. 2004). These approaches differ significantly, and they are based on diverse principles, such as correspondence of aural to visual harmony, audiovisual modeling of mathematical principles, audiovisual rendering, data sonification, algorithmic control, and parameter space exploration. It is important to notice that there is no easy or correct solution, because the problem we must deal with lies in combining two entirely different media in time (Hunt et al. 1998).

A fuzzy-logic approach to the challenge of composing both sound and moving image within a coherent framework is proposed here as an alternative solution. This approach is based on a fuzzy-logic model that enables a flexible mapping of either aural or visual information onto the other, and it is able to generate complex audiovisual relationships by very simple means. This mapping strategy is inspired by two fundamental ideas: isomorphism and synaesthesia. Isomorphism applies when two complex structures can be mapped onto each other based on the fact that changes in one modality consistently cause changes in another modality (Hofstadter 1999). The word synaesthesia comes directly from the Greek syn (together) and aisthesis (perceive; Van Campen 1999), thus meaning a union of the senses. Synaesthesia occurs when stimulation in one sensory modality automatically triggers a perception in a second modality in the absence of any direct stimulation to this second modality (Harrison and Baron-Cohen 1997).

This article is structured as follows. First, the motivations for this work are presented and discussed, including discussions of audiovisual domains, synaesthesia, and isomorphisms. Second, fuzzy logic is introduced, including its main features. Third, details of the proposed fuzzy-logic mapper are presented. Fourth, ID-FUSIONES (2001) and TIME EXPOSURE (2005), computer music-video works that use the proposed model, are discussed as actual implementations of the approach described in this article. Finally, conclusions and directions for future work are addressed.

Motivation

To understand how the composition of an audiovisual work should be addressed, it is important to consider how presenting music with visuals affects the listener differently from presenting music or visuals alone.

Experiencing the Audiovisual

There is substantial empirical evidence to support the common subjective experience that music and moving images interact in powerful and effective ways (Bullerjahn and Guldenring 1994; Iwamiya 1994; Lipscomb and Kendall 1994; Rosar 1994; Sirius and Clarke 1994). However, as Finaas (2001) suggests, it is often difficult to predict the exact influence of the visual stimuli that relate to audio stimuli. The visual elements and their relationship to the music can vary tremendously. Hunt et al. (1998) suggest that combining music and visuals produces combinatorial relationships of such complexity that it forces composers to develop extensive algorithmic control to maintain stylistic consistency. Sirius and Clarke (1994) composed music and used computer-generated moving images to investigate the interaction of different visual and musical parameters. Their findings show that the effects of music in the rating of visual images are usually additive and that there are no interactions among specific musical styles and particular visual images. In other words, no specific audiovisual combinations acquire particular semantic characteristics.

Finaas (2001) addressed the question of whether presenting music live, audiovisually, or only aurally makes any difference for listeners' experiences. Audiovisual presentations were categorized in three different submodes: simple documentary, which is just a video recording of the live performance; TV-type, in which live performance is alternated with images from various perspectives, close-ups, and images of details; and non-documentary, which does not aim to be a faithful description of the performance. Non-documentary is the type of presentation that best describes the approach presented here, and it also generally corresponds to Rudi's (1998) idea of concert videos in which the moving images do not necessarily resemble the performance or the score of the music being presented. Finaas reviewed eight studies in the literature finding that, in the case of non-documentary modes of presentation, the results showed that no mode of presentation was clearly favored and that non-documentary visual material had some tendency to create impressions of activity and complexity, while its effect on preference was unpredictable.

Lipscomb and Kendall (1994) investigated the relationship between the musical soundtrack and visual images in the motion picture experience, which can usually be classified as non-documentary. They selected five scenes from commercial motion pictures along with their musical scores. Each soundtrack was paired with every visual excerpt, resulting in a total of 25 audiovisual composites. In one of the experiments, subjects were asked to rate all 25 composites on semantic differential scales. The findings show that music exercised a strong and consistent influence over the subjects' responses to an audiovisual composite, regardless of visual stimulus. In other words, a musical soundtrack can change the meaning of a film presentation.

As a summary, there is significant evidence supporting the idea that music and visuals interact in powerful ways when combined together, and that in the case of non-documentary modes of presentation, the relationship between aural and visual material can achieve high levels of complexity. These kinds of audiovisual interactions may resemble the perceptual phenomenon known as synaesthesia.

Synaesthesia

Synaesthesia is technically defined as an involuntary physical experience of a cross-modal association. In other words, a crossing of the senses occurs; having one sense stimulated would cause stimulation in another sense as well. For those who possess synaesthesia it is an obvious and integral part of their sense perception. For a synaesthetic person there is no question about using synaesthesia or not, because it is simply there. But to those who do not possess synaesthesia, it remains complex and very often misunderstood (Berman 1999). The most common type of synaesthesia is called "colored hearing" and corresponds to the phenomenon of seeing colors when hearing music or vowels (Van Campen 1999). A good example of colored hearing is given by Marks (1997), as he describes the case of a synaesthete that saw a brown strip against a dark background when presented with a tone pitched at 50 Hz with an amplitude of 100 dB.

Synaesthesia consists not of random associations between isolated phenomena or qualities of two sensory domains, but rather expresses correlated dimensions or attributes (Marks 1997). In this sense, synaesthesia acts like a mapping from one modal dimension to another. The proposed fuzzy mapper operates on the same principle.

Artistic works that fuse the senses are often referred to as synaesthetic art (Hertz 1999). This is possibly owing to the appropriation of the term "synaesthesia" by many art historians who have written about the interrelationships of music and art (Berman 1999).


However, it is very important to distinguish the term "synaesthesia" from the term "synaesthetic art." As Hertz (1999) suggests, "Synaesthetic art is a deliberate contrivance, the product of an artistic aspiration, and we should not confuse it with the neurological phenomenon of synaesthesia. Persons, not artworks, are synaesthetic." Also, these kinds of artworks are usually created for an audience that is not synaesthetic. Cytowic (1997) also states that there is a sharp demarcation of synaesthesia as a sensual perception, as distinct from a mental object like cross-modal associations in non-synesthetes, metaphoric language, or even artistic aspirations to sensory fusion. The fuzzy-logic mapper described here allows the creation of synaesthetic art by providing a framework that permits the mapping of information from one dimension, either visual or aural, into the other, resembling the perceptual phenomenon of synaesthesia.

Isomorphism

Isomorphism is a very general concept that appears in several areas, not only in mathematics. The word derives from the Greek iso, meaning equal, and morphosis, meaning to form or to shape. In a strict mathematical sense, a morphism is a map between two objects in an abstract category. General morphisms are called homomorphisms. A homomorphism that is a one-to-one mapping is called an isomorphism. Informally, an isomorphism is a map that preserves sets and relations among elements (Weisstein 2005). In other words, if A and B are isomorphic, then there are elements or features of A that map onto B, and vice versa, even though A and B may appear different.

Hofstadter (1999) affirms that isomorphism applies when two complex structures can be mapped onto each other in such a way that to each part of one structure there is a corresponding part in the other structure, where "corresponding" means that the two parts play similar roles in their respective structures. According to DiMaggio and Powell (1983), in 1968 Hawley defined isomorphism (in a sociological context) as a constraining process that forces one unit in a population to resemble other units that face the same set of environmental conditions.

Fuzzy Logic

Fuzzy logic is a concept derived from the branch of mathematics called fuzzy sets (Zadeh 1965). In a narrow sense, it refers to a logical system that generalizes traditional two-valued logic for reasoning under uncertainty, allowing multiple values of truth. In a broad sense, it refers to all the theories and technologies that employ fuzzy sets (Yen and Langari 1999). In general, when fuzzy logic is applied to computers, it allows them to emulate the human reasoning process, quantify imprecise information, and make decisions based on vague and incomplete data (Kosko 1993).

Binary Logic versus Fuzzy Logic

In binary logic, variables are either true or false, black or white, 1 or 0. Aristotle's law of the excluded middle holds: we can only have A or not-A (Kosko 1993). A given object cannot be part of both A and not-A at the same time. If so, there is a contradiction. In contrast, fuzzy logic is defined with uncertain terms, and partial values of truth are admitted. Things are not true or false (black and white) anymore; they can be partially true or false or any shade of gray. Mathematically, fuzzy logic accepts values between 1 and 0.

Fuzzy Sets

A fuzzy set is a set whose members belong to it to some degree. In contrast, a standard or non-fuzzy set contains its members all or none (Kosko 1993). Theoretically, a fuzzy set F of a universe of discourse X = {x} is defined as a mapping, $\mu_F(x) : X \to [0, \alpha]$, by which each x is assigned a number in the range $[0, \alpha]$. When $\alpha = 1$, which is the usual case, the set is called normal. In the extreme case where the distribution is of zero width, the membership function is reduced to singularities, i.e., the fuzzy set reduces to a crisp set. (A crisp set is one in which boundaries are sharply
defined and membership is all or none.) If the singularities are of two possibilities, we have binary logic. Here, $\mu_F$ is the degree of membership (or degree of truth) of x. Fuzzy sets can have any shape, and no shape has been proven to be the best (Mitaim and Kosko 2001). However, the most commonly used shapes are triangular and Gaussian functions.

Fuzzy Rules and Fuzzy Patches

Fuzzy logic attempts to emulate the way humans reason with vague rules of thumb and common sense. Malki and Umeh (2000) propose the following example: If the weather is fine, then we may decide to go out. If the forecast says the weather will be bad today, but fine tomorrow, then we might decide not to go today and postpone it until tomorrow. A fuzzy rule is a conditional of the form "if X = A then Y = B," where A and B are fuzzy sets (Kosko 1993). Each fuzzy rule in a system determines a fuzzy patch, defined as the product ($A \times B$) of the fuzzy sets A and B in the system input-output state space (the set of all possible combinations of input and outputs). These patches can have a simple geometry in the system state space, for example, an ellipsoid (Dickerson and Kosko 1996).

The Fuzzy Approximation Theorem

Using the concept of fuzzy patches, Kosko (1994) proved what he called the Fuzzy Approximation Theorem, which states that a fuzzy system can uniformly approximate any real continuous function on a compact domain to any degree of accuracy. The idea is to approximate the target function by covering its graph with fuzzy patches in the input-output state space and averaging patches that overlap. Owing to this theorem, fuzzy systems are considered universal approximators (Ying et al. 1999).

Fuzzy Systems

Fuzzy systems estimate input-output functions. Unlike statistical estimators, they estimate a function without a mathematical model of how outputs depend on inputs. They are model-free estimators. They learn from experience with numerical (and sometimes linguistic) sample data (Kosko 1992).

A fuzzy system is defined as a system with operating principles based on fuzzy information processing and decision making. In such a system, inputs are classified into fuzzy sets, and outputs are de-classified from fuzzy sets to scalars or crisp sets. The process of classification is defined as fuzzification and the de-classification as defuzzification. Fuzzification and defuzzification are critical operations in the design of fuzzy systems, as both of these operations provide a nexus between the fuzzy set domain and the real-value scalar domain (Roychowdhury 2001).

Figure 1 shows the fuzzification of the concept of loudness. The goal of the fuzzification process is to classify a variable or concept into different fuzzy regions or sets by calculating degrees of membership in each fuzzy set. In this particular case, there are three fuzzy sets or membership functions to which loudness can belong, labeled LOW, MEDIUM, and HIGH. The membership functions are normalized, and the universe of discourse X contains all possible values for loudness levels in the range 30 to 120 dB.

If we take, for example, a 90-dB loudness level, we can determine the degree of membership in each one of the three membership functions by finding the intersection of the scalar value with each fuzzy set. As can be observed in Figure 1, 90 dB belongs 70.69% to the fuzzy set HIGH, 24.97% to the fuzzy set MEDIUM, and 0% to the fuzzy set LOW.

Once the inputs to a fuzzy system are fuzzified, one or more fuzzy rules are computed in parallel to produce the corresponding outputs. This process is called fuzzy inference. Different inference methods are used depending on the purpose and nature of the fuzzy system. The most common is the Mamdani method, proposed in 1975 by Mamdani and Assilian (Ross 2004). There are several variants of this method; the min-max method represents one of the most popular. Using the min-max Mamdani method of inference (if, for example, we have two inputs and one output), the fuzzy rules have the form
$$\text{if } x_1 \text{ is } A_1^k \text{ and } x_2 \text{ is } A_2^k \text{ then } y^k \text{ is } B^k, \quad \text{for } k = 1, 2, \ldots, r$$

where $A_1^k$ and $A_2^k$ are input fuzzy sets, and $B^k$ is the desired output. Given a set of r disjunctive fuzzy if-then rules, the aggregated (fuzzy) output will be given by

$$B^k(y) = \max_k \left[ \min \left[ A_1^k(\mathrm{input}(1)),\, A_2^k(\mathrm{input}(2)) \right] \right], \quad \text{for } k = 1, 2, \ldots, r$$

After the fuzzy inference process has taken place, the aggregated outputs are then defuzzified so that the desired numeric variables are obtained. Several methods and techniques for defuzzification have been proposed in the literature, each one with its advantages and disadvantages (Ross 2004). One of the most often used is the center of mass or centroid method, in which the value of the center of gravity of the aggregated fuzzy output is used as the resulting scalar value.

Figure 1. A fuzzy representation of loudness.
[Plot omitted. Membership functions LOW, MEDIUM, and HIGH. x axis: universe of discourse (in dB), 30 to 120. y axis: degree of membership, 0 to 1, with the values 0.7069 and 0.2497 marked.]

A Fuzzy-Logic Mapper

Fuzzy logic systems have been widely used in a variety of fields, most prominently engineering and control applications (Kosko 1993; Klir and Yuan 1995), but they have also been applied to other areas as diverse as data analysis (Bandemer and Gottwald 1995), economics, business and finance (Von Altrock 1997), sociology (Dimitrov and Hodge 2002), and geology (Demicco and Klir 2004). However, fuzzy logic has seldom been used in the artistic and creative fields, which is very surprising, given that other related theories and techniques such as neural networks and genetic algorithms have been widely used for artistic purposes. In the specific case of music, despite the fact that Landy (2001) included fuzzy logic as one of the potential areas for music research in the future, only a small number of applications related to creative activities have been reported in the literature.

Fuzzy systems provide several advantages for creative applications. Fuzzy systems are powerful and work in a way that resembles some characteristics of human behavior. Parallel computation of fuzzy rules usually reduces the computation time compared to a traditional mathematical approach. Fuzzy systems, owing to the fuzzy approximation theorem, allow the approximation of highly non-linear systems with any desired degree of precision. Fuzzy systems are model-free estimators; thus, it is not necessary to know a mathematical model in advance to approximate any system. Fuzzy rules can be easily specified in the form of if-then statements, allowing the building of fuzzy systems with simple linguistic terms.

According to Ross (2004), the fact that fuzzy systems are universal approximators is a result of the isomorphism between algebra and the structure of a fuzzy system, which is composed of an implication between actions and conclusions, or antecedents and consequents. The reason for this isomorphism is that both entities (algebra and fuzzy systems) involve a mapping between elements of two or more domains. Just as an algebraic function maps an input variable to an output variable, a fuzzy system maps an input group to an output group. With fuzzy systems, these groups can be linguistic propositions or other forms of fuzzy information.

Ross (2004) emphasizes that it will be the consequence of this isomorphism that fuzzy systems will become more popular in the future. In the spirit of this idea, this article proposes a fuzzy mapper as a way to produce isomorphic mappings between aural and visual information. This can be done in the following way: if we have two aural elements A1 and A2, using the fuzzy mapper we can generate their
counterparts in the visual domain, V1 and V2. In other words, given A1, the fuzzy mapper will always produce V1, assuming that the rules are time-invariant. Consequently, given A2, the fuzzy mapper will always produce V2. If there is any kind of relationship between A1 and A2, there will also be some relationship (not necessarily the same) between V1 and V2. If we accept a general definition of isomorphism as a map that preserves sets and relations among elements, then it could be argued that the proposed fuzzy-logic mapper is isomorphic.

The fuzzy mapper discussed in this article is basically a Mamdani-type fuzzy system in which visual or aural parameters are used as inputs or outputs. The inputs to the system are obtained by any kind of compositional process, and then the fuzzy mapper is used to obtain outputs in the corresponding domain. All the defuzzification for generating output is performed using the center-of-mass method of defuzzification.

ID-FUSIONES and TIME EXPOSURE, two audiovisual works based on the proposed fuzzy mapper, are presented now as actual implementations of this system. In ID-FUSIONES, aural parameters were used as inputs, and visual parameters were obtained by the mapper. In TIME EXPOSURE, the opposite situation occurred: visual parameters were used as inputs, and aural parameters were obtained as a result of applying the fuzzy mapper.

Example 1: ID-FUSIONES

ID-FUSIONES is an audiovisual work composed by the author in collaboration with visual artist Luz María Cury in the period May 2000 to March 2001. This work consists of 206 audiovisual events that occur in a time frame of approximately eight minutes. Each event is unique and has eight variables (four visual and four aural) that define it completely. In this particular piece, aural parameters were considered inputs and visual parameters were considered outputs, although this could vary with different applications of the fuzzy model. The relationship could be inverted (visual as input and aural as output), as it happens in the second example described later in this article, or even a mixed approach could be used (some visual and aural as inputs, some visual and aural as outputs).

At this point, it is important to mention that these parameters (only eight altogether) were chosen to keep the simplicity of the compositional process. In theory, the proposed model can take any number of input and output parameters with no limitation other than computational power. Some of these parameters, especially the aural ones, coincide with those identified to be perceptually relevant in the literature. Seashore (1967) mentions that "sound waves have four, and only four, characteristics; namely, frequency, amplitude, duration, and form. Sounds of every conceivable sort, from pure tone to the roughest noise, can be recorded and described in terms of these four" (p. 16). By the term "form," Seashore refers to timbre. Lipscomb (1995) found pitch, loudness, and timbre to be perceptually relevant in the aural domain; location, shape, and color were found relevant in the visual domain.

In terms of audiovisual relationships, Lipscomb and Kim (2004) investigated the relationship between auditory and visual components of an audiovisual composite. Data analysis revealed significant between-subjects differences in responses to audiovisual material, suggesting the following primary relationships: pitch relates to vertical location, loudness relates to size, and timbre relates to shape. Duration did not pair as a best match with any visual parameter. Color matched equally with pitch and loudness. The proposed fuzzy mapper is flexible enough to produce not only these primary relationships but also any kind of relationship between the audiovisual parameters. The chosen parameters for this piece, almost the same ones used by Lipscomb and Kim in their study, are described below; these consist of frequency, intensity, duration, and noisiness in the aural domain and color, shape, size, and motion in the visual domain.

Aural Parameters

The Frequency parameter corresponds to the actual primary or fundamental frequency component of the spectrum of each sound event, measured in
Hertz. Sonic events in this piece are assumed to be static, in the sense that the fundamental frequency does not change in time for a given event, although the overall spectrum can. The second parameter, Intensity, corresponds to the maximum intensity level of the attack portion of an ADSR (attack-decay-sustain-release) envelope, measured in decibels of sound pressure level (dB SPL). In this work, levels between 60 and 100 dB SPL were used to cover the usual dynamic range for most musical works.

Each event has a temporal duration that determines its existence. The Duration parameter of each event is measured in seconds. Although this parameter is considered an input to the fuzzy mapper and an aural variable, it also determines the duration of the events in the visual plane. The fourth and final aural parameter used was called Noise (Spectrum). Sound events were considered "noisy" or "not noisy" according to the shape of their spectrum. The flatter the spectrum, the noisier the sound. This parameter is measured on a normalized scale from 0 (not noisy) to 1 (noisy).

Visual Parameters

Initially, the entire visible light spectrum (red to violet) was considered as possible values for the Color parameter. This range was then subdivided into 16 colors plus two distinct shades of gray (a restriction imposed by the software that was used to generate the visual material). This parameter is measured by the corresponding wavelength of each color, measured in meters (using a scale factor of $10^{-7}$).

Each event has a certain Shape, classified on a discrete complexity scale between 0 and 1. For example, a simple straight line would have a 0 score in this scale, a simple curved line a 0.1, a triangle around a 0.3 or 0.4, and a highly complex shape with several line segments and different types of lines a 1.

Each event has a certain Size on the screen that could be small, medium, or large. This parameter is measured on a normalized discrete scale from 0 to 1.

The fourth and final visual parameter considered was Motion. All visual events of the work are in constant movement. Motion, in this context, is equivalent to velocity, because it could have a certain direction and a certain speed. Fifteen arbitrarily chosen different types of motion were considered for this parameter.

Figure 2 shows the first eight events of the score of ID-FUSIONES. Each event comprises two rows: the first row shows the aural parameters, or input vector, obtained from the compositional process, and the second row displays the corresponding visual parameters, obtained from the fuzzy mapper, for that particular event.

Figure 2. First eight events of the score of ID-FUSIONES (2001).

Fuzzy Sets

Once the eight parameters were defined, it was necessary to classify them into fuzzy sets or membership functions. The majority of the variables were
classified into very low (VL), low (L), medium (M), high (H), and very high (VH) regions. When possible, the parameters were fuzzified taking into consideration some of their perceptual characteristics.

Figure 3. (a) Membership functions for Frequency, with frequency in Hz shown on the x axis; (b) membership functions for Intensity, with units of dB SPL shown on the x axis; (c) membership functions for Duration, with units in sec on the x axis; (d) membership functions for Noise (Spectrum), with noisiness given on the x axis. In each plot, VL refers to very low, L to low, M to medium, H to high, and VH to very high.
[Plots omitted. y axis in each panel: degree of membership, 0 to 1. x-axis ticks: (a) 500 to 5500; (b) 50 to 100; (c) 0 to 10; (d) 0 to 1, with the two sets NO and YES.]

Figure 3 displays the input fuzzy variables, and Figure 4 shows the output variables. Figure 3a shows the membership functions for the aural parameter of Frequency. As can be seen, frequency is classified into five different fuzzy sets. Triangular functions were used for each set. Note the uneven distribution of the size of the various sets, which looks like a logarithmic function. This was done to resemble the actual perception of pitch by the auditory system.

Figure 3b shows the membership functions for Intensity. As with frequency, triangular shapes were used and the uneven size distribution was done to reflect, roughly, the perception of loudness (in dB SPL). Figure 3c shows the membership functions for Duration. In this case, triangular functions were used,
but they were evenly spaced. A time scale from 0 to 10 sec was used. Events lasting more than 10 sec were considered very long (VH). Figure 3d shows the membership functions for Noise (Spectrum). Sounds were classified into noisy or not-noisy. Note that this resembles more a binary classification, with two large "yes" or "no" areas, than a fuzzy one (although it still is considered fuzzy). This is another advantage of the fuzzy approach, because it allows bivalent and multivalent variables equally.

Figure 4. (a) Membership functions for Color, with wavelength ($\times 10^{-7}$) in meters on the x axis; (b) membership functions for Motion; (c) membership functions for Shape; (d) membership functions for Size. Units of measurement on the x axis for (b) to (d) are arbitrary, as explained in the text.
[Plots omitted. y axis in each panel: degree of membership, 0 to 1. x-axis ticks: (a) 4 to 7; (b) 0 to 15; (c) 0 to 1; (d) 0 to 1, with the three sets S, M, and L.]

Figure 4a shows the membership functions for Color. Gaussian functions were used to emulate the actual frequency distribution of the visible spectrum. Figure 4c shows the membership functions for Shape, in which evenly spaced triangular functions were considered. Figure 4d shows the membership functions for Size. Only three fuzzy sets were used in this case, classifying size into small, medium, and big. Figure 4b shows the membership functions for Motion, in which evenly spaced Gaussian curves were used.

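The pipeline described so far (fuzzification with triangular or Gaussian membership functions, min-max Mamdani inference, and centroid defuzzification) can be illustrated with a minimal Python sketch. It is not the implementation used for ID-FUSIONES: the function names, the breakpoints of every fuzzy set, and the two rules are hypothetical values chosen only for illustration, loosely modeled on the evenly spaced Duration sets and the three Size sets described above.

```python
import math

def triangular(x, left, peak, right):
    """Degree of membership of x in a triangular fuzzy set with the given feet and peak."""
    if x <= left or x >= right:
        return 1.0 if x == peak else 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def gaussian(x, center, sigma):
    """Degree of membership of x in a Gaussian fuzzy set (shape named for Color and Motion)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def fuzzify(value, fuzzy_sets):
    """Map a scalar to {label: degree of membership} over triangular sets."""
    return {label: triangular(value, *abc) for label, abc in fuzzy_sets.items()}

# ASSUMED breakpoints (left foot, peak, right foot); none of these come from the article.
DURATION = {"VL": (-2.5, 0.0, 2.5), "L": (0.0, 2.5, 5.0), "M": (2.5, 5.0, 7.5),
            "H": (5.0, 7.5, 10.0), "VH": (7.5, 10.0, 12.5)}      # seconds
INTENSITY = {"H": (80.0, 90.0, 100.0), "VH": (90.0, 100.0, 110.0)}  # dB SPL
SIZE = {"S": (-0.5, 0.0, 0.5), "M": (0.0, 0.5, 1.0), "L": (0.5, 1.0, 1.5)}

def map_size(intensity_db, duration_s, steps=1000):
    """Two hypothetical rules -> Size, via min-max inference and centroid defuzzification."""
    i = fuzzify(intensity_db, INTENSITY)
    d = fuzzify(duration_s, DURATION)
    # Rule strength is the minimum over the antecedents ("and").
    rules = [
        (min(i["H"], d["VL"]), "M"),   # if intensity is H and duration is VL then size is M
        (min(i["VH"], d["L"]), "L"),   # if intensity is VH and duration is L then size is L
    ]
    num = den = 0.0
    for n in range(steps + 1):
        y = n / steps
        # Clip each consequent at its rule strength (min), then aggregate (max).
        mu = max(min(s, triangular(y, *SIZE[label])) for s, label in rules)
        num += y * mu
        den += mu
    return num / den if den > 0 else None  # centroid of the aggregated fuzzy output

if __name__ == "__main__":
    print(fuzzify(1.0, DURATION))
    print(map_size(95.0, 1.0))
```

Under these assumed sets, a 1-sec event belongs partly to VL and partly to L, and both rules fire with different strengths, so the defuzzified Size falls between the peaks of the medium and large sets rather than on either one. Swapping `triangular` for `gaussian` in `fuzzify` would give the smoother sets used for Color and Motion.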
Figure 5. Fuzzy mapper op-
eration showing (a) input
(aural) parameters and (b)
output (visual) parameters.

Table 1. Example of Fuzzy Rules from ID-FUSIONES (a)


1. If the frequency is very low and the sound is not noisy,
then the color is high, the motion is very high, and the
shape is very low.
2. If the intensity is very high and the sound is not noisy,
then the shape is low and the size is large.
3. If the frequency is very low and the intensity is very
high and the sound is not noisy, then the color is
medium.
4. If the intensity is high and the duration is very low,
then the size is medium.

Fuzzy Rules

The relations between the different variables were


regulated by means of 26 fuzzy inference rules of de-
cision that control the behavior of the entire mapper.
These rules relate one or more input variables with
one or more output variables and have the form of (b)
if-then statements. As an example, four of the 26
rules used in ID-FUSIONES are shown in Table 1.

Operation of the Fuzzy Mapper

Figure 5 shows a graphical view of the operation of the fuzzy mapper given a particular input vector and using only the four rules detailed in Table 1. In this example, the input vector is [275 95 1 0.5] (representing the Frequency, Intensity, Duration, and Noise parameters, respectively), and the output vector obtained by the fuzzy mapper is [6.16 13.5 0.217 0.548] (for Color, Motion, Shape, and Size). As stated earlier, Frequency is measured in Hz, Intensity in dB SPL, Duration in sec, Color (wavelength) in meters (scaled by 10^7), and the rest of the parameters are measured in discrete ordered scales depending on their nature.
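As a rough illustration of how a mapper of this kind turns fired rules into a scalar output, the following Python sketch applies min-max (Mamdani-style) inference followed by centroid defuzzification to a single output variable. The triangular output sets, the 0 to 1 scale for Size, the sampling resolution, and the firing strengths are all hypothetical; they are not taken from ID-FUSIONES.

```python
def tri(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical output sets for Size on a 0..1 scale.
SIZE_SETS = {
    "small": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "large": (0.5, 1.0, 1.5),
}

def mamdani_centroid(fired, sets, lo=0.0, hi=1.0, steps=1000):
    """Defuzzify one output variable.

    fired: list of (firing strength, output set name), one per rule.
    Each consequent set is clipped at its rule's strength (min), the
    clipped sets are aggregated with max, and the centroid of the
    aggregate is approximated on an evenly sampled grid.
    """
    num = den = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        mu = max((min(w, tri(x, *sets[name])) for w, name in fired),
                 default=0.0)
        num += x * mu
        den += mu
    return num / den if den > 0 else None

# Two rules firing on Size: "large" at 0.4 and "medium" at 0.6.
size = mamdani_centroid([(0.4, "large"), (0.6, "medium")], SIZE_SETS)
```

The result lies slightly above the midpoint of the scale, because the weaker "large" rule pulls the centroid of the aggregate shape upward. For an output measured on a discrete scale, the value would then be rounded, for example with `round(size)`.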
The rules are displayed as rows, numbered from 1 to 4. The four columns in Figure 5a show the input (aural) variables; the four columns in Figure 5b show the output (visual) variables. The input vector values are shown in the text row above the first rule, next to each input name, and they are also represented by vertical lines that cross the input variables. Inside each rectangle, the membership function that is affected by each rule is shown. All four rules are computed in parallel, using the Mamdani method of inference, and four fuzzy output variables are obtained, shown in the four rectangles below the fourth row. These fuzzy outputs are then defuzzified using the center of mass (or centroid) method of



Figure 6. (a) Screen shot of ID-FUSIONES at 2:10; (b) at 5:29; (c) at 8:02; (d) at 10:42.

defuzzification to obtain scalar values for the output (visual) parameters. The obtained scalar values are shown in the text row above each output column and next to the output names and also represented as vertical lines on the four rectangles at the bottom of the last row. Because the output parameters are measured in discrete scales, when non-integer values are obtained for the output variables, they are rounded to the nearest integer to obtain an appropriate output value.

Figure 6 shows four screen shots of ID-FUSIONES taken at different times. Additional screenshots and excerpts are available online at www.rodrigocadiz.com/idfusiones. [Editor's note: Also see the forthcoming DVD accompanying Vol. 30, No. 4 of the Journal.]

Example 2: TIME EXPOSURE

TIME EXPOSURE (2005) is an interactive multimedia work that organically incorporates several forms of artistic creation by means of both traditional techniques and new technologies. This work

Figure 7. (a) Painting at month 1; (b) at month 6; (c) at month 12; (d) two-dimensional spatial-frequency spectrum of painting at month 1; (e) at month 6; (f) at month 12.

is a collaboration between the author and Carmen González, a visual artist and researcher. TIME EXPOSURE integrates traditional painting, music composition, and video art and technology to create a new artwork that uses time and its physical effects as artistic material.
Central to this project is the idea that, in the destruction of art, its own creation is implicit; to abandon a work of art for its own sake constitutes indeed a way to regenerate it. Time is here conceived as the only agent capable of creating and destroying art at the same time: it can destroy an artwork but at the same time will generate a new one.

The constructive sequence of this project contemplated the creation of six paintings, especially created for this project using traditional techniques, as its point of departure. Each of these six initial works has a predominant color drawn from the primary and secondary colors (red, green, blue, yellow, violet, and orange).

These paintings were then exposed outdoors over the time span of one year, and the changes to both their appearance and structure provoked by the exposure to the environment were observed. During this interval of time, the exposed paintings were digitally recorded daily to document the changes they endured. These recordings were then edited and mounted, creating different visual sequences in digital video format. These resulting visual sequences also provide the source of musical sequences that add an audible plane to the visuals on each of the six video sequences.
A computer program developed by the author analyzed the visual sequences and computed their spatial frequency and color content as they changed over time. Each of the individual recorded images of the original paintings had a different spatial frequency and color structure as a result of environmental exposure. This three-dimensional visual structure (two spatial dimensions and one color dimension) was mapped to a three-dimensional sonic structure (pitch, intensity, and density), by means of the fuzzy-logic mapper described in this article. Consequently, the music also shows a kind of decomposition in analogy to the original destructive trajectory of the paintings. More details of this approach are provided below.

Figures 7a–c show three different shots (published in grayscale) of the blue painting taken at different times (months 1, 6, and 12, respectively). Figures 7d–f show the spatial-frequency domain (obtained by taking the two-dimensional Fourier transform) of each shot. Here, it is possible to appreciate the change in frequency content of the images caused by outdoor exposure.

Each one of these shots constitutes a video frame, and each frame lasts 1 sec in performance time. There are about 300 frames in total for each painting, covering a time span of a little more than a year in physical time. The change in the color and brightness of the three figures is especially noticeable. Figure 7a was taken on a rainy day, and the color is dark because the canvas was wet. Figure 7b shows some clear signs of de-coloration after 6 months of outdoor exposure. Figure 7c was taken on a sunny day, and some shadows and a brighter image can thus be seen.

It is also possible to notice some significant changes in the spatial-frequency structure of the three images. Figure 7d shows much more high-frequency content than Figure 7e. Based on this information, it is possible to observe a low-pass filter effect that takes place over the images as time progresses. Figure 7f shows more high-frequency content than Figure 7e, owing primarily to the presence of sunlight and the shadow. Also, the predominant frequencies in the three images change, as can be seen by diagonal lines that appear in the spectrum at different angles.

Figure 8 shows the average power spectrum across the x and y axes of the three images. Clearly, important differences in the high-frequency content of Figures 7a and 7c compared to Figure 7b can be observed, most likely owing to the environmental exposure.

Visual Parameters

Figure 8 also shows horizontal lines representing the average of the average spectrum in both directions. These averages were used as inputs to the

Figure 8. (a) Average power spectrum across the X dimension; (b) average power spectrum across the Y dimension. Each panel plots curves for months 1, 6, and 12 against normalized spatial frequency.

fuzzy mapper. Other measurements such as the standard deviations of the average spectra, the spectral centroid in both directions, and the average values of the RGB components of each frame (shown in Table 2 for the case of Figures 7a–c) were used as inputs to a fuzzy-logic mapper very similar to the one described for ID-FUSIONES. If we inspect the average values of the blue component of the RGB composite for each frame, we can notice a significant loss of blue color that this painting suffers as time passes.

Table 2. Average RGB Components for Figure 7a–c

Image                   R     G     B
Painting at month 1     51    110   153
Painting at month 6     69    100   140
Painting at month 12    107   123   124

Aural Parameters

The outputs of the fuzzy mapper in this case are aural frequency, intensity, and sonic density. These parameters were used to control synthesis algorithms that generate the soundscape for the piece. It is important to emphasize that each one of the six visual sequences used the same fuzzy mapper to generate the sonic content. This means that each one of the visual sequences generated a different soundscape, obtained from the specific spatial frequency and color structure of the visual material, but derived from the same set of fuzzy rules, bringing a meta-coherence to the overall work.

This work is available in DVD format but also as a computer-executable version that allows the viewer to control in real time the speed of the flow of physical time. Additional screen shots and video excerpts are available online at www.rodrigocadiz.com/timeexposure. [Editor's note: Also see the forthcoming DVD accompanying Vol. 30, No. 4 of the Journal.]

Conclusions and Future Work

The composition of audiovisual media is not an easy task. Several authors have proposed very different approaches with different results. The interaction of music and visual material in non-documentary modes of presentation is a very complex process (Lipscomb and Kendall 1994; Hunt et al. 1998; Finaas 2001).



A fuzzy-logic mapper for audiovisual media has been proposed and successfully used in two computer music-video pieces. The main advantage of this approach is its simplicity of design and use. The mapper is based on if-then-like fuzzy inference rules that resemble human thinking and common sense. The input and output parameters for the mapper, the rules, and their quantities are not restricted in any way and can be freely chosen by the composer.

There are, however, some limitations. The presented implementation does not work in real time, and it can only generate non-interactive mappings. A very important direction for future work in this area would be to extend this system to a real-time based framework. This would allow, for instance, generation of visual images on the fly in reaction to live music or vice versa. Another possible direction for future research would be to develop a neuro-fuzzy mapping model. Such a system could be able to learn how to react to a given unknown input, either aural or visual, and generate mappings based on the knowledge acquired through past experiences.

Acknowledgments

Thanks to Luz María Cury and Carmen González for all their help and support. ID-FUSIONES was funded by a grant for artistic creation and research of the Andes Foundation, Santiago, Chile. TIME EXPOSURE was funded by a grant from the Center for Interdisciplinary Research in the Arts (CIRA) of Northwestern University, Illinois, USA. Thanks also to Dr. Richard Ashley, Dr. Gary S. Kendall, and Dr. Scott D. Lipscomb for their valuable comments and suggestions.

References

Bandemer, H., and S. Gottwald. 1995. Fuzzy Sets, Fuzzy Logic, Fuzzy Methods with Applications. Chichester, UK: Wiley.
Berman, G. 1999. Synesthesia and the Arts. Leonardo 32:15–22.
Bullerjahn, C., and M. Guldenring. 1994. An Empirical Investigation of Effects of Film Music Using Qualitative Content Analysis. Psychomusicology 13:99–118.
Cytowic, R. E. 1997. Synaesthesia: Phenomenology and Neuropsychology: A Review of Current Knowledge. In S. Baron-Cohen and J. E. Harrison, eds. Synaesthesia: Classic and Contemporary Readings. Cambridge, Massachusetts: Blackwell, pp. 17–39.
Demicco, R., and G. J. Klir. 2004. Fuzzy Logic in Geology. Boston: Elsevier.
Dickerson, J. A., and B. Kosko. 1996. Fuzzy Function Approximation with Ellipsoidal Rules. IEEE Transactions of Systems, Man, and Cybernetics 26(4):542–560.
DiMaggio, P. J., and W. W. Powell. 1983. The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields. American Sociological Review 48:147–160.
Dimitrov, V., and B. Hodge. 2002. Social Fuzziology: Study of Fuzziness of Social Complexity. Heidelberg, Germany: Physica-Verlag.
Finaas, L. 2001. Presenting Music Live, Audiovisually, or Aurally: Does It Affect Listeners' Experiences Differently? British Journal of Music Education 18(1):55–78.
Gerhard, D., and D. H. Hepting. 2004. Cross-Modal Parametric Composition. Proceedings of the 2004 International Computer Music Conference. San Francisco, California: International Computer Music Association, pp. 505–512.
Harrison, J. E., and S. Baron-Cohen. 1997. Synaesthesia: An Introduction. In S. Baron-Cohen and J. E. Harrison, eds. Synaesthesia: Classic and Contemporary Readings. Cambridge, Massachusetts: Blackwell, pp. 3–16.
Hertz, P. 1999. Synesthetic Art: An Imaginary Number? Leonardo 32:399–404.
Hofstadter, D. R. 1999. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Basic.
Hunt, A., et al. 1998. A Generic Model for Compositional Approaches to Audio Visual Media. Organised Sound 3(3):199–209.
Iwamiya, S. 1994. Interactions Between Auditory and Visual Processing When Listening to Music in an Audio Visual Context: 1. Matching; 2. Audio Quality. Psychomusicology 13:133–153.
Kim, E., and S. D. Lipscomb. 2003. An Investigation into the Relationship Between Auditory and Visual Signals in a Multimedia Context. Poster presented at the 2003 Conference of the Society for Music Perception and Cognition, Las Vegas, Nevada, 16–19 June.
Klir, G. J., and B. Yuan. 1995. Fuzzy Sets and Fuzzy Logic: Theory and Applications. Upper Saddle River, New Jersey: Prentice Hall.

Kosko, B. 1992. Neural Networks and Fuzzy Systems. New Jersey: Prentice-Hall.
Kosko, B. 1993. Fuzzy Thinking: The New Science of Fuzzy Logic. New York: Hyperion.
Kosko, B. 1994. Fuzzy Systems As Universal Approximators. IEEE Transactions on Computing 43(11):1329–1333.
Landy, L. 2001. From Algorithmic Jukeboxes to Zero-Time Synthesis: A potential A–Z of Music in Tomorrow's World (A Conference Provocation). Organised Sound 6(2):91–96.
Lipscomb, S. D. 1995. Cognition of Musical and Visual Accent Structure Alignment in Film and Animation. Ph.D. thesis, University of California, Los Angeles.
Lipscomb, S. D., and R. A. Kendall. 1994. Perceptual Judgment of the Relationships Between Musical and Visual Components in Film. Psychomusicology 13:60–69.
Lipscomb, S. D., and E. Kim. 2004. Perceived Match Between Visual Parameters and Auditory Correlates: An Experimental Multimedia Investigation. Proceedings of the 8th International Conference on Music Perception and Cognition. Evanston, Illinois: Society for Music Perception and Cognition, pp. 72–75.
Lokki, T., et al. 1998. Real-Time Audiovisual Rendering and Contemporary Audiovisual Art. Organised Sound 3(3):219–233.
Malki, A. H., and C. G. Umeh. 2000. Design of a Fuzzy Logic-Based Level Controller. Journal of Engineering Technology 17(1):32–39.
Marks, L. E. 1997. On Colored Hearing Synaesthesia: Crossmodal Translations of Sensory Dimensions. In S. Baron-Cohen and J. E. Harrison, eds. Synaesthesia: Classic and Contemporary Readings. Cambridge, Massachusetts: Blackwell, pp. 49–98.
Mitaim, S., and B. Kosko. 2001. The Shape of Fuzzy Sets in Adaptive Function Approximation. IEEE Transactions on Fuzzy Systems 9(4):637–656.
Rosar, W. H. 1994. Film Music and Heinz Werner's Theory of Physiognomic Perception. Psychomusicology 13:154–165.
Ross, T. 2004. Fuzzy Logic with Engineering Applications. Chichester, UK: Wiley.
Roychowdhury, S. 2001. An Inquiry into the Theory of Defuzzification. In W. Pedrycz, ed. Granular Computing: An Emerging Paradigm. Heidelberg: Physica-Verlag, pp. 143–162.
Rudi, J. 1998. Computer Music Animations. Organised Sound 3(3):193–198.
Seashore, C. E. 1967. Psychology of Music. New York: Dover.
Sirius, G., and E. F. Clarke. 1994. The Perception of Audiovisual Relationships: A Preliminary Study. Psychomusicology 13:119–132.
Van Campen, C. 1999. Artistic and Psychological Experiments with Synesthesia. Leonardo 32:9–14.
Von Altrock, C. 1997. Fuzzy Logic and Neurofuzzy Applications in Business and Finance. Upper Saddle River, New Jersey: Prentice Hall.
Weisstein, E. W. 2005. Morphism. From MathWorld: A Wolfram Web Resource. Available online at http://mathworld.wolfram.com/Morphism.html.
Whitney, J. 1980. Digital Harmony: On the Complementarity of Music and Visual Art. Peterborough, New Hampshire: McGraw-Hill.
Yen, J., and R. Langari. 1999. Fuzzy Logic: Intelligence, Control, and Information. Upper Saddle River, New Jersey: Prentice Hall.
Yeo, W. S., J. Berger, and Z. Lee. 2004. SonART: A Framework for Data Sonification, Visualization, and Networked Multimedia Applications. Proceedings of the 2004 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 180–184.
Ying, H., et al. 1999. Comparison of Necessary Conditions for Typical Takagi-Sugeno and Mamdani Fuzzy Systems As Universal Approximators. IEEE Transactions on Systems, Man, and Cybernetics 29(5):508–514.
Zadeh, L. A. 1965. Fuzzy Sets. Information and Control 8:338–353.
