
Available online at www.sciencedirect.com
Knowledge-Based Systems 21 (2008) 200–208
www.elsevier.com/locate/knosys

Modelling affective-based music compositional intelligence with the aid of ANS analyses
Toshihito Sugimoto b, Roberto Legaspi a,*, Akihiro Ota b, Koichi Moriyama a, Satoshi Kurihara a, Masayuki Numao a
a The Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka 567-0047, Japan
b Department of Information Science and Technology, Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan

Available online 23 November 2007

Abstract

This research investigates the use of emotion data derived from analyzing change in activity in the autonomic nervous system (ANS), as revealed by brainwave production, to support the creative music compositional intelligence of an adaptive interface. A relational model of the influence of musical events on the listener's affect is first induced using inductive logic programming paradigms, with the emotion data and musical score features as inputs of the induction task. The components of composition, such as interval and scale, instrumentation, chord progression and melody, are automatically combined using a genetic algorithm and melodic transformation heuristics that depend on the predictive knowledge and character of the induced model. Out of the four targeted basic emotional states, namely stress, joy, sadness, and relaxation, the empirical results reported here show that the system is able to successfully compose tunes that convey one of these affective states.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Adaptive user interface; EEG-based emotion spectrum analysis; User modelling; Automated reasoning; Machine learning

1. Introduction

It is no surprise that only a handful of research works have factored human affect into creating an intelligent music system or interface (e.g., [1,6,13,17,23]). One major reason is that the general issues alone when investigating music and emotion are enough to immediately confront and intimidate the researcher. More specifically, how can music composition, which is a highly structured cognitive process, be modelled, and how can emotion, which consists of very complex elements and is dependent on individuals and stimuli, be measured? [7]. The other is that the fact that music is a reliable elicitor of affective response immediately raises the question as to what exactly in music can influence an individual's mood. For example, is it the case that musical structures contain related musical events (e.g., chord progression, melody change, etc.) that allow emotionally stimulating mental images to surface?

* Corresponding author. Tel.: +81 6 6879 8426; fax: +81 6 6879 8428. E-mail address: roberto@ai.sanken.osaka-u.ac.jp (R. Legaspi).

Although attempts have been made to pinpoint which features of the musical structure elicit which affect (e.g., [2,20]), the problem remains compelling because the solutions are either partial or uncertain. Our research addresses the problem of determining the extent to which emotion-inducing music can be modelled and generated using creative music compositional AI. Our approach involves inducing an affects-music relations model that describes musical events related to the listener's affective reactions and then using the predictive knowledge and character of the model to automatically control the music generation task. We have embodied our solution in a constructive adaptive user interface (CAUI) that rearranges or composes [13] a musical piece based on one's affect. We have reported the results of combining inductive logic programming (in [8,13]) or multiple-part learning (in [7]) to induce the model with a genetic algorithm whose fitness function is influenced by the model.


In these previous versions of the CAUI, an evaluation instrument based on the semantic differential method (SDM) was used to measure affective responses. The listener rated musical pieces on a scale of 1–5 for a set of bipolar affective descriptor pairs (e.g., happy–sad). Each subjective rating indicates the degree of the positive or negative affect. We argue that for the CAUI to accurately capture the listener's affective responses, it must satisfy necessary conditions that the SDM-based self-reporting instrument does not address. Emotion detection must capture the dynamic nature of both music and emotion. With the rating instrument, the listener can only evaluate after the music has been played. This means that only one evaluation is mapped to the entire musical piece rather than having possibly varied evaluations as the musical events unfold. Secondly, the detection task should not impose a heavy cognitive load upon the listener. It must ensure that listening to music remains enjoyable and avoid, if not minimize, disturbing the listener. In our prior experiments, the listener was asked to evaluate 75 musical pieces, getting interrupted the same number of times. If the listener indeed experienced stress or anxiety in the process, it was difficult to factor this into the calculations. Lastly, the emotion detection task should be language independent, which can later on permit cross-cultural analyses. This flexibility avoids the need to change the affective labels (e.g., Japanese to English). We believe that the conditions stated above can be satisfied by using a device that can analyze emotional states by observing the change in activity in the autonomic nervous system (ANS). Any intense feeling has consequent physiological effects on the ANS [19]. These effects include a faster and stronger heartbeat, increased blood pressure or breathing rate, muscle tension and sweating, and accelerated mental activity, among others. This is the reason ANS effects can be observed using devices that measure blood pressure, skin or heart responses, or brainwave production. Researchers in the field of affective computing are active in developing such devices (e.g., [14]). We have modified the learning architecture of the CAUI to incorporate an emotion spectrum analyzing system (ESA)1 that detects emotional states by observing the brainwave activities that accompany the emotion [11]. The learning architecture is shown in Fig. 1. The relational model is induced by employing the inductive logic programming paradigms of FOIL and R, taking as inputs the musical score features and the ESA-provided emotion data. The musical score features are represented as definitions of first-order logic predicates and serve as background knowledge for the induction task. The next task employs a genetic algorithm (GA) that produces variants of the original score features. The fitness function of the GA fits each generated variant to the knowledge provided by the model and to music theory. Finally, the CAUI's melody-generating module creates an initial tune consisting of the GA-obtained chord tones and then alters certain chord tones into non-chord tones in order to embellish the tune.
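To make the overall data flow easier to follow, the sketch below summarizes the pipeline just described. It is our own illustration, not code from the paper: every function name is a hypothetical placeholder, and the stage implementations (model induction, GA, melody generation) are passed in as callables because their internals are only detailed in the later sections.

```python
# Hypothetical end-to-end sketch of the CAUI data flow described above.
# The stage functions are injected as callables; none of these names exist
# in a published library.

def compose_for_affect(score_features, emotion_readings, target_affect,
                       induce_model, run_ga, generate_melody):
    """Induce an affects-music model, evolve chord progressions, add melody."""
    # 1. Induce the relational model (FOIL / FOIL+R) from the score features
    #    (background knowledge) and the ESA-provided emotion data.
    model = induce_model(score_features, emotion_readings)

    # 2. Evolve music() variants; the GA fitness combines the induced model
    #    and music-theory heuristics (Eqs. (3)-(6) later in the paper).
    best_chromosome = run_ga(score_features, model, target_affect)

    # 3. Generate chord tones from the best chromosome and embellish some of
    #    them into non-chord tones to obtain the final tune.
    return generate_melody(best_chromosome)
```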

Fig. 1. The learning architecture of the CAUI.

Using the ESA has several advantages. The dynamic changes in both emotions and musical events can now be monitored and mapped continuously over time. Secondly, it allows mapping of emotion down to the musical bar level. This means that many training examples can be obtained from a single piece. Using the self-reporting instrument, the listener needed to hear and evaluate many musical pieces just to obtain a fair number of examples. Thirdly, more accurate measurements can now be acquired objectively. Lastly, it is unobtrusive, thereby relieving the listener of any cognitive load and allowing him/her to just sit back and listen to the music. In this paper, we first discuss the domain knowledge representations, learning parameters and learning tasks used for the CAUI in Sections 2 and 3. Section 4 details our experimentation methodology and analysis of the empirical results we gathered. Section 5 briefly locates the contribution of the CAUI in the field. Discussions of what we intend to carry out as possible future work can be found as part of our analysis and conclusion.

2. Knowledge acquisition and representation

In order to obtain a personalized model of the coupling of emotional expressions and the underlying music parameters, it is vital to: (1) identify which musical features (e.g., tempo, rhythm, harmony, etc.) should be represented as background knowledge, (2) provide an instrument to map the features to identified emotion descriptors, (3) logically represent the music parameters, and (4) automatically induce the model. Although the influence of various features has been well studied (e.g., refer to a comprehensive summary on the influence of compositional parameters [2] and an overview of recent investigations on the influence of performance parameters [4,5]), the task of the CAUI is to automatically find musical structure and sequence features that are influential to specific emotions.

2.1. Music theory

1 Developed by the Brain Functions Laboratory, Inc. (http://www.bfl.co.jp/main.html).

The aspect of music theory relevant to our research is the interaction of music elements into patterns that can support the composition techniques.


Fig. 2. Basic aspects of music theory that are being used for this version of the CAUI.

We use a narrow music theory that consists of a limited set of music elements (see Fig. 2). The reason is that we need the predictive model to be tractable in order to perform controlled experimentations and obtain interpretable results. The definitions of the concepts listed in Fig. 2 can be found in texts on music theory. The methods by which music theory is utilized by the genetic algorithm and melodic transformation heuristics are explained in Section 3. Fourteen musical piece segments were prepared, consisting of four pieces from classical music, three from Japanese pop, and seven from harmony textbooks. Segment durations range from 7.4 to 48 s (an average of 24.14 s). These pieces were selected, albeit not randomly, from the original 75 segments used in our previous experiments. Based on prior results, these selected pieces demonstrate a high degree of variance in emotional content when evaluated by previous users of the system. In other words, these pieces seem to elicit affective flavours that are more distinguishable.

2.2. Emotion acquisition features of the ESA

Through proper signal processing, scalp potentials measured by an electroencephalograph (EEG) can provide global information about mental activities and emotional states [11]. With the ESA, EEG features associated with emotional states are extracted into a set of 45 cross-correlation coefficients. These coefficients are calculated for each of the θ (5–8 Hz), α (8–13 Hz) and β (13–20 Hz) frequency components, forming a 135-dimensional EEG state vector. Operating a transformation matrix on this state vector linearly transforms it into a 4-dimensional vector E = (e1, e2, e3, e4), with the four components representing levels of stress, joy, sadness and relaxation, respectively. The maximum time resolution of the emotion analysis performed in real time is 0.64 s. More detailed discussions of the ideas behind the ESA can be found in [11]. The emotion charts in Fig. 3 graphically show series of readings taken over time. The higher the value, the more evident the emotion being displayed. The two wave charts at the bottom indicate levels of alertness and concentration, respectively. These readings help gauge the reliability of the emotion readings. For example, the level of alertness should be high while the music is being played, indicating that the listener is attending keenly to the tune. Low alert points are valid so long as they correspond to the silent pauses inserted between tunes, since there is no need for the user to listen to the pauses. However, acceptably high values for concentration should be expected at any point in time. The collected emotion data are then used by the model induction task.
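As a rough numerical illustration of the read-out just described (our own sketch; the actual ESA transformation matrix is calibrated by the device and is not published here), the mapping from the 135-dimensional EEG state vector to the four emotion components is a single linear operation:

```python
import numpy as np

# Sketch of an ESA-style linear emotion read-out. A is a placeholder 4 x 135
# transformation matrix; the real matrix belongs to the ESA device.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 135))   # hypothetical transformation matrix

# 45 cross-correlation coefficients for each of the theta, alpha and beta
# bands are concatenated into a 135-dimensional EEG state vector x.
x = rng.normal(size=135)        # hypothetical EEG state vector

E = A @ x                       # E = (e1, e2, e3, e4)
stress, joy, sadness, relaxation = E
print(E)                        # one 4-dimensional reading every 0.64 s at most
```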

Fig. 3. EEG signals used for emotion analyses are obtained using scalp electrodes.


Brainwave analysis is a delicate task that can easily be distorted by external factors, including an eye blink. Hence, careful attention needs to be given when acquiring the readings. The listener needs to be in a closed room with as little noise and as few external distractions as possible. The listener is also required to close his/her eyes at all times. This set-up is necessary to obtain stable readings. Any series of measurements should be taken without disturbing the listener.

2.3. First-order logic representation of the score features

The background knowledge of the CAUI consists of definitions in first-order logic that describe musical score features. The language of first-order logic, or predicate logic, is known to be well suited both for representing data and for describing the desired outputs. The representational power of predicate logic permits describing existing feature relations among the data, even complex relations, and provides comprehensibility of the learned results [12].

Score features were encoded into a predicate, or relation, named music(), which contains one song_frame() and a list of sequenced chord() relations describing the frame and chord features, respectively. Fig. 4 shows the music() representation ('-' denotes NIL) of the musical score segment of the prelude of Jacques Offenbach's Orphée aux Enfers. The CAUI needs to learn three kinds of target relations, or rules, namely frame(), pair() and triplet(), wherein the last two represent patterns of two and three successive chords, respectively. These rules comprise the affects-music relational model. Fig. 5 (left), for example, shows structural information contained in the given sample relations and the actual musical notation they represent. Fig. 5 (right) shows a segment of an actual model learned by the CAUI that can be used to construct a musical piece that is supposed to induce in one user a sad feeling.
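To make the representation concrete, here is a minimal sketch of how a score fragment might be serialized into a music() term. The attribute names and values are simplified placeholders of our own, not the exact predicate arguments used by the CAUI (those follow Fig. 4).

```python
# Illustrative encoding of a score fragment as a music() term.
# Attribute names below are placeholders, not the paper's exact arguments.

def song_frame(tempo, key, time_signature):
    return f"song_frame({tempo},{key},{time_signature})"

def chord(root, form, inversion):
    return f"chord({root},{form},{inversion})"

def music(frame, chords):
    return f"music({frame},[{','.join(chords)}])"

term = music(
    song_frame(120, "c_major", "4_4"),
    [chord("c", "major", 0), chord("g", "dominant7", 1), chord("c", "major", 0)],
)
print(term)
# music(song_frame(120,c_major,4_4),[chord(c,major,0),chord(g,dominant7,1),chord(c,major,0)])
```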

Fig. 4. A musical score represented in music() predicate.

Fig. 5. A segment of a set of rules that are supposed to stimulate a sad feeling.


2.4. Model induction using FOIL and R

The CAUI employs the combination of FOIL and R (Refinement by Example) to model the musical structures that correlate with the listener's emotions, with the musical structures comprising the set of training examples.

FOIL [16] is a first-order inductive learning system that induces a theory represented as function-free Horn clauses. Each clause is a conjunction of literals, where each literal consists of a relation and an ordering of the variable arguments of the relation. The training examples are represented extensionally as sets of ground tuples, i.e., the constant values of the relations present in the examples. Tuples belonging to the relation are labelled as ⊕ tuples, and those not belonging to it as ⊖ tuples. FOIL assumes that all ⊕ tuples exhibit a relationship R and the ⊖ tuples do not. FOIL iteratively learns a clause of the theory and removes from the training set the ⊕ tuples of the relation R covered by that clause, until all ⊕ tuples are covered by one or more clauses. Induction of a single clause starts with an empty body, and body literals are iteratively added at the end of the clause until no ⊖ tuple is covered by the clause. FOIL selects the literal to be added from a set of candidate literals based on an information gain heuristic that estimates the utility of a literal in discriminating ⊕ from ⊖ tuples. The information gained by adding a literal is computed as

Gain(L_i) = T_i^{⊕⊕} × (I(T_i) − I(T_{i+1}))    (1)

I(T_i) = −log_2 ( T_i^{⊕} / (T_i^{⊕} + T_i^{⊖}) )    (2)

T_i^{⊕} and T_i^{⊖} denote the numbers of ⊕ and ⊖ tuples in the training set T_i. Adding the literal L_m to the partially developed clause R(v_1, v_2, ..., v_k) :- L_1, L_2, ..., L_{m−1} results in the new set T_{i+1}, which contains the tuples that remain from T_i. T_i^{⊕⊕} denotes the number of ⊕ tuples in T_i that lead to another ⊕ tuple after adding L_m. The candidate literal L_i that yields the largest gain becomes L_m.

R [21] is a system that automatically refines a theory in function-free first-order logic. It assumes that the induced theory can only be approximately correct and hence needs to be refined, using the training examples, to improve its accuracy. R implements a four-step theory revision process, i.e., (1) operationalization, (2) specialization, (3) rule creation, and (4) unoperationalization. Operationalization expands the theory into a set of operational clauses, detecting and removing useless literals. A literal is useful if its normalized gain, i.e., computing only the I(T_i) − I(T_{i+1}) part of Eq. (1), exceeds a specified threshold θ, and if it produces new variables for the other literals in the clause, i.e., it is generative [21]. R considers the useless literals as faults in the theory. Specialization uses FOIL to add literals to the overly general clauses covering ⊖ tuples to make them more specific. Rule creation uses FOIL to introduce more operational clauses in case some ⊕ tuples cannot be covered by existing ones. Finally, unoperationalization re-organizes the clauses to reflect the hierarchical structure of the original theory.
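The gain computation in Eqs. (1) and (2) can be sketched compactly as follows. This is our own illustration; in practice the ⊕/⊖ counts come from the tuples covered before and after adding the candidate literal.

```python
import math

def info(pos, neg):
    """I(T) = -log2(pos / (pos + neg)), the information of a tuple set."""
    return -math.log2(pos / (pos + neg))

def foil_gain(pos_before, neg_before, pos_after, neg_after, pos_still_covered):
    """Gain(L) = T++ * (I(T_i) - I(T_{i+1})) for a candidate literal L."""
    return pos_still_covered * (info(pos_before, neg_before) -
                                info(pos_after, neg_after))

# Example: adding a literal keeps 20 of 30 positive tuples and removes
# most negatives (40 -> 5); the gain rewards this discrimination.
print(foil_gain(30, 40, 20, 5, 20))   # ~ 20 * (1.222 - 0.322) = ~18
```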

The training examples suitable for inducing the model are generated as follows. Each musical piece is divided into musical bars, or measures. A piece may contain eight to 16 bars (an average of 11.6 bars per piece). Every three successive bars in a piece, together with the music frame, are treated as one training example, i.e., example_i = (frame, bar_{i−2}, bar_{i−1}, bar_i). Each bar consists of a maximum of four chords. The idea here is that sound flowing from at least three bars is needed to elicit an affective response. The first two examples in every piece, however, will inherently contain only one and two bars, respectively. The components of each bar are extracted from music() and represented as ground tuples. A total of 162 examples were obtained from the 14 pieces, with each bar having an average playtime of 2.1 s. Recall that emotion readings are taken while the music is being played. Using the available synchronization tools of the ESA and music segmenting tools, the emotion measurements are assigned to the corresponding musical segments. Subsequently, each emotion measure is discretized to a value between 1 and 5 based on a pre-determined threshold. Using the same range of values as that of the SDM-based instrument permits us to retain the learning techniques in [8] while evaluating the new emotion detection scheme. It also allows us to define a set of bipolar affective descriptor pairs ed1–ed2 (e.g., joyful–not joyful). It is important to note that antonymic semantics (e.g., stressed vs. relaxed and joyful vs. sad) do not hold for the ESA, since the four emotions are defined along orthogonal dimensions. Hence, four separate readings are taken instead of treating one as inversely proportional to the other. This is consistent with the circumplex model of affect [15], where each of the four emotions lies in a different quadrant of the model. One relational model is learned for each affect in the four bipolar emotion pairs ed1–ed2 (a total of 4 × 2 = 8 models). To generate the training instances specific to FOIL, for any emotion descriptor ed1 in the pair ed1–ed2, the examples labelled as 5 are represented as ⊕ tuples, while those labelled as ≤4 become ⊖ tuples. Conversely for ed2, ⊕ and ⊖ tuples are formed from bars which were evaluated as 1 and ≥2, respectively. In other words, there are corresponding sets of ⊕ and ⊖ tuples for each affect, and a ⊕ tuple for ed1 does not mean that it is a ⊖ tuple for ed2. Examples are derived almost in the same way for FOIL + R. For example, the ⊕ tuples of ed1 and ed2 are formed from bars labelled as ≥4 and ≤2, respectively.
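The example-generation step just described can be sketched as follows. The window of three bars and the FOIL labelling rules follow the text above, while the data structures and the sample ratings are hypothetical.

```python
# Sketch of turning bar-level emotion readings into FOIL training labels
# for one affective descriptor pair (ed1, ed2), e.g. (sad, not sad).

def windowed_examples(bars, frame):
    """Every bar plus its (up to two) predecessors forms one example."""
    examples = []
    for i in range(len(bars)):
        window = bars[max(0, i - 2): i + 1]
        examples.append((frame, tuple(window)))
    return examples

def label_for_ed1(rating):
    """FOIL labels for ed1: rating 5 -> positive, rating <= 4 -> negative."""
    return "pos" if rating == 5 else "neg"

def label_for_ed2(rating):
    """FOIL labels for ed2: rating 1 -> positive, rating >= 2 -> negative."""
    return "pos" if rating == 1 else "neg"

# Hypothetical piece: 8 bars with discretized (1-5) emotion ratings.
bars = [f"bar{i}" for i in range(1, 9)]
ratings = [3, 5, 5, 4, 2, 1, 1, 3]
examples = windowed_examples(bars, "frame1")
labelled = [(ex, label_for_ed1(r)) for ex, r in zip(examples, ratings)]
print(labelled[1])   # (('frame1', ('bar1', 'bar2')), 'pos')
```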


Fig. 6. GA chromosome structure and operators.

3. Composing using GA and melody heuristics

Evolutionary computational models have been dominating the realm of automatic music composition (as reviewed in [24]). One major problem in user-oriented GA-based music creation (e.g., [3,22]), however, is that the user is required to listen to and then rate the composed musical sequences in each generation. This is obviously burdensome, tiring and time-consuming. Although the CAUI is user-oriented, it need not solicit user intervention, since it uses the relational model as a critic to control the quality of the composed tunes.

We adapted the conventional bit-string chromosome representation in GA as a columns-of-bits representation expressed in music() form (see Fig. 6, where F is the song_frame() and Ci is a chord()). Each bit in a column represents a component of the frame (e.g., tempo) or chord (e.g., root). The performance of our GA depends on two basic operators, namely single-point crossover and mutation. With the first operator, the columns of bit strings from the beginning of the chromosome to a selected crossover point are copied from one parent and the rest are copied from the other. Mutation inverts selected bits, thereby altering the individual frame and chord information. The more fundamental components (e.g., tempo, rhythm and root) are mutated less frequently to avoid drastic changes in musical events, while the other features are varied more frequently to acquire more variants. The fundamental idea of GA is to produce increasingly better solutions in each new generation of the evolutionary process. During the genetic evolution process, candidate chromosomes are produced that may be better or worse than what has already been obtained. Hence, a fitness function is necessary to evaluate the utility of each candidate. The CAUI's fitness function takes into account the user-specific relational model and music theory:

fitness_Chromosome(M) = fitness_User(M) + fitness_Theory(M)    (3)
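The crossover and mutation operators described in the preceding paragraph can be sketched as follows. The column contents and the per-column mutation rates are illustrative placeholders of our own, not the paper's actual bit layout.

```python
import random

# Sketch of GA operators over a columns-of-bits chromosome, where each
# column stands for one music() component (F = song_frame bits, Ci = chord bits).

def single_point_crossover(parent_a, parent_b):
    """Copy columns up to a crossover point from one parent, the rest from the other."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, rates):
    """Flip bits column-wise; fundamental components get lower mutation rates."""
    mutated = []
    for column, rate in zip(chromosome, rates):
        bits = [(1 - b) if random.random() < rate else b for b in column]
        mutated.append(bits)
    return mutated

# Two hypothetical chromosomes: one frame column followed by three chord columns.
parent_a = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
parent_b = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 0]]
child = single_point_crossover(parent_a, parent_b)

# e.g., mutate the fundamental frame column rarely, the chord columns more often.
child = mutate(child, rates=[0.01, 0.1, 0.1, 0.1])
print(child)
```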

Table 1
Meanings of the objects in Eq. (5)

fitness_X          P_i (component/s of M)                     L      Target relation
fitness_Frame      song_frame()                               1      frame()
fitness_Pair       (chord_i(), chord_{i+1}())                 n−1    pair()
fitness_Triplet    (chord_i(), chord_{i+1}(), chord_{i+2}())  n−2    triplet()

where M is a candidate chromosome. This function makes it possible to generate frames and chord progressions that fit the music theory and stimulate the target feeling. fitness_User(M) is computed as follows:

fitness_User(M) = fitness_Frame(M) + fitness_Pair(M) + fitness_Triplet(M)    (4)

Each function on the right-hand side of Eq. (4) is generally computed as

fitness_X(M) = Σ_{i=1}^{L} Average(δ_F(P_i), δ'_F(P_i), δ_FR(P_i), δ'_FR(P_i))    (5)

The meanings of the objects in Eq. (5) are shown in Table 1. The only variable parameter is P_i, which denotes the component(s) extracted from M that serve as input to the four subfunctions of fitness_X. If there are n chord() predicates in M, there will be L P_i's formed, depending on fitness_X. For example, given the chromosome M := music(song_frame(), chord_1(), ..., chord_8()), where the added subscripts denote chord positions, computing fitness_Pair(M) yields 7 P_i's (L = 8 − 1): P_1 = (chord_1(), chord_2()), ..., P_7 = (chord_7(), chord_8()). With fitness_Frame(M), there is only P_1 = song_frame().

The values of the subfunctions in Eq. (5) differ depending on whether an ed1 (e.g., sad) or ed2 (e.g., not sad) piece is being composed. Let us denote the target affect of the current composition as emoP and the opposite of this affect as emoN (e.g., if ed1 is emoP then emoN refers to ed2, and vice versa). δ_F and δ_FR (where F and FR refer to the models obtained using FOIL alone or FOIL + R, respectively) return +2 and +1, respectively, if P_i appears in any of the corresponding target relations (see Table 1) in the model learned for emoP. On the other hand, δ'_F and δ'_FR return −2 and −1, respectively, if P_i appears in any of the corresponding relations in the emoN model. In effect, the structure P_i is rewarded if it is part of the desired relations and penalized if it also appears in the model for the opposite affect, since it then does not possess a distinct affective flavour. The returned values (±2 and ±1) were determined empirically.

fitness_Theory(M) seeks to reward chromosomes that are consistent with our music theory and penalize those that violate it. It is computed in the same way as Eq. (4), except that each of the three functions on the right is now computed as

fitness_X(M) = Σ_{i=1}^{L} Average(g(P_i))    (6)

The definitions of the objects in Eq. (6) follow those in Table 1, except that P_i is no longer checked against the relational models but against the music theory. The subfunction g returns the score of fitting P_i with the music theory, which is either a reward or a penalty. Structures that earn a high reward include frames that have a complete or half cadence, chord triplets that contain the transition T→S→D of the tonal functions tonic (T), subdominant (S) and dominant (D), and pairs that transition from dominant to secondary dominant (e.g., V/II→II). On the other hand, a penalty is given to pairs or triplets that have the same root, form and inversion values, have the same tonal function and form, or have the transition D→S.
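A minimal sketch of the fitness computation in Eq. (5), restricted to the pair() relation type, is given below. The relation containers and the example chord names are hypothetical; the ±2/±1 rewards follow the text above.

```python
# Sketch of fitness_User (Eq. (5)) for one relation type, pair().
# emoP_* / emoN_* stand for the pair() relations learned for the target
# affect and for its opposite, under FOIL and FOIL+R respectively.

def delta(p_i, relations, reward):
    return reward if p_i in relations else 0

def fitness_pairs(chords, emoP_foil, emoN_foil, emoP_foil_r, emoN_foil_r):
    pairs = list(zip(chords, chords[1:]))           # P_1 ... P_{n-1}
    total = 0.0
    for p in pairs:
        scores = [delta(p, emoP_foil, +2), delta(p, emoN_foil, -2),
                  delta(p, emoP_foil_r, +1), delta(p, emoN_foil_r, -1)]
        total += sum(scores) / len(scores)           # Average(...) in Eq. (5)
    return total

chords = ["C", "G7", "C", "F"]
print(fitness_pairs(chords,
                    emoP_foil={("G7", "C")}, emoN_foil=set(),
                    emoP_foil_r={("G7", "C"), ("C", "F")},
                    emoN_foil_r={("C", "F")}))
```

fitness_Theory(M) would replace the four δ lookups with the single music-theory score g(P_i), as in Eq. (6).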


Fig. 7. An actual GA-generated musical piece.

All these heuristics are grounded in basic music theory. For example, the cadence types are scored based on the strength of their effects, such that the complete cadence is given the highest score since it is the strongest. Another is that the transition T→S→D is rewarded since it is often used and many songs have been written using it, while D→S is penalized since a dominant chord will not resolve to a subdominant. Overall, the scheme we just described is defensible, given that music theory can be represented using heuristics for evaluating the fitness of each GA-generated music variant. The character of each generated variant is immediately fit not just to the music theory but, more importantly, to the desired affective perception. It is also clear in the computations that the presence of the models permits the absence of human intervention during composition, thereby relieving the user of unnecessary cognitive load and achieving full automation. Fig. 7 shows one of the best-fit GA-generated chromosomes to stimulate a sad feeling. The outputs of the GA contain only chord progressions. Musical lines with only chord tones may sound monotonous or homophonic. A non-chord tone may serve to embellish the melodic motion surrounding the chord tones. The CAUI's melody-generating module first generates chord tones using the GA-obtained music() information and then utilizes a set of heuristics to generate the non-chord tones in order to create a non-monotonic piece of music. To create the chord tones, certain aspects of music theory are adopted, including the harmonic relations V7→I (or D→T, which is known to be very strong), T→D, T→S, S→T, and S→D, and keeping the intervals in octaves. Once the chord tones are created, the non-chord tones, which are supposed not to be members of the accompanying chords, are generated by selecting and disturbing the chord tones. All chord tones have an equal chance of being selected. Once selected, a chord tone is modified into a non-chordal broderie, appoggiatura or passing tone. How these non-chord tones are adopted for the CAUI is detailed in [7].
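The embellishment step can be sketched roughly as follows. The actual heuristics are detailed in [7]; the pitch arithmetic, selection probability and the upper-neighbour figure below are our own illustrative choices.

```python
import random

# Sketch of embellishing a chord-tone melody with non-chord tones.
# Pitches are MIDI note numbers; the broderie-like upper-neighbour figure
# is a simplified placeholder for the paper's transformation heuristics.

def embellish(chord_tones, p_select=0.3, seed=1):
    rng = random.Random(seed)
    melody = []
    for pitch in chord_tones:
        melody.append(pitch)
        if rng.random() < p_select:
            # Insert a neighbouring non-chord tone one semitone above,
            # returning to the chord tone (roughly a broderie figure).
            melody.extend([pitch + 1, pitch])
    return melody

chord_tones = [60, 64, 67, 72]   # C major chord tones (C4, E4, G4, C5)
print(embellish(chord_tones))
```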

4. Experimentation and analysis of results

We performed a set of individualized experiments to determine whether the CAUI-composed pieces can actually stimulate the target emotion. Sixteen subjects were asked to listen to the 14 musical pieces while wearing the ESA's helmet. The subjects were all Japanese males with ages ranging from 18 to 27 years. Although it would be ideal to increase the heterogeneity of the subjects' profiles, it seems more appropriate at this stage to limit their diversity in terms of background and to focus on the possibly existing differences in their emotional reactions. For the subject to hear the music playing continuously, all the pieces were sequenced using a music editing tool, and silent pauses of 15 s each were inserted before and after each piece, with the exception of the first, which is preceded by a 30-s silence so as to condition the subject. Personalized models were learned for each subject based on their emotion readings, and new pieces were composed independently for each. The same subjects were then asked to go through the same process using the set of newly composed pieces. Twenty-four tunes were composed for each subject, i.e., three for each of the bipolar affective descriptors. Fig. 8 shows that the CAUI was able to compose a sad piece, even without prior handcrafted knowledge of any affect-inducing piece. We computed the difference of the averaged emotion readings for each ed1–ed2 pair. The motivation here is that the higher the difference, the more distinct/distinguishable the affective flavour of the composed pieces. We also performed a paired t-test on the differences to determine whether they are significant. Table 2 shows that the composed sad pieces are the only ones that correlate with the subjects' emotions. A positive difference was seen in many instances, albeit not necessarily statistically significant. This indicates that the system is not yet able to differentiate the structures that can arouse such impressions. The version of the CAUI reported in [8] is similar to the current one except for two things: (1) it used self-reporting and (2) it evaluated on a whole-music, instead of bar, level. Its compositions are significant in only two out of six emotion dimensions at level α = 0.01 using Student's t-test. The current version used only 14 pieces but was able to produce significant outputs for one emotion. This shows that we cannot easily dismiss the potential of the current version. The results obtained can be viewed as acceptable if the current form of the research is taken as a proof of concept. The acceptably sufficient result for one of the emotion dimensions shows promise in the direction we are heading and motivates us to further enhance the system's capability in terms of its learning techniques.
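The significance check described above amounts to a standard paired t-test on the per-subject ed1/ed2 averages. A minimal sketch is shown below; the numbers are placeholders, not the values in Table 2.

```python
import numpy as np
from scipy import stats

# Sketch of the significance test: per-subject averaged readings for ed1 (+)
# and ed2 (-) are compared with a paired t-test. Values below are made up.
ed1_means = np.array([1.9, 2.4, 1.1, 0.8, 2.2, 1.5, 1.7, 0.9])
ed2_means = np.array([1.2, 1.1, 0.9, 0.7, 1.0, 1.4, 0.8, 0.9])

t, p = stats.ttest_rel(ed1_means, ed2_means)   # paired t-test on the two columns
print(f"t = {t:.2f}, p = {p:.3f}, significant at 5%: {p < 0.05}")
```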

Fig. 8. A CAUI-composed sad musical piece.

Table 2
Results of empirical validation: average difference of ed1 (+) and ed2 (−) emotion analyses values

Subject            Stressed   Joyful   Sad     Relaxed
A                  1.67       2.33     0.67    3.00
B                  0.67       0.33     1.33    1.33
C                  1.00       1.00     0.67    1.33
D                  1.00       0.67     0.67    2.33
E                  2.67       1.00     1.33    1.00
F                  0.67       0.33     0.00    0.67
G                  0.67       0.33     1.67    1.33
H                  1.00       0.00     1.33    0.67
I                  0.67       0.33     1.67    0.67
J                  0.67       0.33     0.33    2.00
K                  0.33       0.33     0.67    0.00
L                  0.67       0.33     2.33    0.00
M                  0.67       0.33     0.33    1.33
N                  0.33       2.33     1.00    2.00
O                  0.33       0.33     0.67    1.00
P                  1.67       1.67     0.00    1.00
Average            0.13       0.04     0.63    0.02
Sample variance    1.18       1.07     0.85    2.12
Standard error     0.28       0.27     0.24    0.38
t value            0.45       0.16     2.63    0.06
Significant (5%)   False      False    True    False
Significant (1%)   False      False    True    False


The unsatisfactory results obtained for the other emotion descriptors can also be attributed to shortcomings in creating adequately structured tunes due to our narrow music theory. For instance, the composed tunes at this stage consist of only eight bars and are rhythmically monotonic. Admittedly, we need to take more of music theory into consideration. Secondly, since the number of training examples has been downsized, the number of distinct frames, i.e., in terms of attribute values, became fewer. There is no doubt that integrating more complex musical knowledge and scaling to a larger dataset are feasible, provided that the CAUI sufficiently defines and represents the degrees of musical complexity (e.g., structure in the melody) and acquires the storage needed for the training data (this has become our immediate obstacle). It is also an option to investigate the effect of just a single music element that is very influential in creating music and stimulating emotions (e.g., the role of beat in African music). This would permit a more focused study while lessening the complexity in scope.

5. Related works

Comprehending the significant link that unites music and emotion has been a subject of considerable interest involving various fields (refer to [5]). For about five decades, artificial intelligence has played a crucial role in computerized music (reviewed in [10]), yet there seems to be a scarcity of research that tackles the compelling issues of a user affect-specific automated composition. As far as our limited knowledge of the literature is concerned, it has been difficult to find a study that aims to measure the emotional influence of music and then heads towards a fully automated composition task. This is in contrast to certain works that did not deal with music composition even though they achieved detecting the emotional influence of music (e.g., [1,9]), or to systems that solicit users' ratings during composition (e.g., [22,23]). Other works attempt to compose music with EEG or other biological signals as a direct generative source (e.g., refer to the concepts outlined in [18]) but may not necessarily distinguish the affective characteristics of the composed pieces. We single out the work of Kim and André [6], which deals with more affective dimensions whose measures are based on users' self-reports and the results of physiological sensing. It differs from the CAUI in the sense that it does not induce a relational model and it dealt primarily with generating rhythms.

6. Conclusion

This paper proposes a technique for composing music based on the user's emotions as analyzed from changes in brainwave activities. The results reported here show that learning is feasible even with the currently small training set. The current architecture also permitted evading a tiring and burdensome self-reporting emotion detection task while achieving partial success in composing an emotion-inducing tune. We cannot deny that the system falls a long way short of human composers; nevertheless, we believe that the potential of its compositional intelligence should not be easily dismissed. The CAUI's learning architecture will remain viable even if other ANS measuring devices are used. The problem with the ESA is that it practically limits itself from being bought by ordinary people, since it is expensive and it restricts the user's mobility (e.g., eye blinks can easily introduce noise). We are currently developing a multi-modal emotion recognition scheme that will allow us to investigate other means to measure expressed emotions (e.g., through ANS response and human locomotive features) using devices that permit mobility and are cheaper than the ESA.

References

[1] R. Bresin, A. Friberg, Emotional coloring of computer-controlled music performance, Computer Music Journal 24 (4) (2000) 44–62.
[2] A. Gabrielsson, E. Lindström, The influence of musical structure on emotional expression, in: P.N. Juslin, J.A. Sloboda (Eds.), Music and Emotion: Theory and Research, Oxford University Press, New York, 2001, pp. 223–248.
[3] B.E. Johanson, R. Poli, GP-Music: An interactive genetic programming system for music generation with automated fitness raters, Technical Report CSRP-98-13, School of Computer Science, The University of Birmingham, 1998.
[4] P.N. Juslin, Studies of music performance: A theoretical analysis of empirical findings, in: Proc. Stockholm Music Acoustics Conference, 2003, pp. 513–516.
[5] P.N. Juslin, J.A. Sloboda, Music and Emotion: Theory and Research, Oxford University Press, New York, 2001.
[6] S. Kim, E. André, Composing affective music with a generate and sense approach, in: V. Barr, Z. Markov (Eds.), Proc. 17th International FLAIRS Conference, Special Track on AI and Music, AAAI Press, 2004.
[7] R. Legaspi, Y. Hashimoto, K. Moriyama, S. Kurihara, M. Numao, Music compositional intelligence with an affective flavour, in: Proc. 12th International Conference on Intelligent User Interfaces, ACM Press, 2007, pp. 216–224.
[8] R. Legaspi, Y. Hashimoto, M. Numao, An emotion-driven musical piece generator for a constructive adaptive user interface, in: Proc. 9th Pacific Rim International Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, vol. 4009, Springer, 2006, pp. 890–894.
[9] T. Li, M. Ogihara, Detecting emotion in music, in: Proc. 4th International Conference on Music Information Retrieval, 2003, pp. 239–240.
[10] R. Lopez de Mantaras, J.L. Arcos, AI and Music: From Composition to Expressive Performances, AI Magazine 23 (3) (2002) 43–57.
[11] T. Musha, Y. Terasaki, H.A. Haque, G.A. Ivanitsky, Feature Extraction from EEGs Associated with Emotions, Artificial Life and Robotics 1 (1997) 15–19.
[12] C. Nattee, S. Sinthupinyo, M. Numao, T. Okada, Learning first-order rules from data with multiple parts: Applications on mining chemical compound data, in: Proc. 21st International Conference on Machine Learning, 2004, pp. 77–85.
[13] M. Numao, S. Takagi, K. Nakamura, Constructive adaptive user interfaces – Composing music based on human feelings, in: Proc. 18th National Conference on AI, AAAI Press, 2002, pp. 193–198.
[14] R.W. Picard, J. Healey, Affective Wearables, Personal and Ubiquitous Computing 1 (4) (1997) 231–240.
[15] J. Posner, J.A. Russell, B.S. Peterson, The circumplex model of affect: an integrative approach to affective neuroscience, cognitive development, and psychopathology, Development and Psychopathology 17 (2005) 715–734.
[16] J.R. Quinlan, Learning logical definitions from relations, Machine Learning 5 (1990) 239–266.
[17] D. Riecken, Wolfgang: Emotions plus goals enable learning, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 1998, pp. 1119–1120.
[18] D. Rosenboom, Extended Musical Interface with the Human Nervous System: Assessment and Prospectus, Leonardo Monograph Series, Monograph No. 1 (1990/1997).
[19] C. Roz, The autonomic nervous system: Barometer of emotional intensity and internal conflict, a lecture given for Confer, 27 March 2001; a copy can be found at: http://www.thinkbody.co.uk/papers/autonomic-nervous-system.htm.
[20] J.A. Sloboda, Music structure and emotional response: some empirical findings, Psychology of Music 19 (2) (1991) 110–120.
[21] S. Tangkitvanich, M. Shimura, Refining a relational theory with multiple faults in the concept and subconcept, in: Machine Learning: Proc. of the Ninth International Workshop, 1992, pp. 436–444.
[22] M. Unehara, T. Onisawa, Interactive music composition system – Composition of 16-bars musical work with a melody part and backing parts, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 2004, pp. 5736–5741.
[23] M. Unehara, T. Onisawa, Music composition system based on subjective evaluation, in: Proc. IEEE International Conference on Systems, Man and Cybernetics, 2003, pp. 980–986.
[24] G.A. Wiggins, G. Papadopoulos, S. Phon-Amnuaisuk, A. Tuson, Evolutionary Methods for Musical Composition, International Journal of Computing Anticipatory Systems 1 (1) (1999).