
European Journal of Psychological Assessment 2007; Vol. 23(4):258–267
© 2007 Hogrefe & Huber Publishers

Assessing Mood in Daily Life


Structural Validity, Sensitivity to Change,
and Reliability of a Short-Scale to
Measure Three Basic Dimensions of Mood

Peter Wilhelm¹ and Dominik Schoebi¹,²
¹University of Fribourg, Switzerland, ²University of California, Los Angeles, USA

Abstract. The repeated measurement of moods in everyday life, as is common in ambulatory monitoring, requires parsimonious scales,
which may challenge the reliability of the measures. The current paper evaluates the factor structure, the reliability, and the sensitivity
to change of a six-item mood scale designed for momentary assessment in daily life. Using a multilevel approach, we analyzed data from 187 participants who reported
their current mood four times per day during seven consecutive days. The results suggest that the proposed
three factors Calmness, Valence, and Energetic arousal are appropriate to assess fluctuations within persons over time. However, calmness
and valence are not distinguishable at the between-person level. Furthermore, the analyses showed that two-item scales provide measures
that are reliable at the different levels and highly sensitive to change.
Keywords: ambulatory assessment, ecological momentary assessment, electronic diary, mood, affect, multilevel confirmatory factor
analysis

Introduction
The repeated measurement of moods and emotions with
high frequency is common in ambulatory psychological
and psychophysiological assessment. Measurement schedules range from one assessment per day taken for several
weeks (e.g., Cranford, Shrout, Iida, Rafaeli, Yip, & Bolger,
2006) to high-frequency assessment within a 24 h period
(e.g., Ebner-Priemer & Sawitzki, 2007; Myrtek, 2004). Because of the high repetition rate in such studies, the duration
of a single assessment should be kept short to minimize the
burden on participants. The higher the participants' burden
caused by the frequency and duration of single assessments, the more likely their compliance and motivation to
give valid responses will decline. Moreover, when participants need to rate redundant items, additional effects like
the exaggeration of subtle differences between items may
occur, compromising the psychometric properties of a scale
(Bolger, Davis, & Rafaeli, 2003; Fahrenberg, Leonhart, &
Foerster, 2002; Lucas & Baird, 2006).
Consequently, some researchers have used single items
to assess different facets of mood (e.g., Fahrenberg, Hüttner, & Leonhart, 2001; Myrtek, 2004). The use of single
items, however, raises the problem that the reliability of the
state specific component of the measure cannot be determined and separated from measurement error. Therefore, a
variety of multi-item mood scales have been used, ranging
from long item lists (e.g., Buse & Pawlik, 1996; Kubiak &
Jonas, 2007) to specifically designed or adapted short
DOI 10.1027/1015-5759.23.4.258

scales (e.g., Cranford et al., 2006). For these short scales,
reliability coefficients and sometimes factor structures
have been reported, which are usually based on the analyses of the between-person variance (e.g., individuals' averages over time). Yet, the within-person variance has often
been ignored (for exceptions see e.g., Buse & Pawlik, 1996,
2001; Cranford et al., 2006; Schimmack, 2003; Zelinski &
Larsen, 2000; Zevon & Tellegen, 1982).
The goal of this article is to evaluate the psychometric
properties of a parsimonious six-item mood measure that
was developed to assess three basic dimensions of mood in
people's daily lives. We do so using a multilevel modeling
approach to investigate the variance and covariance between items at the between-person and the within-person
level simultaneously.

What Are Moods?


Moods are rather diffuse affective states that subtly affect
our experience, cognitions, and behavior. They operate
continuously and provide the affective background, the
emotional color to all that we do (Davidson, 1994, p. 52).
Moods can be consciously experienced as soon as they gain
the focus of our attention, and are then characterized by the
predominance of certain subjective feelings.
Moods should be distinguished from emotions. Although the definition of emotions depends heavily on theoretical frameworks (e.g., Ekman & Davidson, 1994; Lewis & Haviland-Jones, 2000), most researchers would agree
that emotions are short-term reactions to events or stimuli
that manifest themselves in different subsystems of the organism (expression and behavior, physiology, subjective
experience, and cognitions). In contrast to emotions, moods
are not necessarily linked to an obvious cause that can be
related to an event and its specific appraisal. They show
little synchronization of the different subsystems, do not
interrupt ongoing behavior, and do not prepare immediate
actions (Scherer, 2005). Usually the intensity of moods is
low to medium and they may last over hours and days.

Table 1. Correlations between the three basic dimensions Valence (V), Energetic arousal (E), and Calmness (C) in different studies

Study and sample                                        r(Valence–Energetic arousal)   r(Valence–Calmness)   r(Energetic arousal–Calmness)
Schimmack & Grob (2000, p. 335, 337)
  Study 1: 207 American students                        .49                            .70#                  .33#
  Study 2: 135 American students, two times             .47                            .57#                  .20#
Schimmack & Reisenzein (2002, p. 415) (c,a)
  710 American and Canadian students                    .46                            .65#                  .28#
Steyer et al. (1997, p. 14) (b)
  503 German participants; 47% students, four times     .50 to .62                     .66 to .72            .43 to .53
Matthews et al. (1990, p. 25) (c)
  388 British participants, mostly students             .43                            .37#                  .04#

Notes:
(a) The original dimensions were labeled as follows: pleasure–displeasure = V, awake–tiredness = E, tension–relaxation = C.
(b) The original dimensions were labeled as follows: good–bad mood = V, wakefulness–tiredness = E, calmness–uncalmness = C.
(c) The original dimensions were labeled as follows: valence / hedonic tone = V, energetic arousal = E, tense arousal = C.
Correlations were between latent factors and, therefore, adjusted for measurement error.
# Calmness was coded the other way around, such that high values indicated high tension. To ensure comparability with our coding system, the
signs of the original correlations were reversed.

How Can Moods Be Conceptualized and Measured?
During the last two decades competing two-dimensional approaches have dominated the discussion about the structure
of mood and affect. One model, proposed by Russell, assumes that the core affect of a feeling is a single integral
blend of the independent dimensions valence and arousal
(Russell, 2003, p. 148). Russell, Weiss, and Mendelsohn
(1989) introduced an affect grid to assess valence and
arousal simultaneously via two items. Its brevity makes the
affect grid very attractive for ambulatory assessment research
(Reicherts, Salamin, Maggiori, & Pauls, 2007). However, because each dimension is assessed with one item only, measurement error cannot be determined for a single occasion. In
addition, Schimmack and Grob (2000) criticized the labels of the activation dimension as not being close to common
language and experience. In contrast to Russell, Thayer
(1989) argued that two basic arousal dimensions need to be
distinguished to describe a mood state, namely tense arousal
(relaxation–tension), and energetic arousal (tiredness–wakefulness). In Thayer's view, valence is not a separate dimension, but a mix of his basic arousal dimensions.

Watson and Tellegen (1985) proposed that affects can be
described by two uncorrelated basic dimensions, which are
called positive affect (PA) and negative affect (NA). They
developed the Positive and Negative Affect Schedule
(PANAS) to measure each dimension with 10 unipolar items.
According to Watson and Vaidya (2003, p. 356) the PANAS
has gained much popularity because of the rich body of
psychometric data that have established the reliability and
validity of the scales. However, the validity of the theoretical
conception of the PA and NA dimensions, the factorial solution on which they are based, and the difficulty in interpreting
the scores and relating them to commonly experienced feelings were criticized (e.g., Fahrenberg, 2006; Russell & Carroll, 1999a; Schimmack, 1999). Moreover, some critics rejected the basic assumption that affect can be sufficiently
described by two orthogonal dimensions (Matthews, Jones,
& Chamberlain, 1990; Schimmack & Grob, 2000). They advocated a model in which valence (V; ranging from unpleasant to pleasant), calmness (C; ranging from restless/under
tension to calm/relaxed), and energetic arousal (E; ranging
from tired/without energy to awake/full of energy) form the
basic dimensions. Although these dimensions are substantially correlated (cf. Table 1), they cannot be reduced to a two-dimensional model. In addition, different experimental manipulations, such as taking sedative drugs or sleep deprivation, caused different patterns of changes in the three mood
dimensions, which would not have been captured by the two-dimensional approaches discussed above (Matthews et al.,
1990). Different instruments exist to measure the three mood
dimensions: The UWIST Mood Adjective Checklist (Matthews et al., 1990) assesses each dimension with eight unipolar items; the German-language Multidimensional Mood
Questionnaire (MDMQ) provides short-scales consisting of
four unipolar items (Steyer, Schwenkmezger, Notz, & Eid,
1997). Schimmack and Grob (2000) used six unipolar items,
which they combined into three bipolar items per dimension.

In addition to measures based on dimensional models of
mood and affect, various instruments have been developed
to assess qualitatively distinctive mood states (e.g., the revised Multiple Affect Adjective Check List (MAACL-R),
the PANAS-X or the Profile of Mood States (POMS); the
latter assesses, e.g., tension-anxiety, depression-dejection,
anger-hostility, and others). The general problem with
these approaches is that neither the nature nor the number
of distinguishable mood states is clear. Moreover, the proposed specific mood-states are usually highly correlated
(Schimmack, 1999; Watson & Vaidya, 2003).

Methods to Evaluate the Psychometric Properties of Scales Used in Ambulatory Assessment Studies

Earlier approaches to demonstrate the factor structure of
repeated measurement data followed Cattell's suggestion
to factorize the between-person correlations, which are repeated time by time (R-technique) separately from the
within-person correlations, which are repeated person by
person (P-technique; e.g., Zevon & Tellegen, 1982). Contemporary approaches use structural equation models
(SEM) or multilevel models (MLM) to estimate the factorial structure between and within persons simultaneously
(see data analysis).
Specific reliability coefficients for ambulatory assessment measures have been calculated in various ways. Although the computational details differ, all of these methods decompose the total variance into trait, state, and error
components. To obtain indicators for the within-person reliability, Buse and Pawlik (2001) correlated test halves
across occasions for each participant (Cattell's P-matrix)
and averaged those coefficients across participants. To obtain indicators of the aggregate reliability, which is based
on the between-person variance, the odd-even method was
applied (e.g., Buse & Pawlik, 1996; Perrez, Schoebi, &
Wilhelm, 2000).
Cranford et al. (2006) decomposed the variance of their
measures into variance between persons, days, items of the
same scale, the two-way interactions, and residuals. Using
generalizability theory they combined the variance components to demonstrate high aggregate reliability and satisfactory within-person reliability for their three-item mood
scales. A similar but less formalized approach was proposed by Fahrenberg et al. (2002).
Other approaches to obtain specific reliability estimates
are based on structural equation modelling (SEM). One important class of models in this framework are latent-state
latent-trait (LST) models. In LST theory (e.g., Steyer,
Schmitt, & Eid, 1999) the total variance of a variable at a
given occasion is partitioned into three components: a latent-trait component, which does not change over occasions and indicates true consistency, a latent-state residual,
which captures the occasion-specific deviation from the
trait and indicates true variability between occasions, and
measurement error. In LST theory reliability is defined as
the ratio of true variance (latent-trait variance + latent-state
residual variance) to total variance at a given occasion. Another variance decomposition, which takes the serial dependency of repeated measures into account, was proposed
by Kenny and Zautra (2001). In their model, the total variance is divided into a stable trait, an autoregressive trait,
and a state component, which contains situational influences and error, and is supposed to vary randomly over time.

The Current Study

The purpose of this study was to evaluate the psychometric
properties of a short mood measure designed to assess three
basic mood dimensions in people's daily lives. Data were
collected from a sample of 187 participants who reported
their mood state four times a day over the course of a week
by means of the current mood measure. Using a multilevel
approach, the three-factorial structure that was proposed by
Matthews et al. (1990), Schimmack and Reisenzein (2002),
and Steyer et al. (1997) was simultaneously tested between
persons and within persons. We further showed how error
variance can be separated from latent variance at the different levels to obtain level-specific reliability coefficients
and evaluate each scale's sensitivity for measuring true
change over time.

Method
Participants
Ninety-eight Swiss couples were recruited to participate in
a 1-week diary study either in undergraduate psychology
classes or through private acquaintances of graduate students. Because of technical failures of the handheld computers, data of nine persons were lost. Thus, data of 93
women and 94 men from 97 heterosexual couples could be
analyzed. Age of participants ranged between 19 and 36
years (M = 25.6, SD = 3.2); half of them were students.

Electronic Diary Procedure


Four times a day over the course of a week, participants were
asked to rate their current mood and a series of other questions
not relevant to this paper. The diary questions were implemented on Palm Tungsten T and T5, programmed with a pilot
version of IzyBuilder (http://www.izybuilder.com). The
questions could be answered by using a stylus on a touchscreen. Around 11 a.m., 2:30 p.m., 6 p.m., and 9:30 p.m. the
computer gave an acoustic signal. Signal time points were
randomized in a time window of 20 min around the intended times to prevent participants from anticipating the exact
beginnings of the report.
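
The signal schedule described above can be illustrated with a short sketch. It is not the IzyBuilder implementation; the function name `draw_signal_times` is hypothetical, and the code assumes that the 20-min window is centered on the intended time (up to 10 min earlier or later) and that offsets are drawn uniformly, neither of which is stated in the text.

```python
import random
from datetime import datetime, timedelta

# Intended signal times taken from the text; the window handling is an assumption.
INTENDED_TIMES = ["11:00", "14:30", "18:00", "21:30"]
WINDOW_MINUTES = 20  # assumed to be the total width, centered on the intended time


def draw_signal_times(day, rng):
    """Return one randomized signal time per intended time for the given day."""
    half_window = WINDOW_MINUTES / 2
    signals = []
    for hhmm in INTENDED_TIMES:
        hour, minute = map(int, hhmm.split(":"))
        intended = datetime(day.year, day.month, day.day, hour, minute)
        offset = rng.uniform(-half_window, half_window)  # offset in minutes
        signals.append(intended + timedelta(minutes=offset))
    return signals


rng = random.Random(0)  # fixed seed so the sketch is reproducible
schedule = draw_signal_times(datetime(2007, 1, 1), rng)
for signal in schedule:
    print(signal.strftime("%H:%M:%S"))
```

Drawing the offset independently for every signal keeps participants from learning the exact prompt time, which is the purpose the procedure states.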

Measures
Mood
To measure the basic mood dimensions V, C, and E in people's daily life, we developed a six-item short scale that
relied on the Multidimensional Mood Questionnaire
(MDMQ), a German-language mood scale (Steyer et al.,
1997). The MDMQ provides consistent four-item scales to
measure each dimension (Cronbach's αs of the three scales
ranged from .73 to .89 over four repeated measures). During each observation participants responded to the statement "At this moment I feel:" by means of six bipolar
items, which were presented in the following order on one
display: tired–awake [müde–wach] (E+), content–discontent [zufrieden–unzufrieden] (V−), agitated–calm [unruhig–ruhig] (C+), full of energy–without energy [energiegeladen–energielos] (E−; see Footnote 1), unwell–well [unwohl–wohl]
(V+), relaxed–tense [entspannt–angespannt] (C−). The
scales had seven steps. Their endpoints 0 and 6 were associated with the label "very." Answers were given by moving a slider from the start position 0, at the left end of a
scale, to the position which corresponded best to the current
state. To make sure that participants responded by moving
the slider rather than browsing through the allocation, at
least one of the two items belonging to a dimension had to
be moved to proceed to the next question. Prior to the analyses, data from three items were reverse coded, to ensure
that higher scores indicate higher positive V, higher E, or
higher C.
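
A minimal sketch of the reverse coding may help to make the scoring concrete. The item keys and the function `score` are hypothetical names introduced for illustration, and computing each dimension score as the mean of its two items is an assumption; the text only states that three items were reverse coded on the 0 to 6 scale.

```python
SCALE_MAX = 6  # endpoints of the seven-step scale were 0 and 6

# Items in presentation order: (hypothetical key, dimension, needs reverse coding).
# The items marked V-, E-, and C- in the text have their negative pole on the right.
ITEMS = [
    ("tired_awake", "E", False),
    ("content_discontent", "V", True),
    ("agitated_calm", "C", False),
    ("energy_noenergy", "E", True),
    ("unwell_well", "V", False),
    ("relaxed_tense", "C", True),
]


def score(raw):
    """Map raw ratings (dict: item key -> 0..6) to one score per dimension."""
    totals = {"V": 0.0, "C": 0.0, "E": 0.0}
    for key, dimension, reverse in ITEMS:
        value = raw[key]
        if not 0 <= value <= SCALE_MAX:
            raise ValueError(f"rating out of range for {key}: {value}")
        # Reverse coding mirrors the rating around the scale midpoint.
        totals[dimension] += (SCALE_MAX - value) if reverse else value
    # Assumption: the dimension score is the mean of its two items.
    return {dimension: total / 2 for dimension, total in totals.items()}


example = {
    "tired_awake": 5,
    "content_discontent": 1,
    "agitated_calm": 4,
    "energy_noenergy": 2,
    "unwell_well": 5,
    "relaxed_tense": 0,
}
scores = score(example)
print(scores)
```

After recoding, a higher value always means more positive valence, more calmness, or more energy, so the two items of a dimension can be combined directly.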

Data Analysis
We used multilevel analyses (e.g., Raudenbush & Bryk,
2002; Goldstein, 2003) to investigate the variance and covariance of the mood items. With MLMs, confirmatory factor analyses (CFA) and regression models can be computed
simultaneously for the within- and the between-person part
of the data. Compared with SEMs, they are better suited to
analyze hierarchically structured, unbalanced data sets
with missing observations, such as are typically obtained
in ambulatory assessment. A shortcoming of MLMs is that
unlike SEM, they do not provide established fit indices.
Recently Bauer (2003) and Curran (2003) have demonstrated that nested structures of unbalanced data can also
be modeled with SEMs. However, the treatment of such
data is computationally easier with MLMs. We, therefore,
used an MLM approach and the program MLwiN 2.02
(Rasbash, Steele, Browne, & Prosser, 2005) to analyze the
data. MLwiN provides an iterative generalized least squares
algorithm to obtain parameter estimates. At convergence,
these estimates are maximum likelihood. The procedure
yields a deviance statistic (−2 log likelihood) that indicates
how well the specified model fits the data. If two models
are nested, the difference of their deviances has a χ² distribution, with degrees of freedom equal to the difference in
the number of parameters estimated in the models. This
statistic can be used to test whether two models significantly differ in their fit.
Because of the large number of cases in our data set, the
power to reject a more constrained model was high even when
its fit was not substantially worse. Therefore, the
level to evaluate the fit difference of two models was set
to p = .001.
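
The deviance comparison can be sketched in a few lines. This is not MLwiN output; the helper names `chi2_sf` and `compare_models` are hypothetical, and the upper-tail probability is computed with the standard closed-form series for integer degrees of freedom so that only the Python standard library is needed. The example values are deviance differences reported in the Results section.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    tail = math.erfc(chi / math.sqrt(2))
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def compare_models(deviance_difference, df_difference, alpha=0.001):
    """Return (p, reject) for the more constrained of two nested models."""
    p = chi2_sf(deviance_difference, df_difference)
    return p, p < alpha


p_2r, reject_2r = compare_models(14.0, 9)   # Model 2r vs. Model 1
p_3b, reject_3b = compare_models(12.0, 3)   # Model 3b vs. Model 2r
p_2, reject_2 = compare_models(179.4, 18)   # Model 2 vs. Model 1
print(round(p_2r, 3), reject_2r, round(p_3b, 3), reject_3b, reject_2)
```

With the strict level of p = .001, a constrained model is rejected only when the loss of fit is large relative to the number of parameters saved.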

Results
The raw data consisted of 4,577 observations provided by
187 persons. Because of technical problems, the percentage
of missing observations during the first 7 consecutive days
was high (on average 20.4%, SD = 31.7). However, many
participants compensated for these technical failures by extending the observation period, resulting in a satisfying average number of 24.5 observations per participant (SD =
5.9; range 6 to 44). Ten observations were excluded because they contained contradictory extreme responses, and
therefore, a total of 4,567 observations were analyzed.

Item Variances and Covariances Between and Within Persons
In a first step, the item covariation at the within- and the
between-person level was explored. A model with three
levels was set up, in which mood-items (Level 1) were nested within observations (Level 2), which were nested within
persons (Level 3; see Footnote 2). In the basic model, each of the six mood
items was identified by a dummy-coded indicator variable
for which a fixed effect and random effects at Level 2 and
Level 3 were estimated, according to the following equation:
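$$y_{ijk} = \beta_{1}(\mathrm{item}_1) + \nu_{1k}(\mathrm{item}_1) + u_{1jk}(\mathrm{item}_1) + \dots + \beta_{6}(\mathrm{item}_6) + \nu_{6k}(\mathrm{item}_6) + u_{6jk}(\mathrm{item}_6) \qquad (1)$$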

Footnote 1. This item is not part of the MDMQ. It was included because of positive characteristics in previous diary studies of our research group (Perrez
et al., 2000; Wilhelm, 2004).
Footnote 2. To keep the models as simple as possible, we do not take into account that feeling states reported by romantic partners are positively
correlated. The consequence of not modeling the similarity between partners is that significance tests are too liberal at the between-person
level. However, this bias is marginal when the number of couples is rather large as in our study (see Kenny, Kashy, & Cook, 2006) and,
therefore, does not compromise our conclusions.


Thus, the response $y_{ijk}$ given on a particular item (subscript i) at a particular time (subscript j) by a particular person
(subscript k) was modeled as a function of each item's overall mean $\beta_i$, from which deviation was allowed. The estimate for $\nu_{ik}$ captures the extent to which the average
response of person k on item i deviates from the overall mean of this
item (variation between persons). The estimate for the random effect at Level 2, $u_{ijk}$, captures the extent to which
responses given at different times j deviate from the average response of person k on a particular item i. Thus, this
estimate captures variation within persons, reflecting differences between observations over time. The random coefficients of the six items were allowed to covary at each
level.
The fixed coefficients of Model 1 were 4.56 for content,
4.53 for well, 4.41 for calm, 4.30 for relaxed, 3.42 for
awake, and 3.51 for full of energy, indicating that on average, participants were in a positive and relaxed state and
above a medium energy level. Results of the random part
of Model 1 are shown in Table 2. As can be seen from the
diagonals, the variances between observations (Level 2)
are approximately 3 to 4 times larger than the variances
between persons (Level 3). This indicates that the biggest
part of the total variation in each item is the result of differences between observations and error. Below the diagonals, the correlation coefficients between items are shown.
At Level 2, the pattern of correlations indicates that the
items that belong to a common factor show the highest associations. However, the contrast between items that belong to the same factor and items that belong to different
factors was substantial only for the items full of energy and
awake (which form the factor E). For the other items this
difference was small. At Level 3, correlations were higher
than they were at Level 2, but the pattern was quite similar.

Table 2. Random part of Model 1: Variances, covariances, and correlations of the mood items at the between- and within-person level

Between-person variation (Level 3)
                      1 (SE)          2 (SE)          3 (SE)          4 (SE)          5 (SE)          6 (SE)
1 content             0.433 (0.051)   0.370 (0.046)   0.348 (0.047)   0.421 (0.054)   0.224 (0.044)   0.277 (0.043)
2 well                0.87***         0.422 (0.049)   0.397 (0.049)   0.425 (0.054)   0.261 (0.044)   0.270 (0.043)
3 calm                0.74***         0.86***         0.508 (0.059)   0.467 (0.059)   0.248 (0.047)   0.240 (0.044)
4 relaxed             0.81***         0.83***         0.83***         0.629 (0.072)   0.289 (0.052)   0.328 (0.051)
5 awake               0.48***         0.56***         0.49***         0.51***         0.514 (0.064)   0.402 (0.054)
6 full of energy      0.62***         0.61***         0.50***         0.61***         0.83***         0.456 (0.056)

Within-person variation (Level 2)
                      1 (SE)          2 (SE)          3 (SE)          4 (SE)          5 (SE)          6 (SE)
1 content             1.442 (0.031)   0.736 (0.023)   0.582 (0.023)   0.752 (0.026)   0.234 (0.028)   0.344 (0.025)
2 well                0.54***         1.267 (0.027)   0.625 (0.022)   0.738 (0.024)   0.345 (0.027)   0.410 (0.024)
3 calm                0.41***         0.48***         1.370 (0.029)   0.794 (0.025)   0.033 (0.027)   0.083 (0.024)
4 relaxed             0.50***         0.52***         0.54***         1.603 (0.034)   0.034 (0.029)   0.202 (0.026)
5 awake               0.13***         0.20***         0.02            0.02            2.351 (0.050)   1.330 (0.038)
6 full of energy      0.21***         0.27***         0.05***         0.12***         0.63***         1.891 (0.040)

*p < .05; **p < .01; ***p < .001
Note: Variances are shown on the diagonals, covariances (with standard errors in parentheses) above the diagonals, and correlations below the diagonals.
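
The relation between the covariances above the diagonals of Table 2 and the correlations below them can be checked with a small sketch. The function name `correlation` is hypothetical; the numbers are the estimates for the items content and well taken from Table 2.

```python
import math


def correlation(covariance, variance_a, variance_b):
    """Correlation implied by a covariance and the two variances."""
    return covariance / math.sqrt(variance_a * variance_b)


# Items "content" and "well", estimates from Table 2.
r_between = correlation(0.370, 0.433, 0.422)  # Level 3
r_within = correlation(0.736, 1.442, 1.267)   # Level 2

# Ratio of within-person to between-person variance for "content".
variance_ratio = 1.442 / 0.433

print(round(r_between, 2), round(r_within, 2), round(variance_ratio, 1))
```

The same two items are thus more strongly associated across persons than across occasions within a person, which is the pattern described for the table as a whole.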

Factor Structure Between and Within Persons
In the next step, a model was specified in which the variances and covariances of the three postulated factors were
estimated at Level 2 and Level 3 (Model 2). Each factor
was identified by a dummy variable (see Footnote 3). At Level 2 and Level
3, the variances and covariances of these factor variables
were estimated. In addition, each single item was allowed
to vary, but the covariances between items and the covariances between factors and items were constrained to be
zero. As before, fixed effects were estimated for each item (see Footnote 4).
Compared with the saturated Model 1, Model 2 fit the
data significantly worse, χ²(18) = 179.4, p < .001. We,
therefore, tested a modified model in which item residuals
were allowed to be correlated. In order to keep this model
simple, correlations between residual item variances were
only allowed when their corresponding Wald test was significant at p <
.01. The resulting Model 2r was not significantly different
from the saturated Model 1, χ²(9) = 14.0, p = .122.

Footnote 3. For example, the dummy variable of the factor C was coded 1 if a response corresponded to the items calm or relaxed and 0 if it corresponded
to other items.
Footnote 4. Conceptually Model 2 is equal to the estimation of two CFAs in a SEM framework: one for the between-person data (the 187 individuals'
means computed over time for each item) and another for the within-person data (4,567 cases in which the variables were centered around
each individual's mean). In each CFA the variances and covariances of the three latent factors were estimated, as well as the residual variance
of each item. The loadings of the factors on the items were either constrained to 1 or 0. Each of the two CFAs would then have 9 df.
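
The two data sets mentioned in Footnote 4 can be illustrated as follows. This is only a sketch of person-mean centering for a single item with made-up toy ratings; the function name `split_between_within` is hypothetical, and the multilevel model estimates both parts simultaneously rather than in two separate steps.

```python
from collections import defaultdict
from statistics import mean


def split_between_within(records):
    """Split a list of (person_id, rating) pairs into person means and centered ratings."""
    by_person = defaultdict(list)
    for person, rating in records:
        by_person[person].append(rating)
    # Between-person part: one mean per person, computed over time.
    person_means = {person: mean(ratings) for person, ratings in by_person.items()}
    # Within-person part: each rating centered around the person's own mean.
    centered = [(person, rating - person_means[person]) for person, rating in records]
    return person_means, centered


# Toy ratings (not data from the study): two persons with unequal numbers of reports.
toy = [("p1", 4), ("p1", 6), ("p2", 1), ("p2", 3), ("p2", 2)]
person_means, centered = split_between_within(toy)
print(person_means)
print(centered)
```

Because the centered ratings sum to zero for every person, they carry no information about stable differences between persons, which is why the two parts can be analyzed separately.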


Table 3. Random part of Model 2r: Estimated variances (diagonals) and correlations (below the diagonals) of the three
mood factors and the residuals at the between-person Level 3 and the within-person Level 2

Between-person variation (Level 3)
Factor variances and correlations
                        1            2            3
1 Valence               0.363***
2 Calmness              0.99***      0.461***
3 Energetic arousal     0.70***      0.59***      0.394***

Residual item variances and correlations
                        Residual variance    Residual correlation in this row
1 content               0.062***
2 well                  0.057***
3 calm                  0.061***             0.83***
4 relaxed               0.138***
5 awake                 0.124***             0.57***
6 full of energy        0.046*               0.61***

Within-person variation (Level 2)
Factor variances and correlations
                        1            2            3
1 Valence               0.738***
2 Calmness              0.79***      0.791***
3 Energetic arousal     0.25***      0.06**       1.332***

Residual item variances and correlations
                        Residual variance
1 content               0.725***
2 well                  0.512***
3 calm                  0.579***
4 relaxed               0.809***
5 awake                 1.021***
6 full of energy        0.559***
Residual correlations at Level 2: 0.14***, 0.15***, 0.19***, 0.20***, 0.21***, 0.28***

*p < .05; **p < .01; ***p < .001
Note: Each residual correlation at Level 3 involves the row item and one lower-numbered item. The item pairs of the residual correlations cannot be recovered from this copy of the table.

The estimates of the random part of Model 2r are displayed in Table 3. Between persons (Level 3) V and C were
almost perfectly correlated with each other and were both
highly correlated with E. This indicates that persons who
reported a high average level of pleasure during the observation week also reported a high average level of C and E.
High correlations existed also between the three residual
item variances, probably because of stable response patterns in the use of items. Within persons (Level 2), V was
highly correlated with C and moderately with E, but the
correlation between C and E was close to zero. This pattern
of correlations indicates that changes over time were highly
synchronized between V and C, and slightly between V and
E. Correlations between residual item variances were all
positive and of small to moderate size (see Footnote 5).
In the next steps, we tested whether the three-factor
model above could be simplified to a more parsimonious
two-factor model (at each level). We first forced the factors
of V and C at Level 2 to form a common factor. This model
(Model 3a) fit the data significantly worse than Model 2r,
χ²(3) = 197.2, p < .001, and was rejected. The same procedure was then applied to the random part of Level 3. The
fit of this model (Model 3b) was not much worse than the
fit of Model 2r, χ²(3) = 12.0, p = .007. The variance of the
common V-C factor was 0.398 and its correlation with E
was r = .65. We also tested whether a one-factor solution
would be appropriate at Level 3, but the fit of this one-factor model was clearly worse, χ²(2) = 104.3, p < .001.
In summary, the results show that the theoretically postulated three correlated factors are necessary to describe the
within-person variations of mood over time (Level 2).
However, to describe rather stable differences in the weekly averages of mood between persons (Level 3), two correlated factors appear to be sufficient.

Sensitivity to Change
To evaluate sensitivity to change, one can directly compare
the relative size of the within-person variances of the mood
factors with the between-person variances (Table 3). In Table
4, these variances are shown again in the first column. However, we further decomposed the within-person variance because temporal patterns within days differ from temporal patterns between days. This was done by computing a four-level
model: Level 1 was constituted by the mood items, answered
at each observation (Level 2), which were repeated at different days of the week (Level 3) by different persons (Level 4).
In analogy to Model 1, the full covariance matrix was estimated at Level 2, 3, and 4. The covariances between the items
belonging to the same factor were taken as estimates of the
latent factor variances. (Alternatively the factor Model 2r
could have been extended to four levels.)

Footnote 5. If the residual item variances were not allowed to covary (Model 2) the Level 2 correlations were slightly higher (r V-C = .88, r V-E = .36, r
C-E = .10) than in Model 2r.

Table 4. Decomposition of the total variance into latent variance and error variance between persons and within persons

                                            Latent variance    Error variance     Total variance     Latent/total var.
                                            Estim.    %        Estim.    %        Estim.    %        (Reliability)
Valence
  Between persons (a)                       0.363     33       0.030              0.393     27       0.92
  Within persons (a)                        0.738     67       0.309     91       1.047     73       0.70
  Between days, within persons (b)          0.182     17       0.026              0.208     14       0.88
  Between observations, within days (b)     0.573     52       0.286     85       0.860     60       0.67
  Total                                     1.101     100      0.339     100      1.440     100      0.76
Calmness
  Between persons (a)                       0.461     37       0.050     13       0.511     31       0.90
  Within persons (a)                        0.791     63       0.347     87       1.138     69       0.70
  Between days, within persons (b)          0.188     15       0.019              0.207     13       0.91
  Between observations, within days (b)     0.626     50       0.330     83       0.956     58       0.66
  Total                                     1.252     100      0.397     100      1.649     100      0.76
Energetic arousal
  Between persons (a)                       0.394     23       0.043     10       0.437     20       0.90
  Within persons (a)                        1.332     77       0.395     90       1.727     80       0.77
  Between days, within persons (b)          0.094              0.022              0.115              0.81
  Between observations, within days (b)     1.247     72       0.376     86       1.623     75       0.77
  Total                                     1.726     100      0.438     100      2.164     100      0.80

Notes: (a) Variance components were obtained from Model 2r; (b) A four-level model was estimated to obtain the variance components between
days, within persons and between observations, within days. In this model the between-persons variance was estimated to be slightly smaller
than in Model 2r.

As can be seen from Table 4, between two thirds and three quarters of the total latent variance was the result of variation within persons. This clearly demonstrates that the measures of the three mood dimensions are highly sensitive to change over time. A closer examination of the within-person variance reveals that for V and C, half of the true variation was within days and about 15% was the result of changes between days. The proportions were different for E, indicating that this dimension changed predominantly within days, whereas changes between days were small. The distributions of the error variances at the different levels⁶ show that measurement error was concentrated within days. As a consequence, sensitivity to change would be overestimated if measurement error were not controlled (see Table 4).
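The overestimation can be checked directly from the variance components printed in Table 4. The following sketch recomputes the within-person share of the variance for calmness (C) and energetic arousal (E), once for the latent variance alone and once for the total variance (latent plus error). The dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Variance components for calmness (C) and energetic arousal (E), as printed in Table 4.
TABLE_4 = {
    "C": {"between": {"latent": 0.461, "error": 0.050},
          "within": {"latent": 0.791, "error": 0.347}},
    "E": {"between": {"latent": 0.394, "error": 0.043},
          "within": {"latent": 1.332, "error": 0.395}},
}


def within_share(components, kind):
    """Percentage of variance located within persons.

    kind is "latent" (error-free) or "total" (latent plus error).
    """
    def pick(level):
        part = components[level]
        return part["latent"] + part["error"] if kind == "total" else part[kind]

    within, between = pick("within"), pick("between")
    return 100.0 * within / (within + between)


# (latent share, total share) in percent, rounded to whole numbers
shares = {dim: (round(within_share(comp, "latent")), round(within_share(comp, "total")))
          for dim, comp in TABLE_4.items()}
print(shares)  # {'C': (63, 69), 'E': (77, 80)}
```

For both dimensions the within-person share is larger when error variance is included, because most of the measurement error is located within persons.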

Level-Specific Reliability Estimates


To obtain level-specific reliability coefficients, the proportion of latent variation to total variation was computed (last column of Table 4).⁷ At the between-person level, reliability was .92 for V, and .90 for C and E. Thus, the internal consistency of the average (across observations) of each of the three mood dimensions was very satisfactory. As was suggested by Model 3b, V and C can be merged into a common dimension at Level 3. The reliability coefficient of this common score was .95. The estimated within-person reliability was .70 for V and C and .77 for E. This suggests that the internal consistency is also sufficient when the score of a mood dimension is computed for a single observation from which the stable between-person variation (each person's average over the week) has already been removed. The reliability for measuring the average mood on a given day (based on four observations, from which the person's average has been removed) was very satisfactory for V and C (.88) and good for E (.81). Even a score obtained at a single observation, from which the day's and the person's average have been removed, still has an acceptable to satisfactory reliability (V and C = .66, E = .77). For a score based on a single observation, which has not been decomposed, reliability was .76 to .80.
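The computation behind the last column of Table 4 can be sketched as follows, using the latent and error variances printed for C and E. The dictionary and the function name are illustrative. Because the printed components are rounded to three decimals, a recomputed coefficient may differ from the printed one in the second decimal.

```python
# (latent variance, error variance, printed reliability) as given in Table 4.
TABLE_4_ROWS = {
    ("C", "between persons"): (0.461, 0.050, 0.90),
    ("C", "within persons"): (0.791, 0.347, 0.70),
    ("C", "between days, within persons"): (0.188, 0.019, 0.91),
    ("C", "between observations, within days"): (0.626, 0.330, 0.66),
    ("E", "between persons"): (0.394, 0.043, 0.90),
    ("E", "within persons"): (1.332, 0.395, 0.77),
    ("E", "between days, within persons"): (0.094, 0.022, 0.81),
    ("E", "between observations, within days"): (1.247, 0.376, 0.77),
}


def reliability(latent, error):
    """Level-specific reliability: latent variance divided by total variance."""
    return latent / (latent + error)


recomputed = {key: reliability(latent, error)
              for key, (latent, error, _printed) in TABLE_4_ROWS.items()}

for key, value in recomputed.items():
    printed = TABLE_4_ROWS[key][2]
    print(key, f"recomputed={value:.3f}", f"printed={printed:.2f}")
```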

⁶ The error variance is the mean of the residual item variances divided by the number of items. To obtain, for example, the within-person error of V, the mean of the residual variances of the items "content" and "well" from Table 3 (Level 2) is computed, (0.725 + 0.512)/2 = 0.618, and then divided by 2, which gives 0.309.
⁷ The proportion of latent to total variance leads to results equivalent to the computation of Cronbach's α for each level.
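The arithmetic of Footnote 6 and the equivalence stated in Footnote 7 can be illustrated for a two-item scale. The sketch below assumes unit factor loadings, so that each item variance equals the latent variance (the inter-item covariance) plus a residual variance. The item variances 1.4 and 1.2 and the covariance 0.8 are hypothetical, and the function names are illustrative.

```python
def cronbach_alpha(item_variances, covariance_sum):
    """Cronbach's alpha from the item variances and the sum of all off-diagonal covariances."""
    k = len(item_variances)
    scale_variance = sum(item_variances) + covariance_sum
    return k / (k - 1) * (1 - sum(item_variances) / scale_variance)


def latent_to_total(item_variances, covariance):
    """Ratio of latent to total variance, with the error defined as in Footnote 6."""
    k = len(item_variances)
    residuals = [variance - covariance for variance in item_variances]
    error = (sum(residuals) / k) / k  # mean residual variance divided by the number of items
    return covariance / (covariance + error)


# Footnote 6 example: within-person error variance of V from the two residual item variances.
error_v_within = ((0.725 + 0.512) / 2) / 2

# Hypothetical two-item example for the equivalence in Footnote 7.
item_variances = [1.4, 1.2]
covariance = 0.8
alpha = cronbach_alpha(item_variances, 2 * covariance)  # two items: covariance counted twice
ratio = latent_to_total(item_variances, covariance)
print(f"error={error_v_within:.3f}, alpha={alpha:.4f}, latent/total={ratio:.4f}")
```

Under these assumptions both expressions reduce to 4c / (v1 + v2 + 2c) for item variances v1, v2 and covariance c, which is why the two coefficients coincide.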


Discussion
In this article, we evaluated the psychometric properties of a short mood scale to assess fluctuations of mood states in daily life. We did this by investigating the within- and between-person variance and covariance of six bipolar items that were chosen to measure three basic dimensions of mood, namely, valence (V), calmness (C), and energetic arousal (E).
Examination of the factor structure revealed that there was evidence for the three-dimensional model proposed by Matthews et al. (1990), Steyer et al. (1997), and Schimmack and colleagues (Schimmack & Grob, 2000; Schimmack & Reisenzein, 2002) at the within-person level. The correlations between the dimensions indicated that fluctuations of V and C over the course of a week were highly synchronized, whereas E was moderately associated with V, but not notably with C. The comparison of our results with the correlations reported in other studies (see Table 1 and Table 3) suggests differences in the size of single coefficients rather than a different pattern of associations. The current study shows that the three-dimensional model holds when correlations are computed within persons across time. At the between-person level, a different pattern of correlations was found. C and V converged into a common well-being factor, which was highly correlated with E. Thus, the three-factor model could not be confirmed for persons' average scores.
High correlations between affect measures that were aggregated over many observations have repeatedly been found in diary studies (e.g., Schimmack, 2003; Wilhelm, 2004; Zelenski & Larsen, 2000). Watson and Vaidya (2003) attributed such correlations mainly to systematic response biases, in particular to the tendency to respond similarly to different items. They concluded that general ratings ultimately appear to have superior construct validity, and therefore should continue to be viewed as the gold standard in trait affect assessment (p. 371). Although we agree that high correlations in the aggregated affect measures are, in part, the result of stable response styles, we do not see evidence for Watson and Vaidya's conclusion. First, response styles operate in conventional trait-affect scales, too. Second, and more important, there is striking evidence that reports of feelings experienced in general or during a longer time period (e.g., last year), which are assessed in trait-affect questionnaires, are prone to many sources of distortion, such as retrospective recall biases and mood-congruent and autobiographic memory effects (e.g., Fahrenberg, Myrtek, Pawlik, & Perrez, 2007; Gorin & Stone, 2001; Robinson & Clore, 2002). Hence, these results question the validity of trait-affect questionnaires to measure the actually experienced general state during a certain time period. Trait-affect scales may validly assess the participants' current concept about their general affective state, yet this is a different theoretical concept (Perrez, 2006).
Mood averages are stable over time (Buse & Pawlik, 1996) and are substantially correlated with affect-related traits, such as neuroticism (Fahrenberg et al., 2001; Klumb, 2004; Schimmack, 2003). We, therefore, assume that besides response styles, mood averages contain a large portion of valid information about the affective disposition of a person. The high correlations that we obtained between the three mood dimensions can, thus, be meaningfully interpreted. They indicate that people who are in a bad mood overall generally also feel more tension and less energy. Such a pattern is quite plausible and would be typical for longer lasting dysphoric or depressed episodes.
In sum, the results of the structural analysis show that different constructs are measured at the between- and within-person level: On the between-person level, V and C are equal indicators of well-being, which is highly associated with the subjectively experienced general level of energy. Within persons, the scores of V, C, and E capture three differently synchronized mood states. The state quality of each dimension was clearly demonstrated, given that at least about two thirds of the total latent variation was the result of fluctuations over time. Thus, the three measures are highly sensitive to changes in mood states, which is an essential requirement for the assessment of mood.
Our approach allowed us to decompose the variance at each
level of the data into latent variance and error variance. We
could, therefore, compute level-specific reliability coefficients. They show that weekly aggregates provide highly reliable measures of well-being and general E. Also, daily averages were reliable. Moreover, even the obtained reliability of
scores computed at single observations was still in an acceptable range. These coefficients are especially satisfying when
we take into account that they are based on two items only.
We used bipolar items to assess mood states. This operationalization corresponds to our theoretical conceptualization of mood as an ongoing affective coloring of the current experience, which can be described on basic bipolar dimensions. In doing so, we are in accordance with Russell and Carroll (1999a, 1999b), who concluded that a bipolar model of the valence dimension of affect fits the data reported in the literature well when response formats of the items and measurement error are adequately taken into account. Correspondingly, Steyer and Riedl (2004) confirmed the bipolarity assumption for the dimensions of the three-factorial mood model. Using an LST approach for ordinal data, they showed that the latent-state residuals of the corresponding unipolar items of the MDMQ (which we combined to obtain our bipolar measures) were almost perfectly negatively correlated. However, whether affect dimensions are uni- or bipolar and whether feelings of pleasure and displeasure are mutually exclusive or can both be experienced at the same time has been intensively debated and is not yet resolved (e.g., Russell & Carroll, 1999a, 1999b; Watson & Tellegen, 1999; see Schimmack, 2005, for arguments in favor of a two-dimensional conceptualization of pleasure and displeasure that affords unipolar response formats).
Some limitations of this study deserve attention. First, although we found that the three-dimensional model fit the within-person data much better than a more parsimonious two-dimensional model, the latent factors of V and C were highly correlated. It would, therefore, be worthwhile to investigate whether there are other items that are appropriate indicators of V and C, respectively, but discriminate these two dimensions better than the ones used here. Second, although we could demonstrate high sensitivity to change, which is a basic requirement for a mood scale, it is necessary to consider that sensitivity to change only indicates that there is change over time, but it does not indicate that such change is in accordance with theoretical assumptions. Therefore, different facets of criterion validity need to be demonstrated. For example, E should increase until midday and decline in the evening, tense arousal should be higher during a conflict situation, and V should be more positive when leisure activities are performed together with friends. Finally, the results are based on a 1-week assessment of a nonrepresentative sample of young Swiss couples. Because partners' moods are usually correlated (Cranford et al., 2006; Wilhelm, 2004; Wilhelm & Perrez, 2004), one might argue that the between-person variance is probably underestimated compared to a sample of single individuals. Analyses computed separately for women and men revealed that, compared to the whole sample (see Table 4), the latent between-person variance was slightly larger for women but smaller for men. This suggests that sensitivity to change might be only slightly overestimated, if at all. We, therefore, believe that the results can be generalized to young German-speaking adults. However, it remains to be investigated whether results would differ in other populations (adolescents, older people, patients), when other time schedules are used, or when participants experience demanding or stressful circumstances (e.g., see Cranford et al., 2006).
Nevertheless, the current article illustrates that a sensitive and reliable measurement of the basic dimensions of daily mood is possible with a short set of only six bipolar items.

Acknowledgments
The preparation of parts of this article was supported by a fellowship grant of the Alfried Krupp Wissenschaftskolleg, Greifswald, Germany, to Peter Wilhelm. Dominik Schoebi's work on this article was supported by fellowship PA001-108998 from the Swiss National Science Foundation. We are grateful to Andrea Conrad, Lukas Erpen, Carmen Faustinelli, Miriam Künzli, Annette Meier, Jacqueline Nagel, Adelaide Notter, and Melanie Sarbach for their engagement in recruiting, instructing, and coaching the participants of our study. We thank Ulrich Ebner-Priemer and Thomas Kubiak for their suggestions on this article, and Siegfried Macho for his comments on the analytical strategy and the conceptual interpretation of our results.

References
Bauer, D.J. (2003). Estimating multilevel linear models as structural equation models. Journal of Educational and Behavioral Statistics, 28, 135–167.
Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary methods: Capturing life as it is lived. Annual Review of Psychology, 54, 579–616.
Buse, L., & Pawlik, K. (1996). Ambulatory behavioral assessment and in-field performance testing. In J. Fahrenberg & M. Myrtek (Eds.), Ambulatory assessment (pp. 29–50). Kirkland, WA: Hogrefe & Huber.
Buse, L., & Pawlik, K. (2001). Computer-assisted ambulatory performance-tests in everyday situations: Construction, evaluation, and psychometric properties of a test battery measuring mental activation. In J. Fahrenberg & M. Myrtek (Eds.), Progress in ambulatory assessment (pp. 3–23). Kirkland, WA: Hogrefe & Huber.
Cranford, J.A., Shrout, P.E., Iida, M., Rafaeli, E., Yip, T., & Bolger, N. (2006). A procedure for evaluating sensitivity to within-person change: Can mood measures in diary studies detect change reliably? Personality and Social Psychology Bulletin, 32, 917–929.
Curran, P.J. (2003). Have multilevel models been structural equation models all along? Multivariate Behavioral Research, 38, 529–569.
Davidson, R.J. (1994). On emotion, mood, and related affective constructs. In P. Ekman & R.J. Davidson (Eds.), The nature of emotion: Fundamental questions (pp. 51–55). New York: Oxford University Press.
Ebner-Priemer, U.W., & Sawitzki, G. (2007). Ambulatory assessment of affective instability in borderline personality disorder: The effect of the sampling frequency. European Journal of Psychological Assessment, 23, 238–247.
Ekman, P., & Davidson, R.J. (Eds.). (1994). The nature of emotion: Fundamental questions. New York: Oxford University Press.
Fahrenberg, J. (2006). Self-reported subjective state – Single items or scales like AD-ACL and PANAS? European Network of Ambulatory Assessment. Statements and Open Commentaries. Retrieved June 21, 2007, from http://www.ambulatoryassessment.org.
Fahrenberg, J., Hüttner, P., & Leonhart, R. (2001). MONITOR: Acquisition of psychological data by a hand-held PC. In J. Fahrenberg & M. Myrtek (Eds.), Progress in ambulatory assessment (pp. 93–112). Kirkland, WA: Hogrefe & Huber.
Fahrenberg, J., Leonhart, R., & Foerster, F. (2002). Alltagsnahe Psychologie: Datenerhebung im Feld mit hand-held PC und physiologischem Mess-System [Psychological assessment in everyday life: Data gathering in the field by handheld personal computers and physiological monitoring systems]. Bern: Huber.
Fahrenberg, J., Myrtek, M., Pawlik, K., & Perrez, M. (2007). Ambulatory assessment – Monitoring behavior in daily life settings: A behavioral-scientific challenge for psychology. European Journal of Psychological Assessment, 23, 206–213.
Goldstein, H. (2003). Multilevel statistical models (3rd ed.). London, UK: Edward Arnold.
Gorin, A.A., & Stone, A.A. (2001). Recall biases and cognitive errors in retrospective self-reports: A call for momentary assessments. In A. Baum, T.A. Revenson, & J.E. Singer (Eds.), Handbook of health psychology (pp. 405–413). Mahwah, NJ: Erlbaum.
Kenny, D.A., Kashy, D.A., & Cook, W.L. (2006). Dyadic data analysis. New York: Guilford.
Kenny, D.A., & Zautra, A. (2001). Trait-state models for longitudinal data. In L.M. Collins & A.G. Sayer (Eds.), New methods for the analysis of change (pp. 243–263). Washington, DC: American Psychological Association.
Klumb, P.L. (2004). Benefits from productive and consumptive activities: Results from the Berlin Aging Study. Social Indicators Research, 67, 107–127.
Kubiak, T., & Jonas, C. (2007). Applying circular statistics to the analysis of monitoring data: Patterns of social interactions and mood. European Journal of Psychological Assessment, 23, 227–237.
Lewis, M., & Haviland-Jones, J.M. (Eds.). (2000). Handbook of emotions (2nd ed.). New York: Guilford.
Lucas, R.E., & Baird, B.M. (2006). Global self-assessment. In M. Eid & E. Diener (Eds.), Handbook of multimethod measurement in psychology (pp. 29–42). Washington, DC: American Psychological Association.
Matthews, G., Jones, D.M., & Chamberlain, A.G. (1990). Refining the measurement of mood: The UWIST Mood Adjective Checklist. British Journal of Psychology, 81, 17–42.
Myrtek, M. (2004). Heart and emotion. Ambulatory monitoring studies in everyday life. Göttingen, Germany: Hogrefe & Huber.
Perrez, M. (2006). Plädoyer für theorieadäquate Methoden in gewissen Domänen der Psychologie [Plea for a theory-adequate methodology in certain domains of psychology]. Verhaltenstherapie und Psychosoziale Praxis, 38, 319–330.
Perrez, M., Schoebi, D., & Wilhelm, P. (2000). How to assess social regulation of stress and emotions in daily family life? A computer-assisted family self-monitoring-system (FASEM-C). Clinical Psychology and Psychotherapy, 7, 326–339.
Rasbash, J., Steele, F., Browne, W., & Prosser, B. (2005). A user's guide to MLwiN version 2.0. Bristol, UK: Centre for Multilevel Modelling, University of Bristol.
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Reicherts, M., Salamin, V., Maggiori, C., & Pauls, K. (2007). The learning affect monitor (LAM): A computer-based system integrating dimensional and discrete assessment of affective states in daily life. European Journal of Psychological Assessment, 23, 268–277.
Robinson, M.D., & Clore, G.L. (2002). Belief and feeling: Evidence for an accessibility model of emotional self-report. Psychological Bulletin, 128, 934–960.
Russell, J.A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110, 145–172.
Russell, J.A., & Carroll, J.M. (1999a). On the bipolarity of positive and negative affect. Psychological Bulletin, 125, 3–30.
Russell, J.A., & Carroll, J.M. (1999b). The phoenix of bipolarity: Reply to Watson and Tellegen (1999). Psychological Bulletin, 125, 611–617.
Russell, J.A., Weiss, A., & Mendelsohn, G.A. (1989). Affect grid: A single-item scale of pleasure and arousal. Journal of Personality and Social Psychology, 57, 493–502.
Scherer, K.R. (2005). What are emotions? And how can they be measured? Social Science Information, 44, 695–729.
Schimmack, U. (1999). Strukturmodelle der Stimmungen: Rückschau, Rundschau und Ausschau [Structural models of mood: Review, overview, and outlook]. Psychologische Rundschau, 50, 90–97.
Schimmack, U. (2003). Affect measurement in experience sampling research. Journal of Happiness Studies, 4, 79–106.

Schimmack, U. (2005). Response latencies of pleasure and displeasure ratings: Further evidence for mixed feelings. Cognition and Emotion, 19, 671–691.
Schimmack, U., & Grob, A. (2000). Dimensional models of core affect: A quantitative comparison by means of structural equation modeling. European Journal of Personality, 14, 325–345.
Schimmack, U., & Reisenzein, R. (2002). Experiencing activation: Energetic arousal and tense arousal are not mixtures of valence and activation. Emotion, 2, 412–417.
Steyer, R., & Riedl, K. (2004). Is it possible to feel good and bad at the same time? New evidence on the bipolarity of mood-state dimensions. In K. van Montfort, J. Oud, & A. Satorra (Eds.), Recent developments on structural equation models: Theory and applications (pp. 197–220). Dordrecht, The Netherlands: Kluwer.
Steyer, R., Schmitt, M., & Eid, M. (1999). Latent state-trait theory and research in personality and individual differences. European Journal of Personality, 13, 389–408.
Steyer, R., Schwenkmezger, P., Notz, P., & Eid, M. (1997). Der Mehrdimensionale Befindlichkeitsfragebogen. Handanweisung [The Multidimensional Mood Questionnaire (MDMQ)]. Göttingen, Germany: Hogrefe.
Thayer, R.E. (1989). The biopsychology of mood and activation. New York: Oxford University Press.
Watson, D., & Tellegen, A. (1985). Toward a consensual structure of mood. Psychological Bulletin, 98, 219–235.
Watson, D., & Tellegen, A. (1999). Issues in the dimensional structure of affect – Effects of descriptors, measurement error, and response formats: Comment on Russell and Carroll (1999). Psychological Bulletin, 125, 601–610.
Watson, D., & Vaidya, J. (2003). Mood measurement: Current status and future directions. In I.B. Weiner (Series Ed.) & J.A. Schinka & W.F. Velicer (Volume Eds.), Handbook of psychology, Vol. 2: Research methods in psychology (pp. 351–375). Hoboken, NJ: Wiley.
Wilhelm, P. (2004). Empathie im Alltag von Paaren. Akkuratheit und Projektion bei der Einschätzung des Befindens des Partners [Empathy in couples' everyday lives: Accuracy and projection in judging the partner's feelings]. Bern: Huber.
Wilhelm, P., & Perrez, M. (2004). How is my partner feeling in different daily-life settings? Accuracy of spouses' judgments about their partner's feelings at work and at home. Social Indicators Research, 67, 183–246.
Zelenski, J.M., & Larsen, R.J. (2000). The distribution of basic emotions in everyday life: A state and trait perspective from experience sampling data. Journal of Research in Personality, 34, 178–197.
Zevon, M.A., & Tellegen, A. (1982). The structure of mood change: An idiographic/nomothetic analysis. Journal of Personality and Social Psychology, 43, 111–122.

Peter Wilhelm
Department of Psychology
University of Fribourg
Rue de Faucigny 2
CH-1700 Fribourg
Switzerland
E-mail peter.wilhelm@unifr.ch