TRENDS in Cognitive Sciences Vol.6 No.10 October 2002

Opinion

When a good fit can be bad

Mark A. Pitt* and In Jae Myung

How should we select among computational models of cognition? Although it is commonplace to measure how well each model fits the data, this is insufficient. Good fits can be misleading because they can result from properties of the model that have nothing to do with it being a close approximation to the cognitive process of interest (e.g. overfitting). Selection methods are introduced that factor in these properties when measuring fit. Their success in outperforming standard goodness-of-fit measures stems from a focus on measuring the generalizability of a model's data-fitting abilities, which should be the goal of model selection.

Dept of Psychology, Ohio State University, 1885 Neil Avenue, Columbus, Ohio 43210-1222, USA.
*e-mail: pitt.2@osu.edu
The explosion of interest in modeling cognitive processes over the past 20 years has fueled the cognitive sciences in many ways. Not only has it opened up new ways of thinking about research problems and possible solutions, but it has also enabled researchers to gain a better understanding of their theories by simulating computational instantiations of them. Modeling is now sufficiently mainstream that one can get the impression that the models themselves are replacing the theories from which they evolved.
What has not kept pace with the advances and interest in modeling is the development of methods for evaluating and testing the models themselves. A model is not interchangeable with a theory, but only one of many possible quantitative representations of it. A thorough evaluation of a model requires methods that are sensitive to its quantitative form. Criteria used for evaluating theories [1], such as testing their performance in an experimental setting, do not speak to the quality of the choices that are made in building their quantitative counterparts (i.e. choice of parameters, how they are combined) or their ramifications. The paucity of such model selection methods is surprising given the centrality of the problem itself. What could be more fundamental than deciding between two alternative explanations of a cognitive process?
How not to compare models
Mathematical models are frequently tested against one another by evaluating how well each fits the data generated in an experiment or simulation. Such a test makes sense given that one criterion of model performance is that it reproduce the data. A goodness-of-fit measure (GOF; see Glossary) is invariably used to measure their adequacy in achieving this goal. What is measured is how much a model's predictions deviate from the observed data [2,3]. The model that provides the best fit (i.e. smallest deviation) is favored. The logic of this choice rests on the assumption that the model that provides the best fit to all data must be a closer approximation to the cognitive process under investigation than its competitors [4].

Such a conclusion would be reasonable if measurements were made in a noise-free (i.e. errorless) system. One of the biggest challenges faced by cognitive scientists is that human and animal data are noisy. Error arises from several sources, such as the imprecision of our measurement tools, variation across participants and in their performance over time.
The problem of random error and the lengths researchers go to in combating it are evident in the experimental and statistical methods used in the field (e.g. repeated measurement, inferential statistics). These serve to hold error in check so that variation due to the mental process of interest will be visible in the data.

Noisy data make GOF by itself a poor method of model selection. As the simulation in Box 1 illustrates, a GOF measure such as the Root Mean Squared Error (RMSE) is insensitive to the different sources of variation in the data, whether random error or the cognitive process of interest. This can result in the selection of a model that overfits the data, which may not be the model that best approximates the cognitive process under study.
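For concreteness, RMSE is simply the square root of the mean squared deviation between a model's predictions and the observations. A minimal sketch in Python (the function name and array-based interface are our own illustration, not part of the article):

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error between data and model predictions (see Glossary)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((observed - predicted) ** 2))

print(rmse([0.9, 0.5, 0.3], [0.88, 0.55, 0.28]))  # ~0.033; smaller = better fit
```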
How to compare models
Because it is impossible to eliminate error from data, efforts have focused on improving model selection in other ways. The preferred solution has been to redefine the problem as one of assessing how well a model's fit to one data sample generalizes to future samples generated by that same process [5]. GENERALIZABILITY (see Glossary) uses the data and information about the model itself to make a best-guess estimate of how likely it is that the model could have generated the data sample in hand. In this approach, a good fit is a necessary but not sufficient condition for a model, because many models are capable of fitting a dataset reasonably well. Because the set of such candidate models is potentially quite large, we can only ever infer the likelihood with which each model under consideration generated the data. In this regard, generalizability is a statistical inference problem. It is no different conceptually from estimating the replicability of an experimental result using inferential statistics, or from inferring the characteristics of a population from a sample.

A handful of generalizability measures have been proposed in the last 30 years [6]. By necessity, all include a measure of GOF to assess a model's fit to the data (see Box 2). Terms encoding information about the model itself are included to level the playing field among models, so that one model, by virtue of its design choices (i.e. mathematical instantiation), does not have an inherent advantage over its competitors in fitting the data best, and thus in being selected. By nullifying such model-specific properties, the model that is the best approximation to the cognitive process under study, and not simply the one that absorbs the most variation in the data, will be selected.
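The idea that generalizability is a model's expected fit to new samples from the same process can be made concrete with a hold-out check. The sketch below is our own illustration (the one-parameter retention-type model, the noise level and the seed are assumptions, not taken from the article): fit a model to one sample, then score that already-fitted model against a fresh sample from the same process.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, a):
    """Hypothetical one-parameter model, y = (1 + t)^(-a)."""
    return (1.0 + t) ** (-a)

rng = np.random.default_rng(0)
t = np.array([0.1, 2.1, 4.1, 6.1, 8.1])
true_a = 0.4

# Two independent samples from the same "cognitive process" (sampling error only).
sample1 = model(t, true_a) + rng.normal(0, 0.03, size=t.size)
sample2 = model(t, true_a) + rng.normal(0, 0.03, size=t.size)

# GOF: fit the model's parameter to the first sample.
(a_hat,), _ = curve_fit(model, t, sample1, p0=[0.5])
gof = np.sqrt(np.mean((sample1 - model(t, a_hat)) ** 2))

# Generalizability estimate: evaluate the same fitted model on the fresh sample.
gen = np.sqrt(np.mean((sample2 - model(t, a_hat)) ** 2))
print(f"GOF on fitting sample: {gof:.3f}; fit to new sample: {gen:.3f}")
```

The second number is typically worse than the first, and the gap widens as a model grows more flexible; that gap is what the measures in Box 2 try to anticipate analytically.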
Measures of generalizability
Early measures of generalizability such as the Akaike information criterion (AIC) [7,8] and the Bayesian information criterion (BIC) [9] addressed the most salient difference among models: the number of free parameters. As is generally well known, a model with many free parameters can provide a better fit to a data sample than a model with few parameters, even if the latter generated the data. The second term in these measures includes a count of the number of parameters (k); AIC and BIC penalize a model more as the number of parameters increases. To be selected, the model with more parameters must overcome this penalty and provide a substantially better fit than a model with fewer parameters. That is, the superior fit obtained with the extra parameters must justify the necessity of those parameters in fully capturing the cognitive process.
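To make the penalty concrete: under AIC (Box 2), a three-parameter model (k = 3) carries a penalty term of 2k = 6, against 2k = 2 for a one-parameter competitor. The larger model is therefore selected only if its extra flexibility reduces the lack-of-fit term −2 ln f(y|θ₀) by more than 4, that is, only if it improves the maximized log-likelihood by more than 2.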
An equally salient but much less tangible dimension along which models differ is their functional form, which refers to the way in which the parameters are combined in a model's equation. More sophisticated selection methods, such as Bayesian model selection (BMS) [10] and MINIMUM DESCRIPTION LENGTH (MDL) [11,12], are sensitive to a model's functional form as well as to the number of parameters. Functional form is taken into account in the third term of the MDL measure; in BMS, both dimensions are hidden in the integral. (Cross-validation, although not listed, is another selection method that is thought to be sensitive to both dimensions of model COMPLEXITY. It involves applying GOF in a non-standard way [13]; a sketch of one variant follows.)
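The article does not specify which cross-validation scheme [13] uses, so the following is a minimal sketch of one simple variant (split-half cross-validation), with our own function names; it applies GOF not to the calibration data but to held-out data:

```python
import numpy as np
from scipy.optimize import curve_fit

def split_half_cv_rmse(model, p0, t, y_half1, y_half2):
    """Calibrate the model's parameters on one half of the data, score GOF
    (RMSE) on the held-out half, then swap roles and average the two errors."""
    def holdout(y_fit, y_test):
        popt, _ = curve_fit(model, t, y_fit, p0=p0)
        return np.sqrt(np.mean((y_test - model(t, *popt)) ** 2))
    return 0.5 * (holdout(y_half1, y_half2) + holdout(y_half2, y_half1))
```

Because an overly flexible model chases the noise in the calibration half, its held-out error grows, so complexity is penalized implicitly rather than through an explicit term.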
The second and third terms in MDL together provide a measure of a model's complexity. Conceptually, complexity refers to that characteristic of a model that makes it flexible and easily able to fit diverse patterns of data, usually by a small adjustment in one of its parameters. In Fig. 1, it is what enables the model (wavy line) in the lower right graph to provide a better fit to the data (dots) than those in the middle and left graphs. Both FUNCTIONAL FORM and the number of PARAMETERS contribute to model complexity.
Glossary

Complexity: the property of a model that enables it to fit diverse patterns of data; it is the flexibility of a model. Although the number of parameters in a model and its functional form can be useful for gauging its complexity, a more accurate and intuitive measure is the number of distinct probability distributions that the model can generate by varying its parameters over their entire range. Details of this geometric complexity measure can be found in [a].

Functional form: the way in which the parameters (θ) and data (x) are combined in a model's equation: y = θx and y = θ + x have the same number of parameters but different functional forms (multiplicative versus additive).

Generalizability: the ability of a model to fit all data samples generated by the same cognitive process, not just the currently observed sample (i.e. the model's expected GOF with respect to new data samples). Generalizability is estimated by combining a model's GOF with a measure of its complexity.

Goodness of fit (GOF): the precision with which a model fits a particular sample of observed data. The predictions of the model are compared with the observed data, and the discrepancy between the two is measured in a number of ways, such as by calculating the root mean squared error between them.

Minimum Description Length (MDL): a versatile measure of generalizability. MDL was developed within algorithmic coding theory in computer science [b], where the goal of model selection is to choose the model that permits the greatest compression of data in its description. Regularities in the data are assumed to imply redundancy. The more the data can be compressed by the model by extracting this redundancy, the more that is learned about the cognitive process.

Overfitting: the case where, in addition to fitting the main trends in the data, a model also fits the microvariation from this main trend at each data point. Compare the middle and right graph inserts in Fig. 1.

Parameters: variables in a model's equation that represent mental constructs or processes; they are adjusted to improve a model's fit to data. For example, in the model y = θx, θ is a parameter.

Probability density function: a function that specifies the probability of observing each outcome of a random variable given the value of a model's parameter.

References
a Myung, I.J. et al. (2000) Counting probability distributions: differential geometry and model selection. Proc. Natl. Acad. Sci. U. S. A. 97, 11170–11175
b Li, M. and Vitanyi, P. (1997) An Introduction to Kolmogorov Complexity and its Applications, Springer-Verlag
Box 1. Why GOF alone is a poor measure of model selection

The ability of a model to fit data is a necessary condition that all models must satisfy to be taken seriously. GOF measures such as RMSE are inappropriate as model selection methods because all they do is assess fit. This myopic focus is problematic when variation in the data can be due to other factors (e.g. random sampling, individual differences) as well as the cognitive process under study.

The severity of the problem is shown in Table I, which contains the results of a model recovery simulation using RMSE. Four datasets were generated from a combination of the two models (M_A and M_B), defined as follows: M_A: y = (1 + t)^(−a); M_B: y = (b + ct)^(−a), where a, b, c > 0. Datasets were generated from each in the frequencies shown in the four conditions (rows). In the first condition, all 100 samples were generated by M_A with a = 0.4 and only sampling error introduced as noise. In the second condition, variation due to individual differences was also added to the data by using a different parameter value (a = 0.6) half of the time. In the third condition, half of the data were generated by model M_A and half by M_B. Condition four is the reverse of condition one, with the data plus sampling error being generated by M_B. Models M_A and M_B were then fitted to the data in each condition using RMSE. The mean RMSE fits, along with the percentage of time each model provided the best fit, are shown in the two right-most columns of Table I.

A good model selection method must be able to ignore irrelevant variation in the data (e.g. sampling error, individual differences that are not being modeled) and recover the model that generated the data. That is, the selection method must be capable of differentiating between the variation that the model was designed to capture and the variation due to noise. RMSE fails miserably at this task. M_B was chosen 100% of the time in all four conditions and its mean fit is substantially better than that of M_A. M_A never provided a better fit than M_B, even when some or all of the data were generated by M_A (conditions 1–3). This is why a good fit can be bad.

Readers who have some familiarity with modeling might not be surprised by these results given that M_B has two more parameters than M_A. A model with more parameters will always provide a better fit, all other things being equal [a]. The typical solution to this problem is to control for the number of parameters, but there are at least two reasons why this fix is unsatisfactory. The most obvious is that it limits model comparison to those situations in which the number of parameters is equated across models. The diversity of models in the cognitive sciences can make this a significant impediment in doing research. Less obvious, although even more important, is that a model's data-fitting abilities are also affected by other properties of the model, such as its functional form [b,c,d]. Unless they are taken into account by the selection method, simply equating the number of parameters will not place the models on an equal footing.

In summary, it may make perfect sense to use GOF to determine whether a given model can even pass the test of fitting a dataset reasonably well (i.e. capturing the main trends), but going beyond this and comparing such fits between models, although intuitive, is risky.

Table I. Results of a model recovery simulation in which a GOF measure (RMSE) was used to discriminate models when the source of the error was varied. Cells in the two right-most columns give the mean RMSE of the fitted model, with the percentage of datasets it fitted best in parentheses.

Condition (sources of variation) | From M_A (a = 0.4) | From M_A (a = 0.6) | From M_B | Fit of M_A | Fit of M_B
(1) Sampling error | 100 | – | – | 0.040 (0%) | 0.029 (100%)
(2) Sampling error + individual differences | 50 | 50 | – | 0.041 (0%) | 0.029 (100%)
(3) Different models | 50 | – | 50 | 0.075 (0%) | 0.029 (100%)
(4) Sampling error | – | – | 100 | 0.079 (0%) | 0.029 (100%)

References
a Linhart, H. and Zucchini, W. (1986) Model Selection, Wiley
b Myung, I.J. et al. (2000) Special Issue on model selection. J. Math. Psychol. 44 (1–2)
c Li, S.C. et al. (1996) Using parameter sensitivity and interdependence to predict model scope and falsifiability. J. Exp. Psychol. Gen. 125, 360–369
d Myung, I.J. and Pitt, M.A. (1997) Applying Occam's razor in modeling cognition: a Bayesian approach. Psychon. Bull. Rev. 4, 79–95
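The logic of Box 1's condition 1 can be reproduced in a few lines. This is a sketch under stated assumptions, not the article's actual simulation: Box 1 does not specify its noise model beyond "sampling error", so we assume Gaussian noise with an arbitrary standard deviation, and the seed, sample counts and starting values are ours. Because M_B reduces to M_A when b = c = 1, its best fit can never be worse, which is the point of the box.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
t = np.array([0.1, 2.1, 4.1, 6.1, 8.1])

def m_a(t, a):          # M_A: y = (1 + t)^(-a)
    return (1.0 + t) ** (-a)

def m_b(t, a, b, c):    # M_B: y = (b + c*t)^(-a), a nesting of M_A
    return (b + c * t) ** (-a)

def rmse(y, yhat):
    return np.sqrt(np.mean((y - yhat) ** 2))

wins_b, n_sets = 0, 100
for _ in range(n_sets):
    # Condition 1: every dataset generated by M_A (a = 0.4) plus sampling error.
    y = m_a(t, 0.4) + rng.normal(0, 0.03, size=t.size)
    pa, _ = curve_fit(m_a, t, y, p0=[0.5], bounds=(1e-6, np.inf))
    pb, _ = curve_fit(m_b, t, y, p0=[0.5, 1.0, 1.0], bounds=(1e-6, np.inf))
    # M_B can always match M_A (set b = c = 1), so its RMSE is never larger.
    wins_b += rmse(y, m_b(t, *pb)) <= rmse(y, m_a(t, *pa))

print(f"M_B fits best in {wins_b}/{n_sets} datasets, though M_A generated them all")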
Box 2. Model selection criteria as measures of generalizability

Listed in Table II are two GOF measures (RMSE, PVAF) and four generalizability measures (AIC, BIC, BMS, MDL). Except for BMS, in which the likelihood function is integrated over the parameter space, the measures of generalizability use the maximized log-likelihood, ln f(y|θ₀), as the GOF measure; its negative represents lack of fit. This fit index is combined with a measure of model complexity to yield a generalizability measure. The four generalizability criteria differ from one another in their conceptualization of model complexity. In AIC, the number of parameters (k) is the only dimension of complexity that is considered, whereas BIC also considers sample size (n). BMS and MDL go one step further and also take into account the functional form of a model's equation. In MDL, this is reflected in the third term of the criterion equation, whereas in BMS it is hidden in the integral. These selection methods, except for PVAF, prescribe that the model that minimizes the given criterion should be chosen.

Table II. Two GOF measures, four generalizability measures, and the dimensions of complexity to which each is sensitive

Selection method | Criterion equation | Dimensions of complexity considered
Root Mean Squared Error | RMSE = (SSE/N)^(1/2) | None
Percent Variance Accounted For | PVAF = 100(1 − SSE/SST) | None
Akaike Information Criterion | AIC = −2 ln f(y|θ₀) + 2k | Number of parameters
Bayesian Information Criterion | BIC = −2 ln f(y|θ₀) + k ln(n) | Number of parameters, sample size
Bayesian Model Selection | BMS = −ln ∫ f(y|θ) π(θ) dθ | Number of parameters, sample size, functional form
Minimum Description Length | MDL = −ln f(y|θ₀) + (k/2) ln(n/2π) + ln ∫ √(det I(θ)) dθ | Number of parameters, sample size, functional form

In the equations above, y denotes observed data, θ is the model's parameter, θ₀ is the parameter value that maximizes the likelihood function f(y|θ), k is the number of parameters, n is the sample size, N is the number of data points fitted, SSE is the minimized sum of squared errors between observations and predictions, SST is the sum of squares total, π(θ) is the parameter prior density, I(θ) is the Fisher information matrix in mathematical statistics [a], det denotes the determinant of a matrix, and ln denotes the natural logarithm.

Reference
a Schervish, M.J. (1995) The Theory of Statistics, pp. 111–115, Springer-Verlag
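The closed-form criteria of Table II are straightforward to compute once the maximized log-likelihood is in hand. A minimal sketch (function names are ours; the geometric-complexity term of MDL is model specific and must be supplied by the caller, since it requires integrating the Fisher information over the parameter space):

```python
import numpy as np

def aic(loglik, k):
    """AIC = -2 ln f(y|theta0) + 2k (Table II); smaller is better."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """BIC = -2 ln f(y|theta0) + k ln(n) (Table II); smaller is better."""
    return -2.0 * loglik + k * np.log(n)

def mdl(loglik, k, n, log_geometric_complexity):
    """MDL = -ln f(y|theta0) + (k/2) ln(n/2pi) + ln of the integral of
    sqrt(det I(theta)) over the parameter space. The last term captures
    functional form and is passed in, typically from numerical integration."""
    return -loglik + 0.5 * k * np.log(n / (2.0 * np.pi)) + log_geometric_complexity
```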
How complexity is related to generalizability and GOF: the dilemma of Occam's razor

The notion of model complexity is a good vehicle with which to illustrate further the goal of generalizability and to distinguish it from GOF. As demonstrated in Box 1, fit is maximized with GOF. Because additional complexity will improve fit, the two are positively related; this is depicted by the top function in Fig. 1. Generalizability, on the other hand, is not related to complexity so straightforwardly. Rather, its function follows the same trajectory as that of GOF up to a certain point, after which fit declines as complexity increases.

Why do the two functions diverge? The data being fitted have a certain degree of complexity that reflects the operation of the cognitive process. This point corresponds to the peak of the generalizability function. Any additional complexity beyond that needed to capture the underlying process (to the right of the peak) will cause the model to overfit the data by also capturing the microvariation due to random error, and thus reduce generalizability. The reason the two functions overlap to the left of the peak is that the model itself must be sufficiently complex to fit the data well. The model will underfit the data if it lacks the necessary complexity (i.e. it is too simple), as illustrated in the lower left graph in Fig. 1. The dilemma that is faced in trying to maximize generalizability should be clear: a delicate balance must be struck between sufficient complexity on the one hand and good generalizability on the other. MDL achieves this balance by choosing the model whose complexity is most justified, by considering the complexity of the data relative to the complexity of the model.
Selection methods at work: the proof is in the pudding
Generalizability measures like BMS and MDL have thus far been developed for testing only those models that can be described in terms of a PROBABILITY DENSITY FUNCTION. Examples of such statistical models are models of categorization and models of information integration [14,15].

The following model-recovery tests demonstrate the relative performance of the three classes of selection methods shown in Table 1. The three models described there (see Table footnote) were compared. In each simulation, 1000 datasets were generated from each model. Each selection method was then tested on its ability to determine which of the three models did in fact generate the 3000 datasets. A good selection method should be able to discern correctly the model that generated the data. The ideal outcome is one in which each model generalizes best only to data generated by itself.
[Fig. 1 is a line graph: model fit (from poor to good) on the y-axis as a function of model complexity on the x-axis, with a goodness-of-fit curve that rises monotonically and a generalizability curve that peaks and then declines in the region marked 'overfitting'. Three small inset graphs along the x-axis show a model fitted to the same data at increasing levels of complexity.]

Fig. 1. Relationship between goodness of fit and generalizability as a function of model complexity. The y-axis represents any fit index, where a larger value indicates a better fit (e.g. percent variance accounted for). The three smaller graphs along the x-axis show how fit improves as complexity increases. In the left graph, the model (represented by the line) is not complex enough to match the complexity of the data (dots). The two are well matched in complexity in the middle graph, which is why this point occurs at the peak of the generalizability function. In the right graph, the model is more complex than the data, fitting random error; it has better goodness of fit, but is overfitting the data.
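The two curves of Fig. 1 are easy to reproduce in miniature. The sketch below uses polynomial regression as the model family, which is our own choice for illustration (the article's examples are retention-type functions): as the degree grows, fit to the calibration sample improves monotonically, while fit to a fresh sample from the same process peaks and then deteriorates.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 20)
truth = 1.0 - 0.8 * x                       # a simple underlying trend
train = truth + rng.normal(0, 0.05, x.size)  # observed sample (with noise)
test = truth + rng.normal(0, 0.05, x.size)   # a second sample, same process

for degree in (1, 3, 9):                     # increasing model complexity
    coefs = np.polyfit(x, train, degree)     # least-squares fit to the sample
    pred = np.polyval(coefs, x)
    gof = np.sqrt(np.mean((train - pred) ** 2))   # always improves with degree
    gen = np.sqrt(np.mean((test - pred) ** 2))    # typically worsens past the peak
    print(f"degree {degree}: GOF {gof:.3f}, generalizability {gen:.3f}")
```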
Table 1. Model recovery performance (percentage of correct recoveries) for three models using three selection methods. Columns give the model the data were generated from.

Selection method | Model fitted | From M_1 | From M_2 | From M_3
PVAF | M_1 | 0 | 0 | 0
PVAF | M_2 | 38 | 97 | 30
PVAF | M_3 | 62 | 3 | 70
AIC | M_1 | 79 | 0 | 0
AIC | M_2 | 9 | 97 | 30
AIC | M_3 | 12 | 3 | 70
MDL | M_1 | 86 | 0 | 0
MDL | M_2 | 1 | 92 | 8
MDL | M_3 | 13 | 8 | 92

Models M_1, M_2 and M_3 were defined as follows: M_1: y = (1 + t)^(−a); M_2: y = (b + t)^(−a); M_3: y = (1 + bt)^(−a). In the model equations, a and b are parameters that were adjusted to fit each model to the data, which were generated using the same five points, t = 0.1, 2.1, 4.1, 6.1, 8.1. Each sample of five observations was drawn from a binomial probability distribution of size n = 50. One thousand samples were generated from each model and served as the data to fit. Each selection method was then used to determine which model generated each of the samples. The percentage of time each model was chosen is shown.
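The recovery loop itself is simple to express. The sketch below is our approximation, not the article's procedure: it scores a single dataset with a least-squares surrogate for AIC (n ln(SSE/n) + 2k, the Gaussian maximum-likelihood form with constants dropped), whereas the simulations reported in Table 1 fitted binomial likelihoods.

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.array([0.1, 2.1, 4.1, 6.1, 8.1])
models = {
    "M1": (lambda t, a: (1 + t) ** (-a), [0.5]),
    "M2": (lambda t, a, b: (b + t) ** (-a), [0.5, 1.0]),
    "M3": (lambda t, a, b: (1 + b * t) ** (-a), [0.5, 1.0]),
}

def gaussian_aic(y, yhat, k):
    # Least-squares surrogate for AIC: n*ln(SSE/n) + 2k, smaller is better.
    n = y.size
    return n * np.log(np.sum((y - yhat) ** 2) / n) + 2 * k

rng = np.random.default_rng(3)
y = models["M1"][0](t, 0.4) + rng.normal(0, 0.03, t.size)  # one dataset from M1

scores = {}
for name, (f, p0) in models.items():
    popt, _ = curve_fit(f, t, y, p0=p0, bounds=(1e-6, np.inf))
    scores[name] = gaussian_aic(y, f(t, *popt), k=len(p0))

print(min(scores, key=scores.get), scores)  # the model AIC selects for this sample
```

Repeating this over many datasets per generating model, and tallying the winners, yields a recovery matrix of the kind shown in Table 1.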
In the 3×3 matrices in Table 1, this corresponds to perfect recovery along the diagonal going from the upper left to the lower right. Errors (cells in the off-diagonals) reveal a bias in the selection method toward either the more or the less complex model.

The top matrix shows the results using PVAF, with the percentage of correct recoveries in each cell. The problem with using a GOF measure shows up in the first column of the matrix, where the data were generated by the one-parameter model M_1. M_1 never recovered its own data, with one of the two-parameter models always fitting the data best. Comparison of these data with the middle matrix shows that using AIC rectifies this problem. Because of its sensitivity to the number of parameters in a model, it does a reasonably good job of distinguishing between data generated by M_1 or M_2. By contrast, note how model recovery performance remains constant across these two matrices in columns 2 and 3. This is not surprising because AIC, like PVAF, is insensitive to functional form, which is the dimension along which M_2 and M_3 differ. Only when MDL is used is an improvement in model recovery performance observed in these columns (bottom matrix).
Why did MDL not perform perfectly, and why did it perform slightly worse than AIC and PVAF when the data came from M_2 (middle column)? Recall that, like a statistical test, model selection is an inference problem. The quality of the inference depends on the information that the selection method uses. Even though MDL makes use of all of the information available (data and the model), this does not guarantee success, but it greatly improves the chances of the inference being correct [16]. MDL will outperform selection methods such as AIC and PVAF most of the time, but neither it nor its Bayesian counterpart, BMS, is infallible.
The inferential nature of model selection makes it imperative to interpret model recovery results in the context of the data that were used in the test itself. Poor performance might not indicate a problem with the selection method, but rather a constraint on its resolving power in discriminating which model could have generated the data. Conversely, biases in the selection method itself can masquerade as good model-recovery performance. In a recent comparison of BMS and RMSD, we demonstrated both of these outcomes [17].
Conclusion
Methods like AIC and MDL can give the impression that model selection can be automated and requires minimal thought. However, choosing between competing models is no easier or less subjective than choosing between competing theories. Selection methods should be viewed as tools that researchers can use to gain additional information about the models under consideration. These tools, however, are ignorant of other properties of the models, such as their plausibility or the quality of the data, so it is inappropriate to decide between models solely on the basis of what is learned from a test of generalizability.
References
1 Jacobs, A.M. and Grainger, J. (1994) Models of visual word recognition: sampling the state of the art. J. Exp. Psychol. Hum. Percept. Perform. 20, 1311–1334
2 Smith, J.D. and Minda, J.P. (2000) Thirty categorization results in search of a model. J. Exp. Psychol. Learn. Mem. Cogn. 26, 3–27
3 Wichmann, F.A. and Hill, N.J. (2001) The psychometric function: I. Fitting, sampling and goodness of fit. Percept. Psychophys. 63, 1293–1313
4 Roberts, S. and Pashler, H. (2000) How persuasive is a good fit? A comment on theory testing. Psychol. Rev. 107, 358–367
5 Bishop, C.M. (1995) Neural Networks for Pattern Recognition (Chapters 1 and 9), Oxford University Press
6 Myung, I.J. et al. (2000) Special Issue on model selection. J. Math. Psychol. 44 (1–2)
7 Akaike, H. (1973) Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Petrov, B.N. and Csaki, F., eds), pp. 267–281, Akademiai Kiado, Budapest
8 Burnham, K.P. and Anderson, D.R. (1998) Model Selection and Inference: A Practical Information-Theoretic Approach (Chapter 2), Springer-Verlag
9 Schwarz, G. (1978) Estimating the dimension of a model. Ann. Stat. 6, 461–464
10 Kass, R.E. and Raftery, A.E. (1995) Bayes factors. J. Am. Stat. Assoc. 90, 773–795
11 Rissanen, J. (1996) Fisher information and stochastic complexity. IEEE Trans. Inform. Theor. 42, 40–47
12 Hansen, M.H. and Yu, B. (2001) Model selection and the principle of minimum description length. J. Am. Stat. Assoc. 96, 746–774
13 Myung, I.J. and Pitt, M.A. (in press) Model evaluation, testing and selection. In Handbook of Cognition (Lamberts, K. and Goldstone, R., eds), Sage
14 Nosofsky, R.M. (1986) Attention, similarity and the identification–categorization relationship. J. Exp. Psychol. Gen. 115, 39–57
15 Oden, G.C. and Massaro, D.W. (1978) Integration of featural information in speech perception. Psychol. Rev. 85, 172–191
16 Pitt, M.A. et al. (2002) Toward a method of selecting among computational models of cognition. Psychol. Rev. 109, 472–491
17 Pitt, M.A. et al. Flexibility versus generalizability in model selection. Psychon. Bull. Rev. (in press)
Acknowledgement
Both authors were supported by NIH Grant MH57472.