www.elsevier.com/locate/foodqual
Abstract
Paired comparisons can be a very effective way of measuring preferences. The statistical analysis can be carried
out as an ANOVA although the explanatory variables are not categorical variables. Explanatory variables of subjects and objects
can be included in the ANOVA if the experiment is carefully designed. It is often feasible or necessary to reduce the number of
comparisons to be performed by subjects; such a structure can also be included in the ANOVA if the comparisons to be performed are
carefully selected. © 1999 Elsevier Science Ltd. All rights reserved.
Keywords: Analysis of variance; Designed experiments; Individual preferences; Paired comparisons; Sensory analysis; Scale differences
1. Introduction
The origin of paired comparisons is Thurstone's law
of comparative judgement, which applies to items that have no physical properties, for example statements (Thurstone,
1927). The basic data collection for scaling items in the
Thurstone tradition is that of paired comparisons, while
the most common design for measurements requires
subjects to respond to items according to some
positive-negative dichotomy or scale, such as "agree" or "disagree". The latter is often referred to as a direct
response design. The model for paired comparisons is
numerically equivalent to the Bradley-Terry-Luce
model and the simple logistic model (Bradley & Terry,
1952; Luce, 1959). In all of these models the response is
binary.
The method of paired comparisons can also be developed for continuous responses. However, beyond the
paper of Scheffé (1952), very little work seems to have
been done in this area, and the method has not yet been
applied very extensively.
In general the method of paired comparisons is
expected to be more sensitive to differences between
objects than direct response designs. If a judge in a
direct response design gives two objects the same score,
he might still find a slight preference for one when the two are compared directly. This might
be an advantage but it may also distort the hypothesis
of subtraction discussed later.
* Tel.: +45-3815-3515; fax: +45-3815-3500.
0950-3293/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved.
PII: S0950-3293(99)00064-6
would be too many comparisons for a subject to perform; however, as the study was a pilot study and the
experiment was carried out under relaxed conditions, it
was accepted as feasible.
The experiment took place in the afternoon in the
garden of the author, in the context of "having a
relaxed late afternoon beer among friends". Each subject was placed at a table with two glasses of beer, one
to the left and one to the right of the subject, and a piece
of paper between the glasses. The glasses were marked
"left" and "right", respectively; otherwise they were
identical. Subjects were instructed to taste the left beer
first but were otherwise free to retaste either beer if
desired. Subjects marked their judgement on a 22 cm linear scale labelled "I prefer left beer very
much" at the left anchor and "I prefer right beer very
much" at the right anchor. Two new glasses of beer and a new piece of
paper with a rating scale were then placed on the table, and so
on until 12 comparisons had been performed.
At the end of the experiment the ratings were converted
into numbers. The distance from the middle of the scale
to the noted mark was measured, positive to the right
and negative to the left. The score
of subject p with beer i to the left and beer j to
the right was denoted $y_{ijp}$. In any event it is assumed
that the numerical score increases with the strength of
the preference for j over i, and that equal but opposite
preferences (j over i, and i over j) correspond to equal
but opposite scores. Thus for subject p
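The conversion of a mark on the paper into a signed score, as described above, can be sketched in a few lines. This is only an illustration: the function name `mark_to_score` and the choice to measure the mark in centimetres from the left anchor are assumptions, not part of the original procedure.

```python
SCALE_LENGTH_CM = 22.0  # length of the linear scale stated in the text


def mark_to_score(mark_cm: float, scale_length_cm: float = SCALE_LENGTH_CM) -> float:
    """Convert a mark position (cm from the left anchor) into a signed score.

    The score is the distance from the middle of the scale: positive when the
    mark is to the right (preference for the right beer), negative when it is
    to the left (preference for the left beer).
    """
    if not 0.0 <= mark_cm <= scale_length_cm:
        raise ValueError("mark lies outside the scale")
    return mark_cm - scale_length_cm / 2.0


if __name__ == "__main__":
    for mark in (0.0, 11.0, 16.5, 22.0):
        print(mark, mark_to_score(mark))
```

With a 22 cm scale the scores fall in the interval from -11 to 11, which agrees with the range quoted for the original observations in Table 10.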
3. The model
4. Statistical analysis
$$Y_{ijp} \sim N(\mu_{ijp}, \sigma^2), \quad i, j = 1, 2, 3, 4; \; i \neq j; \; p = 1, 2, \ldots, 8.$$
Table 1
            df    s.s.      m.s.    F      p
Model       56    2712.36   48.44   52.9   <0.001
Residual    40    36.64     0.92
Total       96    2749.00   28.64
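The mean squares and the F ratio in the model-versus-residual table above follow directly from the sums of squares and degrees of freedom. A minimal sketch of that arithmetic is given below; the helper names `mean_square` and `f_ratio` are illustrative only, and the p-value is not recomputed because it requires the F distribution.

```python
def mean_square(sum_of_squares: float, df: int) -> float:
    """Mean square = sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


def f_ratio(ss_effect: float, df_effect: int,
            ss_residual: float, df_residual: int) -> float:
    """F ratio = mean square of the effect over the residual mean square."""
    return mean_square(ss_effect, df_effect) / mean_square(ss_residual, df_residual)


# Values taken from the table: Model (56 df) and Residual (40 df).
model_ms = mean_square(2712.36, 56)
residual_ms = mean_square(36.64, 40)
model_f = f_ratio(2712.36, 56, 36.64, 40)

if __name__ == "__main__":
    print(f"model m.s. = {model_ms:.2f}")
    print(f"residual m.s. = {residual_ms:.2f}")
    print(f"F = {model_f:.1f}")
```

The same two functions apply row by row to the later ANOVA tables, always with the residual mean square (40 df) as the denominator.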
Table 2
Analysis of variance. Order effects and preferences
Effect                       df    s.s.      m.s.    F      p
Common order effect          1     8.05      8.05    8.8    0.005
Order effects of subjects    7     4.78      0.68    0.7    0.672
Preferences                  48    2699.53   56.24   61.4   <0.001
Residual                     40    36.64     0.92
Total                        96    2749.00   28.64
it is metric. If H1 can be supported, an interesting question is whether the rating scales are the same for
all subjects. These hypotheses correspond to an orthogonal decomposition such that the analysis of variance
table becomes Table 3. The s.s.'s of rows 3, 4 and 5
correspond to the total variation of preferences (the $\tau$'s),
which is decomposed into deviance from rating scales
($\tau_{ijp} = \alpha_{jp} - \alpha_{ip}$), different rating scales ($\alpha_{ip} = \alpha_i$) and
degeneration of the common rating scale ($\alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$).
It is found that most of the variation in preferences is due to the common rating scale (m.s. = 823.06),
which is significant (p < 0.001); however, there is also
variation due to differences in rating scales (m.s. = 9.15),
and these differences are found to be significant
(p < 0.001). On the other hand, the variation due to
deviance from rating scales is not significant (p = 0.067).
Thus, one can conclude that for each subject there exists
a rating scale, but the rating scales differ between subjects.
The estimated individual rating scales and the common
rating scale are shown in Table 4.
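To illustrate what "a subject has a rating scale" means in practice, the sketch below estimates a scale from one subject's comparisons. It assumes that, under H1, each score equals an order effect plus the scale value of the right object minus the scale value of the left object, and that every ordered pair of objects is judged once, as in the experiment. The scale values and the order effect used here are hypothetical numbers, not data from the study, and the closed-form estimator is the least-squares solution for this balanced design only.

```python
from itertools import permutations


def simulate_scores(scale, order_effect=0.0):
    """Noise-free scores y[(i, j)] = order_effect + scale[j] - scale[i]
    for every ordered pair (i to the left, j to the right)."""
    objects = sorted(scale)
    return {(i, j): order_effect + scale[j] - scale[i]
            for i, j in permutations(objects, 2)}


def estimate_scale(scores):
    """Least-squares rating scale for a complete design in which every
    ordered pair occurs exactly once. The estimates sum to zero."""
    objects = sorted({k for pair in scores for k in pair})
    n_objects = len(objects)
    estimates = {}
    for k in objects:
        as_right = sum(y for (i, j), y in scores.items() if j == k)
        as_left = sum(y for (i, j), y in scores.items() if i == k)
        estimates[k] = (as_right - as_left) / (2 * n_objects)
    return estimates


def estimate_order_effect(scores):
    """The order effect is estimated by the mean of all scores."""
    return sum(scores.values()) / len(scores)


# Hypothetical scale for four objects (sums to zero) and a hypothetical order effect.
hypothetical_scale = {1: -3.0, 2: -1.0, 3: 0.5, 4: 3.5}
scores = simulate_scores(hypothetical_scale, order_effect=0.4)
estimated_scale = estimate_scale(scores)
estimated_order_effect = estimate_order_effect(scores)

if __name__ == "__main__":
    print(estimated_scale)
    print(estimated_order_effect)
```

With real data the scores contain noise, and the part of each preference not explained by the estimated scale is what the "deviance from rating scales" row measures.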
The hypothesis that subjects have rating scales (H1) is
of special importance. If the hypothesis is rejected, several explanations are possible. A frequent explanation is
that more than one dimension is involved in the comparisons. In the present case several aspects of taste may
be involved. When comparing objects i and j the focus may
be on "sweetness", while when comparing j and k the focus might
have changed to "bitterness" or even to differences in the
colours of the objects. A related situation arises if one
object is very different from the others. This might leave
subjects uncertain about the context
of the comparisons to be performed.
A simple, but not very satisfying, way of handling
such a problem is, if possible, to omit the deviating
object from the analyses. Another way is to extend the
experiment to a multivariate comparison, i.e. letting
subjects express their preferences on more than one
scale. This can be arranged by having, for example, four
visual scales on the paper between the beers and asking
subjects to express their preferences with respect to, e.g.,
colour, bitterness, sweetness and ability to slake the
thirst. An analysis of multivariate comparisons can be
performed but will not be discussed in this paper.
Finally, several alternative hypotheses to H1 can be
incorporated in the analysis. It is beyond the scope of
this paper to discuss these possibilities. It should, however, be emphasized that if H1 is rejected, care should be
taken in interpreting the results.
4.2. Structure of subjects
If, as in the present case, H1 is accepted but the
hypothesis that subjects have the same rating scale is
rejected, the question arises whether there exists a
grouping of subjects, say by gender, such that all men
have the same rating scale and all women have the same
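Table 3
Analysis of variance. Rating scales
Effect                          df    s.s.      m.s.     F       p
Common order effect             1     8.05      8.05     8.8     0.005
Order effects of subjects       7     4.78      0.68     0.7     0.672
Common rating scale             3     2469.17   823.06   898.6   <0.001
Differences in rating scales    21    192.10    9.15     10.0    <0.001
Deviance from rating scales     24    38.26     1.59     1.7     0.067
Residual                        40    36.64     0.92
Total                           96    2749.00   28.64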
Table 4
Estimated rating scales
Subject         Object 1   Object 2   Object 3   Object 4
1               3.0        0.9        0.6        3.3
2               3.2        0.7        0.5        3.4
3               3.2        1.1        0.8        3.5
4               3.1        1.0        0.4        3.7
5               4.4        3.3        2.2        5.5
6               4.5        3.0        2.2        5.3
7               4.3        3.1        2.3        5.1
8               4.2        2.7        2.0        4.9
Common scale    3.7        2.0        1.4        4.3
Lifestyle 1     3.1        0.9        0.6        3.4
Lifestyle 2     4.4        3.0        2.2        5.2

Table 5
Two by two layout of subjects
Subject    Gender    Lifestyle
1          1         1
2          1         1
3          2         1
4          2         1
5          1         2
6          1         2
7          2         2
8          2         2
rating scale, but possibly the rating scales for men and
women are different. Furthermore, if there exists an
additional grouping of subjects, e.g. by lifestyle, it is
possible to combine two (or more) categorical variables
and also include possible effects of interactions between
the categorical explanatory variables. The hypothesis
becomes
$$H_2: \alpha_{ip} = \alpha'_{i,\mathrm{lifestyle}(p)} + \alpha''_{i,\mathrm{gender}(p)} + \alpha'''_{i,\mathrm{lifestyle} \times \mathrm{gender}(p)}$$
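A grouping hypothesis of this kind says that subjects in the same group share one scale. As a rough check, the sketch below averages the individual scales printed in Table 4 within each lifestyle group and compares the result with the lifestyle and common scales printed in the same table. Two things are assumptions here: that the Table 4 values are listed in subject order 1 to 8 for each object, and that subjects 1 to 4 form lifestyle 1 and subjects 5 to 8 form lifestyle 2. The values are used exactly as printed.

```python
from statistics import mean

# Estimated rating scales as printed in Table 4:
# object -> values for subjects 1..8 (assumed ordering).
subject_scales = {
    1: [3.0, 3.2, 3.2, 3.1, 4.4, 4.5, 4.3, 4.2],
    2: [0.9, 0.7, 1.1, 1.0, 3.3, 3.0, 3.1, 2.7],
    3: [0.6, 0.5, 0.8, 0.4, 2.2, 2.2, 2.3, 2.0],
    4: [3.3, 3.4, 3.5, 3.7, 5.5, 5.3, 5.1, 4.9],
}

# Assumption: subjects 1-4 belong to lifestyle 1, subjects 5-8 to lifestyle 2.
lifestyle_of_subject = [1, 1, 1, 1, 2, 2, 2, 2]

# Group and common scales as printed in Table 4 (objects 1..4).
printed = {
    "lifestyle 1": [3.1, 0.9, 0.6, 3.4],
    "lifestyle 2": [4.4, 3.0, 2.2, 5.2],
    "common": [3.7, 2.0, 1.4, 4.3],
}


def group_scale(scales, groups, group):
    """Average the subject scales of one group, object by object."""
    return [mean(v for v, g in zip(scales[obj], groups) if g == group)
            for obj in sorted(scales)]


def common_scale(scales):
    """Average the subject scales over all subjects, object by object."""
    return [mean(scales[obj]) for obj in sorted(scales)]


computed = {
    "lifestyle 1": group_scale(subject_scales, lifestyle_of_subject, 1),
    "lifestyle 2": group_scale(subject_scales, lifestyle_of_subject, 2),
    "common": common_scale(subject_scales),
}

if __name__ == "__main__":
    for name in printed:
        print(name, [round(v, 2) for v in computed[name]], printed[name])
```

Under these assumptions the group averages agree with the printed group scales up to the rounding of the one-decimal inputs, which is what one would expect in a balanced design.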
Table 6
Analysis of variance. Gender and lifestyles
Effect                          df    s.s.      m.s.     F       p
Objects                         3     2469.17   823.06   898.6   <0.001
Objects × gender                3     0.25      0.08     0.1     0.960
Objects × lifestyle             3     185.22    61.74    67.4    <0.001
Objects × gender × lifestyle    3     3.35      1.12     1.2     0.322
Subjects within groups          12    3.28      0.27     0.3     0.986
Variation of scales             24    2661.27   110.89

Table 7
Two by two layout of objects
Objects     1    2    3    4
Quality     0    0    1    1
Strength    0    1    0    1
Table 8
Analysis of variance. Structure of objects and lifestyle
Effect                              df    s.s.      m.s.      F        p
Quality                             1     2093.05   2093.05   2285.2   <0.001
Strength                            1     353.91    353.91    386.4    <0.001
Quality × strength                  1     22.21     22.21     24.2     <0.001
Quality × lifestyle                 1     179.58    179.58    196.1    <0.001
Strength × lifestyle                1     1.79      1.79      2.0      0.165
Quality × strength × lifestyle      1     3.85      3.85      4.2      0.047
Variation of scales                 6     2654.39   442.40
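The single-degree-of-freedom terms in Table 8 can be checked against the earlier tables by simple addition. The sketch below does this with the printed sums of squares; reading 2469.17 and 185.22 in Table 6 as the object term and the lifestyle-by-object term is an interpretation, and the dictionary keys are merely labels.

```python
# Sums of squares as printed in Table 8 (each term has 1 df).
table8_ss = {
    "quality": 2093.05,
    "strength": 353.91,
    "quality x strength": 22.21,
    "quality x lifestyle": 179.58,
    "strength x lifestyle": 1.79,
    "quality x strength x lifestyle": 3.85,
}

object_terms = ["quality", "strength", "quality x strength"]
lifestyle_terms = ["quality x lifestyle", "strength x lifestyle",
                   "quality x strength x lifestyle"]

ss_objects = sum(table8_ss[t] for t in object_terms)
ss_lifestyle = sum(table8_ss[t] for t in lifestyle_terms)
ss_total = sum(table8_ss.values())
ms_total = ss_total / len(table8_ss)  # 6 terms, 1 df each

if __name__ == "__main__":
    print(round(ss_objects, 2))    # compare with 2469.17 in Table 6
    print(round(ss_lifestyle, 2))  # compare with 185.22 in Table 6
    print(round(ss_total, 2), round(ms_total, 2))
```

The three object terms add up to the 3-df sum of squares 2469.17 and the three lifestyle terms to the 3-df sum of squares 185.22 in Table 6, so Table 8 splits each of those rows into quality, strength and their interaction.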
Table 9
Variables created to perform regression analysis
Comparisons    Variables
i    j         t12   t23   t34   t41   t13   t24   a1   a2   a3   a4   s    q    qs
1    2         1     0     0     0     0     0     1    1    0    0    1    0    1
2    1         1     0     0     0     0     0     1    1    0    0    1    0    1
2    3         0     1     0     0     0     0     0    1    1    0    1    1    0
3    2         0     1     0     0     0     0     0    1    1    0    1    1    0
3    4         0     0     1     0     0     0     0    0    1    1    1    0    1
4    3         0     0     1     0     0     0     0    0    1    1    1    0    1
4    1         0     0     0     1     0     0     1    0    0    1    1    1    0
1    4         0     0     0     1     0     0     1    0    0    1    1    1    0
3    1         0     0     0     0     1     0     1    0    1    0    0    1    1
1    3         0     0     0     0     1     0     1    0    1    0    0    1    1
2    4         0     0     0     0     0     1     0    1    0    1    0    1    1
4    2         0     0     0     0     0     1     0    1    0    1    0    1    1
Table 10
Original observations measured on a (-11,11) scale
Comparisons    Subjects
i    j         1      2      3      4      5      6      7      8
1    2         2.6    3.8    2.6    3.3    2.4    0.7    2.6    0.8
2    1         1.3    2.3    1.4    1.7    1.3    3.1    1.5    2.7
2    3         2.0    2.4    2.8    2.6    6.9    5.7    7.6    5.4
3    2         0.8    1.3    1.3    1.8    5.7    4.7    5.7    4.4
3    4         3.8    3.2    3.6    3.9    4.6    4.6    4.6    4.6
4    3         2.4    1.8    2.3    2.4    3.8    3.4    3.7    3.4
4    1         5.5    6.3    5.8    7.2    10.2   9.8    9.3    9.3
1    4         6.6    7.5    6.8    7.5    7.8    7.7    7.2    6.9
3    1         3.3    3.8    4.2    4.0    6.4    7.1    6.1    6.5
1    3         4.5    1.7    4.6    1.2    7.4    7.9    7.4    7.8
2    4         4.6    4.6    5.3    5.2    9.3    9.1    8.4    7.9
4    2         3.5    3.5    3.8    3.4    8.3    8.1    7.3    7.3
6. Conclusions
Computations
Appendix
The "." is the interaction syntax of GENSTAT (corresponding to "*" in other software, like SAS), and one
can think of it as multiplication: the parentheses mean
that the model includes all interaction terms between
SUBJECT and each of the terms inside the parentheses.
This and the following model formulas have too
many parameters, which means that some parameters
must be set equal to zero. This does not affect the analysis of variance; however, the estimated parameters
must be interpreted accordingly.
Tables 1 and 2 are obtained by fitting appropriate
sub-models of (6), i.e. by leaving out parts of (6). The 8
parameters of SUBJECT are the order effects. If SUBJECT
is left out, the constant will be the common order effect.
To estimate the scales we specify variables a1, a2, a3
and a4 by
$$a_1 = -1 \text{ if } i = 1; \quad a_1 = 1 \text{ if } j = 1; \quad a_1 = 0 \text{ otherwise},$$
and similarly for a2, a3 and a4.
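The coding of a1 to a4 can be generated mechanically from the pair (i, j) of each comparison. The sketch below assumes the sign convention that the left object i gets -1 and the right object j gets +1, which matches the requirement that scores increase with the preference for j over i; the function name `scale_indicators` is illustrative. The list of comparisons follows the order in which the pairs appear in Table 9.

```python
# Ordered pairs (left object i, right object j) in the order of Table 9.
COMPARISONS = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3),
               (4, 1), (1, 4), (3, 1), (1, 3), (2, 4), (4, 2)]


def scale_indicators(i: int, j: int, n_objects: int = 4) -> list:
    """Return [a1, ..., an] for a comparison with i to the left and j to the right.

    a_k = -1 if k is the left object, +1 if k is the right object, 0 otherwise,
    so that the regression coefficient of a_k is the scale value of object k.
    """
    if i == j:
        raise ValueError("an object cannot be compared with itself")
    return [-1 if k == i else 1 if k == j else 0
            for k in range(1, n_objects + 1)]


design = [scale_indicators(i, j) for i, j in COMPARISONS]

if __name__ == "__main__":
    for pair, row in zip(COMPARISONS, design):
        print(pair, row)
```

Each row sums to zero, which is why the four scale values are only determined up to an additive constant and one parameter has to be fixed.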
SUBJECT + a1 + a2 + a3 + a4
+ (GENDER + STYLE + GENDER.STYLE).(a1 + a2 + a3 + a4)
and appropriate sub-models.
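To compute Table 8 we note that Model (5) states
that $\alpha_i = \mathrm{quality}(i) + \mathrm{strength}(i)$, $i = 1, 2, 3, 4$, and thereby
$$\alpha_j - \alpha_i = -[\mathrm{quality}(i) - \mathrm{quality}(j)] - [\mathrm{strength}(i) - \mathrm{strength}(j)].$$
We construct variables q and s by
$$q(i,j) = \mathrm{quality}(i) - \mathrm{quality}(j)$$
and
$$s(i,j) = \mathrm{strength}(i) - \mathrm{strength}(j),$$
where the quality and strength of object i are 0 or 1 according
to Table 7.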
Adding a row, r, to Table 7 by r(i) = 0 if i = 1 or 4
and r(i) = 1 if i = 2 or 3, we can define a variable qs as
the interaction between strength and quality by
$$qs(i,j) = r(i) - r(j).$$
From the statistical analysis it was found
that a scale for each lifestyle was required. The model
formula for fitting these scales is
SUBJECT + a1 + a2 + a3 + a4
+ STYLE.(a1 + a2 + a3 + a4)
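The variables q, s and qs can be built from the 0/1 codes of the objects in the same way as a1 to a4. The sketch below is written under several assumptions: the codes are read from Table 7 as quality = 0, 0, 1, 1 and strength = 0, 1, 0, 1 for objects 1 to 4; the extra row r is 1 for the objects whose two codes differ (objects 2 and 3); and each variable is the value for the left object i minus the value for the right object j. The helper name `contrast` is illustrative.

```python
# 0/1 codes of the four objects (assumed reading of Table 7).
QUALITY = {1: 0, 2: 0, 3: 1, 4: 1}
STRENGTH = {1: 0, 2: 1, 3: 0, 4: 1}
# Extra row r: 0 for objects 1 and 4, 1 for objects 2 and 3.
R = {k: int(QUALITY[k] != STRENGTH[k]) for k in QUALITY}

# Ordered pairs (left object i, right object j) in the order of Table 9.
COMPARISONS = [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3),
               (4, 1), (1, 4), (3, 1), (1, 3), (2, 4), (4, 2)]


def contrast(code: dict, i: int, j: int) -> int:
    """Difference between the code of the left object i and the right object j."""
    return code[i] - code[j]


q = [contrast(QUALITY, i, j) for i, j in COMPARISONS]
s = [contrast(STRENGTH, i, j) for i, j in COMPARISONS]
qs = [contrast(R, i, j) for i, j in COMPARISONS]

if __name__ == "__main__":
    for pair, row in zip(COMPARISONS, zip(q, s, qs)):
        print(pair, row)
```

Each variable is zero for the comparisons in which both objects carry the same code, so, for example, q drops out of the comparisons between objects 1 and 2 and between objects 3 and 4. Because the sign convention is an assumption, only the signs of the fitted coefficients depend on it; the sums of squares do not.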