
Statistics used in sensory and consumer research


Chantal Gilbert
14th Nordic Workshop in Sensory Science
5-6 October 2011
C.C. Gilbert
Nordic Workshop, 6 Oct 2011

Opening talk on the topic of Statistics


Why sensory statistics?


Sensory data is unique
Uses human assessors to measure the perception of a wide
range of stimuli, as detected by the senses

Need to consider:

Physiology
Psychology
Motivation
Performance
Behaviour
Genetics...


Psychological errors:
Stimulus error
Expectation error
Central tendency
Contrast and convergence
Habituation
Halo effect
Logical error

The importance of Experimental Design: but not enough time!!!

Statistical methods and issues worthy of discussion...


Statistical methods and issues I've chosen to discuss...

1) Statistics for discrimination testing and the issue of similarity
2) ANOVA as a method for analysing descriptive profile data
3) Multivariate methods, and interpretation pitfalls

... a drop in the ocean!



1) Statistics for discrimination testing and the issue of similarity

Something simple to start with:


Analysis of Triangle Tests
[In fact, it's not so simple; see for example: O'Mahony M (1995) Who told you the triangle test was simple? FQP, Vol. 6, No. 4]

Triangle Test
Each assessor simultaneously receives 3 samples, two of which are identical.
Asked to identify the odd one.

[Figure: three samples labelled with the codes 951, 627 and 398]

Some Scenarios
24 screened assessors perform the triangle test task.

If result is 8 correct, 16 incorrect,
then there is no evidence of a difference (expect 1/3 to be correct by chance).

If result is 20 correct, 4 incorrect,
then it is clear that the result will be significant.

If result is 13 correct, 11 incorrect,
what do we conclude?

We need the help of Statistics
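
One way to settle the three scenarios is an exact one-tailed binomial test against the chance rate of 1/3. The sketch below assumes that approach; the function name `triangle_p_value` is illustrative and does not come from the talk.

```python
from fractions import Fraction
from math import comb

def triangle_p_value(correct, n, p_chance=Fraction(1, 3)):
    """Exact probability of `correct` or more correct answers out of n by guessing alone."""
    return sum(
        comb(n, k) * p_chance**k * (1 - p_chance)**(n - k)
        for k in range(correct, n + 1)
    )

# The three scenarios with 24 assessors
for correct in (8, 13, 20):
    p = triangle_p_value(correct, 24)
    print(f"{correct}/24 correct: p = {float(p):.4f}")
```

A small p-value means the observed number of correct answers would be unlikely if every assessor were guessing.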



Example - Triangle test


A company has made a small change of ingredient
hoping to lead to an improved product. Prior to
consumer testing, the sensory analyst runs a triangle
test to see if the difference between the products is
perceivable.
Objective: Difference testing
Analyst sets alpha risk at 0.05 (5% chance of saying there's a difference when there's not).
18 assessors perform the test.


Hypothesis testing cont.

Triangle test example:
α = 0.05
H0: p = 1/3 (there is NO difference, A = B)
H1: p > 1/3 (there is a difference, A ≠ B)

How many assessors out of 18 are required to identify the odd sample to be confident of a difference? - Critical value

Binomial tables
(p=1/3, one-tailed test)

Number of        Significance Level
Assessors        5%      1%      0.1%
   12             8       9      10
   13             8       9      11
   14             9      10      11
   15             9      10      12
   16             9      11      12
   17            10      11      13
   18            10      12      13
   19            11      12      14
   20            11      13      14
   21            12      13      15
   22            12      14      15
   23            12      14      16
   24            13      15      16

For 18 assessors at the 5% level: if result ≥ 10, reject H0 & conclude sig. diff
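
Critical values like those in the table can be recomputed from the binomial distribution. The following is a minimal sketch, assuming the table entries are the smallest number of correct responses whose exact upper-tail probability under p = 1/3 does not exceed the significance level; `critical_value` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def critical_value(n, alpha, p_chance=Fraction(1, 3)):
    """Smallest number of correct responses with P(X >= k) <= alpha under guessing."""
    tail = Fraction(0)
    # Accumulate the upper tail from k = n downwards; stop once it exceeds alpha.
    for k in range(n, -1, -1):
        tail += comb(n, k) * p_chance**k * (1 - p_chance)**(n - k)
        if tail > alpha:
            return k + 1
    return 0

levels = [Fraction("0.05"), Fraction("0.01"), Fraction("0.001")]
for n in (12, 18):
    print(n, [critical_value(n, a) for a in levels])
```

Exact fractions are used so that borderline cases are not affected by floating-point rounding.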

Triangle Example
Results: 11 assessors correctly identified the odd sample.

Critical value (α=0.05) = 10 < Calculated value (test statistic) = 11, so the result falls on the H1 side.

Analyst rejects H0 in favour of H1, and concludes there is a perceivable difference between the samples (p<0.05).

Using discrimination tests to demonstrate similarity

Triangle tests for similarity...

A) How they used to do it (WRONG!)

Example 2 - Triangle test

Similarity testing situation: change of supplier
Triangle test: ultimately want to show no perceivable difference...
Imagine: analyst proceeds as usual (ignoring the similarity issue):
α=0.05; 18 assessors
critical value = minimum of 10 correct responses

Example 2 - Triangle test

Results: 8 assessors correctly identified the odd sample.

Calculated value (test statistic) = 8 < Critical value (α=0.05) = 10, so the result falls on the H0 side.

Example 2 - conclusion
Analyst fails to reject H0 - not enough evidence to suggest there's a difference between the two suppliers.
Actually, thinking about it, the analyst would like to demonstrate that the samples are the same.
The analyst's (faulty) reasoning: "The samples are not significantly different (p>0.05), therefore they must be the same!"
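
Why this reasoning fails can be illustrated by computing the power of the 18-assessor design. The sketch below assumes, purely for illustration, that 30% of the population are true discriminators and that everyone else guesses with a 1/3 chance of being correct; `binom_tail` is a hypothetical helper.

```python
from math import comb

def binom_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

n, critical = 18, 10        # design used in Example 2
pd = 0.30                   # assumed true proportion of discriminators (hypothetical)
pc = pd + (1 - pd) / 3      # discriminators answer correctly, the rest guess

power = binom_tail(critical, n, pc)   # chance of reaching the critical value
beta = 1 - power                      # chance of missing the real difference
print(f"Pc = {pc:.3f}, power = {power:.2f}, beta = {beta:.2f}")
```

Under these assumptions a real difference would be missed a large share of the time, so a non-significant result from this design is weak evidence of similarity.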


Triangle tests for similarity...

B) How we currently do it
See ISO reference: Sensory analysis - Methodology - Triangle test. BS ISO 4120:2004.

Similarity testing approach in BS ISO 4120 (2004)
Traditional hypothesis testing format is not the ideal tool for demonstrating similarity between products.
Approach: Use same sensory test methods, but control for a different type of error when conducting tests where the research objective is demonstrating similarity.
Essentially ignoring standard rules for hypothesis testing!

Approach used
Focus on:
β (risk of missing a true difference); power = 1-β
Pd (measure of the size of the difference)

Be very certain (1-β) that few people (Pd) can detect a difference
For this, need to use many more assessors
E.g. To be 95% certain that only 20% of the population can detect a difference, need 147 assessors to do the test (at α=0.05)

Example - selecting sample size

Going back to Example 2
Triangle test between products made using the ingredient from the two suppliers.

Because the company would like to show that the new supplier's ingredient does not change the overall perception, the analyst knows the objective is to establish similarity.

Example - Selecting sample size
(compromising choice of α and β)

                          β
Pd      α        0.20    0.10    0.05    0.01    0.001
50%     0.20        7      12      16      25      36
        0.10       12      15      20      30      43
        0.05       16      20      23      35      48
        0.01       25      30      35      47      62
        0.001      36      43      48      62      81
40%     0.20       12      17      25      36      55
        0.10       17      25      30      46      67
        0.05       23      30      40      57      79
        0.01       35      47      56      76     102
        0.001      55      68      76     102     130
30%     0.20       20      28      39      64      97
        0.10       30      43      54      81     119
        0.05       40      53      66      98     136
        0.01       62      82      97     131     181
        0.001      93     120     138     181     233
20%     0.20       39      64      86     140     212
        0.10       62      89     119     178     260
        0.05       87     117     147     213     305
        0.01      136     176     211     292     397
        0.001     207     257     302     396     513

Decide want Pd = 30%; n=30 is max. due to budget
Example cont.
Analyst consults the sample size table. Decides to use n=30 assessors, knowing risks are:
α=20%, β=10%, Pd=30%
Results show 10 of the 30 assessors identified the odd sample.

Example - Interpreting results

Critical number of responses table (ISO)

                          Pd
n       β        10%     20%     30%     40%     50%
18      0.001      0       1       2       3       5
        0.01       2       3       4       5       6
        0.05       3       4       5       6       8
        0.10       4       5       6       7       8
        0.20       4       6       7       8       9
24      0.001      2       3       4       6       8
        0.01       3       5       6       8       9
        0.05       5       6       8       9      11
        0.10       6       7       9      10      12
        0.20       7       8      10      11      13
30      0.001      3       5       7       9      11
        0.01       5       7       9      11      13
        0.05       7       9      11      13      15
        0.10       8      10      11      14      16
        0.20       9      11      13      15      17
36      0.001      5       7       9      11      14
        0.01       7       9      11      14      16
        0.05       9      11      13      16      18
        0.10      10      12      14      17      19
        0.20      11      13      16      18      21

Example - conclusion
Table shows that the maximum number of correct responses allowed to conclude that two samples are similar, based on a triangle test with n=30, β=0.10 and Pd=30%, is 11.
Results show 10 correct responses.
Therefore, the analyst concludes the samples are similar (that is, they are 90% confident that no more than 30% of the population can detect a difference).

Triangle tests for similarity...

C) Some new approaches


Testing for similarity: what's new?

Interval hypothesis testing:
Specify an allowed or ignorable difference in terms of the proportion of discriminators (Pd0)
Work out the probability of correct responses, Pc0, corresponding to Pd0 (accounts for the chance of guessing).
H0: Pc ≥ Pc0 (i.e. there IS a difference)
HA: Pc < Pc0 (i.e. there is no difference, similar)
Follows standard rules of hypothesis testing: if p < α, reject H0 and conclude the samples are similar (where similarity is defined by the interval).
See: Bi J (2006) Sensory Discrimination Tests and Measurements. Blackwell Publishing Professional.
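
The interval test can be sketched with an exact binomial calculation. The example below reuses the figures from the supplier example (10 correct out of 30, Pd0 = 30%) and assumes the usual guessing model for the triangle test, Pc0 = Pd0 + (1 - Pd0)/3; the function name `similarity_p_value` is hypothetical and this is not necessarily the exact procedure given in Bi (2006).

```python
from math import comb

def similarity_p_value(correct, n, pd0, p_chance=1/3):
    """Return (Pc0, P(X <= correct)) evaluated at the boundary Pc = Pc0."""
    pc0 = pd0 + (1 - pd0) * p_chance   # discriminators are correct, the rest guess
    p = sum(comb(n, k) * pc0**k * (1 - pc0)**(n - k) for k in range(correct + 1))
    return pc0, p

pc0, p = similarity_p_value(correct=10, n=30, pd0=0.30)
print(f"Pc0 = {pc0:.3f}, p = {p:.4f}")
```

The p-value is the lower-tail probability at the boundary of H0, so a small value supports HA (fewer correct answers than Pc0 would produce).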

Other new approaches


Using more sensitive sensory methods
One example (among others...) - Tetrads
Present 4 samples
Group the stimuli into two groups of two (unspecified procedure)
6 possible outcomes: WWSS, WSWS, WSSW, SSWW, SWSW,
SWWS
Probability of guessing correctly is 1/3

Ennis JM (2010) http://ifpressdelta.com/wpcontent/uploads/2011/03/ASTM_2010_Spring_John_Ennis_New_Methods.pdf
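
The 1/3 guessing probability for the tetrad can be checked by enumeration. This is a small sketch with made-up sample labels (W1, W2 for the weaker pair, S1, S2 for the stronger pair): it counts the serving orders and the ways of splitting four samples into two groups of two.

```python
from math import comb

samples = ["W1", "W2", "S1", "S2"]   # hypothetical labels: two weaker, two stronger

# Serving orders such as WWSS or WSWS: choose which 2 of the 4 positions hold W
serving_orders = comb(4, 2)

# Every split into two unordered pairs: pick the partner of the first sample,
# the two remaining samples form the other pair.
groupings = []
for partner in samples[1:]:
    pair = {samples[0], partner}
    groupings.append((pair, set(samples) - pair))

# A grouping is correct when the first pair contains only W or only S samples
correct = [g for g in groupings if {s[0] for s in g[0]} in ({"W"}, {"S"})]

print(serving_orders, len(groupings), len(correct) / len(groupings))
```

Only one of the three possible groupings is correct, which gives the same chance rate as the triangle test.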


2) ANOVA as a method for analysing descriptive profile data

Link between sample presentation design and analysis
Independent Samples Design

               Product 1    Product 2
Assessor 1         X
Assessor 2         X
Assessor 3         X
Assessor 4         X
Assessor 5                      X
Assessor 6                      X
Assessor 7                      X
Assessor 8                      X

E.g. Independent samples T-test, or one-way ANOVA for 3 or more samples

Related Samples Design

               Product 1    Product 2
Assessor 1         X            X
Assessor 2         X            X
Assessor 3         X            X
Assessor 4         X            X
Assessor 5         X            X
Assessor 6         X            X
Assessor 7         X            X
Assessor 8         X            X

E.g. Paired T-test, or two-way ANOVA for 3 or more samples

Standard Analysis:
Two-Way ANOVA with Interaction

Tests of Between-Subjects Effects
Dependent Variable: Astringent

Source             Type III Sum of Squares    df    Mean Square         F      Sig.
Corrected Model          19328.625a           59        327.604      1.927     .006
Intercept               326041.875             1     326041.875   1918.175     .000
Sample                    2111.975             5        422.395      2.485     .041
Judge                     7935.375             9        881.708      5.187     .000
Sample * Judge            9281.275            45        206.251      1.213     .240
Error                    10198.500            60        169.975
Total                   355569.000           120
Corrected Total          29527.125           119

a. R Squared = .655 (Adjusted R Squared = .315)

Assessor term: scale usage (level effect)
Interaction: measure of disagreement

Example of Sample by Judge interaction

[Figure: Interaction plot - Mean data for Sweet. x-axis: Sample (Bardolino, Parador, Rhone, Rioja, Solana, Ventoux); y-axis: Mean Sweet (20 to 80); one line per Judge (1 to 10).]

Recent adoptions - Mixed Model ANOVA
Mixed Model:
Samples - Fixed effect
Assessors - Random effect

Example: Wine - Astringent (mixed model)

Tests of Between-Subjects Effects
Dependent Variable: Astringent

Source                         Type III Sum of Squares    df    Mean Square         F      Sig.
Intercept        Hypothesis         326041.875             1     326041.875    369.784     .000
                 Error                7935.375             9        881.708a
Sample           Hypothesis           2111.975             5        422.395      2.048     .090
                 Error                9281.275            45        206.251b
Judge            Hypothesis           7935.375             9        881.708      4.275     .000
                 Error                9281.275            45        206.251b
Sample * Judge   Hypothesis           9281.275            45        206.251      1.213     .240
                 Error               10198.500            60        169.975c

a. MS(Judge)
b. MS(Sample * Judge)
c. MS(Error)
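
The difference between the two analyses comes down to which mean square sits in the F-ratio denominator. The sketch below recomputes the F values from the mean squares reported for Astringent; the dictionary names are illustrative only, and the Sig. values are not recomputed because they need an F distribution.

```python
# Mean squares reported in the Astringent ANOVA tables
ms = {"Sample": 422.395, "Judge": 881.708, "Sample*Judge": 206.251, "Error": 169.975}

# Fixed-effects two-way ANOVA: every term is tested against the residual error
f_fixed = {term: ms[term] / ms["Error"] for term in ("Sample", "Judge", "Sample*Judge")}

# Mixed model (assessors random): main effects are tested against the interaction
f_mixed = {
    "Sample": ms["Sample"] / ms["Sample*Judge"],
    "Judge": ms["Judge"] / ms["Sample*Judge"],
    "Sample*Judge": ms["Sample*Judge"] / ms["Error"],
}

for term in f_fixed:
    print(f"{term:13s} fixed F = {f_fixed[term]:.3f}   mixed F = {f_mixed[term]:.3f}")
```

Testing Sample against the interaction lowers its F-ratio, which is why the Sample effect moves from Sig. = .041 in the standard analysis to Sig. = .090 in the mixed model.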

What's new...
Expanding the ANOVA model to account for other scaling effects
Brockhoff (2003) Statistical testing of individual differences in sensory profiling. FQP, 14, 425-434.
Romano et al. (2008) Correcting for different use of the scale and the need for further analysis of individual differences in sensory analysis. FQP 19, 197-209.

Most recently, O5.6 at Pangborn 2011, where Per Brockhoff introduced a new mixed model ANOVA that also accounts for scaling differences between assessors (i.e. the range of the scale used).
Decomposes the interaction into scaling differences + disagreement, and uses the specific disagreement term in the F-ratio denominator

3) Multivariate methods, and interpretation pitfalls

Multivariate data analysis

The methodology applied to data that include simultaneous measurements on many variables is called multivariate analysis.
Because they analyse all variables together, multivariate methods are inherently more difficult to understand than univariate methods (such as ANOVA).
Well known multivariate methods of analysis include:

Principal Components Analysis (PCA)
Generalised Procrustes Analysis (GPA)
Partial Least Squares Regression (PLS)
Multiple Factor Analysis (MFA)
Preference Mapping, etc.

Uses for multivariate analysis


The first important thing to stress is that generally
speaking, most multivariate methods are not used for
inference.
That is, most multivariate methods (e.g. PCA, GPA, PLS, etc.) are not concerned with estimating population parameters or determining significant differences between samples.
Multivariate methods, such as PCA, are exploratory in
nature, used for data reduction and data interpretation.
Abuse of PCA: avoid applying it to any and all data
sets!
Be aware of the objectives of the method.

Example of a biplot

[Figure: Sensory Biplot, PC1 (53.2%) vs PC2 (28.7%), both axes from -1.0 to 1.0. Labels shown include Summer, Spiced, Blue+Rasp, Cran+Cherry and Ribena, together with odour (.Od), flavour (.Fl), basic taste (.Bt), aftertaste (.At) and mouthfeel (.Mf) attributes such as Strawberry, Raspberry, Cranberry, Blackcurrant, MixedBerries, Cloves, Medicinal, Black leaf, Acid, Sweet, Bitter, Thickness and Astringent.]

Biplots: interpretation challenges

Cannot interpret as you would a scatterplot or graph of the means.
Can easily be misinterpreted if taken at face value.
Interpretation can be reasonably subjective; requires experience
Looking at relative positions between objects
Beware that interpretation rules may be different depending on the method.
Several interpretation pitfalls that users need to be aware of:

Example: Pitfall # 1

[Figure: Sensory Biplot, PC1 (53.2%) vs PC2 (28.7%), showing the berry drink samples and attributes]

Example: Pitfall # 2

[Figure: two panels on Dim 1 (39.41 %) vs Dim 2 (17.39 %). Panel "Attribute correlations": appearance, odour, flavour and texture attributes such as sweet_fla, runny_txt, straw_fla, creamy_fla, acidic_fla, thick_app and pale.pink_app. Panel "Sample scores": numbered samples.]

Pitfall # 3

Samples described similarly by dimensions 1 and 2, but can differ in dimension 3 or 4...

[Figure: Sample scores, Dim 1 (39.41 %) vs Dim 2 (17.39 %)]

Example: Pitfall # 4


Example cont: Pitfall # 4


Example Pitfall # 5: random data

[Figure: GPA Group Average, dimension 1 versus 2, axes from -1.35 to 1.35, showing objects 1 to 6 and attributes 1 to 10]

Example Pitfall # 5: PCA with non-significant attributes

[Figure: Sensory Biplot, PC1 (61.2%) vs PC2 (25.8%), both axes from -1.0 to 1.0, showing attributes Molass.F, Fruit.O, Sweet.F, Curry.O, Fruit.F and Bitter.AT]

Other pitfalls...
What do the dimensions in the graph represent?
E.g. Internal vs external preference map.
Underlying dimensions will be different: either sensory or preference dimensions.
Difficult for clients to understand.

Filtering out the signal from the noise: too much information presented on the graph?
Simplify the graph; present the most appropriate solution.

Other approaches...

[Figure: MFA, coordinates of the projected points (axes F1 and F2: 58.62 %); F1 (30.11 %) vs F2 (28.50 %). Product labels (e.g. Euthymol, Aquafresh Extreme Clean, Arm & Hammer Enamel Care, Beverly Hills, Sensodyne Pronamel, Colgate 2 in 1) each appear with an "Expectation" and a "Home Use" point.]

Examples courtesy
of Anne Hasted,


Conclusions
Statistics is a necessary part of the field of sensory
and consumer sciences
Our science is improving
Increased knowledge and understanding of statistical
methods
Increased access to statistical software e.g. free R
programs
Better choices of sensory methods coupled with better
methods of statistical analysis

A basic understanding of statistics is beneficial


Statistics can be fun!

Thank you for your attention!


Questions?
Chantal Gilbert
c.gilbert@campden.co.uk
+44 (0)1386 842256
Chipping Campden
Gloucestershire
GL55 6LD
England
www.campden.co.uk