
Program & Bibliography

- 3(1,2): ~5 theory (301, B2) + 10 practice (Comp. Chem. Lab, by group)

INFORMATICS IN FOOD TECHNOLOGY (TIN HỌC TRONG CNTP)

Nguyễn Hoàng Dũng, PhD. Ho Chi Minh City University of Technology (Trường Đại học Bách khoa Tp. HCM)

- Website: www2.hcmut.edu.vn/~dzung/ (available from Sep 15)

- R: www.r-project.org

NHDzung Lesson 1, slide 2

Problem
Foreign and Vietnamese cheeses: quality and preference?
How to conduct research:
1. Sampling
2. Measurement
3. Collect data *
4. Analyze and present your results *
Sensory practices


1-1. Samples and Populations


- A population consists of the set of all measurements in which the investigator is interested.
- A sample is a subset of the measurements selected from the population.
- A census is a complete enumeration of every item in a population.

Simple Random Sample


Sampling from the population is often done randomly, such that every possible sample of equal size (n) has an equal chance of being selected. A sample selected in this way is called a simple random sample, or just a random sample.
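As an illustration, a simple random sample can be drawn with Python's standard library. This is only a sketch: the population of 100 package ID numbers, the sample size of 10 and the helper name simple_random_sample are hypothetical and are not part of the course material.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement.

    random.sample gives every subset of size n the same chance
    of being selected, which is the definition of a simple random sample.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: ID numbers of 100 cheese packages (N = 100)
population = list(range(1, 101))

# Sample of size n = 10
sample = simple_random_sample(population, n=10, seed=1)
print(sample)
```

Fixing the seed is a convenience for teaching; in a real survey the draw would normally not be seeded.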


Samples and Populations

[Figure: a sample (n) drawn from a population (N)]

Measurements
Measurement is the assigning of numbers to the values of a variable (S.S. Stevens, Science 1946;103:677-80).
Rules specify the procedures used to assign numbers to values.

The criteria of science

Science
- Logic, experimental evidence
- Results are repeatable
- Falsifiability*
- Peer-reviewed journals
- Evolution / learning from mistakes

Pseudoscience
- Belief, loyalty
- Results are not repeatable
- Not falsifiable
- Not in peer-reviewed journals
- Constant, unchanged belief

*Capable of being tested (verified or falsified) by experiment or observation

Criteria of measurements
- Validity: the measurement measures what it purports to measure
- Accuracy: the degree of truthfulness of the value obtained for the attribute being measured
- Reliability (precision): consistency and repeatability
- Sensitivity to important variation

Accuracy vs reliability (precision)
[Figure: precision compared with accuracy]
Measurement error decreases the accuracy of measurement.

Some important concepts: Data - Variables - Scales

Qualitative - Categorical or Nominal (described by frequencies). Examples are:
- Color
- Gender
- Nationality

Quantitative - Measurable or Countable. Examples are:
- Temperature
- Humidity
- Gross compounds
- Preference points scored on a 100-point scale

Example questionnaire (translated from Vietnamese):
GENERAL INFORMATION
1.1 Description of the respondent
1.1.1 Gender of the respondent? 1. Male 2. Female
      Marital status: 1. Single 2. Married
1.1.2 Age of the respondent? Under 25 / 25-30 / 31-54 / > 55
1.1.3 What is your current occupation? Pupil or student / Doctor or teacher / Worker, hired laborer or salesperson / Retired
1.1.4 Which of the following levels describes your household income?
      1. Low (≥ 2 million VND and < 5 million)
      2. Medium (≥ 5 million and < 8 million)
      3. High (≥ 8 million)

Example study (translated from Vietnamese):
- 8 cheeses (EdamF, EdamH, GoudaH, m1, m2, m3, m4, m5)
- 11 assessors (experts)
- 3 replicates
- 15 descriptive terms: sour, bitterness, umami, salty, greasiness, butter_odor, milk_odor, acrid, rancid, lactic, cheese_flavor, acetic, full flavor, yellow, hard
- Unstructured line scale from 0 to 100 mm

Some important concepts: Data - Variables - Scales

Variable
- Discrete variables
- Continuous variables
- Independent variables
- Dependent variables

Measurement scales
- Nominal scales (labels)
- Ordinal scales (e.g. ranks in the army)
- Interval scales (e.g. Celsius, Fahrenheit)
- Ratio scales (true zero point, ratios are meaningful)

Types of measurement
- Qualitative: Nominal, Ordinal
- Quantitative: Interval, Ratio

Qualitative measurements
Nominal level: Classification
A set of objects can be classified into exhaustive, mutually exclusive categories, each labeled with a unique symbol. Ex: religion, sex, location, etc.
Ordinal level: Classification + Ordering
A set of objects can be assigned rank values and nothing more. Ex: socio-economic status, education, levels of satisfaction, etc.

Quantitative measurements
Interval level: Classification + Ordering + Standard distance
A set of objects can be described by units that indicate how far one case is from another. Ex: temperature.

Ratio level: Classification + Ordering + Standard distance + Natural zero
A quantitative variable with a natural zero. Ex: income, age, weight, bone mineral density.
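The practical difference between the interval and ratio levels can be checked numerically. The sketch below uses hypothetical values (20 and 10 degrees Celsius, 2 kg and 1 kg) to show that a ratio of two temperatures depends on the unit chosen, because the zero point is arbitrary, whereas a ratio of two weights does not.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


# Interval scale: hypothetical temperatures
a_c, b_c = 20.0, 10.0
a_f, b_f = celsius_to_fahrenheit(a_c), celsius_to_fahrenheit(b_c)

ratio_celsius = a_c / b_c        # looks like "twice as hot"
ratio_fahrenheit = a_f / b_f     # same temperatures, different ratio

# Ratio scale: hypothetical weights, natural zero
w1_kg, w2_kg = 2.0, 1.0
ratio_kg = w1_kg / w2_kg
ratio_g = (w1_kg * 1000) / (w2_kg * 1000)

print(ratio_celsius, ratio_fahrenheit)  # the two ratios disagree
print(ratio_kg, ratio_g)                # the two ratios agree
```

Differences remain meaningful on both scales; only statements such as "twice as much" require a natural zero.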


Example questionnaire, continued (translated from Vietnamese):
1.2.2. Which type of hard cheese do you usually use? Cheddar / Gouda / Edam / Emmental / Other (please specify)
1.2.4. Please rate your overall liking of semi-hard cheese products: 1 2 3 4 5 6 7 8 9
1.2.5. How often do you use semi-hard cheese products? > 3 times/week / 1-2 times/week / 1-3 times/month
1.2.6. How much semi-hard cheese do you use per week? < 100 g / 100-300 g / > 300 g
1.2.7. Which products do you eat hard cheese with? Bread / Sandwich / Salad / Biscuits / Wine / Other (please specify)
1.2.8. When choosing a hard cheese product to buy, how much do you care about each of the following factors? (1 = do not care at all, 2 = do not care, 3 = no opinion, 4 = care, 5 = care very much)
- Price: 1 2 3 4 5
- Sensory properties of the product: 1 2 3 4 5
- Familiarity: 1 2 3 4 5
- Convenience of use: 1 2 3 4 5
- Health benefits: 1 2 3 4 5
- Product weight: 1 2 3 4 5


Sensory data (session 1, product m1; first four descriptors)

judge  session  product  sour  bitterness  umami  salty
S1     1        m1       50    18          0      40
S2     1        m1       100   65          40     100
S3     1        m1       32    11          35     4
S4     1        m1       30    10          25     1
S5     1        m1       60    23          30     29
S6     1        m1       30    35          25     50
S7     1        m1       50    32          45     64
S8     1        m1       32    23          40     40
S9     1        m1       78    27          45     21
S10    1        m1       55    30          34     18
S11    1        m1       62    21          43     32

Summary Measures (Population Parameters, Sample Statistics)

Measures of Central Tendency
- Median
- Mode
- Mean

Measures of Variability
- Range
- Variance
- Standard Deviation

Other summary measures
- Skewness
- Kurtosis
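As a preview of these summary measures, the sketch below applies Python's standard statistics module to the sour scores given by the 11 judges for product m1. The choice of Python and the variable names are assumptions made for illustration; the course itself points to R.

```python
import statistics

# Sour scores for product m1, session 1, judges S1..S11 (from the data table)
sour = [50, 100, 32, 30, 60, 30, 50, 32, 78, 55, 62]

# Measures of central tendency
mean_sour = statistics.mean(sour)
median_sour = statistics.median(sour)
modes_sour = statistics.multimode(sour)   # all values tied for most frequent

# Measures of variability
range_sour = max(sour) - min(sour)
variance_sour = statistics.variance(sour)  # sample variance, denominator n - 1
sd_sour = statistics.stdev(sour)           # sample standard deviation

print("mean:", round(mean_sour, 2))
print("median:", median_sour)
print("modes:", sorted(modes_sour))
print("range:", range_sour)
print("variance:", round(variance_sour, 2))
print("standard deviation:", round(sd_sour, 2))
```

Several values are tied for most frequent here, which is why multimode is used rather than mode.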

1-3. Measures of Central Tendency or Location

- Median: middle value when sorted in order of magnitude (50th percentile)
- Mode: most frequently occurring value
- Mean: average

Arithmetic Mean or Average

The mean of a set of observations is their average: the sum of the observed values divided by the number of observations.

Population mean:
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample mean:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Arithmetic Mean or Average
Affected by outliers.
[Figure: two dot plots; with outlying values added, the mean moves from 5 to 6]

Median
Robust measure of central tendency, not affected by outliers.
[Figure: the same two dot plots; the median stays at 5]
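The effect can be reproduced with a small numerical example. The two data sets below are hypothetical (they are not the points plotted on the slides); they were chosen so that a single outlier moves the mean from 5 to 6 while the median stays at 5.

```python
import statistics

# Hypothetical scores without an outlier
scores = [3, 4, 5, 5, 6, 7]

# Same scores, with the largest value replaced by an outlier
scores_with_outlier = [3, 4, 5, 5, 6, 13]

mean_before = statistics.mean(scores)
mean_after = statistics.mean(scores_with_outlier)

median_before = statistics.median(scores)
median_after = statistics.median(scores_with_outlier)

print("mean:  ", mean_before, "->", mean_after)      # the mean shifts
print("median:", median_before, "->", median_after)  # the median does not
```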

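Mode
[Figure: two dot plots; one data set with mode = 9, one data set without a mode]

Measures of Central Tendency or Location

Mean:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$

Mean from grouped values (value $x_i$ observed $n_i$ times, $n$ = sample size):
$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \frac{n_1 x_1 + n_2 x_2 + \dots + n_k x_k}{n}$

Median (values sorted in increasing order, $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$):
$\mathrm{med}(x) = x_{(p+1)}$ if $n = 2p + 1$
$\mathrm{med}(x) = \frac{x_{(p)} + x_{(p+1)}}{2}$ if $n = 2p$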
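The grouped-mean and median formulas translate directly into code. The following is a minimal sketch; the function names are illustrative, and the inputs are the color scores 6, 7, 8, 4, 5, 6 and 10, 2, 3, 9 that the lesson uses later as study 1 and study 2.

```python
def grouped_mean(values, counts):
    """Mean from distinct values x_i and their frequencies n_i."""
    n = sum(counts)  # sample size
    return sum(n_i * x_i for x_i, n_i in zip(values, counts)) / n


def median(xs):
    """Median following the two cases n = 2p + 1 and n = 2p."""
    s = sorted(xs)
    n = len(s)
    p = n // 2
    if n % 2 == 1:
        # n = 2p + 1: the median is x_(p+1), i.e. index p when counting from 0
        return s[p]
    # n = 2p: the median is the average of x_(p) and x_(p+1)
    return (s[p - 1] + s[p]) / 2


# Study 1 scores 6, 7, 8, 4, 5, 6 written as distinct values with frequencies
values = [4, 5, 6, 7, 8]
counts = [1, 1, 2, 1, 1]

print(grouped_mean(values, counts))   # same result as averaging the raw scores
print(median([6, 7, 8, 4, 5, 6]))     # even n
print(median([10, 2, 3, 9]))          # even n
print(median([4, 5, 6, 7, 8]))        # odd n
```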

Mean or Median?
- With outliers: use the median
- With many tied values (ex aequo, discrete variable): use the mean

Quartiles
The values at the boundaries of the 25th, 50th and 75th percentiles of a frequency distribution divided into four parts, each containing a quarter of the population.

[Figure: distribution split into four parts of 25% each by Q1, Q2 and Q3]

Position of the ith quartile:
$\mathrm{Position}(Q_i) = \frac{i(n+1)}{4}$

Data sorted in increasing order: 11 12 13 16 16 17 18 21 22

Position of $Q_1 = \frac{1(9+1)}{4} = 2.5$

$Q_1 = \frac{12 + 13}{2} = 12.5$
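The position rule i(n + 1)/4 can be sketched in Python as follows. The slide only shows the case of position 2.5, where the two neighboring values are averaged; using linear interpolation for every fractional position is an assumption made here, and the function name quartile is illustrative. The result is compared with statistics.quantiles, whose default "exclusive" method is based on the same (n + 1) positions.

```python
import statistics


def quartile(sorted_data, i):
    """ith quartile (i = 1, 2, 3) at the 1-based position i * (n + 1) / 4.

    Assumes the data are already sorted and that the position falls
    inside the data (true for i = 1..3 when n >= 3).
    """
    n = len(sorted_data)
    position = i * (n + 1) / 4
    lower = int(position)          # rank of the value just below the position
    fraction = position - lower
    if fraction == 0:
        return sorted_data[lower - 1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)  # linear interpolation (assumption)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
print(q1, q2, q3)

# Cross-check with the standard library
print(statistics.quantiles(data, n=4))
```

Other software may use different quartile conventions, so results can differ slightly between tools for small samples.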

1-4. Measures of Variability or Dispersion

- Range: difference between maximum and minimum values
- Variance: mean* squared deviation from the mean
- Standard deviation: square root of the variance

*Definitions of population variance and sample variance differ slightly.

Dispersion
Range:
$\mathrm{Range}(x) = x_{(n)} - x_{(1)}$
[Figure: two dot plots with different spreads but the same range, 12 - 7 = 5]

Interquartile range:
$q_{0.75} - q_{0.25}$

Mean (average)
Given a series of values $x_i$ ($i = 1, \dots, n$): $x_1, x_2, \dots, x_n$, the mean is:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Study 1: the color scores of 6 consumers are: 6, 7, 8, 4, 5, and 6. The mean is:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{6 + 7 + 8 + 4 + 5 + 6}{6} = \frac{36}{6} = 6$

Study 2: the color scores of 4 consumers are: 10, 2, 3, and 9. The mean is:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{10 + 2 + 3 + 9}{4} = \frac{24}{4} = 6$

Variation
The mean does not adequately describe the data. We need to know the variation in the data. An obvious measure is the sum of differences from the mean.
For study 1, with scores 6, 7, 8, 4, 5, and 6, we have:
(6-6) + (7-6) + (8-6) + (4-6) + (5-6) + (6-6) = 0 + 1 + 2 - 2 - 1 + 0 = 0
NOT SATISFACTORY!

Sum of squares
We need to make the differences positive by squaring them. This is called the sum of squares (SS).
For study 1: 6, 7, 8, 4, 5, 6, we have:
$SS = (6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2 = 10$
For study 2: 10, 2, 3, 9, we have:
$SS = (10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2 = 50$
This is better! But it does not take into account the sample size n.
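A short Python sketch makes both points concrete: the plain sum of deviations is always zero, while the sum of squares separates the two studies. The function names are illustrative.

```python
def deviations(xs):
    """Differences between each value and the mean of the values."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def sum_of_squares(xs):
    """Sum of squared deviations from the mean (SS)."""
    return sum(d ** 2 for d in deviations(xs))


study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

# Positive and negative deviations cancel out in both studies
print(sum(deviations(study1)), sum(deviations(study2)))

# Squaring removes the sign, so the spread becomes visible
print(sum_of_squares(study1), sum_of_squares(study2))
```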

Variance
We have to divide the SS by the sample size n. But each squared term uses the mean, which was itself calculated from the data, so we lose 1 degree of freedom. Therefore the correct denominator is n - 1. This is called the variance (denoted by $s^2$):

$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$

Or, in the sum notation:

$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

1-5. Variance and Standard Deviation

Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N}$, with $\sigma = \sqrt{\sigma^2}$

Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$, with $s = \sqrt{s^2}$

Variance - example
For study 1: 6, 7, 8, 4, 5, and 6, the variance is:
$s^2 = \frac{(6-6)^2 + (7-6)^2 + (8-6)^2 + (4-6)^2 + (5-6)^2 + (6-6)^2}{6 - 1} = \frac{10}{5} = 2$

For study 2: 10, 2, 3, 9, the variance is:
$s^2 = \frac{(10-6)^2 + (2-6)^2 + (3-6)^2 + (9-6)^2}{4 - 1} = \frac{50}{3} = 16.7$

The scores in study 2 were much more variable than those in study 1.

Standard deviation
The problem with variance is that it is expressed in squared units, whereas the mean is in the actual unit. We need a way to convert variance back to the actual unit of measurement. We take the square root of the variance; this is called the standard deviation (denoted by s).
For study 1, s = sqrt(2) = 1.41
For study 2, s = sqrt(16.7) = 4.1
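The variance and standard deviation of the two studies can be verified with a few lines of Python. This is a sketch: sample_variance is an illustrative helper written out from the definition with denominator n - 1, and the standard library functions are used only as a cross-check.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


study1 = [6, 7, 8, 4, 5, 6]
study2 = [10, 2, 3, 9]

var1, var2 = sample_variance(study1), sample_variance(study2)
sd1, sd2 = math.sqrt(var1), math.sqrt(var2)

print("study 1: variance", round(var1, 1), "standard deviation", round(sd1, 2))
print("study 2: variance", round(var2, 1), "standard deviation", round(sd2, 1))

# Cross-check against the standard library (sample versions, n - 1)
assert math.isclose(var1, statistics.variance(study1))
assert math.isclose(sd2, statistics.stdev(study2))

# Dividing by n instead gives the population variance, which is smaller
print("population variance, study 1:", round(statistics.pvariance(study1), 2))
```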

Standard Deviation
[Figure: three dot plots on a common axis from 11 to 21, all with the same mean]
- Data A: mean = 15.5, s = 3.338
- Data B: mean = 15.5, s = 0.9258
- Data C: mean = 15.5, s = 4.57

1-6. Form indicators: Skewness & Kurtosis

Skewness: measure of asymmetry of a frequency distribution
- Skewed to left
- Symmetric or unskewed
- Skewed to right

Kurtosis: measure of flatness or peakedness of a frequency distribution
- Platykurtic (relatively flat)
- Mesokurtic (normal)
- Leptokurtic (relatively peaked)

Skewness
[Figure: frequency histogram skewed to left; mean < median < mode]

Kurtosis
[Figure: frequency histogram, platykurtic - flat distribution]
[Figure: frequency histogram, mesokurtic - not too flat and not too peaked]
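The slides describe skewness and kurtosis only qualitatively. One common way to quantify them, assumed here and not taken from the lesson, is through moment coefficients: the third central moment divided by the 1.5th power of the second for skewness, and the fourth central moment divided by the square of the second, minus 3, for excess kurtosis. The data sets in this sketch are hypothetical.

```python
def central_moment(xs, k):
    """kth central moment: average of (x - mean) ** k."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Moment coefficient of skewness: negative = skewed to left, positive = skewed to right."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Moment coefficient of kurtosis minus 3 (0 for a normal distribution).

    Negative values indicate a flatter (platykurtic) shape,
    positive values a more peaked (leptokurtic) shape.
    """
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3


# Hypothetical data sets
left_skewed = [1, 5, 6, 6, 7, 7, 7]
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 7]

for name, data in [("left", left_skewed), ("symmetric", symmetric), ("right", right_skewed)]:
    print(name, round(skewness(data), 2), round(excess_kurtosis(data), 2))
```

Statistical packages often apply small-sample corrections to these coefficients, so their output may differ slightly from this uncorrected version.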

Diagram

Quantitative variable
[Figure: diagram of a quantitative variable]

Quantitative variable
To see more detail, split the classes: the 21 observations between 1.65 m and 1.70 m divide into 8 in [1.65 ; 1.675] and 13 in [1.675 ; 1.70].

Quantitative variable: boxplot (box-and-whisker plot)
From top to bottom, the plot marks:
- x: individual points beyond the upper whisker
- Upper whisker: largest value below q0.75 + 1.5(q0.75 - q0.25)
- q0.75
- Median
- q0.25
- Lower whisker: smallest value above q0.25 - 1.5(q0.75 - q0.25)
- x: individual points beyond the lower whisker
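The quantities drawn in a boxplot can be computed directly. The sketch below is illustrative only: the data are hypothetical (the nine values of the quartile example plus one added extreme value, 40), the function name boxplot_summary is made up, and the quartiles come from statistics.quantiles with its default method, which may differ from the convention used by a given plotting tool.

```python
import statistics


def boxplot_summary(xs):
    """Quartiles, whisker ends and outlying points for a boxplot."""
    s = sorted(xs)
    q1, med, q3 = statistics.quantiles(s, n=4)
    iqr = q3 - q1                      # q0.75 - q0.25
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in s if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "lower_whisker": min(inside),  # smallest value above the lower fence
        "upper_whisker": max(inside),  # largest value below the upper fence
        "outliers": [x for x in s if x < lower_fence or x > upper_fence],
    }


# Hypothetical data: quartile example values plus one extreme value
data = [11, 12, 13, 16, 16, 17, 18, 21, 22, 40]

summary = boxplot_summary(data)
print(summary)
```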

Form indicators
- Skewness coefficient < 0: asymmetry (Q2 lies closer to Q3 than to Q1)
- Symmetry (Q2 lies midway between Q1 and Q3)
- Skewness coefficient > 0: asymmetry (Q2 lies closer to Q1 than to Q3)

Principles of a good figure
- Present complex results clearly, accurately and efficiently
- Convey many ideas in the most efficient way
- Do not lie!

A BAD figure
Fig. Digestion interactions of coral
[Figure: bar chart of wins and losses by taxon]

A GOOD figure
Figure 3. Digestion interactions for coral taxa sampled at Pioneer Bay, Orpheus Island
[Figure: bar chart of Frequency by Taxon, with separate bars for Wins and Losses. Taxa: Acroporidae, Porites (M), Mussidae, Alcyonaceans, Porites (B), Algae, Faviidae, Sponges]