
Practical Statistics and Experimental Design

02 WEDNESDAY 11TH APRIL 2018

Descriptive Statistics
Distributions

DR. CHRISTIAN KLUTH
Lecturer for special duties
Statistics consulting for students
Department of Crop Sciences
Georg-August-Universität Göttingen
Von-Siebold-Str. 8, 37075 Göttingen
Tel.: 0551/394356
E-Mail: ckluth@uni-goettingen.de
Appointment by arrangement
The Research Process
[Figure: flowchart of the research process, from initial observation, problem, literature and expertise via the scientific hypothesis (alternative hypothesis HA vs. null hypothesis H0) and the experimental design (study system, study unit, factors and factor levels, variables and data types, sample size, biological replicates vs. subsampling, randomisation, ceteris paribus, field plan) to data collection, data management, statistical analysis (model assumptions, statistical modelling, significance tests) and the presentation of the conclusion in graphics, tables and text. Descriptive statistics (key numbers, graphical data exploration) belong to the statistical toolbox. Always consider: population, feasibility, relevance. Avoid: block-treatment confounding, HARK. Software skills are needed at every stage.]

C. Kluth Practical Statistics and Experimental Design • 02 Descriptive Stats, Distribution 11 April 2018, p. 2
Distributions
Generally, a distribution describes how things are spread in space,
e.g. a species distribution (https://en.wikipedia.org/wiki/Species_distribution)
=> it is likely or unlikely to find the species in a certain area.
[Photo: Eranthis hyemalis seedlings, 10th April 2018. Photo: C. Kluth; flower inset: https://upload.wikimedia.org/wikipedia/commons/b/b4/Winterling-Bluete-70.jpg]
- High density of seedlings: high probability of finding a seedling in this area
- Low density of seedlings: low probability of finding a seedling in this area

Descriptive Statistics for
Univariate Sample distributions
Measure of central tendency
- central or typical value of a distribution => expected value
- depends on the data type
-> Mode: most common value in a group (at least nominal scale)
-> Median: middle value of the sorted data (at least ordinal scale)
-> Arithmetic mean (x̄): sum of the numerical values divided by the number of values (at least interval scale)
-> Geometric mean: nth root of the product of n numbers (ratio scale)
-> …
Measure of dispersion
- variability of a distribution, i.e. how widely the values spread around the central value (and thus how likely it is to observe a value close to it)
- depends on the data type
-> e.g. range from the minimum value to the maximum value (at least ordinal scale)
- measures for metric data: see later
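As a minimal sketch (Python standard library only), the measures of central tendency and the range can be computed as follows. The data are the 16 maize plant heights (cm) used later in this lecture; the helper name `describe` is hypothetical.

```python
import statistics

# Heights (cm) of 16 maize plants, the example data used later in this lecture
heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]

def describe(values):
    """Return basic measures of central tendency and the range."""
    return {
        "mode": statistics.mode(values),        # most common value
        "median": statistics.median(values),    # middle of the sorted data
        "mean": statistics.mean(values),        # sum / n
        # n-th root of the product; only for ratio-scaled data with values > 0
        "geometric_mean": statistics.geometric_mean(values),
        "range": max(values) - min(values),     # maximum - minimum
    }

summary = describe(heights)
for name, value in summary.items():
    print(f"{name:>15}: {value:.3f}")
```

For positive data the geometric mean is never larger than the arithmetic mean, which is one quick plausibility check of the output.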

Distributions

Theoretical distribution
expected distribution based on the nature of the variable of interest and on the sampling process

Sample distribution
the observed distribution within one sample

Sampling distribution
the distribution of a statistic (e.g. the sample mean) obtained from many repeated samples
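The difference between a sample distribution and a sampling distribution can be illustrated with a small simulation. This sketch assumes a normally distributed population with mean 168.9 and standard deviation 13.65 (values borrowed from the maize example later in this lecture); the number of repeated samples (2000) and the random seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the simulation is reproducible

POP_MEAN, POP_SD = 168.9, 13.65   # assumed population parameters
SAMPLE_SIZE, N_SAMPLES = 16, 2000

def draw_sample(n):
    """One sample distribution: n observations from the assumed population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]

# Sampling distribution: the means of many independent samples
sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE)) for _ in range(N_SAMPLES)]

print("mean of the sample means:   ", round(statistics.mean(sample_means), 2))
print("SD of the sample means:     ", round(statistics.stdev(sample_means), 2))
print("expected SD (sigma/sqrt(n)):", round(POP_SD / SAMPLE_SIZE ** 0.5, 2))
```

The sample means scatter much less than single observations; this is the idea behind the standard error introduced later.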

Distributions

In data science, the distribution of any random variable can be displayed by a histogram.
A histogram is a graphical display of numerical data in the form of upright bars, with the area of each bar representing the frequency. Expressed as relative frequency, it can be directly interpreted as the probability of getting an observation in a certain range if an object is randomly chosen from that distribution. The cumulative frequency gives the summed probability of getting a certain value or lower, given a (sample) distribution.

Distributions
Random Variables
Any random process leads to a random variable,
e.g. flipping a coin, rolling dice, how much rain will fall tomorrow, how tall a randomly selected plant is.
Random variables describe the outcome of a random process in numbers.
A random variable (RV) is either discrete or continuous.

Examples of discrete RVs:
- Coin flip: X1 = 1 if heads, 0 if tails; P(X1 = 1) = 0.5, P(X1 = 0) = 0.5
- Sum of rolling a die 5 times: X2 = 5, …, 30; P(X2 = 5) = (1/6)^5 = 1/7776 ≈ 0.000129
- Counts of seedlings/spores: X3 = 0, …, max; P(X3 = 12) = ? has to be estimated from samples

The Xi are discrete RVs => the probability of an exact outcome can be given.
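For a discrete random variable with finitely many equally likely elementary outcomes, the exact probability function can be obtained by enumeration. A minimal sketch for X2, the sum of five fair dice, counting all 6^5 = 7776 outcomes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 6**5 = 7776 equally likely outcomes of rolling five fair dice
outcomes = list(product(range(1, 7), repeat=5))
counts = Counter(sum(roll) for roll in outcomes)

# Probability function of X2 = sum of the five dice, as exact fractions
pmf = {s: Fraction(c, len(outcomes)) for s, c in sorted(counts.items())}

print("P(X2 = 5)  =", pmf[5], "=", float(pmf[5]))
print("P(X2 = 30) =", pmf[30])
print("most likely sums:", [s for s, p in pmf.items() if p == max(pmf.values())])
```

Only one of the 7776 outcomes (five ones) gives X2 = 5, which is why its probability is (1/6)^5.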


Discrete Random Variables
Binomial distribution B(n,p)
Consider a series of n independent trials (Bernoulli experiment). Assume that there are only two possible outcomes (events: success or failure) at each trial, and that p is the probability of a success. This implies that the probability of a failure at each trial is 1 − p = q.
The typical example of this situation is tossing a coin (heads or tails). The probability of getting a certain result after a given number of trials can be obtained by multiplying the single probabilities along the branches of a probability tree:
Probability tree for three coin tosses (h = heads, t = tails; each branch has probability 0.5; success = tails):
Result  Pr(Result)  k successes
hhh     0.125       0
hht     0.125       1
hth     0.125       1
htt     0.125       2
thh     0.125       1
tht     0.125       2
tth     0.125       2
ttt     0.125       3

k   Pr(k)
0   0.125
1   0.375
2   0.375
3   0.125

Probability function: $Pr(k) = \binom{n}{k} p^k q^{n-k}$ => R: dbinom(x, size, prob, log = FALSE) with x = k (number of successes), size = n, prob = p
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, the binomial coefficient (meaning: the number of possible combinations of getting k successes out of n trials) => R: choose(n, k)
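The probability function can be checked against the coin-toss tree with a few lines of Python (standard library; `binom_pmf` is a hypothetical helper name that computes the same quantity as R's dbinom()):

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Three coin tosses, success = tails with p = 0.5
n, p = 3, 0.5
for k in range(n + 1):
    print(f"k = {k}: C({n},{k}) = {comb(n, k)}, Pr(k) = {binom_pmf(k, n, p):.3f}")
```

The binomial coefficient counts the branches of the tree that end with k successes, and p^k q^(n−k) is the probability of each such branch.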
Discrete Random Variables
Binomial distribution
Probability tree for sampling 3 plants in an inoculated field, assuming that the probability of a successful inoculation (infection) is 0.2 (h = healthy, probability 0.8; i = infected, probability 0.2):
Result  Pr(Result)  No. of healthy plants
hhh     0.512       3
hhi     0.128       2
hih     0.128       2
hii     0.032       1
ihh     0.128       2
ihi     0.032       1
iih     0.032       1
iii     0.008       0

k successful infections   Pr(k)
0                         0.512
1                         0.384
2                         0.096
3                         0.008
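The tree can also be walked programmatically: the probability of each path is the product of its branch probabilities, and paths with the same number of infections are summed. A minimal sketch with the inoculation probability 0.2 from the example:

```python
from collections import defaultdict
from itertools import product

P_INFECTED = 0.2                       # probability of a successful inoculation
probs = {"h": 1 - P_INFECTED, "i": P_INFECTED}

dist = defaultdict(float)              # k successful infections -> Pr(k)
for path in product("hi", repeat=3):   # all 8 paths through the tree
    p_path = 1.0
    for outcome in path:
        p_path *= probs[outcome]       # multiply along the branches
    k = path.count("i")
    dist[k] += p_path
    print("".join(path), round(p_path, 3), "infections:", k)

for k in sorted(dist):
    print(f"Pr({k}) = {dist[k]:.3f}")
```

The summed path probabilities reproduce the binomial probabilities for n = 3 and p = 0.2.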

Discrete Random Variables
Binomial distribution
[Figure: two panels comparing the observed relative frequency of k successes with the binomial probability function $Pr(k) = \binom{n}{k} p^k q^{n-k}$; x-axis: number of successes (k). Left: n = 4, 48 series, p = 0.532, observed mean = 2.00, E(Y) = 2.13. Right: n = 6, 86 series, p = 0.866, observed mean = 5.17, E(Y) = 5.20.]
The function is defined only at integer values of k; the connecting lines are only guides for the eye.
$E(Y) = np$, $Var(Y) = npq$
Further examples in Excel.
Optional exercise for R experts: try to program this in R.
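A possible sketch for the optional exercise, written in Python instead of R so that all examples use one language. It computes E(Y) and Var(Y) directly from the probability function and compares them with n·p and n·p·q for the parameter values of the two panels; `binom_moments` is a hypothetical helper name.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_moments(n, p):
    """Mean and variance computed by summing over the probability function."""
    ks = range(n + 1)
    mean = sum(k * binom_pmf(k, n, p) for k in ks)
    var = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in ks)
    return mean, var

for n, p in [(4, 0.532), (6, 0.866)]:
    mean, var = binom_moments(n, p)
    print(f"n={n}, p={p}: E(Y)={mean:.2f} (n*p={n*p:.2f}), "
          f"Var(Y)={var:.3f} (n*p*q={n*p*(1-p):.3f})")
```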
Check up 02a

Working with the binomial distribution


Four seeds are planted and treated in the same
way. Suppose each has a probability of 0.8 of
germinating. Find the probability distribution of the
number of seeds germinating. In other words, for
each value of k from 0 to 4, find P(k).
(from Clewer & Scarisbrick 2001)

Discrete Random Variables
Poisson distribution P(λ)


=> Show further examples in Excel
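For reference, the Poisson probability function is $Pr(k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for k = 0, 1, 2, …, and both its mean and its variance equal λ (R: dpois(x, lambda)). A minimal Python sketch; λ = 3 seedlings per quadrat is a hypothetical value chosen only for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Pr(k) = lam**k * exp(-lam) / k!  for k = 0, 1, 2, ..."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0                                       # hypothetical mean count per quadrat
pmf = [poisson_pmf(k, lam) for k in range(60)]  # probabilities beyond k = 59 are negligible

mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
for k in range(7):
    print(f"Pr({k}) = {pmf[k]:.4f}")
print(f"mean = {mean:.3f}, variance = {var:.3f}")  # both approximately lam
```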

Descriptive Statistics for Metric Data

Living histogram
Connecticut State Agricultural College
(J. Heredity 5:511–518, 1914)
http://jhered.oxfordjournals.org/content/95/5/365.full.pdf

Original List (height in inches)
Stud_No Height Stud_No Height Stud_No Height Stud_No Height
1 67.04 33 65.75 65 67.11 97 63.38
2 67.74 34 64.71 66 69.78 98 69.27
3 66.72 35 68.92 67 65.64 99 69.10
4 68.56 36 71.09 68 65.40 100 67.35
5 65.68 37 67.69 69 72.70 101 71.36
6 65.58 38 67.05 70 64.85 102 68.96
7 61.84 39 67.80 71 62.49 103 69.96
8 65.84 40 65.70 72 67.77 104 66.79
9 64.88 41 68.08 73 63.01 105 67.80
10 69.78 42 64.58 74 67.10 106 67.26
11 66.61 43 60.73 75 68.43 107 63.62
12 64.86 44 66.45 76 62.69 108 70.48
13 69.00 45 65.89 77 65.70 109 68.98
14 64.69 46 62.60 78 72.02 110 66.08
15 65.72 47 72.55 79 68.36 111 65.92
16 69.31 48 66.12 80 63.06 112 70.48
17 70.06 49 71.24 81 64.40 113 68.30
18 71.62 50 64.45 82 69.30 114 71.40
19 72.20 51 66.77 83 67.58 115 71.41
20 62.10 52 67.83 84 68.11 116 72.90
21 70.97 53 65.45 85 66.31 117 68.36
22 69.10 54 72.72 86 69.87 118 70.89
23 71.06 55 66.32 87 65.47 119 67.90
24 69.66 56 65.32 88 64.67 120 63.18
25 65.81 57 71.40 89 66.80 121 63.95
26 61.63 58 64.46 90 70.80 122 70.75
27 66.65 59 67.58 91 67.71 123 70.08
28 67.54 60 67.13 92 67.20 124 71.80
29 69.38 61 57.52 93 74.27 125 64.85
30 66.59 62 64.96 94 64.29 126 63.70
31 63.58 63 64.80 95 70.72
32 64.71 64 69.64 96 71.14

Ordered List (sorted by height)
Order Stud_No Height Order Stud_No Height Order Stud_No Height Order Stud_No Height
1 61 57.52 33 68 65.40 65 106 67.26 97 10 69.78
2 43 60.73 34 53 65.45 66 100 67.35 98 66 69.78
3 26 61.63 35 87 65.47 67 28 67.54 99 86 69.87
4 7 61.84 36 6 65.58 68 59 67.58 100 103 69.96
5 20 62.10 37 67 65.64 69 83 67.58 101 17 70.06
6 71 62.49 38 5 65.68 70 37 67.69 102 123 70.08
7 46 62.60 39 40 65.70 71 91 67.71 103 108 70.48
8 76 62.69 40 77 65.70 72 2 67.74 104 112 70.48
9 73 63.01 41 15 65.72 73 72 67.77 105 95 70.72
10 80 63.06 42 33 65.75 74 39 67.80 106 122 70.75
11 120 63.18 43 25 65.81 75 105 67.80 107 90 70.80
12 97 63.38 44 8 65.84 76 52 67.83 108 118 70.89
13 31 63.58 45 45 65.89 77 119 67.90 109 21 70.97
14 107 63.62 46 111 65.92 78 41 68.08 110 23 71.06
15 126 63.70 47 110 66.08 79 84 68.11 111 36 71.09
16 121 63.95 48 48 66.12 80 113 68.30 112 96 71.14
17 94 64.29 49 85 66.31 81 79 68.36 113 49 71.24
18 81 64.40 50 55 66.32 82 117 68.36 114 101 71.36
19 50 64.45 51 44 66.45 83 75 68.43 115 114 71.40
20 58 64.46 52 30 66.59 84 4 68.56 116 57 71.40
21 42 64.58 53 11 66.61 85 35 68.92 117 115 71.41
22 88 64.67 54 27 66.65 86 102 68.96 118 18 71.62
23 14 64.69 55 3 66.72 87 109 68.98 119 124 71.80
24 32 64.71 56 51 66.77 88 13 69.00 120 78 72.02
25 34 64.71 57 104 66.79 89 22 69.10 121 19 72.20
26 63 64.80 58 89 66.80 90 99 69.10 122 47 72.55
27 70 64.85 59 1 67.04 91 98 69.27 123 69 72.70
28 125 64.85 60 38 67.05 92 82 69.30 124 54 72.72
29 12 64.86 61 74 67.10 93 16 69.31 125 116 72.90
30 9 64.88 62 65 67.11 94 29 69.38 126 93 74.27
31 62 64.96 63 60 67.13 95 64 69.64
32 56 65.32 64 92 67.20 96 24 69.66
mean = sum/n = 8480.77/126 = 67.31
median = (67.13 + 67.20)/2 = 67.165 (the mean of the two middle values if n is even)
range from 57.52 to 74.27, or range = 74.27 − 57.52 = 16.75
Descriptive Statistics for metric data

Class     Midpoint  Abs. Freq.  Rel. Freq.  Cumulative Freq.
57 - 58   57.5       1          0.0079      0.0079
58 - 59   58.5       0          0.0000      0.0079
59 - 60   59.5       0          0.0000      0.0079
60 - 61   60.5       1          0.0079      0.0159
61 - 62   61.5       2          0.0159      0.0317
62 - 63   62.5       4          0.0317      0.0635
63 - 64   63.5       8          0.0635      0.1270
64 - 65   64.5      15          0.1190      0.2460
65 - 66   65.5      15          0.1190      0.3651
66 - 67   66.5      12          0.0952      0.4603
67 - 68   67.5      19          0.1508      0.6111
68 - 69   68.5      10          0.0794      0.6905
69 - 70   69.5      13          0.1032      0.7937
70 - 71   70.5       9          0.0714      0.8651
71 - 72   71.5      10          0.0794      0.9444
72 - 73   72.5       6          0.0476      0.9921
73 - 74   73.5       0          0.0000      0.9921
74 - 75   74.5       1          0.0079      1.0000

Histogram
1. Categorize the data (usually in equally sized classes; each value must fall into exactly one class, e.g. 58 ≤ x < 59)
2. Count (tally) how many values fall into each class => absolute frequency
3. Divide the absolute frequency by the total number of observations => relative frequency
4. Draw a bar chart without spaces between the bars

The relative frequency can be interpreted as the probability that a randomly drawn observation falls into a certain class, given that specific sample distribution.
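The four steps can be sketched in Python (standard library only). The heights are the 126 values from the ordered list; the classes are half-open (lower ≤ x < upper), start at 57 inches and are 1 inch wide. `frequency_table` is a hypothetical helper name.

```python
import math

# Heights (inch) of the 126 students, taken from the ordered list
heights = [
    57.52, 60.73, 61.63, 61.84, 62.10, 62.49, 62.60, 62.69, 63.01, 63.06,
    63.18, 63.38, 63.58, 63.62, 63.70, 63.95, 64.29, 64.40, 64.45, 64.46,
    64.58, 64.67, 64.69, 64.71, 64.71, 64.80, 64.85, 64.85, 64.86, 64.88,
    64.96, 65.32, 65.40, 65.45, 65.47, 65.58, 65.64, 65.68, 65.70, 65.70,
    65.72, 65.75, 65.81, 65.84, 65.89, 65.92, 66.08, 66.12, 66.31, 66.32,
    66.45, 66.59, 66.61, 66.65, 66.72, 66.77, 66.79, 66.80, 67.04, 67.05,
    67.10, 67.11, 67.13, 67.20, 67.26, 67.35, 67.54, 67.58, 67.58, 67.69,
    67.71, 67.74, 67.77, 67.80, 67.80, 67.83, 67.90, 68.08, 68.11, 68.30,
    68.36, 68.36, 68.43, 68.56, 68.92, 68.96, 68.98, 69.00, 69.10, 69.10,
    69.27, 69.30, 69.31, 69.38, 69.64, 69.66, 69.78, 69.78, 69.87, 69.96,
    70.06, 70.08, 70.48, 70.48, 70.72, 70.75, 70.80, 70.89, 70.97, 71.06,
    71.09, 71.14, 71.24, 71.36, 71.40, 71.40, 71.41, 71.62, 71.80, 72.02,
    72.20, 72.55, 72.70, 72.72, 72.90, 74.27,
]

def frequency_table(values, start, width, n_classes):
    """Rows (lower, upper, midpoint, abs, rel, cumulative) for classes lower <= x < upper."""
    counts = [0] * n_classes
    for x in values:                                  # step 2: count per class
        idx = math.floor((x - start) / width)
        if not 0 <= idx < n_classes:
            raise ValueError(f"{x} lies outside the classes")
        counts[idx] += 1
    n = len(values)
    rows, cumulative = [], 0
    for i, count in enumerate(counts):
        lower = start + i * width
        cumulative += count
        # step 3: relative frequency = absolute frequency / n
        rows.append((lower, lower + width, lower + width / 2, count, count / n, cumulative / n))
    return rows

for lower, upper, mid, abs_f, rel_f, cum_f in frequency_table(heights, 57, 1, 18):
    print(f"{lower}-{upper}  {mid:5.1f}  {abs_f:3d}  {rel_f:.4f}  {cum_f:.4f}")
```

Step 4, drawing the bars, is left out here so that the sketch needs no plotting library.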

Displaying empirical univariate distributions:
Histogram -> Cumulative Distribution

The cumulative distribution can be displayed even without classifying the data first
=> empirical cumulative distribution function, R: ecdf()
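R's ecdf() returns a step function. A minimal Python equivalent (a sketch, not R's implementation) counts the proportion of observations less than or equal to x; the data are the maize heights (cm) from the following example.

```python
from bisect import bisect_right

def ecdf(values):
    """Return F(x) = proportion of observations <= x (empirical cumulative distribution)."""
    data = sorted(values)
    n = len(data)
    return lambda x: bisect_right(data, x) / n   # number of values <= x, divided by n

heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]  # maize heights (cm)
F = ecdf(heights)
for x in (135, 160, 167, 175, 194):
    print(f"F({x}) = {F(x):.4f}")
```

Because no classes are formed, F jumps by 1/n at every observed value (by 2/n at the tied value 175).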

Statistical Key Numbers, Quantiles
To describe the distribution with a few numbers:
given the cumulative frequency, the proportion of the data up to a certain value can easily be read from the graph,
e.g. about 37% of the students (46 of 126) are shorter than 66 inches.
The t% quantile Qt is the value below which t% of the data lie.
Both measures of central tendency and measures of dispersion can be derived from quantiles.

The Q50 quantile cuts the distribution into two halves; it is also called the median (for an odd sample size the middle value, for an even sample size the average of the two nearest neighbours).

Displaying empirical univariate distributions:
Histogram -> Cumulative Distribution
Example: Height of 16 randomly selected maize plants

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
Class      Midpoint  Abs. Freq.  Rel. Freq.  Cumulative Freq.
130-<140   135       1           0.0625      0.0625
140-<150   145       0           0.0000      0.0625
150-<160   155       2           0.1250      0.1875
160-<170   165       5           0.3125      0.5000
170-<180   175       6           0.3750      0.8750
180-<190   185       1           0.0625      0.9375
190-<200   195       1           0.0625      1.0000

[Figure: bar charts of the absolute frequency, the relative frequency and the cumulative frequency per height class (class midpoints 135 to 195).]
Example Quantiles (also called Percentiles)
Given a sample of n = 16 maize plants:
175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162

Ordered list:
Order j  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
xj       136 154 157 162 163 164 165 167 172 175 175 176 177 179 186 194

Quartiles:
Q0 = min = 136
Q1 = Q25 = (x4 + x5)/2 = (162 + 163)/2 = 162.5
Q2 = Q50 = median = (x8 + x9)/2 = (167 + 172)/2 = 169.5 => measure of central tendency
Q3 = Q75 = (x12 + x13)/2 = (176 + 177)/2 = 176.5
Q4 = Q100 = max = 194

Measures of dispersion:
range = Q4 − Q0 = max − min = 194 − 136 = 58
interquartile range ("middle fifty"): IQR = Q3 − Q1 = 176.5 − 162.5 = 14
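The hand calculation above can be sketched as a small function. It assumes the rule used on this slide: if n·t is a whole number, average the two neighbouring ordered values, otherwise take the next ordered value up. As far as we know this matches R's quantile(x, type = 2); R's default (type 7) interpolates linearly and gives slightly different values (e.g. Q1 = 162.75 here). `quantile_avg` is a hypothetical name.

```python
import math

def quantile_avg(sorted_x, t):
    """t-quantile (0 <= t <= 1) of sorted data: if n*t is a whole number, average
    the two neighbouring ordered values, otherwise take the next one up."""
    n = len(sorted_x)
    if t <= 0:
        return sorted_x[0]
    if t >= 1:
        return sorted_x[-1]
    pos = n * t
    k = round(pos)
    if math.isclose(pos, k):
        return (sorted_x[k - 1] + sorted_x[k]) / 2   # x_k and x_(k+1) in 1-based terms
    return sorted_x[math.ceil(pos) - 1]

heights = sorted([175, 172, 179, 167, 163, 154, 136, 164,
                  157, 177, 186, 165, 175, 194, 176, 162])
q = {f"Q{i}": quantile_avg(heights, i / 4) for i in range(5)}
print(q)
print("range =", q["Q4"] - q["Q0"], " IQR =", q["Q3"] - q["Q1"])
```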
Displaying empirical univariate
distributions: Quantile Plot, Box Plot
For each order number j the corresponding t% quantile can be determined:
Order j                1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16
t% = (j − 0.5)·100%/n  3.1  9.4  16   22   28   34   41   47   53   59   66   72   78   84   91   97
xj                     136  154  157  162  163  164  165  167  172  175  175  176  177  179  186  194

• Quantile plot: scatter plot of x (height) against the respective quantiles, or vice versa => corresponds to the cumulative frequency, but the data are not categorized
• Box-whisker plot: displays the quartiles (Q0 to Q4) of the sample distribution
[Figure: left, quantile plot of height [cm] (130 to 200) against quantile [%] (0 to 100); right, box plot of height [cm] with Q0 to Q4 marked.]
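The plotting positions in the table can be generated directly; the pairs (t%, xj) are the points of the quantile plot. A minimal sketch (plotting itself is left out to stay within the standard library):

```python
heights = sorted([175, 172, 179, 167, 163, 154, 136, 164,
                  157, 177, 186, 165, 175, 194, 176, 162])
n = len(heights)

# Plotting position of order number j (1-based): t% = (j - 0.5) * 100% / n
quantile_points = [((j - 0.5) * 100 / n, x) for j, x in enumerate(heights, start=1)]
for t, x in quantile_points:
    print(f"{t:5.1f} %  ->  {x} cm")
```

Subtracting 0.5 keeps the smallest and largest values away from 0% and 100%.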

Properties of mean and median

Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
Mean 168.9 Median 169.5
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 300 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
Mean 176.7 Median 169.5

The mean is sensitive to outliers.
The median is robust against outliers.
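A quick check of the two tables with Python's statistics module; as in the second table, the first value (175) is replaced by the outlier 300:

```python
import statistics

clean = [175, 172, 179, 167, 163, 154, 136, 164,
         157, 177, 186, 165, 175, 194, 176, 162]
with_outlier = [300] + clean[1:]   # first value replaced by 300

for label, data in (("original", clean), ("with outlier", with_outlier)):
    print(f"{label:>13}: mean = {statistics.mean(data):.1f}, "
          f"median = {statistics.median(data):.1f}")
```

The outlier shifts the mean by almost 8 cm, while the median only depends on the two middle values and stays at 169.5.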

Key Numbers describing Dispersion
for Metric Data
- Variance σ², S²
- Standard deviation σ, S
- Standard error SE
- Coefficient of variation cv
- (Kurtosis and Skewness)
- Confidence interval (details later)

Dispersion for Metric Data

Sum of Squares
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
mean 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9
xi-mean 6.125 3.125 10.13 -1.88 -5.88 -14.9 -32.9 -4.88 -11.9 8.125 17.13 -3.88 6.125 25.13 7.125 -6.88
(xi-mean)^2 37.52 9.766 102.5 3.516 34.52 221.3 1081 23.77 141 66.02 293.3 15.02 37.52 631.3 50.77 47.27
SS 2796
[Figure: plant height x [cm] (130 to 200) against observation no. i (1 to 16), with the mean as a horizontal line; the absolute deviations from the mean are the residuals.]

Different ways to compute SS (displacement law):
$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
xi^2 30625 29584 32041 27889 26569 23716 18496 26896 24649 31329 34596 27225 30625 37636 30976 26244
Σxi 2702
(Σxi)^2 7300804
Σ(xi^2) 459096
SS 2796
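Both ways of computing SS can be compared in a few lines of Python. A design note: the shortcut subtracts two large, nearly equal numbers, so with large values or many observations it can lose floating-point precision; the definition via deviations from the mean is numerically safer.

```python
heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]
n = len(heights)
mean = sum(heights) / n

# Definition: sum of squared deviations from the mean
ss_definition = sum((x - mean) ** 2 for x in heights)
# Displacement law: sum of squares minus (sum)^2 / n
ss_shortcut = sum(x * x for x in heights) - sum(heights) ** 2 / n

print(ss_definition, ss_shortcut)   # the slide rounds this to 2796
```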
Variance=Mean Squared Deviation
• Since SS increases with n, it is not yet a good measure of dispersion
=> divide by n or n − 1, respectively:

• Population: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 = \frac{SS}{n} = MS$
• Sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{SS}{df} = MS$
xi = value of the ith observation
x̄ = sample mean, µ = population mean
n = number of observations (sample size)
df = degrees of freedom, df = n − 1: the number of independent values that are free to vary once the mean has been estimated
SS = sum of squares

e.g. $MS = \text{Variance} = \frac{2796}{15} = 186.4$

• The unit is the squared unit of the measured data (here cm²)
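Python's statistics module offers both versions: pvariance() divides by n (population) and variance() divides by n − 1 (sample), like R's var(). A minimal check against SS/n and SS/(n − 1):

```python
import statistics

heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]
n = len(heights)
mean = sum(heights) / n
ss = sum((x - mean) ** 2 for x in heights)

print("SS/n     =", round(ss / n, 2), "  pvariance:", round(statistics.pvariance(heights), 2))
print("SS/(n-1) =", round(ss / (n - 1), 2), "  variance: ", round(statistics.variance(heights), 2))
```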


Standard deviation S,
Standard Error of Mean SE
$s = \sqrt{s^2}$, e.g. $s = \sqrt{186.4} = 13.65$
It has again the unit of the measured variable.
Unlike SS, the variance and the standard deviation do not grow systematically with the sample size,
e.g. doubling the data set:
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
xi^2 30625 29584 32041 27889 26569 23716 18496 26896 24649 31329 34596 27225 30625 37636 30976 26244
i 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
xi^2 30625 29584 32041 27889 26569 23716 18496 26896 24649 31329 34596 27225 30625 37636 30976 26244
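Σxi 5404
(Σxi)^2 29203216
Σ(xi^2) 918192
SS 5591.5, df = 31
MS = 5591.5 / 31 = 180.4, S = √180.4 = 13.43

=> S hardly changes (13.65 vs. 13.43), but the mean of the larger data set is estimated more precisely.
=> Standard error: the standard deviation of the sample mean
$SE = \frac{S}{\sqrt{n}}$
e.g. single data set: $13.65/\sqrt{16} = 3.41$
doubled data set: $13.43/\sqrt{32} = 2.37$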
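The effect of doubling the data set can be checked directly. statistics.stdev() uses n − 1 in the denominator; the standard error is computed by hand, since the statistics module has no dedicated function for it.

```python
import math
import statistics

heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]
doubled = heights * 2  # the same 16 values listed twice (n = 32)

for label, data in (("single", heights), ("doubled", doubled)):
    s = statistics.stdev(data)            # sample standard deviation (n - 1)
    se = s / math.sqrt(len(data))         # standard error of the mean
    print(f"{label:>8}: n = {len(data)}, s = {s:.2f}, SE = {se:.2f}")
```

The standard deviation stays roughly the same, while the standard error shrinks with the square root of n.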

Coefficient of variation cv

cv = S / x̄
=> relative standard deviation
only meaningful for ratio-scaled data
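A minimal sketch of the coefficient of variation for the maize heights; `coefficient_of_variation` is a hypothetical helper name. Because S and x̄ carry the same unit, cv is unit-free, so converting cm to mm leaves it unchanged.

```python
import statistics

heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]

def coefficient_of_variation(values):
    """cv = s / mean; only meaningful for ratio-scaled data with a positive mean."""
    return statistics.stdev(values) / statistics.mean(values)

cv = coefficient_of_variation(heights)
print(f"cv = {cv:.4f} = {cv:.1%}")
print("in mm:", round(coefficient_of_variation([10 * x for x in heights]), 4))
```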

Properties of Variance based
Measures
• All variance-based measures of dispersion are sensitive to outliers.
• The range is also sensitive to outliers.
• More robust measures of dispersion are the IQR or the median absolute deviation (mad, rarely used).
Advantage of variance-based measures:
• more information is used (all measurements)
=> comparison with theoretical distributions
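The robustness of the IQR and the median absolute deviation can be compared with the standard deviation and the range, using the maize data with and without the outlier 300. Two assumptions in this sketch: the IQR uses statistics.quantiles(method="inclusive"), which interpolates linearly (like R's default) and therefore differs slightly from the hand-calculated IQR of 14; and `mad` here is unscaled, whereas R's mad() multiplies by 1.4826 by default.

```python
import statistics

def mad(values):
    """Median of the absolute deviations from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median([abs(x - med) for x in values])

def iqr(values):
    """Q3 - Q1 with linear interpolation between ordered values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [175, 172, 179, 167, 163, 154, 136, 164,
         157, 177, 186, 165, 175, 194, 176, 162]
with_outlier = [300] + clean[1:]

for label, data in (("original", clean), ("with outlier", with_outlier)):
    print(f"{label:>13}: s = {statistics.stdev(data):6.2f}, "
          f"range = {max(data) - min(data):3d}, IQR = {iqr(data):5.2f}, MAD = {mad(data):4.1f}")
```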

Normal distribution N(µ,σ2)

A living histogram from the Connecticut State Agricultural College (J. Heredity 5:511–518, 1914).
http://jhered.oxfordjournals.org/content/95/5/365.full.pdf

Properties of normal distribution
• Bell shaped
• Unimodal
• Symmetric

Probability density function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
with µ = mean and σ = standard deviation, which describe the distribution completely.
[Figure: probability density function f(x) with µ − σ, µ and µ + σ marked on the x-axis.]
• Highest density at the mean, i.e. the greater the distance from the mean, the fewer values we find
• The mean divides the distribution into two equally sized parts => mean = mode = median
• Inflection points at µ − σ and µ + σ
• Approximately 68% of the values lie between µ − σ and µ + σ
• In contrast to discrete distributions, we do not ask for the probability of getting a certain event or result, but for the probability of getting a value within certain limits (intervals).
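A minimal sketch of the density formula, checked against statistics.NormalDist from the standard library (available since Python 3.8). The parameter values µ = 168.9 and σ = 13.65 are borrowed from the maize example for illustration only; the share of values within µ ± σ is computed as a difference of two areas.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)**2 / (2 * sigma**2))"""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 168.9, 13.65   # illustrative values (maize example)
dist = NormalDist(mu, sigma)

print(normal_pdf(mu, mu, sigma), dist.pdf(mu))   # highest density at the mean
share = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
print(f"share between mu - sigma and mu + sigma: {share:.4f}")
```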

Distributions of discrete vs. continuous variables
discrete variable, x ∈ ℕ                 continuous variable, x ∈ ℝ
probability function                     probability density function
cumulative probability function          cumulative distribution function
e.g. binomial and Poisson distribution   e.g. normal distribution

Family of normal distributions
[Figure: probability density functions f(x) of N(0, 1), N(2, 0.5) and N(2, 2) for x from −4 to 8.]
There are infinitely many normal distributions, each characterized by its mean µ and standard deviation σ, written here as N(µ, σ).
The area under any N(µ, σ) is 1.
N(0, 1) is called the standard normal distribution.

Normal distribution
[Figure: probability density function f(x) and distribution function F(x); the values F = 0.50 and F = 0.977 are marked.]
Distribution function: $F(x) = \int_{-\infty}^{x} f(t)\,dt$, the integral of the probability density function.
There is no closed-form antiderivative => areas under the normal distribution are tabulated.

The cumulative distribution function (or just distribution function) of the normal distribution gives the area under the curve below a certain value of x.
E.g. for N(2, 0.5) we find 50% of the values below x = 2; for N(0, 1) we find 97.7% of the values below x = 2.
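Since F(x) is the area under f up to x, it can be approximated numerically and compared with NormalDist.cdf() (R: pnorm(q, mean, sd)). A sketch using the midpoint rule; starting the integration 10 standard deviations below the mean and using 100 000 steps are assumptions chosen so that the error is negligible.

```python
from statistics import NormalDist

def cdf_by_integration(x, mu, sigma, steps=100_000):
    """Approximate F(x), the integral of f from -infinity to x, with the midpoint rule.
    Integration starts 10 standard deviations below the mean; the area below that is negligible."""
    f = NormalDist(mu, sigma).pdf
    lower = mu - 10 * sigma
    h = (x - lower) / steps
    return h * sum(f(lower + (i + 0.5) * h) for i in range(steps))

# NormalDist(mu, sigma): the second argument is the standard deviation
for mu, sigma in ((2, 0.5), (0, 1)):
    exact = NormalDist(mu, sigma).cdf(2)
    approx = cdf_by_integration(2, mu, sigma)
    print(f"N({mu}, {sigma}): F(2) = {exact:.4f} (numerical integration: {approx:.4f})")
```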

Working with the cumulative
distribution function
Assume that the yearly milk yield of cows in Lower Saxony is normally distributed with a mean of 6000 kg and a standard deviation of 1000 kg (N(6000, 1000)).
If you randomly sample one cow from the population, you can answer the question: what is the probability that the cow has a yield of 5000 kg or less?

Wanted: the probability of X ≤ 5000, given µ = 6000 and σ = 1000
P(X ≤ 5000 | 6000, 1000) = 0.1587 = 15.87%
[Figure: probability density function f(x) and distribution function F(x) of the milk yield (kg/year).]
Working with the cumulative
distribution function
What is the probability that a randomly selected cow has a yield larger than 8000 kg?
Wanted: the probability of X > 8000, given µ = 6000 and σ = 1000: P(X > 8000 | 6000, 1000)
-> The (cumulative) distribution function gives the area under the probability density function to the left of the limit of X.
But now we are interested in the part to the right of X. Since the total area under the probability density function equals 1, the area of interest is
P(X > 8000 | 6000, 1000) = 1 − P(X ≤ 8000 | 6000, 1000) = 1 − 0.9772 = 0.0228 = 2.28%
[Figure: probability density function f(x) and distribution function F(x) of the milk yield (kg/year).]
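In code, the upper tail is the complement of the distribution function, and the probability of an interval is a difference of two distribution-function values. The interval 5000 to 8000 kg is an added illustration, not part of the slide.

```python
from statistics import NormalDist

milk = NormalDist(mu=6000, sigma=1000)
p_upper = 1 - milk.cdf(8000)                  # P(X > 8000) = 1 - P(X <= 8000)
p_between = milk.cdf(8000) - milk.cdf(5000)   # P(5000 < X <= 8000)
print(f"P(X > 8000) = {p_upper:.4f} = {p_upper:.2%}")
print(f"P(5000 < X <= 8000) = {p_between:.4f}")
```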
z-Transformation

Since there are infinitely many normal distributions N(µ, σ), their distribution functions cannot all be tabulated, but
any normal distribution N(µ, σ) can be transformed into the standard normal distribution N(0, 1), with a mean of 0 and a standard deviation of 1:
$z = \frac{x - \mu}{\sigma}$ (z-transformation)
The distribution function of the standard normal distribution N(0, 1) is tabulated.
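A minimal sketch showing that the z-transformation followed by the standard normal distribution function gives the same result as evaluating N(6000, 1000) directly; `z_score` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """z-transformation: distance from the mean in units of the standard deviation."""
    return (x - mu) / sigma

standard = NormalDist()  # N(0, 1)
for x in (5000, 8000):
    z = z_score(x, 6000, 1000)
    print(f"x = {x}: z = {z:+.2f}, F(z) = {standard.cdf(z):.4f}, "
          f"direct = {NormalDist(6000, 1000).cdf(x):.4f}")
```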

z-transformation
Example milk yield: N(6000, 1000)
subtract µ => N(0, 1000)
divide by σ => N(0, 1)

Standard normal distribution
Probability density function f(z) and cumulative distribution function F(z) of the standard normal distribution N(0, 1):
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$ (the general density $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ with µ = 0 and σ = 1)
[Figure: f(z) and F(z) for z from −4 to 4.]

• Because of the symmetric form of the normal distribution, negative values of z are not shown in the table. The simple relation F(−z) = 1 − F(z) can be used instead.
• Two different questions can be answered:
1. Which z-value is related to a distinct area?
2. Which area belongs to a distinct z-value?

P(z) and F(z), respectively: left-sided area under the standard normal distribution below the limit z.
Rows: integer and first decimal place of z; columns: second decimal place of z.

z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

P(z) gives the shaded area, e.g. the z-value 1.23 cuts off an area of 0.8907.
Because the normal distribution curve is symmetric, probabilities are given only for positive values of z. Probabilities for negative values of z can be computed by
P(−z) = 1 − P(z), e.g. P(Z ≤ −2) = 1 − P(Z ≤ 2) = 1 − 0.9772 = 0.0228

Using the z-transformation, the corresponding z-value can be calculated for each x ~ N(µ; σ):
z = (x − µ) / σ
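Both table questions can be answered without the table: cdf() gives the area for a z-value (R: pnorm()), and inv_cdf(), the inverse of the distribution function, gives the z-value for an area (R: qnorm()). A minimal sketch:

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Question 2: which area belongs to a given z-value?
print(round(std_normal.cdf(1.23), 4))          # table: 0.8907
# Symmetry: F(-z) = 1 - F(z)
print(round(std_normal.cdf(-2), 4), round(1 - std_normal.cdf(2), 4))
# Question 1: which z-value belongs to a given area?
print(round(std_normal.inv_cdf(0.8907), 2))    # approx. 1.23
print(round(std_normal.inv_cdf(0.975), 2))     # 1.96
```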
z P z P z P z P z P z P z P z P z P z P z P z P z P z P z P z P
-4.00 0.0000 -3.50 0.0002 -3.00 0.0013 -2.50 0.0062 -2.00 0.0228 -1.50 0.0668 -1.00 0.1587 -0.50 0.3085 0.00 0.5000 0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987 3.50 0.9998
-3.99 0.0000 -3.49 0.0002 -2.99 0.0014 -2.49 0.0064 -1.99 0.0233 -1.49 0.0681 -0.99 0.1611 -0.49 0.3121 0.01 0.5040 0.51 0.6950 1.01 0.8438 1.51 0.9345 2.01 0.9778 2.51 0.9940 3.01 0.9987 3.51 0.9998
-3.98 0.0000 -3.48 0.0003 -2.98 0.0014 -2.48 0.0066 -1.98 0.0239 -1.48 0.0694 -0.98 0.1635 -0.48 0.3156 0.02 0.5080 0.52 0.6985 1.02 0.8461 1.52 0.9357 2.02 0.9783 2.52 0.9941 3.02 0.9987 3.52 0.9998
-3.97 0.0000 -3.47 0.0003 -2.97 0.0015 -2.47 0.0068 -1.97 0.0244 -1.47 0.0708 -0.97 0.1660 -0.47 0.3192 0.03 0.5120 0.53 0.7019 1.03 0.8485 1.53 0.9370 2.03 0.9788 2.53 0.9943 3.03 0.9988 3.53 0.9998
-3.96 0.0000 -3.46 0.0003 -2.96 0.0015 -2.46 0.0069 -1.96 0.0250 -1.46 0.0721 -0.96 0.1685 -0.46 0.3228 0.04 0.5160 0.54 0.7054 1.04 0.8508 1.54 0.9382 2.04 0.9793 2.54 0.9945 3.04 0.9988 3.54 0.9998
-3.95 0.0000 -3.45 0.0003 -2.95 0.0016 -2.45 0.0071 -1.95 0.0256 -1.45 0.0735 -0.95 0.1711 -0.45 0.3264 0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.9798 2.55 0.9946 3.05 0.9989 3.55 0.9998
-3.94 0.0000 -3.44 0.0003 -2.94 0.0016 -2.44 0.0073 -1.94 0.0262 -1.44 0.0749 -0.94 0.1736 -0.44 0.3300 0.06 0.5239 0.56 0.7123 1.06 0.8554 1.56 0.9406 2.06 0.9803 2.56 0.9948 3.06 0.9989 3.56 0.9998
-3.93 0.0000 -3.43 0.0003 -2.93 0.0017 -2.43 0.0075 -1.93 0.0268 -1.43 0.0764 -0.93 0.1762 -0.43 0.3336 0.07 0.5279 0.57 0.7157 1.07 0.8577 1.57 0.9418 2.07 0.9808 2.57 0.9949 3.07 0.9989 3.57 0.9998
-3.92 0.0000 -3.42 0.0003 -2.92 0.0018 -2.42 0.0078 -1.92 0.0274 -1.42 0.0778 -0.92 0.1788 -0.42 0.3372 0.08 0.5319 0.58 0.7190 1.08 0.8599 1.58 0.9429 2.08 0.9812 2.58 0.9951 3.08 0.9990 3.58 0.9998
-3.91 0.0000 -3.41 0.0003 -2.91 0.0018 -2.41 0.0080 -1.91 0.0281 -1.41 0.0793 -0.91 0.1814 -0.41 0.3409 0.09 0.5359 0.59 0.7224 1.09 0.8621 1.59 0.9441 2.09 0.9817 2.59 0.9952 3.09 0.9990 3.59 0.9998
-3.90 0.0000 -3.40 0.0003 -2.90 0.0019 -2.40 0.0082 -1.90 0.0287 -1.40 0.0808 -0.90 0.1841 -0.40 0.3446 0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953 3.10 0.9990 3.60 0.9998
-3.89 0.0001 -3.39 0.0003 -2.89 0.0019 -2.39 0.0084 -1.89 0.0294 -1.39 0.0823 -0.89 0.1867 -0.39 0.3483 0.11 0.5438 0.61 0.7291 1.11 0.8665 1.61 0.9463 2.11 0.9826 2.61 0.9955 3.11 0.9991 3.61 0.9998
-3.88 0.0001 -3.38 0.0004 -2.88 0.0020 -2.38 0.0087 -1.88 0.0301 -1.38 0.0838 -0.88 0.1894 -0.38 0.3520 0.12 0.5478 0.62 0.7324 1.12 0.8686 1.62 0.9474 2.12 0.9830 2.62 0.9956 3.12 0.9991 3.62 0.9999
-3.87 0.0001 -3.37 0.0004 -2.87 0.0021 -2.37 0.0089 -1.87 0.0307 -1.37 0.0853 -0.87 0.1922 -0.37 0.3557 0.13 0.5517 0.63 0.7357 1.13 0.8708 1.63 0.9484 2.13 0.9834 2.63 0.9957 3.13 0.9991 3.63 0.9999
-3.86 0.0001 -3.36 0.0004 -2.86 0.0021 -2.36 0.0091 -1.86 0.0314 -1.36 0.0869 -0.86 0.1949 -0.36 0.3594 0.14 0.5557 0.64 0.7389 1.14 0.8729 1.64 0.9495 2.14 0.9838 2.64 0.9959 3.14 0.9992 3.64 0.9999
-3.85 0.0001 -3.35 0.0004 -2.85 0.0022 -2.35 0.0094 -1.85 0.0322 -1.35 0.0885 -0.85 0.1977 -0.35 0.3632 0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960 3.15 0.9992 3.65 0.9999
-3.84 0.0001 -3.34 0.0004 -2.84 0.0023 -2.34 0.0096 -1.84 0.0329 -1.34 0.0901 -0.84 0.2005 -0.34 0.3669 0.16 0.5636 0.66 0.7454 1.16 0.8770 1.66 0.9515 2.16 0.9846 2.66 0.9961 3.16 0.9992 3.66 0.9999
-3.83 0.0001 -3.33 0.0004 -2.83 0.0023 -2.33 0.0099 -1.83 0.0336 -1.33 0.0918 -0.83 0.2033 -0.33 0.3707 0.17 0.5675 0.67 0.7486 1.17 0.8790 1.67 0.9525 2.17 0.9850 2.67 0.9962 3.17 0.9992 3.67 0.9999
-3.82 0.0001 -3.32 0.0005 -2.82 0.0024 -2.32 0.0102 -1.82 0.0344 -1.32 0.0934 -0.82 0.2061 -0.32 0.3745 0.18 0.5714 0.68 0.7517 1.18 0.8810 1.68 0.9535 2.18 0.9854 2.68 0.9963 3.18 0.9993 3.68 0.9999
-3.81 0.0001 -3.31 0.0005 -2.81 0.0025 -2.31 0.0104 -1.81 0.0351 -1.31 0.0951 -0.81 0.2090 -0.31 0.3783 0.19 0.5753 0.69 0.7549 1.19 0.8830 1.69 0.9545 2.19 0.9857 2.69 0.9964 3.19 0.9993 3.69 0.9999
-3.80 0.0001 -3.30 0.0005 -2.80 0.0026 -2.30 0.0107 -1.80 0.0359 -1.30 0.0968 -0.80 0.2119 -0.30 0.3821 0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965 3.20 0.9993 3.70 0.9999
-3.79 0.0001 -3.29 0.0005 -2.79 0.0026 -2.29 0.0110 -1.79 0.0367 -1.29 0.0985 -0.79 0.2148 -0.29 0.3859 0.21 0.5832 0.71 0.7611 1.21 0.8869 1.71 0.9564 2.21 0.9864 2.71 0.9966 3.21 0.9993 3.71 0.9999
-3.78 0.0001 -3.28 0.0005 -2.78 0.0027 -2.28 0.0113 -1.78 0.0375 -1.28 0.1003 -0.78 0.2177 -0.28 0.3897 0.22 0.5871 0.72 0.7642 1.22 0.8888 1.72 0.9573 2.22 0.9868 2.72 0.9967 3.22 0.9994 3.72 0.9999
-3.77 0.0001 -3.27 0.0005 -2.77 0.0028 -2.27 0.0116 -1.77 0.0384 -1.27 0.1020 -0.77 0.2206 -0.27 0.3936 0.23 0.5910 0.73 0.7673 1.23 0.8907 1.73 0.9582 2.23 0.9871 2.73 0.9968 3.23 0.9994 3.73 0.9999
-3.76 0.0001 -3.26 0.0006 -2.76 0.0029 -2.26 0.0119 -1.76 0.0392 -1.26 0.1038 -0.76 0.2236 -0.26 0.3974 0.24 0.5948 0.74 0.7704 1.24 0.8925 1.74 0.9591 2.24 0.9875 2.74 0.9969 3.24 0.9994 3.74 0.9999
-3.75 0.0001 -3.25 0.0006 -2.75 0.0030 -2.25 0.0122 -1.75 0.0401 -1.25 0.1056 -0.75 0.2266 -0.25 0.4013 0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970 3.25 0.9994 3.75 0.9999
-3.74 0.0001 -3.24 0.0006 -2.74 0.0031 -2.24 0.0125 -1.74 0.0409 -1.24 0.1075 -0.74 0.2296 -0.24 0.4052 0.26 0.6026 0.76 0.7764 1.26 0.8962 1.76 0.9608 2.26 0.9881 2.76 0.9971 3.26 0.9994 3.76 0.9999
-3.73 0.0001 -3.23 0.0006 -2.73 0.0032 -2.23 0.0129 -1.73 0.0418 -1.23 0.1093 -0.73 0.2327 -0.23 0.4090 0.27 0.6064 0.77 0.7794 1.27 0.8980 1.77 0.9616 2.27 0.9884 2.77 0.9972 3.27 0.9995 3.77 0.9999
-3.72 0.0001 -3.22 0.0006 -2.72 0.0033 -2.22 0.0132 -1.72 0.0427 -1.22 0.1112 -0.72 0.2358 -0.22 0.4129 0.28 0.6103 0.78 0.7823 1.28 0.8997 1.78 0.9625 2.28 0.9887 2.78 0.9973 3.28 0.9995 3.78 0.9999
-3.71 0.0001 -3.21 0.0007 -2.71 0.0034 -2.21 0.0136 -1.71 0.0436 -1.21 0.1131 -0.71 0.2389 -0.21 0.4168 0.29 0.6141 0.79 0.7852 1.29 0.9015 1.79 0.9633 2.29 0.9890 2.79 0.9974 3.29 0.9995 3.79 0.9999
-3.70 0.0001 -3.20 0.0007 -2.70 0.0035 -2.20 0.0139 -1.70 0.0446 -1.20 0.1151 -0.70 0.2420 -0.20 0.4207 0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974 3.30 0.9995 3.80 0.9999
-3.69 0.0001 -3.19 0.0007 -2.69 0.0036 -2.19 0.0143 -1.69 0.0455 -1.19 0.1170 -0.69 0.2451 -0.19 0.4247 0.31 0.6217 0.81 0.7910 1.31 0.9049 1.81 0.9649 2.31 0.9896 2.81 0.9975 3.31 0.9995 3.81 0.9999
-3.68 0.0001 -3.18 0.0007 -2.68 0.0037 -2.18 0.0146 -1.68 0.0465 -1.18 0.1190 -0.68 0.2483 -0.18 0.4286 0.32 0.6255 0.82 0.7939 1.32 0.9066 1.82 0.9656 2.32 0.9898 2.82 0.9976 3.32 0.9995 3.82 0.9999
-3.67 0.0001 -3.17 0.0008 -2.67 0.0038 -2.17 0.0150 -1.67 0.0475 -1.17 0.1210 -0.67 0.2514 -0.17 0.4325 0.33 0.6293 0.83 0.7967 1.33 0.9082 1.83 0.9664 2.33 0.9901 2.83 0.9977 3.33 0.9996 3.83 0.9999
-3.66 0.0001 -3.16 0.0008 -2.66 0.0039 -2.16 0.0154 -1.66 0.0485 -1.16 0.1230 -0.66 0.2546 -0.16 0.4364 0.34 0.6331 0.84 0.7995 1.34 0.9099 1.84 0.9671 2.34 0.9904 2.84 0.9977 3.34 0.9996 3.84 0.9999
-3.65 0.0001 -3.15 0.0008 -2.65 0.0040 -2.15 0.0158 -1.65 0.0495 -1.15 0.1251 -0.65 0.2578 -0.15 0.4404 0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978 3.35 0.9996 3.85 0.9999
-3.64 0.0001 -3.14 0.0008 -2.64 0.0041 -2.14 0.0162 -1.64 0.0505 -1.14 0.1271 -0.64 0.2611 -0.14 0.4443 0.36 0.6406 0.86 0.8051 1.36 0.9131 1.86 0.9686 2.36 0.9909 2.86 0.9979 3.36 0.9996 3.86 0.9999
-3.63 0.0001 -3.13 0.0009 -2.63 0.0043 -2.13 0.0166 -1.63 0.0516 -1.13 0.1292 -0.63 0.2643 -0.13 0.4483 0.37 0.6443 0.87 0.8078 1.37 0.9147 1.87 0.9693 2.37 0.9911 2.87 0.9979 3.37 0.9996 3.87 0.9999
-3.62 0.0001 -3.12 0.0009 -2.62 0.0044 -2.12 0.0170 -1.62 0.0526 -1.12 0.1314 -0.62 0.2676 -0.12 0.4522 0.38 0.6480 0.88 0.8106 1.38 0.9162 1.88 0.9699 2.38 0.9913 2.88 0.9980 3.38 0.9996 3.88 0.9999
-3.61 0.0002 -3.11 0.0009 -2.61 0.0045 -2.11 0.0174 -1.61 0.0537 -1.11 0.1335 -0.61 0.2709 -0.11 0.4562 0.39 0.6517 0.89 0.8133 1.39 0.9177 1.89 0.9706 2.39 0.9916 2.89 0.9981 3.39 0.9997 3.89 0.9999
-3.60 0.0002 -3.10 0.0010 -2.60 0.0047 -2.10 0.0179 -1.60 0.0548 -1.10 0.1357 -0.60 0.2743 -0.10 0.4602 0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981 3.40 0.9997 3.90 1.0000
-3.59 0.0002 -3.09 0.0010 -2.59 0.0048 -2.09 0.0183 -1.59 0.0559 -1.09 0.1379 -0.59 0.2776 -0.09 0.4641 0.41 0.6591 0.91 0.8186 1.41 0.9207 1.91 0.9719 2.41 0.9920 2.91 0.9982 3.41 0.9997 3.91 1.0000
-3.58 0.0002 -3.08 0.0010 -2.58 0.0049 -2.08 0.0188 -1.58 0.0571 -1.08 0.1401 -0.58 0.2810 -0.08 0.4681 0.42 0.6628 0.92 0.8212 1.42 0.9222 1.92 0.9726 2.42 0.9922 2.92 0.9982 3.42 0.9997 3.92 1.0000
-3.57 0.0002 -3.07 0.0011 -2.57 0.0051 -2.07 0.0192 -1.57 0.0582 -1.07 0.1423 -0.57 0.2843 -0.07 0.4721 0.43 0.6664 0.93 0.8238 1.43 0.9236 1.93 0.9732 2.43 0.9925 2.93 0.9983 3.43 0.9997 3.93 1.0000
-3.56 0.0002 -3.06 0.0011 -2.56 0.0052 -2.06 0.0197 -1.56 0.0594 -1.06 0.1446 -0.56 0.2877 -0.06 0.4761 0.44 0.6700 0.94 0.8264 1.44 0.9251 1.94 0.9738 2.44 0.9927 2.94 0.9984 3.44 0.9997 3.94 1.0000
-3.55 0.0002 -3.05 0.0011 -2.55 0.0054 -2.05 0.0202 -1.55 0.0606 -1.05 0.1469 -0.55 0.2912 -0.05 0.4801 0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984 3.45 0.9997 3.95 1.0000
-3.54 0.0002 -3.04 0.0012 -2.54 0.0055 -2.04 0.0207 -1.54 0.0618 -1.04 0.1492 -0.54 0.2946 -0.04 0.4840 0.46 0.6772 0.96 0.8315 1.46 0.9279 1.96 0.9750 2.46 0.9931 2.96 0.9985 3.46 0.9997 3.96 1.0000
-3.53 0.0002 -3.03 0.0012 -2.53 0.0057 -2.03 0.0212 -1.53 0.0630 -1.03 0.1515 -0.53 0.2981 -0.03 0.4880 0.47 0.6808 0.97 0.8340 1.47 0.9292 1.97 0.9756 2.47 0.9932 2.97 0.9985 3.47 0.9997 3.97 1.0000
-3.52 0.0002 -3.02 0.0013 -2.52 0.0059 -2.02 0.0217 -1.52 0.0643 -1.02 0.1539 -0.52 0.3015 -0.02 0.4920 0.48 0.6844 0.98 0.8365 1.48 0.9306 1.98 0.9761 2.48 0.9934 2.98 0.9986 3.48 0.9997 3.98 1.0000
-3.51 0.0002 -3.01 0.0013 -2.51 0.0060 -2.01 0.0222 -1.51 0.0655 -1.01 0.1562 -0.51 0.3050 -0.01 0.4960 0.49 0.6879 0.99 0.8389 1.49 0.9319 1.99 0.9767 2.49 0.9936 2.99 0.9986 3.49 0.9998 3.99 1.0000

Working with the z-table

We assume that the milk yield of Lower Saxony cows is normally distributed with a mean of 6000 kg and a standard deviation of 1000 kg, N(6000, 1000). We want to find the probability that X is larger than 8000 kg, given µ = 6000 and σ = 1000: P(X > 8000 | µ = 6000, σ = 1000).
Procedure:
- First, we are interested in the area under the normal distribution N(6000, 1000) to the right of (i.e. above) the limit X = 8000.
- We insert the values µ = 6000, σ = 1000 and X = 8000 into the formula for the z-transformation, z = (X - µ)/σ =>
z = (8000 - 6000)/1000 = 2.00.
- Interpretation: the limit lies 2 standard deviations above the mean.
- P(Z ≤ 2.00) = 0.9772 is taken from the table.
- 97.72 % of the animals thus have a milk yield lower than 8000 kg.
- The remainder to 100 % has a milk yield > 8000 kg:
P(X > 8000 | µ = 6000, σ = 1000) = 1 - 0.9772 = 0.0228 = 2.28 %
If we randomly sample 500 cows from the Lower Saxony population, we can expect this sample to contain about 500 × 2.28 % = 11.4 cows with a milk yield above 8000 kg.
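The same calculation can be reproduced in Python as a quick check on the table lookup. This is a minimal sketch using the standard-library class `statistics.NormalDist`, which returns the exact CDF value rather than the rounded 4-digit table value; the variable names are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 6000.0, 1000.0   # mean and standard deviation of milk yield (kg)
limit = 8000.0               # yield limit of interest (kg)
herd_sample = 500            # number of randomly sampled cows

z = (limit - mu) / sigma                      # z-transformation
p_below = NormalDist(mu, sigma).cdf(limit)    # P(X <= 8000): area left of the limit
p_above = 1.0 - p_below                       # P(X > 8000): area right of the limit
expected_cows = herd_sample * p_above         # expected number of cows above the limit

print(f"z = {z:.2f}")                                    # z = 2.00
print(f"P(X <= 8000) = {p_below:.4f}")                   # 0.9772
print(f"P(X > 8000)  = {p_above:.4f}")                   # 0.0228
print(f"expected cows > 8000 kg: {expected_cows:.1f}")   # 11.4
```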
P(z), or F(z): left-sided area under the standard normal distribution below the limit z.
Rows: integer part and first decimal of z; columns: second decimal place of z.
z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
...
1.9   0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0   0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
...
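Each table entry is Φ(z) = P(Z ≤ z) for the standard normal distribution: the row gives the integer part and first decimal of z, the column adds the second decimal. The sketch below uses hypothetical helper functions (not part of the course material) that regenerate table rows from the closed form Φ(z) = ½[1 + erf(z/√2)] with Python's `math` module. Reading z = 1.96, for example, means row 1.9, column 0.06.

```python
import math

def phi(z):
    """Standard normal CDF: Phi(z) = P(Z <= z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_row(row_value):
    # row_value = integer part and first decimal of z (e.g. 1.9);
    # the ten columns add the second decimal place 0.00 ... 0.09.
    return [round(phi(row_value + k / 100), 4) for k in range(10)]

for row in (0.0, 1.9, 2.0):
    print(f"{row:.1f}", " ".join(f"{p:.4f}" for p in table_row(row)))
```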

Particular areas of the normal distribution
Interval      P(µ - kσ ≤ X ≤ µ + kσ)
µ ± 1σ        0.683
µ ± 1.645σ    0.900
µ ± 1.96σ     0.950
µ ± 2σ        0.954
µ ± 2.576σ    0.990
µ ± 3σ        0.997
• For every N(µ, σ), the proportion of the area between any two limits, expressed as multiples of σ, can be obtained from the z-table. Symmetric limits of the form µ ± kσ are of particular importance (e.g. µ ± 1.96σ encloses 95 % of the values).
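The central areas in this table do not depend on µ or σ, because P(µ - kσ ≤ X ≤ µ + kσ) = 2Φ(k) - 1, which simplifies to erf(k/√2). A minimal sketch with Python's `math.erf`; the function name `central_area` is illustrative, and the 99 % interval corresponds to k ≈ 2.576.

```python
import math

def central_area(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal distribution N(mu, sigma).

    Equals 2*Phi(k) - 1, which simplifies to erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2.0))

for k in (1, 1.645, 1.96, 2, 2.576, 3):
    print(f"mu ± {k}σ: {central_area(k):.3f}")
```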



Summary normal distribution

Areas under the normal distribution curve indicate probabilities


Normally distributed data are regarded as a prerequisite for many statistical tests.
Normally distributed data can often be assumed for continuous variables (e.g. yield, milk yield, …) that are influenced by many independent random influences (year (specific weather conditions), site, ...).
For example, milk yield depends on 'random factors' such as technical measurement error, individual, age, lactation, breed, mother, father, rank, stress, feed, weather, …
For example, the height of maize plants can a priori be assumed to be normally distributed because …
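The argument that many small, independent random influences add up to an approximately normal variable can be illustrated by simulation. The sketch below uses a purely hypothetical model: a base milk yield of 6000 kg plus 12 independent effects, each uniformly distributed between -500 and +500 kg, so no single effect is normal. In theory, twelve such effects give a standard deviation of exactly 1000 kg, matching N(6000, 1000) from the z-table example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def simulated_yield(n_factors=12):
    # Hypothetical model: base yield of 6000 kg plus n_factors independent
    # influences, each uniform on [-500, 500] kg (i.e. not normal themselves).
    return 6000 + sum(random.uniform(-500, 500) for _ in range(n_factors))

yields = [simulated_yield() for _ in range(20000)]
mean = statistics.fmean(yields)
sd = statistics.stdev(yields)
within_1sd = sum(abs(y - mean) <= sd for y in yields) / len(yields)

print(f"mean = {mean:.0f} kg, sd = {sd:.0f} kg")
print(f"share within mean ± 1 sd = {within_1sd:.3f} (normal: 0.683)")
```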



Describing Distributions
• Continuous random variables are not necessarily normally distributed
[Figure: typical shapes of distributions]
Skewness: skew = 0 (bell shaped), skew < 0 (left skewed), skew > 0 (right skewed); example labels in the figure: binomial distribution, count data
Kurtosis: kurt ≈ 3 for the normal distribution, or ≈ 0 if the correction term (-3) is applied (excess kurtosis); negative excess vs. positive excess; example labels in the figure: percentage scale, uniform distributions
Number of modes: unimodal, bimodal, multimodal
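Skewness and kurtosis are commonly computed from the central moments of the sample. The sketch below uses the simple moment estimators g1 = m3/m2^1.5 and b2 = m4/m2² (excess kurtosis b2 - 3); statistics packages often apply additional small-sample corrections, so their values can differ somewhat. The function name and both data sets are hypothetical.

```python
def shape_measures(data):
    """Return (skewness, kurtosis, excess kurtosis) from sample central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5          # 0 symmetric, < 0 left skewed, > 0 right skewed
    kurt = m4 / m2 ** 2            # about 3 for a normal distribution
    return skew, kurt, kurt - 3    # excess kurtosis: about 0 for a normal distribution

symmetric = [1, 2, 3, 4, 5]                 # flat, evenly spread values
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8]     # long tail to the right
for name, data in (("symmetric", symmetric), ("right skewed", right_skewed)):
    skew, kurt, excess = shape_measures(data)
    print(f"{name}: skew = {skew:.2f}, kurtosis = {kurt:.2f}, excess = {excess:.2f}")
```

The evenly spread data give skew = 0 and a negative excess (-1.30), as expected for a flat, uniform-like shape.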


Comparing Sample and Theoretical Distributions: QQ-Plot
[Figure: left, observed and expected height plotted against quantiles (0 to 1); right, QQ-plot of observed quantiles against theoretical quantiles (-4 to 4)]

QQ-Plots
• can also be used to compare the sample distribution with theoretical distributions other than the normal
• several variants exist, differing in
- axes (may be flipped)
- axis labels (PP-plot: quantiles presented as probabilities)
- quantiles (standardized or not)
- reference line
=> Generally, if the assumed theoretical distribution fits the sample distribution well, the data points lie more or less on the straight reference line.
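The coordinates of a normal QQ-plot can be computed without plotting software: sort the standardized sample and pair each value with the corresponding quantile of the standard normal distribution. This minimal sketch assumes the plotting positions (i - 0.5)/n, which is one common convention; software packages use slightly different ones. The function name `qq_points` and the height values are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def qq_points(sample):
    """Return (theoretical, observed) standardized quantile pairs for a normal QQ-plot."""
    n = len(sample)
    mean, sd = fmean(sample), stdev(sample)
    observed = sorted((x - mean) / sd for x in sample)   # standardized, sorted sample
    # Plotting positions (i - 0.5) / n: one common convention among several.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

heights = [61.2, 64.8, 66.0, 66.9, 67.5, 68.1, 68.8, 69.4, 70.3, 72.6]  # hypothetical heights
for theo, obs in qq_points(heights):
    print(f"theoretical {theo:6.2f}   observed {obs:6.2f}")
```

Plotting these pairs together with the line observed = theoretical gives the reference line; points close to it suggest that the normal distribution fits.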
The t-distribution (Student's t-distribution)
Relevant for small sample sizes (n < 30), where σ has to be estimated from the sample
Flatter than the normal distribution, with heavier tails; the larger n (or the degrees of freedom, df), the closer it is to the normal distribution
-> For small n, values far from the mean are more likely to occur by chance
The total area under the probability density function of the t-distribution is 1.
Areas for different degrees of freedom are tabulated.
[Figure: density of the t-distribution for df = 3 and df = 30 (≈ N(0,1)), plotted against t from -4 to 4]
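The heavier tails of the t-distribution can be made concrete by evaluating its density. The sketch below implements the standard t density f(t) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) · (1 + t²/ν)^(-(ν+1)/2) with Python's `math.lgamma` (the function name `t_pdf` is illustrative) and compares df = 3 and df = 30 with the standard normal density. For critical values in practice, a t-table or a statistics library is the usual route.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Log of the normalizing constant, computed with lgamma for numerical stability.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

std_normal = NormalDist()  # N(0, 1)
for t in (0.0, 2.0, 3.0):
    print(f"t = {t}: df=3 {t_pdf(t, 3):.4f}, df=30 {t_pdf(t, 30):.4f}, "
          f"N(0,1) {std_normal.pdf(t):.4f}")
```

At t = 0 the df = 3 curve is lowest (flatter peak), while at t = 3 it is highest (heavier tail); df = 30 lies close to N(0,1).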



Displaying distributions: Box Plot
The whiskers or lines on either side of the box show the range of the data (Min and Max).
The box contains the middle 50 % of the data values (interquartile range, IQR).
Lower quartile Q1: 25 % of the observations are less than Q1
Upper quartile Q3: 75 % of the observations are less than Q3
Median (Q2): 50 % of the observations are less than the median (mid-value).
If n is odd, this is the middle value after sorting the values in order of magnitude; if n is even, it is the average of the two middle values.
The median is preferred to the arithmetic mean when the distribution is skewed (nonsymmetrical).
[Figure: Yield of two rapeseed cultivars in a series of experiments (24 sites)]
Box plots show whether a distribution is skewed (cultivar B) or symmetrical (cultivar A).
Tukey box plot: Outliers are excluded from the whiskers and highlighted with symbols. The ends of the whiskers represent the lowest datum still within 1.5 IQR of the lower quartile and the highest datum still within 1.5 IQR of the upper quartile.
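The key numbers of a Tukey box plot can be computed directly. This sketch uses `statistics.quantiles(..., method="inclusive")` for Q1, median and Q3; quartile definitions differ between software packages, so results may differ slightly for small samples. The function name `tukey_box` and the yield values are hypothetical.

```python
import statistics

def tukey_box(data):
    """Key numbers of a Tukey box plot (quartiles, whisker ends, outliers)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "Q1": q1, "median": median, "Q3": q3, "IQR": iqr,
        "whisker_low": min(inside),    # lowest datum within 1.5 IQR of Q1
        "whisker_high": max(inside),   # highest datum within 1.5 IQR of Q3
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }

yields = [3.1, 3.4, 3.6, 3.8, 3.9, 4.0, 4.1, 4.3, 4.4, 6.2]  # hypothetical yields (t/ha)
box = tukey_box(yields)
print(box)
```

Here the value 6.2 lies more than 1.5 IQR above Q3 and is therefore drawn as an outlier, while the upper whisker ends at 4.4.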
Central Limit Theorem
Example: counts of weed seedlings in randomly selected plots
- n = 600, right skewed, x̅ = 4.95, median = 4, mode = 3, skewness = 1.18
- means are calculated from groups of 10 values
=> n = 60 mean values, x̅10 = 4.95, median = 4.9, mode = 5.2, skewness = 0.347
The distribution of the mean values (x̅10) is approximately symmetric.
According to the central limit theorem (CLT), the arithmetic means of subsamples are approximately normally distributed, regardless of the underlying distribution of the single observations (provided the subsamples are large enough and the variance is finite).
TIP:
Think about the expected distribution of your response variable
=> try to collect the data in a way that gives you metric data.
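The effect described in the weed-seedling example can be reproduced by simulation. This sketch generates hypothetical right-skewed counts (not the slide's data; rounded-down exponential values, i.e. geometric-like counts), forms means of non-overlapping groups of 10, and compares the moment-based skewness before and after averaging. The sample is ten times larger than on the slide so that the skewness estimates are stable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

def skewness(data):
    """Moment-based sample skewness m3 / m2**1.5."""
    mean = statistics.fmean(data)
    m2 = statistics.fmean([(x - mean) ** 2 for x in data])
    m3 = statistics.fmean([(x - mean) ** 3 for x in data])
    return m3 / m2 ** 1.5

# Hypothetical right-skewed counts (geometric-like), NOT the weed data from the slide.
counts = [int(random.expovariate(0.2)) for _ in range(6000)]
# Means of consecutive, non-overlapping groups of 10 counts.
means = [statistics.fmean(counts[i:i + 10]) for i in range(0, len(counts), 10)]

print(f"single counts: n = {len(counts)}, skewness = {skewness(counts):.2f}")
print(f"group means:   n = {len(means)}, skewness = {skewness(means):.2f}")
```

The skewness of the single counts is around 2, while that of the group means is markedly smaller, in line with the central limit theorem.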

Check up 2b
The heights within a wheat population are normally distributed with a mean of 80 cm and a standard deviation of 5 cm. What proportion of plants have heights
(1) less than 72 cm,
(2) between 82 and 87 cm,
(3) between 72 and 82 cm?
(4) Above what height are the top 20 % of plants?
(5) Below what height are the lowest 4 % of plants?

[from Clewer & Scarisbrick 2001]
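After working through this exercise by hand with the z-table, the results can be checked with `statistics.NormalDist`: its `cdf` method gives the area to the left of a limit, and its `inv_cdf` method returns the limit for a given left-sided area. This is a sketch for self-checking only; table-based answers may differ in the last digit.

```python
from statistics import NormalDist

height = NormalDist(mu=80, sigma=5)  # wheat plant height in cm

p1 = height.cdf(72)                    # (1) P(X < 72)
p2 = height.cdf(87) - height.cdf(82)   # (2) P(82 < X < 87)
p3 = height.cdf(82) - height.cdf(72)   # (3) P(72 < X < 82)
h4 = height.inv_cdf(0.80)              # (4) the top 20 % lie above the 80th percentile
h5 = height.inv_cdf(0.04)              # (5) the lowest 4 % lie below the 4th percentile

print(f"(1) {p1:.4f}  (2) {p2:.4f}  (3) {p3:.4f}  (4) {h4:.1f} cm  (5) {h5:.1f} cm")
```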

Solution Check up 1a)

No. | Scale   | Measure of central tendency | Measure of dispersion    | Mathematical operation | Example
1   |         | Mode                        |                          | a=b; a≠b               |
2   | Ordinal |                             |                          |                        | Disease rating
3   |         |                             |                          |                        | Degree Celsius
4   | Ratio   |                             | Coefficient of variation |                        |

Solution Check up 1b)

No. | Scale    | Measure of central tendency             | Measure of dispersion                                    | Mathematical operation                 | Example
1   | Nominal* | Mode                                    | Diversity indices, e.g. Shannon index H                  | a=b; a≠b                               | Sex; location; name
2   | Ordinal  | Median; mode                            | Range: from min to max                                   | a>b; a<b & those of No. 1              | Disease rating
3   | Interval | Arithmetic mean; median                 | Variance; standard deviation; range = max - min          | a-b=c; a+b=c & those of No. 1, 2       | Degree Celsius
4   | Ratio    | Geometric mean; arithmetic mean; median | Coefficient of variation; standard deviation; range      | a/b=c; a*b=c & those of No. 1, 2, 3    | Yield

*For the special case of a binary scale (false = 0 vs. true = 1), the mean and the variance are also permitted.
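Why the coefficient of variation requires a ratio scale can be shown numerically: shifting the zero point (°C to kelvin) changes the CV, whereas changing the unit on a ratio scale (kg to t) does not. The sketch also illustrates the footnote: for 0/1 data the mean is the proportion of ones, and the population variance equals p(1 - p). The helper `cv` and all values are hypothetical.

```python
import statistics

def cv(data):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return statistics.stdev(data) / statistics.fmean(data)

temps_c = [10.0, 15.0, 20.0]               # hypothetical temperatures (interval scale)
temps_k = [t + 273.15 for t in temps_c]    # the same temperatures in kelvin
yields_kg = [5200.0, 6000.0, 6800.0]       # hypothetical yields (ratio scale)
yields_t = [y / 1000 for y in yields_kg]   # the same yields in tonnes

print(f"CV in °C: {cv(temps_c):.3f}, CV in K: {cv(temps_k):.3f}")    # differ: arbitrary zero point
print(f"CV in kg: {cv(yields_kg):.3f}, CV in t: {cv(yields_t):.3f}")  # identical: true zero point

infected = [1, 0, 0, 1, 1, 0, 0, 0]        # hypothetical binary data (1 = true)
p = statistics.fmean(infected)             # mean of 0/1 data = proportion of ones
print(f"proportion = {p}, variance = {statistics.pvariance(infected)}, p(1 - p) = {p * (1 - p)}")
```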

