PSED18 02 Descriptive Statistics, Distributions

Practical Statistics and Experimental Design
02 WEDNESDAY 11TH APRIL 2018
Descriptive Statistics
Distributions
DR. CHRISTIAN KLUTH
Georg-August-Universität Göttingen
Department für Nutzpflanzenwissenschaften
Lehrkraft für besondere Aufgaben
Statistikberatung für Studierende
Von-Siebold-Str. 8, 37075 Göttingen
Tel.: 0551/394356
E-Mail: ckluth@uni-goettingen.de
Appointment by arrangement
The Research Process
[Figure: flow chart of the research process. Recoverable labels:
- Initial observation, problem, desire of improvement, literature, expertise
- Scientific hypothesis: alternative hypothesis HA vs. null hypothesis H0
- Experimental set up: identify appropriate study system, factors and variables, define factor levels; primary variable, data types, distributions (dep. vs. indep. variables), measure variable
- Experimental design: sample size, effect size, study unit, subsampling vs. biological replicates, measurement error, sampling, randomisation, ceteris paribus
- Always consider: population, feasibility, relevance
- Avoid: block-treatment confounding, HARK
- Field study, data collection: field plan, data structure, paper form for field work
- Data management, data processing
- Descriptive statistics: key numbers, graphical data exploration
- Data analysis: statistical tool box, model assumptions, statistical modelling, significance test, statistical hypothesis
- Logic of statistical inference, unbiased and relevant conclusion
- Scientific writing/presentation: graphics, tables, text
- Software skills (needed at several stages)]
C. Kluth Practical Statistics and Experimental Design • 02 Descriptive Stats, Distribution 11 April 2018, p. 2
Distributions
Generally, a description of how things are spread in space,
e.g. a species distribution
=> it is likely or unlikely to find the species in a certain area.
https://en.wikipedia.org/wiki/Species_distribution
Example: Eranthis hyemalis seedlings, 10th April 2018 (Photo: C. Kluth)
- High density of seedlings: high probability of finding a seedling in this area
- Low density of seedlings: low probability of finding a seedling in this area
https://upload.wikimedia.org/wikipedia/commons/b/b4/Winterling-Bluete-70.jpg
Descriptive Statistics for
Univariate Sample distributions
Measure of central tendency
- central or typical value of a distribution => expected value
- depending on the data type
-> Mode: most common value among a group (at least nominal scaled)
-> Median: middle score of sorted data (at least ordinal scaled)
-> Arithmetic mean (x̄): sum of numerical values divided by the number of values (at least interval scaled)
-> Geometric mean: nth root of the product of n numbers (ratio scaled)
-> …
Measure of dispersion
- variability of a distribution, i.e. a measure of how closely the values cluster around the central value
- depending on the data type
-> e.g. Range from minimum value to maximum value (at least ordinal scaled)
- for other scaled data see later
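The measures listed above can be tried out on a small data set. The following is a minimal sketch in Python using only the standard library; the sample values are hypothetical and serve for illustration only.

```python
import statistics

# Hypothetical sample of plant heights in cm (illustration only).
heights = [154, 157, 162, 162, 165, 172, 186]

mode = statistics.mode(heights)                 # most common value
median = statistics.median(heights)             # middle value of the sorted data
mean = statistics.mean(heights)                 # sum of values / number of values
geo_mean = statistics.geometric_mean(heights)   # nth root of the product of n values
value_range = (min(heights), max(heights))      # range from minimum to maximum

print("mode:", mode)
print("median:", median)
print("arithmetic mean:", round(mean, 2))
print("geometric mean:", round(geo_mean, 2))
print("range:", value_range)
```

For positive data the geometric mean is never larger than the arithmetic mean, which can serve as a quick plausibility check.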
Distributions
Theoretical distribution
expected distribution based on the character of the interesting variable and on
the sampling process
Sample distribution
the observed distribution within one sample
Sampling distribution
the distribution of a statistic (e.g. the sample mean) over several samples drawn from the same population
Distributions
In data science the distribution of any random variable can be displayed by a histogram.
It is a graphical display of numerical data in the form of upright bars, with the area of
each bar representing frequency. Expressed as relative frequency, it can be directly
interpreted as the probability of getting an observation in a certain range if an object is
randomly chosen from that distribution. Cumulative frequency gives the summed
probability of getting a certain value or lower, given a (sample) distribution.
Distributions
Random Variables
Any random process leads to a random variable,
e.g. flipping a coin, rolling dice, how much rain will fall tomorrow, how tall a
randomly selected plant is.
Random variables describe the outcome of a random process in numbers.
A random variable (RV) is either a discrete RV or a continuous RV.
Examples of discrete RVs:
X1: coin flip; X1 = 1 if heads, 0 if tails
    P(X1=1) = 0.5, P(X1=0) = 0.5
X2: sum of rolling a die 5 times; values from 5 to 30
    P(X2=5) = (1/6)^5 = 1/7776 ≈ 0.000129
X3: counts of seedlings/spores; values from 0 to max
    P(X3=12) = ? has to be estimated from samples
The Xi are discrete RVs => the probability of an exact outcome can be given.
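For the dice example, the probability of an exact outcome can be checked by listing all possible outcomes. The sketch below is an illustration in Python (the function name is hypothetical): it enumerates all 6^5 rolls of five fair dice and counts how often each sum occurs.

```python
from fractions import Fraction
from itertools import product

def dice_sum_distribution(n_dice):
    """Exact distribution of the sum of n_dice fair six-sided dice."""
    counts = {}
    for roll in product(range(1, 7), repeat=n_dice):
        total = sum(roll)
        counts[total] = counts.get(total, 0) + 1
    n_outcomes = 6 ** n_dice
    return {total: Fraction(c, n_outcomes) for total, c in sorted(counts.items())}

dist = dice_sum_distribution(5)
p_five = dist[5]   # only one roll (1, 1, 1, 1, 1) gives the sum 5

print("P(X2 = 5) =", p_five, "=", float(p_five))
print("smallest and largest sum:", min(dist), max(dist))
```

Only one of the 7776 equally likely rolls gives the sum 5, so the probability is 1/7776.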

Discrete Random Variables
Binomial distribution B(n,p)
Consider a series of n independent trials (Bernoulli experiment). Assume that there are only two possible
outcomes (events: success or failure) at each trial, and that p is the probability of a success. This implies that
the probability of a failure at each trial is 1-p = q.
The typical example for that situation is tossing a coin (heads or tails). The probabilities of getting a
certain result after a given number of trials can be found by multiplying the single probabilities along the
branches of a probability tree:
Probability tree for 3 trials (h = heads, t = tails; each branch has probability 0.5; success = tails)
Result   Pr(Result)   k successes
hhh      0.125        0
hht      0.125        1
hth      0.125        1
htt      0.125        2
thh      0.125        1
tht      0.125        2
tth      0.125        2
ttt      0.125        3

k        0       1       2       3
Pr(k)    0.125   0.375   0.375   0.125
Probability function: $\Pr(k) = \binom{n}{k} p^k q^{n-k}$ => R: dbinom(x, size, prob, log = FALSE)
with x = k (no. of successes), size = n, prob = p
$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, the binomial coefficient (meaning: possible combinations of getting k
successes out of n trials) => R: choose(n, k)
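As an illustration, the probability function can be written out directly. The sketch below uses Python's math.comb for the binomial coefficient (the counterpart of choose(n, k) in R); the helper name binom_pmf is hypothetical and plays the role of dbinom. It reproduces the Pr(k) values of the coin example with n = 3 and p = 0.5.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(k) = C(n, k) * p**k * q**(n - k) with q = 1 - p."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

# Coin example: 3 trials, probability of success 0.5 at each trial.
coin = [binom_pmf(k, 3, 0.5) for k in range(4)]

for k, pr in enumerate(coin):
    print(f"Pr({k}) = {pr}")
```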
Binomial distribution
Probability tree for sampling 3 plants in an inoculated field, assuming that the probability of successful
inoculation is 0.2 (h = healthy, i = infected; branch probabilities: 0.8 for h, 0.2 for i)
Result   Pr(Result)   no. of healthy plants
hhh      0.512        3
hhi      0.128        2
hih      0.128        2
hii      0.032        1
ihh      0.128        2
ihi      0.032        1
iih      0.032        1
iii      0.008        0

k (successful infections)   0       1       2       3
Pr(k)                       0.512   0.384   0.096   0.008
$\Pr(k) = \binom{n}{k} p^k q^{n-k}$
The function is defined only at integer values of k. The connecting lines are only guides for the eye.
E(Y) = n*p
Var(Y) = npq
[Figure: observed relative frequency of k successes and Pr(k) against the number of successes (k = 0 to 4); n = 4, series = 48, p = 0.532, Mean = 2.00, E(Y) = 2.13]
[Figure: observed relative frequency of k successes and Pr(k) against the number of successes (k = 0 to 6); n = 6, series = 86, p = 0.866, Mean = 5.17, E(Y) = 5.20]
further examples in Excel
Optional Exercise for R Experts: try to program in R
Check up 02a
Working with the binomial distribution

Four seeds are planted and treated in the same
way. Suppose each has the probability of 0.8 of
germination. Find the probability distribution of the
number of seeds germinating. In other words, for
each value of k from 0 to 4 find P(k).
(from Clewer & Scarisbrick 2001)
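One possible way to check a hand-calculated answer to this exercise is sketched below in Python (the helper name binom_pmf is hypothetical). It evaluates the binomial probability function for n = 4 and p = 0.8 and also compares the mean and variance of the resulting distribution with n*p and n*p*q.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(k) = C(n, k) * p**k * q**(n - k) with q = 1 - p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 4, 0.8   # four seeds, germination probability 0.8
probs = [binom_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed from the distribution itself.
mean = sum(k * pr for k, pr in enumerate(probs))
var = sum((k - mean) ** 2 * pr for k, pr in enumerate(probs))

for k, pr in enumerate(probs):
    print(f"P({k}) = {pr:.4f}")
print(f"mean = {mean:.2f} (n*p = {n * p:.2f})")
print(f"variance = {var:.2f} (n*p*q = {n * p * (1 - p):.2f})")
```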
Poisson distribution P(λ)
=> Show further examples in Excel
Descriptive Statistics for Metric Data
Living histogram
Connecticut State Agricultural College
(J. Heredity 5:511–518, 1914)
http://jhered.oxfordjournals.org/content/95/5/365.full.pdf
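Original list (first four column pairs of each row) and ordered list (last four column triples of each row); heights in inches
Original list: Stud_No Height, Stud_No Height, Stud_No Height, Stud_No Height
Ordered list: Order Stud_No Height, Order Stud_No Height, Order Stud_No Height, Order Stud_No Height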
1 67.04 33 65.75 65 67.11 97 63.38 1 61 57.52 33 68 65.40 65 106 67.26 97 10 69.78
2 67.74 34 64.71 66 69.78 98 69.27 2 43 60.73 34 53 65.45 66 100 67.35 98 66 69.78
3 66.72 35 68.92 67 65.64 99 69.10 3 26 61.63 35 87 65.47 67 28 67.54 99 86 69.87
4 68.56 36 71.09 68 65.40 100 67.35 4 7 61.84 36 6 65.58 68 59 67.58 100 103 69.96
5 65.68 37 67.69 69 72.70 101 71.36 5 20 62.10 37 67 65.64 69 83 67.58 101 17 70.06
6 65.58 38 67.05 70 64.85 102 68.96 6 71 62.49 38 5 65.68 70 37 67.69 102 123 70.08
7 61.84 39 67.80 71 62.49 103 69.96 7 46 62.60 39 40 65.70 71 91 67.71 103 108 70.48
8 65.84 40 65.70 72 67.77 104 66.79 8 76 62.69 40 77 65.70 72 2 67.74 104 112 70.48
9 64.88 41 68.08 73 63.01 105 67.80 9 73 63.01 41 15 65.72 73 72 67.77 105 95 70.72
10 69.78 42 64.58 74 67.10 106 67.26 10 80 63.06 42 33 65.75 74 39 67.80 106 122 70.75
11 66.61 43 60.73 75 68.43 107 63.62 11 120 63.18 43 25 65.81 75 105 67.80 107 90 70.80
12 64.86 44 66.45 76 62.69 108 70.48 12 97 63.38 44 8 65.84 76 52 67.83 108 118 70.89
13 69.00 45 65.89 77 65.70 109 68.98 13 31 63.58 45 45 65.89 77 119 67.90 109 21 70.97
14 64.69 46 62.60 78 72.02 110 66.08 14 107 63.62 46 111 65.92 78 41 68.08 110 23 71.06
15 65.72 47 72.55 79 68.36 111 65.92 15 126 63.70 47 110 66.08 79 84 68.11 111 36 71.09
16 69.31 48 66.12 80 63.06 112 70.48 16 121 63.95 48 48 66.12 80 113 68.30 112 96 71.14
17 70.06 49 71.24 81 64.40 113 68.30 17 94 64.29 49 85 66.31 81 79 68.36 113 49 71.24
18 71.62 50 64.45 82 69.30 114 71.40 18 81 64.40 50 55 66.32 82 117 68.36 114 101 71.36
19 72.20 51 66.77 83 67.58 115 71.41 19 50 64.45 51 44 66.45 83 75 68.43 115 114 71.40
20 62.10 52 67.83 84 68.11 116 72.90 20 58 64.46 52 30 66.59 84 4 68.56 116 57 71.40
21 70.97 53 65.45 85 66.31 117 68.36 21 42 64.58 53 11 66.61 85 35 68.92 117 115 71.41
22 69.10 54 72.72 86 69.87 118 70.89 22 88 64.67 54 27 66.65 86 102 68.96 118 18 71.62
23 71.06 55 66.32 87 65.47 119 67.90 23 14 64.69 55 3 66.72 87 109 68.98 119 124 71.80
24 69.66 56 65.32 88 64.67 120 63.18 24 32 64.71 56 51 66.77 88 13 69.00 120 78 72.02
25 65.81 57 71.40 89 66.80 121 63.95 25 34 64.71 57 104 66.79 89 22 69.10 121 19 72.20
26 61.63 58 64.46 90 70.80 122 70.75 26 63 64.80 58 89 66.80 90 99 69.10 122 47 72.55
27 66.65 59 67.58 91 67.71 123 70.08 27 70 64.85 59 1 67.04 91 98 69.27 123 69 72.70
28 67.54 60 67.13 92 67.20 124 71.80 28 125 64.85 60 38 67.05 92 82 69.30 124 54 72.72
29 69.38 61 57.52 93 74.27 125 64.85 29 12 64.86 61 74 67.10 93 16 69.31 125 116 72.90
30 66.59 62 64.96 94 64.29 126 63.70 30 9 64.88 62 65 67.11 94 29 69.38 126 93 74.27
31 63.58 63 64.80 95 70.72 31 62 64.96 63 60 67.13 95 64 69.64
32 64.71 64 69.64 96 71.14 32 56 65.32 64 92 67.20 96 24 69.66
mean = sum / n = 8480.77 / 126 = 67.31
median = 67.16 (mean of the two middle values if n is even)
range from 57.52 to 74.27, or range = 74.27 − 57.52 = 16.75
Descriptive Statistics for metric data
Class     Midpoint   Abs. Freq   Rel. Freq   Cumulative Freq.
58 - 59   58.5        1          0.0079      0.0079
59 - 60   59.5        0          0.0000      0.0079
60 - 61   60.5        0          0.0000      0.0079
61 - 62   61.5        1          0.0079      0.0159
62 - 63   62.5        2          0.0159      0.0317
63 - 64   63.5        4          0.0317      0.0635
64 - 65   64.5        8          0.0635      0.1270
65 - 66   65.5       15          0.1190      0.2460
66 - 67   66.5       15          0.1190      0.3651
67 - 68   67.5       12          0.0952      0.4603
68 - 69   68.5       19          0.1508      0.6111
69 - 70   69.5       10          0.0794      0.6905
70 - 71   70.5       13          0.1032      0.7937
71 - 72   71.5        9          0.0714      0.8651
72 - 73   72.5       10          0.0794      0.9444
73 - 74   73.5        6          0.0476      0.9921
74 - 75   74.5        0          0.0000      0.9921
75 - 76   75.5        1          0.0079      1.0000
(the tally list of the slide corresponds to the absolute frequencies)
Histogram
1. Categorize data (usually in equal sized classes, correct class: e.g. 58≤x<59)
2. Count how many fall into a class => absolute frequency
3. Divide absolute frequency by total number of observations => relative frequency
4. Draw bar chart without spaces
Relative frequency can be interpreted as the probability of a random sample falling in a
certain class given that specific sample distribution
Displaying empirical univariate distributions:
Histogram -> Cumulative Distribution
The cumulative distribution can be displayed even without classifying the data before
=> empirical cumulative distribution function, R: ecdf()
Statistical Key Numbers, Quantiles
To describe the distribution with few numbers:
given the cumulative frequency, the relative proportion up to a certain value can easily
be read from the graph,
e.g. 40% of the data (students) have a height lower than 66 inches
The quantile Qt is the cut point below which t% of the data lie.
Quantiles have the property that both measures of central tendency and of dispersion
can be derived from them.
The Q50 quantile cuts the distribution into two halves and is also called the median
(for a sample size with an odd number it is the middle value, for an even numbered
sample size the average of the two middle values)
Displaying empirical univariate distributions:
Histogram -> Cumulative Distribution
Example: Height of 16 randomly selected maize plants
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
Class       Midpoint   Abs. Freq   Rel. Freq   Cumulative Freq.
130-< 140   135        1           0.0625      0.0625
140-< 150   145        0           0.0000      0.0625
150-< 160   155        2           0.1250      0.1875
160-< 170   165        5           0.3125      0.5
170-< 180   175        6           0.3750      0.875
180-< 190   185        1           0.0625      0.9375
190-< 200   195        1           0.0625      1
[Figures: histograms of absolute frequency and of relative frequency, and plot of cumulative frequency, each against height class (midpoints 135 to 195).]
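The frequency table of the maize example can be rebuilt step by step (categorize, count, divide by n, accumulate). The following is a minimal sketch in Python; the class limits 130 to 200 with a width of 10 are taken from the table, and the variable names are illustrative.

```python
# Heights of the 16 maize plants from the example (cm).
heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]
lower, upper, width = 130, 200, 10

n = len(heights)
table = []
cumulative = 0.0
for start in range(lower, upper, width):
    # A value belongs to the class if start <= x < start + width.
    abs_freq = sum(1 for x in heights if start <= x < start + width)
    rel_freq = abs_freq / n
    cumulative += rel_freq
    table.append((start, start + width, start + width / 2,
                  abs_freq, rel_freq, cumulative))

print("class      midpoint  abs  rel     cum")
for lo, hi, mid, a, r, c in table:
    print(f"{lo}-<{hi}  {mid:7.0f}  {a:3d}  {r:.4f}  {c:.4f}")
```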
Example Quantiles (also called Percentiles) or Quartiles
Given a sample of size n=16 maize plants
Original list: 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
Ordered list:
Order j 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xj 136 154 157 162 163 164 165 167 172 175 175 176 177 179 186 194
Q0 = min = 136
Q1 = Q25 = (162+163)/2 = 162.5
Q2 = Q50 = median = (167+172)/2 = 169.5 (measure of central tendency)
Q3 = Q75 = (176+177)/2 = 176.5
Q4 = Q100 = max = 194
Measures of dispersion:
→ range = Q4 − Q0 = max − min
→ interquartile range IQR = Q3 − Q1 (the middle fifty)
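The quartiles of this example can be computed as medians of the lower and upper half of the ordered list. The sketch below follows that rule in Python; it is only one of several common quantile definitions, and other definitions (for example the default of quantile() in R) can give slightly different values for Q1 and Q3.

```python
from statistics import median

heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]

ordered = sorted(heights)
half = len(ordered) // 2          # for odd n the middle value is left out of both halves

q0, q4 = ordered[0], ordered[-1]  # minimum and maximum
q2 = median(ordered)              # median
q1 = median(ordered[:half])       # median of the lower half
q3 = median(ordered[-half:])      # median of the upper half

value_range = q4 - q0
iqr = q3 - q1

print("Q0..Q4:", q0, q1, q2, q3, q4)
print("range:", value_range, "IQR:", iqr)
```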
Displaying empirical univariate
distributions: Quantile Plot, Box Plot
for each order number the t%-quantile can be determined
Order j 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
t%=(j-0.5)*100%/n 3.1 9.4 16 22 28 34 41 47 53 59 66 72 78 84 91 97
xj 136 154 157 162 163 164 165 167 172 175 175 176 177 179 186 194
Quantile Plots
Scatter plot of x (height) against the respective quantiles or vice versa
=> corresponds to the cumulative frequency but data not categorized
[Figures: height [cm] against quantile [%], and quantile [%] against height [cm], with Q0, Q1, Q2, Q3 and Q4 marked.]
• Box-Whisker-Plot
Displaying quartiles of the sample distribution
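The t% values for a quantile plot follow directly from the order numbers. A minimal Python sketch for the maize data, using the formula t% = (j − 0.5) * 100% / n from the table, might look as follows (variable names are illustrative).

```python
heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]

ordered = sorted(heights)
n = len(ordered)

# t% = (j - 0.5) * 100 / n for order numbers j = 1..n
positions = [(j - 0.5) * 100 / n for j in range(1, n + 1)]
pairs = list(zip(positions, ordered))   # points of the quantile plot

for t, x in pairs:
    print(f"{t:5.1f} %  {x}")
```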
Properties of mean and median
Arithmetic mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
Mean 168.9 Median 169.5
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 300 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
Mean 176.7 Median 169.5
the mean value is sensitive to outliers
the median is robust against outliers
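The comparison can be reproduced with a few lines of Python. The sketch below uses the two data rows of the example (the first height 175 replaced by the outlier 300) and shows that only the mean changes.

```python
from statistics import mean, median

heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]
with_outlier = [300] + heights[1:]   # first value replaced by an outlier

mean_orig, median_orig = mean(heights), median(heights)
mean_out, median_out = mean(with_outlier), median(with_outlier)

print(f"original:     mean = {mean_orig:.1f}, median = {median_orig}")
print(f"with outlier: mean = {mean_out:.1f}, median = {median_out}")
```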
Key Numbers describing Dispersion
for Metric Data
- Variance σ², s²
- Standard deviation σ, s
- Standard error SE
- Coefficient of variation cv
- (Kurtosis and Skewness)
- Confidence interval (details later)
Dispersion for Metric Data
Sum of Squares
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
mean 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9 168.9
xi-mean 6.125 3.125 10.13 -1.88 -5.88 -14.9 -32.9 -4.88 -11.9 8.125 17.13 -3.88 6.125 25.13 7.125 -6.88
(xi-mean)^2 37.52 9.766 102.5 3.516 34.52 221.3 1081 23.77 141 66.02 293.3 15.02 37.52 631.3 50.77 47.27
SS 2796
[Figure: plant height x [cm] against observation no. i, with the mean marked; the distances of the points from the mean are the absolute deviations = residuals.]
different ways to compute (displacement law)
$SS = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$
xi^2 30625 29584 32041 27889 26569 23716 18496 26896 24649 31329 34596 27225 30625 37636 30976 26244
Σxi 2702
(Σxi)^2 7300804
Σ(xi^2) 459096
SS 2796
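The three ways of computing the sum of squares can be compared numerically. The following Python sketch uses the maize data of the example; all three expressions give SS = 2795.75, which the slide rounds to 2796.

```python
heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]

n = len(heights)
sum_x = sum(heights)                      # sum of x_i
sum_x2 = sum(x * x for x in heights)      # sum of x_i squared
x_bar = sum_x / n                         # arithmetic mean

ss_deviations = sum((x - x_bar) ** 2 for x in heights)   # sum of squared deviations
ss_displacement = sum_x2 - sum_x ** 2 / n                 # sum(x^2) - (sum x)^2 / n
ss_mean_form = sum_x2 - n * x_bar ** 2                    # sum(x^2) - n * mean^2

print(sum_x, sum_x2, ss_deviations, ss_displacement, ss_mean_form)
```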
Variance = Mean Squared Deviation
• Since SS increases with n, it is not yet a good measure of dispersion
=> divide by n (population) or by n−1 (sample)
• Population: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{SS}{n} = MS$
• Sample: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{SS}{df} = MS$
xi = value of ith observation
x̅ = mean value
n = number of observations (sample size)
df = degrees of freedom, df = (n−1): number of independent values that are free to vary in a given system
SS = sum of squares
$MS = \text{Variance} = \frac{2796}{15} = 186.4$
• The unit is the unit of the measured data squared


Standard deviation s,
Standard Error of the Mean SE
$s = \sqrt{s^2}$, e.g. $s = \sqrt{186.4} = 13.65$
has again the unit of the measurement variable;
it describes the spread of the single observations and does not shrink when more data are collected
e.g. doubling the data set:
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
xi^2 30625 29584 32041 27889 26569 23716 18496 26896 24649 31329 34596 27225 30625 37636 30976 26244
i 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
xi 175 172 179 167 163 154 136 164 157 177 186 165 175 194 176 162
xi^2 30625 29584 32041 27889 26569 23716 18496 26896 24649 31329 34596 27225 30625 37636 30976 26244
Σxi 5404
(Σxi)^2 2.9E+07
Σ(xi^2) 918192
SS 5591.5, df = 31, MS = 5591.5 / 31 = 180.4, $s = \sqrt{180.4} = 13.43$
(SS doubles, but df rises from 15 to 31, so s stays nearly the same)
The precision of the estimated mean is described by the Standard Error => standard deviation of the expected mean
$SE = \frac{s}{\sqrt{n}}$
e.g. simple data set: $\frac{13.65}{\sqrt{16}} = 3.41$
doubled data set: $\frac{13.43}{\sqrt{32}} = 2.37$
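The effect of doubling the data set can be recomputed with a short script. The sketch below is an illustration in Python under the usual definition of the sample variance with df = n − 1, which means df = 31 for the doubled data set; the helper name summarise is hypothetical. Under that definition the standard deviation stays close to 13.4 to 13.7, while the standard error of the mean becomes smaller with the larger n.

```python
from math import sqrt
from statistics import variance

heights = [175, 172, 179, 167, 163, 154, 136, 164,
           157, 177, 186, 165, 175, 194, 176, 162]
doubled = heights + heights          # same 16 values entered twice (n = 32)

def summarise(data):
    """Return n, sample variance (SS / (n - 1)), standard deviation and SE."""
    n = len(data)
    s2 = variance(data)              # divides SS by df = n - 1
    s = sqrt(s2)
    se = s / sqrt(n)                 # standard error of the mean
    return n, s2, s, se

n1, var1, sd1, se1 = summarise(heights)
n2, var2, sd2, se2 = summarise(doubled)

print(f"n = {n1}: s^2 = {var1:.1f}, s = {sd1:.2f}, SE = {se1:.2f}")
print(f"n = {n2}: s^2 = {var2:.1f}, s = {sd2:.2f}, SE = {se2:.2f}")
```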
Coefficient of variation cv
cv = s / x̅
=> relative standard deviation
only meaningful for ratio-scaled data
Properties of Variance based
Measures
all variance-based measures of dispersion are sensitive to outliers
the range is sensitive to outliers as well
a more robust measure of dispersion is the IQR
or the median of absolute deviations (mad, rarely used)
advantage of variance-based measures:
more information (of all measurements) is used
=> comparison to theoretical distributions
Normal distribution N(µ,σ2)
A living histogram from the Connecticut State Agricultural College (J. Heredity 5:511–518, 1914).
http://jhered.oxfordjournals.org/content/95/5/365.full.pdf
Properties of normal distribution
Bell shaped
Unimodal
Symmetric
Probability density function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
with
µ = mean
σ = standard deviation
describing the distribution completely
[Figure: probability density function f(x) against x, with µ−σ, µ and µ+σ marked on the x axis.]
Highest density at the mean, i.e. the greater the distance from the mean, the fewer values we have
the mean divides the distribution into two equally sized parts
=> mean = mode = median
• Inflection points at µ−σ and µ+σ
• Approximately 68% of the values are between µ−σ and µ+σ
• In contrast to discrete distributions we do not ask for the probability of getting a certain event
or result, but for the probability of getting a value within certain limits (intervals)
Distributions of discrete vs.
continuous variables
discrete variable                        continuous variable
x∈ℕ                                      x∈ℝ
Probability function                     Probability density function
Cumulative probability function          Cumulative distribution function
e.g. binomial and Poisson distribution   e.g. normal distribution
Family of normal distributions
[Figure: probability density functions f(x) of N(2,0.5), N(0,1) and N(2,2) for x from -4 to 8.]
There is any number of normal distributions, each characterized by the mean µ
and the standard deviation σ: N(µ,σ)
The area under any N(µ,σ) is 1
N(0,1) is called the standard normal distribution

Normal distribution
Distribution function F(x): integral of the probability density function f(x)
$F(x) = \int_{-\infty}^{x} f(t)\,dt$
no closed-form antiderivative => areas under the normal distribution are tabulated
[Figure: probability density function f(x) and distribution function F(x), with the values 0.50 and 0.977 marked on F(x).]
The cumulative distribution function (or just distribution function) of the normal
distribution gives the area under the curve to the left of a certain value of x.
E.g. below x=2 we find 50 % of the values for N(2,0.5), and 97.7 % of the values
for N(0,1).

Working with the cumulative
distribution function
Assume that the mean yearly milk yield of cows in Lower Saxony is normally
distributed with a mean of 6000 kg and a standard deviation of 1000 kg
(N(6000, 1000)).
If you randomly sample one cow out of the population you can answer
the question: What is the probability that the cow has a yield of 5000 kg
or less?
Wanted: probability of X ≤ 5000; given µ = 6000 and σ = 1000
P(X≤5000|6000,1000) = 0.1587 = 15.87%
[Figure: probability density function; x axis: milk yield kg/year.]

Working with the cumulative
distribution function
What is the probability that a randomly selected cow has a yield larger
than 8000 kg?
Wanted: the probability X > 8000; given µ = 6000 and σ = 1000:
P(X>8000|6000,1000)
-> The (cumulative) distribution function gives the area under the probability density function left of
the limit of X.
But we are now interested in the part right of X. Since the total area under the probability density
function = 1, the area of interest is
P(X>8000|6000,1000)
= 1 − P(X≤8000|6000,1000)
= 1 − 0.9772 = 0.0228 = 2.28%
[Figure: probability density function; x axis: milk yield kg/year.]
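Both milk yield questions can be answered without a table by evaluating the cumulative distribution function numerically. The sketch below is an illustration using NormalDist from Python's standard library, with σ passed as the standard deviation as in N(6000, 1000).

```python
from statistics import NormalDist

# Milk yield example: mean 6000 kg, standard deviation 1000 kg.
milk = NormalDist(mu=6000, sigma=1000)

p_low = milk.cdf(5000)        # P(X <= 5000): area left of 5000
p_high = 1 - milk.cdf(8000)   # P(X > 8000): total area 1 minus area left of 8000

print(f"P(X <= 5000) = {p_low:.4f}")
print(f"P(X > 8000)  = {p_high:.4f}")
```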
z-Transformation
Since there is an infinite number of normal distributions N(µ, σ), there are no tabulated
distribution functions for each of them, but
any normal distribution N(µ, σ) can be transformed to the standard normal distribution
N(0, 1) with the mean of 0 and the standard deviation of 1:
$z = \frac{x - \mu}{\sigma}$ (z-transformation)
The distribution function of the standard normal distribution N(0, 1) is tabulated
z-transformation
[Figure: example milk yield; N(6000, 1000) is shifted to N(0, 1000) by subtracting µ and rescaled to N(0, 1) by dividing by σ.]
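The z-transformation can be applied to the milk yield limits 5000 kg and 8000 kg. The following Python sketch (the helper name z_transform is hypothetical) converts both values to z and then evaluates the standard normal distribution function; it also checks the symmetry relation F(−z) = 1 − F(z) of the standard normal distribution numerically.

```python
from statistics import NormalDist

def z_transform(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma

standard = NormalDist()                  # standard normal distribution N(0, 1)

z_low = z_transform(5000, 6000, 1000)    # limit 5000 kg expressed in units of sigma
z_high = z_transform(8000, 6000, 1000)   # limit 8000 kg expressed in units of sigma

p_low = standard.cdf(z_low)              # P(X <= 5000)
p_high = 1 - standard.cdf(z_high)        # P(X > 8000)

# Symmetry of the standard normal distribution: F(-z) = 1 - F(z)
symmetry_gap = abs(standard.cdf(-2) - (1 - standard.cdf(2)))

print(z_low, z_high, round(p_low, 4), round(p_high, 4))
```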
Standard normal distribution
Probability density function f(z) and cumulative distribution function F(z) of the standard normal distribution N(0; 1)
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, for N(0; 1): $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$
[Figure: f(z) and F(z) of the standard normal distribution for z from -4 to 4.]
• Because of the symmetric form of the normal distribution, negative values of z are not shown in the
table. The simple relation F(−z) = 1 − F(z) can be used instead.
• Two different questions can be answered:
1. Which z-value is related to a distinct area?
2. Which area belongs to a distinct z-value?
P(z) and F(z), respectively: left-sided area under the standard normal distribution below the limit z. P(z) gives the shaded area.
Rows: integer and first decimal place of z; columns: second decimal place of z
z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1  0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2  0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3  0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4  0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5  0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6  0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7  0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8  0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9  0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0  0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1  0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2  0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3  0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4  0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5  0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
e.g. the z-value 1.23 cuts off an area of 0.8907.
Because the normal distribution curve is symmetrical, probabilities for only positive values of z are given. Negative values of z can be computed by
P(−z) = 1 − P(z), e.g. P(z = −2) = 1 − P(z = 2) = 1 − 0.9772 = 0.0228
Using the z-transformation, for each x ~ N(µ; σ) the corresponding z-value can be calculated:
z = (x − µ) / σ
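The two table look-ups (area for a given z, and z for a given area) can also be done numerically. The sketch below is an illustration with NormalDist from Python's standard library, using the example value z = 1.23 from the table.

```python
from statistics import NormalDist

standard = NormalDist()               # standard normal distribution N(0, 1)

# Question 2: which area belongs to a distinct z-value?
area = standard.cdf(1.23)

# Question 1: which z-value is related to a distinct area?
z_value = standard.inv_cdf(0.8907)

# Negative z-values via P(-z) = 1 - P(z)
area_neg = 1 - standard.cdf(2)

print(f"P(z = 1.23) = {area:.4f}")
print(f"z for area 0.8907 = {z_value:.2f}")
print(f"P(z = -2) = {area_neg:.4f}")
```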
z P z P z p z p z p z p z p z p z p z p z p z p z p z p z p z p
-4.00 0.0000 -3.50 0.0002 -3.00 0.0013 -2.50 0.0062 -2.00 0.0228 -1.50 0.0668 -1.00 0.1587 -0.50 0.3085 0.00 0.5000 0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987 3.50 0.9998
-3.99 0.0000 -3.49 0.0002 -2.99 0.0014 -2.49 0.0064 -1.99 0.0233 -1.49 0.0681 -0.99 0.1611 -0.49 0.3121 0.01 0.5040 0.51 0.6950 1.01 0.8438 1.51 0.9345 2.01 0.9778 2.51 0.9940 3.01 0.9987 3.51 0.9998
-3.98 0.0000 -3.48 0.0003 -2.98 0.0014 -2.48 0.0066 -1.98 0.0239 -1.48 0.0694 -0.98 0.1635 -0.48 0.3156 0.02 0.5080 0.52 0.6985 1.02 0.8461 1.52 0.9357 2.02 0.9783 2.52 0.9941 3.02 0.9987 3.52 0.9998
-3.97 0.0000 -3.47 0.0003 -2.97 0.0015 -2.47 0.0068 -1.97 0.0244 -1.47 0.0708 -0.97 0.1660 -0.47 0.3192 0.03 0.5120 0.53 0.7019 1.03 0.8485 1.53 0.9370 2.03 0.9788 2.53 0.9943 3.03 0.9988 3.53 0.9998
-3.96 0.0000 -3.46 0.0003 -2.96 0.0015 -2.46 0.0069 -1.96 0.0250 -1.46 0.0721 -0.96 0.1685 -0.46 0.3228 0.04 0.5160 0.54 0.7054 1.04 0.8508 1.54 0.9382 2.04 0.9793 2.54 0.9945 3.04 0.9988 3.54 0.9998
-3.95 0.0000 -3.45 0.0003 -2.95 0.0016 -2.45 0.0071 -1.95 0.0256 -1.45 0.0735 -0.95 0.1711 -0.45 0.3264 0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.9798 2.55 0.9946 3.05 0.9989 3.55 0.9998
-3.94 0.0000 -3.44 0.0003 -2.94 0.0016 -2.44 0.0073 -1.94 0.0262 -1.44 0.0749 -0.94 0.1736 -0.44 0.3300 0.06 0.5239 0.56 0.7123 1.06 0.8554 1.56 0.9406 2.06 0.9803 2.56 0.9948 3.06 0.9989 3.56 0.9998
-3.93 0.0000 -3.43 0.0003 -2.93 0.0017 -2.43 0.0075 -1.93 0.0268 -1.43 0.0764 -0.93 0.1762 -0.43 0.3336 0.07 0.5279 0.57 0.7157 1.07 0.8577 1.57 0.9418 2.07 0.9808 2.57 0.9949 3.07 0.9989 3.57 0.9998
-3.92 0.0000 -3.42 0.0003 -2.92 0.0018 -2.42 0.0078 -1.92 0.0274 -1.42 0.0778 -0.92 0.1788 -0.42 0.3372 0.08 0.5319 0.58 0.7190 1.08 0.8599 1.58 0.9429 2.08 0.9812 2.58 0.9951 3.08 0.9990 3.58 0.9998
-3.91 0.0000 -3.41 0.0003 -2.91 0.0018 -2.41 0.0080 -1.91 0.0281 -1.41 0.0793 -0.91 0.1814 -0.41 0.3409 0.09 0.5359 0.59 0.7224 1.09 0.8621 1.59 0.9441 2.09 0.9817 2.59 0.9952 3.09 0.9990 3.59 0.9998
-3.90 0.0000 -3.40 0.0003 -2.90 0.0019 -2.40 0.0082 -1.90 0.0287 -1.40 0.0808 -0.90 0.1841 -0.40 0.3446 0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953 3.10 0.9990 3.60 0.9998
-3.89 0.0001 -3.39 0.0003 -2.89 0.0019 -2.39 0.0084 -1.89 0.0294 -1.39 0.0823 -0.89 0.1867 -0.39 0.3483 0.11 0.5438 0.61 0.7291 1.11 0.8665 1.61 0.9463 2.11 0.9826 2.61 0.9955 3.11 0.9991 3.61 0.9998
-3.88 0.0001 -3.38 0.0004 -2.88 0.0020 -2.38 0.0087 -1.88 0.0301 -1.38 0.0838 -0.88 0.1894 -0.38 0.3520 0.12 0.5478 0.62 0.7324 1.12 0.8686 1.62 0.9474 2.12 0.9830 2.62 0.9956 3.12 0.9991 3.62 0.9999
-3.87 0.0001 -3.37 0.0004 -2.87 0.0021 -2.37 0.0089 -1.87 0.0307 -1.37 0.0853 -0.87 0.1922 -0.37 0.3557 0.13 0.5517 0.63 0.7357 1.13 0.8708 1.63 0.9484 2.13 0.9834 2.63 0.9957 3.13 0.9991 3.63 0.9999
-3.86 0.0001 -3.36 0.0004 -2.86 0.0021 -2.36 0.0091 -1.86 0.0314 -1.36 0.0869 -0.86 0.1949 -0.36 0.3594 0.14 0.5557 0.64 0.7389 1.14 0.8729 1.64 0.9495 2.14 0.9838 2.64 0.9959 3.14 0.9992 3.64 0.9999
-3.85 0.0001 -3.35 0.0004 -2.85 0.0022 -2.35 0.0094 -1.85 0.0322 -1.35 0.0885 -0.85 0.1977 -0.35 0.3632 0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960 3.15 0.9992 3.65 0.9999
-3.84 0.0001 -3.34 0.0004 -2.84 0.0023 -2.34 0.0096 -1.84 0.0329 -1.34 0.0901 -0.84 0.2005 -0.34 0.3669 0.16 0.5636 0.66 0.7454 1.16 0.8770 1.66 0.9515 2.16 0.9846 2.66 0.9961 3.16 0.9992 3.66 0.9999
-3.83 0.0001 -3.33 0.0004 -2.83 0.0023 -2.33 0.0099 -1.83 0.0336 -1.33 0.0918 -0.83 0.2033 -0.33 0.3707 0.17 0.5675 0.67 0.7486 1.17 0.8790 1.67 0.9525 2.17 0.9850 2.67 0.9962 3.17 0.9992 3.67 0.9999
-3.82 0.0001 -3.32 0.0005 -2.82 0.0024 -2.32 0.0102 -1.82 0.0344 -1.32 0.0934 -0.82 0.2061 -0.32 0.3745 0.18 0.5714 0.68 0.7517 1.18 0.8810 1.68 0.9535 2.18 0.9854 2.68 0.9963 3.18 0.9993 3.68 0.9999
-3.81 0.0001 -3.31 0.0005 -2.81 0.0025 -2.31 0.0104 -1.81 0.0351 -1.31 0.0951 -0.81 0.2090 -0.31 0.3783 0.19 0.5753 0.69 0.7549 1.19 0.8830 1.69 0.9545 2.19 0.9857 2.69 0.9964 3.19 0.9993 3.69 0.9999
-3.80 0.0001 -3.30 0.0005 -2.80 0.0026 -2.30 0.0107 -1.80 0.0359 -1.30 0.0968 -0.80 0.2119 -0.30 0.3821 0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965 3.20 0.9993 3.70 0.9999
-3.79 0.0001 -3.29 0.0005 -2.79 0.0026 -2.29 0.0110 -1.79 0.0367 -1.29 0.0985 -0.79 0.2148 -0.29 0.3859 0.21 0.5832 0.71 0.7611 1.21 0.8869 1.71 0.9564 2.21 0.9864 2.71 0.9966 3.21 0.9993 3.71 0.9999
-3.78 0.0001 -3.28 0.0005 -2.78 0.0027 -2.28 0.0113 -1.78 0.0375 -1.28 0.1003 -0.78 0.2177 -0.28 0.3897 0.22 0.5871 0.72 0.7642 1.22 0.8888 1.72 0.9573 2.22 0.9868 2.72 0.9967 3.22 0.9994 3.72 0.9999
-3.77 0.0001 -3.27 0.0005 -2.77 0.0028 -2.27 0.0116 -1.77 0.0384 -1.27 0.1020 -0.77 0.2206 -0.27 0.3936 0.23 0.5910 0.73 0.7673 1.23 0.8907 1.73 0.9582 2.23 0.9871 2.73 0.9968 3.23 0.9994 3.73 0.9999
-3.76 0.0001 -3.26 0.0006 -2.76 0.0029 -2.26 0.0119 -1.76 0.0392 -1.26 0.1038 -0.76 0.2236 -0.26 0.3974 0.24 0.5948 0.74 0.7704 1.24 0.8925 1.74 0.9591 2.24 0.9875 2.74 0.9969 3.24 0.9994 3.74 0.9999
-3.75 0.0001 -3.25 0.0006 -2.75 0.0030 -2.25 0.0122 -1.75 0.0401 -1.25 0.1056 -0.75 0.2266 -0.25 0.4013 0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970 3.25 0.9994 3.75 0.9999
-3.74 0.0001 -3.24 0.0006 -2.74 0.0031 -2.24 0.0125 -1.74 0.0409 -1.24 0.1075 -0.74 0.2296 -0.24 0.4052 0.26 0.6026 0.76 0.7764 1.26 0.8962 1.76 0.9608 2.26 0.9881 2.76 0.9971 3.26 0.9994 3.76 0.9999
-3.73 0.0001 -3.23 0.0006 -2.73 0.0032 -2.23 0.0129 -1.73 0.0418 -1.23 0.1093 -0.73 0.2327 -0.23 0.4090 0.27 0.6064 0.77 0.7794 1.27 0.8980 1.77 0.9616 2.27 0.9884 2.77 0.9972 3.27 0.9995 3.77 0.9999
-3.72 0.0001 -3.22 0.0006 -2.72 0.0033 -2.22 0.0132 -1.72 0.0427 -1.22 0.1112 -0.72 0.2358 -0.22 0.4129 0.28 0.6103 0.78 0.7823 1.28 0.8997 1.78 0.9625 2.28 0.9887 2.78 0.9973 3.28 0.9995 3.78 0.9999
-3.71 0.0001 -3.21 0.0007 -2.71 0.0034 -2.21 0.0136 -1.71 0.0436 -1.21 0.1131 -0.71 0.2389 -0.21 0.4168 0.29 0.6141 0.79 0.7852 1.29 0.9015 1.79 0.9633 2.29 0.9890 2.79 0.9974 3.29 0.9995 3.79 0.9999
-3.70 0.0001 -3.20 0.0007 -2.70 0.0035 -2.20 0.0139 -1.70 0.0446 -1.20 0.1151 -0.70 0.2420 -0.20 0.4207 0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974 3.30 0.9995 3.80 0.9999
-3.69 0.0001 -3.19 0.0007 -2.69 0.0036 -2.19 0.0143 -1.69 0.0455 -1.19 0.1170 -0.69 0.2451 -0.19 0.4247 0.31 0.6217 0.81 0.7910 1.31 0.9049 1.81 0.9649 2.31 0.9896 2.81 0.9975 3.31 0.9995 3.81 0.9999
-3.68 0.0001 -3.18 0.0007 -2.68 0.0037 -2.18 0.0146 -1.68 0.0465 -1.18 0.1190 -0.68 0.2483 -0.18 0.4286 0.32 0.6255 0.82 0.7939 1.32 0.9066 1.82 0.9656 2.32 0.9898 2.82 0.9976 3.32 0.9995 3.82 0.9999
-3.67 0.0001 -3.17 0.0008 -2.67 0.0038 -2.17 0.0150 -1.67 0.0475 -1.17 0.1210 -0.67 0.2514 -0.17 0.4325 0.33 0.6293 0.83 0.7967 1.33 0.9082 1.83 0.9664 2.33 0.9901 2.83 0.9977 3.33 0.9996 3.83 0.9999
-3.66 0.0001 -3.16 0.0008 -2.66 0.0039 -2.16 0.0154 -1.66 0.0485 -1.16 0.1230 -0.66 0.2546 -0.16 0.4364 0.34 0.6331 0.84 0.7995 1.34 0.9099 1.84 0.9671 2.34 0.9904 2.84 0.9977 3.34 0.9996 3.84 0.9999
-3.65 0.0001 -3.15 0.0008 -2.65 0.0040 -2.15 0.0158 -1.65 0.0495 -1.15 0.1251 -0.65 0.2578 -0.15 0.4404 0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978 3.35 0.9996 3.85 0.9999
-3.64 0.0001 -3.14 0.0008 -2.64 0.0041 -2.14 0.0162 -1.64 0.0505 -1.14 0.1271 -0.64 0.2611 -0.14 0.4443 0.36 0.6406 0.86 0.8051 1.36 0.9131 1.86 0.9686 2.36 0.9909 2.86 0.9979 3.36 0.9996 3.86 0.9999
-3.63 0.0001 -3.13 0.0009 -2.63 0.0043 -2.13 0.0166 -1.63 0.0516 -1.13 0.1292 -0.63 0.2643 -0.13 0.4483 0.37 0.6443 0.87 0.8078 1.37 0.9147 1.87 0.9693 2.37 0.9911 2.87 0.9979 3.37 0.9996 3.87 0.9999
-3.62 0.0001 -3.12 0.0009 -2.62 0.0044 -2.12 0.0170 -1.62 0.0526 -1.12 0.1314 -0.62 0.2676 -0.12 0.4522 0.38 0.6480 0.88 0.8106 1.38 0.9162 1.88 0.9699 2.38 0.9913 2.88 0.9980 3.38 0.9996 3.88 0.9999
-3.61 0.0002 -3.11 0.0009 -2.61 0.0045 -2.11 0.0174 -1.61 0.0537 -1.11 0.1335 -0.61 0.2709 -0.11 0.4562 0.39 0.6517 0.89 0.8133 1.39 0.9177 1.89 0.9706 2.39 0.9916 2.89 0.9981 3.39 0.9997 3.89 0.9999
-3.60 0.0002 -3.10 0.0010 -2.60 0.0047 -2.10 0.0179 -1.60 0.0548 -1.10 0.1357 -0.60 0.2743 -0.10 0.4602 0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981 3.40 0.9997 3.90 1.0000
-3.59 0.0002 -3.09 0.0010 -2.59 0.0048 -2.09 0.0183 -1.59 0.0559 -1.09 0.1379 -0.59 0.2776 -0.09 0.4641 0.41 0.6591 0.91 0.8186 1.41 0.9207 1.91 0.9719 2.41 0.9920 2.91 0.9982 3.41 0.9997 3.91 1.0000
-3.58 0.0002 -3.08 0.0010 -2.58 0.0049 -2.08 0.0188 -1.58 0.0571 -1.08 0.1401 -0.58 0.2810 -0.08 0.4681 0.42 0.6628 0.92 0.8212 1.42 0.9222 1.92 0.9726 2.42 0.9922 2.92 0.9982 3.42 0.9997 3.92 1.0000
-3.57 0.0002 -3.07 0.0011 -2.57 0.0051 -2.07 0.0192 -1.57 0.0582 -1.07 0.1423 -0.57 0.2843 -0.07 0.4721 0.43 0.6664 0.93 0.8238 1.43 0.9236 1.93 0.9732 2.43 0.9925 2.93 0.9983 3.43 0.9997 3.93 1.0000
-3.56 0.0002 -3.06 0.0011 -2.56 0.0052 -2.06 0.0197 -1.56 0.0594 -1.06 0.1446 -0.56 0.2877 -0.06 0.4761 0.44 0.6700 0.94 0.8264 1.44 0.9251 1.94 0.9738 2.44 0.9927 2.94 0.9984 3.44 0.9997 3.94 1.0000
-3.55 0.0002 -3.05 0.0011 -2.55 0.0054 -2.05 0.0202 -1.55 0.0606 -1.05 0.1469 -0.55 0.2912 -0.05 0.4801 0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984 3.45 0.9997 3.95 1.0000
-3.54 0.0002 -3.04 0.0012 -2.54 0.0055 -2.04 0.0207 -1.54 0.0618 -1.04 0.1492 -0.54 0.2946 -0.04 0.4840 0.46 0.6772 0.96 0.8315 1.46 0.9279 1.96 0.9750 2.46 0.9931 2.96 0.9985 3.46 0.9997 3.96 1.0000
-3.53 0.0002 -3.03 0.0012 -2.53 0.0057 -2.03 0.0212 -1.53 0.0630 -1.03 0.1515 -0.53 0.2981 -0.03 0.4880 0.47 0.6808 0.97 0.8340 1.47 0.9292 1.97 0.9756 2.47 0.9932 2.97 0.9985 3.47 0.9997 3.97 1.0000
-3.52 0.0002 -3.02 0.0013 -2.52 0.0059 -2.02 0.0217 -1.52 0.0643 -1.02 0.1539 -0.52 0.3015 -0.02 0.4920 0.48 0.6844 0.98 0.8365 1.48 0.9306 1.98 0.9761 2.48 0.9934 2.98 0.9986 3.48 0.9997 3.98 1.0000
-3.51 0.0002 -3.01 0.0013 -2.51 0.0060 -2.01 0.0222 -1.51 0.0655 -1.01 0.1562 -0.51 0.3050 -0.01 0.4960 0.49 0.6879 0.99 0.8389 1.49 0.9319 1.99 0.9767 2.49 0.9936 2.99 0.9986 3.49 0.9998 3.99 1.0000
Working with the z-table
We assume that the milk yield of Lower Saxony cows has a mean of 6000 kg and a standard deviation of
1000 kg, i.e. X ~ N(6000, 1000). We want to find the probability of X being larger than 8000 kg, with µ
given as 6000 and σ given as 1000: P(X>8000|6000,1000)
Procedure:
- First, we are interested in the area under the normal distribution N(6000, 1000) to the right of
(i.e. above) the limit X=8000.
- We insert the values µ=6000, σ=1000, X=8000 into the formula for the z-transformation =>
z=(X-µ)/σ=(8000-6000)/1000=2.00.
- Interpretation: the limit lies 2 standard deviations above the mean value
- F(2.00) = P(Z≤2.00) = 0.9772 is taken from the table
- 97.72% of the animals thus have a milk yield lower than 8000 kg
- The remainder to 100% has a milk yield > 8000 kg:
P(X>8000|6000,1000)=1-0.9772=0.0228=2.28%
If we randomly sample 500 cows from the Lower Saxony cows, we can expect
500*2.28%=11.4 cows in this sample to have a milk yield > 8000 kg
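The table lookup above can be cross-checked numerically. The following is a minimal sketch, assuming the left-sided area F(z) is computed from the error function in Python's standard library; the function name standard_normal_cdf is chosen here for illustration and does not come from the slides.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Left-sided area F(z) under the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Values from the milk yield example
mu, sigma, limit = 6000.0, 1000.0, 8000.0

z = (limit - mu) / sigma            # z-transformation
p_below = standard_normal_cdf(z)    # share of cows below the limit
p_above = 1.0 - p_below             # share of cows above the limit
expected_cows = 500 * p_above       # expected number in a sample of 500

print(f"z = {z:.2f}")
print(f"F(z) = {p_below:.4f}")
print(f"P(X > 8000) = {p_above:.4f}")
print(f"Expected cows out of 500: {expected_cows:.1f}")
```

The computed values agree with the z-table to four decimal places.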
P(z) and F(z), respectively: left-sided area under the standard normal distribution below the limit z.
Rows: integer part and first decimal place of z. Columns: second decimal place of z.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
...
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
...
Particular areas of the normal distribution
Interval P(X within the interval)
µ±1σ 0.683
µ±1.645σ 0.900
µ±1.96σ 0.950
µ±2σ 0.954
µ±2.576σ 0.990
µ±3σ 0.997
• For every N(µ, σ), the proportion of the area between any two limits can be found with the
help of the z-table once the limits are expressed as multiples of σ from µ. Symmetric
limits of the form µ±kσ have a specific meaning.
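The symmetric areas listed above can be reproduced without a table. This is a small sketch, assuming the relation P(µ-kσ < X < µ+kσ) = erf(k/√2), which holds for every normal distribution; the function name central_area is illustrative. The multiple 2.576 is the more precise value behind the 0.990 entry.

```python
import math


def central_area(k: float) -> float:
    """Area of a normal distribution between mu - k*sigma and mu + k*sigma."""
    return math.erf(k / math.sqrt(2.0))


for k in (1.0, 1.645, 1.96, 2.0, 2.576, 3.0):
    print(f"mu ± {k} sigma: {central_area(k):.3f}")
```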

Summary normal distribution
Areas under the normal distribution curve indicate probabilities

The normal distribution of the data is regarded as a prerequisite for many statistical
tests
A normal distribution can be assumed in most cases for continuous variables
(e.g. yield, milk yield, …) that are influenced by a large set
of random influences (year (specific weather conditions), site, ...)
For example, the milk yield depends on 'random factors' like technical
measurement error, individual, age, lactation, breed, mother, father, rank, stress,
feed, weather, …
For example, the height of maize plants can a priori be assumed to be normally
distributed because, ….

Describing Distributions
• Continuous random variables are not necessarily normally distributed
Skewness: Skew = 0 (bell shaped), Skew < 0 (left skewed), Skew > 0 (right skewed)
Kurtosis: Kurt ≈ 3 or ≈ 0 for the normal distribution, depending on whether the correction term (-3) is considered; otherwise negative excess or positive excess
Number of modes: unimodal, bimodal, multimodal
Examples shown: binomial distribution, count data, percentage scale, uniform distributions
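Skewness and kurtosis can be computed from the central moments of a sample. The sketch below assumes the simple moment-based definitions (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2, excess = kurtosis - 3); statistical software often applies additional small-sample corrections, so results can differ slightly. The data sets are made up for illustration.

```python
def shape_measures(values):
    """Return (skewness, kurtosis, excess kurtosis) from central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew, kurt, kurt - 3.0


# Hypothetical data sets (not from the lecture)
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]
left_skewed = [-x for x in right_skewed]

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    skew, kurt, excess = shape_measures(data)
    print(f"{name}: skew = {skew:.3f}, kurt = {kurt:.3f}, excess = {excess:.3f}")
```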

Compare Sample and Theoretical Distribution, QQ-Plot
[Figure: observed and expected height plotted against the quantile (0.0 to 1.0); QQ-plot of observed quantiles against theoretical quantiles (-4 to 4)]
QQ-Plots
• Can also be used to compare the sample distribution with other theoretical
distributions
• Several different types exist with respect to
- axes (flipped)
- axis labels (PP-plot, quantiles presented as probabilities)
- quantiles (standardized or not)
- reference line
=> Generally, if the assumed theoretical distribution fits the
sample distribution well, the data points lie more or less on the
straight reference line
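The points of a normal QQ-plot can be computed in a few lines. This is a sketch under stated assumptions: the plotting positions (i - 0.5)/n are one common convention among several, the theoretical distribution is a normal distribution fitted with the sample mean and standard deviation, and the height values are invented for illustration.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Return (theoretical quantile, observed quantile) pairs for a normal QQ-plot."""
    n = len(sample)
    observed = sorted(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    # Plotting positions: one probability per ordered observation
    probabilities = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [fitted.inv_cdf(p) for p in probabilities]
    return list(zip(theoretical, observed))


# Hypothetical plant heights (not the data behind the slide figure)
heights = [61.2, 63.5, 64.1, 65.0, 65.8, 66.4, 67.0, 67.9, 69.3, 72.6]

for expected, obs in normal_qq_points(heights):
    print(f"expected {expected:6.2f}   observed {obs:6.2f}")
```

If the pairs are plotted, points close to the line "observed = expected" indicate a good fit to the assumed distribution.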
The t-distribution (Student's t-distribution)
For small sample sizes n<30
Flatter than the normal distribution; the higher n or df, the closer to the normal
distribution
-> For small n it is less probable to draw values close to the mean by chance
The total area under the probability density function of the t-distribution is 1
Areas for different degrees of freedom (df) are tabulated.
[Figure: probability density of t from -4 to 4 for df=3 and for df=30, the latter ≈ N(0,1)]
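The flatter shape of the t-distribution can be checked by evaluating its density directly. The sketch below assumes the standard density formula of Student's t-distribution, written with the gamma function from Python's standard library; the function names are illustrative.

```python
import math


def t_pdf(t: float, df: float) -> float:
    """Probability density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def normal_pdf(z: float) -> float:
    """Probability density of the standard normal distribution N(0, 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x = {x:.0f}: t(df=3) = {t_pdf(x, 3):.4f}, "
          f"t(df=30) = {t_pdf(x, 30):.4f}, N(0,1) = {normal_pdf(x):.4f}")
```

With df=3 the density is lower at the centre and higher in the tails than N(0,1); with df=30 the two curves are already close.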

Displaying distributions: Box Plot
The whiskers or lines on either side of the
box show the range of the data (Min and
Max).
The box contains the middle 50 % of the
data values (interquartile range, IQR).
Lower quartile Q1: 25 % of the observations
are less than Q1
Upper quartile Q3: 75 % of the observations
are less than Q3
Median (Q2): 50 % of the observations are less
than the median (mid-value).
If n is odd, this is the middle number after
sorting the values in order of magnitude; if n is
even, it is the average of the middle two.
The median is preferred to the arithmetic
mean when the distribution is skewed
(nonsymmetrical).
[Figure: Yield of two rapeseed cultivars in a series of experiments (24 sites)]
Box plots show whether a distribution is
skewed (cultivar B) or symmetrical (cultivar A).
Tukey box plot: Outliers are excluded from the
whiskers and highlighted with symbols. The
ends of the whiskers represent the lowest
datum still within 1.5 IQR of the lower quartile,
and the highest datum still within 1.5 IQR of the
upper quartile
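The key numbers of a Tukey box plot can be computed as follows. This is a sketch, assuming the "inclusive" quartile method of Python's statistics module; other software may use slightly different quartile definitions and then report slightly different Q1 and Q3. The yield values are hypothetical.

```python
from statistics import quantiles


def tukey_box_stats(values):
    """Quartiles, whisker ends and outliers following the Tukey box plot rule."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # lowest datum within 1.5 IQR of Q1
        "whisker_high": inside[-1],  # highest datum within 1.5 IQR of Q3
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical yields (not the rapeseed data from the figure)
yields = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
for key, value in tukey_box_stats(yields).items():
    print(f"{key}: {value}")
```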
Central Limit Theorem
Example: Counts of weed seedlings
in randomly selected plots
- n = 600, right skewed, x̅ = 4.95, median = 4,
mode = 3, Skew = 1.18
- means are calculated from groups of 10 values
=> n = 60 mean values, x̅10 = 4.95, median =
4.9, mode = 5.2, Skew = 0.347
The distribution of the mean values (x̅10) is
approximately symmetric
According to the central limit theorem (CLT),
the arithmetic means of sufficiently large subsamples are
approximately normally distributed, regardless of
the underlying distribution of the single
observations
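The effect described by the central limit theorem can be illustrated with a small simulation. This is only a sketch with simulated data: it draws right-skewed values from an exponential distribution (an assumption made here, not the weed seedling counts of the example) and uses 6000 values instead of 600 so that the result depends less on the random seed.

```python
import random


def skewness(values):
    """Moment-based skewness m3 / m2^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated right-skewed single observations with a mean of about 5
raw = [rng.expovariate(1 / 5) for _ in range(6000)]

# Means of consecutive groups of 10 values
group_means = [sum(raw[i:i + 10]) / 10 for i in range(0, len(raw), 10)]

print(f"single values: n = {len(raw)}, skew = {skewness(raw):.2f}")
print(f"means of 10:   n = {len(group_means)}, skew = {skewness(group_means):.2f}")
```

The means of 10 show a clearly smaller skewness than the single values, while the overall mean stays the same.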
TIP:
Think about the expected distribution of your
response variable
=> try to collect the data in a way that you get
metric data.
Check up 2b
The heights within a wheat population are normally
distributed with a mean of 80 cm and a standard
deviation of 5 cm. Which proportion of plants has
heights
(1) less than 72 cm,
(2) between 82 and 87 cm
(3) between 72 and 82 cm.
(4) Above what height are the top 20 % of plants?
(5) Below what height are the lowest 4 % of plants?
[from Clewer & Scarisbrick 2001]
Solution Check up 1a)
No. | Scale | Measure of central tendency | Measure of dispersion | Mathematical operation | Example
1 | | Mode | | a=b; a≠b |
2 | Ordinal | | | | Disease rating
3 | | | | | Degree Celsius
4 | Ratio | | Coefficient of variation | |
Solution Check up 1b)
No. | Scale | Measure of central tendency | Measure of dispersion | Mathematical operation | Example
1 | Nominal* | Mode | Diversity indices, e.g. Shannon index H | a=b; a≠b | Sex; location; name
2 | Ordinal | Median; mode | Range: from Min to Max | a>b; a<b & those of No. 1 | Disease rating
3 | Interval | Arithmetic mean; median | Variance; standard deviation; range = Max-Min | a-b=c; a+b=c & those of No. 1, 2 | Degree Celsius
4 | Ratio | Geometric mean; arithmetic mean; median | Coefficient of variation; standard deviation; range | a/b=c; a*b=c & those of No. 1, 2, 3 | Yield
*For the special case of a binomial scale (False = 0 vs. True = 1), the variance and the mean value are also allowed.
