Population and Sample
[Figure: a population of items labeled a through z, and a sample drawn from it (b, c, g, i, n, o, r, u, y)]
Descriptive statistics
Grouped Relative Frequency Distribution
Relative Frequency Distribution of IQ for Two Classes
IQ            Frequency   Percent   Cumulative Percent
80-89         3           12.5      12.5
90-99         5           20.8      33.3
100-109       6           25.0      58.3
110-119       3           12.5      70.8
120-129       3           12.5      83.3
130-139       2           8.3       91.6
140-149       1           4.2       95.8
150 and over  1           4.2       100.0
Descriptive Statistics
Histograms
A graphical summary tool that permits
sorting of data into cells. It is especially
useful for finding population tendencies
(location and dispersion). Requires
multiple (20-30) observations to allow
process responses to exhibit their
tendencies. Also data specifics are lost
within the cell boundaries.
Histogram
[Figure: Histogram of IQ Scores for Two Classes; x-axis: IQ, y-axis: Frequency]
Descriptive Statistics
The drawback is that within each cell, we
lose the data point values contained by
the cell. For example, the graphical
representation shows only that the data in
the 80s cell lie somewhere within the range
80-89, not where. Therefore, cell selection
can be an art form requiring some care.
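As a rough illustration of sorting data into cells, the sketch below counts how many IQ scores fall into each cell and prints a text histogram. It uses the 26 class scores listed in the Mean slides; the cell width of 10 and the helper name `cell_counts` are assumptions made for this example.

```python
from collections import Counter

# IQ scores of the two classes (13 students each) from the Mean slides.
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]
scores = class_a + class_b

CELL_WIDTH = 10  # assumed cell width; a different width gives a different picture


def cell_counts(values, width):
    """Count the values in each cell [lower, lower + width)."""
    return Counter((v // width) * width for v in values)


counts = cell_counts(scores, CELL_WIDTH)
for lower in sorted(counts):
    # One '#' per observation; the individual values inside a cell are lost.
    print(f"{lower}-{lower + CELL_WIDTH - 1}: {'#' * counts[lower]}")
```

Changing `CELL_WIDTH` shows how much the shape of the histogram depends on cell selection.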
Stem and Leaf Plot
Stem and Leaf Plot of IQ for Two Classes
Stem Leaf
8 279
9 3678
10 235679
11 159
12 078
13 1
14 0
15
16 2
Descriptive Statistics
Summarizing Data: Mean
$\bar{Y} = \frac{\sum Y_i}{n}$
Class A--IQs of 13 Students    Class B--IQs of 13 Students
102   115                      127   162
128   109                      131   103
131    89                       96   111
 98   106                       80   109
140   119                       93    87
 93    97                      120   105
110                            109
$\sum Y_i = 1437$              $\sum Y_i = 1433$
$\bar{Y}_A = \frac{\sum Y_i}{n} = \frac{1437}{13} = 110.54$
$\bar{Y}_B = \frac{\sum Y_i}{n} = \frac{1433}{13} = 110.23$
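The mean calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `y_bar` is chosen here only to mirror the Y-bar notation.

```python
# IQ scores for the two classes of 13 students.
class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]


def y_bar(values):
    """Mean: the sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(sum(class_a), round(y_bar(class_a), 2))
print(sum(class_b), round(y_bar(class_b), 2))
```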
Mean
The mean is the balance point.
Each person's score is like 1 kg placed at the score's
position on a see-saw. Below, on a 200 cm see-saw, the
mean equals 110, the place on the see-saw where a
fulcrum finds balance:
[Figure: see-saw with 1 kg at 93 cm (17 units below the mean), 1 kg at 106 cm (4 units below), and 1 kg at 131 cm (21 units above); the fulcrum sits at 110 cm, 0 units from the mean]
The scale is balanced because
17 + 4 on the left = 21 on the right
Mean
1. Means can be badly affected by outliers
(data points with extreme values unlike the
rest)
2. Outliers can make the mean a bad measure
of central tendency or common experience
[Figure: Income in India. "All of Us" cluster together, Ambani is the outlier, and the mean is marked on the income axis]
Median
The middle value when a variable's values are ranked
in order; the point that divides a distribution into two
equal halves.
[Figure: the same income distribution, with "All of Us" together and Ambani as the outlier]
Median
2. If the recorded values for a variable form a
symmetric distribution, the median and
mean are identical.
3. In skewed data, the mean lies further
toward the skew than the median.
[Figure: a symmetric distribution and a skewed distribution, each marked with its mean and median]
Median
The middle score or measurement in a set of ranked
scores or measurements; the point that divides a
distribution into two equal halves.
[Figure: distribution plot (y-axis: Count) with the mode marked, comparing the positions of the mode, median, and mean as measures of the center of a distribution]
Descriptive Statistics
Summarizing Data:
To get the range for a variable, you subtract its lowest value
from its highest value.
Class A--IQs of 13 Students    Class B--IQs of 13 Students
102   115                      127   162
128   109                      131   103
131    89                       96   111
 98   106                       80   109
140   119                       93    87
 93    97                      120   105
110                            109
Class A Range = 140 - 89 = 51
Class B Range = 162 - 80 = 82
Interquartile Range
A quartile is the value that marks one of the divisions that breaks a series of values into
four equal parts.
The 25th percentile is a quartile that divides the first 1/4 of cases from the latter 3/4.
The 75th percentile is a quartile that divides the first 3/4 of cases from the latter 1/4.
The interquartile range is the distance or range between the 25th percentile and the 75th
percentile. Below, what is the interquartile range?
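A short sketch of the range and the interquartile range for the two classes. Quartile conventions differ between textbooks and software, so the `method="inclusive"` setting of Python's `statistics.quantiles` is an assumption here, and other conventions can give slightly different quartiles.

```python
from statistics import quantiles

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]
class_b = [127, 131, 96, 80, 93, 120, 109, 162, 103, 111, 109, 87, 105]


def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Distance between the 25th and 75th percentiles (inclusive method)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


print(value_range(class_a), value_range(class_b))
print(interquartile_range(class_a))
```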
The larger the variance, the further the individual cases are from the
mean.
The smaller the variance, the closer the individual scores are to the
mean.
Variance
Variance is a number that at first seems
complex to calculate.
Deviation: $Y_i - \bar{Y}$
Variance
The deviation of 102 from 110.54 is? Deviation of 115?
$\sqrt{235.45} = 15.34$
Review:
1. Deviation
2. Deviation squared
3. Sum of squares
4. Variance
5. Standard deviation
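The five review steps can be traced in code for the Class A scores. This is a sketch only: the variance formula did not survive in these slides, so dividing the sum of squares by n - 1 (the sample variance) is an assumption, and the variable names are illustrative.

```python
import math
import statistics

class_a = [102, 128, 131, 98, 140, 93, 110, 115, 109, 89, 106, 119, 97]

n = len(class_a)
y_bar = sum(class_a) / n

# 1. Deviation: each score minus the mean.
deviations = [y - y_bar for y in class_a]
# 2. Deviation squared.
squared = [d ** 2 for d in deviations]
# 3. Sum of squares.
sum_of_squares = sum(squared)
# 4. Variance (assumed divisor: n - 1).
variance = sum_of_squares / (n - 1)
# 5. Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(round(deviations[0], 2))  # deviation of 102 from the mean
print(round(variance, 2), round(std_dev, 2))
```

The deviations always sum to zero, which is why they are squared before being added up.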
Standard Deviation
1. Larger s.d. = greater amounts of variation around the mean.
For example:
[Figure: two distributions, one with axis marks 19, 25, 31 ($\bar{Y}$ = 25, s.d. = 3) and one with axis marks 13, 25, 37 ($\bar{Y}$ = 25, s.d. = 6)]
2. s.d. = 0 only when all values are the same (only when you have a
constant and not a variable).
3. If you were to rescale a variable, the s.d. would change by the same
magnitude: if we changed units above so the mean equaled 250, the s.d.
on the left would be 30, and on the right, 60.
4. Like the mean, the s.d. will be inflated by an outlier case value.
Box-Plots
[Figure: box plots, with the median marked at M = 110.5]
IQR = 27; there is no outlier.
Probability functions
A probability function maps the possible
values of x against their respective
probabilities of occurrence, p(x)
p(x) is a number from 0 to 1.0.
The area under a probability function is
always 1.
Discrete example: roll of a die
[Figure: bar chart of p(x) against x = 1, 2, 3, 4, 5, 6, with every bar at height 1/6]
$\sum_{\text{all } x} p(x) = 1$
Probability mass function (pmf)
x    p(x)
1    p(x=1) = 1/6
2    p(x=2) = 1/6
3    p(x=3) = 1/6
4    p(x=4) = 1/6
5    p(x=5) = 1/6
6    p(x=6) = 1/6
Total: 1.0
Cumulative distribution function (CDF)
[Figure: step plot of P(x) against x = 1, ..., 6, rising through 1/6, 1/3, 1/2, 2/3, 5/6 to 1.0]
x    $P(x \le A)$
1    $P(x \le 1) = 1/6$
2    $P(x \le 2) = 2/6$
3    $P(x \le 3) = 3/6$
4    $P(x \le 4) = 4/6$
5    $P(x \le 5) = 5/6$
6    $P(x \le 6) = 6/6$
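The die example can be written out as a small sketch: the pmf assigns 1/6 to each face, and the CDF is the running total of the pmf. Exact fractions are used so that the values match the tables; the dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

faces = range(1, 7)

# Probability mass function of a fair die: every face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in faces}

# Cumulative distribution function: P(X <= x) is the running sum of the pmf.
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

for x in faces:
    print(x, pmf[x], cdf[x])
```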
Review Question 1
a. 1/6
b. 1/3
c. 1/2
d. 5/6
e. 1.0
Review Question 2
a. 1/5
b. 2/3
c. 1/2
d. 5/6
e. 1.0
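Continuous case: probability density function (pdf)
Example: $p(x) = e^{-x}$, for $x \ge 0$
The total area under the curve is 1:
$\int_0^\infty e^{-x}\,dx = \left. -e^{-x} \right|_0^\infty = 0 + 1 = 1$
[Figure: p(x) = e^{-x} plotted against x, with x = 1 and x = 2 marked]
$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left. -e^{-x} \right|_1^2 = -e^{-2} + e^{-1} = -.135 + .368 = .23$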
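For a continuous pdf, probabilities are areas. The sketch below evaluates P(1 <= x <= 2) for p(x) = e^{-x} in closed form and cross-checks it with a simple midpoint-rule integration; the helper names and the step count are assumptions of this example.

```python
import math


def pdf(x):
    """Density p(x) = e^(-x), defined for x >= 0."""
    return math.exp(-x)


def prob_between(lo, hi):
    """Closed form of the integral of e^(-x) from lo to hi."""
    return math.exp(-lo) - math.exp(-hi)


def midpoint_integral(f, lo, hi, steps=10_000):
    """Numerical area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


exact = prob_between(1, 2)
approx = midpoint_integral(pdf, 1, 2)
print(round(exact, 2), round(approx, 2))
```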
Example 2: Uniform
distribution
The uniform distribution: all values are equally likely.
$f(x) = 1$, for $1 \ge x \ge 0$
[Figure: p(x) flat at height 1 between x = 0 and x = 1]
$\int_0^1 1\,dx = \left. x \right|_0^1 = 1 - 0 = 1$
Example: Uniform
distribution
What's the probability that x is between 0 and ?
p(x)
0 x
1
P( x 0)=
Expected Value and Variance
All probability distributions are
characterized by an expected value
(mean) and a variance (standard
deviation squared).
Expected value of a random variable
Discrete case:
$E(X) = \sum_{\text{all } x} x_i\, p(x_i)$
Continuous case:
$E(X) = \int_{\text{all } x} x_i\, p(x_i)\,dx$
Symbol Interlude
$E(X) = \mu$
these symbols are used interchangeably
Example: expected value
x 10 11 12 13 14
P(x) .4 .2 .2 .1 .1
$\sigma^2 = Var(x) = E[(x - \mu)^2] = \sum_{\text{all } x} (x_i - \mu)^2\, p(x_i)$
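Applied to the example table (x = 10, ..., 14 with P(x) = .4, .2, .2, .1, .1), the discrete formulas for the expected value and the variance can be sketched as follows; the variable names are illustrative.

```python
values = [10, 11, 12, 13, 14]
probs = [0.4, 0.2, 0.2, 0.1, 0.1]

# Expected value: each value weighted by its probability.
expected = sum(x * p for x, p in zip(values, probs))

# Variance: squared distance from the expected value, weighted by probability.
variance = sum((x - expected) ** 2 * p for x, p in zip(values, probs))

print(round(expected, 2), round(variance, 2))
```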
Variance, continuous
Discrete case:
$Var(X) = \sum_{\text{all } x} (x_i - \mu)^2\, p(x_i)$
Continuous case?:
$Var(X) = \int_{\text{all } x} (x_i - \mu)^2\, p(x_i)\,dx$
Symbol Interlude
$Var(X) = \sigma^2$
$SD(X) = \sigma$
these symbols are used interchangeably
Review Question 3
The expected value and variance of a
coin toss (H=1, T=0) are?
a. .50, .50
b. .50, .25
c. .25, .50
d. .25, .25
Important discrete
probability distribution:
The binomial
Binomial Probability
Distribution
A fixed number of observations (trials), n
e.g., 15 tosses of a coin; 20 patients; 1000 people surveyed
A binary outcome
e.g., head or tail in each toss of a coin; disease or no disease
Generally called success and failure
Probability of success is p, probability of failure is 1 - p
Constant probability for each observation
e.g., Probability of getting a tail is the same each time we
toss the coin
Binomial distribution
Take the example of 5 coin tosses.
What's the probability that you flip
exactly 3 heads in 5 coin tosses?
Outcome    Probability
THHHT      (1/2)^3 x (1/2)^2
HHHTT      (1/2)^3 x (1/2)^2
TTHHH      (1/2)^3 x (1/2)^2
HTTHH      (1/2)^3 x (1/2)^2
HHTTH      (1/2)^3 x (1/2)^2
HTHHT      (1/2)^3 x (1/2)^2
THTHH      (1/2)^3 x (1/2)^2
HTHTH      (1/2)^3 x (1/2)^2
HHTHT      (1/2)^3 x (1/2)^2
THHTH      (1/2)^3 x (1/2)^2
10 arrangements x (1/2)^3 x (1/2)^2
(1/2)^3 x (1/2)^2 is the probability of each unique outcome (note: they are all equal).
$\binom{5}{3}$ = ways to arrange 3 heads in 5 trials
$_5C_3 = 5!/(3!\,2!) = 10$
$10 \times (1/2)^5 = 31.25\%$
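The count of 10 arrangements can be confirmed by brute force. This sketch lists all 2^5 equally likely sequences of five tosses and keeps those with exactly 3 heads; the variable names are illustrative.

```python
from itertools import product

# All 32 equally likely sequences of 5 tosses, e.g. "THHHT".
outcomes = ["".join(tosses) for tosses in product("HT", repeat=5)]

# Keep the sequences with exactly 3 heads.
three_heads = [o for o in outcomes if o.count("H") == 3]

probability = len(three_heads) / len(outcomes)
print(len(three_heads), probability)
```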
Binomial distribution function:
X= the number of heads tossed in 5 coin tosses
[Figure: bar chart of p(x) against x = 0, 1, 2, 3, 4, 5, the number of heads]
Binomial distribution,
generally
Note the general pattern emerging: if you have only two possible
outcomes (call them 1/0 or yes/no or success/failure) in n independent
trials, then the probability of exactly X successes =
$\binom{n}{X} p^X (1-p)^{n-X}$
n = number of trials
X = # successes out of n trials
p = probability of success
1-p = probability of failure
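A minimal sketch of the general binomial formula, using `math.comb` for the number of arrangements. The function name `binomial_pmf` is hypothetical; the check reuses the 3-heads-in-5-tosses example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Exactly 3 heads in 5 tosses of a fair coin.
print(binomial_pmf(3, 5, 0.5))

# The probabilities over x = 0..n add up to 1.
print(sum(binomial_pmf(x, 5, 0.5) for x in range(6)))
```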
Binomial distribution:
example
Then:
$E(X) = np$
$Var(X) = np(1-p)$
$SD(X) = \sqrt{np(1-p)}$
Note: the variance will always lie between 0*n and .25*n, because
p(1-p) reaches its maximum at p = .5, where p(1-p) = .25.
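The shortcuts E(X) = np and Var(X) = np(1-p) can be compared against the definitions of expected value and variance. The sketch assumes n = 10 tosses of a fair coin purely as an illustration; `binomial_pmf` is a hypothetical helper.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.5  # assumed example: 10 tosses of a fair coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Definitions: probability-weighted sums over all x.
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(mean, n * p)
print(variance, n * p * (1 - p))
print(sqrt(variance))

# p(1-p) evaluated on a grid of p values peaks at p = .5.
largest = max(q / 100 * (1 - q / 100) for q in range(101))
print(largest)
```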
Review Question 5
a. $\binom{10}{0}(.50)^5(.50)^5$
b. $\binom{10}{5}(.50)^5(.50)^5$
c. $\binom{10}{5}(.50)^{10}(.50)^5$
d. $\binom{10}{10}(.50)^{10}(.50)^0$
Review Question 6
a. .5, .25
b. 1.0, 1.0
c. 1.5, .5
d. .25, .5
e. .5, .5
Review Question
a. 5, 5
b. 10, 5
c. 2.5, 5
d. 5, 2.5
e. 2.5, 10
Continuous Probability Distributions
Uniform
Triangular
Normal
Exponential
The Uniform Distribution
Equally likely chances of occurrence of RV values
between a maximum and a minimum
$f(x) = 1/(b-a)$ for $a \le x \le b$
[Figure: f(x) flat at height 1/(b-a) between a and b]
Mean = (b+a)/2
Variance = $(b-a)^2/12$
a is a location parameter
b-a is a scale parameter
no shape parameter
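The uniform mean and variance formulas can be checked numerically against the density. In this sketch the limits a = 2 and b = 8, the function names, and the midpoint-rule integrator are all assumptions made for illustration.

```python
def uniform_pdf(x, a, b):
    """Density 1/(b-a) between a and b, zero elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_mean(a, b):
    return (b + a) / 2


def uniform_variance(a, b):
    return (b - a) ** 2 / 12


def midpoint_integral(f, lo, hi, steps=10_000):
    """Numerical area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


a, b = 2, 8  # hypothetical minimum and maximum
mu = uniform_mean(a, b)

# Integrate x*f(x) and (x - mu)^2 * f(x) over [a, b].
numeric_mean = midpoint_integral(lambda x: x * uniform_pdf(x, a, b), a, b)
numeric_var = midpoint_integral(lambda x: (x - mu) ** 2 * uniform_pdf(x, a, b), a, b)

print(mu, numeric_mean)
print(uniform_variance(a, b), numeric_var)
```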
The Triangular Distribution
[Figures: f(x) between a and b with its peak at c, shown in three shapes: symmetric; skewed (+) to the right; skewed (-) to the left]
The Triangular Distribution
Probability Density Function
$f(x) = \frac{2(x-a)}{(b-a)(c-a)}$ if $a \le x \le c$
$f(x) = \frac{2(b-x)}{(b-a)(b-c)}$ if $c < x \le b$
$f(x) = 0$ otherwise
The Triangular Distribution
Distribution Function
$F(x) = 0$ if $x < a$
$F(x) = \frac{(x-a)^2}{(b-a)(c-a)}$ if $a \le x \le c$
$F(x) = 1 - \frac{(b-x)^2}{(b-a)(b-c)}$ if $c < x \le b$
$F(x) = 1$ if $x > b$
The Triangular Distribution
Parameters: minimum a, maximum b, most likely c
Symmetric or skewed in either direction
a: location parameter
(b-a): scale parameter
c: shape parameter
Mean = (a+b+c)/3
Variance = $(a^2 + b^2 + c^2 - ab - ac - bc)/18$
Used as rough approximation of other distributions
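The triangular density and distribution function translate directly into code. This sketch assumes a < c < b strictly (so no denominator is zero) and uses hypothetical parameters a = 0, b = 10, c = 3.

```python
def triangular_pdf(x, a, b, c):
    """Triangular density; assumes a < c < b."""
    if a <= x <= c:
        return 2 * (x - a) / ((b - a) * (c - a))
    if c < x <= b:
        return 2 * (b - x) / ((b - a) * (b - c))
    return 0.0


def triangular_cdf(x, a, b, c):
    """Triangular distribution function; assumes a < c < b."""
    if x < a:
        return 0.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    if x <= b:
        return 1 - (b - x) ** 2 / ((b - a) * (b - c))
    return 1.0


a, b, c = 0, 10, 3  # hypothetical minimum, maximum, most likely value
mean = (a + b + c) / 3
variance = (a**2 + b**2 + c**2 - a*b - a*c - b*c) / 18

print(triangular_pdf(c, a, b, c))  # peak height 2/(b-a)
print(triangular_cdf(c, a, b, c))  # equals (c-a)/(b-a)
print(mean, variance)
```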
The Normal Distribution
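Bell shaped
Symmetrical
$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$
There are an infinite number of normal distributions.
Which table? Each distribution has its own table?
$P(c \le X \le d) = \int_c^d f(x)\,dx = \int_c^d \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$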
Normal Distribution
The standardized variable is $Z = \frac{X - \mu}{\sigma}$
Z is a standard normal r.v. [Z~N(0,1)]
Note:
$P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right)$
Solution: The Cumulative
Standardized Normal Distribution
Cumulative Standardized Normal
Distribution Table (Portion)
$\mu_Z = 0$ and $\sigma_Z = 1$
Z     .00     .01     .02
0.0   .5000   .5040   .5080
0.1   .5398   .5438   .5478
0.2   .5793   .5832   .5871
0.3   .6179   .6217   .6255
The table entries are probabilities. For Z = 0.12 the entry is .5478 (the shaded area, exaggerated in the figure).
Only one table is needed.
Standardizing Example
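$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$
[Figure: normal distribution with $\mu$ = 5, $\sigma$ = 10 and X = 6.2 marked, beside the standardized normal distribution with $\mu_Z$ = 0, $\sigma_Z$ = 1 and Z = .12 marked; shaded area exaggerated]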
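Example:
$P(2.9 < X < 7.1) = .1664$
$z = \frac{x - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21$
$z = \frac{x - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21$
[Figure: normal distribution ($\mu$ = 5, $\sigma$ = 10) and standardized normal distribution ($\mu_Z$ = 0, $\sigma_Z$ = 1); the area .1664 is split into .0832 on each side of the mean; shaded area exaggerated]
Example:
$P(X \ge 8) = .3821$
$z = \frac{x - \mu}{\sigma} = \frac{8 - 5}{10} = .30$
[Figure: normal distribution ($\mu$ = 5, $\sigma$ = 10) and standardized normal distribution ($\mu_Z$ = 0, $\sigma_Z$ = 1); the area between 0 and Z = .30 is .1179, so the upper tail is .5000 - .1179 = .3821; shaded area exaggerated]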
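Both worked examples can be reproduced without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library with the slide's values $\mu$ = 5 and $\sigma$ = 10; small differences from the table values come from rounding z to two decimals.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=5, sigma=10)  # the normal distribution in the examples
z_dist = NormalDist()                # standard normal, N(0, 1)


def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma


# P(2.9 < X < 7.1): difference of two cumulative probabilities.
p_between = x_dist.cdf(7.1) - x_dist.cdf(2.9)

# P(X >= 8): upper tail, computed through the standardized value.
z = standardize(8, 5, 10)
p_above = 1 - z_dist.cdf(z)

print(round(p_between, 4), round(z, 2), round(p_above, 4))
```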
Finding Z Values
for Known Probabilities
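What is Z given probability = 0.1217?
[Figure: standardized normal probability table (portion), with curves for $\mu$ = 5, $\sigma$ = 10 and for $\mu_Z$ = 0, $\sigma_Z$ = 1, each with shaded area .1217; shaded area exaggerated]
Z = .31
$X = \mu + Z\sigma = 5 + (0.31)(10) = 8.1$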
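Going from a probability back to Z is the inverse lookup. This sketch assumes, as in the example above, that .1217 is the area between the mean and Z, so the cumulative probability passed to `NormalDist.inv_cdf` is .5 + .1217.

```python
from statistics import NormalDist

mu, sigma = 5, 10
area_from_mean = 0.1217  # area between the mean and the unknown Z

# Inverse of the standard normal CDF at the cumulative probability.
z = NormalDist().inv_cdf(0.5 + area_from_mean)

# Convert Z back to the original scale: X = mu + Z * sigma.
x = mu + z * sigma

print(round(z, 2), round(x, 1))
```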
T-distribution
Use of the t-distribution is similar to
the use of the standard normal
distribution, except that the degrees
of freedom must be accounted for.
The estimation of the true process
mean by the experimental mean
creates the loss of one degree of
freedom in estimating the true
process standard deviation by s.
T-Distribution
t-distribution or Student's t-distribution
Using s for $\sigma$ in computing standardized z-values to look up on the
normal table is not trustworthy for small sample sizes (n < 30).