Beruflich Dokumente
Kultur Dokumente
PART 1 - DATA
Introduction
Probability and Statistics
● Compare a table:
Flights
Not much
can be
gained by
reading it.
Visualizing Data
● to a graph:
Flights
The graph uncovers
two distinct trends - an
increase in passengers
flying over the years
and a greater number
of passengers flying in
the summer months.
Analyze Visualizations Critically!
Nominal
● Predetermined categories
● Can’t be sorted
Ordinal
● Can be sorted
● Lacks scale
Survey responses
Levels of Measurement
Interval
● Provides scale
Temperature
Levels of Measurement
Ratio
● Values have a true zero point
𝒙𝟓 = 𝑥 × 𝑥 × 𝑥 × 𝑥 × 𝑥
1 2 3 4 5
EXAMPLE: 34 = 3 × 3 × 3 × 3 = 81
Exponents – special cases
1
𝑥 −3
=
𝑥×𝑥×𝑥
1 1
EXAMPLE: 2−3 = = = 0.125
2×2×2 8
1
𝑛
𝑥 𝑛 = 𝑥
1
3
EXAMPLE: 8 3 = 8=2
Factorials
𝒙! = 𝑥 × 𝑥 − 1 × 𝑥 − 2 × ⋯ × 1
EXAMPLE: 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
5! 5×4×3×2×1
EXAMPLE: = = 5 × 4 = 20
3! 3×2×1
Simple Sums
𝒏
𝒙 = 1 +2 + 3 + ⋯+ 𝑛
𝒙=𝟏
EXAMPLE: σ4𝑥=1 𝑥 = 1 + 2 + 3 + 4 = 10
EXAMPLE: σ4𝑥=1 𝑥 2 = 1 + 4 + 9 + 16 = 30
Series Sums
𝒏
𝒙𝒊 = 𝑥1 + 𝑥2 + 𝑥3 + ⋯ + 𝑥𝑛
𝒊=𝟏
EXAMPLE: 𝑥 = {5,3,2,8}
𝑛 = # 𝑒𝑙𝑒𝑚𝑒𝑛𝑡𝑠 𝑖𝑛 𝑥 = 4
𝟒
𝒙𝒊 = 5 + 3 + 2 + 8 = 18
𝒊=𝟏
Equation Example
7 + 8 + 9 + 10 34
= = = 8.5
4 4
Measurement Types
Central Tendency
Measurements of Data
9 10 10 11 13 15 16 19 19 21 23 28 30 33 34 36 44
= 19
Median - even number of values
10 10 11 13 15 16 19 19 21 23 28 30 33 34 36 44
19 + 21
= 20
2
Mean vs. Median
10 10 11 13 15 16 16 16 21 23 28 30 33 34 36 44
= 16
Measurement Types
Dispersion
Measures of Dispersion
(range, variance, standard deviation)
9 10 11 13 15 16 19 19 21 23 28 30 33 34 36 39
9 10 11 13 15 16 19 19 21 23 28 30 33 34 36 39
𝛴 𝑥−𝑥ҧ 2
SAMPLE VARIANCE: 2
𝑠 =
𝑛−1
𝛴 𝑋−𝜇 2
POPULATION VARIANCE: 𝜎2 =
𝑁
Sample Variance
2
𝛴 𝑥 − 𝑥ҧ
𝑠2 =
𝑛−1
4 + 7 + 9 + 8 + 11 39
4 7 9 8 11 𝑥ҧ = = = 7.8
sample
mean
5 5
𝑛−1
Sample: 4 + 7 + 9 + 8 + 11 39 sample
𝑥ҧ = = = 7.8 mean
4 7 9 8 11 5 5
Standard Deviation 𝑁
Population: 4 + 7 + 9 + 8 + 11 39 population
𝜇= = = 7.8 mean
4 7 9 8 11 5 5
9 10 10 11 13 15 16 19 19 21 23 28 30 33 34 36 44 45 47 60
1st quartile 2nd quartile 3rd quartile
or median
1. Divide the series
2. Divide each subseries
3. These become quartiles
Quartiles and IQR
9 10 10 11 13 15 16 19 19 21 23 28 30 33 34 36 44 45 47 60
1st quartile 2nd quartile 3rd quartile
or median
1st quartile = 14
2nd quartile = 22
3rd quartile = 35
Plot the Quartiles
9 10 10 11 13 15 16 19 19 21 23 28 30 33 34 36 44 45 47 60
Quartile
ranges are
seldom the
same size!
Fences & Outliers
In this set,
60 is not
an outlier,
but 70
would be
Fences & Outliers
9 10 10 11 13 15 16 19 19 21 23 28 30 33 34 36 44 45 47 70
fence at 1.5 IQR
Here 70
is a true
outlier
Positive Negative or
correlation Inverse
correlation
Covariance
1 𝑁
𝑐𝑜𝑣 𝑋, 𝑌 = (𝑥𝑖 − 𝑥)(
ҧ 𝑦𝑖 − 𝑦)
ത
𝑁 𝑖=1
Covariance Exercise
1 4 1 5
2 6 2 9
3 5 3 7
4 7 4 4
5 9 5 8
6 8 6 6
Covariance Exercise
● Plot them:
x y x y
1 4 1 5
2 6 2 9
3 5 3 7
4 7 4 4
5 9 5 8
6 8 6 6
Covariance Exercise x̅ = 3.5, y̅ = 6.5
1 4 1+2+3+4+5+6 1 5 1+2+3+4+5+6
𝑥ҧ = = 3.5 𝑥ҧ = = 3.5
6 6
2 6 2 9
3 5 4+6+5+7+9+8 3 7 5+9+7+4+8+6
𝑦ത = = 6.5 𝑦ത = = 6.5
4 7 6 4 4 6
5 9 5 8
6 8 6 6
Covariance Exercise x̅ = 3.5, y̅ = 6.5
● Calculate sums:
x y (x - x̅) (y - y̅) (x - x̅)(y - y̅) x y (x - x̅) (y - y̅) (x - x̅)(y - y̅)
● Calculate covariance:
1 𝑁 1 𝑁
x y 𝑐𝑜𝑣 𝑋, 𝑌 = (𝑥𝑖 − 𝑥)(
ҧ 𝑦𝑖 − 𝑦)
ത x y 𝑐𝑜𝑣 𝑋, 𝑌 = (𝑥𝑖 − 𝑥)(
ҧ 𝑦𝑖 − 𝑦)
ത
𝑁 𝑖=1 𝑁 𝑖=1
1 4 1 5
2 6 15.5 2 9 −0.5
= = 𝟐. 𝟓𝟖𝟑 = = −𝟎. 𝟎𝟖𝟑
6 6
3 5 3 7
4 7 4 4
5 9 5 8
6 8 6 6
15.5 -0.5
Covariance Exercise
● Compare covariances:
x y x y
1 4 1 5
2 6 2 9
3 5 3 7
4 7 4 4
5 9 5 8
6 8 6 6
cov(x,y) = 2.583 cov(x,y) = -0.083
Pearson Correlation
Coefficient
Pearson Correlation Coefficient
11 57
15 49
19 48
22 39
Correlation Exercise
11 57
15 49
19 48
22 39
Correlation Exercise
𝑥ҧ = 15.4 𝑦ത = 49.6
3. Calculate 𝑥 − 𝑥ҧ and 𝑦 − 𝑦ത :
Price Units Sold (x − x̅) (y − y̅)
(USD) (thousands)
10 55 -5.4 5.4
11 57 -4.4 7.4
15 49 -0.4 -0.6
19 48 3.6 -1.6
22 39 6.6 -10.6
σ(𝑥 − 𝑥)(𝑦
ҧ − 𝑦)
ത
Correlation Exercise 𝜌𝑋,𝑌 =
𝛴 𝑥 − 𝑥ҧ 2 𝛴 𝑦 − 𝑦ത 2
𝑥ҧ = 15.4 𝑦ത = 49.6
4. Calculate 𝑥 − 𝑥ҧ 𝑦 − 𝑦ത :
Price Units Sold (x − x̅) (y − y̅) (x − x̅)(y − y̅)
(USD) (thousands)
10 55 -5.4 5.4 -29.16
𝑥ҧ = 15.4 𝑦ത = 49.6
5. Calculate 𝑥 − 𝑥ҧ 2
and 𝑦 − 𝑦ത 2
:
Price Units Sold (x − x̅) (y − y̅) (x − x̅)(y − y̅) (x − x̅)2 (y − y̅)2
(USD) (thousands)
10 55 -5.4 5.4 -29.16 29.16 29.16
𝑥ҧ = 15.4 𝑦ത = 49.6
𝑥ҧ = 15.4 𝑦ത = 49.6
𝑥ҧ = 15.4 𝑦ത = 49.6
σ(𝑥 − 𝑥)(𝑦
ҧ − 𝑦)
ത −137.2
𝜌𝑋,𝑌 = =
𝛴 𝑥 − 𝑥ҧ 2 𝛴 𝑦 − 𝑦ത 2 105.2 199.2
−137.2 −137.2
= = = −𝟎. 𝟗𝟒𝟖
10.26 × 14.11 144.8