Numerical Summary
Numerical Summaries of Data
(primarily for quantitative variables)
Location
Variation
Mode
Measures of Location
Give middle or typical values or central tendency.
Measures of Location
1. Mean 2. Median 3. Mode
Measures of Variation
Describe spread or scatter or dispersion in the data.
Measures of Location
1. Mean
the center of gravity of the data (histogram).
Sample Mean = $\bar{X}$    Population Mean = $\mu$
8/26/2013
Sample Mean
$\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
Example 1
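Note: For n observations, the median is located at the $\frac{n+1}{2}$-th observation of the ordered data. When that position falls halfway between two observations, average the two.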
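The position rule in the note can be sketched in Python. This is only an illustration: the function name `median_by_position` is made up here, the input is assumed to be non-empty, and the two data sets are the ones used in Example 4 and Example 6 of these slides.

```python
import math


def median_by_position(values):
    """Median via the (n + 1) / 2 position rule (assumes a non-empty list)."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position in the ordered data
    # For odd n the position is a whole number, so lower and upper coincide.
    # For even n it ends in .5, so we average the two neighbouring values.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


print(median_by_position([3, 4, 1, 7, 4, 5]))                     # n = 6, position 3.5
print(median_by_position([12, 14, 16, 18, 19, 21, 22, 25, 27]))   # n = 9, position 5
```

With six observations the position is 3.5, so the third and fourth ordered values are averaged; with nine observations the fifth ordered value is the median.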
Example 2
median example
Example 3
94
Median is
Numerical Summary
If variable is categorical: average is meaningless; look at percentages of occurrences.
If variable is quantitative, first look at a graph:
  Skewed or outliers? Use median.
  More or less symmetric? Use mean.
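A small sketch of why the median is preferred when outliers are present. The numbers are hypothetical, not taken from the slides; only the standard-library `statistics` module is used.

```python
import statistics

# Hypothetical values, chosen only for illustration.
baseline = [2, 3, 3, 4, 5]
with_outlier = baseline + [50]  # same data plus one extreme value

for label, data in (("no outlier", baseline), ("one outlier", with_outlier)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

Adding the single extreme value pulls the mean far above most of the data, while the median barely moves.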
Measures of Variation
1. Range 2. Variance & Standard Deviation 3. Interquartile Range (IQR)
Fund A: 8.0, 12.0, 10.0      $\bar{X}$ = 10.0%
Fund B: 60.0, -20.0, -10.0   $\bar{X}$ = 10.0%
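The two funds have the same mean but very different spread. A minimal check in Python, using the values from the slide; the helper names `mean` and `value_range` are illustrative only.

```python
fund_a = [8.0, 12.0, 10.0]
fund_b = [60.0, -20.0, -10.0]


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def value_range(values):
    """Range: highest minus lowest value."""
    return max(values) - min(values)


for name, fund in (("Fund A", fund_a), ("Fund B", fund_b)):
    print(name, "mean =", mean(fund), "range =", value_range(fund))
```

A measure of location alone cannot tell the funds apart; a measure of variation such as the range can.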
1. Range
Highest minus lowest value in the sample.
Example 4:
3, 4, 1, 7, 4, 5
Example 4:
3, 4, 1, 7, 4, 5 (miles)    $\bar{X}$ = 4.0
Deviations from the mean: -1, 0, -3, +3, 0, +1
Note: The average of the deviations from the mean will always be zero.
We cannot let the negatives cancel out the positives.
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$

$x$      $x - \bar{x}$    $(x - \bar{x})^2$
3        -1               1
4         0               0
1        -3               9
7         3               9
4         0               0
5         1               1
Total 24  0               20

$\bar{x} = 24/6 = 4.0$
$s^2 = \frac{20}{6-1} = 4.0$ miles$^2$
Standard deviation: $s = \sqrt{s^2} = \sqrt{4.0} = 2.0$ mi.
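The worked table for Example 4 can be reproduced step by step. This is a plain-Python sketch; the variable names are illustrative.

```python
import math

miles = [3, 4, 1, 7, 4, 5]  # Example 4 data
n = len(miles)

x_bar = sum(miles) / n                       # sample mean
deviations = [x - x_bar for x in miles]      # x - x_bar for each observation
squared = [d ** 2 for d in deviations]       # (x - x_bar)^2

sample_variance = sum(squared) / (n - 1)     # divide by n - 1, not n
sample_sd = math.sqrt(sample_variance)       # back in the original units

print("mean:", x_bar)
print("sum of deviations:", sum(deviations))
print("sum of squared deviations:", sum(squared))
print("variance:", sample_variance, "miles^2")
print("standard deviation:", sample_sd, "miles")
```

The deviations sum to zero, which is why they are squared before averaging.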
Variance
Advantages: Good properties; uses all the data. Disadvantages: Units are squared. Not resistant.
For a population:
$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
For a sample:
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$
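The only difference between the two formulas is the divisor. Python's standard `statistics` module provides both versions. In the sketch below, treating the Example 4 values as an entire population is an assumption made purely to show the contrast.

```python
import statistics

miles = [3.0, 4.0, 1.0, 7.0, 4.0, 5.0]  # Example 4 data

# Divisor N: appropriate only if these six values are the whole population
# (an assumption for illustration).
population_variance = statistics.pvariance(miles)

# Divisor n - 1: the values are a sample from a larger population.
sample_variance = statistics.variance(miles)

print("population variance:", round(population_variance, 3))
print("sample variance:", sample_variance)
print("population std. dev.:", round(statistics.pstdev(miles), 3))
print("sample std. dev.:", statistics.stdev(miles))
```

For the same data, the sample variance is always the larger of the two, because it divides the same sum of squares by a smaller number.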
Standard Deviation
$s = \sqrt{s^2}$
The square root of the variance. Advantage: easier to interpret than variance; units same as data.
$s = \sqrt{4.0} = 2.0$ miles
Example: You are told you scored 47; then you hear 47 is at the 82nd percentile. 82% of the sample have scores ≤ 47, and 18% have scores ≥ 47.
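The percentile statement can be checked by counting. The scores below are hypothetical and chosen so the arithmetic is easy to follow; they do not reproduce the 82nd-percentile figure from the example. The function names are illustrative.

```python
def percent_at_or_below(scores, value):
    """Percentage of scores that are less than or equal to value."""
    return 100 * sum(s <= value for s in scores) / len(scores)


def percent_at_or_above(scores, value):
    """Percentage of scores that are greater than or equal to value."""
    return 100 * sum(s >= value for s in scores) / len(scores)


# Hypothetical test scores, not from the slides.
scores = [31, 35, 38, 40, 42, 44, 45, 47, 52, 58]

print(percent_at_or_below(scores, 47))
print(percent_at_or_above(scores, 47))
```

The two percentages can add up to more than 100 because any score equal to the given value is counted on both sides.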
Quartiles:
1st Quartile (25th percentile): 25% of the data values lie at or below it.
Quartiles
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
[Figure: Q1, Q2, and Q3 split the ordered data into four parts, each containing 25% of the data]
Q1 = median of observations below the position of the median.
Q3 = median of observations above the position of the median.
IQR = Q3 - Q1
IQR is the range of the middle 50% of the data. Observations more than 1.5 IQRs beyond quartiles are considered outliers.
Symmetric: Use mean and std. dev.
Skewed right: Use median and IQR.
Example 6
Boxplot
A graphical display of the five-number summary
(also called a box-and-whiskers plot)
Ordered data: 12, 14, 16, 18, 19, 21, 22, 25, 27
Q1 = 15.0 Q3 = 23.5
IQR = 8.5
[Figure: boxplot of the ordered data on a vertical axis from 12 to 28, with Minimum, Q1, Median, Q3, and Maximum marked]
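The five-number summary for Example 6 can be sketched as follows, using the slides' rule that Q1 and Q3 are the medians of the observations below and above the position of the median. The function name `five_number_summary` is illustrative, and the input is assumed to hold at least two values. Other software may use a different quartile convention and give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) by the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]              # observations below the median position
    upper_half = ordered[half + (n % 2):]    # observations above it (skips the median when n is odd)
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )


data = [12, 14, 16, 18, 19, 21, 22, 25, 27]  # Example 6 data
minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1

# Observations more than 1.5 IQRs beyond the quartiles are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("five-number summary:", (minimum, q1, median, q3, maximum))
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

For this data set no observation falls outside the fences, so the whiskers of the boxplot extend to the minimum and maximum.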