
8/26/2013

Numerical Summary
Numerical Summaries of Data
(primarily for quantitative variables)

Location

Variation

Mode

Measures of Location
Give the middle or typical value of the data (central tendency).
1. Mean
2. Median
3. Mode

Measures of Variation
Describe spread or scatter or dispersion in the data.

Measures of Location
1. Mean
the center of gravity of the data (histogram).
Sample Mean = $\bar{X}$    Population Mean = $\mu$

Formula for the mean

Sample Mean: sum of observations divided by sample size

$$\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

2. Median - midpoint of distribution

At least half of the observations are less than or equal to the median, and at least half are greater than or equal to the median.
Note: For n observations, the median is located at the $\frac{n+1}{2}$-th observation in the ordered sample.

Example 1

Data: 14, 18, 20, 12, 24, 15, 14 (n = 7, odd)

Step 1: Order the data:

12, 14, 14, 15, 18, 20, 24

Location of median: $\frac{7+1}{2}$ = 4th, so the median is 15.

Median is the middle value. At least half the values are at or greater; at least half are at or lower.

Example 2

Data: 14, 18, 20, 12, 94, 15, 14 (n = 7, odd; the 24 of Example 1 is replaced by the outlier 94)

Step 1: Order the data:

12, 14, 14, 15, 18, 20, 94

Median is still the middle value, 15. Median is resistant to outliers.

The mean is not resistant. Original: $\bar{X}$ = 16.71; with outlier: $\bar{X}$ = 26.71

Example 3

Data: 14, 18, 20, 12, 24, 15, 14, 214 (n = 8, even, outlier)

Step 1: Order the data:

12, 14, 14, 15, 18, 20, 24, 214

Location of median: $\frac{8+1}{2}$ = 4.5th, so the median is (15 + 18)/2 = 16.5.

Median is the average of the two middle values. Exactly half the values are greater, half lower.
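The resistance of the median in Example 2 can be checked numerically. The sketch below uses Python's standard `statistics` module and assumes that Example 2 replaces the value 24 with the outlier 94, which is what the quoted mean of 26.71 implies.

```python
import statistics

original = [14, 18, 20, 12, 24, 15, 14]
with_outlier = [14, 18, 20, 12, 94, 15, 14]  # assumption: 24 replaced by 94

for label, values in (("original", original), ("with outlier", with_outlier)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

The mean moves by ten units while the median stays at 15, which is the sense in which the median is resistant and the mean is not.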


Summary for Finding Median

1. Order the data.
2. For odd n, the median is the center observation.
3. For even n, the median is the average of the two center observations.
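The three steps above translate directly into code. This is a minimal sketch, with the function name `median` chosen for illustration; Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Median by the three-step recipe: order, then pick or average the center."""
    ordered = sorted(values)  # step 1: order the data
    n = len(ordered)
    if n == 0:
        raise ValueError("median needs at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # step 2: odd n, the center observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # step 3: even n, average of the two center observations


print(median([14, 18, 20, 12, 24, 15, 14]))       # odd n
print(median([14, 18, 20, 12, 24, 15, 14, 214]))  # even n
```

The two calls use the odd-n and even-n data sets from the slides' median examples.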

3. Mode - most frequently occurring value


In a histogram, modal class is the one having largest frequency, i.e., highest bar.
Good for a discrete quantitative variable (few values) or a categorical variable.
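Finding the mode amounts to counting occurrences. The sketch below uses `collections.Counter` from the standard library; the function name `modes` and the categorical `colors` list are hypothetical, made up for illustration. It returns a list because a data set can have several equally frequent values.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        raise ValueError("modes needs at least one observation")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes([14, 18, 20, 12, 24, 15, 14]))  # discrete quantitative data

colors = ["red", "blue", "red", "green"]    # hypothetical categorical data
print(modes(colors))
```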

What type of variable is it?

- If categorical, use the mode. Average is meaningless; look at percentages of occurrences.
- If variable is quantitative, first look at a graph:
    - Skewed or outliers? Use median.
    - More or less symmetric? Use mean.

Numerical Summary

Location: Mean, Median, Mode; Percentiles, Quartiles
Variation: Range, Std. Deviation, IQR

Why does variation matter?

Measures of Variation
1. Range
2. Variance & Standard Deviation
3. Interquartile Range (IQR)

Mutual Fund Selection


Two mutual funds; annual returns for last three years from each.

Which fund would you choose?

Fund A: $\bar{X}$ = 10.0%
Annual returns (%): 8.0, 12.0, 10.0

Fund B: $\bar{X}$ = 10.0%
Annual returns (%): 60.0, -20.0, -10.0
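A quick numerical comparison shows why the mean alone does not settle the choice between the funds. The sketch below uses the standard `statistics` module and two of the variation measures named in the list above (range and sample standard deviation); the variable names are illustrative.

```python
import statistics

fund_a = [8.0, 12.0, 10.0]     # annual returns in percent
fund_b = [60.0, -20.0, -10.0]

for name, returns in (("Fund A", fund_a), ("Fund B", fund_b)):
    mean = statistics.mean(returns)
    spread = max(returns) - min(returns)  # range: highest minus lowest
    sd = statistics.stdev(returns)        # sample standard deviation (n - 1 divisor)
    print(f"{name}: mean = {mean:.1f}%, range = {spread:.1f}, std. dev. = {sd:.2f}")
```

Both funds average 10.0%, but Fund B's returns are far more spread out, so the two funds carry very different risk.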


1. Range
Highest minus lowest value in the sample.

Example 4: 3, 4, 1, 7, 4, 5
Range = Hi - Lo = 7 - 1 = 6

Example 5: 1, 1, 1, 7, 7, 7
Range = Hi - Lo = 7 - 1 = 6
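Examples 4 and 5 can be reproduced with a one-line function. The name `sample_range` is an assumption of this sketch (chosen to avoid shadowing Python's built-in `range`).

```python
def sample_range(values):
    """Highest minus lowest value in the sample."""
    return max(values) - min(values)


example_4 = [3, 4, 1, 7, 4, 5]
example_5 = [1, 1, 1, 7, 7, 7]

print(sample_range(example_4))
print(sample_range(example_5))
```

Both samples give the same range even though their values are distributed very differently, because the range depends only on the two extreme observations.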

2. Variance & Standard Deviation

How far are the data from the middle, on average?
Notation:
Sample Variance = $s^2$    Sample Std. Dev. = $s$
Population Variance = $\sigma^2$    Population Std. Dev. = $\sigma$

Example 4:

3, 4, 1, 7, 4, 5 (miles)    $\bar{X}$ = 4.0
Deviations from the mean: -1, 0, -3, +3, 0, +1

Note:
- The average of the deviations from the mean will always be zero.
- We cannot let the negatives cancel out the positives.
- Avoid this by using either 1. absolute value or 2. squaring of the differences.

Equation for Variance (for a sample):

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$$

Example 4 data:

    x       x - x̄    (x - x̄)²
    3        -1          1
    4         0          0
    1        -3          9
    7         3          9
    4         0          0
    5         1          1
    Total 24  0         20

x̄ = 4.0

$$s^2 = \frac{20}{6-1} = 4.0 \text{ miles}^2$$

Standard deviation:

$$s = \sqrt{s^2} = \sqrt{4.0} = 2.0 \text{ mi.}$$
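The Example 4 calculation can be retraced step by step in code. This is a sketch that follows the table column by column; the variable names are illustrative.

```python
import math

miles = [3, 4, 1, 7, 4, 5]  # Example 4 data
n = len(miles)

mean = sum(miles) / n                    # x-bar
deviations = [x - mean for x in miles]   # x - x-bar column
squared = [d ** 2 for d in deviations]   # (x - x-bar)^2 column

variance = sum(squared) / (n - 1)        # sample variance, n - 1 divisor
std_dev = math.sqrt(variance)            # sample standard deviation

print(sum(deviations))  # deviations from the mean sum to zero
print(sum(squared))     # total of the squared-deviation column
print(variance, std_dev)
```

The deviations sum to zero, which is why they are squared before averaging; the standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor.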


Equation for Variance:

For a population:

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$

For a sample:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n-1}$$

Variance
Advantages: Good properties; uses all the data.
Disadvantages: Units are squared. Not resistant.

Standard Deviation

$$s = \sqrt{s^2}$$

The square root of the variance.
Advantage: Easier to interpret than variance; units same as data.
Example 4: $s = \sqrt{4.0}$ = 2.0 miles

Sample 100pth percentile:

If x = the 100pth percentile, then
at least 100p% of data is ≤ x,
at least 100(1-p)% of data is ≥ x.

Example: You are told you scored 47; then you hear 47 is at the 82nd percentile. At least 82% of the sample have scores ≤ 47, AND at least 18% have scores ≥ 47.
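The two "at least" conditions in the percentile definition can be tested directly. The sketch below checks whether a candidate value qualifies as a 100p-th percentile of a sample; the function name `is_percentile` is an assumption, and the check is only a literal reading of the definition above, not a method for computing percentiles.

```python
def is_percentile(x, values, p):
    """True if at least 100p% of values are <= x and at least 100(1-p)% are >= x."""
    n = len(values)
    at_or_below = sum(1 for v in values if v <= x)
    at_or_above = sum(1 for v in values if v >= x)
    return at_or_below >= p * n and at_or_above >= (1 - p) * n


odd_sample = [14, 18, 20, 12, 24, 15, 14]
even_sample = [14, 18, 20, 12, 24, 15, 14, 214]

print(is_percentile(15, odd_sample, 0.5))     # the median is the 50th percentile
print(is_percentile(12, odd_sample, 0.5))     # the minimum is not
print(is_percentile(16.5, even_sample, 0.5))  # midpoint of the two center values
```

For an even sample size, every value between the two center observations satisfies the definition at p = 0.5; averaging them is the usual convention for picking one.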

Five Number Summary

1. Minimum
2. 1st Quartile, Q1 = 25th percentile
3. Median
4. 3rd Quartile, Q3 = 75th percentile
5. Maximum

Quartiles:
1st Quartile (25th percentile): 25% of the data values lie at or below it.
3rd Quartile (75th percentile): 75% of the data values lie at or below it.


Quartiles
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile

Method 1: Percentile method

Q1 located at position (n+1)*1/4
Q2 located at position (n+1)*2/4
Q3 located at position (n+1)*3/4

    n     Q1     Q2     Q3
    5     1.5    3      4.5
    8     2.25   4.5    6.75
    11    3      6      9

[Figure: the ordered data divided into four parts of 25% each by Q1, Q2, and Q3]
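The position formulas of the percentile method are easy to tabulate. This sketch only computes the positions (n+1)*k/4 for the three sample sizes shown in the slides; how a fractional position is turned into a value is not stated in the slides and is left out here. The function name is illustrative.

```python
def quartile_positions(n):
    """Positions of Q1, Q2, Q3 in the ordered sample: (n + 1) * k / 4 for k = 1, 2, 3."""
    return tuple((n + 1) * k / 4 for k in (1, 2, 3))


for n in (5, 8, 11):
    q1_pos, q2_pos, q3_pos = quartile_positions(n)
    print(n, q1_pos, q2_pos, q3_pos)
```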

Method 2: Median method

Q1 = median of observations below the position of the median.
Q3 = median of observations above the position of the median.

Example 6

Ordered data: 12, 14, 16, 18, 19, 21, 22, 25, 27

Max = 27.0
Q3 = 23.5
Median = 19.0
Q1 = 15.0
Min = 12.0

IQR = 8.5
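The median method and the Example 6 values can be reproduced with a short function. This is a sketch under the convention stated above (for odd n the median observation itself is excluded from both halves); the name `five_number_summary` is an assumption, and the function needs at least two observations.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median method for the quartiles."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # observations below the position of the median
    upper = ordered[half + n % 2:]    # observations above the position of the median
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


data = [12, 14, 16, 18, 19, 21, 22, 25, 27]  # Example 6
minimum, q1, med, q3, maximum = five_number_summary(data)
print(minimum, q1, med, q3, maximum)
print("IQR =", q3 - q1)
```

Other quartile conventions exist and can give slightly different values for the same data, so results from statistical software may not match this method exactly.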

3. Interquartile Range (IQR)

IQR = Q3 - Q1

- IQR is the range of the middle 50% of the data.
- Observations more than 1.5 IQRs beyond quartiles are considered outliers.

Which summary statistics should I use?

Symmetric: Use mean & std. dev.
Skewed right: Use median & IQR.
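The 1.5 IQR rule for outliers stated with the IQR can be sketched as a pair of cut-offs ("fences") below Q1 and above Q3. The function names and the extra value 40 are hypothetical, introduced only for illustration; the quartiles are the Example 6 values Q1 = 15.0 and Q3 = 23.5.

```python
def outlier_fences(q1, q3):
    """Cut-offs 1.5 IQRs below Q1 and 1.5 IQRs above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Values lying more than 1.5 IQRs beyond the quartiles."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


data = [12, 14, 16, 18, 19, 21, 22, 25, 27]  # Example 6
low, high = outlier_fences(15.0, 23.5)
print(low, high)
print(find_outliers(data, 15.0, 23.5))         # no Example 6 value is outside the fences
print(find_outliers(data + [40], 15.0, 23.5))  # hypothetical extra value, checked against the same fences
```

The quartiles are passed in explicitly, so the hypothetical value 40 is judged against the fences of the original sample rather than against recomputed quartiles.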


Boxplot
A graphical display of the five number summary
(also called a box-and-whiskers plot)

Example 6

Ordered data: 12, 14, 16, 18, 19, 21, 22, 25, 27

[Figure: boxplot of the Example 6 data on a vertical axis from 12 to 28, marking Minimum = 12.0, Q1 = 15.0, Median = 19.0, Q3 = 23.5, Maximum = 27.0; IQR = 8.5]

Note: Middle 50% of data are within the box.
