
Statistics for Describing, Exploring, and Comparing Data

3-1: Review and Preview

Statistical ways to summarize data.

Measures of central tendency: mean, median, mode, and midrange
o Averages
o Center of the distribution
o Location of the center, middle
o Although it describes all of us, it describes none of us

Measures of variation (dispersion): range, variance, and standard deviation
o Spread of the data
o Cluster around the center? (small value, variance)
o More widely spread out? (larger value, variance)

Measures of position: percentiles, deciles, quartiles
o Where a specific data value falls within the set
o Comparison of a data value to the set
o Called norms

These three types of measures are called traditional statistics. They are used to confirm conjectures about the data.

Exploratory data analysis: see what the data will show.
o Box plot
o 5 number summary

3-2: Measures of Center


RECALL:
We take a sample from a population. If we have a small population we can sample the whole population (a census); if our population is large we take a sample. A parameter is a measure found by using all the data values in the population: a characteristic of the population. A statistic is a measure found by using data values from a sample: a characteristic of the sample.

General Rounding Rule: Rounding should not take place until the final answer is calculated. Intermediate rounding increases error in your final calculation. We write down about 3 or 4 decimal places on paper to show our work, but keep all the digits on the calculator to ensure an exact answer.

A measure of center is a value at the middle or center of the data set. There are several ways to calculate the center of a data set and they don't all agree. Each measure of center has its benefits as well as its drawbacks. We will look at each measure below:

The mean:
o The arithmetic average is found by adding up all data values and dividing by the total number of values, n. An arithmetic average in statistics is referred to as either:

the sample mean, $\bar{x}$, and it is calculated as follows:

$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x}{n}$

Math 121 Chapter 3


n is the number of sample data values.

Or, the population mean, $\mu$, which is calculated as follows:

$\mu = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum x}{N}$

N is the number of values in the population.

Rounding Rule for the Mean, Median, and Midrange: The mean, median, and midrange should always have one more decimal place than the raw data.

EXAMPLE 1: Remember our data set of miles traveled from home to be at GU? We will find the average distance traveled.

1279 288 794 329 471 363 833 277 2480 321

$\bar{x} = \frac{\sum x}{n} = \frac{1279 + 288 + 794 + \dots + 321}{10} = \frac{7435}{10} = 743.5$
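The sample mean formula can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` is an illustrative choice, and the data are the ten distances from EXAMPLE 1.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n


# Miles traveled from home (EXAMPLE 1).
miles = [1279, 288, 794, 329, 471, 363, 833, 277, 2480, 321]
print(sample_mean(miles))  # 743.5
```

Note that no rounding happens inside the function, in line with the General Rounding Rule: round only when reporting the final answer.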

EXAMPLE 2: In a particular math class of thirty students, the class took an exam over Chapter 3. Find the mean exam score for the scores below:

19 67  7 86 43 34 96 76 95 99
87 76 99 97 35 45 98 76 74 90
29 76 76 45 80 42 76 98 79 80

Finding the mean for grouped data (in a frequency table/distribution):
o Find the class midpoint for each class.
o Multiply each class midpoint by the frequency for its class.
o Sum up those products and divide that sum by the total frequency (n).

$\bar{x} = \frac{\sum (f \cdot X_m)}{n}$

EXAMPLE 3: Find the mean of the grouped data in the frequency distribution below.

Class Limits    Frequency (f)    Class Midpoint (X_m)    f · X_m
1-500           6                250.5                   1503
501-1000        2                750.5                   1501
1001-1500       1                1250.5                  1250.5
1501-2000       0                1750.5                  0
2001-2500       1                2250.5                  2250.5
Total           10                                       6505

$\bar{x} = \frac{\sum (f \cdot X_m)}{n} = \frac{6505}{10} = 650.5$
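The grouped-data procedure (midpoint, multiply by frequency, sum, divide by n) can be sketched as follows. The function name `grouped_mean` and the representation of each class as a `(lower, upper)` pair are assumptions made for illustration.

```python
def grouped_mean(class_limits, frequencies):
    """Approximate the mean of grouped data from class limits and frequencies."""
    # Class midpoint = (lower limit + upper limit) / 2
    midpoints = [(low + high) / 2 for low, high in class_limits]
    n = sum(frequencies)  # total frequency
    total = sum(f * m for f, m in zip(frequencies, midpoints))
    return total / n


# Frequency distribution from EXAMPLE 3.
limits = [(1, 500), (501, 1000), (1001, 1500), (1501, 2000), (2001, 2500)]
freqs = [6, 2, 1, 0, 1]
print(grouped_mean(limits, freqs))  # 650.5
```

The result is only an estimate of the true mean, because every value in a class is treated as if it sat at the class midpoint.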

The median:
o The median is the halfway point in the data set.
o The median is the midpoint of the data array.
o We must arrange the data values in order (data array).
o Then the median will be the middle value for a data set with an odd number of data values, or the average of the middle two data values for a data set with an even number of data values.

EXAMPLE 4: Find the median for the data set below.

Raw Data:   1279 288 794 329 471 363 833 277 2480 321
Data Array: 277 288 321 329 363 471 794 833 1279 2480

Median: (even number of values, average of middle two)
Median = (363 + 471)/2 = 834/2 = 417

EXAMPLE 5: In a particular math class of thirty students, the class took an exam over Chapter 3. The exam scores are as follows:

19 67  7 86 43 34 96 76 95 99
87 76 99 97 35 45 98 76 74 90
29 76 76 45 80 42 76 98 79 80

Find the median test score for the class.
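The odd/even rule for the median can be written out directly. This is a sketch under the definition given above; the function name `median` is illustrative, and the check uses the miles data from EXAMPLE 4.

```python
def median(values):
    """Return the middle value of the ordered data (data array)."""
    ordered = sorted(values)  # the data must be arranged in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the middle two


miles = [1279, 288, 794, 329, 471, 363, 833, 277, 2480, 321]
print(median(miles))  # 417.0
```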

The mode:
o Value that occurs the most often.
o Unimodal: one value occurs more often than any other value.
o Bimodal: 2 values occur with the same frequency, and more often than the rest.
o Multimodal: more than 2 values occur with the same frequency, and more often than the other data values.
o No Mode: no data value occurs more than once.

EXAMPLE 6: Find the mode of the data set.

1279 329 833 321 288 471 277 794 363 2480

EXAMPLE 7: In a particular math class of thirty students, the class took an exam over Chapter 3. The exam scores are as follows:

19 67  7 86 43 34 96 76 95 99
87 76 99 97 35 45 98 76 74 90
29 76 76 45 80 42 76 98 79 80

Find the mode for the Math exam over Chapter 3.
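One way to apply the mode definitions is to count how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library; the function names `modes` and `describe_modes` and the two short lists of quiz scores are illustrative choices, not part of the course material.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes; an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no data value occurs more than once
    return sorted(v for v, c in counts.items() if c == top)


def describe_modes(values):
    """Classify a data set as having no mode, or as uni-, bi-, or multimodal."""
    found = modes(values)
    if not found:
        return "no mode"
    return {1: "unimodal", 2: "bimodal"}.get(len(found), "multimodal")


miles = [1279, 329, 833, 321, 288, 471, 277, 794, 363, 2480]
print(modes(miles), describe_modes(miles))      # [] no mode
quiz_a = [3, 6, 8, 6, 10, 5, 8, 4]              # illustrative quiz scores
print(modes(quiz_a), describe_modes(quiz_a))    # [6, 8] bimodal
quiz_b = [5, 7, 8, 6, 7, 6, 5, 6]               # illustrative quiz scores
print(modes(quiz_b), describe_modes(quiz_b))    # [6] unimodal
```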

The midrange:
o Rough estimate of the middle.
o Add the lowest and highest values in the data set and divide by 2.

$MR = \frac{\text{lowest value} + \text{highest value}}{2}$

o Affected by extreme values! (Hence, a really rough estimate!)

EXAMPLE 8: Find the midrange of the data set for miles traveled from home to college.

1279 329 833 321 288 471 277 794 363 2480

$\text{Midrange} = \frac{\text{lowest value} + \text{highest value}}{2} = \frac{277 + 2480}{2} = 1378.5$


Weighted Average:

$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum (w \cdot x)}{\sum w}$

o Think about weighted means in a table, sort of like calculating a mean for grouped data.
o The classic example of a weighted average is calculating GPA!

EXAMPLE 9: You took 5 classes: a 3 credit stats class and earned an A, a 4 credit Chemistry class and earned a B, a fitness and nutrition class that was 2 credits and earned an A, a religion class that was 3 credits and earned a C, and a sociology class that was 3 credits and earned an A. What was your GPA for that semester?

Class        Credit (weight, w)    Grade (x)    Weight · Value
Stats        3                     A (4.0)      12
Chem         4                     B (3.0)      12
PE           2                     A (4.0)      8
Religion     3                     C (2.0)      6
Sociology    3                     A (4.0)      12
Total        15                                 50

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w} = \frac{50}{15} = 3.333$

EXAMPLE 10: Thirty automobiles were tested for fuel efficiency (in miles per gallon). This frequency distribution was obtained.

Class Boundaries    Frequency (f)    Class Midpoint (X_m)    f · X_m
7.5-12.5            3                10                      3(10) = 30
12.5-17.5           5                15                      75
17.5-22.5           15               20                      300
22.5-27.5           5                25                      125
27.5-32.5           2                30                      60
Total               30               --                      590

a. Find the mean.

b. What is the modal class?

EXAMPLE 11: The heights of the 20 highest waterfalls in the world are shown here. Find the mean, median, mode, and midrange.

3212 2800 2625 2540 2499 2425 2307 2151 2123 2000
1904 1841 1650 1612 1536 1388 1215 1198 1182 1170

EXAMPLE 12: A recent survey of a new diet cola reported the following percentages of people who like the taste. Find the weighted mean of the percentages.

Area     % Favored (value)    Number Surveyed (weight)    Weight · Value
1        40                   1000                        40(1000) = 40000
2        30                   3000                        90000
3        50                   800                         40000
Total    --                   4800                        170000
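The weighted mean formula, the sum of weight times value divided by the sum of the weights, can be sketched in Python as below. The function name `weighted_mean` is an illustrative choice; the inputs are the credit hours and grade points from the GPA example and the survey counts and percentages from the diet cola table.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# GPA example: grade points weighted by credit hours.
gpa = weighted_mean([4.0, 3.0, 4.0, 2.0, 4.0], [3, 4, 2, 3, 3])
print(round(gpa, 3))  # 3.333

# Diet cola survey: percent favored weighted by number surveyed.
cola = weighted_mean([40, 30, 50], [1000, 3000, 800])
print(round(cola, 1))  # 35.4
```

Because Area 2 has by far the most respondents, the weighted result sits closer to 30% than the plain average of the three percentages (40%) would suggest.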

Skewness
A comparison of the mean, median, and mode can give us information pertaining to the skewness. A distribution is said to be skewed if it is not symmetric. This means the left and right halves of the distribution are not mirror images of one another.

If a distribution is left skewed (negatively skewed), then the distribution tails off on the left-hand side and the mean and median are to the left of the mode.

If a distribution is right skewed (positively skewed), then the distribution tails off on the right-hand side and the mean and median are to the right of the mode.

When a distribution is symmetric, the mean, median, and mode are equal. Keep in mind that the mean, median, and mode cannot always be used to determine the shape of the distribution.


3-3: Measures of Variation


EXAMPLE 1: Below are scores on quizzes in 2 classes. Which class did better?

Class 1: 3 6 8 6 10 5 8 4    $\bar{x} = 6.3$    mode = 6, 8    median = 6
Class 2: 5 7 8 6 7 6 5 6     $\bar{x} = 6.3$    mode = 6       median = 6

We cannot tell from the measures of central tendency alone, so we must look at the measures of variation. The consistency or spread of the data should be taken into account.

Round-Off Rule for Measures of Variation: We carry one more decimal place than is present in the raw data.

The range of a set of data is the difference between the maximum value and the minimum value. The range is easy to compute; however, it uses only the two extreme values and does not take all values into account, so outliers can affect the range.

EXAMPLE 2: @ind the range for the data set: 1279 329 833 321 288 471 277 794 363 2480

Range = Max - Min = 2480 - 277 = 2203

The standard deviation of a set of sample values is a measure of variation of values about the mean.

Sample standard deviation:

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Short-Cut formula for Standard deviation:

$s = \sqrt{\frac{n \sum (x^2) - (\sum x)^2}{n(n - 1)}}$


o The standard deviation is a measure of variation of all values from the mean.
o Outliers can dramatically increase the standard deviation.
o The units of the standard deviation, s, are the same as the units of the original data values.
o The value of the standard deviation is never negative. It is zero only when all data values are the same; otherwise it is positive.

EXAMPLE 3: Find the mean and standard deviation of the given data (in minutes) by hand and then with a calculator:

a) 50, 50, 50, 50, 50

x (min)    $x - \bar{x}$    $(x - \bar{x})^2$
50
50
50
50
50
Total

b) 46, 50, 50, 50, 54

x (min)    $x - \bar{x}$    $(x - \bar{x})^2$
46
50
50
50
54
Total

c) 5, 50, 50, 50, 95

x (min)    $x - \bar{x}$    $(x - \bar{x})^2$
5          -45              2025
50         0                0
50         0                0
50         0                0
95         45               2025
Total                       4050
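The definition formula and the short-cut formula for the sample standard deviation give the same result, which a short Python sketch can confirm on the three data sets of this example. The function names are illustrative; only the standard `math` module is used.

```python
import math


def sample_std_definition(values):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )"""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def sample_std_shortcut(values):
    """s = sqrt( (n * sum(x^2) - (sum x)^2) / (n * (n - 1)) )"""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))


data_sets = {
    "a": [50, 50, 50, 50, 50],
    "b": [46, 50, 50, 50, 54],
    "c": [5, 50, 50, 50, 95],
}
for label, data in data_sets.items():
    s1 = sample_std_definition(data)
    s2 = sample_std_shortcut(data)
    # Round only at the end, to one more decimal place than the raw data.
    print(label, round(s1, 1), round(s2, 1))
# a 0.0 0.0
# b 2.8 2.8
# c 31.8 31.8
```

All three data sets have a mean of 50 minutes, yet their standard deviations differ widely, which is exactly the information the measures of center cannot show.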

Population Standard Deviation:

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

The variance of a set of values is a measure of variation equal to the square of the standard deviation.

Sample Variance: $s^2$
Population Variance: $\sigma^2$

EXAMPLE 4: In the preceding example we found that for 46, 50, 50, 50, 54 (minutes), the standard deviation was 2.8 minutes. Find the variance of that same example.


Range Rule of Thumb:

For Estimating a Value of the Standard Deviation, s: To roughly estimate the standard deviation from a collection of known sample data, use

$s \approx \frac{\text{range}}{4}$

EXAMPLE 5: In the preceding example we found that for 46, 50, 50, 50, 54 (minutes), the standard deviation was 2.8 minutes. Use the range rule of thumb to determine if this standard deviation is a reasonable value.

For Interpreting a Known Value of the Standard Deviation: If the standard deviation is known, use it to find rough estimates of the minimum and maximum usual sample values by using the following:

Minimum usual value = $\bar{x} - 2s$
Maximum usual value = $\bar{x} + 2s$

EXAMPLE 6: Past results from the National Health Survey suggest that the pulse rates (beats per minute) for women have a mean of 76.0 and a standard deviation of 12.5. Find the minimum and maximum usual pulse rates. Then determine whether a pulse rate of 113 would be considered unusual.
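Both uses of the range rule of thumb can be sketched in a few lines. The function names are illustrative, and the inputs are the numbers from the two examples above (the data 46, 50, 50, 50, 54 and the pulse-rate mean 76.0 with standard deviation 12.5).

```python
def range_rule_estimate(values):
    """Rough estimate of the standard deviation: s is about range / 4."""
    return (max(values) - min(values)) / 4


def usual_limits(mean, s):
    """Return (minimum usual value, maximum usual value) = mean -/+ 2s."""
    return mean - 2 * s, mean + 2 * s


def is_unusual(value, mean, s):
    """A value is unusual if it lies outside the usual limits."""
    low, high = usual_limits(mean, s)
    return value < low or value > high


print(range_rule_estimate([46, 50, 50, 50, 54]))  # 2.0
print(usual_limits(76.0, 12.5))                   # (51.0, 101.0)
print(is_unusual(113, 76.0, 12.5))                # True
```

The estimate of 2.0 minutes is of the same order as the computed 2.8 minutes, which is all the rule of thumb is meant to show.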

Empirical (or 68-95-99.7) Rule for Data with a Bell-Shaped Distribution

This rule states that for data sets having a distribution that is approximately bell-shaped, the following properties apply.
o About 68% of all values fall within 1 standard deviation of the mean.
o About 95% of all values fall within 2 standard deviations of the mean.
o About 99.7% of all values fall within 3 standard deviations of the mean.


EXAMPLE 7: IQ scores have a bell-shaped distribution with a mean of 100 and variance of 225. What percentage of IQ scores are between 70 and 130?
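When applying the empirical rule, the first step is to convert the variance to a standard deviation and list the intervals that lie 1, 2, and 3 standard deviations from the mean. A minimal sketch, with the illustrative names `EMPIRICAL_PERCENTS` and `empirical_intervals`:

```python
import math

# Approximate percentages from the 68-95-99.7 rule, keyed by number of
# standard deviations from the mean.
EMPIRICAL_PERCENTS = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_intervals(mean, variance):
    """Return {k: (low, high, percent)} for k = 1, 2, 3 standard deviations."""
    sd = math.sqrt(variance)  # the rule uses the standard deviation, not the variance
    return {
        k: (mean - k * sd, mean + k * sd, pct)
        for k, pct in EMPIRICAL_PERCENTS.items()
    }


for k, (low, high, pct) in empirical_intervals(100, 225).items():
    print(f"within {k} sd: {low} to {high}, about {pct}% of values")
```

With a mean of 100 and a variance of 225, the standard deviation is 15, so the interval from 70 to 130 is the 2 standard deviation band.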

Chebyshev's Theorem
The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - \frac{1}{K^2}$, where K is any positive number greater than 1. For K = 2 and K = 3, we get the following:

o At least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.
o At least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.

EXAMPLE 9: On a particular exam, the average score has been 65 with a standard deviation of 5. According to Chebyshev's Theorem, find the percentage of students having a score between 40 and 90.
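Chebyshev's bound depends only on K, so it is easy to tabulate. The sketch below also includes a small helper that converts a symmetric interval into a value of K; both function names are illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2


def k_for_interval(low, high, mean, sd):
    """Number of standard deviations from the mean to each end of a symmetric interval."""
    if abs((mean - low) - (high - mean)) > 1e-9:
        raise ValueError("the interval must be symmetric about the mean")
    return (high - mean) / sd


print(chebyshev_lower_bound(2))            # 0.75
print(round(chebyshev_lower_bound(3), 2))  # 0.89

# Exam scores with mean 65 and standard deviation 5, interval 40 to 90.
k = k_for_interval(40, 90, 65, 5)
print(k, round(chebyshev_lower_bound(k) * 100))  # 5.0 96
```

Unlike the empirical rule, this bound holds for any distribution, which is why it is stated as "at least" rather than "about".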

The coefficient of variation (or CVar) for a set of nonnegative sample or population data, expressed as a percent, describes the standard deviation relative to the mean, and is given by the following:

Sample: $CVar = \frac{s}{\bar{x}} \cdot 100\%$
Population: $CVar = \frac{\sigma}{\mu} \cdot 100\%$

Note: We cannot compare the standard deviations between data sets with different units.

EXAMPLE 10: Using the sample height and weight data for the 40 males we find the statistics below:

Height (in):
70.8 66.2 71.7 68.7 67.6 69.2 66.5 67.2
68.3 65.6 63.0 68.3 73.1 67.6 68.0 71.0
61.3 76.2 66.3 69.7 65.4 70.0 62.9 68.5
68.3 69.4 69.2 68.0 71.9 66.1 72.4 73.0
68.0 68.7 70.3 63.7 71.1 65.6 68.3 66.3

Weight (lb):
169.1 144.2 179.3 175.8 152.6 166.8 135.0 201.5
175.2 139.0 156.3 186.6 191.1 151.3 209.4 237.1
176.7 220.6 166.1 137.4 164.2 162.4 151.8 144.1
204.6 193.8 172.9 161.9 174.8 169.8 213.3 198.0
173.3 214.5 137.1 119.5 189.1 164.7 170.1 151.0

          Mean ($\bar{x}$)    Standard Deviation
Height    68.34 in            3.02 in
Weight    172.55 lb           26.33 lb

Because we can't compare standard deviations with different units, we need to find the coefficient of variation. The weight is more varied than the height, with a CVar of 15.3% compared to a CVar of 4.4% for the height of the 40 men.
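The two CVar values quoted above follow directly from the summary statistics. A minimal sketch, assuming the height mean and standard deviation of 68.34 in and 3.02 in and the weight mean and standard deviation of 172.55 lb and 26.33 lb; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation relative to the mean, as a percent."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a mean of 0")
    return std_dev / mean * 100


height_cv = coefficient_of_variation(3.02, 68.34)
weight_cv = coefficient_of_variation(26.33, 172.55)
print(round(height_cv, 1))  # 4.4
print(round(weight_cv, 1))  # 15.3
```

Because the units cancel in the ratio, the two percentages can be compared even though one data set is in inches and the other in pounds.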

3-4: Measures of Relative Standing and Boxplots


When we talk about relative standing within a data set, we are talking about the location of a data value in comparison with the other data values. You are probably familiar with percentiles, as your results on standardized exams (SAT, etc.) and childhood growth measurements (height, weight, etc.) are most often given as a percentile. In addition to percentiles, we will discuss quartiles and z-scores.

Z-Score
A z-score is simply a standardized score that is found by converting a value to a standardized scale. This means we are really determining how many standard deviations a particular data value, x, falls from the mean.

$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \bar{x}}{s}$

Rounding Rule for Z-scores: We round a z-score to two decimal places (that matches the z-chart we will use later).

EXAMPLE 1: Referring back to EXAMPLE 10 in section 3.3, we would like to know if it is more extreme for a man to be 7 feet tall, 5 feet tall, or to weigh 300 pounds. We cannot make a direct comparison, as these are two different types of measures. However, if we standardize each value then we can make a comparison. Recall the following summary data for each set:

          Mean ($\bar{x}$)    Standard Deviation
Height    68.34 in            3.02 in
Weight    172.55 lb           26.33 lb

Now we will calculate the z-score for each:

Height of 84 in: $z = \frac{x - \bar{x}}{s} = \frac{84 - 68.34}{3.02} = 5.19$

Height of 60 in: $z = \frac{x - \bar{x}}{s} = \frac{60 - 68.34}{3.02} = -2.76$

Weight of 300 lb: $z = \frac{x - \bar{x}}{s} = \frac{300 - 172.55}{26.33} = 4.84$

When we look at the z-scores we can say that a height of 7 feet (84 in) is the most extreme because it falls 5.19 standard deviations above (+) the mean. A height of 5 feet (60 in) is 2.76 standard deviations below (-) the mean, and a weight of 300 pounds is 4.84 standard deviations above (+) the mean.

Usual and Unusual Values:
o Usual values have z-scores that are between -2 and 2, inclusive.
o Unusual values fall outside of that range: z-scores below -2 or above 2.

NOTE: A negative z-score means the data value is less than the mean (below the mean) and a positive z-score means the data value exceeds the mean (above the mean).

EXAMPLE 2: Referring back to EXAMPLE 1, is a height of 72 in usual or unusual?

$z = \frac{x - \bar{x}}{s} = \frac{72 - 68.34}{3.02} = 1.21$

Since this value falls between -2 and 2 we say it is a usual or typical value. (The question is asking whether it is unusual to find a man who is 6 feet tall; we know from common sense that this is NOT unusual, so it should be usual.)
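The z-score formula and the usual/unusual cutoff can be combined in a short sketch. The function names are illustrative; the height mean (68.34 in) and standard deviation (3.02 in) are the summary statistics used in the examples above.

```python
def z_score(value, mean, std_dev):
    """Standardize a value and round to two decimal places."""
    return round((value - mean) / std_dev, 2)


def is_usual(z):
    """Usual values have z-scores between -2 and 2, inclusive."""
    return -2 <= z <= 2


HEIGHT_MEAN, HEIGHT_SD = 68.34, 3.02  # inches

for height in (84, 60, 72):
    z = z_score(height, HEIGHT_MEAN, HEIGHT_SD)
    print(height, z, "usual" if is_usual(z) else "unusual")
# 84 5.19 unusual
# 60 -2.76 unusual
# 72 1.21 usual
```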

Percentiles
A percentile is a measure of location which divides the data set into 100 groups with about 1% of the values in each group. We denote percentiles as $P_1, P_2, P_3, \dots, P_{99}$. If we talk about $P_{35}$, then we mean the 35th percentile, and this means that about 35% of the data values will lie below this value. If we talk about $P_{50}$, then we mean the 50th percentile, and this means that 50% of the data values lie below this value: the median. Additionally, just as when finding the median, when we find percentiles we must order the data set first.

There are two ways that we want to talk about percentiles:
1. We may want to know what percentile corresponds to a known data value; or
2. We may want to know what data value corresponds to a particular percentile.

To find the percentile corresponding to a known data value we compute

$\text{percentile of value } x = \frac{\text{number of values less than } x}{\text{total number of data values}} \cdot 100$ (Round to the nearest whole number)


EXAMPLE 3: In a particular math class of thirty students, the class took an exam over Chapter 3. The exam scores are as follows:

19 67  7 86 43 34 96 76 95 99
87 76 99 97 35 45 98 76 74 90
29 76 76 45 80 42 76 98 79 80

What percentile corresponds to an exam score of 73?

What percentile corresponds to an exam score of 45?
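The percentile-of-a-value formula only needs a count of the values below x. A minimal sketch with the illustrative function name `percentile_of_value`, applied to the thirty exam scores:

```python
def percentile_of_value(values, x):
    """Percent of data values strictly less than x, rounded to a whole number."""
    below = sum(1 for v in values if v < x)
    # Python's round() sends exact .5 ties to the nearest even integer;
    # no such tie occurs for the two scores checked below.
    return round(below / len(values) * 100)


scores = [19, 67, 7, 86, 43, 34, 96, 76, 95, 99,
          87, 76, 99, 97, 35, 45, 98, 76, 74, 90,
          29, 76, 76, 45, 80, 42, 76, 98, 79, 80]

print(percentile_of_value(scores, 73))  # 33
print(percentile_of_value(scores, 45))  # 23
```

Ten scores lie below 73 and seven lie below 45, giving 10/30 and 7/30 of the class respectively.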

To find the data value corresponding to a stated percentile we must define some notation:
n = total number of values in the data set
k = percentile being used
L = location of the desired data value in the ORDERED data set
$P_k$ = kth percentile

First, we calculate the value of the location or position, L, as follows:

$L = \frac{k}{100} \cdot n$

Now, there are two options for L:
1. It is NOT a whole number, and then we round UP to the next whole number and locate the data value that occupies that position; OR
2. It is a whole number, and then we find the average of the Lth and (L + 1)st numbers in the ordered data set.

EXAMPLE 4: In a particular math class of thirty students, the class took an exam over Chapter 3. The exam scores are as follows:

19 67  7 86 43 34 96 76 95 99
87 76 99 97 35 45 98 76 74 90
29 76 76 45 80 42 76 98 79 80

Find the 25th percentile for the Chapter 3 exam scores.


Find the 75th percentile for the Chapter 3 exam scores.
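The locator rule (compute L, round up if it is not whole, otherwise average the Lth and (L + 1)st values) can be sketched as below. The function name `percentile_value` is illustrative. Integer arithmetic via `divmod` is used to test whether L is a whole number, which is a design choice to avoid floating-point surprises. Other textbooks and software use different percentile conventions, so results may differ slightly from library functions.

```python
def percentile_value(values, k):
    """Return P_k using the locator L = (k / 100) * n on the ordered data."""
    if not 1 <= k <= 99:
        raise ValueError("k must be a whole number from 1 to 99")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(k * n, 100)  # L = whole + remainder / 100
    if remainder != 0:
        # L is not whole: round UP to position whole + 1 (1-based),
        # which is index `whole` in a 0-based list.
        return ordered[whole]
    # L is whole: average the Lth and (L + 1)st ordered values.
    return (ordered[whole - 1] + ordered[whole]) / 2


scores = [19, 67, 7, 86, 43, 34, 96, 76, 95, 99,
          87, 76, 99, 97, 35, 45, 98, 76, 74, 90,
          29, 76, 76, 45, 80, 42, 76, 98, 79, 80]

print(percentile_value(scores, 25))  # 45
print(percentile_value(scores, 50))  # 76.0
print(percentile_value(scores, 75))  # 90
```

For the 25th percentile, L = 7.5, so the 8th ordered score is used; for the 50th percentile, L = 15 is whole, so the 15th and 16th ordered scores are averaged.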

Quartiles
Recall that there are 99 percentiles that divide the data set up into 100 groups. There are 3 quartiles that divide the data set up into 4 groups: $Q_1$, $Q_2$, and $Q_3$. Quartiles are measures of location, just as percentiles are, but each group contains about 25% of the data. How can we connect quartiles back to percentiles?

$Q_1 = P_{25}$
$Q_2 = P_{50} = \text{median}$
$Q_3 = P_{75}$

So, when we find quartiles, we really aren't finding anything new, just thinking back to the corresponding percentile.

EXAMPLE 5: In a particular math class of thirty students, the class took an exam over Chapter 3. The exam scores are as follows:

19 67  7 86 43 34 96 76 95 99
87 76 99 97 35 45 98 76 74 90
29 76 76 45 80 42 76 98 79 80

Find the quartiles for the data set.

The interquartile range (IQR) is the difference between the upper ($Q_3$) and lower ($Q_1$) quartiles. Approximately half the data values fall within the interquartile range.

$IQR = Q_3 - Q_1$

The semi-interquartile range is the IQR divided by 2.

$\text{semi-interquartile range} = \frac{Q_3 - Q_1}{2}$

The midquartile is the sum of the upper and lower quartiles divided by 2.

$\text{midquartile} = \frac{Q_3 + Q_1}{2}$

We use a diagram called a boxplot to visually display the extreme values (minimum and maximum), the quartiles (lower, median, upper), and the IQR over a number line. We draw a box that shows the IQR with a line through the box at the median. Then we draw in whiskers that extend from the box out to the extreme values.

The 5 number summary is just an ordered listing of the important values that are used in the box plot, in parentheses:

(minimum, $Q_1$, $Q_2$, $Q_3$, maximum)

EXAMPLE 6: Consider the data set and find the minimum value, the lower quartile, the median, the upper quartile, and the maximum value. Find the interquartile range.

0 3 4 5 7
1 3 4 5 7
1 3 4 6 7
2 3 5 6 8
2 4 5 6 8
2 4 5 6 9

Minimum: _______________
Lower Quartile: ___________
Median: ________________
Upper Quartile: __________
Maximum: ______________
IQR: ________________

EXAMPLE 7: Use the values we found in EXAMPLE 6 and construct a boxplot for the data set.

EXAMPLE 8: Find the 5 number summary and construct a box plot for the Math exam scores.


Outliers and Modified Boxplots

An outlier is a value that falls far from what would be considered normal data values. We will define outliers in terms of the interquartile range (IQR). A data value will qualify as an outlier if either of the following conditions is met:

1. The data value is larger than $Q_3 + 1.5 \cdot IQR$; or
2. The data value is smaller than $Q_1 - 1.5 \cdot IQR$.

A modified boxplot is constructed in much the same manner as a regular or skeletal box plot, but with the following modifications:
1. An asterisk is used to identify all outlier data values.
2. The whiskers only extend as far as the maximum and/or minimum data value(s) that are not considered outliers.

EXAMPLE 9: Construct a modified boxplot for the data below:

0 2 4 4 4 5 5 5 5 5 9
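The numbers needed for a modified boxplot (5 number summary, outlier fences, outliers, and whisker ends) can be computed with the sketch below. It assumes the percentile locator rule from this chapter for the quartiles; all function names are illustrative, and no plotting is attempted.

```python
def percentile_value(values, k):
    """P_k by the locator rule: L = (k / 100) * n on the ordered data."""
    ordered = sorted(values)
    whole, remainder = divmod(k * len(ordered), 100)
    if remainder != 0:
        return ordered[whole]  # L not whole: round up to the next position
    return (ordered[whole - 1] + ordered[whole]) / 2  # L whole: average two values


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    return (min(values),
            percentile_value(values, 25),
            percentile_value(values, 50),
            percentile_value(values, 75),
            max(values))


def outlier_fences(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def modified_boxplot_parts(values):
    """Return (outliers, (low whisker end, high whisker end))."""
    low, high = outlier_fences(values)
    outliers = sorted(v for v in values if v < low or v > high)
    inside = [v for v in values if low <= v <= high]
    return outliers, (min(inside), max(inside))


data = [0, 2, 4, 4, 4, 5, 5, 5, 5, 5, 9]
print(five_number_summary(data))     # (0, 4, 5, 5, 9)
print(outlier_fences(data))          # (2.5, 6.5)
print(modified_boxplot_parts(data))  # ([0, 2, 9], (4, 5))
```

For this data set the IQR is only 1, so the fences are tight: 0, 2, and 9 would each be marked with an asterisk, and the whiskers stop at 4 and 5, the edges of the box.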
