CHAPTER 3

Descriptive Measures
GENERAL OBJECTIVE

In the last chapter, you learned how to summarize and organize data using
tables, graphs, and charts. Descriptive measures are numbers calculated from
the data that describe certain characteristics of the data. This chapter
focuses on calculating some common descriptive measures. You should be
familiar with Chapter 3 of your textbook before beginning this chapter.

LESSON OUTLINE

3.1 Measures of Center
3.2 Measures of Variation
3.3 The Five-Number Summary; Boxplots
3.4 Descriptive Measures for Populations; Use of Samples
3.5 Problems

Copyright 2012 Pearson Education, Inc. Publishing as Addison-Wesley.

3.1 Measures of Center


Descriptive measures that indicate where the center or most typical value of a
data set lies are called measures of central tendency or, more simply,
measures of center. Measures of center are often referred to as averages. In
this section, we calculate the three most common measures of center: the
mean, the median, and the mode.
The mean (or average) of a data set is defined as the sum of the observations
divided by the number of observations. The median of a data set is the
number that divides the bottom half of the data set from the top half of the
data set. The mode is the value that occurs most frequently in the data set.

Finding the Mean, Median, and Mode


Examples 3.1-3.3

Weekly Salaries: Professor Hassett spent one summer working for a small
mathematical consulting firm. The firm employed a few senior consultants,
who made between $800 and $1050 per week; a few junior consultants, who
made between $400 and $450 per week; and several clerical workers, who
made $300 per week.
Because the first half of the summer was busier than the second half, more
employees were required during the first half. Table 3-1 displays typical
lists of weekly earnings for the two halves of the summer.

Table 3-1  Weekly earnings, in dollars

Data Set I
300   300   300   940   300   300   400
300   400   450   800   450   1050

Data Set II
300   300   940   450   450
400   300   300   1050  300

Determine the mean, median, and mode for both sets of weekly salaries.
Solution

To calculate the mean, median, and mode for this data set, we use the
now-familiar Frequencies dialog box.
1. Enter the data into a variable named SET_I.
2. Choose Analyze > Descriptive Statistics > Frequencies to open the
Frequencies dialog box.
3. Paste the variable SET_I into the Variable(s) box.

4. Click the Statistics button to open the Frequencies: Statistics dialog
box (Figure 3-1).
Choose the descriptive measures to include in the analysis by selecting the
checkboxes next to their names. For this analysis,
5. Select the checkboxes for Mean, Median, Mode, and Sum.
The sum is not needed for the current example but will be used in the next
section.

Figure 3-1  Frequencies: Statistics dialog box

6. Click the Continue button to return to the Frequencies dialog box.
7. Click the OK button.
The descriptive measures for the variable SET_I will be displayed in the
Viewer window (Figure 3-2).
Figure 3-2  Descriptive measures for Data Set I

Statistics
SET_I
N        Valid            13
         Missing           0
Mean                483.8462
Median              400.0000
Mode                  300.00
Sum                  6290.00
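
As a cross-check outside SPSS, the values in this output can be reproduced with a few lines of Python. This is only a sketch: it relies on Python's standard statistics module rather than anything in SPSS, and the list set_i is an assumption, namely the 13 weekly salaries whose count and sum agree with the N and Sum reported for SET_I.

```python
import statistics

# Assumed contents of SET_I: the 13 weekly salaries (in dollars) whose
# count and sum agree with N = 13 and Sum = 6290 in the SPSS output.
set_i = [300, 300, 300, 940, 300, 300, 400,
         300, 400, 450, 800, 450, 1050]

n = len(set_i)                       # number of observations
total = sum(set_i)                   # sum of the observations
mean = total / n                     # mean = sum / number of observations
median = statistics.median(set_i)    # middle value of the sorted data
mode = statistics.mode(set_i)        # most frequently occurring value

print(f"N = {n}, Sum = {total}")
print(f"Mean = {mean:.4f}, Median = {median}, Mode = {mode}")
```

The mean is computed directly from its definition (sum divided by number of observations), which is why the Sum checkbox is a convenient companion to the Mean in the SPSS output.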

Next, we calculate the same descriptive measures for the second half of the
summer. We are naturally interested in knowing if the typical weekly salary
for the first half of summer is less, more, or the same as the salary in the
second half of summer. The mean, median, and mode can give us insight into
this question and tell us more about the two data sets.
1. Enter the data into a variable named SET_II.
2. Follow the same procedure as above to calculate the mean, median, and
mode for this data set.
The descriptive measures for the variable SET_II are displayed in Figure 3-3.

Figure 3-3  Descriptive measures for Data Set II

Figures 3-2 and 3-3 give the number of cases in each data set that are Valid
and the number that are Missing. Only valid cases are used in the
calculations; SPSS treats a case as Valid for a variable when the case has a
value for that variable. There are no missing observations in either data
file, so all the observations are valid.
We can see from the descriptive measures in Figures 3-2 and 3-3 that the
mean and median weekly salaries in the first half of the summer are both
larger than those in the second half. We also note that the mode of the
weekly salaries (the most common weekly salary) is $300. This is the salary
of the clerical workers.
Another observation we can make is that both data sets appear to be
right-skewed. We can infer this because in both data sets the mean is larger
than the median: the mean is more strongly affected by the comparatively
large salaries of the senior consultants.

3.2 Measures of Variation


Up to this point, we have discussed only descriptive measures of center.
We now discuss descriptive measures of variation (sometimes referred to as
measures of spread or measures of dispersion). SPSS uses the term
measures of dispersion. The most common of these measures are the range,
the standard deviation, and the variance.
The range is defined as the difference between the largest value and the
smallest value in the data file. The sample standard deviation measures the
variation of a sample by indicating how far, on average, the observations
are from the sample mean. The square of the sample standard deviation is
called the sample variance.

Finding the Range, Sample Standard Deviation and Sample Variance


Example 3.6

Children of Diabetic Mothers: The paper "Correlation Between the
Intrauterine Metabolic Environment and Blood Pressure in Adolescent
Offspring of Diabetic Mothers" (The Journal of Pediatrics, Vol. 136, Issue 5,
pp. 587-592) by Cho et al. presents findings of research on children of
diabetic mothers.
Table 3-2 lists the arterial blood pressures, in millimeters of mercury (mm
Hg), for a sample of 16 children of diabetic mothers. Determine the sample
mean, range, standard deviation, and variance of the arterial blood pressures.

Table 3-2  Arterial blood pressures (mm Hg)

81.6    84.1    87.6    82.8
82.0    88.9    86.7    96.4
84.6   104.9    90.8    94.0
69.4    78.9    75.2    91.0

Solution

Type the data into a new data file named PRESSURE. The measures of
variation could be calculated by choosing Analyze > Descriptive Statistics >
Frequencies as we did in the previous example. Simply select the
checkboxes for Mean, Std. deviation, Variance, and Range in the
Frequencies: Statistics dialog box (see Figure 3-1).
Alternatively, the Explore procedure calculates several common
measures of center and variation automatically.
1. Choose Analyze > Descriptive Statistics > Explore to open the
Explore dialog box (Figure 3-4).

2. Paste the variable PRESSURE into the Dependent List box.

Figure 3-4  Explore dialog box

3. Click the Statistics button to open the Explore: Statistics dialog box
(Figure 3-5).
Selecting the Descriptives checkbox has SPSS calculate the mean, median,
variance, standard deviation, minimum, maximum, range, and a number of
other descriptive measures, some of which will be discussed later in the
text.
4. Select the checkbox for Descriptives and click the Continue button.

Figure 3-5  Explore: Statistics dialog box

5. Click the OK button in the Explore dialog box (Figure 3-4).
The computed descriptive measures will be displayed in the Descriptives
table (Figure 3-6) in the Viewer window.

Figure 3-6  Descriptives table from the Explore procedure

The mean and median (measures of the center) are displayed along with the
variance, standard deviation, and range (measures of variation). In later
chapters, we will discuss some of the other descriptive measures that are
displayed.
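
For readers who want to see the arithmetic behind these measures, the following sketch computes them for the 16 arterial blood pressures of this example. It uses Python's standard statistics module, which is an assumption of this illustration and not part of SPSS; the variable names are hypothetical.

```python
import statistics

# Arterial blood pressures (mm Hg) for the 16 children in this example.
pressure = [81.6, 84.1, 87.6, 82.8,
            82.0, 88.9, 86.7, 96.4,
            84.6, 104.9, 90.8, 94.0,
            69.4, 78.9, 75.2, 91.0]

mean = statistics.mean(pressure)
data_range = max(pressure) - min(pressure)   # largest minus smallest value
variance = statistics.variance(pressure)     # sample variance (divisor n - 1)
std_dev = statistics.stdev(pressure)         # square root of the sample variance

print(f"Mean     = {mean:.3f}")
print(f"Range    = {data_range:.1f}")
print(f"Variance = {variance:.3f}")
print(f"Std dev  = {std_dev:.3f}")
```

Note that statistics.variance and statistics.stdev divide by n - 1, the usual convention for sample measures, so the results should be comparable to the sample variance and standard deviation in the Descriptives table.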

3.3 The Five-Number Summary; Boxplots


Percentiles are descriptive measures of location or position within the data.
The percentiles of a data set divide the values in the data set into one hundred
equal parts. The first percentile, P1, has 1% of the data below it and 99%
above it. The median has 50% of the data below it and 50% of the data above
it, so another name for the median is the 50th percentile, or P50.
Three important percentiles, P25, P50, and P75, are known as the quartiles of a
data set. These three percentiles divide the values in the data file into four
equal parts. P25 is known as the first quartile, Q1; P50 is known as the second
quartile, Q2; and P75 is known as the third quartile, Q3. A third
name for the median is Q2. The quartiles give us information about the shape
of the distribution of the data. For example, in a symmetric distribution
the first and third quartiles are about the same distance from the median,
whereas in a right-skewed distribution the third quartile Q3 is farther above
the median than the first quartile Q1 is below the median. The interquartile
range, the difference between Q3 and Q1, is a measure of variation.
The five-number summary for a data set consists of the minimum,
maximum, and quartiles written in increasing order: minimum, Q1, Q2, Q3,
and maximum. A boxplot is a graph of the five-number summary. The shape
of the distribution can often be judged from the boxplot of a set of data.

Finding the Five-Number Summary

Example 3.17

TV-viewing Times: The A.C. Nielsen Company publishes data on the TV-viewing
habits of Americans by various characteristics in Nielsen Report on
Television. Table 3-3 shows the weekly viewing times, in hours, for a
sample of 20 people. Determine and interpret the five-number summary for
these data.

Table 3-3  Weekly TV-viewing times (hours)

25   41   27   32   43
66   35   31   15    5
34   26   32   38   16
30   38   30   20   21

Solution

Type the data into a new data file named TIMES. The five-number summary
can be calculated with the Frequencies procedure, as before.
1. Choose Analyze > Descriptive Statistics > Frequencies to open the
Frequencies dialog box.
2. Paste the variable TIMES into the Variable(s) box.
3. Click the Statistics button to open the Frequencies: Statistics dialog
box (Figure 3-1).
4. Select the checkboxes for Quartiles, Median, Minimum, and
Maximum and then click the Continue button.
5. Click the OK button.
The five-number summary will be displayed in the Viewer window
(Figure 3-7).

Figure 3-7  Five-number summary: Weekly TV-viewing times

Figure 3-7 gives the five-number summary as 5.00, 22.00, 30.50, 37.25, and
66.00. We did not have to select the checkbox for Median, since the median
and Q2 are the same number. The measure of center, the median, implies
that half of the TV-viewing times are less than 30.50 hours and half are
greater. We can further infer from the results that 25% of the TV-viewing
times are between 5.0 hours and 22.0 hours, 25% are between 22.0 hours and
30.5 hours, 25% are between 30.5 hours and 37.25 hours, and 25% are
between 37.25 hours and 66.0 hours. The interquartile range is
37.25 - 22.00 = 15.25 hours. Notice that the variation in the fourth quarter,
maximum - Q3 = 28.75, is larger than the variation in the first quarter,
Q1 - minimum = 17.00. Right-skewed data have more variation in the fourth
quarter than in the first quarter, so it is possible that this data set has a
right-skewed distribution, but further analysis is needed. It is easier to
see the shape of the distribution from a boxplot.
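
The five-number summary and the quarter widths discussed above can also be checked outside SPSS. The sketch below uses Python's standard statistics module; choosing method="exclusive" is an assumption about which quartile rule to apply. For these data it gives the same quartiles as the SPSS output, but other rules and other software can give slightly different values.

```python
import statistics

# Weekly TV-viewing times (hours) for the sample of 20 people.
times = [25, 41, 27, 32, 43,
         66, 35, 31, 15, 5,
         34, 26, 32, 38, 16,
         30, 38, 30, 20, 21]

# method="exclusive" interpolates at position (n + 1) * p in the sorted data.
q1, q2, q3 = statistics.quantiles(times, n=4, method="exclusive")
five_number = (min(times), q1, q2, q3, max(times))
iqr = q3 - q1                                  # interquartile range

print("Five-number summary:", five_number)     # (5, 22.0, 30.5, 37.25, 66)
print("Interquartile range:", iqr)             # 15.25
print("First quarter width: ", q1 - min(times))   # 17.0
print("Fourth quarter width:", max(times) - q3)   # 28.75
```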

Constructing a Boxplot

Example 3.19

TV-viewing Times: Make a boxplot for the TV-viewing times data in
Example 3.17.

Solution

To make the boxplot,
1. Type the data into a new data file named TIMES.
2. Choose Analyze > Descriptive Statistics > Explore to open the
Explore dialog box (Figure 3-4).
3. Paste the variable TIMES into the Dependent List box.
4. Click the Plots button to open the Explore: Plots dialog box
(Figure 3-8).

Figure 3-8  Explore: Plots dialog box

The Boxplots section of the Explore: Plots dialog box controls how boxplots
are displayed when there is more than one dependent variable. The
Factor levels together option generates a separate display for each
dependent variable. The Dependents together option generates a separate
display for each group defined by a factor variable.
5. Select the Factor levels together option and click the Continue
button to return to the Explore dialog box (Figure 3-4).
6. Click the OK button to display the boxplot in the Viewer window
(Figure 3-9).

Figure 3-9  Boxplot: Weekly TV-viewing times

SPSS makes a modified boxplot. The circle with a 6 beside it indicates that
the 6th case, which has TIMES equal to 66, is an outlier. In Example 3.17, we
suspected that the distribution might be right-skewed, but now it is clear that
the data are left-skewed with an outlier. This reminds us that a picture is
worth a thousand words.
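
To see why case 6 is singled out, the sketch below applies the common 1.5 x IQR rule to the quartiles reported in Example 3.17. Both the rule and the case order (the data entered row by row) are assumptions of this illustration; SPSS may compute the edges of the box slightly differently, although the flagged observation is the same here.

```python
# Weekly TV-viewing times (hours), assumed to be entered row by row so that
# the position in the list matches the SPSS case number.
times = [25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
         34, 26, 32, 38, 16, 30, 38, 30, 20, 21]

# Quartiles as reported in the five-number summary of Example 3.17.
q1, q3 = 22.00, 37.25
iqr = q3 - q1

# Assumed rule: flag values more than 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Collect each flagged observation together with its 1-based case number.
outliers = [(case, value) for case, value in enumerate(times, start=1)
            if value < lower_fence or value > upper_fence]

print("Fences:", lower_fence, upper_fence)     # -0.875 60.125
print("Outliers (case, value):", outliers)     # [(6, 66)]
```

Once the value 66 is set aside, the largest remaining time is 43 hours, so the upper whisker is short compared with the lower one, which is consistent with the left-skewed appearance of the boxplot.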

3.4 Descriptive Measures for Populations; Use of Samples

A parameter is a descriptive measure for a population. For example, the
following are parameters:

μ = population mean
σ = population standard deviation
σ² = population variance

A statistic is a descriptive measure for a sample. For example, the following
are statistics:

x̄ = sample mean
s = sample standard deviation
s² = sample variance
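
The practical difference between the two sets of measures shows up in the standard deviation and variance: under the usual textbook definitions, the population versions divide by the number of observations n, while the sample versions divide by n - 1. The following sketch illustrates this with Python's standard statistics module. The data values are hypothetical and chosen only to keep the arithmetic simple.

```python
import statistics

# Hypothetical data, used only for illustration (not taken from the text).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the values as an entire population (divisor n).
pop_mean = statistics.mean(values)          # population mean
pop_var = statistics.pvariance(values)      # population variance
pop_sd = statistics.pstdev(values)          # population standard deviation

# Treating the same values as a sample (divisor n - 1).
samp_var = statistics.variance(values)      # sample variance
samp_sd = statistics.stdev(values)          # sample standard deviation

print(f"Population: mean = {pop_mean}, variance = {pop_var}, sd = {pop_sd}")
print(f"Sample:     variance = {samp_var:.4f}, sd = {samp_sd:.4f}")
```

The mean is computed the same way in both cases; only its interpretation (parameter or statistic) changes.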

3.5 Problems

Problem 3.15

Amphibian Embryos: In a study of the effects of radiation on amphibian
embryos titled "Shedding Light on Ultraviolet Radiation and Amphibian
Embryos" (BioScience, Vol. 53, No. 6, pp. 551-561), L. Licht recorded the
time it took for the eggs of a sample of seven different species of frogs and
toads to hatch. Table 3-4 shows the times to hatch, in days. Find the mean,
median, and mode of the sample.

Table 3-4  Time to Hatch

11    11

Problem 3.71

Refer to Problem 3.15. Determine the range and sample standard deviation.

Problem 3.22

Router Horsepower: In the article "Router Roundup" (Popular Mechanics,
Vol. 180, No. 12, pp. 104-109), T. Klenck reported on tests of seven
fixed-base routers for performance, features, and handling. Table 3-5
gives the horsepower for each of the seven routers tested. Find the mean,
median, and mode(s) of the sample.

Table 3-5  Router Horsepower

1.75   2.25   2.25   2.25   1.75   2.00   1.50

Problem 3.78

Refer to Problem 3.22. Determine the range and sample standard deviation.

Problem 3.16

Hurricanes: A recent article by D. Schaefer et al. (Journal of Tropical
Ecology, 16, pp. 189-207, 2000) reported on a long-term study of the effects
of hurricanes on tropical streams of Luquillo Experimental Forest in Puerto
Rico. The study shows that Hurricane Hugo had a significant impact on
stream water chemistry. A sample of 10 ammonia fluxes in the first year after
Hugo is given below. Data in Table 3-6 are in kilograms per hectare per
year. Determine the sample mean, median, and mode of the ammonia
fluxes in the first year after Hugo.

Table 3-6  Ammonia Fluxes (kg per hectare per year)

 96    66   147   147   175
116    57   154    88   154

Problem 3.72

Refer to Problem 3.16. Use SPSS to determine the standard deviation and
range of the sample of ammonia fluxes in the first year after Hugo.

Problem 3.123

Hospital Stays: The U.S. National Center for Health Statistics compiles data
on the length of stay by patients in short-term hospitals and publishes its
findings in Vital and Health Statistics. A random sample of 21 patients
yielded the data on length of stay, in days, given in Table 3-7.

Table 3-7  Length of Stay (days)

 4    4   12   18    9    6   12
 3    6   15    7    3   55    1
10   13    5    7    1   23    9

Obtain and interpret the quartiles, determine and interpret the interquartile
range, and find and interpret the five-number summary. Then identify potential
outliers, if any, and construct and interpret a boxplot.
