Objectives

o Learn how to calculate measures of central tendency--mean, median, and mode

o Know what each measure of central tendency says about a data set

Lesson

To characterize or describe a data set, we must learn the meaning and purpose of
several different types of statistical values. Two important statistics are measures of
central tendency and dispersion. As the name indicates, a measure of central
tendency attempts to describe the "center" of a data set--this center might be the
most common value, the value that lies in the middle of the range of values in the
data set, or some average of the values in the data set. (You've probably heard of
and used averages before; we will here delve into averages and similar measures in
greater detail.) This lesson is devoted to measures of central tendency; later, we will
also consider dispersion, which is a measure of the "spread" of data around some
center, and asymmetry (skewness), which measures how data is "skewed" to either
side of the center.

A data set such as that shown in the following histogram displays a fairly obvious
center: the center bar. If you are familiar with averages (means), you can probably
already point to the average of the data, which is the central (and tallest) bar in the
graph (assuming that the data values to which the bars correspond are evenly
spaced, as would be the case in a histogram).

What if the data isn't symmetrically distributed, though? Consider the data set
below.

In this data set, the tallest peak is not at the center. If you calculated the average of
this data set, you'd also find that the average doesn't correspond with the tallest
peak. As a result, we must not only do our math carefully, we must also carefully
select what kind of math we do so that we accurately represent the data. Having
shown why an average is not always the best statistic to use when characterizing a
data set, we can now turn to the definition and use of (this and) other measures of
central tendency.

Mean (Average)

A mean (average) is perhaps the best-known measure of central tendency. In
baseball, fans might talk about a pitcher's earned run average (ERA); students in a
class might be interested in their grade point average (GPA). The average (also
called the arithmetic mean--this is the typical sense when just the word mean is
used) of a data set is the sum of all the data values divided by the total number of
values in the set. Algebraically, a data set {x1, x2, x3, ..., xN} has a mean μ defined as
follows:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

(Note that we use the Greek character μ, indicating that this is a population mean;
the same formula applies when calculating the sample mean--you might see the
sample mean expressed using x̄ in this case, for instance. The bar notation simply
indicates a mean.) More generally, if we have a set of values {x1, x2, x3, ..., xN} with
associated frequencies {f1, f2, f3, ..., fN} (recall how we defined a frequency in the
previous lesson--here, we are simply saying that the data value xi occurs fi times in
the data set), then we can define the mean μ as follows:

$$\mu = \frac{\sum_{i=1}^{N} f_i x_i}{\sum_{i=1}^{N} f_i}$$

The numerator of this expression simply says that the sum consists of each value
multiplied by the number of times it occurs in the data set. The denominator is simply
the total number of data values in the set (each value may occur more than once, so
the denominator generally does not equal N, the number of distinct values).
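The two definitions above can be sketched in Python as follows. This is only an illustration: the function names `mean` and `frequency_mean` and the small example data set are assumptions made here, not part of the lesson.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


def frequency_mean(values, frequencies):
    """Mean of a data set given as distinct values with their frequencies."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total_count = sum(frequencies)  # total number of data values in the set
    if total_count == 0:
        raise ValueError("mean of an empty data set is undefined")
    # Each value is multiplied by the number of times it occurs.
    weighted_sum = sum(x * f for x, f in zip(values, frequencies))
    return weighted_sum / total_count


# Hypothetical example: the value 2 occurs twice; 1 and 3 occur once each.
values = [1, 2, 3]
frequencies = [1, 2, 1]
expanded = [1, 2, 2, 3]  # the same data set written out value by value

print(mean(expanded))                       # 2.0
print(frequency_mean(values, frequencies))  # 2.0
```

Both calls give the same result because the frequency form is just a compact way of writing out the repeated values.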

A mean is best suited to cases where the data are symmetrically distributed, as with
the first bar graph shown above. If the data is skewed, as with the second bar graph
above, the mean is not as helpful. Consider the data tables below; the table on the
left is a symmetrical distribution, like the first bar graph, and the table on the right is a
skewed distribution, like the second bar graph. (You may want to try graphing these
distributions to get a sense of how the tables and graphs relate.)

Table 1                        Table 2

Data Value   Frequency         Data Value   Frequency
    1            1                 1            1
    2            2                 2            7
    3            4                 3           20
    4            8                 4           15
    5           16                 5           12
    6            8                 6            9
    7            4                 7            6
    8            2                 8            3
    9            1                 9            1

Using the mean formula for data with associated frequencies, we calculate the mean
of the data in Table 1 as 5. The mean for the data in Table 2 is 4.38. Obviously, the
mean in the case of Table 1 does a good job of describing the data: the data value 5
is the most frequent value, and the other values show progressively lower
frequencies. Thus, the mean shows the central tendency of the data set in this case.
In the case of Table 2, the mean doesn't do such a good job: the most frequent value
is 3, but the mean is between two less frequent values (4 and 5). As such, we must
consider other measures of central tendency for non-symmetric data sets.
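As a check on these two values, the frequency formula can be written out term by term from the table entries, with each data value multiplied by its frequency in the numerator and the frequencies alone in the denominator. The subscripts 1 and 2 are used here only to label the two tables.

```latex
\begin{align*}
\mu_1 &= \frac{1(1) + 2(2) + 3(4) + 4(8) + 5(16) + 6(8) + 7(4) + 8(2) + 9(1)}
              {1 + 2 + 4 + 8 + 16 + 8 + 4 + 2 + 1}
       = \frac{230}{46} = 5 \\
\mu_2 &= \frac{1(1) + 2(7) + 3(20) + 4(15) + 5(12) + 6(9) + 7(6) + 8(3) + 9(1)}
              {1 + 7 + 20 + 15 + 12 + 9 + 6 + 3 + 1}
       = \frac{324}{74} \approx 4.38
\end{align*}
```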

Practice Problem: Calculate the mean of the following data set:

{1, 2, 3, 4, 5, 7, 10, 15, 21, 22, 23, 24, 25, 26}

Solution: Simply use the formula for the mean μ as given above. The result is the
same regardless of whether the data corresponds to a population or a sample. Note
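that this data set contains 14 data values.

$$\mu = \frac{1 + 2 + 3 + 4 + 5 + 7 + 10 + 15 + 21 + 22 + 23 + 24 + 25 + 26}{14} = \frac{188}{14} \approx 13.4$$

Thus, the mean of the data set is about 13.4.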

Mode

The mode is a measure of central tendency that corresponds to the most frequent
data value. Referring once more to the example data tables above, the mode of the
data in Table 1 is 5, and the mode of the data in Table 2 is 3. The mode always
selects the "peak" of the frequency graph. In some cases, however, a data set may
have more than one value that is the mode; this situation occurs when two or more
values both have the same frequency and have the greatest frequency of any value
in the set.
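One way to find every mode, including the case of ties, is to count occurrences and keep the values that share the highest count. The sketch below assumes a plain list of data values; the function name `modes` and the sample data are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in ascending order."""
    if not values:
        return []
    counts = Counter(values)        # maps each value to its frequency
    highest = max(counts.values())  # the greatest frequency in the set
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data: 3 and 5 each occur twice, more often than any other value.
sample = [2, 3, 3, 5, 5, 7]
print(modes(sample))  # [3, 5]
```

Returning a list rather than a single number keeps the tie case explicit. If every value occurs equally often, the function returns all of them, which signals that the mode says little about that data set.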

Practice Problem: What is the mode of the following data set?

{8, 1, 2, 0, 3, 6, 2, 8, 4, 5, 6, 1, 8, 6, 3, 9, 0, 9}

Solution: The mode is the data value (or values) that occurs most frequently. One
way to find the mode is to draw a graph of the data (such as a histogram) and find
the highest point on the graph. Alternatively, we can order the data set and look to
see which value is the mode.

{0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 6, 6, 8, 8, 8, 9, 9}

By inspection, we can see that both 6 and 8 correspond to the mode of the data set.
Note that if each value in a data set occurs the same number of times, the mode is
not helpful.

Median

Another measure of central tendency is the median. The median is the value that
corresponds to the middle of an ordered set of data; that is to say, at least half the
data values in a set are at or below the median and at least half are at or above it.
The easiest (conceptually, anyhow) method of calculating the median of a data set is
to write the data in ascending order, then find the middle value. If the data set has an
odd number of values, the median is a clear single value; if the data set has an even
number of values, there is no single middle value. Instead, in this latter case, the
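median can be defined as the mean of the two middle values. Thus, given an
ordered data set {x1, x2, x3, ..., xN} with N members, we can write the median M
algebraically as

$$M = \begin{cases} x_{(N+1)/2} & \text{if } N \text{ is odd} \\ \frac{1}{2}\left(x_{N/2} + x_{N/2+1}\right) & \text{if } N \text{ is even} \end{cases}$$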

The median is a useful measure of central tendency in cases where a few data
values at one extreme or another have a disproportionate effect on the mean.
Consider the data set below, which might correspond to the incomes (in thousands
of dollars) of a certain group of people.

{24; 42; 64; 38; 49; 30; 34; 29; 2,350; 1,932; 61; 52; 51; 19; 28}

This set has 15 data values, so we do not need to calculate a mean of two middle
values. To find the median, let's first rewrite the data set in ascending order. Next,
we'll identify the middle value: this is the eighth data value, since there are seven
values above it and seven values below it. The median, 42, is the eighth entry in the
ordered set below.

{19; 24; 28; 29; 30; 34; 38; 42; 49; 51; 52; 61; 64; 1,932; 2,350}

Let's now compare this result, 42, with the mean. Using the formula given above, we
calculate the mean of this data set as approximately 320. Note carefully that the
mean in this case is well above the incomes of the majority of the people from whom
these data were taken--only 2 people in the group make at least the mean income,
whereas 13 people (the vast majority) make far less than the mean income. The
median income, however, does a much better job of expressing the central tendency
of the data. If we were to ignore the two individuals with extremely high incomes, we
would find the mean income of the remaining individuals to be about 40, which is
close to the median income.
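The comparison above can be reproduced with a short Python sketch. The `median` function below is one straightforward way to implement the definition (sort, then take the middle value or the mean of the two middle values); its name and the variable names are illustrative assumptions.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # mean of the two middle values


# Incomes (in thousands of dollars) from the example above.
incomes = [24, 42, 64, 38, 49, 30, 34, 29, 2350, 1932, 61, 52, 51, 19, 28]

mean_all = sum(incomes) / len(incomes)

# Drop the two largest incomes and recompute the mean of the rest.
without_top_two = sorted(incomes)[:-2]
mean_rest = sum(without_top_two) / len(without_top_two)

print(median(incomes))      # 42
print(round(mean_all, 1))   # 320.2
print(round(mean_rest, 1))  # 40.1
```

Two extreme values move the mean from about 40 to about 320, while the median stays at 42.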

A slightly more difficult problem arises when the data values have associated
frequencies; in such cases, writing a list of values may be quite difficult, since the
number of values can be large. Nevertheless, the median can be identified without
too much difficulty if an ordered list of values and associated frequencies is either
available or is constructed. We know that in an ordered list of N values, the median
is the value that falls in the middle. If the ordered list has associated frequencies
(so that N is the total of all the frequencies), then the median is the value at
position (N + 1)/2 (for odd N) or the mean of the values at positions N/2 and
N/2 + 1 (for even N). Of course, such a position (N/2, for instance) may not be
equal to the cumulative frequency of a particular value; the value at that position
is the one whose cumulative frequency is at least the position, while the
cumulative frequency of the immediately preceding value is less than the position.
This concept is best illustrated by example, so consider the following practice
problems.
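A minimal sketch of this cumulative-frequency lookup is given below. It assumes the distinct values are listed in ascending order; the names `frequency_median` and `value_at` are illustrative only. The frequencies are those of Tables 1 and 2 from the discussion of the mean.

```python
def frequency_median(values, frequencies):
    """Median of data given as ascending distinct values with frequencies."""
    total = sum(frequencies)  # N, the total number of data values
    if total == 0:
        raise ValueError("median of an empty data set is undefined")

    def value_at(position):
        """Value at a 1-based position of the fully written-out ordered list."""
        cumulative = 0
        for x, f in zip(values, frequencies):
            cumulative += f
            # First value whose cumulative frequency reaches the position.
            if position <= cumulative:
                return x
        raise ValueError("position is beyond the end of the data")

    if total % 2 == 1:
        return value_at((total + 1) // 2)
    return (value_at(total // 2) + value_at(total // 2 + 1)) / 2


data_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
table1_freq = [1, 2, 4, 8, 16, 8, 4, 2, 1]    # N = 46
table2_freq = [1, 7, 20, 15, 12, 9, 6, 3, 1]  # N = 74

print(frequency_median(data_values, table1_freq))  # 5.0
print(frequency_median(data_values, table2_freq))  # 4.0
```

For Table 2, positions 37 and 38 both fall after the cumulative frequency 28 (reached at value 3) and no later than 43 (reached at value 4), so the median is 4.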

Practice Problem: Find the median of the data set below.

{102, 403, 729, 843, 920, 360, 842, 941, 357, 483, 207, 670, 471, 109}

Solution: First, order the data. Note that because the set has 14 members, the
median is the mean of the two central values. These are the seventh and eighth
values (471 and 483) in the ordered set below.

{102, 109, 207, 357, 360, 403, 471, 483, 670, 729, 842, 843, 920, 941}

Now, calculate the median M by finding the mean of 471 and 483.

$$M = \frac{471 + 483}{2} = 477$$

The median of this data set is thus 477.
