
http://www.stats.gla.ac.uk/steps/glossary/alphabet.html
Central Tendency

The term central tendency refers to the "middle" value or perhaps a typical
value of the data, and is measured using the mean, median, or mode. Each of
these measures is calculated differently, and the one that is best to use depends
upon the situation.

Mean

The mean is the most commonly used measure of central tendency. When we
talk about an "average", we usually are referring to the mean. The mean is
simply the sum of the values divided by the total number of items in the set. The
result is referred to as the arithmetic mean. Sometimes it is useful to give more
weighting to certain data points, in which case the result is called the weighted
arithmetic mean.
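
As a minimal sketch of the two definitions above, the arithmetic mean and the weighted arithmetic mean might be computed as follows. The function names and the sample scores and weights are illustrative assumptions, not part of the original text.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [80, 90, 100]   # illustrative data (assumption)
weights = [1, 1, 2]      # the last score counts double (assumption)

print(arithmetic_mean(scores))         # 90.0
print(weighted_mean(scores, weights))  # 92.5
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean.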

The notation used to express the mean depends on whether we are talking about
the population mean or the sample mean:

$\mu$ = population mean

$\bar{x}$ = sample mean

The population mean then is defined as:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

where

$N$ = number of data points in the population

$x_i$ = value of each data point $i$.

The mean is valid only for interval data or ratio data. Since it uses the values of
all of the data points in the population or sample, the mean is influenced by
outliers that may be at the extremes of the data set.

Median

The median is determined by sorting the data set from lowest to highest values
and taking the data point in the middle of the sequence. There is an equal
number of points above and below the median. For example, in the data set
{1,2,3,4,5} the median is 3; there are two data points greater than this value and
two data points less than this value. In this case, the median is equal to the
mean. But consider the data set {1,2,3,4,10}. In this dataset, the median still is
three, but the mean is equal to 4. If there is an even number of data points in the
set, then there is no single point at the middle and the median is calculated by
taking the mean of the two middle points.
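
The procedure described above (sort, then take the middle point or the mean of the two middle points) can be sketched in a few lines. The function name `median` is an illustrative choice; the first two data sets are the ones used in the text, and the four-point set is an added assumption to show the even case.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data_a = [1, 2, 3, 4, 5]
data_b = [1, 2, 3, 4, 10]

print(median(data_a), sum(data_a) / len(data_a))  # 3 3.0
print(median(data_b), sum(data_b) / len(data_b))  # 3 4.0
print(median([1, 2, 3, 4]))                       # 2.5 (even number of points)
```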

The median can be determined for ordinal data as well as interval and ratio data.
Unlike the mean, the median is largely unaffected by outliers at the extremes of
the data set. For this reason, the median often is used when a few extreme
values could greatly influence the mean and distort what might be
considered typical. This often is the case with home prices and with income data
for a group of people, both of which tend to be strongly skewed. For such data,
the median often is reported instead of the mean. For example, if the salary of
one person in a group is 10 times that of the others, the mean salary of the
group is pulled upward by that single large salary. In this case, the median may
better represent the typical salary level of the group.

Mode

The mode is the most frequently occurring value in the data set. For example, in
the data set {1,2,3,4,4}, the mode is equal to 4. A data set can have more than a
single mode, in which case it is multimodal. In the data set {1,1,2,3,3} there are
two modes: 1 and 3.

The mode can be very useful for dealing with categorical data. For example, if a
sandwich shop sells 10 different types of sandwiches, the mode would represent
the most popular sandwich. The mode also can be used with ordinal, interval,
and ratio data. However, in interval and ratio scales, the data may be spread
thinly with no data points having the same value. In such cases, the mode may
not exist or may not be very meaningful.
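
A small sketch of finding the mode, including the multimodal case, might look like this. The helper name `modes` and the sandwich names are assumptions for illustration; the numeric data sets are those from the text.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 3, 4, 4]))            # [4]
print(modes([1, 1, 2, 3, 3]))            # [1, 3]
print(modes(["ham", "tuna", "ham"]))     # ['ham'] (categorical data)
print(modes([1.2, 3.4, 5.6]))            # every value ties: the mode is not meaningful
```

When no value repeats, this helper returns the whole data set, which mirrors the remark that the mode may not be meaningful for thinly spread interval or ratio data.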

When to use Mean, Median, and Mode

The following table summarizes the appropriate methods of determining the
middle or typical value of a data set based on the measurement scale of the
data.

Measurement Scale        Best Measure of the "Middle"

Nominal (Categorical)    Mode

Ordinal                  Median

Interval                 Symmetrical data: Mean
                         Skewed data: Median

Ratio                    Symmetrical data: Mean
                         Skewed data: Median

Standard Deviation and Variance

A commonly used measure of dispersion is the standard deviation, which is
simply the square root of the variance. The variance of a data set is calculated
by taking the arithmetic mean of the squared differences between each value
and the mean value. Squaring the difference has at least three advantages:

1. Squaring makes each term positive so that values above the mean do not
cancel values below the mean.
2. Squaring adds more weighting to the larger differences, and in many
cases this extra weighting is appropriate since points further from the
mean may be more significant.
3. The mathematics are relatively manageable when using this measure in
subsequent statistical calculations.

Because the differences are squared, the units of variance are not the same as
the units of the data. Therefore, the standard deviation is reported as the square
root of the variance and the units then correspond to those of the data set.

The calculation and notation of the variance and standard deviation depend on
whether we are considering the entire population or a sample set. Following the
general convention of using Greek characters to express population parameters
and Latin (Roman) characters to express sample statistics, the notation for
standard deviation and variance is as follows:

$\sigma$ = population standard deviation

$\sigma^2$ = population variance

$s$ = estimate of population standard deviation based on sampled data

$s^2$ = estimate of population variance based on sampled data

The population variance is defined as:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

The population standard deviation is the square root of this value.
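
As a sketch of the definition, the population variance is the arithmetic mean of the squared differences from the mean, and the standard deviation is its square root. The function names and the data set below are illustrative assumptions.

```python
import math


def population_variance(values):
    """Mean of the squared differences between each value and the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """Square root of the population variance, in the units of the data."""
    return math.sqrt(population_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data (assumption), mean = 5

print(population_variance(data))  # 4.0
print(population_std(data))       # 2.0
```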

The variance of a sampled subset of observations is calculated in a similar
manner, using the appropriate notation for sample mean and number of
observations. However, while the sample mean is an unbiased estimator of the
population mean, the same is not true for the sample variance if it is calculated in
the same manner as the population variance. If one took all possible samples of
n members and calculated the sample variance of each combination using n in
the denominator and averaged the results, the value would not be equal to the
true value of the population variance; that is, it would be biased. This bias can be
corrected by using ( n - 1 ) in the denominator instead of just n, in which case the
sample variance becomes an unbiased estimator of the population variance.

This corrected sample variance is defined as:

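$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $n$ is the number of observations in the sample and $\bar{x}$ is the sample mean.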

The sample standard deviation is the square root of this value.
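
The bias argument can be checked numerically on a tiny population. The sketch below enumerates every possible sample of size n and averages the two versions of the sample variance. It assumes samples are drawn with replacement (so that observations are independent), and the population values, the sample size, and the `ddof` parameter name are illustrative assumptions.

```python
from itertools import product


def variance(values, ddof=0):
    """Sum of squared deviations divided by (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


population = [1, 2, 3, 4]   # illustrative population (assumption)
n = 2                       # sample size (assumption)

# Every ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_biased = sum(variance(s, ddof=0) for s in samples) / len(samples)
avg_unbiased = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(variance(population))  # 1.25  (true population variance)
print(avg_biased)            # 0.625 (n in the denominator: too small)
print(avg_unbiased)          # 1.25  (n - 1 in the denominator)
```

Dividing by n underestimates the population variance by a factor of (n - 1) / n on average, which is exactly what dividing by n - 1 corrects.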

Standard deviation and variance are commonly used measures of dispersion.
Additional measures include the range and average deviation.

Dispersion

Without knowing something about how data is dispersed, measures of central
tendency may be misleading. For example, a residential street with 20 homes on
it having a mean value of $200,000 with little variation from the mean would be
very different from a street with the same mean home value but with 3 homes
having a value of $1 million and the other 17 clustered around $60,000.
Measures of dispersion provide a more complete picture. Dispersion measures
include the range, average deviation, variance, and standard deviation.
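
The street example can be made concrete with the standard library. The figures below are assumptions chosen to approximate the example: 3 homes at $1 million and 17 at $60,000 give a mean of $201,000, so the uniform street is set to that value rather than exactly $200,000.

```python
import statistics

# Hypothetical home values approximating the example in the text.
uniform_street = [201_000] * 20
mixed_street = [1_000_000] * 3 + [60_000] * 17

for name, homes in [("uniform", uniform_street), ("mixed", mixed_street)]:
    print(
        name,
        "mean:", statistics.mean(homes),
        "median:", statistics.median(homes),
        "std dev:", round(statistics.pstdev(homes)),
    )
```

Both streets report the same mean, but the standard deviation is zero for the uniform street and several hundred thousand dollars for the mixed one.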

Range

The simplest measure of dispersion is the range. The range is calculated by
simply taking the difference between the maximum and minimum values in the
data set. However, the range only provides information about the maximum and
minimum values and does not say anything about the values in between.
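
A brief sketch shows both the calculation and its limitation. The function name `data_range` and the two data sets are illustrative assumptions.

```python
def data_range(values):
    """Difference between the maximum and minimum values."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)


clustered = [1, 5, 5, 5, 9]   # most points sit at the center
spread = [1, 2, 5, 8, 9]      # points spread across the interval

print(data_range(clustered))  # 8
print(data_range(spread))     # 8
```

The two data sets have the same range even though their interior values are distributed quite differently.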

Average Deviation

Another method is to take the difference between each data point and the mean
value, sum these differences, and divide by the number of points to calculate
the average deviation (mean deviation). However, performing this calculation on
the signed differences always yields an average deviation of zero, since the
values above the mean cancel the values below the mean. To avoid this, the
absolute value of each difference is taken so that no negative values are
obtained, and the result sometimes is called the mean absolute deviation. The
average deviation is not very difficult to calculate, and it is intuitively
appealing. However, the mathematics become complex when it is used in subsequent
statistical analysis. Because of this complexity, the average deviation is not a
very commonly used measure of dispersion.
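
The cancellation of signed differences, and the fix of taking absolute values, can be sketched as follows. The function names are illustrative assumptions; the data set is the one used earlier for the median.

```python
def mean_signed_deviation(values):
    """Average of (x - mean); positive and negative differences cancel."""
    mean = sum(values) / len(values)
    return sum(x - mean for x in values) / len(values)


def mean_absolute_deviation(values):
    """Average of |x - mean|; every term is non-negative."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


data = [1, 2, 3, 4, 10]   # mean = 4

print(mean_signed_deviation(data))    # 0.0
print(mean_absolute_deviation(data))  # 2.4
```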

Variance and Standard Deviation

A better way to measure dispersion is to square the differences before averaging
them. This measure of dispersion is known as the variance, and the square root
of the variance is known as the standard deviation. The standard deviation and
variance are widely used measures of dispersion.
