
NUMERICAL SUMMARY STATISTICS

NUMERICAL SUMMARY MEASURES


• Numerical summary measures describe the major properties of a data
set, namely its:
1. central tendency (or location),
2. variability (or dispersion, spread),
3. shape.

1. MEASURES OF CENTRAL TENDENCY
– They locate the ‘centrality’ of the data set.

• Arithmetic mean (variable X)


For the population: μ (mu)

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

For a sample: x̄ (x-bar)

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

That is, add up the values and divide by the number of values, where N is the population size and n is the sample size.

Properties of the mean:

+ Each quantitative data set has one and only one mean;
+ It is the most comprehensive measure of central location
(i.e. it is computed from all available data values);
+ The (sample) mean is used extensively in inferential statistics;
– It can be distorted by outliers (or extreme values), i.e. uncharacteristically
small or large values.
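
To illustrate how a single extreme value can distort the mean, here is a minimal Python sketch. The function name `arithmetic_mean` and both data sets are hypothetical and chosen only for illustration.

```python
def arithmetic_mean(values):
    """Add up the values and divide by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical data: the second list replaces 6 with the extreme value 100.
typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 100]

print(arithmetic_mean(typical))       # 4.0
print(arithmetic_mean(with_outlier))  # 22.8
```

One extreme value moves the mean from 4.0 to 22.8, although four of the five observations are unchanged.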

• Median: The middle value of an ordered array.

[Figure: ordered data running from smallest to largest; the median splits it into a lower 50% and an upper 50%.]

How to find the median ‘manually’?
i. Sort the data from smallest to largest.
ii. Choose the middle value if n (N) is odd,
or take the average of the two middle values if n (N) is even.
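
The two steps above can be sketched in Python as follows. The function name `median` and the example data are illustrative assumptions, not part of the original slides.

```python
def median(values):
    """Return the middle value of the sorted data.

    For an even number of values, return the average of the two
    middle values.
    """
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # step i: sort from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step ii, odd n: the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average of two


print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```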

Properties of the median:

+ Each quantitative data set has one and only one median;
+ It is unaffected by outliers;
± It is computed from at most two data points;
– It has limited application and mathematical potential.

• Mode: The most frequently occurring value of a data set.

Properties of the mode:

+ It can be used to describe both quantitative and qualitative data;


+ It is unaffected by outliers;
– It might not be unique or useful;
– It has limited application and mathematical potential.
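
The sketch below shows the mode on quantitative and qualitative data, and a case where it is not unique. It assumes Python 3.8 or later, where `statistics.multimode` is available; the data sets are hypothetical.

```python
from statistics import multimode

# Hypothetical data sets for illustration.
scores = [1, 2, 2, 3, 3]                    # two values tie for most frequent
colours = ["red", "blue", "red", "green"]   # qualitative data
all_distinct = [4, 7, 9]                    # every value occurs once

print(multimode(scores))        # [2, 3]
print(multimode(colours))       # ['red']
print(multimode(all_distinct))  # [4, 7, 9]
```

When every value occurs equally often, each value is returned as a mode, which is the case where the mode is not useful.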
2. MEASURES OF VARIABILITY (DISPERSION)
– How much is the data spread out around its centre?

• Range: largest – smallest

Properties of the range:

+ Each quantitative data set has one and only one range;
± It is computed from only two data points;
– It is affected by outliers;
– It has limited application and mathematical potential.

• Variance: ‘average’ of the squared deviations from the mean.

For the population: σ² (sigma squared)

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

For a sample: s²

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

That is, the sum of squared deviations divided by N (population) or by n – 1 (sample).

Properties of the variance:

+ Each quantitative data set has one and only one variance;
+ It is a comprehensive measure of dispersion;
– It is affected by outliers;
– It is conceptually complicated;
– It is hard to interpret since it is given in ‘squared’ units of the
observations.

• Standard deviation: ‘average’ deviation from the mean,
the positive square root of the variance.

For the population: σ    For a sample: s

$$\sigma = \sqrt{\sigma^2} \qquad s = \sqrt{s^2}$$

The standard deviation has similar properties to the variance, but
+ It is easier to interpret since it is given in the original units;
+ s is used extensively in inferential statistics.
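
As a rough sketch of the variance and standard deviation formulas, the Python code below computes both the population and the sample versions. The function names and the data set are hypothetical.

```python
import math


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    N = len(values)
    if N == 0:
        raise ValueError("at least one value is required")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

print(population_variance(data))             # 4.0
print(math.sqrt(population_variance(data)))  # 2.0
print(sample_variance(data))                 # 32/7, about 4.57
print(math.sqrt(sample_variance(data)))      # about 2.14
```

The sample variance is larger than the population variance for the same data because the same sum of squares is divided by n – 1 instead of N.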

• The range, the variance and the standard deviation are all ‘useless’ for
comparing the dispersions of data sets that are measured in different units
(e.g. kg and cm), or have markedly different magnitudes.

• Coefficient of variation: the standard deviation divided by the mean.

For the population:

$$cv = \frac{\sigma}{\mu} \; (\times 100\%)$$

For a sample:

$$cv = \frac{s}{\bar{x}} \; (\times 100\%)$$

Properties of the coefficient of variation:

+ It measures relative variability since it does not depend on the
original unit of measurement;
– It does not exist when the mean is zero, and can be misleading
when some of the values are positive and some others are negative.
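
The following sketch illustrates that the coefficient of variation does not depend on the unit of measurement. The function name `coefficient_of_variation` and the weights are hypothetical; the sample standard deviation is taken from Python's `statistics.stdev`.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    x_bar = mean(values)
    if x_bar == 0:
        raise ValueError("cv does not exist when the mean is zero")
    return stdev(values) / x_bar


# Hypothetical weights, recorded once in kilograms and once in grams.
weights_kg = [50, 60, 70]
weights_g = [w * 1000 for w in weights_kg]

print(stdev(weights_kg), stdev(weights_g))   # the two differ by a factor of 1000
print(coefficient_of_variation(weights_kg))  # about 0.1667
print(coefficient_of_variation(weights_g))   # about 0.1667
```

The standard deviation changes with the unit, while the cv stays the same.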

• Percentile: the pth percentile separates the lower p% of the
observations from the upper (100-p)%.

[Figure: ordered data running from smallest to largest; the pth percentile splits it into a lower p% and an upper (100-p)%.]

• Quartiles: the 25th (Q1), 50th (Q2 or median) and 75th (Q3)
percentiles.

Properties of the percentiles:

+ They measure non-central location (unless p=50);
– They are really useful only for large data sets.

• Locating Percentiles: the following formula allows us to approximate
the location of any percentile, where L_P is the location of the Pth percentile:

$$L_P = (n + 1)\frac{P}{100}$$

• Calculate the 75th percentile of the data 0 0 5 7 8 9 12 14 22 33 (n = 10):

$$L_{75} = (10 + 1)\frac{75}{100} = 8.25$$

The 75th percentile is between the 8th and 9th data observations,
i.e. between 14 and 22, a quarter of the way from the 8th to the 9th:
0.25 × (22 – 14) = 2
Therefore the 75th percentile is 14 + 2 = 16.
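
The location formula and the interpolation step used in this example can be sketched in Python as below. The function name `percentile` is hypothetical, and clamping to the smallest or largest observation when the location falls outside 1..n is an assumption that the slides do not address.

```python
def percentile(values, p):
    """Approximate the p-th percentile using the location (n + 1) * p / 100."""
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100      # 1-based position in the ordered data
    lower_rank = int(location)        # whole part of the location
    fraction = location - lower_rank  # fractional part of the location

    # Assumption: clamp to the ends when the location is outside 1..n.
    if lower_rank < 1:
        return ordered[0]
    if lower_rank >= n:
        return ordered[-1]

    lower = ordered[lower_rank - 1]   # observation at the whole-number rank
    upper = ordered[lower_rank]       # next observation
    return lower + fraction * (upper - lower)


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]
print(percentile(data, 75))   # 16.0
print(percentile(data, 50))   # 8.5
```

Other software may use a different interpolation rule and can return slightly different percentiles for the same data.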

• Inter-quartile range: IQR = Q3 – Q1
i.e. the range of the middle 50% of the data.

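[Figure: ordered data running from smallest to largest, marked at Q1, Q2 = Median and Q3; 25% of the data lies below Q1, 25% lies above Q3, and the IQR spans the middle 50%.]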

Properties of the inter-quartile range:

+ It is unaffected by outliers;
– It has limited application and mathematical potential,
but it is used to identify outliers.

3. DESCRIBING THE SHAPE OF A DATA SET

The shape of a distribution is described by its degree of symmetry
(skewness) and its peakedness (kurtosis).

• There are three ways to determine whether the distribution of a data
set is symmetrical or skewed:

• (1) Plot the data using a histogram or polygon and observe its shape.
The distribution is said to be skewed, i.e. not symmetrical, if the
tails are not (approximately) of the same length.

The distribution is skewed to the left (negatively skewed), if the left tail is
longer than the right tail. This is an indication of the presence of a
small proportion of relatively small values.

The distribution is skewed to the right (positively skewed), if the right tail is
longer than the left tail. This is an indication of the presence of a
small proportion of relatively large values.
[Figure: a bell-shaped, symmetrical histogram (zero skewness).]

[Figure: a negatively (or left) skewed histogram.]

[Figure: a positively (or right) skewed histogram.]

• (2) Compare the mean and the median.

Three possibilities:
i. mean = median ⇒ Distribution is symmetrical
ii. mean < median ⇒ Distribution is skewed to the left
iii. mean > median ⇒ Distribution is skewed to the right

• (3) Compute the skewness measure using MS Excel.

If the skewness measure is (approximately) zero,
the distribution is symmetrical.
If the skewness measure is negative,
the distribution is skewed to the left.
If the skewness measure is positive,
the distribution is skewed to the right.

Kurtosis
Kurtosis measures the peakedness of a distribution. It can be
computed using MS Excel.

[Figure: a peaked distribution (positive kurtosis).]

A bell-shaped distribution has a kurtosis value of approximately
zero, and a peaked distribution has a positive kurtosis value.
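
The sketch below computes a sample kurtosis in Python. It assumes the measure meant is the one returned by Excel's KURT function (excess kurtosis, which is about zero for a bell-shaped distribution); the slides do not give the formula. The function name and the data sets are hypothetical.

```python
import math


def sample_excess_kurtosis(values):
    """Sample excess kurtosis, following the formula of Excel's KURT."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four values are required")
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    fourth_powers = sum(((x - x_bar) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_powers - correction


flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]     # evenly spread values
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]   # most values at the centre

print(sample_excess_kurtosis(flat))    # negative (about -1.2)
print(sample_excess_kurtosis(peaked))  # positive (about 4.5)
```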
[Figure: a flat distribution (negative kurtosis).]

Ex 4:
We consider the price to earnings ratio and the dividend yield for 20 listed
shares. The data was downloaded from Selvanathan Case 3.1 and summarised
using MS Excel. We get the following results:

                      P/E ratio   Div yield
Mean                     15.3        4.4
Standard Error            1.2        0.4
Median                   13.9        4.4
Mode                     15.0        5.5
Standard Deviation        5.4        1.8
Sample Variance          29.0        3.2
Kurtosis                  3.0        0.4
Skewness                  1.8       -0.3
Range                    21.1        7.4
Minimum                   8.8        0.3
Maximum                  29.9        7.7
Sum                     306.1       88.8
Count                      20         20

Mean: For the 20 listed shares the average P/E ratio is 15.3, and the
average dividend yield is 4.4%.

Median: 50% of the shares have P/E ratios less than 13.9 and the other
50% have P/E ratios more than 13.9. 50% of the shares have dividend
yields less than 4.4 and the other 50% have dividend yields more than 4.4.

Skewness: The mean P/E ratio is larger than the median, and so its
distribution is positively skewed. Note, the skewness figure is positive
(1.8). The mean dividend yield is the same as the median, and so its
distribution is symmetrical. Note, the skewness figure is very close to
zero.

Ex 4 Continued:

Range: The range of P/E ratios is 21.1 and the range of dividend yields is 7.4.

Standard deviation: The ‘average’ deviation of P/E ratios from the mean is
measured as 5.4, and that of dividend yields as 1.8.

Coefficient of variation:

P/E ratio: $cv = \frac{5.4}{15.3} = 0.3529$    Div yield: $cv = \frac{1.8}{4.4} = 0.4091$

The ‘average’ deviation of P/E ratios from their mean is 35.29% of the
mean P/E ratio, and the ‘average’ deviation of dividend yields from their
mean is 40.91% of the mean dividend yield.
Though the standard deviation and the range show that the P/E ratio has a
greater ‘average’ deviation from the mean than the dividend yield, the cv
shows the opposite for deviations relative to the mean.

Kurtosis: The distribution of P/E ratios is peaked (Kurtosis = 3.0)
while that of the dividend yields is close to bell-shaped (Kurtosis = 0.4).
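
As a quick check of the two coefficients of variation, the rounded summary figures from the Excel output can be plugged into a few lines of Python. The variable names are illustrative; the inputs are the means and standard deviations reported in the table.

```python
# Rounded summary statistics taken from the Excel output in Ex 4.
pe_mean, pe_sd = 15.3, 5.4      # P/E ratio
div_mean, div_sd = 4.4, 1.8     # dividend yield

pe_cv = pe_sd / pe_mean
div_cv = div_sd / div_mean

print(round(pe_cv * 100, 2))    # 35.29
print(round(div_cv * 100, 2))   # 40.91

# In absolute terms the P/E ratio is more dispersed, while relative to
# the mean the dividend yield is more dispersed.
print(pe_sd > div_sd, pe_cv < div_cv)   # True True
```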

Identifying extreme values (outliers)

Any data point smaller than Q1 – 1.5×IQR or greater than Q3 + 1.5×IQR
can be considered an unusually small or large value, i.e. an outlier or
extreme value.
The (Q1 – 1.5×IQR ; Q3 + 1.5×IQR) interval for the P/E ratio is
calculated as follows, with the quartiles obtained using MS Excel:

$$Q_1 = 11.975 \qquad Q_3 = 15$$

$$IQR = 15 - 11.975 = 3.025$$

$$11.975 - 1.5 \times 3.025 = 7.4375$$

$$15 + 1.5 \times 3.025 = 19.5375$$

i.e. any data point outside the interval (7.4375 ; 19.5375) can be considered an outlier.
For example, the maximum P/E ratio of 29.9 lies above 19.5375 and is therefore flagged as an outlier.
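
The outlier rule can be sketched in Python as follows. The quartiles are the values quoted on this slide, the two test points are the minimum and maximum P/E ratios from the Ex 4 summary table, and the function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) limits Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """True if the value lies outside the interval given by the fences."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


# Quartiles of the P/E ratio as quoted on the slide.
Q1, Q3 = 11.975, 15.0

lower, upper = outlier_fences(Q1, Q3)
print(round(lower, 4), round(upper, 4))   # 7.4375 19.5375

print(is_outlier(29.9, Q1, Q3))   # True: the maximum P/E ratio is flagged
print(is_outlier(8.8, Q1, Q3))    # False: the minimum P/E ratio is not
```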