LECTURE 2
Descriptive Statistics
Use of numerical information to summarize, simplify, and present masses of data.
Data may come from studies of populations (often called a census study) or samples.
Uses summary measures, typically measures of central tendency and spread.
Used for both ungrouped and grouped data.
Data properties
Shapes
In a box plot:
Suggests a symmetric distribution (consistent with normality) if all five points are equally spaced.
Positive skew if Q2 is closer to Q1 and the right whisker is longer than the left one.
Negative skew if Q2 is closer to Q3 and the right whisker is shorter than the left one.
[Figure: box plot marking Min. value, Q1, Q2, Q3, and Max. value, with the IQR spanning Q1 to Q3]
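The box plot reading above can be sketched in code. This is a minimal illustration, not part of the lecture: the function names and the data are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`, which is only one of several quartile conventions.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

def describe_shape(values):
    """Classify the shape by comparing box halves and whisker lengths."""
    low, q1, q2, q3, high = five_number_summary(values)
    lower_box, upper_box = q2 - q1, q3 - q2
    left_whisker, right_whisker = q1 - low, high - q3
    if lower_box < upper_box and left_whisker < right_whisker:
        return "positive skew"   # Q2 near Q1, right whisker longer
    if lower_box > upper_box and left_whisker > right_whisker:
        return "negative skew"   # Q2 near Q3, right whisker shorter
    if lower_box == upper_box and left_whisker == right_whisker:
        return "symmetric"       # all five points equally spaced
    return "mixed"               # the two signals disagree

# Hypothetical data with a long right tail (not from the lecture)
right_tailed = [1, 2, 2, 3, 3, 4, 6, 9, 15]
print(five_number_summary(right_tailed))
print(describe_shape(right_tailed))
```

Real data rarely produce exactly equal spacings, so in practice the "symmetric" test would need a tolerance rather than strict equality.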
[Figure: histogram of frequency (vertical axis, 0 to 6) by Groups (horizontal axis, 0 to 50)]
Central Tendency
MEAN -- average
MEDIAN -- middle value
MODE -- most frequently observed value(s)
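As a quick illustration of the three measures, the sketch below uses Python's standard `statistics` module on a small data set. The numbers are hypothetical and were chosen only so that the three measures differ.

```python
import statistics

# Hypothetical ungrouped data (not from the lecture)
values = [3, 5, 5, 7, 10]

mean_value = statistics.mean(values)       # sum of values / number of values
median_value = statistics.median(values)   # middle value of the sorted data
mode_values = statistics.multimode(values) # every most frequent value, as a list

print(mean_value, median_value, mode_values)
```

`multimode` is used rather than `mode` because a data set can have more than one most frequently observed value.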
Population Mean
Definition: For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values. To compute the population mean, use the following formula.
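$$\mu = \frac{\sum X}{N}$$
where μ (mu) is the population mean, Σ (sigma) denotes the sum, X is an individual value, and N is the population size.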
Definition: For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values. To compute the sample mean, use the following formula.
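$$\bar{X} = \frac{\sum X}{n}$$
where X̄ (X-bar) is the sample mean, Σ (sigma) denotes the sum, X is an individual value, and n is the sample size.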
The mean of a sample of data organized in a frequency distribution is computed by the following formula:
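$$\bar{X} = \frac{\sum Xf}{n} = \frac{\sum Xf}{\sum f}$$
where X̄ (X-bar) is the sample mean, X is the value representing each class, f is the class frequency, and the sum of frequencies Σf equals the sample size n.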
Mean: An Example
Groups      f     X     fX
0 - 10      2     5     10
10 - 20     4     15    60
20 - 30     8     25    200
30 - 40     4     35    140
40 - 50     2     45    90
Total       20          500

$$\bar{X} = \frac{\sum fX}{n} = \frac{\sum fX}{\sum f} = \frac{500}{20} = 25 \text{ mm length}$$
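The grouped-mean example can be checked with a short script. This is a sketch under the assumption, taken from the example's X column, that each class is represented by its midpoint; the names `classes` and `grouped_mean` are illustrative.

```python
# Frequency table from the example: (lower limit, upper limit, frequency f)
classes = [(0, 10, 2), (10, 20, 4), (20, 30, 8), (30, 40, 4), (40, 50, 2)]

def grouped_mean(table):
    """Mean of grouped data: sum(f * X) / sum(f), with X the class midpoint."""
    n = sum(f for _, _, f in table)                                  # sum of frequencies
    total = sum(f * (lower + upper) / 2 for lower, upper, f in table)  # sum of f * X
    return total / n

print(grouped_mean(classes))  # 25.0
```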
Median
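The most central value in data arranged in order. Essentially, 50% of the observations lie above the median and 50% below it.
In ungrouped data with n values:
For odd n, the median is the value at the (n+1)/2-th position.
For even n, the median is the average of the values at the n/2-th and (n/2 + 1)-th positions.
In grouped data, the median can be computed using the formula:
$$\text{median} = L + \frac{h}{f}\left(\frac{\sum f}{2} - C\right)$$
where L is the lower limit of the median class, h is the class width, f is the frequency of the median class, Σf is the total frequency, and C is the cumulative frequency of the classes before the median class.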
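For ungrouped data, the position rule can be sketched as follows. The function name is illustrative, and the sketch assumes the common convention of averaging the two middle values when the number of observations is even.

```python
def median_ungrouped(values):
    """Median of ungrouped data found by position in the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th value (1-based) is index n // 2 (0-based)
        return ordered[mid]
    # Even n: average of the n/2-th and (n/2 + 1)-th values (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_ungrouped([7, 1, 5]))     # 5
print(median_ungrouped([7, 1, 5, 3]))  # 4.0
```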
Median: An Example
Groups      f     C
0 - 10      2     2
10 - 20     4     6
20 - 30     8     14
30 - 40     4     18
40 - 50     2     20
Total       20

Σf/2 = 10 falls in the class 20 - 30, so median = 20 + 10/8 (10 - 6) = 25.
Groups      f     C
0 - 10      2     2
10 - 20     4     6
20 - 30     8     14
30 - 40     4     18
40 - 50     2     20
Total       20

Σf/4 = 5
Q1 = 10 + 10/4 (5 - 2) = 17.5
3Σf/4 = 15
Q3 = 30 + 10/4 (15 - 14) = 32.5
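The grouped median and quartiles all use the same interpolation, L + (h/f)(position - C), with the position set to Σf/2, Σf/4, or 3Σf/4. The sketch below generalizes this to any fraction p of the total frequency; `grouped_quantile` is a hypothetical helper name, and C is taken as the cumulative frequency of the classes before the one containing the position.

```python
# Frequency table from the example: (lower limit, upper limit, frequency f)
classes = [(0, 10, 2), (10, 20, 4), (20, 30, 8), (30, 40, 4), (40, 50, 2)]

def grouped_quantile(table, p):
    """Interpolated value at position p * sum(f): L + (h / f) * (p * n - C)."""
    n = sum(f for _, _, f in table)
    target = p * n
    cumulative = 0  # C: cumulative frequency of the classes before this one
    for lower, upper, f in table:
        if f > 0 and cumulative + f >= target:
            h = upper - lower  # class width
            return lower + (h / f) * (target - cumulative)
        cumulative += f
    raise ValueError("p must lie between 0 and 1 and the table must be non-empty")

q1 = grouped_quantile(classes, 0.25)
median = grouped_quantile(classes, 0.50)
q3 = grouped_quantile(classes, 0.75)
print(q1, median, q3)  # 17.5 25.0 32.5
print(q3 - q1)         # interquartile range: 15.0
```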
Dispersion
RANGE -- highest to lowest values
STANDARD DEVIATION -- how closely values cluster around the mean value
SKEWNESS -- refers to symmetry of the curve
KURTOSIS -- refers to the peak of the curve
Range
Range: For ungrouped data, the range is the difference between the highest and lowest values in a set of data. To compute the range, use the following formula.
RANGE = HIGHEST VALUE - LOWEST VALUE
Example: For the given data on excessive length, the lowest value is 5 and the highest value is 43. Hence, Range = 43 - 5 = 38.
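A one-line check of the range formula is shown below. The list is hypothetical: the lecture gives only the lowest (5) and highest (43) values, so the values in between are made up for illustration.

```python
# Hypothetical lengths; only the extremes 5 and 43 come from the example
lengths = [5, 12, 18, 25, 31, 43]

data_range = max(lengths) - min(lengths)  # highest value - lowest value
print(data_range)  # 38
```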
Variance
Population Variance: The population variance for ungrouped data is the arithmetic mean of the squared deviations from the population mean. It is computed from the formula below:
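$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where σ² (sigma squared) is the population variance, X is an individual value, μ is the population mean, and N is the population size.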
Variance: An Example
Groups      f     X     fX     (X - μ)   (X - μ)²   f(X - μ)²
0 - 10      2     5     10     -20       400        800
10 - 20     4     15    60     -10       100        400
20 - 30     8     25    200    0         0          0
30 - 40     4     35    140    10        100        400
40 - 50     2     45    90     20        400        800
Total       20          500                         2400

μ = Σ(fX)/N = 500/20 = 25.
Because the data are grouped, each squared deviation is weighted by its class frequency:
$$\sigma^2 = \frac{\sum f (X - \mu)^2}{N} = \frac{2400}{20} = 120$$
Standard Deviation
Population Standard Deviation: The population standard deviation (σ) is the square root of the population variance. For the previous example, the population standard deviation is σ = 10.95 (square root of 120).
Note: If you are given the population standard deviation, just square that number to get the population variance. Standard deviation is never negative; it equals 0 only when all values are identical.
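The variance and standard deviation of the grouped example can be computed with the sketch below. It assumes class midpoints stand in for the individual values and weights each squared deviation by its class frequency f; the function name is illustrative. The weighting matters: dividing the unweighted sum of the five squared deviations (1000) by N = 20 would give 50, which ignores how many observations fall in each class.

```python
import math

# Frequency table from the example: (lower limit, upper limit, frequency f)
classes = [(0, 10, 2), (10, 20, 4), (20, 30, 8), (30, 40, 4), (40, 50, 2)]

def grouped_population_variance(table):
    """Population variance of grouped data: sum(f * (X - mu)^2) / N."""
    n = sum(f for _, _, f in table)
    midpoints = [((lower + upper) / 2, f) for lower, upper, f in table]
    mu = sum(f * x for x, f in midpoints) / n           # grouped mean
    return sum(f * (x - mu) ** 2 for x, f in midpoints) / n

variance = grouped_population_variance(classes)
std_dev = math.sqrt(variance)  # standard deviation is the square root of variance
print(variance)                # 120.0
print(round(std_dev, 2))       # 10.95
```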
For a bell-shaped distribution, approximately 68% of the observations will lie within 1σ of the mean (μ); approximately 95% within 2σ of the mean (μ); and approximately 99.7% within 3σ of the mean (μ).
[Figure: bell-shaped curve with the horizontal axis marked at μ-3σ, μ-2σ, μ-1σ, μ, μ+1σ, μ+2σ, μ+3σ]
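The percentages for a bell-shaped curve can be checked against the normal distribution using `statistics.NormalDist` from the Python standard library. This sketch assumes an exactly normal distribution, for which the shares within 1, 2, and 3 standard deviations are about 68%, 95%, and 99.7%; real data will only approximate these figures.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```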
Coefficient of Variation
Coefficient of Variation: The ratio of the standard deviation to the arithmetic mean, expressed as a percentage. It is a measure of relative dispersion.
$$CV = \frac{s}{\bar{X}} (100\%)$$
where s is the standard deviation and X̄ is the arithmetic mean.
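A minimal sketch of the coefficient of variation follows. The sample is hypothetical, the function name is illustrative, and the sketch assumes s is the sample standard deviation (divisor n - 1), as computed by `statistics.stdev`.

```python
import statistics

def coefficient_of_variation(sample):
    """CV as a percentage: (sample standard deviation / mean) * 100."""
    mean = statistics.mean(sample)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(sample) / mean * 100

# Hypothetical sample (not from the lecture)
sample = [10, 12, 8, 10]
cv = coefficient_of_variation(sample)
print(f"{cv:.2f}%")  # 16.33%
```

Because the CV is unit-free, it can be used to compare the relative spread of data sets measured on different scales.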
Practical Exercise