What is ‘statistics’?
Statistics is the science that deals with collecting, classifying, presenting,
describing, analysing and interpreting data so that we can draw
conclusions and make reasonable decisions.
Descriptive statistics
The activity of collecting, classifying, presenting and describing
quantitative data.
Methods for organising (frequency table), representing (graphs) and
summarising data (central tendency and variability).
Inferential statistics
The techniques and methods used to interpret the results obtained from
descriptive statistics.
Population
The population is the entire (complete) collection of data whose
properties are analysed.
It contains all the subjects of interest.
It can be of any size; its items need not be uniform but must share at
least one measurable feature.
Sample
A sample is a portion of the population selected for study.
A sample is any set of entities, cases, subjects, items or experimental
units chosen from the population.
Random Sample
A random sample is a sample selected in such a way that each element
of the population has the same chance of being selected.
Parameter
A parameter is a numerical measurement describing some characteristic
of a population.
E.g. the population mean and variance
Statistic
A statistic is a numerical measurement describing some characteristic of
a sample.
E.g. the sample mean and variance
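The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population below is hypothetical (randomly generated weights), and the names `population` and `sample` are assumptions made for the example, not part of the slides.

```python
import random
import statistics

# Hypothetical population: weights (kg) of 1,000 people, generated for illustration only.
rng = random.Random(42)
population = [round(rng.gauss(65, 10), 1) for _ in range(1000)]

# Parameters describe the whole population.
population_mean = statistics.fmean(population)
population_variance = statistics.pvariance(population)  # divides by N

# A simple random sample: every element has the same chance of being selected.
sample = rng.sample(population, k=30)

# Statistics describe the sample and serve as estimates of the parameters.
sample_mean = statistics.fmean(sample)
sample_variance = statistics.variance(sample)  # divides by n - 1

print(f"parameter: mean={population_mean:.2f}, variance={population_variance:.2f}")
print(f"statistic: mean={sample_mean:.2f}, variance={sample_variance:.2f}")
```

The sample values will generally be close to, but not equal to, the population values, which is why the two are given different names.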
Variable
Any measured characteristic or attribute that differs for different
elements.
For example, if the weight of 30 people were measured, then weight
would be a variable.
Can be classified as quantitative or qualitative.
Quantitative Variable
The variable being studied is numeric and measured on an interval or
ratio scale.
E.g. ambient temperature, vehicular speed and walking distance.
Qualitative Variable
The variable being studied is non-numeric and measured on a nominal or
ordinal scale.
Also called a ‘categorical’ variable.
E.g. gender and eye colour (nominal), educational level (ordinal).
Data
A set of data is a collection of observations, measurements or
information obtained for a study.
It can be classified as qualitative data or quantitative data.
Discrete Data
• Data that can only take exact and countable values.
• E.g. number of students in a class, number of cars sold in a day,
number of persons in a family.
Ungrouped Data
Raw data that has not been grouped into class intervals.
The values are listed individually, or tallied in an ordered frequency
distribution of single values.
Example:
Weight of seven students: 56, 74, 68, 90, 52, 48, 65
Number of cars owned per household:
Grouped Data
Data is grouped according to class intervals before the frequency
distribution is assigned.
Example:
Height of students in a class:
Measures of Location
Mean, Median, Mode, Quartile, Percentile
Measures of Central Tendency
Central tendency is a statistical measure that determines a single value
that accurately describes the centre of the distribution and represents
the entire distribution of scores.
The goal is to identify the single value that is the best representative
for the entire set of data.
The measures of central tendency are the mean, the median and the mode.
Median
• The median is the middle score for a data set that has been arranged
in order of magnitude.
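A minimal Python sketch of the median definition, using the seven student weights from the ungrouped-data example. The function name `median` is chosen for illustration, and the rule for an even number of observations (averaging the two middle values) is the usual convention rather than something stated on the slide.

```python
def median(values):
    """Middle score of the data after arranging it in order of magnitude."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values (assumed convention)

weights = [56, 74, 68, 90, 52, 48, 65]
print(median(weights))         # ordered: 48 52 56 65 68 74 90 -> 65
print(median(weights + [70]))  # ordered: 48 52 56 65 68 70 74 90 -> 66.5
```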
Quartiles
Quartiles are values that divide a data set into four parts containing an
approximately equal number of observations.
The total of 100% is split into four equal parts (four quarters) by the
three quartiles Q1, Q2 and Q3.
Interquartile Range = Q3 – Q1
Percentiles
Percentiles divide a set of data which are arranged in ascending order
into 100 equal parts.
A percentile is a measure used to indicate the value below which a given
percentage of observations in a group of observations fall.
For example, the 25th percentile is the value below which 25% of the
observations may be found.
Note:
25th percentile (P25) = First quartile (Q1)
50th percentile (P50) = Second quartile (Q2), which is also the median
75th percentile (P75) = Third quartile (Q3)
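The relationship between quartiles and percentiles can be checked with Python's standard `statistics.quantiles` function. This is a sketch only: textbooks use several slightly different rules for locating quartiles, and the function's default ("exclusive") method is just one of them, so hand calculations with another rule may differ for some data sets.

```python
import statistics

weights = [56, 74, 68, 90, 52, 48, 65]  # student weights from the ungrouped-data example

# n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(weights, n=4)
iqr = q3 - q1  # interquartile range = Q3 - Q1

# n=100 returns the 99 cut points P1 .. P99, so P25 is at index 24.
percentiles = statistics.quantiles(weights, n=100)
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

print(q1, q2, q3, iqr)  # 52.0 65.0 74.0 22.0
print(p25 == q1, p50 == q2, p75 == q3)
```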
Measures of Dispersion
Range, Variance, Standard Deviation
Measures of dispersion (or variation) describe how spread out a set of
data is, or the extent of the variability in individual items of the distribution.
Let us look at the following data sets to see how measures of central
tendency differ from measures of dispersion:
Most of the numbers in data set 1 are close to the mean value, while in
data set 2 the numbers are spread away from the mean. The difference in
the spread can be determined by a measure of dispersion.
However, the range is not a good measure of dispersion because it is
influenced by extreme values and its calculation does not cover all
observations.
Variance and standard deviation are the most useful and widely used
measures of dispersion. Although they are also influenced by extreme
values, their calculations cover all the observations.
Variance
Variance (σ² or s²) is the average of the squared differences from the
mean.
Standard Deviation
Standard deviation (σ or s) is a measure of the dispersion of observations
within a data set. It is simply the square root of the variance.
If the observations are all close to the mean, then the standard deviation
is close to zero.
If many observations are far from the mean, then the standard deviation
is far from zero.
If all the observations are equal, then the standard deviation is zero.
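The equation for the sample variance (s²) is given below:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
The sample standard deviation (s) is its square root:
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]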
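As a rough illustration, the sample variance and standard deviation (dividing by n − 1) can be computed step by step in Python. The function names are chosen for this sketch, and the data are the seven student weights from the ungrouped-data example.

```python
import math

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

weights = [56, 74, 68, 90, 52, 48, 65]
print(round(sample_variance(weights), 2), round(sample_std(weights), 2))

# If all the observations are equal, the standard deviation is zero.
print(sample_std([5, 5, 5, 5]))  # 0.0
```

Python's built-in `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and can serve as a cross-check.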
Stem-and-Leaf Diagram
[Figure: stem-and-leaf diagram with a Stem column and a Leaf column]
To construct a stem-and-leaf diagram:
1. Arrange the data in order of magnitude (ascending order).
2. Place the stems in order, vertically from smallest to largest.
3. Place the leaves in order, in each row from smallest to largest.
4. Create a key for the stem-and-leaf diagram so that people know how
to interpret the diagram.
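The four steps above can be sketched in Python as follows. This is a minimal version that assumes non-negative whole numbers, with the tens as the stem and the units as the leaf; the function name `stem_and_leaf` is hypothetical. Stems with no leaves are kept as empty rows so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf diagram (stem = tens, leaf = units)."""
    rows = defaultdict(list)
    for value in sorted(values):             # step 1: arrange the data in ascending order
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)              # step 3: leaves arrive already sorted
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # step 2: stems from smallest to largest
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

weights = [56, 74, 68, 90, 52, 48, 65]
print("\n".join(stem_and_leaf(weights)))
print("Key: 4 | 8 means 48")                 # step 4: a key for the reader
```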
Distribution of Data
A symmetric distribution is one in which the two sides would match
exactly if the figure were folded over its central point. A symmetric,
bell-shaped curve of this kind is called a normal distribution.
An example is shown below:
A distribution is said to be skewed to the right, or positively skewed,
when most of the data are concentrated on the left of the distribution.
The right tail clearly extends farther from the distribution's centre than
the left tail, as shown below:
[Figure: positively skewed distribution]
[Figure: negatively skewed distribution]
Interpreting Distribution of Data from Stem-and-Leaf Diagram
If the stem-and-leaf diagram is turned on its side, it will look like the
following:
The distribution shows that most data are clustered at the right. The left
tail extends farther from the data centre than the right tail. Therefore, the
distribution is skewed to the left or negatively skewed.
Box-and-Whisker Plot
A box-and-whisker plot (also called a box plot) displays the five-number
summary of a set of data.
In a box plot, we draw a box from the first quartile to the third quartile. A
vertical line goes through the box at the median.
[Figure: horizontal and vertical box-and-whisker plots on a 0 to 70 scale, each marking min, Q1, Q2, Q3 and max]
[Figure: box-and-whisker plot on a 10 to 100 scale marking min, Q1, Q2, Q3, max and the lower and upper inner fences]
The data lie within the lower and upper inner fences, so the data set has no outliers.
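The five-number summary and the outlier check can be sketched in Python. Note the assumptions: the slides do not state how the inner fences are computed, so this sketch uses the common convention Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from a hand calculation. The function names are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

def outliers(values, k=1.5):
    """Values outside the inner fences Q1 - k*IQR and Q3 + k*IQR (k = 1.5 is an assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

weights = [56, 74, 68, 90, 52, 48, 65]
print(five_number_summary(weights))  # (48, 52.0, 65.0, 74.0, 90)
print(outliers(weights))             # [] : all data lie within the fences
print(outliers(weights + [150]))     # [150] : beyond the upper inner fence
```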
The diagram below shows a positively skewed distribution (skewed to
the right). The left ‘whisker’ is shorter than the right ‘whisker’ and the
median (Q2) is nearer to Q1.
[Figure: box-and-whisker plot of a positively skewed distribution, marking min, Q1, Q2, Q3 and max]
Analysing Grouped Data
Measures of Location: Mean, Median, Mode, Quartile, Decile, Percentile
Measures of Dispersion: Range, Interquartile Range, Variance, Standard Deviation
Formula
\[ \text{Mean, } \bar{x} = \frac{\sum fx}{\sum f} \]
where x = data (the class midpoint for grouped data) and f = frequency
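\[ \text{Mode} = L + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) c \]
where L = lower boundary of the modal class
Δ1 = frequency of the modal class minus the frequency of the class before it
Δ2 = frequency of the modal class minus the frequency of the class after it
c = size of the modal class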
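\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F_L}{f_m} \right) c_m \]
where L_m, f_m and c_m are the lower boundary, frequency and size of the median class, and F_L is the cumulative frequency of the class before it
\[ \text{Quartile, } Q_k = L_k + \left( \frac{\frac{kn}{4} - F_L}{f_k} \right) c_k \]
\[ \text{Percentile, } P_k = L_k + \left( \frac{\frac{kn}{100} - F_L}{f_k} \right) c_k \]
\[ \text{Decile, } D_k = L_k + \left( \frac{\frac{kn}{10} - F_L}{f_k} \right) c_k \]
where k = 1, 2, 3, …
L_k = lower boundary of the class where Q_k, P_k, D_k lies
n = total number of observations
F_L = cumulative frequency of the class before the Q_k, P_k, D_k class
f_k = frequency of the class where Q_k, P_k, D_k lies
c_k = size of the class where Q_k, P_k, D_k lies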
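The quartile, decile and percentile formulas for grouped data share one pattern, which a short Python sketch can make concrete. The class boundaries and frequencies below are hypothetical (they are not from the slides), and `grouped_quantile` is a name chosen for illustration; `parts` is 4 for quartiles, 10 for deciles and 100 for percentiles.

```python
def grouped_quantile(lower_bounds, widths, freqs, k, parts):
    """k-th quantile of grouped data: L_k + ((k*n/parts - F_L) / f_k) * c_k."""
    n = sum(freqs)
    target = k * n / parts          # position k*n/4, k*n/10 or k*n/100
    cumulative = 0                  # F_L: cumulative frequency before the current class
    for lower, width, f in zip(lower_bounds, widths, freqs):
        if f > 0 and cumulative + f >= target:   # the class where the quantile lies
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("k must satisfy 0 < k <= parts")

# Hypothetical grouped heights (cm): classes 150-154, 155-159, ..., 170-174.
lower_bounds = [149.5, 154.5, 159.5, 164.5, 169.5]   # lower class boundaries
widths = [5, 5, 5, 5, 5]                             # class sizes
freqs = [4, 8, 12, 10, 6]                            # n = 40

q1 = grouped_quantile(lower_bounds, widths, freqs, 1, 4)       # 154.5 + (10 - 4)/8 * 5
median = grouped_quantile(lower_bounds, widths, freqs, 2, 4)   # 159.5 + (20 - 12)/12 * 5
d5 = grouped_quantile(lower_bounds, widths, freqs, 5, 10)      # same position as the median
p90 = grouped_quantile(lower_bounds, widths, freqs, 90, 100)   # 169.5 + (36 - 34)/6 * 5
print(q1, round(median, 2), round(d5, 2), round(p90, 2))
```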
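\[ \text{Variance, } s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{\sum f}}{\sum f - 1} \]
\[ \text{Standard Deviation, } s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{\sum f}}{\sum f - 1}} \]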
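A hedged Python sketch of the grouped mean, variance and standard deviation follows. It assumes that x is taken as the class midpoint and that the variance uses the Σf − 1 divisor, matching the ungrouped sample formula; the midpoints and frequencies are hypothetical and the function name is chosen for illustration.

```python
import math

def grouped_mean_variance(midpoints, freqs):
    """Mean and sample variance of grouped data from class midpoints x and frequencies f."""
    n = sum(freqs)                                            # sum of f
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))     # sum of f*x
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))  # sum of f*x^2
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return mean, variance

# Hypothetical grouped heights (cm): midpoints of classes 150-154, ..., 170-174.
midpoints = [152, 157, 162, 167, 172]
freqs = [4, 8, 12, 10, 6]

mean, variance = grouped_mean_variance(midpoints, freqs)
std_dev = math.sqrt(variance)
print(mean, round(variance, 2), round(std_dev, 2))  # mean is 6510 / 40 = 162.75
```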