Organize data and display data using tables and graphs
a) presentation of qualitative data
b) presentation of quantitative data
Definition:
Data recorded in the sequence in which they
are collected and before they are processed
or ranked are called raw data
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24
Qualitative Data
Data that cannot be measured but can be
classified into different categories
Example: gender, student status,
nationality, race
Quantitative Data
Data that can be measured numerically
Example: income, height, gross sales, prices of
homes, number of cars owned and number of
accidents
a) Organizing qualitative data
◦ (i) Frequency distributions
◦ (ii)Relative frequency and percentage
distributions
b) Graphing qualitative data
◦ (i) Bar graphs
◦ (ii)Pie charts
A frequency distribution for
qualitative data lists all categories
and the number of elements that
belong to each of the categories.
A sample of 20 employees from large
companies was selected and these employees
were asked how stressful their jobs were. The
responses are recorded below, where "very" represents very
stressful, "somewhat" means somewhat stressful
and "none" stands for not stressful at all.
somewhat none somewhat very very
none very somewhat somewhat very
somewhat somewhat very somewhat none
very none somewhat somewhat very
Stress on job    Tally        Frequency (f)
Very             |||| ||      7
Somewhat         |||| ||||    9
None             ||||         4
                              Sum = 20
$\text{Relative frequency of a category} = \dfrac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$

$\text{Percentage} = (\text{Relative frequency}) \times 100$
Stress on job Relative frequency Percentage (%)
Very 7/20 = 0.35 0.35(100) = 35
Somewhat 9/20 = 0.45 0.45(100) = 45
None 4/20 = 0.20 0.20(100) = 20
Sum = 1.00 Sum = 100%
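The frequency, relative frequency and percentage columns above can be reproduced with a short script. This is a minimal sketch: the variable names are illustrative, and the responses are copied from the stress example.

```python
from collections import Counter

# Responses of the 20 employees, copied from the stress example.
responses = """
somewhat none somewhat very very
none very somewhat somewhat very
somewhat somewhat very somewhat none
very none somewhat somewhat very
""".split()

freq = Counter(responses)              # frequency of each category
total = sum(freq.values())             # sum of all frequencies
rel_freq = {c: f / total for c, f in freq.items()}
percent = {c: 100 * r for c, r in rel_freq.items()}

for category in ("very", "somewhat", "none"):
    print(f"{category:<10}{freq[category]:>3}"
          f"{rel_freq[category]:>7.2f}{percent[category]:>6.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on the tally.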
A graph made of bars where the categories
are on the horizontal axis and the frequencies
(or relative frequencies) are on the vertical
axis.
[Figure: bar graph with the categories heart, cancer, stroke, CLRD and accident on the horizontal axis; vertical axis scaled from 0 to 60]
A circle divided into portions that represent the
relative frequencies or percentages of a
population or a sample belonging to different
categories is called a pie chart.
[Diagram: Numerical Data, organized under three headings: (1) Types of quantitative data, (2) Array, (3) Frequency Distributions]
Raw Data: Yards Produced by 30 Carpet Looms
16.2 15.4 16.0 16.6 15.9 15.8 16.0 16.8 16.9 16.8
15.7 16.4 15.2 15.8 15.9 16.1 15.6 15.9 15.6 16.0
16.4 15.8 15.7 16.2 15.6 15.9 16.3 16.3 16.0 16.3
(ungrouped data)
Chapter 3 18
Discrete data: take integer values such as 0, 1, 2
Example: number of children, number of cars
Continuous data: can take any value within an interval
Example: weight (such as 256.312 grams), length, time, area,
price
A frequency distribution for quantitative data
lists all the classes and the number of values
that belong to each class. Data presented in the
form of a frequency distribution are called
grouped data.
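Weekly Earnings (RM)    Num of Employees (f)
401 - 600               9
601 - 800               12
801 - 1000              39
1001 - 1200             15
1201 - 1400             9
1401 - 1600             6

In this table, weekly earnings is the variable, the first column lists the classes and the second column (the frequency column) lists the frequencies. The third class is 801 - 1000. For the sixth class, the lower limit is 1401 and the upper limit is 1600.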
$\text{Class boundary} = \dfrac{\text{upper limit of a class} + \text{lower limit of the next class}}{2}$
Ex: Upper boundary of first class
(600+601)/2 = 600.5
Lower boundary of second class
(601+600)/2 = 600.5
Class width = upper boundary - lower boundary
Example:
Width of first class
600.5 - 400.5 = 200
Width of second class
800.5 - 600.5 = 200
$\text{Class midpoint} = \dfrac{\text{lower limit} + \text{upper limit}}{2}$
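As an illustration of the three definitions (class boundary, class width, class midpoint), the sketch below applies them to the weekly-earnings classes 401 - 600 through 1401 - 1600. The helper names are hypothetical, and the treatment of the outermost boundaries (a gap of 1 beyond the first and last limits) is an assumption chosen to match the 400.5 used in the width example.

```python
# Class limits from the weekly-earnings table (RM).
limits = [(401, 600), (601, 800), (801, 1000),
          (1001, 1200), (1201, 1400), (1401, 1600)]

def lower_boundary(i):
    """Midway between the previous class's upper limit and this lower limit."""
    lower = limits[i][0]
    # Assumption: before the first class, pretend a class ended at lower - 1.
    prev_upper = limits[i - 1][1] if i > 0 else lower - 1
    return (prev_upper + lower) / 2

def upper_boundary(i):
    """Midway between this upper limit and the next class's lower limit."""
    upper = limits[i][1]
    # Assumption: after the last class, pretend a class started at upper + 1.
    next_lower = limits[i + 1][0] if i + 1 < len(limits) else upper + 1
    return (upper + next_lower) / 2

for i, (lo, hi) in enumerate(limits):
    lb, ub = lower_boundary(i), upper_boundary(i)
    width = ub - lb                 # class width = upper - lower boundary
    midpoint = (lo + hi) / 2        # class midpoint from the class limits
    print(f"{lo}-{hi}: boundaries {lb}-{ub}, width {width}, midpoint {midpoint}")
```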
Height (cm)    Number of Students
60 - 62        10
63 - 65        18
66 - 68        42
69 - 71        27
72 - 74        8
Total          105

Each entry in the first column is a class interval; the second column gives its frequency.
Weekly Earnings Num of
(RM) Employees (f )
400 - 600 9
600 - 800 12
800 - 1000 39
1000 - 1200 15
1200 - 1400 9
1400 - 1600 6
When constructing a frequency distribution
table, we need to make the following three
major decisions:
1. Number of Classes
2. Class Width
3. Lower Limit of the First Class / Starting Point
Number of Classes
$k = 1 + 3.3 \log_{10} n$, where $n$ is the number of observations
Class width
$i \ge \dfrac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes } (k)}$
Lower Limit of the First Class / Starting Point
◦ Any convenient number that is equal to or less than
the smallest value in the data set can be used as the
lower limit of the first class.
1. Determine the Class Interval Size (or
Class Width)
Example: Given the following data
100 74 84 95 95 110 99 87
100 108 85 103 99 83 91 91
84 110 113 105 100 98 100 108
100 98 100 107 79 86 123 107
87 105 88 85 99 101 93 99
Range, R = Largest value - Smallest value = 123 - 74 = 49
Number of Classes
k = 1 + 3.3 log n
= 1 + 3.3 log 40
= 6.3
≈ 6
Class Width
$i \ge \dfrac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes } (k)} = \dfrac{49}{6} \approx 8.2$, so round up to $i \ge 9$
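The two calculations (number of classes and minimum class width) can be checked numerically. A minimal sketch, using the 40 observations from the example and taking the logarithm to base 10, which is what gives 6.3 for n = 40:

```python
import math

# The 40 observations from the example.
data = [100, 74, 84, 95, 95, 110, 99, 87,
        100, 108, 85, 103, 99, 83, 91, 91,
        84, 110, 113, 105, 100, 98, 100, 108,
        100, 98, 100, 107, 79, 86, 123, 107,
        87, 105, 88, 85, 99, 101, 93, 99]

n = len(data)
data_range = max(data) - min(data)       # largest value - smallest value
k_exact = 1 + 3.3 * math.log10(n)        # k = 1 + 3.3 log n
k = round(k_exact)                       # number of classes
min_width = math.ceil(data_range / k)    # smallest whole-number class width

print(n, data_range, round(k_exact, 1), k, min_width)
```

Any convenient width at or above this minimum can then be chosen.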
Grouped Frequency Distribution
Columns: Class, Frequency, Cumulative Frequency, Relative Frequency, %
Classes (6 classes): 71 - 80, 81 - 90, 91 - 100, 101 - 110, 111 - 120, 121 - 130
Total frequency = 40
Class interval midpoint: (71 + 80)/2 = 75.5
For the class 91 - 100: Lower Limit = 91, Upper Limit = 100
Class width = 130.5 – 120.5
= 10
Class Boundary – Given by the midpoint of
the upper limit of one class and the lower limit
of the next class. Class boundaries are also called
real class limits.
• A histogram is a graph that can
be drawn for a frequency distribution, a
relative frequency distribution or a percentage
distribution.
• To draw a histogram, mark the classes on the horizontal axis
and the frequencies (or relative frequencies or percentages)
on the vertical axis.
• A histogram is called a frequency histogram, a
relative frequency histogram or a percentage
histogram depending on what the vertical axis shows.
Class          Frequency
15.2 - 15.5    2
15.5 - 15.8    5
15.8 - 16.1    11
16.1 - 16.4    6
16.4 - 16.7    3
16.7 - 17.0    3

[Figure: frequency histogram of the classes above; vertical axis: Frequency, from 0 to 12]
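The class frequencies behind this histogram can be recomputed from the 30 carpet-loom values. The sketch below assumes each class includes its lower endpoint and excludes its upper endpoint, and that the last class ends at 17.0; values are converted to whole tenths of a yard so that boundary values such as 15.8 are not misplaced by floating-point rounding.

```python
# Yards produced by the 30 carpet looms (raw data from the slides).
yards = [16.2, 15.4, 16.0, 16.6, 15.9, 15.8, 16.0, 16.8, 16.9, 16.8,
         15.7, 16.4, 15.2, 15.8, 15.9, 16.1, 15.6, 15.9, 15.6, 16.0,
         16.4, 15.8, 15.7, 16.2, 15.6, 15.9, 16.3, 16.3, 16.0, 16.3]

START, WIDTH, NUM_CLASSES = 152, 3, 6    # in tenths of a yard: 15.2, 0.3, 6

counts = [0] * NUM_CLASSES
for y in yards:
    tenths = round(y * 10)                   # e.g. 15.8 -> 158
    counts[(tenths - START) // WIDTH] += 1   # class index, lower end inclusive

for i, c in enumerate(counts):
    lo = (START + i * WIDTH) / 10
    hi = (START + (i + 1) * WIDTH) / 10
    print(f"{lo:.1f} - {hi:.1f}: {c}")
```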
• A graph formed by joining the midpoints of the
tops of successive bars in a histogram with
straight lines is called a polygon.
• A polygon has the class midpoints
on the horizontal axis and the frequencies,
relative frequencies or percentages on the
vertical axis.
[Figure: frequency polygon; horizontal axis: Production Level in Yards, marked from 15.0 to 17.1 in steps of 0.3; vertical axis: Frequency, from 0 to 12]
• An ogive is a curve drawn for the cumulative
frequency distribution by joining with straight lines
the dots marked above the upper boundaries of
classes at heights equal to the cumulative
frequencies of the respective classes.
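The points of an ogive are the pairs (upper boundary, cumulative frequency). As a rough sketch, the carpet-loom frequency distribution can be accumulated as follows; the last upper boundary is taken as 17.0, which is an assumption about the final class.

```python
from itertools import accumulate

# Upper boundaries and frequencies of the carpet-loom classes.
upper_boundaries = [15.5, 15.8, 16.1, 16.4, 16.7, 17.0]
frequencies = [2, 5, 11, 6, 3, 3]

cumulative = list(accumulate(frequencies))        # running totals
ogive_points = list(zip(upper_boundaries, cumulative))

for boundary, cum_freq in ogive_points:
    print(f"less than {boundary}: {cum_freq}")
```

Joining these points with straight lines gives the ogive; the last cumulative frequency equals the total number of observations.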
• Each value is divided into two portions (a
stem and a leaf). The leaves for each stem are
shown separately in a display.
• An advantage of a stem and leaf display is that we
do not lose information on individual
observations.
• It can be used only for quantitative data.
The following are scores of 30 college students
on a statistics test:
75 52 80 96 65 79 71 87
93 95 69 72 81 61 76 86
79 68 50 92 83 84 77 64
71 87 72 92 57 98
1. Split each score into two parts.
2. The first part contains the first digit, which is called the stem.
3. The second part contains the second digit, which is called
the leaf.
4. The stems are arranged in increasing order.

Stem | Leaves
5 | 2
6 |
7 | 5
8 |
9 |
The complete stem and leaf display for the scores is
shown below:
5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8
From the display, stem 7 has the highest
frequency, followed by stems 8, 9, 6 and 5.
The leaves for each stem are ranked in increasing
order as below:
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
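The construction of the display can be sketched in a few lines, assuming two-digit scores so that the tens digit is the stem and the units digit is the leaf. The variable names are illustrative.

```python
from collections import defaultdict

# Scores of the 30 college students on the statistics test.
scores = [75, 52, 80, 96, 65, 79, 71, 87,
          93, 95, 69, 72, 81, 61, 76, 86,
          79, 68, 50, 92, 83, 84, 77, 64,
          71, 87, 72, 92, 57, 98]

display = defaultdict(list)
for score in scores:
    stem, leaf = divmod(score, 10)   # tens digit is the stem, units the leaf
    display[stem].append(leaf)       # leaves kept in order of appearance

for stem in sorted(display):
    ranked = sorted(display[stem])   # leaves ranked in increasing order
    print(stem, "|", " ".join(str(leaf) for leaf in ranked))
```

Because every leaf is kept, the original scores can be read back from the display, which is the advantage noted earlier.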
Diastolic blood pressure on 120 people:
60 Type A people vs. 60 Type B people
Type A: Extremely hostile, competitive, impatient
53, 57, 58, 59, 59, 60, …

Type A                  Type B
5 | 37899               5 | 1299
6 | 00001111            6 | 0001122233333
6 | 2223333             6 | 4445555555777
6 | 444455555           6 | 888889
6 | 666777778888        7 | 0000111
7 | 0000111             7 | 222333466899
7 | 333444789           8 | 0000
8 | 011                 9 | 3
Type A display, with the modes marked:
5 | 37899
6 | 00001111
6 | 2223333
6 | 444455555
6 | 666777778888
7 | 0000111
7 | 333444789
8 | 011
• Distinguish among the measures of central
tendency, measures of dispersion and
measures of skewness.
• Calculate values for common measures of
location, including the arithmetic mean,
median and mode.
• Calculate values for common measures of
dispersion, including range, variance, standard
deviation and quartile deviation.
• Calculate values for measures of skewness.
Statistical Measures
• Measure of central tendency (measure of location):
to show where the centre of the data is.
• Measure of dispersion (measure of spread):
to show how spread out the data are around the centre.
• Measure of skewness (measure of asymmetry):
to show whether the frequency distribution is symmetrical
about the mean or skewed.
MEASURE OF CENTRAL TENDENCY
1. Mean (average, $\bar{x}$)
- Add all observations
- Divide this sum by the number of observations
a) Set of values
$\bar{x} = \dfrac{\sum x}{n}$
b) Simple frequency distribution
$\bar{x} = \dfrac{\sum fx}{\sum f}$
c) Grouped frequency
$\bar{x} = \dfrac{\sum fx}{\sum f}$ ($x$ = class midpoint)
◦ Advantages
it is widely understood.
the value of every item is included in the computation of
the mean.
it is well suited to further statistical analysis.
◦ Disadvantages
its value may not correspond to any actual value.
it might be distorted by extremely high or low values.
Example
a. The arithmetic mean (mean) of the numbers 8, 3, 5, 12
and 10 is
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{8 + 3 + 5 + 12 + 10}{5} = 7.6$
b. $\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{5(3) + 8(2) + 6(4) + 2(1)}{3 + 2 + 4 + 1} = 5.7$
c.
Class      f     x (midpoint)    fx
1 - 3      1     2               2
4 - 6      4     5               20
7 - 9      8     8               64
10 - 12    6     11              66
13 - 15    3     14              42
16 - 18    1     17              17
Total      23                    211
$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{211}{23} = 9.17$
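The three worked means can be verified with a small helper. This is a sketch only; `weighted_mean` is a hypothetical name for the $\sum fx / \sum f$ calculation, and the numbers are those of parts a, b and c.

```python
def weighted_mean(xs, fs):
    """Mean of values xs with frequencies fs: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

# a. Set of values
values = [8, 3, 5, 12, 10]
mean_a = sum(values) / len(values)

# b. Simple frequency distribution: values 5, 8, 6, 2 with frequencies 3, 2, 4, 1
mean_b = weighted_mean([5, 8, 6, 2], [3, 2, 4, 1])

# c. Grouped frequency distribution: x is the class midpoint
classes = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
freqs = [1, 4, 8, 6, 3, 1]
midpoints = [(lo + hi) / 2 for lo, hi in classes]
mean_c = weighted_mean(midpoints, freqs)

print(mean_a, mean_b, round(mean_c, 2))
```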
2. Median (middle value of a distribution or array)
- Arrange the observations in order of increasing size
- Find the number of observations and the middle observation
- Identify the median as this middle value
a) Set of data
b) Simple frequency distribution
Position of the median ($n$ = sample size):
$\dfrac{n+1}{2}$ if $n$ is odd; $\dfrac{n}{2}$ and $\dfrac{n}{2} + 1$ if $n$ is even
c) Grouped frequency
(i) Graphical method
(ii) Interpolation method
[Figure: median read from a graph; Median = 700]
$\text{Median} = L_m + \left( \dfrac{\frac{n}{2} - F_{m-1}}{f_m} \right) C_m$
Where:
$L_m$ = the lower boundary of the class containing the median.
$n$ = the total frequency.
$F_{m-1}$ = the cumulative frequency of the classes immediately
preceding the class containing the median.
$f_m$ = the frequency of the class containing the median.
$C_m$ = the width of the class containing the median.
Advantages
it is unaffected by extremely high or low values.
can be used when certain end values of a set or
distribution are difficult, expensive or impossible to
obtain, particularly appropriate to ‘life’ data.
can be used with non-numeric data if desired, providing
the measurements can be naturally ordered.
will often assume a value equal to one of the original data.
Disadvantages
it is difficult to handle theoretically in more advanced
statistical work, so its use is restricted to analysis at a
basic level.
it fails to reflect the full range of values.
Example
a. The times taken to inspect five units coming from a
production line
are recorded as 13, 14, 11, 17 and 11 minutes. What is
the median?
b. Find the median of the following frequency distribution
Class Frequency
118 - 126 3
127 - 135 5
136 - 144 9
145 - 153 12
154 - 162 5
163 - 171 4
172 - 180 2
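a. Arranged in increasing order: 11, 11, 13, 14, 17
position of the median $= \dfrac{n+1}{2} = \dfrac{5+1}{2} = 3$
median = 13
b.
Class        f     F
118 - 126    3     3
127 - 135    5     8
136 - 144    9     17
145 - 153    12    29    (median class)
154 - 162    5     34
163 - 171    4     38
172 - 180    2     40
$\dfrac{n}{2} = \dfrac{40}{2} = 20$, so the median class is 145 - 153.
$\text{median} = L_m + \left( \dfrac{\frac{n}{2} - F_{m-1}}{f_m} \right) C_m = 144.5 + \left( \dfrac{\frac{40}{2} - 17}{12} \right)(153.5 - 144.5) = 146.75 \approx 147$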
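The interpolation method can be written as a small function. This is a sketch under the assumption that all classes have the same width; `grouped_median` is a hypothetical name, and the lower boundaries are the class boundaries of the example (117.5, 126.5, and so on).

```python
import statistics

def grouped_median(lower_boundaries, width, freqs):
    """Median by interpolation: L_m + ((n/2 - F_{m-1}) / f_m) * C_m."""
    n = sum(freqs)
    cumulative = 0                       # F_{m-1}: total before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n / 2:      # first class whose F reaches n/2
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency distribution")

# a. Inspection times (minutes)
median_a = statistics.median([13, 14, 11, 17, 11])

# b. Classes 118 - 126, ..., 172 - 180 (class width 9)
lower_boundaries = [117.5, 126.5, 135.5, 144.5, 153.5, 162.5, 171.5]
freqs = [3, 5, 9, 12, 5, 4, 2]
median_b = grouped_median(lower_boundaries, 9, freqs)

print(median_a, median_b)
```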
Mode
[Figure: histogram of No. of cars (vertical axis, 0 to 16) against Mileage (km) (horizontal axis, classes 110 - 120 to 170 - 180); Mode = 146]
◦ Advantages
it is a more appropriate average to use in situations where it is useful
to know the most common value.
it is easy to understand, not difficult to calculate and can be used when
a distribution has open-ended classes.
it is not affected by extreme values.
◦ Disadvantages
it ignores dispersion around the modal value and it does not take all
the values into account.
it is unsuitable for further statistical analysis.
although it ignores extreme values, it is thought to be too much
affected by the most popular class when a distribution is
significantly skewed.
Example
a. Find the mode of the following frequency distribution
Class Frequency
1-3 1
4-6 4
7-9 8
10 - 12 6
13 - 15 3
16 - 18 1
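Class      Frequency
1 - 3      1
4 - 6      4
7 - 9      8    (mode class)
10 - 12    6
13 - 15    3
16 - 18    1
$\text{mode} = L + \left( \dfrac{D_1}{D_1 + D_2} \right) C = 6.5 + \left( \dfrac{8 - 4}{(8 - 4) + (8 - 6)} \right)(9.5 - 6.5) = 8.5$
Here $L$ is the lower boundary of the mode class, $C$ is its width, $D_1$ is the frequency of the mode class minus the frequency of the class before it, and $D_2$ is the frequency of the mode class minus the frequency of the class after it.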
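A sketch of the same calculation in code, assuming equal class widths and a single class with the highest frequency. `grouped_mode` is a hypothetical helper name; the frequency of a missing neighbouring class is taken as 0, which is an assumption not needed for this example.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode by interpolation: L + D1 / (D1 + D2) * C."""
    m = freqs.index(max(freqs))                        # index of the mode class
    before = freqs[m - 1] if m > 0 else 0              # assumption: 0 if no class before
    after = freqs[m + 1] if m + 1 < len(freqs) else 0  # assumption: 0 if no class after
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return lower_boundaries[m] + d1 / (d1 + d2) * width

# Classes 1 - 3, 4 - 6, ..., 16 - 18 (class width 3)
lower_boundaries = [0.5, 3.5, 6.5, 9.5, 12.5, 15.5]
freqs = [1, 4, 8, 6, 3, 1]
mode = grouped_mode(lower_boundaries, 3, freqs)

print(round(mode, 2))
```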
MEASURE OF DISPERSION
1. Range
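2. Standard deviation
- Calculate the mean value
- Find the deviation of each observation from this mean
- Square each deviation and add the squares
- Divide this sum by the number of observations
- Take the square root of the value obtained
a) Set of data
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$ or, equivalently, $s = \sqrt{\dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}}$
b) Simple frequency distribution
$s = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2}$
c) Grouped frequency
$s = \sqrt{\dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2}$
where $x$ = class mid-point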
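3. Variance
a) Set of data
$v = \dfrac{\sum (x - \bar{x})^2}{n}$ or, equivalently, $v = \dfrac{\sum x^2 - \frac{(\sum x)^2}{n}}{n}$
b) Simple frequency distribution
$s^2 = \dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2$
c) Grouped frequency
$s^2 = \dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2$
where $x$ = class midpoint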
Example
Find the variance and standard deviation of the following frequency distribution.
Class Frequency
0 - 4.9 3
5 - 9.9 5
10 - 14.9 7
15 - 19.9 6
20 - 24.9 2
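Class        f     x        x²          fx        fx²
0 - 4.9      3     2.45     6.0025      7.35      18.0075
5 - 9.9      5     7.45     55.5025     37.25     277.5125
10 - 14.9    7     12.45    155.0025    87.15     1085.0175
15 - 19.9    6     17.45    304.5025    104.70    1827.0150
20 - 24.9    2     22.45    504.0025    44.90     1008.0050
Total        23                         281.35    4215.5575
$s^2 = \dfrac{\sum fx^2}{\sum f} - \left( \dfrac{\sum fx}{\sum f} \right)^2 = \dfrac{4215.5575}{23} - \left( \dfrac{281.35}{23} \right)^2 = 183.2851 - 149.6367 = 33.6484 \approx 33.65$
$s = \sqrt{s^2} = \sqrt{33.65} \approx 5.8$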
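The column totals and the final results of this example can be checked with the sketch below. It uses the grouped formula $s^2 = \sum fx^2 / \sum f - (\sum fx / \sum f)^2$ with class midpoints computed from the class limits; the variable names are illustrative.

```python
import math

# Classes and frequencies from the example.
classes = [(0, 4.9), (5, 9.9), (10, 14.9), (15, 19.9), (20, 24.9)]
freqs = [3, 5, 7, 6, 2]

midpoints = [(lo + hi) / 2 for lo, hi in classes]         # x = class midpoint
sum_f = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

variance = sum_fx2 / sum_f - (sum_fx / sum_f) ** 2        # s^2
std_dev = math.sqrt(variance)                             # s

print(sum_f, round(sum_fx, 2), round(sum_fx2, 4))
print(round(variance, 2), round(std_dev, 1))
```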
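4. Chebyshev’s Theorem
- By using the mean and standard deviation, we can find the proportion
or percentage of the total observations that fall within a given interval
about the mean using Chebyshev’s theorem.
- For any number $k$ greater than 1, at least $\left( 1 - \dfrac{1}{k^2} \right)$ of the observations lie within $k$ standard deviations of the mean, that is, between $\mu - k\sigma$ and $\mu + k\sigma$.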
Example
The average systolic blood pressure for 4000 women who were
screened for high blood pressure was found to be 187 with a
standard deviation of 22. Using Chebyshev’s theorem, find at
least what percentage of women in this group have a systolic
blood pressure between 143 and 231.
Solution:
$\mu = 187$ and $\sigma = 22$
To find the percentage of blood pressure between 143 and 231:
$143 - 187 = -44 = -2\sigma$ and $231 - 187 = 44 = 2\sigma$, so $k = 2$
$1 - \dfrac{1}{k^2} = 1 - \dfrac{1}{2^2} = 0.75$
At least 75% of the women have systolic blood pressure
between 143 and 231.
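The bound used in this solution is 1 - 1/k², where k is the number of standard deviations between the mean and either end of the interval. A minimal sketch, with `chebyshev_lower_bound` as a hypothetical helper name:

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

mean, std_dev = 187, 22          # systolic blood pressure example
low, high = 143, 231

# The interval must be symmetric about the mean for the theorem to apply directly.
assert mean - low == high - mean
k = (high - mean) / std_dev      # number of standard deviations
proportion = chebyshev_lower_bound(k)

print(k, proportion)
```

Unlike the empirical rule, this bound holds for any distribution shape, which is why the answer is stated as "at least".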
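5. Empirical Rule
- The empirical rule applies only to a specific type of distribution called
a bell-shaped distribution, also known as a normal curve.
- About 68% of the observations lie within one standard deviation of the mean, about 95% within two standard deviations and about 99.7% within three standard deviations.
[Figure: bell-shaped curve with bands marked 68%, 95% and 99.7% at 1, 2 and 3 standard deviations on either side of the mean]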
Example 1
Solution:
$\bar{x} = 40$ and $s = 12$
To find the percentage of ages between 16 and 64:
$16 - 40 = -24 = -2s$ and $64 - 40 = 24 = 2s$, since $\dfrac{24}{12} = 2$
The interval from 16 to 64 is therefore $\bar{x} - 2s$ to $\bar{x} + 2s$, which by the empirical rule contains about 95% of the observations.
Example 2
Assume the incomes of all single-parent households last year
follow a bell-shaped distribution with mean RM23,500 and
standard deviation RM4,500. Determine the range of
incomes that contains
68% = (RM19,000, RM28,000)
95% = (RM14,500, RM32,500)
99.7% = (RM10,000, RM37,000)
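The three intervals are the mean plus or minus 1, 2 and 3 standard deviations. A short sketch that reproduces them, with `empirical_intervals` as a hypothetical helper name:

```python
def empirical_intervals(mean, std_dev):
    """Intervals covering about 68%, 95% and 99.7% of a bell-shaped distribution."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (mean - k * std_dev, mean + k * std_dev)
            for k, label in coverage.items()}

# Incomes example: mean RM23,500, standard deviation RM4,500.
intervals = empirical_intervals(23_500, 4_500)

for label, (low, high) in intervals.items():
    print(f"{label}: (RM{low:,}, RM{high:,})")
```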
6. Coefficient of variation
$CV = \dfrac{s}{\bar{x}} \times 100\%$
Comparing coefficients of variation:
the higher the coefficient
of variation, the more
dispersed are the data
[Figure: positions of Q1 and Q3 on the x axis, located from the cumulative frequencies n/4 and 3n/4]
$Q_1 = L_{Q_1} + \left( \dfrac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}} \right) C_{Q_1}$
Where:
$L_{Q_1}$ = the lower boundary of the class containing $Q_1$.

$Q_3 = L_{Q_3} + \left( \dfrac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}} \right) C_{Q_3}$
Where:
$L_{Q_3}$ = the lower boundary of the class containing $Q_3$.
Example
Class Frequency
0 - 9.9 5
10 - 19.9 19
20 - 29.9 38
30 - 39.9 43
40 - 49.9 34
50 - 59.9 17
60 - 69.9 4
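Class        f     F
0 - 9.9      5     5
10 - 19.9    19    24
20 - 29.9    38    62     (class containing $Q_1$)
30 - 39.9    43    105
40 - 49.9    34    139    (class containing $Q_3$)
50 - 59.9    17    156
60 - 69.9    4     160
Position of $Q_1$: $\dfrac{n}{4} = \dfrac{160}{4} = 40$; position of $Q_3$: $\dfrac{3n}{4} = \dfrac{3(160)}{4} = 120$
$Q_1 = L_{Q_1} + \left( \dfrac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}} \right) C_{Q_1} = 19.95 + \left( \dfrac{\frac{160}{4} - 24}{38} \right) 10 = 24.16$
$Q_3 = L_{Q_3} + \left( \dfrac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}} \right) C_{Q_3} = 39.95 + \left( \dfrac{\frac{3(160)}{4} - 105}{34} \right) 10 = 44.36$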
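Both quartiles follow the same interpolation pattern as the grouped median, with n/4 or 3n/4 in place of n/2. The sketch below assumes equal class widths; `grouped_quantile` is a hypothetical helper, and the first lower boundary (-0.05) is an assumption that does not affect either result.

```python
def grouped_quantile(lower_boundaries, width, freqs, p):
    """Interpolated quantile: L + ((p*n - F_before) / f) * C."""
    n = sum(freqs)
    target = p * n                   # n/4 for Q1, 3n/4 for Q3
    cumulative = 0                   # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("empty frequency distribution")

# Classes 0 - 9.9, 10 - 19.9, ..., 60 - 69.9 (class width 10)
lower_boundaries = [-0.05, 9.95, 19.95, 29.95, 39.95, 49.95, 59.95]
freqs = [5, 19, 38, 43, 34, 17, 4]

q1 = grouped_quantile(lower_boundaries, 10, freqs, 0.25)
q3 = grouped_quantile(lower_boundaries, 10, freqs, 0.75)

print(round(q1, 2), round(q3, 2))
```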
MEASURE OF SKEWNESS
[Figure: two skewed distribution curves with the positions of the mean, median and mode marked]
b.
1 | 1 1 2 2 3 3 4 5 6 7
2 | 3 4 4 5 6 6
3 | 1 1 2 2 2 3
4 | 0 0 1
Based on the stem-and-leaf plot above,
i) find the median,
ii) find the mode,
iii) find the mean and
iv) describe the shape of the distribution.
Answer:
i) 24 ii) 32 iii) 23.76 iv) Negatively skewed distribution
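The stated answers can be checked by expanding the display into raw values. This sketch reads the display as stems 1 to 4 (tens digits) with the leaves listed in the dictionary below, a reading that reproduces all three numerical answers.

```python
import statistics

# Stem-and-leaf display read as stem -> leaves (assumed reading of the plot).
display = {
    1: [1, 1, 2, 2, 3, 3, 4, 5, 6, 7],
    2: [3, 4, 4, 5, 6, 6],
    3: [1, 1, 2, 2, 2, 3],
    4: [0, 0, 1],
}

values = [10 * stem + leaf for stem, leaves in display.items() for leaf in leaves]

median = statistics.median(values)
mode = statistics.mode(values)
mean = statistics.mean(values)

print(len(values), median, mode, mean)
# mean < median < mode is the ordering associated with negative skew
print(mean < median < mode)
```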
c. Class Frequency
0 - 100 5
100 - 200 19
200 - 300 38
300 - 400 43
400 - 500 34
[Figure: three distribution curves labelled Curve A, Curve B and Curve C]
Curve A: Positively Skewed
Curve B: Negatively Skewed
BOX-AND-WHISKER PLOT
Minimum Q1 Q2 Q3 Maximum
Example
The following data are the incomes (in thousands of dollars) for
a sample of 12 households.
35 29 44 72 34 64 41 50 54
104 39 58
Construct a box-and-whisker plot for these data.
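Solution:
Step 1: Rank the data in increasing order and calculate the median, Q1, Q3 and IQR
29 34 35 39 41 44 50 54 58 64 72 104
$\text{median} = \dfrac{44 + 50}{2} = 47$
$Q_1 = \dfrac{35 + 39}{2} = 37$
$Q_3 = \dfrac{58 + 64}{2} = 61$
$IQR = Q_3 - Q_1 = 61 - 37 = 24$
Step 2: Determine the lower and upper inner fences
$1.5 \times IQR = 1.5 \times 24 = 36$
Lower inner fence $= Q_1 - 36 = 37 - 36 = 1$
Upper inner fence $= Q_3 + 36 = 61 + 36 = 97$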
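[Figure: box drawn from the first quartile to the third quartile with the median marked; axis from 25 to 105]
The lines joining the box to the smallest and largest values within the two inner fences are called whiskers.
Step 5:
[Figure: completed box-and-whisker plot showing the first quartile, median, third quartile, the smallest and largest values within the two inner fences, and an outlier marked *; axis from 25 to 105]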
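The quantities needed for the plot can be computed as follows. This is a sketch of the method used in the solution, where Q1 and Q3 are the medians of the lower and upper halves of the ranked data; other quartile conventions exist and can give slightly different values.

```python
def median_of(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

# Incomes (in thousands of dollars) for the 12 households.
incomes = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]

data = sorted(incomes)
half = len(data) // 2
q2 = median_of(data)                 # median
q1 = median_of(data[:half])          # median of the lower half
q3 = median_of(data[-half:])         # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)   # ends of the whiskers

print(q1, q2, q3, iqr, lower_fence, upper_fence)
print(whisker_low, whisker_high, outliers)
```

The whiskers stop at the smallest and largest values inside the fences, and any value beyond a fence is drawn separately as an outlier.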
[Figure: three box-and-whisker plots with different spacings of Q1, Q2 and Q3]