Chapter 1 Descriptive Data

Chapter 3 1
◦ Methods for organising, displaying and describing data using tables, graphs and summary measures
 Raw data is made more manageable
 Raw data is presented in a logical form
 Patterns can be seen from organised data
 Frequency tables
 Graphical techniques
 Measures of Central Tendency
 Measures of Spread (variability)
• Organize and display data using tables and graphs
  a) presentation of qualitative data
  b) presentation of quantitative data
• Describe the characteristics of a data set using statistical measures
  a) measures of central tendency
  b) measures of dispersion
  c) measures of skewness
  d) box-and-whisker plot
  e) population vs sample
Definition:
Data recorded in the sequence in which they are collected, before they are processed or ranked, are called raw data.
Example: Ages of 50 students
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24
• Qualitative data
  Data that cannot be measured numerically but can be classified into different categories
  Example: gender, status of a student, nationality, race
• Quantitative data
  Data that can be measured numerically
  Example: income, height, gross sales, prices of homes, number of cars owned and number of accidents
• a) Organizing qualitative data
  ◦ (i) Frequency distributions
  ◦ (ii) Relative frequency and percentage distributions
• b) Graphing qualitative data
  ◦ (i) Bar graphs
  ◦ (ii) Pie charts
A frequency distribution for
qualitative data lists all categories
and the number of elements that
belong to each of the categories.
A sample of 20 employees from large companies was selected, and these employees were asked how stressful their jobs were. In the responses below, "very" means very stressful, "somewhat" means somewhat stressful and "none" means not stressful at all.
somewhat none somewhat very very
none very somewhat somewhat very
somewhat somewhat very somewhat none
very none somewhat somewhat very
Frequency Distribution of Stress on Job
Stress on job   Tally        Frequency (f)
Very            |||| ||      7
Somewhat        |||| ||||    9
None            ||||         4
                             Sum = 20
$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$
$$\text{Percentage} = (\text{Relative frequency}) \times 100$$
Relative frequency and percentage distributions of stress on job
Stress on job   Relative frequency   Percentage (%)
Very            7/20 = 0.35          0.35(100) = 35
Somewhat        9/20 = 0.45          0.45(100) = 45
None            4/20 = 0.20          0.20(100) = 20
                Sum = 1.00           Sum = 100%
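The two tables above can be reproduced with a few lines of Python. This is only a sketch: the function name `frequency_table` is made up for illustration, and the responses are typed in from the stress-on-job example.

```python
from collections import Counter

# Responses of the 20 employees in the stress-on-job example.
responses = """
somewhat none somewhat very very
none very somewhat somewhat very
somewhat somewhat very somewhat none
very none somewhat somewhat very
""".split()


def frequency_table(values):
    """Map each category to (frequency, relative frequency, percentage)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (f, f / total, 100 * f / total) for cat, f in counts.items()}


table = frequency_table(responses)
for cat in ("very", "somewhat", "none"):
    f, rel, pct = table[cat]
    print(f"{cat:<10}{f:>3}{rel:>7.2f}{pct:>6.0f}%")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any hand-built table.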
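A bar graph is a graph made of bars, where the categories are on the horizontal axis and the frequencies (or relative frequencies) are on the vertical axis.
[Figure: bar graph with the categories heart, cancer, stroke, CLRD and accident on the horizontal axis and a vertical scale from 0 to 60]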
A circle divided into portions that represent the
relative frequencies or percentages of a
population or a sample belonging to different
categories is called a pie chart.
[Figure: pie chart with the categories heart, cancer, stroke, CLRD and accident]
Numerical Data
1. Array
2. Types of quantitative data
3. Frequency distributions
   a. Histogram
   b. Polygon
   c. Ogive
   d. Stem & Leaf
1. Array: organizes data to focus on major features
   i. Ascending order
      Example: 1, 2, 3, 4, 5, …
   ii. Descending order
      Example: 10, 9, 8, 7, 6, …
   iii. Range (difference between the largest and smallest values)
      Example: the largest height is 74 inches and the smallest height is 60 inches, so the range is 74 - 60 = 14 inches
• Quickly notice the lowest and highest values in the data
• Easily divide the data into sections
• Easily see values that occur frequently
• Observe variability in the data
Raw Data: Yards Produced by 30 Carpet Looms
16.2 15.4 16.0 16.6 15.9 15.8 16.0 16.8 16.9 16.8
15.7 16.4 15.2 15.8 15.9 16.1 15.6 15.9 15.6 16.0
16.4 15.8 15.7 16.2 15.6 15.9 16.3 16.3 16.0 16.3
(ungrouped data)
Data Array: Daily Production in Yards of 30 Carpet Looms
15.2 15.7 15.9 16.0 16.2 16.4
15.4 15.7 15.9 16.0 16.3 16.6
15.6 15.8 15.9 16.0 16.3 16.8
15.6 15.8 15.9 16.1 16.3 16.8
15.6 15.8 16.0 16.2 16.4 16.9
• Discrete data: integer values 0, 1, 2, …
  Example: number of children, number of cars
• Continuous data: can take any value in an interval
  Example: weight, length, time, area, price; for instance, a weight of 256.312 grams
A frequency distribution for quantitative data
lists all the classes and the number of values
that belong to each class. Data presented in the
form of a frequency distribution are called
grouped data
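Weekly Earnings (RM)   Number of Employees (f)
401 - 600              9
601 - 800              12
801 - 1000             39
1001 - 1200            15
1201 - 1400            9
1401 - 1600            6
Weekly earnings is the variable. The first column lists the classes (801 - 1000 is the third class) and the second column, the frequency column, lists the frequencies. For the sixth class, the lower limit is 1401 and the upper limit is 1600.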
• Class boundary
$$\text{Class boundary} = \frac{\text{upper limit of one class} + \text{lower limit of the next class}}{2}$$
Example: Upper boundary of the first class: (600 + 601)/2 = 600.5
Lower boundary of the second class: (601 + 600)/2 = 600.5
The upper boundary of one class equals the lower boundary of the next class.
• Class width = upper boundary - lower boundary
Example:
Width of the first class: 600.5 - 400.5 = 200
Width of the second class: 800.5 - 600.5 = 200
• Class midpoint
$$\text{Class midpoint} = \frac{\text{lower limit} + \text{upper limit}}{2}$$
Example: Midpoint of the first class: (401 + 600)/2 = 500.5
Example: Midpoint of the second class: (601 + 800)/2 = 700.5
Height (cm)   Number of Students
60 - 62       10
63 - 65       18
66 - 68       42
69 - 71       27
72 - 74       8
Total         105
The first column holds the class intervals and the second column holds the frequencies.
i. First class limits: lower class limit = 60, upper class limit = 62
ii. First class boundaries: lower boundary = 59.5, upper boundary = 62.5
iii. Class width: c = 62.5 - 59.5 = 3
iv. First class midpoint = (60 + 62)/2 = 61
v. Class frequency = number of students in the class
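As a rough check of the boundary, width and midpoint rules, the sketch below recomputes them for the first height class. The helper `class_summary` is hypothetical, and it assumes every pair of neighbouring classes is separated by the same gap (1 cm here).

```python
def class_summary(limits, index):
    """Boundaries, width and midpoint of one class.

    limits: list of (lower_limit, upper_limit) pairs.
    Assumes the same gap between all consecutive classes.
    """
    lower, upper = limits[index]
    gap = limits[1][0] - limits[0][1]      # e.g. 63 - 62 = 1
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "midpoint": (lower + upper) / 2,
    }


heights = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
first = class_summary(heights, 0)
print(first)
```

The same function applied to the weekly earnings classes (401 - 600, 601 - 800, …) gives the boundaries 400.5 and 600.5 for the first class.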
Weekly Earnings (RM)   Number of Employees (f)
400 - 600              9
600 - 800              12
800 - 1000             39
1000 - 1200            15
1200 - 1400            9
1400 - 1600            6
Here neighbouring classes share their end points, so class limit = class boundary.
Raw Data:
15.2 15.2 15.3 15.3 15.3 15.3 15.3 15.4 15.4 15.4
15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.5 15.5
15.5 15.5 15.5 15.5 15.6 15.6 15.6 15.7 15.7 15.7

Frequency Distribution
Class   Tallies          Frequency
15.2    //               2
15.3    ////             5
15.4    //// //// /      11
15.5    //// /           6
15.6    ///              3
15.7    ///              3

Relative Frequency Distribution
Class   Frequency (1)   Relative Freq. (1) ÷ 30   Cumulative Relative Frequency
15.2    2               0.07                      0.07
15.3    5               0.16                      0.23
15.4    11              0.37                      0.60
15.5    6               0.20                      0.80
15.6    3               0.10                      0.90
15.7    3               0.10                      1.00
Total   30              1.00
When constructing a frequency distribution table, we need to make the following three major decisions:
• Number of classes
• Class width
• Lower limit of the first class (starting point)
• Number of classes (n is the number of observations and log is the base-10 logarithm)
$$k = 1 + 3.3 \log n$$
• Class width
$$i \ge \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes } (k)}$$
• Lower limit of the first class (starting point)
  ◦ Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.
1. Determine the class interval size (class width)
Example: Given the following data
100 74 84 95 95 110 99 87
100 108 85 103 99 83 91 91
84 110 113 105 100 98 100 108
100 98 100 107 79 86 123 107
87 105 88 85 99 101 93 99
Range: R = 123 - 74 = 49
• Number of classes
$$k = 1 + 3.3 \log n = 1 + 3.3 \log 40 = 6.3 \approx 6$$
• Class width
$$i \ge \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes } (k)} = \frac{49}{6} \approx 8.17$$
Rounding up to a whole number gives i ≥ 9.
Grouped Frequency Distribution (6 classes)
Class       Frequency (1)   Relative Frequency (1) ÷ 40   Cumulative Frequency %
71 - 80
81 - 90
91 - 100
101 - 110
111 - 120
121 - 130
Class interval midpoint of the first class: (71 + 80)/2 = 75.5
For the class 91 - 100, the lower limit is 91 and the upper limit is 100.
Class width = 130.5 - 120.5 = 10
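The number of classes and the minimum class width for this example can be recomputed as a sanity check. The sketch below assumes the base-10 logarithm in k = 1 + 3.3 log n; the function names are illustrative only.

```python
import math

# The 40 observations from the example.
data = [
    100, 74, 84, 95, 95, 110, 99, 87,
    100, 108, 85, 103, 99, 83, 91, 91,
    84, 110, 113, 105, 100, 98, 100, 108,
    100, 98, 100, 107, 79, 86, 123, 107,
    87, 105, 88, 85, 99, 101, 93, 99,
]


def number_of_classes(n):
    """k = 1 + 3.3 log10(n), rounded to the nearest whole number."""
    return round(1 + 3.3 * math.log10(n))


def minimum_class_width(values, k):
    """(largest value - smallest value) / number of classes."""
    return (max(values) - min(values)) / k


k = number_of_classes(len(data))
width = minimum_class_width(data, k)
print(k, round(width, 2), math.ceil(width))
```

Any width at or above this minimum is acceptable; the table above uses the convenient width of 10.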
• Class boundary: the midpoint of the upper limit of one class and the lower limit of the next class. Class boundaries are also called real class limits.
• A histogram is a graph that can be drawn for a frequency distribution, a relative frequency distribution or a percentage distribution.
• To draw a histogram, mark the classes on the horizontal axis and the frequencies (or relative frequencies or percentages) on the vertical axis.
• A histogram is called a frequency histogram, a relative frequency histogram or a percentage histogram depending on the vertical axis.
Class          Frequency
15.2 - 15.5    2
15.5 - 15.8    5
15.8 - 16.1    11
16.1 - 16.4    6
16.4 - 16.7    3
16.7 - 17.0    3
[Figure: frequency histogram of the classes above; horizontal axis: class limits from 15.2 to 17.0, vertical axis: Frequency from 0 to 12]
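The class frequencies behind this histogram can be recounted from the carpet loom data. In the sketch below, the rule that a value equal to a shared class limit (such as 15.8) goes into the higher class is an assumption, chosen because it reproduces the frequencies shown; the helper `class_counts` is illustrative.

```python
# Yards produced by the 30 carpet looms (raw data from the earlier slide).
loom_yards = [
    16.2, 15.4, 16.0, 16.6, 15.9, 15.8, 16.0, 16.8, 16.9, 16.8,
    15.7, 16.4, 15.2, 15.8, 15.9, 16.1, 15.6, 15.9, 15.6, 16.0,
    16.4, 15.8, 15.7, 16.2, 15.6, 15.9, 16.3, 16.3, 16.0, 16.3,
]


def class_counts(values, start_tenths, width_tenths, k):
    """Count values in k classes [lower, upper).

    Works in whole tenths of a yard so that values such as 15.8
    are compared exactly instead of as rounded floats.
    """
    counts = [0] * k
    for v in values:
        tenths = round(v * 10)
        counts[(tenths - start_tenths) // width_tenths] += 1
    return counts


# Six classes of width 0.3 starting at 15.2.
counts = class_counts(loom_yards, start_tenths=152, width_tenths=3, k=6)
print(counts)
```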
• A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.
• A polygon has the class midpoints on the horizontal axis and the frequencies, relative frequencies or percentages on the vertical axis.
• A polygon is called a frequency polygon, a relative frequency polygon or a percentage polygon depending on the vertical axis.
[Figure: frequency polygon; horizontal axis: Production Level in Yards (15.0 to 17.1), vertical axis: Frequency (0 to 12)]
• An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of the classes at heights equal to the cumulative frequencies of the respective classes.
• In a stem-and-leaf display, each value is divided into two portions (a stem and a leaf). The leaves for each stem are shown separately in a display.
• An advantage of a stem-and-leaf display is that we do not lose information on individual observations.
• It can be used only for quantitative data.
The following are scores of 30 college students
on a statistics test:
75 52 80 96 65 79 71 87
93 95 69 72 81 61 76 86
79 68 50 92 83 84 77 64
71 87 72 92 57 98
Construct a stem and leaf display.
1. Split each score into two parts.
2. The first part contains the first digit, which is called the stem.
3. The second part contains the second digit, which is called the leaf.
4. Arrange the stems in increasing order.
Display after placing the first two scores, 75 and 52:
Stem | Leaves
5 | 2
6 |
7 | 5
8 |
9 |
The complete stem-and-leaf display for the scores is shown below:
5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8
From the display, stem 7 has the highest frequency, followed by stems 8, 9, 6 and 5.
The leaves for each stem are ranked in increasing order as below:
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
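The ranked display can be generated mechanically. The following sketch assumes two-digit scores, so the stem is the tens digit and the leaf is the units digit; `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict

# Scores of the 30 college students on the statistics test.
scores = [
    75, 52, 80, 96, 65, 79, 71, 87,
    93, 95, 69, 72, 81, 61, 76, 86,
    79, 68, 50, 92, 83, 84, 77, 64,
    71, 87, 72, 92, 57, 98,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in increasing order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)


display = stem_and_leaf(scores)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```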
• Diastolic blood pressure of 120 people
• 60 Type A people vs. 60 Type B people
• Type A: extremely hostile, competitive, impatient
  53, 57, 58, 59, 59, 60, …
• Type B: laid-back people
  51, 52, 59, 59, 60, …
Type A                  Type B
5 | 37899               5 | 1299
6 | 00001111            6 | 0001122233333
6 | 2223333             6 | 4445555555777
6 | 444455555           6 | 888889
6 | 666777778888        7 | 0000111
7 | 0000111             7 | 222333466899
7 | 333444789           8 | 0000
8 | 011                 9 | 3
5 37899
6 00001111
6 2223333
6 444455555
6 666777778888
Modes
7 0000111
7 333444789
8 011
• Distinguish among the measures of central tendency, measures of dispersion and measures of skewness.
• Calculate values for common measures of location, including the arithmetic mean, median and mode.
• Calculate values for common measures of dispersion, including range, variance, standard deviation and quartile deviation.
• Calculate values for measures of skewness.
Statistical Measures
• Measure of central tendency (measure of location): shows where the centre of the data lies
• Measure of dispersion (measure of spread): shows how spread out the data are around the centre
• Measure of skewness (measure of asymmetry): shows whether the frequency distribution is symmetrical about the mean or skewed
MEASURE OF CENTRAL TENDENCY
1. Mean (average, $\bar{x}$)
- Add all observations
- Divide this sum by the number of observations
a) Set of values
$$\bar{x} = \frac{\sum x}{n}$$
b) Simple frequency distribution
$$\bar{x} = \frac{\sum fx}{\sum f}$$
c) Grouped frequency distribution
$$\bar{x} = \frac{\sum fx}{\sum f} \qquad (x = \text{class midpoint})$$
 it might be distorted by extremely high or low values.
55
◦ Advantages
 it is widely understood
 the value of every item is included in the computation of
the mean.
 it is well suited to further statistical analysis.
◦ Disadvantages
 its value may not correspond to any actual value.
 it might be affected by extremely high or low values.
56
Example
a. The arithmetic mean of the numbers 8, 3, 5, 12 and 10 is ...
b. If 5, 8, 6 and 2 occur with frequencies 3, 2, 4 and 1, the mean is ...
c. Find the mean of the following frequency distribution.
Class      Frequency
1 - 3      1
4 - 6      4
7 - 9      8
10 - 12    6
13 - 15    3
16 - 18    1
a. $$\bar{x} = \frac{\sum x}{n} = \frac{8 + 3 + 5 + 12 + 10}{5} = 7.6$$
b. $$\bar{x} = \frac{\sum fx}{\sum f} = \frac{5(3) + 8(2) + 6(4) + 2(1)}{3 + 2 + 4 + 1} = 5.7$$
c.
Class      f     x (midpoint)   fx
1 - 3      1     2              2
4 - 6      4     5              20
7 - 9      8     8              64
10 - 12    6     11             66
13 - 15    3     14             42
16 - 18    1     17             17
           Σf = 23              Σfx = 211
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{211}{23} = 9.17$$
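The three mean calculations can be verified numerically. This is a minimal sketch; `mean` and `weighted_mean` are illustrative helper names, and for the grouped case each class is represented by its midpoint, as in the formula for grouped data.

```python
def mean(values):
    """Arithmetic mean of a set of values: sum(x) / n."""
    return sum(values) / len(values)


def weighted_mean(xs, fs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


# a. Set of values
mean_a = mean([8, 3, 5, 12, 10])

# b. Simple frequency distribution
mean_b = weighted_mean([5, 8, 6, 2], [3, 2, 4, 1])

# c. Grouped frequency distribution, using class midpoints
classes = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
freqs = [1, 4, 8, 6, 3, 1]
midpoints = [(lo + hi) / 2 for lo, hi in classes]
mean_c = weighted_mean(midpoints, freqs)

print(mean_a, mean_b, round(mean_c, 2))
```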
2. Median (middle value of a distribution or array)
- Arrange the observations in order of increasing size
- Find the number of observations and the middle observation
- Identify the median as this middle value
a) Set of data
b) Simple frequency distribution
Position of the median (n = sample size):
$$\frac{n+1}{2} \text{ if } n \text{ is odd}; \qquad \frac{n}{2} \text{ and } \frac{n}{2} + 1 \text{ if } n \text{ is even}$$
c) Grouped frequency distribution
(i) Graphical method
(ii) Interpolation method
(i) Graphical Method
Median = 700
(ii) Interpolation Method
$$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F_{m-1}}{f_m}\right) \times C_m$$
Where:
$L_m$ = the lower boundary of the class containing the median.
$n$ = the total frequency.
$F_{m-1}$ = the cumulative frequency of the classes immediately preceding the class containing the median.
$f_m$ = the frequency of the class containing the median.
$C_m$ = the width of the class in which the median lies.
 Advantages
 it is unaffected by extremely high or low values.
 can be used when certain end values of a set or
distribution are difficult, expensive or impossible to
obtain, particularly appropriate to ‘life’ data.
 can be used with non-numeric data if desired, providing
the measurements can be naturally ordered.
 will often assume a value equal to one of the original data.
 Disadvantages
 it is difficult to handle theoretically in more advanced
statistical work, so its use is restricted to analysis at a
basic level.
 it fails to reflect the full range of values.
63
Example
a. The times taken to inspect five units coming from a
production line
are recorded as 13, 14, 11, 17 and 11 minutes. What is
the median?
b. Find the median of the following frequency distribution
Class Frequency
118 - 126 3
127 - 135 5
136 - 144 9
145 - 153 12
154 - 162 5
163 - 171 4
172 - 180 2
64
a. n 1 5 1
11, 11, 13, 14, 17 median   3
2 2
median  13
b. Class f F
118 - 126 3 3
127 - 135 5 8 n 
 Fm1
2 
136 - 144 9 17 median  Lm    Cm
145 - 153 12 29  fm 
154 - 162 5 34  
163 - 171 4 38  40 
 17
172 - 180 2 40  
=144.5+  2  (153.5  144.5)
 12 
n 40  
median class    20
2 2  147
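The interpolation method can be written as a short function. The sketch below is one possible implementation, not a standard library routine: `grouped_median` is a made-up name, and it assumes that all neighbouring classes are separated by the same gap so that boundaries can be derived from the limits.

```python
def grouped_median(classes, freqs):
    """Median of grouped data by interpolation.

    classes: list of (lower_limit, upper_limit); freqs: class frequencies.
    Implements L_m + ((n/2 - F_{m-1}) / f_m) * C_m.
    """
    gap = classes[1][0] - classes[0][1]   # distance between neighbouring limits
    n = sum(freqs)
    cumulative = 0                        # F_{m-1} once the median class is reached
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= n / 2:
            lower_boundary = lower - gap / 2
            width = (upper + gap / 2) - lower_boundary
            return lower_boundary + (n / 2 - cumulative) / f * width
        cumulative += f


classes = [(118, 126), (127, 135), (136, 144), (145, 153),
           (154, 162), (163, 171), (172, 180)]
freqs = [3, 5, 9, 12, 5, 4, 2]
median = grouped_median(classes, freqs)
print(median)
```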
65
3. Mode (value which occurs most often)
- Draw a frequency table for the data
- Identify the mode as the most frequent value
a) Set of data
Mode = value that appears most frequently
[Figure: histogram of No. of cars (0 to 16) against Mileage (km), with classes 110 - 120, 120 - 130, 130 - 140, 140 - 150, 150 - 160, 160 - 170 and 170 - 180; the mode read from the graph is 146]

$$\text{Mode} = L + \left(\frac{D_1}{D_1 + D_2}\right) \times C$$
Where:
$L$ = the lower class boundary of the class containing the mode.
$C$ = the class width of the class containing the mode.
$D_1$ = the difference between the largest frequency and the frequency immediately preceding it ($f_0 - f_{-}$).
$D_2$ = the difference between the largest frequency and the frequency immediately following it ($f_0 - f_{+}$).
Mode
• The mode of a set of data is the value which occurs most often or, equivalently, has the largest frequency.
◦ Advantages
  • It is a more appropriate average to use in situations where it is useful to know the most common value.
  • It is easy to understand, not difficult to calculate and can be used when a distribution has open-ended classes.
  • It is not affected by extreme values.
◦ Disadvantages
  • It ignores dispersion around the modal value and does not take all the values into account.
  • It is unsuitable for further statistical analysis.
  • Although it ignores extreme values, it is thought to be too much affected by the most popular class when a distribution is significantly skewed.
Example
a. Find the mode of the following frequency distribution
Class Frequency
1-3 1
4-6 4
7-9 8
10 - 12 6
13 - 15 3
16 - 18 1
71
Class      Frequency
1 - 3      1
4 - 6      4
7 - 9      8    (modal class)
10 - 12    6
13 - 15    3
16 - 18    1
$$\text{mode} = L + \left(\frac{D_1}{D_1 + D_2}\right) \times C = 6.5 + \left(\frac{8 - 4}{(8 - 4) + (8 - 6)}\right)(9.5 - 6.5) = 8.5$$
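The grouped mode formula can be checked with a small function. In this sketch, `grouped_mode` is a hypothetical name, classes are assumed to be equally spaced, and a missing neighbour of the modal class (when it is the first or last class) is treated as frequency 0, which is an assumption the slides do not cover.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: L + D1 / (D1 + D2) * C."""
    gap = classes[1][0] - classes[0][1]
    i = max(range(len(freqs)), key=lambda j: freqs[j])  # index of the modal class
    f0 = freqs[i]
    f_before = freqs[i - 1] if i > 0 else 0              # assumed 0 at the edges
    f_after = freqs[i + 1] if i < len(freqs) - 1 else 0
    lower, upper = classes[i]
    lower_boundary = lower - gap / 2
    width = upper - lower + gap
    d1 = f0 - f_before
    d2 = f0 - f_after
    return lower_boundary + d1 / (d1 + d2) * width


classes = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
freqs = [1, 4, 8, 6, 3, 1]
mode = grouped_mode(classes, freqs)
print(mode)
```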
MEASURE OF DISPERSION
1. Range
maximum value – minimum value
73
2. Standard deviation
- Calculate the mean value
- Find the deviation of each observation from this mean
- Square these deviations
- Add the squares
- Divide this sum by the number of observations
- Take the square root of the value obtained
a) Set of data
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$$
b) Simple frequency distribution
$$s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
c) Grouped frequency distribution
$$s = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
where x = class midpoint
Comparing standard deviation
75
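3. Variance
variance = (standard deviation)$^2$
a) Set of data
$$v = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$
b) Simple frequency distribution
$$s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$
c) Grouped frequency distribution
$$s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$
where x = class midpoint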
Example
a. Find the variance and standard deviation of the following data:
Class Frequency
0 - 4.9 3
5 - 9.9 5
10 - 14.9 7
15 - 19.9 6
20 - 24.9 2
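This example can also be worked numerically. The sketch below uses the grouped formula with class midpoints taken as (lower limit + upper limit)/2 and divides by the total frequency, not by n - 1, to match the formulas on the slides; `grouped_variance` is an illustrative name.

```python
import math


def grouped_variance(midpoints, freqs):
    """Variance of grouped data: sum(f x^2)/sum(f) - (sum(f x)/sum(f))^2."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    return sum_fx2 / n - (sum_fx / n) ** 2


classes = [(0, 4.9), (5, 9.9), (10, 14.9), (15, 19.9), (20, 24.9)]
freqs = [3, 5, 7, 6, 2]
midpoints = [(lo + hi) / 2 for lo, hi in classes]   # 2.45, 7.45, ...

variance = grouped_variance(midpoints, freqs)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 1))
```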
Class        f     x        x²          fx        fx²
0 - 4.9      3     2.45     6.0025      7.35      18.0075
5 - 9.9      5     7.45     55.5025     37.25     277.5125
10 - 14.9    7     12.45    155.0025    87.15     1085.0175
15 - 19.9    6     17.45    304.5025    104.7     1827.015
20 - 24.9    2     22.45    504.0025    44.9      1008.005
             Σf = 23                    Σfx = 281.35   Σfx² = 4215.5575
$$s^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2 = \frac{4215.5575}{23} - \left(\frac{281.35}{23}\right)^2 = 183.2851 - 149.6367 = 33.6484 \approx 33.65$$
$$s = \sqrt{s^2} = \sqrt{33.65} \approx 5.8$$
4. Chebyshev's Theorem
- By using the mean and standard deviation, we can find the proportion or percentage of the total observations that fall within a given interval about the mean using Chebyshev's theorem.
For any number k greater than 1, at least $\left(1 - \frac{1}{k^2}\right)$ of the data values lie within k standard deviations of the mean.
[Figure: distribution with the region from $\mu - k\sigma$ to $\mu + k\sigma$ shaded; at least $\left(1 - \frac{1}{k^2}\right)$ of the values lie in the shaded area]
Example
The average systolic blood pressure for 4000 women who were
screened for high blood pressure was found to be 187 with a
standard deviation of 22. Using Chebyshev’s theorem, find at
least what percentage of women in this group have a systolic
blood pressure between 143 and 231.
80
Solution:
$\mu = 187$ and $\sigma = 22$
To find the percentage of women with blood pressure between 143 and 231:
143 - 187 = -44 and 231 - 187 = 44
k is obtained by dividing the distance between the mean and each end point by the standard deviation:
$$k = \frac{44}{22} = 2$$
$$1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = 1 - 0.25 = 0.75$$
At least 75% of the women have systolic blood pressure between 143 and 231.
[Figure: distribution with mean 187; the interval from $\mu - 2\sigma = 143$ to $\mu + 2\sigma = 231$ is shaded]
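The Chebyshev calculation for this example can be expressed as a small function. This is a sketch under the assumption that the interval is symmetric about the mean and that k is greater than 1; `chebyshev_lower_bound` is an illustrative name.

```python
def chebyshev_lower_bound(mean, std_dev, low, high):
    """Minimum proportion of values in [low, high] by Chebyshev's theorem.

    Requires an interval symmetric about the mean with k > 1.
    """
    k = (high - mean) / std_dev
    if abs((mean - low) - (high - mean)) > 1e-9:
        raise ValueError("interval must be symmetric about the mean")
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k ** 2


proportion = chebyshev_lower_bound(mean=187, std_dev=22, low=143, high=231)
print(f"at least {proportion:.0%}")
```

The bound holds for any distribution shape, which is why it is weaker than the 95% given by the empirical rule for bell-shaped data at k = 2.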
5. Empirical Rule
- The empirical rule applies only to a specific type of distribution called a bell-shaped distribution, also known as the normal curve.
• About 68% of the observations lie within one standard deviation of the mean
• About 95% of the observations lie within two standard deviations of the mean
• About 99.7% of the observations lie within three standard deviations of the mean
[Figure: bell-shaped curve with the 68%, 95% and 99.7% regions marked between $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - \sigma$, $\mu$, $\mu + \sigma$, $\mu + 2\sigma$ and $\mu + 3\sigma$]
Example 1
The age distribution of a sample of 5000 persons is bell-shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are 16 to 64 years old.
Solution:
$\bar{x} = 40$ and $s = 12$
To find the percentage of people aged between 16 and 64:
16 - 40 = -24 and 64 - 40 = 24
Dividing the distance, 24, by the standard deviation, 12, shows that each end point is 2s from the mean:
$$\frac{24}{12} = 2$$
so $16 = \bar{x} - 2s$ and $64 = \bar{x} + 2s$.
Because the area within two standard deviations of the mean is approximately 95% for a bell-shaped curve, approximately 95% of the people in the sample are 16 to 64 years old.
Example 2
Assume that the incomes of all single-parent households last year follow a bell-shaped distribution with a mean of RM23,500 and a standard deviation of RM4,500. Determine the range of incomes that contains
68% of households: (RM19,000, RM28,000)
95% of households: (RM14,500, RM32,500)
99.7% of households: (RM10,000, RM37,000)
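These three intervals follow directly from mean ± k standard deviations for k = 1, 2, 3. A minimal sketch (the function name `empirical_intervals` is illustrative):

```python
def empirical_intervals(mean, std_dev):
    """Intervals covering about 68%, 95% and 99.7% of a bell-shaped distribution."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (mean - k * std_dev, mean + k * std_dev)
            for k, label in coverage.items()}


intervals = empirical_intervals(mean=23500, std_dev=4500)
for label, (low, high) in intervals.items():
    print(f"{label}: (RM{low:,}, RM{high:,})")
```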
6. Coefficient of variation
$$CV = \frac{\text{standard deviation } (s)}{\bar{x}} \times 100\%$$
• The coefficient of variation represents the ratio of the standard

deviation to the mean, and it is a useful statistic for comparing
the degree of variation from one data series to another, even if
the means are drastically different from each other.
• Investopedia explains Coefficient Of Variation - CV
In the investing world, the coefficient of variation allows you to
determine how much volatility (risk) you are assuming
in comparison to the amount of return you can expect from your
investment. In simple language, the lower the ratio of standard
deviation to mean return, the better your risk-return tradeoff.
88
Comparing coefficient of variation
the higher the coefficient
of variation, the more
dispersed are the data
89
Example 2
New Car Used Car

Mean = RM20,100 Mean = RM5,485
Standard deviation = RM6,125 Standard dev.= RM2,730
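The slide gives no worked answer for this comparison, so the following is only a sketch of how the coefficient of variation formula would be applied to the two sets of figures; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100, in percent."""
    return std_dev / mean * 100


cv_new = coefficient_of_variation(std_dev=6125, mean=20100)
cv_used = coefficient_of_variation(std_dev=2730, mean=5485)
print(f"New car:  {cv_new:.2f}%")
print(f"Used car: {cv_used:.2f}%")
```

Although the new car prices have the larger standard deviation in ringgit, the used car prices have the larger coefficient of variation, so on this measure they are more dispersed relative to their mean.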
90
7. Quartile Deviation
- Quartiles are defined as the values which quarter the data
- Q1: first quartile, the value below which 25% of the observations lie
- Q2: second quartile, the value below which half of the data lie (the median)
- Q3: third quartile, the value below which 75% of the observations lie
a) Set of data
$$\text{Position of } Q_1 = \frac{n+1}{4}, \qquad \text{Position of } Q_3 = \frac{3(n+1)}{4}$$
$$\text{Inter-quartile range} = Q_3 - Q_1$$
$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2}$$

[Figure: cumulative frequency curve of F against x; Q1 and Q3 are read off the x axis at F = n/4 and F = 3n/4]
$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}}\right) \times C_{Q_1}$$
Where:
$L_{Q_1}$ = the lower boundary of the class containing Q1.
$n$ = the total frequency.
$F_{Q_1 - 1}$ = the cumulative frequency of the classes immediately preceding the class containing Q1.
$f_{Q_1}$ = the frequency of the class containing Q1.
$C_{Q_1}$ = the width of the class in which Q1 lies.

$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}}\right) \times C_{Q_3}$$
Where:
$L_{Q_3}$ = the lower boundary of the class containing Q3.
$n$ = the total frequency.
$F_{Q_3 - 1}$ = the cumulative frequency of the classes immediately preceding the class containing Q3.
$f_{Q_3}$ = the frequency of the class containing Q3.
$C_{Q_3}$ = the width of the class in which Q3 lies.
Example
a. Find the quartile deviation of the following data:
Class Frequency
0 - 9.9 5
10 - 19.9 19
20 - 29.9 38
30 - 39.9 43
40 - 49.9 34
50 - 59.9 17
60 - 69.9 4
95
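Class        f     F
0 - 9.9      5     5
10 - 19.9    19    24
20 - 29.9    38    62
30 - 39.9    43    105
40 - 49.9    34    139
50 - 59.9    17    156
60 - 69.9    4     160
Position of Q1 = n/4 = 160/4 = 40, so Q1 lies in the class 20 - 29.9.
Position of Q3 = 3n/4 = 3(160)/4 = 120, so Q3 lies in the class 40 - 49.9.
$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}}\right) \times C_{Q_1} = 19.95 + \left(\frac{\frac{160}{4} - 24}{38}\right) \times 10 = 24.16$$
$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}}\right) \times C_{Q_3} = 39.95 + \left(\frac{\frac{3(160)}{4} - 105}{34}\right) \times 10 = 44.36$$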
Therefore the quartile deviation is
$$\text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} = \frac{44.36 - 24.16}{2} = 10.1$$
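The quartile calculation for grouped data can be sketched as one function that interpolates any cumulative position. `grouped_quantile` is an illustrative name, and the code assumes equally spaced classes so that boundaries can be derived from the limits (19.95 and 39.95 here).

```python
def grouped_quantile(classes, freqs, p):
    """Value below which a proportion p of grouped data lies.

    Implements L + ((p * n - F_before) / f) * C for the class that
    contains cumulative position p * n.
    """
    gap = classes[1][0] - classes[0][1]
    target = p * sum(freqs)
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= target:
            lower_boundary = lower - gap / 2
            width = upper - lower + gap
            return lower_boundary + (target - cumulative) / f * width
        cumulative += f


classes = [(0, 9.9), (10, 19.9), (20, 29.9), (30, 39.9),
           (40, 49.9), (50, 59.9), (60, 69.9)]
freqs = [5, 19, 38, 43, 34, 17, 4]

q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
quartile_deviation = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 1))
```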
MEASURE OF SKEWNESS
• Skewness is the degree of asymmetry of a distribution.
• It is a method of describing the shape of a data distribution.
• Data which are not symmetrical may be either positively or negatively skewed.
[Figure: two curves, one labelled negative skewness and one labelled positive skewness]
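[Figure: three histograms with the positions of the mean, median and mode marked. Symmetric histogram: mean, median and mode coincide. Positively skewed histogram: mode, then median, then mean from left to right. Negatively skewed histogram: mean, then median, then mode from left to right.]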
Example
a. What type of distribution is described by the following information?
   Mean = 56, Median = 58.1, Mode = 63
   Answer: Negatively skewed (mean < median < mode)
b.
   1 | 1 1 2 2 3 3 4 5 6 7
   2 | 3 4 4 5 6 6
   3 | 1 1 2 2 2 3
   4 | 0 0 1
   Based on the stem-and-leaf plot above, find the
   i) median,
   ii) mode,
   iii) mean and
   iv) describe the shape of the distribution.
   Answer:
   i) 24 ii) 32 iii) 23.76 iv) Negatively skewed distribution
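The answers to part b can be checked with Python's `statistics` module. The stems and leaves below are an assumed reading of the plot (stems 1 to 4, 25 values in total), chosen because it reproduces the stated answers.

```python
from statistics import mean, median, mode

# Assumed reading of the stem-and-leaf plot: stem -> leaves.
stems = {
    1: [1, 1, 2, 2, 3, 3, 4, 5, 6, 7],
    2: [3, 4, 4, 5, 6, 6],
    3: [1, 1, 2, 2, 2, 3],
    4: [0, 0, 1],
}
values = [10 * stem + leaf for stem, leaves in stems.items() for leaf in leaves]

data_median = median(values)
data_mode = mode(values)
data_mean = mean(values)

# Comparison used on the slides: mean < median < mode suggests negative skew.
negatively_skewed = data_mean < data_median < data_mode
print(data_median, data_mode, data_mean, negatively_skewed)
```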
c.
Class        Frequency
0 - 100      5
100 - 200    19
200 - 300    38
300 - 400    43
400 - 500    34
Based on the distribution table,
i) construct a histogram, and
ii) describe the shape of the distribution.
[Figure: three curves labelled Curve A, Curve B and Curve C]
Curve A: Positively skewed
Curve B: Negatively skewed
BOX-AND-WHISKER PLOT
A plot that shows the center, spread and skewness of a data set. It is constructed by drawing a box and two whiskers that use the median, the first quartile, the third quartile, and the smallest and the largest values in the data set between the lower and the upper inner fences.
[Figure: box-and-whisker plot marking Minimum, Q1, Q2, Q3 and Maximum]
Example
The following data are the incomes (in thousands of dollars) for a sample of 12 households.
35 29 44 72 34 64 41 50 54 104 39 58
Construct a box-and-whisker plot for these data.
Solution:
Step 1: Rank the data, then find the median, Q1 and Q3
29 34 35 39 41 44 50 54 58 64 72 104
$$\text{median} = \frac{44 + 50}{2} = 47$$
$$Q_1 = \frac{35 + 39}{2} = 37$$
$$Q_3 = \frac{58 + 64}{2} = 61$$
$$IQR = Q_3 - Q_1 = 61 - 37 = 24$$
Step 2: Determine the lower and upper inner fences
$$1.5 \times IQR = 1.5 \times 24 = 36$$
$$\text{Lower inner fence} = Q_1 - 36 = 37 - 36 = 1$$
$$\text{Upper inner fence} = Q_3 + 36 = 61 + 36 = 97$$
Step 3: Determine the smallest and the largest values in the data set within the two inner fences
Smallest value = 29
Largest value = 72
Step 4: Draw the box
[Figure: box from the first quartile to the third quartile with a line at the median, on an axis from 25 to 105]
Step 5: Draw the whiskers
[Figure: box-and-whisker plot on an axis from 25 to 105. The lines joining the box to the smallest and the largest values within the two inner fences are called whiskers. An outlier is marked with * beyond the upper inner fence.]
Outlier: a value that falls outside the two inner fences (a value that is very small or very large relative to the rest of the data).
The data are skewed to the right.
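The numbers needed for this plot can be computed as follows. The sketch follows the method used in the solution, taking Q1 and Q3 as the medians of the lower and upper halves of the ranked data; other quartile conventions exist and may give slightly different values. `box_plot_summary` is an illustrative name.

```python
from statistics import median

incomes = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]


def box_plot_summary(values):
    """Quartiles, inner fences, whisker ends and outliers for a box plot."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])                 # median of the lower half
    q2 = median(data)
    q3 = median(data[len(data) - half:])     # median of the upper half
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


summary = box_plot_summary(incomes)
print(summary)
```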
S < 0: Negatively skewed
S = 0: Symmetric (not skewed)
S > 0: Positively skewed
[Figure: box plots for left-skewed, symmetric and right-skewed distributions, showing the positions of Q1, Q2 and Q3 in each]
• Median close to the center of the box: symmetrical
• Median to the left of the center of the box: positively skewed
• Median to the right of the center of the box: negatively skewed
• Whiskers are the same length: symmetrical
• Right whisker is longer than the left whisker: positively skewed
• Left whisker is longer than the right whisker: negatively skewed
• A bimodal distribution has two modes.
• All classes occur with approximately the same frequency in a uniform distribution.
• An outlier in any graph of data is an individual observation that falls outside the overall pattern of the graph.
