
Chapter 3 1

◦ Methods for organising, displaying and describing data using tables, graphs and summary measures
 Raw data is made more manageable
 Raw data is presented in a logical form
 Patterns can be seen from organised data
 Frequency tables
 Graphical techniques
 Measures of Central Tendency
 Measures of Spread (variability)

2
 Organize data and display data using tables and graphs
 a) presentation of qualitative data
 b) presentation of quantitative data

 Describe the characteristics of a data set using statistical measures
 a) measures of central tendency
 b) measures of dispersion
 c) measures of skewness
 d) box-and-whisker plot
 e) population vs sample

3
4
Definition:
Data recorded in the sequence in which they
are collected and before they are processed
or ranked are called raw data

21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24

Ex: Ages of 50 students

5
 Qualitative Data
 Data that cannot be measured but can be
classified into different categories
 Example: gender, status of a student, nationality, race
 Quantitative Data
 Data that can be measured numerically
 Example: income, height, gross sales, prices of homes, number of cars owned and number of accidents

6
 a) Organizing qualitative data
◦ (i) Frequency distributions
◦ (ii)Relative frequency and percentage
distributions
 b) Graphing qualitative data
◦ (i) Bar graphs
◦ (ii)Pie charts

7
A frequency distribution for
qualitative data lists all categories
and the number of elements that
belong to each of the categories.

8
A sample of 20 employees from large companies was selected, and these
employees were asked how stressful their jobs were. The responses are
recorded below, where very means very stressful, somewhat means somewhat
stressful and none means not stressful at all.
somewhat none somewhat very very
none very somewhat somewhat very
somewhat somewhat very somewhat none
very none somewhat somewhat very

9
Frequency Distribution of Stress on Job

Stress on job   Tally        Frequency (f)
Very            |||| ||      7
Somewhat        |||| ||||    9
None            ||||         4
                             Sum = 20

10
\[ \text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}} \]

\[ \text{Percentage} = (\text{Relative frequency}) \times 100 \]

11
Relative frequency and percentage distributions of stress on job

Stress on job   Relative frequency   Percentage (%)
Very            7/20 = 0.35          0.35(100) = 35
Somewhat        9/20 = 0.45          0.45(100) = 45
None            4/20 = 0.20          0.20(100) = 20
                Sum = 1.00           Sum = 100
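
As a rough sketch (not part of the original slides), the frequency, relative frequency and percentage columns for the stress-on-job responses can be tallied in Python. The variable names are illustrative assumptions, and only the standard library is used.

```python
from collections import Counter

# The 20 responses from the stress-on-job example.
responses = (
    "somewhat none somewhat very very "
    "none very somewhat somewhat very "
    "somewhat somewhat very somewhat none "
    "very none somewhat somewhat very"
).split()

freq = Counter(responses)            # frequency of each category
n = sum(freq.values())               # sum of all frequencies
rel_freq = {cat: count / n for cat, count in freq.items()}
percent = {cat: 100 * rf for cat, rf in rel_freq.items()}

for cat in ("very", "somewhat", "none"):
    print(f"{cat:<9} f={freq[cat]:>2}  rel={rel_freq[cat]:.2f}  pct={percent[cat]:.0f}%")
```

The printed rows should agree with the two tables above.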

12
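A bar graph is a graph made of bars where the categories
are on the horizontal axis and the frequencies
(or relative frequencies) are on the vertical
axis.

[Figure: bar graph with the categories heart, cancer, stroke, CLRD and accident on the horizontal axis; vertical axis from 0 to 60]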

13
A circle divided into portions that represent the
relative frequencies or percentages of a
population or a sample belonging to different
categories is called a pie chart.

[Figure: pie chart with sectors for heart, cancer, stroke, CLRD and accident]

14
Numerical Data
1. Array
2. Types of quantitative data
3. Frequency Distributions
   a. Histogram
   b. Polygon
   c. Ogive
   d. Stem & Leaf


15
 1. Array: organizes data to focus on major features
i. Ascending
Example: 1, 2, 3, 4, 5, …
ii. Descending
Example: 10, 9, 8, 7, 6, …
iii. Range (difference between the largest and smallest values)
Example: largest height is 74 inches
smallest height is 60 inches
range is 74 – 60 = 14 inches
16
 o Quickly notice lowest and highest values in
the data
 o Easily divide data into sections
 o Easily see values that occur frequently
 o Observe variability in the data

17
Raw Data: Yards Produced by 30 Carpet Looms

16.2 15.4 16.0 16.6 15.9 15.8 16.0 16.8 16.9 16.8
15.7 16.4 15.2 15.8 15.9 16.1 15.6 15.9 15.6 16.0
16.4 15.8 15.7 16.2 15.6 15.9 16.3 16.3 16.0 16.3

(ungrouped data)


Data Array: Daily Production in Yards of 30 Carpet Looms

15.2 15.7 15.9 16.0 16.2 16.4
15.4 15.7 15.9 16.0 16.3 16.6
15.6 15.8 15.9 16.0 16.3 16.8
15.6 15.8 15.9 16.1 16.3 16.8
15.6 15.8 16.0 16.2 16.4 16.9

(sorted in ascending order, reading down each column)

Chapter 3 19
 Discrete data - integer values 0, 1, 2
 Example: number of children, cars,..

 Continuous data - any value within an interval, e.g. 256.312 grams
 Example: weight, length, time, area, price

20
A frequency distribution for quantitative data
lists all the classes and the number of values
that belong to each class. Data presented in the
form of a frequency distribution are called
grouped data

21
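Weekly Earnings (RM)   Num of Employees (f)
401 - 600              9
601 - 800              12
801 - 1000             39
1001 - 1200            15
1201 - 1400            9
1401 - 1600            6

The first column holds the variable (weekly earnings) and its classes; the second column is the frequency column, which holds the class frequencies. The third class is 801 - 1000. The sixth class has lower limit 1401 and upper limit 1600.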
22
 \[ \text{Class boundary} = \frac{\text{upper limit of one class} + \text{lower limit of next class}}{2} \]
Ex: Upper boundary of first class
(600 + 601)/2 = 600.5
Lower boundary of second class
(600 + 601)/2 = 600.5

Upper boundary of one class = Lower boundary of next class

23
 Class width = upper boundary - lower boundary

Example:
Width of first class
600.5 - 400.5 = 200
Width of second class
800.5 - 600.5 = 200

24
 \[ \text{Class midpoint} = \frac{\text{lower limit} + \text{upper limit}}{2} \]

Ex: Midpoint of the first class
(401 + 600)/2 = 500.5
Ex: Midpoint of the second class
(601 + 800)/2 = 700.5
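
The boundary, width and midpoint rules can be sketched in a few lines of Python. The helper name `class_summary` is hypothetical, and the sketch assumes that the gap between one class's upper limit and the next class's lower limit is the same for every pair of classes.

```python
def class_summary(classes):
    """classes: list of (lower_limit, upper_limit) tuples in ascending order."""
    # Gap between adjacent classes, e.g. 601 - 600 = 1 (assumed constant).
    gap = classes[1][0] - classes[0][1]
    rows = []
    for lower, upper in classes:
        lower_b = lower - gap / 2          # lower class boundary
        upper_b = upper + gap / 2          # upper class boundary
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_b, upper_b),
            "width": upper_b - lower_b,    # upper boundary - lower boundary
            "midpoint": (lower + upper) / 2,
        })
    return rows


earnings = [(401, 600), (601, 800), (801, 1000),
            (1001, 1200), (1201, 1400), (1401, 1600)]
summary = class_summary(earnings)
for row in summary:
    print(row)
```

For the weekly earnings classes this gives boundaries 400.5 and 600.5, width 200 and midpoint 500.5 for the first class, matching the worked examples.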

25
Height (cm)   Number of Students
60 - 62       10
63 - 65       18
66 - 68       42
69 - 71       27
72 - 74       8
Total         105

Each row is a class interval; the second column gives the class frequency.

i. First class limits. Lower class limit = 60, upper class limit = 62
ii. First class boundaries. Lower boundary = 59.5, upper boundary = 62.5
iii. Class width. Example: c = 62.5 - 59.5 = 3
iv. First class midpoint = (60 + 62)/2 = 61
v. Class frequency = number of students

26
Weekly Earnings Num of
(RM) Employees (f )
400 - 600 9
600 - 800 12
800 - 1000 39
1000 - 1200 15
1200 - 1400 9
1400 - 1600 6

Here adjacent classes share their endpoints, so class limit = class boundary.


Raw Data:
15.2 15.2 15.3 15.3 15.3 15.3 15.3 15.4 15.4 15.4
15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.4 15.5 15.5
15.5 15.5 15.5 15.5 15.6 15.6 15.6 15.7 15.7 15.7

Frequency Distribution

Class   Tallies         Frequency
15.2    //              2
15.3    ////            5
15.4    //// //// /     11
15.5    //// /          6
15.6    ///             3
15.7    ///             3

Relative Frequency Distribution

Class   Frequency (1)   Relative Freq. (1) ÷ 30   Cumulative Relative Frequency
15.2    2               0.07                      0.07
15.3    5               0.17                      0.23
15.4    11              0.37                      0.60
15.5    6               0.20                      0.80
15.6    3               0.10                      0.90
15.7    3               0.10                      1.00
        30              1.00

The relative frequencies are rounded to two decimal places, so the rounded column adds to 1.01; the exact total is 1.00. The cumulative column is computed from the cumulative counts (for example 7/30 = 0.23).
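
A minimal sketch of how the relative and cumulative relative frequency columns could be computed, assuming the raw data are held in a Python list (the names `raw` and `table` are illustrative only):

```python
from collections import Counter
from itertools import accumulate

# Raw data rebuilt from the frequency distribution above.
raw = [15.2] * 2 + [15.3] * 5 + [15.4] * 11 + [15.5] * 6 + [15.6] * 3 + [15.7] * 3

freq = Counter(raw)
classes = sorted(freq)
n = len(raw)

# Running totals of the frequencies give the cumulative frequencies.
cum_counts = list(accumulate(freq[c] for c in classes))

# Each row: (class, frequency, relative frequency, cumulative relative frequency)
table = [(c, freq[c], freq[c] / n, cum / n) for c, cum in zip(classes, cum_counts)]

for c, f, rel, cum_rel in table:
    print(f"{c:<5} {f:>3} {rel:.2f} {cum_rel:.2f}")
```

Dividing the cumulative counts by n, rather than adding rounded relative frequencies, keeps the last cumulative value at exactly 1.00.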

Chapter 3 29
 When constructing a frequency distribution
table, we need to make the following three
major decisions :
 Number of Classes
 Class Width
 Lower Limit of the First Class / Starting Point

30
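 Number of Classes

\[ k = 1 + 3.3 \log n \]
where n is the number of observations and log is the base-10 logarithm.
 Class width
\[ i \ge \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes } (k)} \]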
 Lower Limit of the First Class/ Starting Point
◦ Any convenient number that is equal to or less than
the smallest value in the data set can be used as the
lower limit of the first class.

31
 1. Determine the class interval size (class width)
Example: Given the following data
100 74 84 95 95 110 99 87
100 108 85 103 99 83 91 91
84 110 113 105 100 98 100 108
100 98 100 107 79 86 123 107
87 105 88 85 99 101 93 99

Range, R = 123 - 74 = 49

32
 Number of Classes

k = 1 + 3.3 log n
= 1 + 3.3 log 40
= 6.3
≈ 6

33
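 Class Width
\[ i \ge \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes } (k)} = \frac{49}{6} \approx 8.17 \]
Rounding up gives a class width of at least 9. The grouped frequency distribution that follows uses the more convenient width of 10.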

34
Grouped Frequency Distribution

Class       Frequency (1)   Relative Frequency (1) ÷ 40   Cumulative Frequency %
71 - 80
81 - 90
91 - 100
101 - 110
111 - 120
121 - 130

6 classes
Class interval 91 - 100: lower limit = 91, upper limit = 100
Midpoint of the first class = (71 + 80)/2 = 75.5
Class width = 130.5 – 120.5 = 10

Chapter 3 35
 Class Boundary – Is given by the mid-point of
the upper limit of one class and the lower limit
of the next class. Class boundaries are also called
real class limits.

36
• A histogram is a graph that can be drawn for a
frequency distribution, a relative frequency
distribution or a percentage distribution.
• To draw a histogram, mark the classes on the
horizontal axis and the frequencies (or relative
frequencies or percentages) on the vertical axis.
• A histogram is called a frequency histogram, a
relative frequency histogram or a percentage
histogram depending on what is marked on the vertical axis.

37
Class         Frequency
15.2 - 15.5   2
15.5 - 15.8   5
15.8 - 16.1   11
16.1 - 16.4   6
16.4 - 16.7   3
16.7 - 17.0   3

[Figure: frequency histogram of the classes above; vertical axis "Frequency" from 0 to 12]
38
• A graph formed by joining the midpoints of the
tops of successive bars in a histogram with
straight lines is called a polygon.
• A polygon has the class midpoints on the
horizontal axis and the frequencies,
relative frequencies or percentages on the
vertical axis.

39
[Figure: frequency polygon; horizontal axis "Production Level in Yards" from 15.0 to 17.1, vertical axis "Frequency" from 0 to 12]
40
• An ogive is a curve drawn for the cumulative
frequency distribution by joining with straight lines
the dots marked above the upper boundaries of the
classes at heights equal to the cumulative
frequencies of the respective classes.

41
• Each value is divided into two portions (a
stem and a leaf). The leaves for each stem are
shown separately in a display.
• An advantage of a stem and leaf display is that we
do not lose information on individual
observations.
• It can be used only for quantitative data.

42
The following are scores of 30 college students
on a statistics test:

75 52 80 96 65 79 71 87
93 95 69 72 81 61 76 86
79 68 50 92 83 84 77 64
71 87 72 92 57 98

Construct a stem and leaf display.

43
1. Split each score into two parts.
2. The first part contains the first digit, which is called the stem.
3. The second part contains the second digit, which is called the leaf.
4. Arrange the stems in increasing order.

For example, the first two scores, 75 and 52, are entered as:

stem   leaves
5      2
6
7      5
8
9

44
The complete stem and leaf display for the scores is
shown below:
5 2 0 7
6 5 9 1 8 4
7 5 9 1 2 6 9 7 1 2
8 0 7 1 6 3 4 7
9 6 3 5 2 2 8
From the display, stem 7 has the highest
frequency, followed by stems 8, 9, 6 and 5.

45
The leaves for each stem are ranked in increasing
order as below:

5 0 2 7
6 1 4 5 8 9
7 1 1 2 2 5 6 7 9 9
8 0 1 3 4 6 7 7
9 2 2 3 5 6 8
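
A short Python sketch of the same construction for two-digit scores follows. The function name `stem_and_leaf` is hypothetical; it assumes every value is a non-negative integer, so that the stem is the tens part and the leaf is the units digit.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves in increasing order]} for non-negative integers."""
    stems = defaultdict(list)
    for value in sorted(values):          # sorting first keeps the leaves ranked
        stem, leaf = divmod(value, 10)    # e.g. 75 -> stem 7, leaf 5
        stems[stem].append(leaf)
    return dict(stems)


scores = [75, 52, 80, 96, 65, 79, 71, 87,
          93, 95, 69, 72, 81, 61, 76, 86,
          79, 68, 50, 92, 83, 84, 77, 64,
          71, 87, 72, 92, 57, 98]

display = stem_and_leaf(scores)
for stem in sorted(display):
    print(stem, " ".join(str(leaf) for leaf in display[stem]))
```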

46
 Diastolic blood pressure of 120 people.
 60 Type A people vs. 60 Type B people
 Type A: Extremely hostile, competitive, impatient
 Type B: Laid-back people

47
Type A: Extremely hostile, competitive, impatient
53, 57, 58, 59, 59, 60, …

Type B: Laid back people


51, 52, 59, 59, 60, …

48
Type A             Type B
5 37899            5 1299
6 00001111         6 0001122233333
6 2223333          6 4445555555777
6 444455555        6 888889
6 666777778888     7 0000111
7 0000111          7 222333466899
7 333444789        8 0000
8 011              9 3

49
5 37899
6 00001111
6 2223333
6 444455555
6 666777778888
Modes
7 0000111
7 333444789
8 011

50
51
 distinguish among the measures of central
tendency, measures of dispersion and
measures of skewness.
 calculate values for common measures of
location, including the arithmetic mean,
median and mode.
 calculate values for common measures of
dispersion, including range, variance, standard
deviation and quartile deviation
 calculate values for measures of skewness.

52
Statistical Measures

Measure of central tendency
measure of location: to show where the centre of the data lies

Measure of dispersion
measure of spread: to show how spread out the data are around the centre

Measure of skewness
measure of asymmetry: to show whether the frequency distribution is symmetrical about the mean or skewed
53
MEASURE OF CENTRAL TENDENCY

1. Mean (average, \bar{x})
- Add all observations
- Divide this sum by the number of observations

a) Set of values
\[ \bar{x} = \frac{\sum x}{n} \]
b) Simple frequency distribution
\[ \bar{x} = \frac{\sum fx}{\sum f} \]
c) Grouped frequency
\[ \bar{x} = \frac{\sum fx}{\sum f} \qquad (x = \text{class midpoint}) \]

54

55
MEASURE OF CENTRAL TENDENCY

◦ Advantages
 it is widely understood
 the value of every item is included in the computation of
the mean.
 it is well suited to further statistical analysis.

◦ Disadvantages
 its value may not correspond to any actual value.
 it might be affected by extremely high or low values.

56
MEASURE OF CENTRAL TENDENCY

Example
a. The arithmetic mean (mean) of the numbers 8, 3, 5, 12
and 10 is ...

b. If 5, 8, 6 and 2 occur with frequencies 3, 2, 4 and 1,
the mean is ...

c. Find the mean of the following frequency distribution


Class Frequency
1-3 1
4-6 4
7-9 8
10 - 12 6
13 - 15 3
16 - 18 1
57
MEASURE OF CENTRAL TENDENCY

a. \[ \bar{x} = \frac{\sum x}{n} = \frac{8 + 3 + 5 + 12 + 10}{5} = 7.6 \]

b. \[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{5(3) + 8(2) + 6(4) + 2(1)}{3 + 2 + 4 + 1} = 5.7 \]

c.
Class     f    x (midpoint)   fx
1 - 3     1    2              2
4 - 6     4    5              20
7 - 9     8    8              64
10 - 12   6    11             66
13 - 15   3    14             42
16 - 18   1    17             17
          Σf = 23             Σfx = 211

\[ \bar{x} = \frac{\sum fx}{\sum f} = \frac{211}{23} = 9.17 \]

58
MEASURE OF CENTRAL TENDENCY

2. Median (middle value of a distribution or array)
- Arrange the observations in order of increasing size
- Find the number of observations and the middle observation
- Identify the median as this middle value

a) Set of data
b) Simple frequency distribution
Position of the median (n = sample size):
n odd: the observation at position (n + 1)/2
n even: the mean of the observations at positions n/2 and n/2 + 1
c) Grouped frequency
(i) Graphical method
(ii) Interpolation method

59
MEASURE OF CENTRAL TENDENCY

(i) Graphical Method

[Figure: graphical determination of the median; Median = 700]

60
MEASURE OF CENTRAL TENDENCY

(ii) Interpolation Method

\[ \text{Median} = L_m + \left( \frac{\frac{n}{2} - F_{m-1}}{f_m} \right) C_m \]

Where:
L_m = the lower boundary of the class containing the median.
n = the total frequency.
F_{m-1} = the cumulative frequency of the classes immediately
preceding the class containing the median.
f_m = the frequency of the class containing the median.
C_m = the width of the class in which the median lies.

61

62
MEASURE OF CENTRAL TENDENCY

 Advantages
 it is unaffected by extremely high or low values.
 can be used when certain end values of a set or
distribution are difficult, expensive or impossible to
obtain, particularly appropriate to ‘life’ data.
 can be used with non-numeric data if desired, providing
the measurements can be naturally ordered.
 will often assume a value equal to one of the original data.
 Disadvantages
 it is difficult to handle theoretically in more advanced
statistical work, so its use is restricted to analysis at a
basic level.
 it fails to reflect the full range of values.

63
MEASURE OF CENTRAL TENDENCY

Example
a. The times taken to inspect five units coming from a production line
are recorded as 13, 14, 11, 17 and 11 minutes. What is
the median?
b. Find the median of the following frequency distribution
Class Frequency
118 - 126 3
127 - 135 5
136 - 144 9
145 - 153 12
154 - 162 5
163 - 171 4
172 - 180 2
64
MEASURE OF CENTRAL TENDENCY

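a. Ordered data: 11, 11, 13, 14, 17
\[ \text{position of the median} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3 \]
median = 13 (the 3rd observation)

b.
Class       f    F
118 - 126   3    3
127 - 135   5    8
136 - 144   9    17
145 - 153   12   29   (median class)
154 - 162   5    34
163 - 171   4    38
172 - 180   2    40

Position of the median = n/2 = 40/2 = 20, which falls in the class 145 - 153.

\[ \text{median} = L_m + \left( \frac{\frac{n}{2} - F_{m-1}}{f_m} \right) C_m = 144.5 + \left( \frac{\frac{40}{2} - 17}{12} \right)(153.5 - 144.5) = 146.75 \approx 147 \]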
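
The interpolation method for part (b) can be sketched as a small Python function. The name `grouped_median` is hypothetical, and the sketch assumes equally spaced classes given by their class limits, so that boundaries are found by splitting the gap between adjacent limits.

```python
def grouped_median(classes, freqs):
    """Median of grouped data by interpolation.

    classes: list of (lower_limit, upper_limit) in ascending order
    freqs:   class frequencies in the same order
    """
    gap = classes[1][0] - classes[0][1]        # e.g. 127 - 126 = 1
    n = sum(freqs)
    cumulative = 0                             # F_{m-1} once the class is found
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= n / 2:            # this is the median class
            lower_boundary = lower - gap / 2   # L_m
            width = (upper - lower) + gap      # C_m
            return lower_boundary + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must not be empty")


classes = [(118, 126), (127, 135), (136, 144), (145, 153),
           (154, 162), (163, 171), (172, 180)]
freqs = [3, 5, 9, 12, 5, 4, 2]
median_b = grouped_median(classes, freqs)
print(median_b)
```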

65
MEASURE OF CENTRAL TENDENCY

3. Mode (value which occurs most often)
- Draw a frequency table for the data
- Identify the mode as the most frequent value

a) Set of data
b) Simple frequency distribution
Mode = value that appears most frequently
c) Grouped frequency
(i) Graphical method
(ii) Interpolation method

66
MEASURE OF CENTRAL TENDENCY

(i) Graphical Method

[Figure: histogram of No. of cars (vertical axis, 0 to 16) against Mileage (km) in classes 110 - 120, 120 - 130, 130 - 140, 140 - 150, 150 - 160, 160 - 170 and 170 - 180; the value read off the graph is Mode = 146]

67
MEASURE OF CENTRAL TENDENCY

(ii) Interpolation Method

\[ \text{Mode} = L + \left( \frac{D_1}{D_1 + D_2} \right) C \]

Where:
L = the lower class boundary of the class containing the mode.
C = the class width of the class containing the mode.
D_1 = difference between the largest frequency and the
frequency immediately preceding it (f_0 – f_-).
D_2 = difference between the largest frequency and the
frequency immediately following it (f_0 – f_+).

68
MEASURE OF CENTRAL TENDENCY

Mode

 the mode of a set of data is that value which
occurs most often or, equivalently, has the
largest frequency.

69
MEASURE OF CENTRAL TENDENCY

◦ Advantages
 it is a more appropriate average to use in situations where it is useful
to know the most common value.
 easy to understand, not difficult to calculate and can be used when
a distribution has open-ended classes.
 it is not affected by extreme values.

◦ Disadvantages
 it ignores dispersion around the modal value and it does not take all
the values into account.
 it is unsuitable for further statistical analysis.
 although it ignores extreme values, it is thought to be too much
affected by the most popular class when a distribution is
significantly skewed.

70
MEASURE OF CENTRAL TENDENCY

Example
a. Find the mode of the following frequency distribution

Class Frequency
1-3 1
4-6 4
7-9 8
10 - 12 6
13 - 15 3
16 - 18 1

71
MEASURE OF CENTRAL TENDENCY

Class     Frequency
1 - 3     1
4 - 6     4
7 - 9     8   (modal class)
10 - 12   6
13 - 15   3
16 - 18   1

\[ \text{mode} = L + \left( \frac{D_1}{D_1 + D_2} \right) C = 6.5 + \left( \frac{8 - 4}{(8 - 4) + (8 - 6)} \right)(9.5 - 6.5) = 8.5 \]

72
MEASURE OF DISPERSION

1. Range

Range = maximum value – minimum value

73
MEASURE OF DISPERSION
2. Standard deviation
- Calculate the mean value
- Find the deviation of each observation from this mean
- Square these deviations
- Add the squares
- Divide this sum by the number of observations
- Take the square root of the value obtained

a) Set of data
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{or} \quad s = \sqrt{\frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2} \]
b) Simple frequency distribution
\[ s = \sqrt{\frac{\sum fx^2}{\sum f} - \left( \frac{\sum fx}{\sum f} \right)^2} \]
c) Grouped frequency
\[ s = \sqrt{\frac{\sum fx^2}{\sum f} - \left( \frac{\sum fx}{\sum f} \right)^2} \]
where x = class mid-point
74
MEASURE OF DISPERSION

Comparing standard deviation

75
MEASURE OF DISPERSION
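3. Variance

variance = (standard deviation)^2

a) Set of data
\[ v = \frac{\sum (x - \bar{x})^2}{n} \quad \text{or} \quad v = \frac{\sum x^2}{n} - \left( \frac{\sum x}{n} \right)^2 \]
b) Simple frequency distribution
\[ s^2 = \frac{\sum fx^2}{\sum f} - \left( \frac{\sum fx}{\sum f} \right)^2 \]
c) Grouped frequency
\[ s^2 = \frac{\sum fx^2}{\sum f} - \left( \frac{\sum fx}{\sum f} \right)^2 \]
where x = class mid-point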
MEASURE OF DISPERSION

Example

a. Find the variance and standard deviation of the following data:

Class Frequency
0 - 4.9 3
5 - 9.9 5
10 - 14.9 7
15 - 19.9 6
20 - 24.9 2

77
MEASURE OF DISPERSION

Class       f    x       x^2        fx       fx^2
0 - 4.9     3    2.45    6.0025     7.35     18.0075
5 - 9.9     5    7.45    55.5025    37.25    277.5125
10 - 14.9   7    12.45   155.0025   87.15    1085.0175
15 - 19.9   6    17.45   304.5025   104.7    1827.015
20 - 24.9   2    22.45   504.0025   44.9     1008.005

Σf = 23, Σfx = 281.35, Σfx^2 = 4215.5575

\[ s^2 = \frac{\sum fx^2}{\sum f} - \left( \frac{\sum fx}{\sum f} \right)^2 = \frac{4215.5575}{23} - \left( \frac{281.35}{23} \right)^2 = 183.2851 - 149.6367 = 33.6484 \approx 33.65 \]

\[ s = \sqrt{s^2} = \sqrt{33.65} \approx 5.8 \]
MEASURE OF DISPERSION
4. Chebyshev’s Theorem

- By using the mean and standard deviation, we can find the proportion
or percentage of the total observations that fall within a given interval
about the mean using Chebyshev’s theorem.

For any number k greater than 1, at least \( 1 - \frac{1}{k^2} \) of the
data values lie within k standard deviations of the mean.

[Figure: distribution curve with the region between \mu - k\sigma and \mu + k\sigma shaded; at least 1 - 1/k^2 of the values lie in the shaded area]
MEASURE OF DISPERSION

Example

The average systolic blood pressure for 4000 women who were
screened for high blood pressure was found to be 187 with a
standard deviation of 22. Using Chebyshev’s theorem, find at
least what percentage of women in this group have a systolic
blood pressure between 143 and 231.

80
MEASURE OF DISPERSION
Solution:
\mu = 187 and \sigma = 22
To find the percentage of women with blood pressure between 143 and 231:

143 - 187 = -44 and 231 - 187 = 44

Both endpoints lie 44 units from the mean. The value of k is obtained by dividing this distance by the standard deviation:

\[ k = \frac{44}{22} = 2 \]

\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = 1 - 0.25 = 0.75 \]
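
As an illustration, the bound can be wrapped in a small Python function. The name `chebyshev_lower_bound` is hypothetical. Using the smaller of the two distances when the interval is not symmetric about the mean, and returning 0 when k is not greater than 1, are assumptions of this sketch.

```python
def chebyshev_lower_bound(mean, std_dev, low, high):
    """Minimum proportion of values between low and high by Chebyshev's theorem."""
    # Assumption: for an asymmetric interval, use the nearer endpoint.
    k = min(mean - low, high - mean) / std_dev
    if k <= 1:
        return 0.0          # the theorem gives no useful bound for k <= 1
    return 1 - 1 / k ** 2


bound = chebyshev_lower_bound(mean=187, std_dev=22, low=143, high=231)
print(f"At least {bound:.0%} of the values lie between 143 and 231")
```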

81
MEASURE OF DISPERSION
At least 75% of the women have systolic blood pressure between
143 and 231.

[Figure: distribution curve with the region between \mu - 2\sigma = 143 and \mu + 2\sigma = 231 shaded, centred on \mu = 187]

82
MEASURE OF DISPERSION
5. Empirical Rule
- The empirical rule applies only to a specific type of distribution called
a bell-shaped distribution, also known as the normal curve.

• Approximately 68% of the observations lie within one standard deviation of the
mean
• Approximately 95% of the observations lie within two standard deviations of the
mean
• Approximately 99.7% of the observations lie within three standard deviations of
the mean

[Figure: bell-shaped curve with the horizontal axis marked at \mu - 3\sigma, \mu - 2\sigma, \mu - \sigma, \mu, \mu + \sigma, \mu + 2\sigma and \mu + 3\sigma; the nested bands cover 68%, 95% and 99.7% of the area]
83
MEASURE OF DISPERSION

Example 1

The age distribution of a sample of 5000 persons is bell-shaped
with a mean of 40 years and a standard deviation of
12 years. Determine the approximate percentage of people
who are 16 to 64 years old.

84
MEASURE OF DISPERSION
Solution:
\bar{x} = 40 and s = 12
To find the percentage of people aged between 16 and 64:

16 - 40 = -24 and 64 - 40 = 24

Dividing the distance, 24, by the standard deviation, 12, shows that the
distance is equal to 2s:

\[ \frac{24}{12} = 2 \]

85
MEASURE OF DISPERSION

16 - 40 = -24 = -2s and 64 - 40 = 24 = 2s, so 16 = \bar{x} - 2s and 64 = \bar{x} + 2s.

Because the area within two standard deviations of the mean is
approximately 95% for a bell-shaped curve, approximately 95%
of the people in the sample are 16 to 64 years old.

86
MEASURE OF DISPERSION

Example 2

Assume that the incomes of all single-parent households last year
follow a bell-shaped distribution with mean RM23,500 and
standard deviation RM4,500. Determine the range of
incomes that contains approximately

68% of the households: (RM19,000, RM28,000)
95% of the households: (RM14,500, RM32,500)
99.7% of the households: (RM10,000, RM37,000)
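
The three intervals are simply the mean plus or minus one, two and three standard deviations. A minimal Python sketch (the function name `empirical_intervals` is illustrative):

```python
def empirical_intervals(mean, std_dev):
    """Intervals covering roughly 68%, 95% and 99.7% of a bell-shaped distribution."""
    return {
        pct: (mean - k * std_dev, mean + k * std_dev)
        for k, pct in ((1, 68), (2, 95), (3, 99.7))
    }


incomes = empirical_intervals(23500, 4500)   # Example 2 (RM)
ages = empirical_intervals(40, 12)           # Example 1 (years)

for pct, (low, high) in incomes.items():
    print(f"{pct}%: (RM{low:,}, RM{high:,})")
```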

87
MEASURE OF DISPERSION
6. Coefficient of variation

\[ CV = \frac{\text{standard deviation } (s)}{\bar{x}} \times 100\% \]

• The coefficient of variation represents the ratio of the standard
deviation to the mean, and it is a useful statistic for comparing
the degree of variation from one data series to another, even if
the means are drastically different from each other.
• In the investing world (as Investopedia explains), the coefficient of
variation allows you to determine how much volatility (risk) you are
assuming in comparison to the amount of return you can expect from your
investment. In simple language, the lower the ratio of standard
deviation to mean return, the better your risk-return tradeoff.

88
MEASURE OF DISPERSION
Comparing coefficient of variation
the higher the coefficient
of variation, the more
dispersed are the data

89
MEASURE OF DISPERSION

Example 2

                     New Car     Used Car
Mean                 RM20,100    RM5,485
Standard deviation   RM6,125     RM2,730
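
As a sketch of how the comparison could be carried out, the coefficient of variation of each group can be computed in Python (the function and variable names are assumptions):

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (s / mean) * 100."""
    return std_dev / mean * 100


cv_new = coefficient_of_variation(6125, 20100)    # new cars
cv_used = coefficient_of_variation(2730, 5485)    # used cars

print(f"New car CV:  {cv_new:.2f}%")
print(f"Used car CV: {cv_used:.2f}%")
```

Although the new car prices have the larger standard deviation, the used car prices have the larger coefficient of variation, so they are relatively more dispersed.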

90
MEASURE OF DISPERSION

7. Quartile Deviation
- Quartiles are defined as the values which divide the data into quarters
- Q1 - first quartile
  - value below which 25% of the observations lie
- Q2 - second quartile
  - value below which half of the data lie (the median)
- Q3 - third quartile
  - value below which 75% of the observations lie

\[ \text{Quartile deviation} = \frac{Q_3 - Q_1}{2} \]
Inter-quartile range = Q3 - Q1

a) Set of data
b) Simple frequency distribution
\[ \text{position of } Q_1 = \frac{n + 1}{4}, \qquad \text{position of } Q_3 = \frac{3(n + 1)}{4} \]
c) Grouped frequency
(i) Graphical method
(ii) Interpolation method
91
MEASURE OF DISPERSION

(i) Graphical Method


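[Figure: cumulative frequency curve of F against x; Q1 and Q3 are read off the horizontal axis at the cumulative frequencies n/4 and 3n/4]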

92
MEASURE OF DISPERSION

(ii) Interpolation Method

\[ Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}} \right) C_{Q_1} \]

Where:
L_{Q_1} = the lower boundary of the class containing Q1.
n = the total frequency.
F_{Q_1 - 1} = the cumulative frequency of the classes
immediately preceding the class containing Q1.
f_{Q_1} = the frequency of the class containing Q1.
C_{Q_1} = the width of the class in which Q1 lies.

93
MEASURE OF DISPERSION

\[ Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}} \right) C_{Q_3} \]

Where:
L_{Q_3} = the lower boundary of the class containing Q3.
n = the total frequency.
F_{Q_3 - 1} = the cumulative frequency of the classes
immediately preceding the class containing Q3.
f_{Q_3} = the frequency of the class containing Q3.
C_{Q_3} = the width of the class in which Q3 lies.

94
MEASURE OF DISPERSION

Example

a. Find the quartile deviation of the following data:

Class Frequency
0 - 9.9 5
10 - 19.9 19
20 - 29.9 38
30 - 39.9 43
40 - 49.9 34
50 - 59.9 17
60 - 69.9 4

95
MEASURE OF DISPERSION

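Class       f    F
0 - 9.9     5    5
10 - 19.9   19   24
20 - 29.9   38   62    (class containing Q1)
30 - 39.9   43   105
40 - 49.9   34   139   (class containing Q3)
50 - 59.9   17   156
60 - 69.9   4    160

Position of Q1 = n/4 = 160/4 = 40; position of Q3 = 3n/4 = 3(160)/4 = 120.

\[ Q_1 = L_{Q_1} + \left( \frac{\frac{n}{4} - F_{Q_1 - 1}}{f_{Q_1}} \right) C_{Q_1} = 19.95 + \left( \frac{\frac{160}{4} - 24}{38} \right) 10 = 24.16 \]

\[ Q_3 = L_{Q_3} + \left( \frac{\frac{3n}{4} - F_{Q_3 - 1}}{f_{Q_3}} \right) C_{Q_3} = 39.95 + \left( \frac{\frac{3(160)}{4} - 105}{34} \right) 10 = 44.36 \]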

96
MEASURE OF DISPERSION

Therefore the quartile deviation is

\[ \text{Quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{44.36 - 24.16}{2} = 10.1 \]

97
MEASURE OF SKEWNESS

• Skewness is the degree of asymmetry of a distribution.
• It is a way to describe the shape of a data distribution.
• Data which are not symmetrical may be either positively or
negatively skewed.

[Figure: two curves, labelled negative skewness and positive skewness]


98
MEASURE OF SKEWNESS

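[Figure: three histograms showing the relative positions of the mean, median and mode]
Symmetric histogram: mean = median = mode
Positively skewed histogram: mode < median < mean
Negatively skewed histogram: mean < median < mode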


Chapter 3 99
MEASURE OF SKEWNESS
Example
a. What type of distribution is described by the following
information? Mean = 56, Median = 58.1, Mode = 63

Answer: Negatively skewed (mean < median < mode)

b.
1   1 1 2 2 3 3 4 5 6 7
2   3 4 4 5 6 6
3   1 1 2 2 2 3
4   0 0 1
Based on the stem-and-leaf plot above, find the
i) median,
ii) mode,
iii) mean and
iv) describe the shape of the distribution.
Answer:
i) 24 ii) 32 iii) 23.76 iv) Negatively skewed distribution
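
The answers to part (b) can be checked with Python's `statistics` module. This is a sketch: the list `values` reads the display as stems 1 to 4 with 10, 6, 6 and 3 leaves, and `skew_direction` is a hypothetical helper that applies only the mean, median and mode comparison used in part (a).

```python
from statistics import mean, median, mode


def skew_direction(mean_value, median_value, mode_value):
    """Classify skewness from the order of the mean, median and mode."""
    if mean_value < median_value < mode_value:
        return "negatively skewed"
    if mode_value < median_value < mean_value:
        return "positively skewed"
    if mean_value == median_value == mode_value:
        return "symmetric"
    return "no clear direction from this rule"


values = [11, 11, 12, 12, 13, 13, 14, 15, 16, 17,
          23, 24, 24, 25, 26, 26,
          31, 31, 32, 32, 32, 33,
          40, 40, 41]

print(median(values), mode(values), round(mean(values), 2))
print(skew_direction(mean(values), median(values), mode(values)))
print(skew_direction(56, 58.1, 63))
```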
100
MEASURE OF SKEWNESS

c. Class Frequency
0 - 100 5
100 - 200 19
200 - 300 38
300 - 400 43
400 - 500 34

Based on the distribution table,
i) construct a histogram, and
ii) describe the shape of the distribution.

101
MEASURE OF SKEWNESS

[Figure: three distribution curves labelled Curve A, Curve B and Curve C]

102
MEASURE OF SKEWNESS

Curve A: Positively Skewed
Curve B: Negatively Skewed

104
BOX-AND-WHISKER PLOT

A box-and-whisker plot is a plot that shows the center, spread and
skewness of a data set. It is constructed by drawing a box and two
whiskers that use the median, the first quartile, the third quartile,
and the smallest and the largest values in the data set between the
lower and the upper inner fences.

[Figure: box-and-whisker plot marked Minimum, Q1, Q2, Q3 and Maximum]

105
BOX-AND-WHISKER PLOT

Example

The following data are the incomes (in thousands of dollars) for
a sample of 12 households.

35 29 44 72 34 64 41 50 54 104 39 58

Construct a box-and-whisker plot for these data.

BOX-AND-WHISKER PLOT

Solution:

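Step 1: Rank the data and find the median, the quartiles and the interquartile range

29 34 35 39 41 44 50 54 58 64 72 104

\[ \text{median} = \frac{44 + 50}{2} = 47 \]
\[ Q_1 = \frac{35 + 39}{2} = 37 \]
\[ Q_3 = \frac{58 + 64}{2} = 61 \]
\[ IQR = Q_3 - Q_1 = 61 - 37 = 24 \]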

107
Step 2: Determine the lower and upper inner fences
1.5 × IQR = 1.5 × 24 = 36
Lower inner fence = Q1 - 36 = 37 - 36 = 1
Upper inner fence = Q3 + 36 = 61 + 36 = 97

Step 3: Determine the smallest and the largest values in the
data set within the two inner fences
Smallest value = 29
Largest value = 72

Step 4: Draw the box

[Figure: box spanning the first quartile to the third quartile with the median marked inside, above an axis from 25 to 105]

108
Step 5: Draw the whiskers

[Figure: the box with two lines, called whiskers, joining it to the smallest value and the largest value within the two inner fences; one value beyond the upper inner fence is marked * as an outlier; axis from 25 to 105]

Outlier: a value that falls outside the two inner fences (a value that
is very small or very large relative to the rest of the data).

The data are skewed to the right
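
Steps 1 to 3 can be sketched in Python. The function name `box_plot_summary` is hypothetical. The quartiles are taken as the medians of the lower and upper halves of the ranked data, which reproduces the values in this example; other quartile conventions exist and may give slightly different results.

```python
from statistics import median


def box_plot_summary(data):
    """Quartiles, inner fences, whisker ends and outliers for a box-and-whisker plot."""
    xs = sorted(data)                                   # Step 1: rank the data
    half = len(xs) // 2
    lower_half = xs[:half]
    upper_half = xs[half:] if len(xs) % 2 == 0 else xs[half + 1:]
    q1, q2, q3 = median(lower_half), median(xs), median(upper_half)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr                        # Step 2: inner fences
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),            # Step 3: whisker ends
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }


incomes = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]
summary = box_plot_summary(incomes)
for key, value in summary.items():
    print(key, value)
```

For the household incomes the only value outside the inner fences is 104, which is the outlier marked in the plot.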

109
BOX-AND-WHISKER PLOT

[Figure: three box-and-whisker plots]
S < 0: Negatively Skewed
S = 0: Symmetric (Not Skewed)
S > 0: Positively Skewed
110
BOX-AND-WHISKER PLOT

[Figure: three box-and-whisker plots, each marked Q1, Q2 and Q3, labelled Left-Skewed, Symmetric and Right-Skewed]

111
BOX-AND-WHISKER PLOT

 Median close to the center of the box -- symmetrical
 Median to the left of the center of the box --
positively skewed
 Median to the right of the center of the box --
negatively skewed
 Whiskers are the same length -- symmetrical
 Right whisker is longer than the left whisker -- positively
skewed
 Left whisker is longer than the right whisker -- negatively
skewed

112
BOX-AND-WHISKER PLOT

 A bimodal distribution has two modes.
 All classes occur with approximately the same
frequency in a uniform distribution.
 An outlier in any graph of data is an individual
observation that falls outside the overall
pattern of the graph.

113
