
TOPIC 7 : STATISTICS

(FORM 4)

Presented by
Nur Shakirin Sabri
Nur Ayuni Daud
Nornaimah Rodzi
Ungrouped and Grouped Data
• Ungrouped data is where the values are handled individually,
  e.g. 1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17
• Grouped data is where the values are grouped into classes, because we may sometimes collect a large number of data with varying values, e.g.

Circumference (cm)    1-2    3-4    5-6    7-8
No. of branches         2     10     14      8
Mean
The mean of an ungrouped set of data is the sum of all the values in the set divided by the number of values:
$\bar{x} = \frac{\sum x}{N}$
where x is any value in the set of data, N is the number of values, and $\bar{x}$ is the mean of the set of data.
Example: Find the mean of the following data.
1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17
$\bar{x} = \frac{1 + 3 + 6 + 6 + 6 + 6 + 7 + 7 + 12 + 12 + 17}{11} = \frac{83}{11} \approx 7.545$
Mean of grouped data
• For grouped data, we take the midpoint of the class, known as the class mark, to represent the class:
$\bar{x} = \frac{\sum fx}{\sum f}$
where f is the frequency of each class and x is the corresponding class mark.
Example:
Circumference (cm)    1-2              3-4              5-6              7-8
No. of branches, f    2                10               14               8
Midpoint, x           (1+2)/2 = 1.5    (3+4)/2 = 3.5    (5+6)/2 = 5.5    (7+8)/2 = 7.5

$\bar{x} = \frac{(1.5)(2) + (3.5)(10) + (5.5)(14) + (7.5)(8)}{2 + 10 + 14 + 8} = \frac{175}{34} \approx 5.147$
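The grouped-mean calculation can be checked with a short Python sketch. The function name `grouped_mean` and the list layout are illustrative choices, not part of the original slides; the data is the circumference table from the example.

```python
# Class limits and frequencies from the circumference example.
classes = [(1, 2), (3, 4), (5, 6), (7, 8)]
freqs = [2, 10, 14, 8]

def grouped_mean(classes, freqs):
    """Estimate the mean of grouped data using class marks (midpoints)."""
    marks = [(low + high) / 2 for low, high in classes]
    total_fx = sum(f * x for f, x in zip(freqs, marks))
    return total_fx / sum(freqs)

mean = grouped_mean(classes, freqs)
print(round(mean, 3))  # 5.147
```

Because every value in a class is replaced by its class mark, the result is an estimate of the mean rather than the exact mean of the raw measurements.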
Median
• When the values of a set of data are arranged in ascending or descending order, the value that lies in the middle is called the median.
• If n (the number of values) is odd, the median is the $\frac{n+1}{2}$th value.
• If n is even, the median is the mean of the $\frac{n}{2}$th value and the $\left(\frac{n}{2}+1\right)$th value.
Example: 1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17
Here n = 11, so the median is the $\frac{11+1}{2}$th = 6th value, which is 7.
Mode
The mode is the value that occurs most frequently in a set of data.
Example: 1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17
Mode = 6
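As a rough check of the median and mode rules, the sketch below applies the position rule for the median by hand and compares it with Python's standard `statistics` module. The helper name `median_by_position` is made up for illustration.

```python
from statistics import median, mode

data = [1, 3, 6, 6, 6, 7, 8, 8, 12, 12, 17]

def median_by_position(values):
    """Median via the position rule: the (n+1)/2-th value when n is odd,
    the mean of the n/2-th and (n/2 + 1)-th values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]   # convert 1-based position to index
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_by_position(data))  # 7
print(mode(data))                # 6
```

The data must be sorted before the position rule is applied; the example list happens to be sorted already.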
Modal Class of Grouped Data
• The class of grouped data that has the highest frequency is known as the modal class.
Example:
Circumference of trees (cm)    10-19    20-29    30-39    40-49
No. of trees                      14       19       23       10
Q: Determine the modal class.
A: The class with the highest frequency is 30-39 cm. Hence, the modal class is 30-39 cm.

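Finding the mode from a histogram
• If the distribution of a set of grouped data is represented by a histogram, we can estimate the mode of the distribution from the histogram.
Steps to find the mode from a histogram:
1. Determine the modal class in the histogram.
2. Join the top vertices of the modal class bar to the top vertices of the adjacent bars crosswise (top-left of the modal bar to the top-left of the bar on its right, top-right of the modal bar to the top-right of the bar on its left).
3. Determine the value on the horizontal axis at the intersection of the two lines. This value is the estimated mode.
[Figure: histogram with the modal class and the estimated mode marked]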
Median from cumulative frequency distribution table
The median of a grouped set of data can be calculated from the cumulative frequency distribution table using the following formula:
$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f_m}\right) C$
where
L = lower boundary of the class in which the median lies
N = total frequency
F = cumulative frequency before the class in which the median lies
C = class interval (size of the class)
$f_m$ = frequency of the class in which the median lies
Example:
Circumference (cm)          1-2    3-4    5-6    7-8
No. of branches, f            2     10     14      8
Cumulative frequency, F       2     12     26     34
Calculate the median.
The median lies between the 17th and 18th branches, which is within the class 5-6.
The cumulative frequency before the class 5-6 is F = 12.
Lower boundary, L = 4.5
Class interval, C = 2
Frequency of the class, $f_m$ = 14
$\text{Median} = 4.5 + \left(\frac{\frac{34}{2} - 12}{14}\right)(2) \approx 5.21$ cm
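A minimal sketch of the grouped-median calculation, assuming the usual interpolation formula Median = L + ((N/2 − F) / f_m) × C and equal class widths. The function name `grouped_median` and its argument layout are illustrative only.

```python
def grouped_median(lower_boundaries, class_width, freqs):
    """Estimate the median of grouped data by linear interpolation
    inside the class that contains the N/2-th value."""
    n = sum(freqs)
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * class_width
        cumulative += f
    raise ValueError("frequencies must be non-empty and positive")

# Classes 1-2, 3-4, 5-6, 7-8 have lower boundaries 0.5, 2.5, 4.5, 6.5.
lower_boundaries = [0.5, 2.5, 4.5, 6.5]
freqs = [2, 10, 14, 8]
print(round(grouped_median(lower_boundaries, 2, freqs), 2))  # 5.21
```

The loop stops at the class 5-6, where F = 12 and f_m = 14, so the estimate is 4.5 + (17 − 12)/14 × 2.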


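Estimating the median of grouped data from an ogive
We can use an ogive to estimate the median of grouped data: the median is the value on the horizontal axis that corresponds to half of the total frequency on the cumulative frequency axis.
[Figure: ogive with the median marked]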
Effects of uniform changes in data on mean, median and mode
Activity: To find the effects on the measures of central tendency if every value of the data is changed uniformly.
Given 4, 12, 5, 7, 9, 6, 10, 6, 13
a) Find the mean, median and mode.
b) Add 3 to every value. Find the mean, median and mode and compare them with the answers in (a).
c) Multiply every value by 2. Find the mean, median and mode and compare them with the answers in (a).
d) Discuss the findings.
From the activity, we find that if u is the original measure of central tendency (mean, median or mode) and v is its new value after a uniform change in every value of the data, then
• v = u + k if k is added to every value of the data.
• v = cu if every value of the data is multiplied by c.
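The two rules can be verified on the activity data with a short sketch using Python's `statistics` module. The helper name `central_tendency` is an illustrative choice; k = 3 and c = 2 are the constants from parts (b) and (c) of the activity.

```python
from statistics import mean, median, mode

def central_tendency(values):
    """Return (mean, median, mode) of an ungrouped data set."""
    return mean(values), median(values), mode(values)

data = [4, 12, 5, 7, 9, 6, 10, 6, 13]   # data from the activity
k, c = 3, 2                              # constants from parts (b) and (c)

original = central_tendency(data)                  # mean 8, median 7, mode 6
shifted = central_tendency([x + k for x in data])  # each measure rises by 3
scaled = central_tendency([c * x for x in data])   # each measure doubles

# Check v = u + k and v = c*u for every measure.
assert all(v == u + k for u, v in zip(original, shifted))
assert all(v == c * u for u, v in zip(original, scaled))
```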
Extreme values in the data
• 30, 28, 120, 43, 35, 9
• The numbers 9 and 120 are considered extreme values in this set of data.
• Extreme values significantly affect the mean of the data,
• but have little or no effect on the median and mode.
Effect of adding or removing a value from a set of data
• The effect is uncertain in general.
• Generally, the mean or median will shift to a higher value when values greater than the original mean or median are added, or when values smaller than the original mean or median are removed.
• However, the mean remains unchanged if the value added or removed is equal to the mean; the median and mode generally remain unchanged as well when the value equals the corresponding measure of central tendency.
Determining the most suitable measure of central tendency
• Question: Determine the most suitable measure of central tendency for each of these sets of data.
1. 42, 30, 39, 40, 35, 30
   Mean = 36, Median = 37, Mode = 30
2. 42, 30, 39, 40, 35, 30, 120
   Mean = 48, Median = 39, Mode = 30
Answer:
1. Mean and median
2. Median
• The mean is widely used because all the values in a set of data are taken into account.
• However, if a set of data contains extreme values, the mean can be pulled away from the centre of the data and may not represent its general characteristics.
• Under such circumstances, the median is a more representative measure of central tendency because it is largely unaffected by extreme values in the set of data.
• The mode is usually used to represent a set of data containing a large number of values that take only a few specific values, with many repeats.
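To see why the median is preferred when an extreme value is present, the following sketch compares how far the mean and the median move when 120 is added to the first data set from the question. The variable names are illustrative.

```python
from statistics import mean, median

scores = [42, 30, 39, 40, 35, 30]   # data set 1
with_extreme = scores + [120]       # data set 2: one extreme value added

mean_shift = mean(with_extreme) - mean(scores)        # 48 - 36 = 12
median_shift = median(with_extreme) - median(scores)  # 39 - 37 = 2

print(mean_shift, median_shift)
```

A single extreme value moves the mean by 12 but the median by only 2, which is why the median is the better representative for the second data set.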
Range
Range of ungrouped data = largest value – smallest value
Example: Find the range of the data.
2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24
Answer: Range = 24 – 2 = 22

Interquartile range of ungrouped data
Quartiles are three values in a set of data which divide the data into four quarters, with each quarter having the same number of values.
Example:
2, 4, [6], 7, 10, [15], 16, 19, [20], 21, 24
The bracketed quartiles split the data into four groups of two values each:
Q1 = 6 (lower quartile), Q2 = 15 (median), Q3 = 20 (upper quartile)
Interquartile range = Q3 – Q1 = 20 – 6 = 14
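One common way to compute quartiles of ungrouped data is to take Q1 and Q3 as the medians of the lower and upper halves. The sketch below assumes that convention (other conventions exist and can give slightly different values); the function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3), taking Q1 and Q3 as the medians of the lower
    and upper halves (the overall median is left out when n is odd)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 6 15 20 14
```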

Range of grouped data
Range of grouped data = largest class mark – smallest class mark
Example:
Number of goals      0–4    5–9    10–14    15–19    20–24    25–29
Number of players     12      9        4        3        1        1
Largest class mark = (25 + 29)/2 = 27 goals
Smallest class mark = (0 + 4)/2 = 2 goals
Hence, the range = 27 – 2 = 25 goals
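The same calculation as a brief Python sketch, using the class limits from the goals table. The variable names are illustrative.

```python
# Class limits from the goals table.
classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]

# The class mark is the midpoint of each class.
marks = [(low + high) / 2 for low, high in classes]

grouped_range = max(marks) - min(marks)
print(grouped_range)  # 25.0
```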

Interquartile range of grouped data from cumulative frequency table
Construct the cumulative frequency table first.
Number of durians        20-29    30-39    40-49    50-59    60-69    70-79
Frequency                    2        5       11       16       10        4
Cumulative frequency         2        7       18       34       44       48
Q1 = (1/4)(48)th value = 12th value
Q3 = (3/4)(48)th value = 36th value
Thus, Q1 lies within the class 40-49 and Q3 lies within the class 60-69.


Since the cumulative frequency before the class 40-49 is 7, Q1, the 12th of the 48 values, is the fifth value within the class 40-49.
The size of the class 40-49 = 49.5 – 39.5 = 10
Assume that the 11 values are distributed evenly within the class.
Therefore, the size between two consecutive values = 10/11
Hence, Q1 = 39.5 + 5 × (10/11) ≈ 44.05

Since the cumulative frequency before the class 60-69 is 34, Q3, the 36th of the 48 values, is the second value within the class 60-69.
The size of the class 60-69 = 69.5 – 59.5 = 10
Assuming that the 10 values are distributed evenly within the class, the size between two consecutive values = 10/10 = 1
Hence, Q3 = 59.5 + 2 × 1 = 61.5
Therefore, the interquartile range = Q3 – Q1 = 61.5 – (39.5 + 5 × 10/11) ≈ 17.45
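The quartiles of the durian data can be estimated with a short sketch. It assumes linear interpolation within the quartile class (values spread evenly across the class) and equal class widths; the function name `grouped_quantile` is hypothetical and not taken from the slides.

```python
def grouped_quantile(lower_boundaries, class_width, freqs, fraction):
    """Estimate the value below which `fraction` of the data lies,
    assuming values are spread evenly within each class."""
    n = sum(freqs)
    target = fraction * n   # position of the required value
    cumulative = 0          # cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * class_width
        cumulative += f
    raise ValueError("frequencies must be non-empty and positive")

# Classes 20-29, ..., 70-79 have lower boundaries 19.5, ..., 69.5.
lower_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5]
freqs = [2, 5, 11, 16, 10, 4]

q1 = grouped_quantile(lower_boundaries, 10, freqs, 0.25)
q3 = grouped_quantile(lower_boundaries, 10, freqs, 0.75)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))  # 44.05 61.5 17.45
```

Under this assumption Q1 is five steps of 10/11 above the lower boundary 39.5, and Q3 is two steps of 1 above 59.5.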


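Determining the interquartile range of grouped data from an ogive
We are given an ogive.
[Figure: ogive with Q1 = 24.5 mm and Q3 = 31.5 mm marked on the horizontal axis]
Q1 = (1/4)(40)th value = 10th value = 24.5 mm
Q3 = (3/4)(40)th value = 30th value = 31.5 mm
Hence, the interquartile range = 31.5 – 24.5 = 7.0 mm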
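Variance
• Variance of ungrouped data:
$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$ or $\sigma^2 = \frac{\sum x^2}{N} - \bar{x}^2$, where $\bar{x} = \frac{\sum x}{N}$
• Variance of grouped data:
$\sigma^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2$, where f = frequency of each class, x = class mark and $\bar{x} = \frac{\sum fx}{\sum f}$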

Standard deviation
• Standard deviation is a statistical measure of how much the values in a set of data are scattered around the mean. It is defined as the positive square root of the variance.
• Standard deviation is a good measure of dispersion because it has the same unit as the values of the data, whereas the variance is expressed in squared units.
Example:
Time taken    Class mark, x      f     fx     x²     fx²
5-9                       7      8     56     49     392
10-14                    12     15    180    144    2160
15-19                    17      4     68    289    1156
20-24                    22      3     66    484    1452
Total                           30    370            5160

Mean = 370/30 ≈ 12.33
Variance = 5160/30 – (370/30)² ≈ 19.89
s.d. = √19.89 ≈ 4.46
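The table can be turned into numbers with a short sketch. It assumes the variance of grouped data is computed as Σfx²/Σf minus the square of the mean, which matches the fx and fx² columns of the table; the variable names are illustrative.

```python
from math import sqrt

# Class marks and frequencies from the table.
marks = [7, 12, 17, 22]
freqs = [8, 15, 4, 3]

n = sum(freqs)                                          # total frequency: 30
sum_fx = sum(f * x for f, x in zip(freqs, marks))       # 370
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))  # 5160

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2
sd = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(sd, 2))  # 12.33 19.89 4.46
```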
Effects on measures of dispersion when:
• If every value of the data is changed uniformly, i.e. when every value in a set of data is multiplied by a positive constant k, then we have
  new range = k × original range
  new interquartile range = k × original interquartile range
  new standard deviation = k × original standard deviation
  new variance = k² × original variance
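These scaling rules can be checked numerically for the range, standard deviation and variance. In the sketch below the multiplier k = 3 is a hypothetical choice, the data is the list used earlier for the range of ungrouped data, and the population formulas from Python's `statistics` module (`pstdev`, `pvariance`) are assumed.

```python
from math import isclose
from statistics import pstdev, pvariance

def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

data = [2, 4, 6, 7, 10, 15, 16, 19, 20, 21, 24]
k = 3  # hypothetical positive multiplier
scaled = [k * x for x in data]

# Range and standard deviation scale by k, variance by k squared.
assert value_range(scaled) == k * value_range(data)
assert isclose(pstdev(scaled), k * pstdev(data))
assert isclose(pvariance(scaled), k ** 2 * pvariance(data))
```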
• If there are extreme values in the set of data, they significantly increase the range of the set of data but have little or no effect on the interquartile range.
• Extreme values also significantly increase the standard deviation and the variance, but the standard deviation is affected to a smaller degree than the variance.
• Hence, the interquartile range eliminates the effect of extreme values.
• The other measures of dispersion are affected to different degrees by extreme values.
• If certain values are added to or removed from a set of data, the effect on the measures of dispersion is uncertain.
• In general, the range and the interquartile range are less affected than the variance and the standard deviation.
• The variance and standard deviation are more significantly affected when the added or removed value differs more from the mean.
Comparing the measures of central tendency and dispersion
• There are many situations where we need to compare two or more sets of data and subsequently draw a conclusion.
• The measures of central tendency alone may not provide us with enough information for comparison.
• We need to determine the measures of dispersion of a set of data to get a better picture of its characteristics, which eventually helps us arrive at a more meaningful and acceptable conclusion.
A teacher would like to select one of these teams to represent the school in a Mathematics quiz. The teacher is more concerned about a steady performance of the team in the quiz. Which team would the teacher select?
Team    Marks
P       60, 65, 85, 76, 64, 88
Q       68, 62, 76, 80, 81, 71

Team P: mean = 438/6 = 73, s.d. = 10.74
Team Q: mean = 438/6 = 73, s.d. = 6.73
Both teams have the same mean, meaning that they are considered equally good statistically. However, team Q has a lower s.d., implying a smaller difference in performance between the members of the team. Hence, team Q is expected to perform more consistently in the quiz than team P. Therefore, the teacher would select team Q.
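The comparison can be reproduced with a short sketch, assuming the population standard deviation (`statistics.pstdev`), which agrees with the quoted values 10.74 and 6.73. The dictionary layout and the name `steadier` are illustrative.

```python
from statistics import mean, pstdev

teams = {
    "P": [60, 65, 85, 76, 64, 88],
    "Q": [68, 62, 76, 80, 81, 71],
}

for name, marks in teams.items():
    # Same mean for both teams, so the s.d. decides.
    print(name, mean(marks), round(pstdev(marks), 2))

# The steadier team is the one with the smaller standard deviation.
steadier = min(teams, key=lambda name: pstdev(teams[name]))
print(steadier)  # Q
```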
THE END
Thank you for your patience and attention.
