
17/09/2017

BFC34302
Civil Engineering Statistics
CHAPTER 1
Review on Descriptive Statistics
Assoc. Prof. Dr. Munzilah Md. Rohani

Statistics is the science that deals with collecting, classifying, presenting, describing, analyzing and interpreting data.
It enables us to draw conclusions and make reasonable decisions.


Categories of statistics

Descriptive statistics
The activities of collecting, classifying, presenting and describing quantitative data.
Methods for organizing (frequency tables), representing (graphs) and summarizing data (central tendency and variability).
Numerical descriptions of center, variability and position (quantitative variables).

Inferential statistics
The part dealing with the techniques and methods for interpreting the results obtained from descriptive statistics.

Statistical terminology
Descriptive statistics are numbers that are used to summarize and describe data.
"Data" refers to the information that has been collected from an experiment, a survey, or a historical record.

Statistics Terminology

POPULATION: The entire (complete) collection of data whose properties are analyzed. It contains all the subjects of interest. A population can be of any size; its items need not be uniform but must share at least one measurable feature.

SAMPLE: A portion of the population selected for study. A sample is any set of entities, cases, subjects, items or experimental units chosen from the population.

RANDOM SAMPLE: A sample selected in such a way that each element of the population has the same chance of being selected.

PARAMETER: A numerical measurement describing some characteristic of a population, e.g. the population mean $\mu$ and variance $\sigma^2$.


STATISTIC: A numerical measurement describing some characteristic of a sample, e.g. the sample mean $\bar{x}$ and variance $s^2$.

VARIABLE: Any measured characteristic or attribute that differs for different elements. For example, if the weight of 30 subjects were measured, then weight would be a variable. A variable can be classified as quantitative or qualitative.

QUANTITATIVE VARIABLE: The variable being studied is numeric and is measured on an ordinal, interval, or ratio scale. For example, if the time it took subjects to respond were measured, then the variable would be quantitative.

QUALITATIVE VARIABLE: The variable being studied is non-numeric. Also called a "categorical variable". Measured on a nominal scale, e.g. gender, educational level, eye colour.

DATA: A set of data is a collection of observations, measurements or information obtained. Data can be classified as quantitative or qualitative and can be presented in various ways.

QUANTITATIVE DATA: Observations which can be measured numerically or counted. Can be divided into discrete data and continuous data, e.g. length, time, temperature and mass.

QUALITATIVE DATA: Data that are not in numerical form but are instead assigned as attributes, e.g. race, marital status, age group, gender.


Examples:
Determine whether the data obtained are discrete or continuous.

(a) The number of books sold by a stationery shop.
(b) The time taken to travel from Kuala Terengganu to Batu Pahat.
(c) The number of As in SPM.
(d) The weight of FKAAS students.
(e) The diameter of twenty spheres.

Central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores.

The goal of central tendency is to identify the single value that best represents the entire set of data.


Understand the data:
Data distribution
Outliers
Central tendency


Raw Data


DESCRIPTIVE STATISTICS FOR UNGROUPED DATA


30 observations = 30 data values (numbers).
If there were 30 speed observations on the road and you had all 30 numbers available to you, you have what is known as raw or ungrouped data. When you are trying to solve a problem by analyzing data, this is the best situation to be in.

Individual observations are known as raw or ungrouped data.
Sometimes you do not have access to the individual observations. This may occur for confidentiality reasons or because you have not collected the data yourself.

All data are to be considered as a sample unless otherwise stated in the questions.


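Example :
Find the mean of the following data

1 4 2 0 2 3 3 2 1 4 5 2 1 2 0 1 2 3 1 2

Solution:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 4 + 2 + \dots + 3 + 1 + 2}{20} = \frac{41}{20} = 2.05$$

OR

x   0  1  2  3  4  5
f   2  5  7  3  2  1

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{2(0) + 5(1) + 7(2) + 3(3) + 2(4) + 1(5)}{20} = 2.05$$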
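
The two routes to the mean in this example can be checked with a short Python sketch. The function names `mean_raw` and `mean_from_frequencies` are illustrative choices, not part of the course material; the data are the 20 values from the example.

```python
from collections import Counter

# The 20 observations from the example.
data = [1, 4, 2, 0, 2, 3, 3, 2, 1, 4, 5, 2, 1, 2, 0, 1, 2, 3, 1, 2]


def mean_raw(values):
    """Mean from raw data: sum of the observations divided by n."""
    return sum(values) / len(values)


def mean_from_frequencies(freq):
    """Mean from a frequency table {x: f}: sum of f*x divided by sum of f."""
    total = sum(f * x for x, f in freq.items())
    n = sum(freq.values())
    return total / n


# Counter builds the frequency table (x -> f) from the raw data.
freq = Counter(data)

print(mean_raw(data))               # 2.05
print(mean_from_frequencies(freq))  # 2.05
```

Both calls give the same value because the frequency table is only a compact way of writing the same 20 observations.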

Example :

To obtain grade A, Saleha must achieve an average of at least 75 marks in four tests. If her average mark for the first three tests is 70, calculate the lowest mark she must get in her fourth test in order to obtain grade A.


Solution:

Let the marks for the four tests be w, x, y, z.
Mean for w, x, y : 70
Mean for w, x, y, z :

$$\frac{3(70) + z}{4} \ge 75$$

$$\frac{210 + z}{4} \ge 75$$

$$210 + z \ge 300$$

$$z \ge 90$$

So, the lowest mark she must get in her fourth test in order to obtain grade A is 90.


b) The data arranged in ascending order :
2.71 , 3.56 , 4.35 , 5.48 , 6.22 , 8.61
Since n = 6, which is even, the median is

$$x_m = \frac{1}{2}\left(x_{\frac{6}{2}} + x_{\frac{6}{2}+1}\right) = \frac{1}{2}\left(x_3 + x_4\right) = \frac{1}{2}\left(4.35 + 5.48\right) = 4.915$$
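
A minimal Python sketch of the median rule used in part b) follows. The even-n branch mirrors the formula above; the odd-n branch (take the single middle observation) is the standard companion rule and is stated here as an assumption, since the slide covering it is not reproduced in this text. The function name `median` is illustrative.

```python
def median(values):
    """Median of ungrouped data.

    Even n: average of the (n/2)-th and (n/2 + 1)-th ordered observations.
    Odd n: the middle ordered observation (assumed standard rule).
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Python lists are 0-based, so the (n/2)-th observation is ordered[mid - 1].
    return (ordered[mid - 1] + ordered[mid]) / 2


# Data from part b): x_3 = 4.35 and x_4 = 5.48 are averaged.
print(median([2.71, 3.56, 4.35, 5.48, 6.22, 8.61]))
```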


MODE

The mode of a set of data is the value that occurs most frequently.

The mode may not be unique, or there may be no mode at all.


Example :
Find the mode for the following set of data

a) 2, 3, 3, 4, 5, 28, 5, 5

b) 2, 3, 5, 8, 10

c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5
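
The three data sets can be checked with the sketch below. It assumes the common convention that a data set has no mode when every distinct value occurs equally often (as in set b); the function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return the list of modes of a non-empty data set.

    Assumed convention: if there are several distinct values and they all
    occur equally often, the data set has no mode and [] is returned.
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 3, 3, 4, 5, 28, 5, 5]))                         # [5]
print(modes([2, 3, 5, 8, 10]))                                  # []
print(modes([0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5]))     # [0.4, 0.7]
```

Set a) has a single mode, set b) has no mode, and set c) has two modes, which illustrates that the mode need not be unique.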


PERCENTILES
Percentiles divide a set of data arranged in ascending order into 100 equal parts.
To find the percentile ($P_k$):

$$\text{Let } r = \frac{k}{100} \times n$$

where : n = number of observations
        k = percentile for $P_k$

(i) If r is an integer:

$$P_k = \frac{1}{2}\left[\, r\text{th observation} + (r+1)\text{th observation} \,\right]$$

(ii) If r is not an integer, round r up to the next integer; $P_k$ is the observation at that position.

MEASURES OF RELATIVE STANDING

The percentile rank, P, can also be found by

$$P = \frac{b + \frac{1}{2}e}{n}$$

b = the number of data values below the value of interest
e = the number of data values equal to the value of interest
n = the sample size
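
As a rough illustration of the percentile rank formula, the sketch below counts b and e directly. The function name `percentile_rank` and the sample values are illustrative assumptions; the result is returned as a proportion, exactly as the formula is written.

```python
def percentile_rank(values, x):
    """Percentile rank P = (b + e/2) / n of the value x within values."""
    b = sum(1 for v in values if v < x)   # values below the value of interest
    e = sum(1 for v in values if v == x)  # values equal to the value of interest
    n = len(values)                       # sample size
    return (b + 0.5 * e) / n


# Illustrative data: 3 values lie below 24 and 1 value equals 24, so
# P = (3 + 0.5) / 7 = 0.5.
sample = [17, 20, 21, 24, 28, 32, 36]
print(percentile_rank(sample, 24))  # 0.5
```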

Dr. Serhat Eren 38

Quartile & percentile


Quartile vs Percentile

Example :
Find the median, first quartile ($Q_1$), third quartile ($Q_3$) and 40th percentile ($P_{40}$) for the following sets of data
a) 21, 24, 17, 28, 36, 20, 32
b) 3.5, 2.7, 5.4, 8.6, 4.3, 6.2, 9.9, 7.6
Solution:
a) The data arranged in ascending order :
17 , 20 , 21 , 24 , 28 , 32 , 36

Median ($Q_2$):
$$r = \frac{k}{4} \times n = \frac{2}{4} \times 7 = 3.5 \text{ (not an integer)}$$
Median = $Q_2$ = 4th observation = 24

First quartile ($Q_1$):
$$r = \frac{k}{4} \times n = \frac{1}{4} \times 7 = 1.75 \text{ (not an integer)}$$
$Q_1$ = 2nd observation = 20

Third quartile ($Q_3$):
$$r = \frac{k}{4} \times n = \frac{3}{4} \times 7 = 5.25 \text{ (not an integer)}$$
$Q_3$ = 6th observation = 32

40th percentile ($P_{40}$):
$$r = \frac{k}{100} \times n = \frac{40}{100} \times 7 = 2.8 \text{ (not an integer)}$$
$P_{40}$ = 3rd observation = 21
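
The positional rule used in this solution can be sketched in Python as follows. The function name `quantile` and its `parts` argument (4 for quartiles, 100 for percentiles) are illustrative assumptions; k is assumed to be a whole number with 0 < k < parts. Integer arithmetic is used for r so that the "is r an integer?" test is not affected by floating-point rounding.

```python
def quantile(values, k, parts=100):
    """k-th of `parts` equal divisions, using the rule r = (k / parts) * n.

    r integer     -> average of the r-th and (r+1)-th ordered observations
    r not integer -> round r up and take that ordered observation
    """
    if not 0 < k < parts:
        raise ValueError("k must satisfy 0 < k < parts")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(k * n, parts)  # r = whole + remainder / parts
    if remainder == 0:
        # 1-based positions r and r+1 are 0-based indices whole-1 and whole.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Rounding r up gives 1-based position whole+1, i.e. 0-based index whole.
    return ordered[whole]


set_a = [21, 24, 17, 28, 36, 20, 32]
print(quantile(set_a, 2, parts=4))  # median  -> 24
print(quantile(set_a, 1, parts=4))  # Q1      -> 20
print(quantile(set_a, 3, parts=4))  # Q3      -> 32
print(quantile(set_a, 40))          # P40     -> 21
```

Set b) has n = 8, so r is an integer for the quartiles and the averaging branch is exercised instead.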


Example :

The following table shows the marks obtained by 30 students in a Mathematics quiz, where the maximum mark is 10.

Marks             2  3  4  5  6  7  8  9  10
No. of students   2  4  3  6  4  5  4  1  1

Find the mean, mode, median, first and third quartiles, interquartile range and the 60th percentile.
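
One way to work this exercise without writing out all 30 marks is to locate ordered observations through the cumulative frequencies. The sketch below is an illustration under that approach, not a model answer from the course; the helper names `observation` and `quantile` are assumptions, and the percentile rule is the r = (k/100) n rule given earlier.

```python
from bisect import bisect_left
from itertools import accumulate

marks = [2, 3, 4, 5, 6, 7, 8, 9, 10]
freqs = [2, 4, 3, 6, 4, 5, 4, 1, 1]

n = sum(freqs)                        # 30 students
cumulative = list(accumulate(freqs))  # running totals of the frequencies


def observation(position):
    """Mark of the position-th ordered observation (1-based)."""
    # The first class whose cumulative frequency reaches `position` holds it.
    return marks[bisect_left(cumulative, position)]


def quantile(k, parts=100):
    """Rule r = (k / parts) * n applied through the cumulative frequencies."""
    whole, remainder = divmod(k * n, parts)
    if remainder == 0:
        return (observation(whole) + observation(whole + 1)) / 2
    return observation(whole + 1)


mean = sum(x * f for x, f in zip(marks, freqs)) / n
mode = marks[freqs.index(max(freqs))]  # the highest frequency is unique here
median = quantile(2, parts=4)
q1, q3 = quantile(1, parts=4), quantile(3, parts=4)
iqr = q3 - q1
p60 = quantile(60)

print(mean, mode, median, q1, q3, iqr, p60)
```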


GROUPED DATA
If you are using secondary data, much of the data published on the Web or in reports is unavailable as raw data. Often the only thing available to you is what is known as grouped data.

Grouped data are data that are available only as a frequency distribution. The individual observations are not accessible.

DESCRIPTIVE STATISTICS FOR GROUPED DATA

Measures of the Center for Grouped Data
There are three measures of the center: the mean, the median, and the mode.
First consider how to estimate the mean of the data set when you have grouped data.
For example, consider the amount of time, in minutes, that people take to travel from point A to point B. The traffic engineer is interested in the center, or the typical length of time that people take to reach destination B from A. She has only the following frequency table from 32 observations:

Time (minutes)       Frequency
25.0 < x ≤ 35.0      5
35.0 < x ≤ 45.0      2
45.0 < x ≤ 55.0      4
55.0 < x ≤ 65.0      3
65.0 < x ≤ 75.0      11
75.0 < x ≤ 85.0      3
85.0 < x ≤ 95.0      4

Remember that to calculate the mean you sum all the data and divide by the sample size.
For grouped data you cannot sum the actual data because you don't have them. So, you have to estimate what the values might sum to for each interval.
Consider the 5 observations that fall in the first interval, between 25 and 35 minutes. We need a way to estimate the sum of those 5 values to begin our estimation of the mean.
Use the middle of the interval as our best "guess" of the actual values in the class. So, you must first find the midpoint of each class.
Multiply the midpoint of 30 by the frequency of 5 to get the contribution to the sum for that interval.


This procedure is summarized in the steps below. It gives you a good estimate of the mean when the data are in fact evenly spread out throughout the interval.

Step 1. Find the midpoint of each class. Call it $m_j$.
Step 2. Multiply the midpoint by the class frequency, $f_j$, to yield $f_j m_j$.
Step 3. Add up all the interval sums found in step 2.
Step 4. Divide the sum from step 3 by the sample size, n. Note that the sample size is the sum of all the frequencies.


The formula for estimating the mean from grouped data is thus

$$\bar{X} = \frac{\sum_j f_j m_j}{n}$$
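
Applied to the travel-time frequency table (32 observations), the four steps can be sketched as below. The representation of the table as a list of `((lower, upper), frequency)` pairs and the name `grouped_mean` are illustrative assumptions.

```python
# Travel-time frequency table: ((lower, upper) in minutes, frequency).
travel_times = [
    ((25.0, 35.0), 5),
    ((35.0, 45.0), 2),
    ((45.0, 55.0), 4),
    ((55.0, 65.0), 3),
    ((65.0, 75.0), 11),
    ((75.0, 85.0), 3),
    ((85.0, 95.0), 4),
]


def grouped_mean(table):
    """Estimate the mean of grouped data from class midpoints."""
    n = sum(f for _, f in table)                 # sample size = sum of frequencies
    total = 0.0
    for (lower, upper), f in table:
        midpoint = (lower + upper) / 2           # Step 1: m_j
        total += f * midpoint                    # Steps 2 and 3: add up f_j * m_j
    return total / n                             # Step 4: divide by n


print(grouped_mean(travel_times))  # 61.875
```

The first class contributes 5 × 30 = 150 to the sum, matching the contribution described in the text; the overall sum is 1980, giving an estimated mean of 1980 / 32 minutes.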


The median is the data value of the middle observation in an ordered set of data; thus it is the value at or below which half (50%) of the data values fall.

To find the median for grouped data we need to find the midpoint of the interval that contains the data value whose cumulative relative frequency is 0.50.


The mode is the data value that has the highest frequency of occurrence in the sample.

For grouped data, the modal class is the class with the highest frequency. The estimate of the mode is then the midpoint of the modal class.
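
For the travel-time frequency table, the median and mode estimates described here can be sketched as follows. The sketch follows the midpoint-based estimates stated in the text (it does not interpolate within the class); the variable and function names are illustrative assumptions.

```python
from itertools import accumulate

# Class limits (minutes) and frequencies from the travel-time table.
classes = [(25.0, 35.0), (35.0, 45.0), (45.0, 55.0), (55.0, 65.0),
           (65.0, 75.0), (75.0, 85.0), (85.0, 95.0)]
freqs = [5, 2, 4, 3, 11, 3, 4]

n = sum(freqs)
# Cumulative relative frequency at the upper end of each class.
cum_rel_freq = [c / n for c in accumulate(freqs)]


def midpoint(cls):
    lower, upper = cls
    return (lower + upper) / 2


# Median class: first class whose cumulative relative frequency reaches 0.50.
median_class = next(cls for cls, crf in zip(classes, cum_rel_freq) if crf >= 0.5)
# Modal class: class with the highest frequency.
modal_class = classes[freqs.index(max(freqs))]

print(median_class, midpoint(median_class))  # (65.0, 75.0) 70.0
print(modal_class, midpoint(modal_class))    # (65.0, 75.0) 70.0
```

In this table the median class and the modal class happen to coincide, so both estimates are 70 minutes.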


The steps for estimating the sample variance from grouped data are:

Step 1. Find the midpoint of each class. Call it $m_j$.
Step 2. Subtract the estimate of the sample mean, $\bar{x}$, from each class midpoint. Square the difference.
Step 3. Multiply the result of step 2 by the class frequency.
Step 4. Add up the results of step 3 for all classes.
Step 5. Divide the sum from step 4 by one less than the sample size, n - 1.


Example from ungrouped data (raw data)


Example from grouped data

A total of 10,000 people visited the shopping mall and parked their vehicles over 12 hours:

a) Estimate the 30th percentile (when 30% of the visitors had arrived).
b) Estimate what percentile of visitors had arrived after 11 hours.

Try this way!!!



Measures of Dispersion for Grouped Data
With grouped data the sample range can be estimated by taking the difference between the upper value of the last class and the lower value of the first class.
In order to adapt the formula for the sample variance for use with grouped data, we need to take the same approach that we used for estimating the sample mean for grouped data.

In particular, we need to adapt the formula for the sample variance shown below to accommodate the fact that we no longer have the individual data values represented by $x_i$ in the formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

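The following formula is used for estimating the sample variance for grouped data:

$$s^2 = \frac{\sum_j (m_j - \bar{x})^2 f_j}{n - 1}$$

where $m_j$ is the midpoint of each class, $\bar{x}$ is the estimate of the sample mean (subtracted from each class midpoint), and $f_j$ is the class frequency.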
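
As a hedged illustration, the grouped range and variance estimates can be computed for the travel-time frequency table used earlier for the grouped mean. The table layout and the names `grouped_mean` and `grouped_variance` are assumptions made for this sketch.

```python
from math import sqrt

# Travel-time frequency table: ((lower, upper) in minutes, frequency).
travel_times = [
    ((25.0, 35.0), 5), ((35.0, 45.0), 2), ((45.0, 55.0), 4), ((55.0, 65.0), 3),
    ((65.0, 75.0), 11), ((75.0, 85.0), 3), ((85.0, 95.0), 4),
]


def grouped_mean(table):
    """Estimated mean: sum of f_j * m_j divided by n."""
    n = sum(f for _, f in table)
    return sum(f * (lo + hi) / 2 for (lo, hi), f in table) / n


def grouped_variance(table):
    """Estimated sample variance: sum of (m_j - mean)^2 * f_j over n - 1."""
    n = sum(f for _, f in table)
    mean = grouped_mean(table)
    squared = sum(((lo + hi) / 2 - mean) ** 2 * f for (lo, hi), f in table)
    return squared / (n - 1)


# Estimated range: upper value of the last class minus lower value of the first.
estimated_range = travel_times[-1][0][1] - travel_times[0][0][0]

variance = grouped_variance(travel_times)
print(estimated_range)           # 70.0
print(variance, sqrt(variance))  # estimated variance and standard deviation
```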


Example :
Calculate the variance and standard deviation for the following sets of sample data. Hence, determine which set is more dispersed about the mean.

Set 1 : 16, 10, 9, 2, 5, 2, 7
Set 2 : 10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1

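For Set 1:

Set 1 : 16, 10, 9, 2, 5, 2, 7

x      x²
2      4
2      4
5      25
7      49
9      81
10     100
16     256

$$\sum_{i=1}^{n} x_i = 51, \qquad \sum_{i=1}^{n} x_i^2 = 519$$

$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{519 - \frac{51^2}{7}}{6} = 24.5714$$

$$S = \sqrt{24.5714} = 4.957$$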

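For Set 2:

Set 2 : 10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1

$$\sum_{i=1}^{n} x_i = 217, \qquad \sum_{i=1}^{n} x_i^2 = 5929$$

$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{5929 - \frac{217^2}{12}}{11} = 182.265$$

$$S = \sqrt{182.265} = 13.5$$

Hence, Set 2 is more dispersed than Set 1.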
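
The hand calculation can be cross-checked with a short sketch. `shortcut_variance` is an illustrative name for the computational formula used above; `statistics.variance` and `statistics.stdev` from the Python standard library compute the sample (n - 1) versions and serve as an independent check.

```python
from math import sqrt
from statistics import stdev, variance

set_1 = [16, 10, 9, 2, 5, 2, 7]
set_2 = [10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1]


def shortcut_variance(values):
    """Sample variance via (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


for name, values in (("Set 1", set_1), ("Set 2", set_2)):
    s2 = shortcut_variance(values)
    # Round for display; both routes should agree.
    print(name, round(s2, 4), round(sqrt(s2), 3), round(variance(values), 4))
```

The set with the larger standard deviation is the one that is more dispersed about its mean.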


Example :
Marks of a recent Mathematics test are as given below:
73, 42, 67, 78, 99, 84, 91, 82, 86, 94
Based on the marks given:

(a) Construct a stem-and-leaf diagram.
(b) What are the highest and lowest marks?
(c) Interpret the distribution.
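
Parts (a) and (b) can be explored with the sketch below, which assumes two-digit whole-number marks with the tens digit as the stem and the units digit as the leaf. The function name `stem_and_leaf` and the `stem | leaves` text layout are illustrative choices.

```python
from collections import defaultdict

marks = [73, 42, 67, 78, 99, 84, 91, 82, 86, 94]


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf diagram (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows


for row in stem_and_leaf(marks):
    print(row)

print("lowest:", min(marks), "highest:", max(marks))
```

Keeping the empty stem for the 50s makes it easy to see that most marks cluster in the 70s to 90s while 42 sits well apart from the rest.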


BOX-AND-WHISKER PLOTS
