CHAPTER 1 Descriptive Analysis Animate

17/09/2017
BFC34302
Civil Engineering Statistics
CHAPTER 1
Review on Descriptive Statistics
Assc. Prof. Dr. Munzilah Md. Rohani
Statistics is the science that deals with collecting, classifying, presenting, describing, analyzing and interpreting data.
It enables us to draw conclusions and make reasonable decisions.
Categories of statistics
Descriptive statistics
    The activities of collecting, classifying, presenting and describing quantitative data.
    Methods for organizing (frequency table), representing (graphs) and summarizing data (central tendency and variability).
    Numerical descriptions of center, variability and position (quantitative variables).
Inferential statistics
    The part dealing with the techniques and methods of interpreting the results obtained from the descriptive statistics.
Statistical terminology
Descriptive statistics are numbers that are
used to summarize and describe data.
"Data" refers to the information that has been collected from an experiment, a survey, or a historical record.
Statistics Terminology
Terminology and description
POPULATION
    The entire (complete) collection of data whose properties are analyzed. It contains all the subjects of interest.
    Can be of any size; its items need not be uniform but must share at least one measurable feature.
SAMPLE
    A portion of the population selected for study.
    Any set of entities, cases, subjects, items or experimental units chosen from the population.
RANDOM SAMPLE
    A sample selected in such a way that each element of the population has the same chance of being selected.
PARAMETER
    A numerical measurement describing some characteristic of a population.
    E.g. the population mean $\mu$ and variance $\sigma^2$.
STATISTIC
    A numerical measurement describing some characteristic of a sample.
    E.g. the sample mean $\bar{x}$ and variance $s^2$.
VARIABLE
    Any measured characteristic or attribute that differs for different elements.
    For example, if the weight of 30 subjects were measured, then weight would be a variable.
    Can be classified as quantitative or qualitative.
QUANTITATIVE VARIABLE
    The variable being studied is numeric.
    Measured on an ordinal, interval, or ratio scale.
    E.g. if the time it took subjects to respond were measured, then the variable would be quantitative.
QUALITATIVE VARIABLE
    The variable being studied is non-numeric.
    Also called a "categorical variable".
    Measured on a nominal scale.
    E.g. gender, educational level, eye colour.
DATA
    A set of data is a collection of observations, measurements or information obtained.
    Can be classified as quantitative or qualitative.
    Can be presented in various ways.
QUANTITATIVE DATA
    Observations which can be measured numerically or counted.
    Can be divided into discrete data and continuous data.
    E.g. length, time, temperature and mass.
QUALITATIVE DATA
    Data that are not in numerical form but are instead assigned as attributes.
    E.g. race, marital status, age, gender.
Examples:
Determine whether the data obtained is discrete
or continuous data.
(a) The number of books sold by a stationery shop.
(b) The time taken to travel from Kuala
Terengganu to Batu Pahat
(c) The number of As in SPM
(d) The weight of FKAAS students
(e) The diameter of twenty spheres
A measure of central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores.
The goal of central tendency is to identify the single value that best represents the entire set of data.
Understand the data :

Data distribution
Outliers
Central tendency
Raw Data
DESCRIPTIVE STATISTICS FOR UNGROUPED DATA

30 observations = 30 data values (numbers)
If there were 30 speed observations on the road, then you would have all 30 numbers available to you. When you are trying to solve a problem by analyzing data, this is the best situation to be in. You have what is known as raw or ungrouped data: the individual observations themselves.
Sometimes you do not have access to the individual observations. This may occur for confidentiality reasons, or because you have not collected the data yourself.
All data are to be considered as sample data unless otherwise stated in the questions.
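Example :
Find the mean of the following data
1 4 2 0 2 3 3 2 1 4 5 2 1 2 0 1 2 3 1 2
Solution:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 4 + 2 + \dots + 3 + 1 + 2}{20} = \frac{41}{20} = 2.05$$
OR
x    0  1  2  3  4  5
f    2  5  7  3  2  1
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{2(0) + 5(1) + 7(2) + 3(3) + 2(4) + 1(5)}{20} = 2.05$$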
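The two routes to the mean shown above can be checked with a short Python sketch. The function names `mean_raw` and `mean_from_frequencies` are illustrative choices, not part of the course material.

```python
from collections import Counter

# The 20 observations from the example.
data = [1, 4, 2, 0, 2, 3, 3, 2, 1, 4, 5, 2, 1, 2, 0, 1, 2, 3, 1, 2]

def mean_raw(values):
    """Mean as the sum of all observations divided by their count."""
    return sum(values) / len(values)

def mean_from_frequencies(freq):
    """Mean from a {value: frequency} table: sum(f * x) / sum(f)."""
    total = sum(f * x for x, f in freq.items())
    n = sum(freq.values())
    return total / n

# Counter builds the frequency table (x -> f) directly from the raw data.
freq_table = Counter(data)

print(sorted(freq_table.items()))
print(mean_raw(data))
print(mean_from_frequencies(freq_table))
```

Both functions return the same value because the frequency table is only a compact way of writing the same sum.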
Example :
To obtain grade A, Saleha must achieve an

average of at least 75 marks in four tests. If
her average mark for the first three tests is
70, calculate the lowest mark she must get
in her fourth test in order to obtain grade A.
Solution:
Let the marks for the four tests be w, x, y, z.
Mean for w, x, y : 70
Mean for w, x, y, z :
$$\frac{3(70) + z}{4} = 75$$
$$\frac{210 + z}{4} = 75$$
$$210 + z = 300$$
$$z = 90$$
So, the lowest mark she must get in her fourth test in order to obtain grade A is 90.
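The same rearrangement can be written as a small Python helper. This is a minimal sketch; the function name `required_final_mark` and its parameters are hypothetical.

```python
def required_final_mark(target_mean, n_tests, mean_so_far):
    """Lowest mark on the last test so that the overall mean reaches target_mean.

    Assumes mean_so_far is the mean of the first (n_tests - 1) tests.
    """
    total_needed = target_mean * n_tests        # 75 * 4 = 300
    total_so_far = mean_so_far * (n_tests - 1)  # 70 * 3 = 210
    return total_needed - total_so_far

print(required_final_mark(75, 4, 70))
```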
b) The data arranged in ascending order :
2.71 , 3.56 , 4.35 , 5.48 , 6.22 , 8.61
Since n = 6 , which is even, the median is
$$x_m = \frac{1}{2}\left(x_{\frac{6}{2}} + x_{\frac{6}{2}+1}\right) = \frac{1}{2}\left(x_3 + x_4\right) = \frac{1}{2}(4.35 + 5.48) = 4.915$$
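As a rough check of this calculation, the median rule for odd and even n can be sketched in Python. The function name `median` is an illustrative choice, and the data are entered in the ascending order shown above because the original ordering is not given.

```python
def median(values):
    """Middle value for odd n; mean of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # Positions n/2 and n/2 + 1 (1-based) are indices mid - 1 and mid.
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]

data_b = [2.71, 3.56, 4.35, 5.48, 6.22, 8.61]
print(median(data_b))
```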
MODE
The mode of a set of data is the value that occurs most frequently.
The mode may not be unique, or there may be no mode at all.
Example :
Find the mode for the following set of data
a) 2, 3, 3, 4, 5, 28, 5, 5
b) 2, 3, 5, 8, 10
c) 0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5
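One way to explore the three cases is the Python sketch below. It assumes the convention that a data set has no mode when every value occurs exactly once; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: every value occurs once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 3, 3, 4, 5, 28, 5, 5]))                          # a)
print(modes([2, 3, 5, 8, 10]))                                   # b)
print(modes([0.2, 0.4, 0.4, 0.4, 0.5, 0.7, 0.7, 0.7, 0.5]))      # c)
```

Set (a) has a single mode, set (b) has none, and set (c) has two values tied for the highest frequency.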
PERCENTILES
Percentiles divide a set of data, arranged in ascending order, into 100 equal parts.
To find the percentile $P_k$:
$$\text{Let } r = \frac{k}{100}\, n$$
where : n = number of observations
k = the percentile required, for $P_k$
(i) If r is an integer:
$$P_k = \frac{1}{2}\left(r\text{th observation} + (r+1)\text{th observation}\right)$$
(ii) If r is not an integer, then round up to the next integer; $P_k$ is the observation at that position.
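The rule above can be sketched in Python as follows. The function name `percentile` and the `parts` parameter (100 for percentiles, 4 for quartiles) are assumptions made for illustration; the sample values are data set (a) from the quartile example in this chapter. Integer arithmetic is used to decide whether r is a whole number, so k is assumed to be an integer.

```python
def percentile(values, k, parts=100):
    """k-th of `parts` equal divisions, using r = k * n / parts.

    r an integer     -> mean of the r-th and (r+1)-th observations
    r not an integer -> round r up and take that observation
    """
    if not 0 < k < parts:
        raise ValueError("k must lie strictly between 0 and parts")
    ordered = sorted(values)
    whole, remainder = divmod(k * len(ordered), parts)
    if remainder == 0:
        # 1-based positions whole and whole + 1 are indices whole - 1 and whole.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Rounding r up gives 1-based position whole + 1, i.e. index whole.
    return ordered[whole]

data_a = [21, 24, 17, 28, 36, 20, 32]
print(percentile(data_a, 2, parts=4))   # median
print(percentile(data_a, 1, parts=4))   # first quartile
print(percentile(data_a, 3, parts=4))   # third quartile
print(percentile(data_a, 40))           # 40th percentile
```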
MEASURES OF RELATIVE STANDING
The percentile rank, P, can also be found by
$$P = \frac{b + \frac{1}{2} e}{n}$$
b = the number of data values below the value of interest
e = the number of data values equal to the value of interest
n = the sample size
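A minimal Python sketch of this formula is given below. The function name `percentile_rank` is hypothetical, and the result is returned as a fraction exactly as the formula is written; multiplying by 100 to read it as a percentage is an assumption about how the value is reported.

```python
def percentile_rank(values, x):
    """Percentile rank of x as (b + e/2) / n."""
    b = sum(1 for v in values if v < x)   # values below x
    e = sum(1 for v in values if v == x)  # values equal to x
    return (b + 0.5 * e) / len(values)

data_a = [21, 24, 17, 28, 36, 20, 32]
# 24 has three values below it and occurs once: (3 + 0.5) / 7.
print(percentile_rank(data_a, 24))
```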
Dr. Serhat Eren 38
Quartile & percentile
Quartile vs Percentile
Example :
Find the median, first quartile (Q1 ) ,third
quartile ( Q3 ) and 40th percentile ( P40 ) for the
following sets of data
a) 21, 24, 17, 28, 36, 20, 32
b) 3.5, 2.7, 5.4, 8.6, 4.3, 6.2, 9.9, 7.6
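Solution:
a) The data arranged in ascending order :
17 , 20 , 21 , 24 , 28 , 32 , 36
Median, Q2
$$r = \frac{k}{4}\, n = \frac{2}{4}(7) = 3.5 \quad \text{(not an integer)}$$
Median = Q2 = 4th observation = 24
First quartile, Q1
$$r = \frac{k}{4}\, n = \frac{1}{4}(7) = 1.75 \quad \text{(not an integer)}$$
Q1 = 2nd observation = 20
Third quartile, Q3
$$r = \frac{k}{4}\, n = \frac{3}{4}(7) = 5.25 \quad \text{(not an integer)}$$
Q3 = 6th observation = 32
40th percentile, P40
$$r = \frac{k}{100}\, n = \frac{40}{100}(7) = 2.8 \quad \text{(not an integer)}$$
P40 = 3rd observation = 21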
Example :
The following table shows the marks obtained by 30 students in a Mathematics quiz, where the maximum mark is 10.
Marks              2  3  4  5  6  7  8  9  10
No. of students    2  4  3  6  4  5  4  1  1
Find the mean, mode, median, first and third quartiles, interquartile range and the 60th percentile.
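One possible way to work this exercise is to locate observations through cumulative frequencies instead of writing out all 30 marks. The sketch below assumes the position rule r = (k/100)n (or (k/4)n for quartiles) used in this chapter; the helper names `observation` and `percentile` are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

marks = [2, 3, 4, 5, 6, 7, 8, 9, 10]
students = [2, 4, 3, 6, 4, 5, 4, 1, 1]

cumulative = list(accumulate(students))  # running total of students
n = cumulative[-1]                       # 30 students in total

def observation(position):
    """Mark of the position-th student (1-based) in ascending order."""
    return marks[bisect_left(cumulative, position)]

def percentile(k, parts=100):
    """Apply r = k * n / parts to the frequency table."""
    whole, remainder = divmod(k * n, parts)
    if remainder == 0:
        return (observation(whole) + observation(whole + 1)) / 2
    return observation(whole + 1)  # r rounded up to the next integer

mean = sum(m * f for m, f in zip(marks, students)) / n
mode = marks[students.index(max(students))]
median = percentile(2, parts=4)
q1 = percentile(1, parts=4)
q3 = percentile(3, parts=4)
iqr = q3 - q1
p60 = percentile(60)

print(mean, mode, median, q1, q3, iqr, p60)
```

The `bisect_left` lookup finds the first class whose cumulative frequency reaches the requested position, which is the class that contains that observation.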
GROUPED DATA
If you are using secondary data, much of the data published on the Web or in reports is not available as raw data. Often the only thing available to you is what is known as grouped data.
Grouped data are data that are available only as a frequency distribution. The individual observations are not accessible.
DESCRIPTIVE STATISTICS FOR GROUPED DATA

Measures of the Center for Grouped Data
There are three measures of the center:
the mean,
the median, and
the mode.
First consider how to estimate the mean of the
data set when you have grouped data.
For example, consider the time, in minutes, that people take to travel from point A to point B.
The traffic engineer is interested in the center, or typical length of time, of these journeys. She has only the following frequency table from 32 observations:

Time (minutes)       Frequency
25.0 < x ≤ 35.0      5
35.0 < x ≤ 45.0      2
45.0 < x ≤ 55.0      4
55.0 < x ≤ 65.0      3
65.0 < x ≤ 75.0      11
75.0 < x ≤ 85.0      3
85.0 < x ≤ 95.0      4
Remember that to calculate the mean you sum all the data and divide by the sample size.
For grouped data you cannot sum the actual data because you don't have them. So, you have to estimate what the values might sum to for each interval.
Consider the 5 observations that fall in the first interval, between 25 and 35 minutes. We need a way to estimate the sum of those 5 values to begin our estimation of the mean.
Use the middle of the interval as our best "guess" of the actual values in the class. So, you must first find the midpoint of each class.
Multiply the midpoint of 30 by the frequency of 5 to get the contribution to the sum for that interval.
This procedure is summarized in the steps

below.
It gives you a good estimate of the mean
when the data are in fact evenly spread out
throughout the interval.
Step 1. Find the midpoint of each class. Call it mj .

Step 2. Multiply the midpoint by the class frequency, fj, to
yield fjmj.
Step 3. Add up all the interval sums found in step 2.
Step 4. Divide the sum from step 3 by the sample size, n.
Note that the sample size is the sum of all the
frequencies.
The formula for estimating the mean from grouped data is thus
$$\bar{X} = \frac{\sum_j f_j m_j}{n}$$
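Applied to the travel-time table in this section, the four steps can be sketched in Python. The tuple layout (lower limit, upper limit, frequency) and the function name `grouped_mean` are assumptions made for illustration.

```python
# (lower limit, upper limit, frequency) for each class of the travel-time table.
classes = [
    (25.0, 35.0, 5),
    (35.0, 45.0, 2),
    (45.0, 55.0, 4),
    (55.0, 65.0, 3),
    (65.0, 75.0, 11),
    (75.0, 85.0, 3),
    (85.0, 95.0, 4),
]

def grouped_mean(classes):
    """Estimate the mean as sum(f_j * m_j) / n, with m_j the class midpoint."""
    n = sum(f for _, _, f in classes)                             # sample size
    weighted_sum = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return weighted_sum / n

print(grouped_mean(classes))  # 1980 / 32
```

The estimate is only as good as the assumption that observations are spread evenly within each class.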
Median is the data value of the middle

observation in an ordered set of data; thus it
is the value at or below which half (50%) of
the data values fall.
To find the median for grouped data we need

to find the midpoint of the interval that
contains the data value whose cumulative
relative frequency is 0.50.
The mode is the data value that has the highest frequency of occurrence in the sample.
For grouped data, the modal class is the class with the highest frequency. The estimate of the mode is then the midpoint of the modal class.
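For the travel-time table in this section, the median and mode estimates described above can be sketched as follows. The function names and the (lower, upper, frequency) tuple layout are hypothetical, and the median here is the class-midpoint estimate described in the text, not an interpolated value.

```python
from itertools import accumulate

classes = [
    (25.0, 35.0, 5), (35.0, 45.0, 2), (45.0, 55.0, 4), (55.0, 65.0, 3),
    (65.0, 75.0, 11), (75.0, 85.0, 3), (85.0, 95.0, 4),
]

def median_class_midpoint(classes):
    """Midpoint of the first class whose cumulative relative frequency reaches 0.50."""
    n = sum(f for _, _, f in classes)
    running = accumulate(f for _, _, f in classes)
    for (lo, hi, _), cum in zip(classes, running):
        if cum / n >= 0.5:
            return (lo + hi) / 2
    raise ValueError("empty frequency table")

def modal_class_midpoint(classes):
    """Midpoint of the class with the highest frequency."""
    lo, hi, _ = max(classes, key=lambda c: c[2])
    return (lo + hi) / 2

print(median_class_midpoint(classes))
print(modal_class_midpoint(classes))
```

In this table both estimates fall in the 65.0 to 75.0 class, which holds 11 of the 32 observations.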
To estimate the sample variance from grouped data:
Step 1. Find the midpoint of each class. Call it mj.
Step 2. Subtract the estimate of the sample mean from each class midpoint. Square the difference.
Step 3. Multiply the result of step 2 by the class frequency, fj.
Step 4. Add up the results of step 3 for all classes.
Step 5. Divide the sum from step 4 by one less than the sample size, n - 1.
Example from ungrouped data (raw data)
Example from grouped data
A total of 10,000 people visited the shopping mall and parked their vehicles over 12 hours:
a) Estimate the 30th percentile (the time by which 30% of the visitors had arrived).
b) Estimate what percentile of visitors had arrived after 11 hours.
Try this way!!!
Measures of Dispersion for Grouped Data
Clearly with grouped data the sample range can be estimated by
taking the difference between the upper value of the last class
and the lower value of the first class.
In order to adapt the formula for the sample variance for use
with grouped data, we need to take the same approach that we
used for estimating the sample mean for grouped data.
In particular, we need to adapt the formula for the sample

variance shown below to accommodate the fact that we no
longer have the individual data values represented by xi in the
formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
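The following formula is used to estimate the sample variance for grouped data, where $m_j$ is the midpoint of each class, $f_j$ is the class frequency and $\bar{x}$ is the estimate of the sample mean, which is subtracted from each class midpoint:
$$s^2 = \frac{\sum_j (m_j - \bar{x})^2 f_j}{n - 1}$$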
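As a sketch of how the grouped-data formulas for dispersion might be applied, the Python below reuses the travel-time frequency table from the grouped-mean discussion. The function names and the (lower, upper, frequency) tuple layout are assumptions, not part of the course material.

```python
import math

classes = [
    (25.0, 35.0, 5), (35.0, 45.0, 2), (45.0, 55.0, 4), (55.0, 65.0, 3),
    (65.0, 75.0, 11), (75.0, 85.0, 3), (85.0, 95.0, 4),
]

def grouped_variance(classes):
    """Estimate s^2 as sum(f_j * (m_j - mean)^2) / (n - 1)."""
    n = sum(f for _, _, f in classes)
    mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
    squared = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes)
    return squared / (n - 1)

def estimated_range(classes):
    """Upper value of the last class minus lower value of the first class."""
    return classes[-1][1] - classes[0][0]

variance = grouped_variance(classes)
print(estimated_range(classes))
print(variance, math.sqrt(variance))
```

The sum of squared deviations here is 11487.5, so the variance estimate is 11487.5 / 31.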
Example :
Calculate the variance and standard deviation for the following sets of sample data. Hence, determine which data set is more dispersed about the mean.
Set 1 : 16, 10, 9, 2, 5, 2, 7
Set 2 : 10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1
For Set 1:
x      x^2
2      4
2      4
5      25
7      49
9      81
10     100
16     256
$$\sum_{i=1}^{n} x_i = 51, \qquad \sum_{i=1}^{n} x_i^2 = 519$$
$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{519 - \dfrac{51^2}{7}}{6} = 24.5714$$
$$S = \sqrt{24.5714} = 4.957$$
For Set 2:
$$\sum_{i=1}^{n} x_i = 217, \qquad \sum_{i=1}^{n} x_i^2 = 5929$$
$$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{5929 - \dfrac{217^2}{12}}{11} = 182.265$$
$$S = \sqrt{182.265} = 13.5$$
Hence, Set 2 is more dispersed than Set 1.
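The shortcut formula used in this example can be cross-checked against Python's standard `statistics` module, whose `variance` and `stdev` functions compute the sample (n - 1) versions. The function name `sample_variance` is an illustrative choice.

```python
import math
import statistics

set1 = [16, 10, 9, 2, 5, 2, 7]
set2 = [10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1]

def sample_variance(values):
    """Shortcut formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total ** 2 / n) / (n - 1)

for name, values in (("Set 1", set1), ("Set 2", set2)):
    var = sample_variance(values)
    print(name, round(var, 4), round(math.sqrt(var), 3))
    # statistics.variance uses the definitional form with n - 1.
    print(name, round(statistics.variance(values), 4))
```

The two computations agree up to floating-point rounding, since the shortcut formula is an algebraic rearrangement of the definition.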
Example :
Marks of a recent Mathematics test are as given
below:
73, 42, 67, 78, 99, 84, 91, 82, 86, 94
Based on the marks given:
(a) Construct a stem-and-leaf diagram.

(b) What is the highest and lowest mark?
(c) Interpret the distribution.
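For part (a), a stem-and-leaf diagram could be built along the lines of the Python sketch below. It assumes the tens digit is the stem and the units digit is the leaf, and it keeps empty stems so that gaps in the distribution remain visible; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

marks = [73, 42, 67, 78, 99, 84, 91, 82, 86, 94]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Include stems with no leaves between the smallest and largest stem.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

diagram = stem_and_leaf(marks)
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

print("lowest:", min(marks), "highest:", max(marks))
```

Most of the leaves sit on the higher stems, with a single low mark on the lowest stem, which is the kind of feature part (c) asks you to interpret.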
BOX-AND-WHISKER PLOTS