BFC34302
Civil Engineering Statistics
CHAPTER 1
Review on Descriptive Statistics
Assc. Prof. Dr. Munzilah Md. Rohani
17/09/2017
Categories of statistics
Descriptive statistics: the activities of collecting, classifying, presenting and describing quantitative data; methods for organizing (frequency tables), representing (graphs) and summarizing data (central tendency and variability); numerical descriptions of the center, variability and position of quantitative variables.
Inferential statistics: the part dealing with the techniques and methods for interpreting the results obtained from descriptive statistics.
Statistical terminology
Descriptive statistics are numbers that are used to summarize and describe data.
"Data" refers to the information collected from an experiment, a survey or a historical record.
Statistics Terminology
POPULATION: The population is the entire (complete) collection of data whose properties are analyzed; it contains all the subjects of interest. A population can be of any size, and its items need not be uniform but must share at least one measurable feature.
STATISTIC: A statistic is a numerical measurement describing some characteristic of a sample, e.g. the sample mean or the sample variance.
DATA: A set of data is a collection of observations, measurements or information. Data can be classified as quantitative or qualitative and can be presented in various ways.
QUANTITATIVE DATA: Quantitative data are observations that can be measured numerically or counted. They can be divided into discrete data and continuous data.
e.g. length, time, temperature and mass
QUALITATIVE DATA: Qualitative data are not in numerical form but are instead assigned as attributes.
e.g. race, marital status, age group, gender
Examples:
Determine whether the data obtained are discrete or continuous.
Central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores. The goal of central tendency is to identify the single value that best represents the entire set of data.
Outliers
Central tendency
Raw Data
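Example :
Find the mean of the following data:
1 4 2 0 2 3 3 2 1 4 5 2 1 2 0 1 2 3 1 2
Solution:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 4 + 2 + \dots + 3 + 1 + 2}{20} = \frac{41}{20} = 2.05$
OR, using a frequency table:
x   0   1   2   3   4   5
f   2   5   7   3   2   1
$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{2(0) + 5(1) + 7(2) + 3(3) + 2(4) + 1(5)}{20} = \frac{41}{20} = 2.05$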
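Both routes to the mean in this example can be checked with a short Python sketch (not part of the original notes; the variable names are illustrative). It uses the raw observations and the frequency table exactly as given above.

```python
# Mean of the example data, computed the two ways shown in the notes.

# Raw observations (n = 20)
data = [1, 4, 2, 0, 2, 3, 3, 2, 1, 4, 5, 2, 1, 2, 0, 1, 2, 3, 1, 2]

# Method 1: sum of the observations divided by n
mean_raw = sum(data) / len(data)

# Method 2: frequency table, mean = sum(f_i * x_i) / sum(f_i)
x = [0, 1, 2, 3, 4, 5]
f = [2, 5, 7, 3, 2, 1]
mean_freq = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)

print(mean_raw, mean_freq)  # 2.05 2.05
```

The frequency form is simply the raw sum regrouped: each distinct value $x_i$ is added $f_i$ times.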
Example :
Solution:
MODE
Example :
Find the mode for the following sets of data:
a) 2, 3, 3, 4, 5, 28, 5, 5
b) 2, 3, 5, 8, 10
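The following is a minimal Python sketch for finding the mode. The function name `modes` is hypothetical. It assumes the usual textbook convention that a data set in which every value occurs exactly once has no mode. If several values tie for the highest count, the sketch returns all of them, so the set is treated as bimodal or multimodal.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s) in ascending order.

    An empty list means there is no mode (every value occurs exactly once).
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so the data set has no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 3, 4, 5, 28, 5, 5]))  # [5]  (5 occurs three times)
print(modes([2, 3, 5, 8, 10]))           # []   (no mode)
```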
PERCENTILES
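Percentiles divide a set of data arranged in ascending order into 100 equal parts.
To find the percentile $P_k$, let
$r = \frac{k}{100} n$
where: n = number of observations
k = the percentile sought ($P_k$)
(i) If $r$ is an integer:
$P_k = \frac{1}{2}\left[ r\text{th observation} + (r+1)\text{th observation} \right]$
(ii) If $r$ is not an integer, round it up to the next integer; $P_k$ is the observation at that position.
The percentile rank of a particular value of interest is
$P = \frac{b + \frac{1}{2} e}{n}$
(multiply by 100 to express it as a percentage), where:
b = the number of data values below the value of interest
e = the number of data values equal to the value of interest
n = the sample size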
Quartile vs Percentile
Example :
Find the median, first quartile ($Q_1$), third quartile ($Q_3$) and 40th percentile ($P_{40}$) for the following sets of data:
a) 21, 24, 17, 28, 36, 20, 32
b) 3.5, 2.7, 5.4, 8.6, 4.3, 6.2, 9.9, 7.6
Solution:
a) The data arranged in ascending order:
17, 20, 21, 24, 28, 32, 36
Median ($Q_2$): $k = 2$, $r = \frac{k}{4} n = \frac{2}{4}(7) = 3.5$ (not an integer)
Median = $Q_2$ = 4th observation = 24
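First quartile ($Q_1$): $k = 1$, $r = \frac{k}{4} n = \frac{1}{4}(7) = 1.75$ (not an integer)
$Q_1$ = 2nd observation = 20
Third quartile ($Q_3$): $k = 3$, $r = \frac{k}{4} n = \frac{3}{4}(7) = 5.25$ (not an integer)
$Q_3$ = 6th observation = 32
40th percentile ($P_{40}$): $k = 40$, $r = \frac{k}{100} n = \frac{40}{100}(7) = 2.8$ (not an integer)
$P_{40}$ = 3rd observation = 21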
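The percentile rule from these notes can be written as a short Python sketch. The function names `percentile` and `quartile` are illustrative and do not come from the notes. The sketch assumes an integer k with 0 < k < 100 and treats quartiles as $Q_q = P_{25q}$, which matches the $r = \frac{q}{4} n$ used above. Library routines such as NumPy's percentile functions use other definitions by default, so their results can differ from this rule.

```python
def percentile(data, k):
    """P_k following the rule in these notes (assumes an integer 0 < k < 100).

    Sort the data and let r = (k / 100) * n.
    - r is an integer: average the r-th and (r+1)-th observations.
    - otherwise: round r up and take that observation.
    Positions are 1-based, as in the notes.
    """
    xs = sorted(data)
    n = len(xs)
    if (k * n) % 100 == 0:              # r = k*n/100 is a whole number
        r = k * n // 100
        return (xs[r - 1] + xs[r]) / 2  # xs[r - 1] is the r-th observation
    r = -(-k * n // 100)                # integer ceiling of k*n/100
    return xs[r - 1]


def quartile(data, q):
    """Q_q for q = 1, 2, 3, using r = (q / 4) * n, i.e. Q_q = P_(25q)."""
    return percentile(data, 25 * q)


a = [21, 24, 17, 28, 36, 20, 32]
b = [3.5, 2.7, 5.4, 8.6, 4.3, 6.2, 9.9, 7.6]

print(quartile(a, 2), quartile(a, 1), quartile(a, 3), percentile(a, 40))  # 24 20 32 21
print(quartile(b, 2), quartile(b, 1), quartile(b, 3), percentile(b, 40))
```

For part (b), n = 8 makes r an integer for the median, $Q_1$ and $Q_3$, so the sketch averages neighbouring observations. This gives $Q_2 = 5.8$, $Q_1 = 3.9$, $Q_3 = 8.1$ and $P_{40} = 5.4$, up to floating-point rounding.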
Example :
Marks             2   3   4   5   6   7   8   9   10
No. of students   2   4   3   6   4   5   4   1   1
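The question for this table is missing from these notes. As an illustration only, the percentile-rank formula $P = \frac{b + \frac{1}{2}e}{n}$ from earlier can be applied directly to a frequency table like this one. In the minimal sketch below, `percentile_rank` is a hypothetical helper, the mark of 6 is an arbitrary choice, and the result is multiplied by 100 to express it as a percentage.

```python
def percentile_rank(values, freqs, target):
    """Percentile rank of `target`: (b + e/2) / n * 100, where
    b = count of observations below target, e = count equal to target."""
    n = sum(freqs)
    b = sum(f for v, f in zip(values, freqs) if v < target)
    e = sum(f for v, f in zip(values, freqs) if v == target)
    return (b + 0.5 * e) / n * 100


marks = [2, 3, 4, 5, 6, 7, 8, 9, 10]
students = [2, 4, 3, 6, 4, 5, 4, 1, 1]

print(sum(students))                                  # 30 students in total
print(round(percentile_rank(marks, students, 6), 1))  # 56.7 (b = 15, e = 4)
```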
GROUPED DATA
If you are using secondary data, much of the data published on the Web or in reports is not available as raw data. Often the only thing available to you is what is known as grouped data.
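$\bar{x} = \frac{\sum_j f_j m_j}{n}$
where $f_j$ is the frequency of class $j$, $m_j$ is its class midpoint and $n = \sum_j f_j$.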
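Sample variance (raw data):
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample variance (grouped data):
$s^2 = \frac{\sum_j (m_j - \bar{x})^2 f_j}{n - 1}$
The sample standard deviation is $s = \sqrt{s^2}$.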
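Here is a minimal Python sketch of the grouped-data mean and variance formulas. It is not from the notes: `grouped_mean_var` is a hypothetical helper, and the class midpoints and frequencies are made-up numbers chosen only to exercise the formulas.

```python
import math


def grouped_mean_var(midpoints, freqs):
    """Grouped-data estimates: mean = sum(f_j * m_j) / n and
    s^2 = sum((m_j - mean)^2 * f_j) / (n - 1), with n = sum(f_j)."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    var = sum((m - mean) ** 2 * f for m, f in zip(midpoints, freqs)) / (n - 1)
    return mean, var


# Hypothetical class midpoints and frequencies (illustration only)
m = [5, 15, 25, 35]
f = [2, 5, 8, 5]

mean, var = grouped_mean_var(m, f)
print(mean, round(var, 3))  # 23.0 90.526
print(math.sqrt(var))       # sample standard deviation s
```

Because every observation in a class is replaced by the class midpoint, these values are approximations to the mean and variance of the underlying raw data.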
Example :
Calculate the variance and standard deviation for the following sets of sample data. Hence, determine which data set is more dispersed about the mean.
Set 1 : 16, 10, 9, 2, 5, 2, 7
Set 2 : 10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1
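For Data 1:
Data 1 : 16, 10, 9, 2, 5, 2, 7 (n = 7)
x     x²
2     4
2     4
5     25
7     49
9     81
10    100
16    256
$\sum_{i=1}^{n} x_i = 51$, $\sum_{i=1}^{n} x_i^2 = 519$
$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{519 - \frac{51^2}{7}}{6} = 24.571$
$S = \sqrt{24.571} = 4.957$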
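For Data 2:
Data 2 : 10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1 (n = 12)
$\sum_{i=1}^{n} x_i = 217$, $\sum_{i=1}^{n} x_i^2 = 5929$
$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{5929 - \frac{217^2}{12}}{11} = 182.265$
$S = \sqrt{182.265} = 13.5$
Hence, Data 2 is more dispersed about the mean than Data 1 (S = 13.5 compared with 4.957).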
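The shortcut formula used in this example can be cross-checked with a minimal Python sketch. `sample_variance_shortcut` is an illustrative name. The check compares each result against `statistics.variance` from the standard library, which also divides by n - 1.

```python
import math
import statistics


def sample_variance_shortcut(xs):
    """S^2 = (sum x^2 - (sum x)^2 / n) / (n - 1), the form used in the example."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return (sxx - sx ** 2 / n) / (n - 1)


data1 = [16, 10, 9, 2, 5, 2, 7]
data2 = [10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1]

for name, xs in (("Data 1", data1), ("Data 2", data2)):
    s2 = sample_variance_shortcut(xs)
    # Cross-check against the definitional formula in the standard library
    assert math.isclose(s2, statistics.variance(xs))
    print(name, round(s2, 3), round(math.sqrt(s2), 3))
# Data 1 24.571 4.957
# Data 2 182.265 13.501
```

The shortcut form needs only $\sum x_i$ and $\sum x_i^2$, which is why the hand calculation above tabulates $x$ and $x^2$. It is algebraically equal to the definitional $\sum (x_i - \bar{x})^2 / (n-1)$.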
Example :
Marks of a recent Mathematics test are as given
below:
73, 42, 67, 78, 99, 84, 91, 82, 86, 94
Based on the marks given:
BOX-AND-WHISKER PLOTS
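A box-and-whisker plot is drawn from the five-number summary: minimum, $Q_1$, median, $Q_3$ and maximum. The minimal Python sketch below computes it for the Mathematics test marks above, using this chapter's percentile rule. The helper names are hypothetical. The outlier fences at 1.5 × IQR beyond the quartiles follow a common convention (Tukey's fences) and are an assumption; these notes do not state them.

```python
def percentile(data, k):
    """P_k by the chapter's rule: r = k*n/100 on sorted data (assumes integer 0 < k < 100)."""
    xs = sorted(data)
    n = len(xs)
    if (k * n) % 100 == 0:              # r is an integer: average r-th and (r+1)-th
        r = k * n // 100
        return (xs[r - 1] + xs[r]) / 2
    r = -(-k * n // 100)                # otherwise round r up
    return xs[r - 1]


def five_number_summary(data):
    xs = sorted(data)
    return xs[0], percentile(xs, 25), percentile(xs, 50), percentile(xs, 75), xs[-1]


marks = [73, 42, 67, 78, 99, 84, 91, 82, 86, 94]
mn, q1, med, q3, mx = five_number_summary(marks)

iqr = q3 - q1
# Assumed convention: points beyond 1.5 * IQR from the quartiles are flagged
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in marks if x < low_fence or x > high_fence]

print(mn, q1, med, q3, mx)  # 42 73 83.0 91 99
print(iqr, outliers)        # 18 [42]
```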