
Lecture # 1

INTRODUCTION TO STATISTICS
Scope of the Course

To serve as a comprehensive introduction to the probability concepts and the statistical methods and applications most likely to be encountered and used by students in pursuit of their careers in engineering.

2
Population and Sample and Processes
• Engineers and scientists are constantly exposed to collections of facts, or data.
• Statistics provides methods for organizing and summarizing data and for drawing conclusions based on data.
• An investigation will typically focus on a well-defined collection of objects constituting a population (e.g. all graduating students of a university).
• If the desired information is available for all objects in the population, we have what is called a census.
3
Population and Sample and Processes
(cont...)
• A census is usually impractical or infeasible. Why? Constraints on time, money, and other scarce resources.

• Instead, a subset of the population, called a sample, is selected in some prescribed manner (e.g. 50 students randomly selected out of 500 graduates).
• In order to draw inferences or conclusions about a population, certain characteristics of its objects are investigated (e.g. age, gender, GPA); each is recorded as a categorical or numerical variable.
• A variable is any characteristic whose value may change from one object to another or from one place to another.
• A data set may be univariate, bivariate, or multivariate.
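
The "50 out of 500 graduates" example can be sketched in a few lines of Python. This is only an illustration: the student ID numbers and the fixed seed are assumptions made for the sketch, not part of the lecture.

```python
import random

# Hypothetical population: ID numbers for 500 graduating students.
population = list(range(1, 501))

# A fixed seed is used only so that the run is repeatable.
rng = random.Random(42)

# Draw 50 students without replacement; every student has the same
# chance of being selected.
sample = rng.sample(population, k=50)

print(len(sample))       # number of students in the sample
print(sorted(sample)[:5])  # a few of the selected IDs
```

Because `random.sample` draws without replacement, no student can appear in the sample twice.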

5
Univariate, Bivariate, and Multivariate
Data
• Depending on how many variables we measure on the individuals or objects in our sample, we will have one of the following three types of data set:
◦ Univariate: Measurements made on only one variable per observation.
◦ Bivariate: Measurements made on two variables per observation.
◦ Multivariate: Measurements made on more than two variables per observation.

6
Population and Sample
• Population: The entire collection of individuals or measurement objects about which information is desired, e.g. all 5-year-old children in Pakistan when the quantity of interest is their average height.

• Sample: A subset of the population selected for study. The primary objective is to obtain a subset whose center, spread, and shape (its data characteristics) are as close as possible to those of the population; i.e., the sample should be truly representative of the entire population. There are many methods of sampling: random (simple or systematic), stratified, cluster sampling, etc.

• Random Sample: A simple random sample is a sample drawn from the population in such a way that every member of the population has an equal chance of being selected.

7
Population and Sample (Cont…)

• Why do we need randomness in sampling?

It reduces the possibility of subjective bias (e.g. selection bias).
The mean and variance of a random sample are unbiased estimates of the population mean and variance, respectively (for the variance, when the n − 1 divisor is used).
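
The unbiasedness claim can be checked by simulation. The sketch below is an illustration under stated assumptions: the population is synthetic (1,000 made-up values), samples are drawn with replacement, and the sample size and number of trials are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for repeatability

# Hypothetical population of 1,000 synthetic measurements.
population = [rng.gauss(50.0, 10.0) for _ in range(1000)]
mu = statistics.fmean(population)          # population mean
sigma2 = statistics.pvariance(population)  # population variance (divides by N)

n, trials = 10, 5000
means, vars_n_minus_1, vars_n = [], [], []
for _ in range(trials):
    # Draw a random sample of size n, with replacement.
    sample = [rng.choice(population) for _ in range(n)]
    means.append(statistics.fmean(sample))
    s2 = statistics.variance(sample)        # divides by n - 1
    vars_n_minus_1.append(s2)
    vars_n.append(s2 * (n - 1) / n)         # same sum of squares, divided by n

avg_mean = statistics.fmean(means)
avg_var_n_minus_1 = statistics.fmean(vars_n_minus_1)
avg_var_n = statistics.fmean(vars_n)

print(f"population mean {mu:.2f}, average sample mean {avg_mean:.2f}")
print(f"population variance {sigma2:.2f}")
print(f"average sample variance (n - 1 divisor) {avg_var_n_minus_1:.2f}")
print(f"average sample variance (n divisor)     {avg_var_n:.2f}")
```

Averaged over many samples, the sample mean and the variance computed with the n − 1 divisor land close to the population values, while the variance computed with the n divisor comes out systematically too small.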

8
Census and Inference
• Census: Complete enumeration of population units.

• Inference: We sample the population (in a manner that ensures the sample correctly represents the population), take measurements on our sample, and then infer (or generalize) back to the population.

Example: We may want to know the average height of all adults (over 18 years old) in Pakistan. Our population is then all adults over 18 years of age. If we were to take a census, we would measure every adult and then compute the average. By using statistics, we can take a random sample of adults over 18 years of age, measure their average height, and then infer that the average height of the total population is "close to" the average height of our sample.
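
The height example can be mimicked with made-up numbers. In this sketch the population size, the height distribution, and the sample size of 400 are all assumptions chosen for illustration; they are not real data.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for repeatability

# Hypothetical population: heights (cm) of 100,000 adults, synthetic values.
population = [rng.gauss(165.0, 8.0) for _ in range(100_000)]

# Census: measure every adult and compute the average.
census_mean = statistics.fmean(population)

# Inference: measure only a simple random sample of 400 adults.
sample = rng.sample(population, k=400)
sample_mean = statistics.fmean(sample)

print(f"census mean: {census_mean:.2f} cm")
print(f"sample mean: {sample_mean:.2f} cm (n = {len(sample)})")
```

The sample mean uses 400 measurements instead of 100,000, yet it should fall close to the census mean.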

9
Parameter and Statistic
• Parameter: Any statistical characteristic of a population. The population mean, population median, and population standard deviation are examples of parameters.

• Statistic: Any statistical characteristic of a sample. The sample mean, sample median, and sample standard deviation are examples of statistics.

• Statistical Methods: Describing a population through a census, or making inferences from a sample by using a statistic to estimate the value of a parameter.

10
Some Differences between Population and Sample

                             POPULATION        SAMPLE
Size                         Large             Small
Size notation                N                 n
Easy to collect data?        No                Yes
Term used to describe        A "parameter"     A "statistic"
its nature                   e.g., μ, σ        e.g., x̄, s

11
Some Differences between Population and Sample
(Cont’d)

                            POPULATION                                 SAMPLE
Mean (notation)             μ                                          x̄
Std deviation (notation)    σ                                          s
Mean (formula)              $\mu = \frac{\sum x}{N}$                   $\bar{x} = \frac{\sum x}{n}$
Variance (formula)          $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$    $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
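
The population and sample formulas for the mean and variance can be checked numerically. The sketch below is illustrative only: the five data values are made up, and the function names are chosen for this example. The results are compared with Python's standard `statistics` module.

```python
import math
import statistics

# Hypothetical measurements (made-up values).
data = [4.0, 8.0, 6.0, 5.0, 7.0]

def mean(xs):
    """Sum of the values divided by how many there are."""
    return sum(xs) / len(xs)

def population_variance(xs):
    """Sum of squared deviations from the mean, divided by N."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

print(mean(data))                 # 6.0
print(population_variance(data))  # 2.0
print(sample_variance(data))      # 2.5

# Cross-check against the standard library.
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
```

The same five numbers give a variance of 2.0 when treated as a whole population and 2.5 when treated as a sample, because the sum of squared deviations (10) is divided by 5 in one case and by 4 in the other.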

12
Statistics!
What is it? What does it involve?
• The art or science of drawing confident conclusions about the attributes of a system or collection of systems.

• Involves:
- taking a small sample from a larger set (Sampling)
- analyzing data from the small sample (Data Analysis)
- testing hypotheses to ascertain whether they hold (Hypothesis Testing)
- drawing conclusions about the larger set (Statistical Inference)
- presenting your findings to an audience (Information Delivery)

13
Prelude to Statistics

As engineers we may be required to answer some questions about a population, which is the collection of all elements in a system.

However, we will find it impractical to use the entire population for the investigation.

We will have to settle for a small part of the population, called a sample.

14
Some of the questions we may be required to answer as civil engineers:

- What is the quality of aggregates at a certain quarry? (Materials/Civil Engineering)

- What is the ratio of auto use to transit use? (Transportation Planning)

15
- What is the strength of the concrete being used in constructing a certain structure? (Construction/Materials Engineering)

- What is the quality of water produced by a water treatment plant? (Environmental Engineering)

- What has been the long-term settlement of high-rise buildings in a city? (Geotechnical Engineering)

16
- How many of the steel I-sections provided by a certain supplier have a lower-than-specified strength? (Structural Engineering)

- What is the quality of water in a water reservoir? (Environmental Engineering)

17
Therefore, there are countless instances in engineering where we will have to take only a small sample from a large population of systems or system components in order to investigate an issue and provide the needed answers.

18
Because we draw the sample from the population, the sample is called a subset of the population (recall set theory).

The population is also referred to as the "Universe", or the "Sample Space".

[Figure: the sample drawn as a subset inside the population]

19
Ideally, we seek a sample that is a miniature copy of the population.

But there is no guarantee that we can achieve such a sample.

This dilemma leads to two very important questions.

20
Important Questions …

1. Is our sample a good copy of the population? In other words, what quantitative means can we use to determine whether our sample is "close" enough to the population?

2. What steps can we take to ensure that our sample is a good miniature copy of the population?

21
Every engineer involved in the statistical analysis of his/her system hopes that his/her sample is a good representative of the population, i.e., the engineer "prays" that the statistics of the sample closely match the true parameters of the population.

Otherwise, any conclusion he/she draws from the sample will not reflect the entire population.

POPULATION              SAMPLE
Parameters: μ, σ        Statistics: x̄, s
22
Back to “Important Questions, #1”

Is our sample a good copy of (close enough to) the population?
We might compare the population parameters with the sample statistics. However, the parameters of the population are unknown, so how can we measure the closeness of our sample to the population?

We use the concepts of bias and efficiency (to be discussed under "Inferential Statistics").

Statistical inference helps to determine the bias or efficiency of estimates, in order to see how good our samples are.
23
Back to “Important Questions #2”

What steps can we take to ensure that our sample is a good miniature copy of the population?

Answer: Sampling must be random (and representative), i.e., all elements of the population should have an equal chance of being picked for the sample.

24
Methods of Random Sampling
There are 4 major ways in which sampling can be carried out so that the sample is random and yet is a true miniature copy of the population:

- Simple Random Sampling
- Systematic Random Sampling
- Stratified (or Clustered) Random Sampling
- Combinations of the above

The choice of a specific sampling technique depends on
- the composition of the population
- the availability of sampling resources
25
Simple Random Sampling
This is a simple selection of elements from the population without regard to the nature of the population.

Advantages:
- Less effort in preparing for the survey
- Less effort in conducting the survey
- Works best when all elements in the population have similar characteristics

Disadvantage: May not be truly representative of the population if the population has diverse characteristics.

26
Systematic Random Sampling

This sampling method is either:

Systematic in time: sampling elements from the population at specified time intervals, at the same location. Used when the data are sensitive to temporal changes (e.g. traffic data collection during rush/peak hours and off-peak hours).

Systematic in space: sampling elements from the population at selected locations, at the same time. Used when the data are sensitive to spatial variation (e.g. counting the number of car accidents at various segments/locations of a highway over a given time period). This gives a cross-sectional data sample.
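
The slides do not give a step-by-step procedure for systematic sampling. One common way to implement it, assumed here, is to pick a random starting point and then take every k-th element at a fixed interval. The highway segments and the function name below are hypothetical.

```python
import random

def systematic_sample(items, n, rng):
    """Pick n items at a fixed interval k after a random start.

    Assumes 1 <= n <= len(items).
    """
    k = len(items) // n       # sampling interval
    start = rng.randrange(k)  # random start within the first interval
    return [items[start + i * k] for i in range(n)]

# Hypothetical highway divided into 120 one-kilometre segments.
segments = list(range(120))

rng = random.Random(7)  # fixed seed for repeatability
picked = systematic_sample(segments, 12, rng)
print(picked)  # 12 segment indices, 10 segments apart
```

The same function applies to sampling in time if `items` is a list of time slots rather than locations.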
27
Stratified Random Sampling

This sampling method first divides the entire population into different groups, or strata, on the basis of certain characteristics of the population.

Next, a random sample is drawn within each stratum to obtain the desired sample size.

28
Stratified Random Sampling
[Figure: the main population is divided into sub-populations #1 to #4, and a sample is drawn from each sub-population.]

Sub-populations may be of the same size or of different sizes.

29
Stratified Random Sampling (continued)
A stratified sampling approach is most effective when three conditions are met:
• Variability within strata is minimized
• Variability between strata is maximized
• The variables on which the population is stratified are strongly correlated with the desired dependent variable
Advantage:
Stratified random sampling ensures that each group in the population is represented in the sample. It is therefore ideal for populations with diverse groups.
Disadvantage:
Relatively more preparation time is needed to calculate the proportion of each group in the population, and from that to determine its proportion in the sample.
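
A minimal sketch of stratified random sampling with proportional allocation follows. The strata, their sizes, and the function name are assumptions made for illustration; proportional allocation is one possible way to set the per-stratum sample sizes, not the only one.

```python
import random

def stratified_sample(strata, total_n, rng):
    """Draw a simple random sample within each stratum.

    Each stratum's share of the sample matches its share of the
    population (proportional allocation). Rounding means the parts
    may not always add up to exactly total_n.
    """
    population_size = sum(len(members) for members in strata.values())
    result = {}
    for name, members in strata.items():
        n_h = round(total_n * len(members) / population_size)
        result[name] = rng.sample(members, n_h)
    return result

# Hypothetical population of 1,000 vehicles in three classes.
strata = {
    "cars": [f"car-{i}" for i in range(600)],
    "motorcycles": [f"moto-{i}" for i in range(300)],
    "buses": [f"bus-{i}" for i in range(100)],
}

rng = random.Random(3)  # fixed seed for repeatability
sample = stratified_sample(strata, 50, rng)
for name, members in sample.items():
    print(name, len(members))  # cars 30, motorcycles 15, buses 5
```

Every class appears in the sample in the same proportion as in the population, which is what guarantees that small groups such as buses are not missed.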
Combinations of the 3 major methods of random sampling

Sampling schemes that combine the 3 methods can also be used.

For example, you may decide to carry out stratified and systematic random sampling of your population.

Example: collection of data on different classes of vehicles (cars, motorcycles, buses, etc.) during peak and off-peak hours.

31
In Summary ...
- We can afford to take only a small sample from a large population of systems or system components in order to investigate the population.

- Our sample must, as far as possible, reflect the population from which it is drawn.

- Good sampling should be random and representative. Systematic and stratified sampling are useful for ensuring that the sample is representative of the population.

- Only a good sample can result in accurate inferences and predictions about the population.
32
Population, Sample and Processes

Probability (from population to sample; logic based on known properties)
• Properties of the population under study are assumed to be known
• Deals with questions involving samples taken from the population

Inferential Statistics (from sample to population; logic based on observed instances)
• Statistics of the sample are known and are used to infer about the population
• Point estimation
• Hypothesis testing
• Estimation by confidence interval

• Any samples used should be representative of the target population
Using Statistics in Research
• Carrying out research means the collection and collation of data. Statistics is a way of making use of these data.
◦ Descriptive Statistics: used to describe characteristics of the sample
  - Statistics describe samples
  - Gives numerical and graphical procedures to summarize a collection of data in a clear and understandable way
◦ Inferential Statistics: used to generalise/infer/predict from our sample to our population
  - Parameters describe populations
  - Provides procedures to draw inferences about a population from a sample
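
The two kinds of statistics can be contrasted on one data set. The sketch below is illustrative only: the concrete strength values are synthetic, and the 95% confidence interval uses a normal approximation, which is an assumption that the lecture has not yet introduced or justified.

```python
import math
import random
import statistics

rng = random.Random(5)  # fixed seed for repeatability

# Hypothetical sample: compressive strength (MPa) of 40 concrete specimens.
strengths = [rng.gauss(30.0, 3.0) for _ in range(40)]

# Descriptive statistics: summarize the sample itself.
n = len(strengths)
xbar = statistics.fmean(strengths)
s = statistics.stdev(strengths)  # sample standard deviation (n - 1 divisor)
print(f"n = {n}, mean = {xbar:.2f} MPa, std dev = {s:.2f} MPa")

# Inferential statistics: say something about the population mean.
# Approximate 95% confidence interval, assuming a normal approximation.
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
half_width = z * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"approx. 95% CI for the population mean: ({lower:.2f}, {upper:.2f}) MPa")
```

The first print statement only describes the 40 specimens; the second makes a statement about the unknown population mean.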

34
Introduction to Statistics

Types of Statistical Analysis

Descriptive
- Graphical: dot plots, scatter plots, box plots, stem-and-leaf plots, bar charts/histograms
- Non-graphical: central tendency, dispersion/variance, range, shape

Inferential
- Point estimation
- Hypothesis testing
- Confidence interval
- Statistical regression

35
