
Lecture 5

Discussion for Today


Probability sampling
Non-probability sampling
Questionnaire

Probability sampling: the types

1- Random Sampling or Simple Random Sampling

Each and every unit of the population has an equal probability of being included in the sample. Example: a lottery system.
When to use simple random sampling
1. When there is an accurate and easily accessible sampling frame that lists the entire population, preferably stored on a computer.
2. It is not suitable for face-to-face data collection methods if the population covers a large geographical area.
3. Prefer this sampling method whenever possible.
4. It minimizes bias.
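A minimal sketch of a simple random sample drawn from a computerized sampling frame, in the spirit of a lottery draw. The frame of 1,000 numbered units, the sample size of 50, and the function name `simple_random_sample` are hypothetical; Python's `random.sample` draws without replacement, so every unit has the same chance of being included.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame; each unit is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)  # sampling without replacement, like a lottery

# Hypothetical sampling frame: 1,000 numbered units stored on a computer
frame = [f"unit-{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 50, seed=42)
print(len(sample), sample[:3])
```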

2- Stratified Random Sampling


This is a form of random sampling in which units are divided into mutually exclusive, homogeneous groups or categories. These groups are called strata.
Within each stratum, a simple or systematic random sample is selected.
Example: grouping by age or sex.
Advantages:
a- It provides a more accurate impression of the population.
b- It is an improvement over simple random sampling when the population is more heterogeneous.
Disadvantages:
a- If the strata are not properly designed (for example, if they overlap), the accuracy of the results decreases.
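The stratified procedure can be sketched as follows. The slides do not say how many units to take per stratum, so this sketch assumes proportional allocation (the same sampling fraction in every stratum, at least one unit each); the population, the age/sex strata, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Proportional stratified sampling: a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit falls in exactly one stratum
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # assumed: at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of (id, sex, age group); strata are (sex, age group)
population = [(i, "F" if i % 3 else "M", "young" if i < 60 else "old") for i in range(100)]
sample = stratified_sample(population, lambda p: (p[1], p[2]), fraction=0.1, seed=1)
print(len(sample))  # → 10
```

Because every stratum is sampled, small groups (here, older men) cannot be missed by chance, which is where the gain in accuracy over simple random sampling comes from.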

3- Systematic sampling
A form of random sampling that follows a system: after a random start, units are selected at a fixed gap, or interval, and no units are sampled between the selected ones.
When to use systematic sampling
It is used when the population that we want to study is connected to an identified site, e.g.:
I. Patients attending a clinic
II. Houses that are ordered along a road
III. Customers who walk one by one through an entrance

Advantages:
1. It is sufficiently random to obtain reliable estimates.

Disadvantages:
1. It is not fully random, because after the first (random) step each unit is selected at a fixed interval.
2. It can be problematic if a particular characteristic recurs at the same interval as the sampling. For example, every 10th house in the sector may be a corner house.
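A minimal sketch of systematic sampling, assuming the interval is k = N // n and the start is chosen at random within the first interval. The 200 houses along a road and the function name are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sampling: a random start, then every k-th unit (k = N // n).
    Assumes n <= len(frame)."""
    k = len(frame) // n                       # fixed sampling interval
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return frame[start::k][:n]

houses = list(range(1, 201))  # hypothetical: 200 houses ordered along a road
sample = systematic_sample(houses, 20, seed=7)
print(sample)
```

Only the start is random; once it is fixed, every selected house is exactly 10 positions from the previous one. If every 10th house happened to be a corner house, the sample would consist entirely of corner houses or contain none.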

4- Cluster/area Sampling
Clusters are formed by breaking down the area to be surveyed into smaller areas.
Then a few of these smaller areas are selected randomly.
If a selected cluster is small, all the respondents in it are interviewed; otherwise, the units/respondents within it are selected randomly.
When to use:
It is used when the population is widely dispersed across regions, for example universities or villages.
Advantages:
I. When there is no suitable sampling frame, this is a suitable method.
II. Time and money are saved because less travelling is needed.
III. It does not need a complete frame of the population, only a complete list of clusters.
Disadvantages:
1. A cluster may contain similar units.
Note: a stratum should be homogeneous, whereas a cluster should be as heterogeneous as possible.
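The cluster procedure can be sketched as below. It needs only the list of clusters, not a full population frame. The villages, their sizes, the cut-off `max_take` that decides when a cluster counts as "small", and the function name are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, max_take, seed=None):
    """Randomly select clusters; interview every unit in a small cluster,
    otherwise take a simple random sample of max_take units from it."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # only the list of clusters is needed
    sample = {}
    for name in chosen:
        units = clusters[name]
        if len(units) <= max_take:
            sample[name] = list(units)                 # small cluster: everyone
        else:
            sample[name] = rng.sample(units, max_take)  # large cluster: random subset
    return sample

# Hypothetical villages (clusters) with different numbers of households
villages = {f"village-{v}": [f"v{v}-hh{h}" for h in range(5 + 10 * v)] for v in range(8)}
result = cluster_sample(villages, n_clusters=3, max_take=30, seed=3)
for name, units in result.items():
    print(name, len(units))
```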

Multistage cluster sampling

It combines several methods of random sampling.
Sampling is carried out in a number of stages.
It is intended to give the survey the greatest representativeness.
It is also one of the most complex methods.
Simply speaking, it is a series of samples taken at successive stages.
It is normally used to overcome problems associated with a geographically dispersed population when face-to-face contact is needed.
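As a series of samples taken at successive stages, multistage sampling can be sketched as a recursive draw over a nested frame. This assumes a simple random sample at every stage; the three-stage region, village, household frame, the per-stage sample sizes, and the function name are hypothetical.

```python
import random

def multistage_sample(frame, sizes, seed=None):
    """Multistage sampling: at each stage, draw a simple random sample of the
    units chosen at the previous stage. `frame` is a nested dict whose leaves
    are lists of final-stage units; `sizes` gives the sample size per stage."""
    rng = random.Random(seed)

    def draw(node, stage):
        if isinstance(node, dict):  # an intermediate stage (e.g. regions, villages)
            keys = rng.sample(sorted(node), min(sizes[stage], len(node)))
            return {k: draw(node[k], stage + 1) for k in keys}
        return rng.sample(node, min(sizes[stage], len(node)))  # final stage: households

    return draw(frame, 0)

# Hypothetical frame: 4 regions -> 5 villages each -> 40 households each
frame = {f"R{r}": {f"R{r}-V{v}": [f"R{r}-V{v}-H{h}" for h in range(40)]
                   for v in range(5)} for r in range(4)}
result = multistage_sample(frame, sizes=[2, 3, 10], seed=11)
print(sum(len(hh) for vs in result.values() for hh in vs.values()))  # → 60
```

Only the selected regions need a list of villages, and only the selected villages need a list of households, which is why the method suits a geographically dispersed population.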

Non-Probability Sampling
It is a process in which personal judgment, rather than a statistical procedure, determines which units are selected. It is also called non-random sampling.
1- Quota Sampling: In this technique, the interviewer is asked to select persons with certain characteristics until a set number (quota) for each group is reached.

The purpose is to make the sample more representative of the population.

Advantages:
I. An alternative when there is no suitable random sampling frame
II. Lower cost, as the survey is carried out rapidly

Disadvantages:
I. Identifying the unit is difficult.
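A minimal sketch of quota sampling. Here the order in which people walk past stands in for the interviewer's personal judgment; the stream of passers-by, the quotas, and the function name are hypothetical. Nothing in the procedure is random, which is what makes it non-probability sampling.

```python
def quota_sample(arrivals, quotas, category_of):
    """Quota sampling: accept people in the order the interviewer meets them
    until each category's quota is filled (selection is not random)."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        cat = category_of(person)
        if remaining.get(cat, 0) > 0:
            sample.append(person)
            remaining[cat] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled
    return sample

# Hypothetical stream of passers-by as (name, sex)
arrivals = [("p1", "M"), ("p2", "M"), ("p3", "F"), ("p4", "M"),
            ("p5", "F"), ("p6", "F"), ("p7", "M")]
sample = quota_sample(arrivals, {"M": 2, "F": 2}, lambda p: p[1])
print([name for name, _ in sample])  # → ['p1', 'p2', 'p3', 'p5']
```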

2- Snowball sampling:
Used when the population is hidden, for example sex workers and drug addicts.
First, key informants are identified who help in reaching the respondents.
With the help of those respondents, further respondents are contacted.
The sample grows as it rolls on, like a snowball.
The process continues until the required sample size is reached.
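The snowball process can be sketched as a wave-by-wave (breadth-first) walk over a referral network. The network, the starting respondent reached through key informants, the target size, and the function name are hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, target):
    """Snowball sampling: start from the first respondents and follow their
    referrals wave by wave until the required sample size is reached."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # avoid contacting the same person twice
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network among members of a hidden population
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": [], "F": ["A"]}
print(snowball_sample(referrals, seeds=["A"], target=5))  # → ['A', 'B', 'C', 'D', 'E']
```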

Which technique to use

There is no rule of thumb; the choice depends on:
Purpose of the researcher
Resources
Time
Nature of the study

SUMMARY

QUESTIONNAIRE

A QUESTIONNAIRE IS ONLY AS GOOD AS THE QUESTIONS IT ASKS

Questionnaire
What is a questionnaire?
A series of written questions in a fixed, rational order, designed to generate the statistical information needed from a specific population to accomplish the research objectives.
Purposes of the Questionnaire

Ensures standardization and comparability of the data across interviews: everyone is asked the same questions.

Allows the researcher to collect the relevant information necessary to address the management decision problem.

Criteria to consider

Does it provide the necessary information?
Does it consider the respondent?
Does it meet editing, coding and data processing requirements?

Questionnaire Design
1- List the variables, using sources that include:
I. Focus groups
II. Key informants
III. Theory or conceptual framework
IV. Expert opinion

2- Borrow from other instruments, to:
A. Save development effort (avoid reinventing the wheel)
B. Borrow established reliability and validity
C. Facilitate comparison with previous studies

3- Solicit input from colleagues and friends

Correlation
What correlation is:
It measures the degree of (linear) relationship/association between variables.
The measure of correlation is called the correlation coefficient.
1- It can be positive as well as negative.
2- Its range is $-1 \le r \le +1$.
3- It is symmetrical in nature; that is, the coefficient of correlation between X and Y ($r_{XY}$) is the same as that between Y and X ($r_{YX}$).
4- It is independent of a change of origin and scale (for positive scale factors; see notes).
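The properties above can be checked numerically with the usual Pearson formula, $r = \sum (X_i - \bar X)(Y_i - \bar Y) / \sqrt{\sum (X_i - \bar X)^2 \sum (Y_i - \bar Y)^2}$. This is a sketch on hypothetical data; the function name `pearson_r` is an assumption, not from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical data
r = pearson_r(x, y)
print(round(r, 4))                       # → 0.7746, inside [-1, +1]
print(math.isclose(r, pearson_r(y, x)))  # symmetry: r_XY == r_YX → True

# Change of origin and (positive) scale leaves r unchanged
u = [(a - 3) / 2 for a in x]
v = [10 * b + 7 for b in y]
print(math.isclose(r, pearson_r(u, v)))  # → True
```

Multiplying one variable by a negative number would flip the sign of r, which is why the invariance is stated for positive scale factors.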

Causation versus correlation

Causation | Correlation
1- Cause and effect | 1- Degree of association
2- Asymmetric: $Y = f(X)$ is not the same as $X = f(Y)$ | 2- Symmetric: $r_{XY} = r_{YX}$
3- Causation necessarily implies correlation | 3- Correlation does not necessarily imply causation

Notation

Dependent variable | Independent variable
Explained variable | Explanatory variable
Predictand | Predictor
Regressand | Regressor
Response | Stimulus
Endogenous | Exogenous
Outcome | Covariate
Controlled variable | Control variable
LHS | RHS

Regression
History: Francis Galton
Tall parents → tall children; however, the average height of the children was less than that of their parents.
Short parents → short children; however, the average height of the children was greater than that of their parents.
The average height of children tends to move, or regress, toward the average height of the population as a whole: Galton's law of universal regression.
Karl Pearson confirmed it by collecting more than a thousand records of the heights of members of family groups. In Galton's words, this was "regression to mediocrity".

Modern concept
Regression analysis is concerned with the study of the dependence of one variable (the dependent variable, DV) on one or more other variables (the explanatory variables, EV), with a view to estimating or predicting the average (mean) value of the DV in terms of the known or fixed values of the EV.
Example 1- sons' height and fathers' height
Example 2- height at different age levels
Note that the regression line of sons' height on fathers' height has a positive slope, but the slope is less than 1, which is in conformity with Galton's regression to mediocrity.
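Galton's observation can be reproduced on simulated data. This sketch assumes, purely for illustration, that fathers' and sons' heights share a mean of 175 cm and a standard deviation of 7 cm with correlation 0.5; these values are hypothetical, not Galton's or Pearson's figures. The regression line is fitted by ordinary least squares, and the helper name `ols_fit` is an assumption.

```python
import random

def ols_fit(x, y):
    """Ordinary least squares for y = b1 + b2 * x; returns (b1, b2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b2 * mx, b2

rng = random.Random(0)
MEAN, SD, RHO = 175.0, 7.0, 0.5  # hypothetical population values (cm)
fathers = [rng.gauss(MEAN, SD) for _ in range(10_000)]
# Son = population mean + RHO * father's deviation + independent noise
# (the noise SD is chosen so that sons also have standard deviation SD)
sons = [MEAN + RHO * (f - MEAN) + rng.gauss(0, SD * (1 - RHO ** 2) ** 0.5)
        for f in fathers]

b1, b2 = ols_fit(fathers, sons)
print(f"slope = {b2:.2f}")  # positive but less than 1

# Sons of tall fathers are, on average, shorter than their fathers
tall = [(f, s) for f, s in zip(fathers, sons) if f > MEAN + SD]
mean_f_tall = sum(f for f, _ in tall) / len(tall)
mean_s_tall = sum(s for _, s in tall) / len(tall)
print(round(mean_f_tall, 1), round(mean_s_tall, 1))
```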

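Statistical Versus Deterministic Relationship

Regression is concerned with statistical dependence among variables, not with the functional or deterministic dependence found, for example, in physics.

Example 1: Dependence of crop yield

$Y = f(\text{temperature}, \text{sunshine}, \text{rainfall}, \text{fertilizer}, \ldots)$

Because of measurement error and the many other variables involved, the prediction is not 100% correct.

Example 2: Newton's law of gravity

$F = k \dfrac{m_1 m_2}{r^2}$, where F is the force, $m_1$ and $m_2$ are the masses of the two particles, r is the distance between them, and k is a constant of proportionality.

F becomes random if measurement error arises in k; the otherwise deterministic relationship then becomes a statistical one.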

Statistical versus Deterministic Relationship

Statistical | Functional or Deterministic
Concerned with variable dependency | Concerned with variable dependency
Variables are random | Variables are non-random
Statistical dependency | Deterministic or functional dependency
Cannot be predicted with accuracy | Can be predicted accurately
Example: crop yield | Example: Newton's law

Regression versus causation

Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation.
A statistical relationship, however strong, can never by itself establish a causal connection.
There is no statistical reason to assume that rainfall does not depend on crop yield; we treat crop yield as dependent on rainfall for non-statistical reasons, such as common sense.
Our idea of causation must come from outside statistics, ultimately from some theory or other information.

Key Point: a statistical relationship in itself cannot logically imply causation.

Simple or Bivariate Regression

Regression analysis is largely concerned with estimating and/or predicting the (population) mean value of the dependent variable on the basis of the known or fixed values of the explanatory variable(s).

Example: expenditure and income

Conditional mean: $E(Y \mid X)$

Unconditional mean: $E(Y)$

The population regression line is simply the locus of the conditional means of the dependent variable for the fixed values of the explanatory variable.
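The difference between conditional and unconditional means can be made concrete with a small table. The income and expenditure figures below are hypothetical. Grouping families by income X and averaging expenditure Y within each group gives $E(Y \mid X = x)$; averaging over all families gives $E(Y)$.

```python
from collections import defaultdict

# Hypothetical weekly (income X, consumption expenditure Y) pairs for families
data = [(80, 55), (80, 60), (80, 65),
        (100, 65), (100, 70), (100, 75),
        (120, 79), (120, 84), (120, 89)]

groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

cond_means = {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}  # E(Y | X = x)
uncond_mean = sum(y for _, y in data) / len(data)                       # E(Y)
print(cond_means)              # → {80: 60.0, 100: 70.0, 120: 84.0}
print(round(uncond_mean, 2))   # → 71.33
```

Joining the points (80, 60), (100, 70) and (120, 84) traces the locus of conditional means, that is, the population regression line for this hypothetical population.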

Population Regression Function (PRF)

$$E(Y \mid X_i) = f(X_i) \qquad \text{(A)}$$
Equation (A) is called the conditional expectation function (CEF) or population regression function (PRF).
An important question is what form the function $f(X_i)$ assumes. In the linear case:

$$E(Y \mid X_i) = B_1 + B_2 X_i \qquad \text{(B)}$$

$B_1$ and $B_2$ are unknown but fixed parameters known as regression coefficients.
$B_1$ and $B_2$ are also known as the intercept and slope coefficients, respectively.
The terms regression, regression equation, and regression model are used synonymously (for the PRF).
The purpose of regression analysis is to estimate the values of the unknown parameters $B_1$ and $B_2$.
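The slides do not yet say how $B_1$ and $B_2$ are estimated. One standard choice, swapped in here as an assumption, is ordinary least squares, which gives $b_2 = \sum (X_i - \bar X)(Y_i - \bar Y) / \sum (X_i - \bar X)^2$ and $b_1 = \bar Y - b_2 \bar X$. The sample below and the function name `ols_estimates` are hypothetical.

```python
def ols_estimates(x, y):
    """Least-squares estimates (b1, b2) of the PRF parameters B1 and B2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b2 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b1 = my - b2 * mx  # the fitted line passes through (mean X, mean Y)
    return b1, b2

# Hypothetical sample: weekly income X and consumption expenditure Y
X = [80, 100, 120, 140, 160]
Y = [65, 70, 84, 93, 107]
b1, b2 = ols_estimates(X, Y)
print(round(b1, 3), round(b2, 3))  # → 19.6 0.535
```

Here $b_1$ and $b_2$ are sample estimates; a different sample from the same population would give somewhat different values of the fixed but unknown $B_1$ and $B_2$.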

Summary
Correlation
Correlation and causation
Regression
Regression and causation
