
Data Collection & Sampling Methods

Where do data come from?

Take a step back: if we're starting from scratch, how do we collect or find data?
There are two options: secondary data and primary data.

Secondary Data

Secondary data: data someone else has collected.


This is what you were looking for in your assignment.

Secondary Data - Examples of Sources

County health departments
Vital Statistics: birth and death certificates
Hospital, clinic, and school nurse records
Private and foundation databases
City and county governments
Surveillance data from state government programs
Federal agency statistics: Census, NIH, etc.

Secondary Data Limitations

What did you find frustrating as you looked for data on the states' websites?

Secondary Data Limitations

When was it collected? For how long?


May be out of date for what you want to analyze. May not have been collected long enough for detecting trends. E.g.

Secondary Data Limitations

Is the data set complete?


There may be missing information on some observations. Unless such missing information is caught and corrected for, the analysis will be biased.
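One way to catch missing information before analysis is to report, per variable, what share of observations lack a value. The sketch below is only an illustration: the records, the field names, and the helper `missing_report` are hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical records; None marks a missing value (an assumption for this sketch).
records = [
    {"id": 1, "age": 34, "income": 41000},
    {"id": 2, "age": 51, "income": None},
    {"id": 3, "age": None, "income": 28000},
    {"id": 4, "age": 45, "income": None},
    {"id": 5, "age": 29, "income": 36000},
]


def missing_report(rows, fields):
    """Return, for each field, the share of rows where the value is missing."""
    total = len(rows)
    return {
        field: sum(1 for row in rows if row.get(field) is None) / total
        for field in fields
    }


report = missing_report(records, ["age", "income"])
for field, share in report.items():
    print(f"{field}: {share:.0%} missing")
```

A report like this only detects the gaps. Whether dropping or imputing the affected observations is appropriate depends on why the values are missing.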

Secondary Data Limitations

Are there confounding problems?


Sample selection bias? Source choice bias? In time series, did some observations drop out over time?

Secondary Data Limitations

Are the data consistent/reliable?


Did variables drop out over time? Did variables change in definition over time?

E.g. number of years of education versus highest degree obtained.

Secondary Data Limitations

Is the information exactly what you need?


In some cases, you may have to use proxy variables: variables that approximate something you really wanted to measure. Are they reliable? Are they correlated with what you actually want to measure? E.g.

Secondary Data Advantages

No need to reinvent the wheel.


If someone has already found the data, take advantage of it.

Secondary Data Advantages

It will save you money.


Even if you have to pay for access, it is often cheaper than collecting your own data. (More on this later.)

Secondary Data Advantages

It will save you time.


Primary data collection is very time-consuming.

Secondary Data Advantages

It may be very accurate.


Especially when a government agency has collected the data, an incredible amount of time and money went into it. It's probably highly accurate.

Secondary Data Advantages

It has great exploratory value.


It helps with exploring research questions and formulating hypotheses to test.

Primary Data

Primary data: data you collect.

Primary Data - Examples

Surveys
Focus groups
Questionnaires
Personal interviews
Experiments and observational studies

Surveys

A survey solicits information from people; e.g. Gallup polls; pre-election polls; marketing surveys. The Response Rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter. Surveys may be administered in a variety of ways, e.g. Personal Interview, Telephone Interview, and Self Administered Questionnaire.
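The response rate defined above is a simple proportion. A minimal sketch, with a hypothetical helper name and made-up counts:

```python
def response_rate(completed: int, selected: int) -> float:
    """Proportion of all selected people who completed the survey."""
    if selected <= 0:
        raise ValueError("selected must be a positive count")
    if not 0 <= completed <= selected:
        raise ValueError("completed must lie between 0 and selected")
    return completed / selected


# Hypothetical survey: 1200 people selected, 312 completed questionnaires.
rate = response_rate(completed=312, selected=1200)
print(f"Response rate: {rate:.1%}")
```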

Questionnaire Design

Over the years, a lot of thought has been put into the science of designing survey questions. Key design principles:

1. Keep the questionnaire as short as possible.
2. Ask short, simple, and clearly worded questions.
3. Start with demographic questions to help respondents get started comfortably.
4. Use dichotomous (yes/no) and multiple-choice questions.
5. Use open-ended questions cautiously.
6. Avoid leading questions.
7. Pretest the questionnaire on a small number of people.
8. Think about the way you intend to use the collected data when preparing the questionnaire.

Primary Data Limitations

Do you have the time and money for:


Designing your collection instrument?
Selecting your population or sample?
Pretesting/piloting the instrument to work out sources of bias?
Administration of the instrument?
Entry/collation of data?

Primary Data Limitations

Uniqueness
May not be able to compare to other populations

Primary Data Limitations

Researcher error
Sample bias
Other confounding factors

Data collection choice

What you must ask yourself:


Will the data answer my research question?

Data collection choice

To answer that:
You must first decide what your research question is.
Then you need to decide what data/variables are needed to answer the question scientifically.

Data collection choice

If those data exist in secondary form, then use them to the extent you can, keeping their limitations in mind. But if they do not, and you are able to fund primary collection, then primary collection is the method of choice.

SAMPLING THEORY

Sampling: the process of inferring something about a large group of elements by studying only a part of it.
Population: the collection of all elements about which some inference is to be made.
Sample: the part of the population that is chosen for study and used to make the inference.

Methods of Enumeration
Two methods:

CENSUS / COMPLETE ENUMERATION: studies the entire population.

SELECTIVE / SAMPLE ENUMERATION: studies a selected part of the population that is representative of the whole population.

Advantages of census method


1. The findings of the census method are accurate and reliable, because it considers every unit of the population.

2. It gives scope for detailed study.

Disadvantages of census method

Two main disadvantages:

1. Resource constraints: it requires valuable resources such as money, time, and manpower.

2. Time constraints: where timely decision making is required, the census method of enumeration is not advisable.

Characteristics of Good Sampling


A manager should look for the following characteristics:

1. A sample should have all the characteristics of the population from which it is taken.

2. The manager should not be biased in selecting a sample from the population.

3. The findings or decisions based on the sample study should be applicable to the entire population.

Types of Sampling

Two methods of sampling from a population:

1. Random / probability sampling
2. Non-random / judgment sampling

RANDOM SAMPLING

Here each element has the same non-zero probability of being included in the sample. Some of the random sampling methods:

1. Simple Random Sampling
2. Stratified Sampling
3. Systematic Sampling
4. Cluster Sampling

Simple Random Sampling


It is one of the simplest sampling designs and can work well for relatively small populations.
N = the number of elements in the population
n = the sample size (n < N)
All other forms of probability sampling build on this basic concept of random sampling.
It becomes difficult to apply when the population is large.
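A minimal sketch of simple random sampling using Python's standard library. The helper name `simple_random_sample`, the population of 100 numbered elements, and the seed are assumptions made for illustration. `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n of the N population elements without replacement."""
    N = len(population)
    if not 0 < n < N:
        raise ValueError("sample size n must satisfy 0 < n < N")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: N = 100 elements labelled 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, n=10, seed=42)
print(sorted(sample))
```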

Stratified Sampling

It is more complex than simple random sampling. The basis for using stratified sampling is the existence of strata such that each stratum is internally homogeneous and markedly different from the other strata. The strata are defined so that they constitute a partition of the population.
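A sketch of stratified sampling under one common design choice, proportional allocation, which the slide itself does not specify: the population is partitioned into strata and a simple random sample of the same fraction is drawn within each stratum. The helper name, the strata labels, and the counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Partition units into strata, then sample the same fraction from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit lands in exactly one stratum
    sample = {}
    for name, members in strata.items():
        k = max(1, round(fraction * len(members)))  # at least one unit per stratum
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population: 60 urban, 30 suburban, and 10 rural households.
units = (
    [("urban", i) for i in range(60)]
    + [("suburban", i) for i in range(30)]
    + [("rural", i) for i in range(10)]
)
picked = stratified_sample(units, stratum_of=lambda u: u[0], fraction=0.1, seed=1)
sizes = {name: len(members) for name, members in picked.items()}
print(sizes)
```

Sampling within each stratum guarantees that small strata, such as the rural households here, are represented, which a single simple random sample of the same total size does not guarantee.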

Systematic Sampling

Systematic sampling proceeds by picking one element after each fixed interval, where the interval depends on the sampling ratio. It is much easier to implement than simple random sampling. It is perhaps the most commonly used method among the probability sampling designs, and it is used for many other purposes as well.
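A sketch of systematic sampling, assuming the interval is taken as k = N // n and the starting point is chosen at random within the first interval. The helper name and the population are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Pick every k-th element after a random start, with k = N // n."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size n must satisfy 0 < n <= N")
    k = N // n  # sampling interval derived from the sampling ratio
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start within the first interval
    return [population[start + i * k] for i in range(n)]


# Hypothetical ordered list of 100 units; draw 10, so the interval is 10.
population = list(range(1, 101))
chosen = systematic_sample(population, n=10, seed=3)
print(chosen)
```

If the list has a periodic pattern whose period matches the interval, the sample can be unrepresentative, so the ordering of the list matters.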

Cluster Sampling

In this method, the population is divided into well-defined groups (clusters). A few of these clusters are then selected, on the assumption that they represent the entire universe. All the units of the selected clusters are studied to arrive at a conclusion. The selection of clusters is done by one of the sampling methods. It is used primarily in multi-stage sampling plans and rarely in single-stage plans.
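A sketch of cluster sampling in which the clusters themselves are chosen by simple random sampling and every unit in a chosen cluster is studied. The helper name and the school/student data are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and return every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical clusters: students grouped by school.
clusters = {
    "school_A": ["a1", "a2", "a3"],
    "school_B": ["b1", "b2"],
    "school_C": ["c1", "c2", "c3", "c4"],
    "school_D": ["d1", "d2", "d3"],
}
chosen, units = cluster_sample(clusters, n_clusters=2, seed=7)
print(chosen, units)
```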

NON-RANDOM SAMPLING

Some of the non-random sampling methods are as follows:

1. Judgment Sampling: also known as purposive sampling. The selection of the sample is based on the judgment of the manager.
2. Convenience Sampling: based on the convenience of the researcher, who uses the sources available to him or her to come to a conclusion.
3. Sequential Sampling: the sample size is not fixed in advance. It is used for statistical quality control.

Central Limit Theorem

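Statement: The distribution of the sum of a large number of independent, identically distributed variables with finite variance will be approximately normal, regardless of the underlying distribution. For a sample mean $\bar{x}$ of $n$ observations from a population with mean $\mu$ and standard deviation $\sigma$, $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ follows approximately the standard normal distribution.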
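The theorem can be checked by simulation. The sketch below assumes an exponential population with rate 1 (so the mean and the standard deviation are both 1), which is strongly skewed. It draws many samples of size n, standardizes each sample mean as z = (x̄ − μ) / (σ / √n), and measures how often |z| ≤ 1.96, which is about 95% for a standard normal variable. The function name, n, the number of repetitions, and the seed are illustrative choices.

```python
import math
import random
from statistics import mean


def standardized_means(n, repetitions, seed=0):
    """Return z = (x_bar - mu) / (sigma / sqrt(n)) for repeated samples."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(rate=1)
    z_values = []
    for _ in range(repetitions):
        draws = [rng.expovariate(1.0) for _ in range(n)]
        x_bar = mean(draws)
        z_values.append((x_bar - mu) / (sigma / math.sqrt(n)))
    return z_values


z = standardized_means(n=50, repetitions=2000)
share = sum(abs(v) <= 1.96 for v in z) / len(z)
print(f"Share of |z| <= 1.96: {share:.3f} (standard normal: about 0.95)")
```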

Sampling Errors

They are due to differences between each sample and the population, and among several samples. They originate at the time of collecting samples. Some of the causes of errors in sampling are:

1. Errors in selection of the sample.
2. Bias in the reporting of data.
3. Diversity of the population.
4. Substitution of sampling units for convenience.
5. Faulty demarcation of the sampling universe.

Non Sampling Errors


They occur at the time of observation, approximation, and processing of data. They are common to both sample surveys and census surveys.

They occur due to:
Faulty sampling plan.
Lack of trained and qualified investigators.
Inaccuracy in the responses collected, due to bias of the respondent.
Errors in the design of the survey.

Sampling Distribution

The probability distribution of a sample statistic is known as the sampling distribution of that statistic. The statistic may be the mean, median, mode, or standard deviation.
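A sampling distribution can be approximated empirically: draw many samples from the same population, compute the statistic on each, and look at how the results are spread. The sketch below does this for the mean and the median. The population of the integers 1 to 1000, the sample size, and the helper name are assumptions for illustration.

```python
import random
from statistics import mean, median, pstdev


def sampling_distribution(population, n, repetitions, statistic, seed=0):
    """Compute a statistic on many random samples of size n."""
    rng = random.Random(seed)
    return [statistic(rng.sample(population, n)) for _ in range(repetitions)]


# Hypothetical population: the integers 1..1000 (population mean 500.5).
population = list(range(1, 1001))
means = sampling_distribution(population, n=25, repetitions=2000, statistic=mean)
medians = sampling_distribution(population, n=25, repetitions=2000, statistic=median)

print(f"Sample means:   center {mean(means):.1f}, spread {pstdev(means):.1f}")
print(f"Sample medians: center {mean(medians):.1f}, spread {pstdev(medians):.1f}")
```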

Standard Error
It is an estimate of the standard deviation of the sampling distribution of means, based on the data from one or more random samples. The formula for the standard error of the mean is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, where
n = the sample size
σ = the standard deviation of the original distribution
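A short sketch of the standard error of the mean, σ / √n. The first helper applies the formula with a known σ. The second estimates it from a single sample by substituting the sample standard deviation for σ, which is a common practice but an assumption beyond what the slide states. The function names and the numbers are hypothetical.

```python
import math
from statistics import stdev


def standard_error(sigma, n):
    """Standard error of the mean for a known population sd: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


def estimated_standard_error(sample):
    """Estimate the standard error from one sample, using its sd in place of sigma."""
    return stdev(sample) / math.sqrt(len(sample))


# Known sigma = 15 and sample size n = 25.
print(standard_error(sigma=15, n=25))

# Hypothetical sample of six measurements.
print(round(estimated_standard_error([4, 8, 6, 5, 3, 7]), 4))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.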
