
Chapter 5

Data Collection and Sampling

5.2 Sources of Data


The reliability and accuracy of the data affect the validity of the results of a statistical analysis. The reliability and accuracy of the data depend on the method of collection. Three of the most popular sources of statistical data are:
- Published data
- Observational studies
- Experimental studies

Published Data
This is often a preferred source of data because of its low cost and convenience. Published data can be found in printed material, on tapes and disks, and on the Internet. Data published by the organization that collected it is called PRIMARY DATA. Data published by an organization different from the one that collected it is called SECONDARY DATA.

Observational and experimental studies


When published data are unavailable, one needs to conduct a study to generate the data.
An observational study is one in which measurements representing a variable of interest are observed and recorded without controlling any factor that might influence their values. An experimental study is one in which measurements representing a variable of interest are observed and recorded while controlling factors that might influence their values.

Surveys

Surveys solicit information from people. Surveys can be made by means of:
- a personal interview
- a telephone interview
- a self-administered questionnaire

A good questionnaire must be well designed:
- Keep the questionnaire as short as possible.
- Ask short, simple, and clearly worded questions.
- Start with demographic questions to help respondents get started comfortably.
- Use dichotomous and multiple choice questions.
- Use open-ended questions cautiously.
- Avoid using leading questions.
- It is useful to pretest a questionnaire.
- Think about the way you intend to use the collected data when preparing the questionnaire.

5.3 Sampling
Motivation for conducting a sampling procedure:
- Costs
- Population size
- The possible destructive nature of the sampling process

The sample population and the target population should be similar to one another. Self-selected samples are almost always biased.

5.4 Sampling Plans


Simple random sampling
In simple random sampling, every possible sample of the same size is equally likely to be chosen. To draw a simple random sample, assign a number to each element of the chosen population (or use numbers already assigned), then randomly select the sample numbers (members) using a random numbers table or a software package.
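The numbering-then-selecting procedure above can be sketched in Python with the standard `random` module, whose `Random.sample` draws distinct items so that every subset of the requested size is equally likely. The function name `simple_random_sample` and the example population are hypothetical, chosen only for illustration; `random` plays the role of the "software package".

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n without replacement.

    Every subset of n members is equally likely to be chosen.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    # Step 1: number the elements 0..N-1 (here, their list positions).
    numbers = range(len(population))
    # Step 2: randomly select n distinct numbers, as a random numbers table would.
    chosen = rng.sample(numbers, n)
    # Step 3: the sample consists of the members carrying the chosen numbers.
    return [population[i] for i in chosen]

# Hypothetical population of 100 numbered members.
population = [f"member_{i}" for i in range(1, 101)]
print(simple_random_sample(population, 10, seed=1))
```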

Stratified Random Sampling


This sampling procedure separates the population into mutually exclusive sets (strata) and then draws simple random samples from each stratum. With this procedure we can acquire information about:
- the whole population
- each stratum
- the relationships among strata

Examples of strata:
- Sex: male, female
- Age: under 20, 20-30, 31-40, 41-50
- Occupation: professional, clerical, blue-collar
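A minimal sketch of the two steps described above (split into mutually exclusive strata, then draw a simple random sample inside each one). The function `stratified_sample`, its parameters, and the 200-person population stratified by sex are hypothetical illustrations; the per-stratum sample sizes are passed in rather than computed.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sizes, seed=None):
    """Separate the population into mutually exclusive strata, then draw
    a simple random sample from each stratum.

    stratum_of: function returning the stratum label of an element
    sizes:      dict mapping each stratum label to its sample size
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:
        # Each element is placed in exactly one stratum.
        strata[stratum_of(element)].append(element)
    # Independent simple random sample within every stratum.
    return {label: rng.sample(strata[label], n) for label, n in sizes.items()}

# Hypothetical population of 200 people, stratified by sex.
people = [(f"person_{i}", "Male" if i % 2 == 0 else "Female") for i in range(200)]
by_sex = stratified_sample(people, lambda p: p[1], {"Male": 5, "Female": 5}, seed=0)
print(by_sex)
```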

There are several ways to build the stratified sample. For example, the sample can keep the proportion that each stratum has in the population.
A sample of size 1,000 is to be drawn, keeping the population proportion of each income category:

Stratum   Income             Population proportion   Stratum size
1         under $15,000      25%                     250
2         $15,000-$29,999    40%                     400
3         $30,000-$50,000    30%                     300
4         over $50,000        5%                      50
Total                        100%                    1,000
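The stratum sizes in the table follow from multiplying the total sample size by each population proportion. A small Python sketch of that arithmetic; `proportional_allocation` is a hypothetical helper name, and proportions are given as whole percentages to keep the arithmetic exact.

```python
def proportional_allocation(total_n, percentages):
    """Sample size per stratum that keeps each stratum's population proportion.

    percentages: dict mapping stratum -> population share in percent
    """
    return {s: round(total_n * pct / 100) for s, pct in percentages.items()}

income_strata = {
    "under $15,000": 25,
    "$15,000-$29,999": 40,
    "$30,000-$50,000": 30,
    "over $50,000": 5,
}
sizes = proportional_allocation(1000, income_strata)
print(sizes)                # {'under $15,000': 250, '$15,000-$29,999': 400, '$30,000-$50,000': 300, 'over $50,000': 50}
print(sum(sizes.values()))  # 1000
```

With other proportions, rounding can make the stratum sizes sum to slightly more or less than the intended total, so the sum is worth checking.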

Cluster sampling
Cluster sampling is a simple random sample of groups or clusters of elements. This procedure is useful when:
- it is difficult and costly to develop a complete list of the population members, or
- the population members are widely dispersed geographically.

Cluster sampling increases sampling error, because members of the same cluster are probably similar to one another.
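A sketch of cluster sampling in Python, assuming the one-stage variant in which every element of a selected cluster is included (the text does not specify what happens inside a chosen cluster). Note that only the list of cluster labels is needed, not a complete list of members. `cluster_sample` and the city-block population are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling (an assumed variant): take a simple random
    sample of clusters and keep every element of each selected cluster.

    clusters: dict mapping cluster label -> list of its elements
    """
    rng = random.Random(seed)
    # Only the cluster labels have to be listed, not every population member.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {label: clusters[label] for label in chosen}

# Hypothetical population: 50 city blocks with 20 households each.
blocks = {f"block_{b:02d}": [f"household_{b:02d}_{h:02d}" for h in range(20)]
          for b in range(50)}
chosen_blocks = cluster_sample(blocks, 5, seed=3)
print(len(chosen_blocks), sum(len(hh) for hh in chosen_blocks.values()))  # 5 100
```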

5.5 Errors Involved in Sampling


Two major types of errors can arise when a sampling procedure is performed: sampling error and non-sampling error.

Sampling Error
Sampling error refers to differences between the sample and the population that exist only because of the specific observations that happen to be selected. Sampling error is expected whenever a statement about the population is made based on a sample.

[Figure: Population income distribution, marking the population mean income μ and the sample mean income x̄. The distance between them is the sampling error: the sample mean falls where it does only because certain randomly selected observations were included in the sample.]
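A small simulation of sampling error as x̄ - μ, under an assumed population: 10,000 incomes drawn uniformly between $10,000 and $90,000 (illustrative data, not from the text). Repeated samples of the same size give different errors because different observations happen to be selected; sampling the whole population gives zero error.

```python
import random
from statistics import fmean

rng = random.Random(42)

# Hypothetical population of 10,000 incomes, uniform on $10,000-$90,000
# (an assumption for illustration, not data from the text).
population = [rng.uniform(10_000, 90_000) for _ in range(10_000)]
mu = fmean(population)  # population mean income

def sampling_error(n):
    """x_bar - mu for one simple random sample of size n."""
    x_bar = fmean(rng.sample(population, n))
    return x_bar - mu

# Two samples of the same size generally give different sampling errors.
for n in (10, 100, 1000):
    print(n, round(sampling_error(n)), round(sampling_error(n)))
```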

Non-sampling error
Non-sampling errors occur because of mistakes made during the process of data acquisition. Increasing the sample size will not reduce this type of error. There are three types of non-sampling errors:
- errors in data acquisition
- non-response errors
- selection bias
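To illustrate why a larger sample does not remove non-sampling error, this sketch assumes a hypothetical systematic recording mistake: every income is recorded 5% too low. The population and the helper names `error_of_mean` and `under_report` are illustrative assumptions. As n grows, the plain sampling error shrinks toward 0, while the error with under-reporting settles near -5% of μ.

```python
import random
from statistics import fmean

rng = random.Random(7)

# Hypothetical population of 20,000 incomes (illustrative assumption).
population = [rng.uniform(10_000, 90_000) for _ in range(20_000)]
mu = fmean(population)

def error_of_mean(n, record=lambda x: x):
    """Draw an SRS of size n, record each value with `record`, return x_bar - mu."""
    sample = rng.sample(population, n)
    return fmean(record(x) for x in sample) - mu

# Hypothetical non-sampling error: every income is recorded 5% too low.
def under_report(x):
    return 0.95 * x

# Columns: n, error with correct recording, error with under-reporting.
for n in (100, 2_000, 20_000):
    print(n, round(error_of_mean(n)), round(error_of_mean(n, under_report)))
```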

Data Acquisition Error

[Figure: A sample drawn from a population. The sample mean already carries sampling error; if an observation is wrongly recorded in the sample, a data acquisition error is added and the sample mean is affected.]
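A worked example of a single data acquisition error, using five hypothetical incomes (illustrative values, not from the text). Typing one value with an extra zero is enough to move the sample mean far from where it should be.

```python
from statistics import fmean

# Hypothetical sample of five incomes (illustrative values).
recorded_correctly = [32_000, 45_000, 38_000, 41_000, 44_000]
# The same sample with one observation mis-recorded (an extra zero typed).
recorded_with_error = [32_000, 45_000, 380_000, 41_000, 44_000]

print(fmean(recorded_correctly))   # 40000.0
print(fmean(recorded_with_error))  # 108400.0
```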

Non-Response Error

[Figure: A sample drawn from a population. No response from some members of the sample may lead to biased results.]
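A simulation of non-response error under an assumed response model: people earning $50,000 or more reply 40% of the time, everyone else 90% (both rates are hypothetical). Because the respondents over-represent lower incomes, their mean underestimates μ, and sampling the entire population does not fix it. The population and the helpers `responds` and `respondent_mean` are illustrative.

```python
import random
from statistics import fmean

rng = random.Random(11)

# Hypothetical population of 20,000 incomes (illustrative assumption).
population = [rng.uniform(10_000, 90_000) for _ in range(20_000)]
mu = fmean(population)

def responds(income):
    # Hypothetical response model: people earning $50,000 or more
    # reply less often (40%) than the rest (90%).
    return rng.random() < (0.9 if income < 50_000 else 0.4)

def respondent_mean(n):
    """Mean income of the sampled people who actually responded."""
    sample = rng.sample(population, n)
    return fmean(x for x in sample if responds(x))

for n in (200, 20_000):
    print(n, round(respondent_mean(n) - mu))
```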

Selection Bias

[Figure: A sample drawn from a population. When parts of the population cannot be selected, the sample cannot represent the whole population.]
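A sketch of selection bias, assuming a hypothetical sampling frame that cannot reach anyone earning under $20,000. Sampling is random within the frame, yet the excluded part of the population never appears, so the sample mean overstates μ even when the whole frame is taken. The population, the frame, and `frame_sample_mean` are illustrative assumptions.

```python
import random
from statistics import fmean

rng = random.Random(5)

# Hypothetical population of 20,000 incomes (illustrative assumption).
population = [rng.uniform(10_000, 90_000) for _ in range(20_000)]
mu = fmean(population)

# Hypothetical sampling frame that cannot reach anyone earning under $20,000,
# so that part of the population can never be selected.
frame = [x for x in population if x >= 20_000]

def frame_sample_mean(n):
    """Mean of a simple random sample drawn from the frame only."""
    return fmean(rng.sample(frame, n))

# Even sampling the entire frame leaves the bias in place.
for n in (200, len(frame)):
    print(n, round(frame_sample_mean(n) - mu))
```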
