Sampling

IBA - JU 08 WMBA
Course Instructor: Dr. Swapan Kumar Dhar

Most people intuitively understand the idea of sampling. One taste from a drink tells us whether it is sweet or not. If we select a few ads from a magazine, we usually assume our selection reflects the characteristics of the full set. If some members of our staff favor a promotional strategy, we infer that others will also. These examples vary in their representativeness, but each is a sample.

The basic idea of sampling is that by selecting some of the elements in a population, we may draw conclusions about the entire population.

Sampling is the process of selecting some elements from a population to represent that population.

A population is the total collection of elements about which we wish to make some inferences. All office workers in the firm compose a population of interest; all 4000 files define a population of interest.

A census is a count of all the elements in a population. If 4000 files define the population, a census would obtain information from every one of them.

A sampling frame is the list of elements in the population from which the sample is actually drawn.

Why Sample? There are several compelling reasons for sampling, including (i) lower cost, (ii) greater speed of data collection and (iii) the destructive nature of certain tests.

Sources of Data: (i) Primary Source and (ii) Secondary Source.

Objective of sampling:
(1) The objective of sampling is to collect data within a short period of time, spending minimum money and employing more experienced manpower to conduct the survey.
(2) A well-conducted sample survey can provide more accurate information on the population units, and hence an efficient estimate of the population parameter is available using a sampling technique.

Types of Sampling Methods or sample design: There are two methods of selecting samples from populations:
1. Probability or random sampling
2. Nonrandom or non-probability sampling

In probability sampling, a sample is selected in such a way that each item or person in the population being studied has a known, nonzero chance of being chosen; such a sample is called a probability sample. Nonprobability sampling is an arbitrary and subjective sampling procedure in which each population element does not have a known, nonzero chance of being included.

Steps in sampling design: There are several questions to be answered in securing a sample.
(i) What is the target population?
(ii) What are the parameters of interest?
(iii) What is the sampling frame?
(iv) What is the appropriate sampling method?
(v) What size of sample is to be selected?

Random Sampling: There are mainly three methods of random sampling:
(a) Simple random sampling
(b) Systematic sampling
(c) Cluster sampling

(a) Simple random sampling: Simple random sampling selects samples by methods that allow each possible sample to have an equal probability of being picked and each item in the entire population to have an equal chance of being included in the sample.

Simple Random Sample: A sample selected so that each item or person in the population has the same chance of being included.

Method of Drawing a Simple Random Sample: There are usually two methods of drawing a simple random sample (SRS). They are (i) the Lottery Method and (ii) the use of a Random Number Table.
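As a rough software analogue of the definition above, the following minimal sketch draws a simple random sample with Python's standard `random` module. The function name `simple_random_sample`, the frame of 800 numbered units and the seed are illustrative assumptions, not part of the original notes.

```python
import random


def simple_random_sample(population, n, replace=False, seed=None):
    """Draw a simple random sample of size n from a list of units.

    `seed` is only there to make the illustration reproducible.
    """
    rng = random.Random(seed)
    if replace:
        # With replacement: the same unit may be drawn more than once.
        return [rng.choice(population) for _ in range(n)]
    # Without replacement: every subset of size n is equally likely.
    return rng.sample(population, n)


# Hypothetical frame: units labelled 1, 2, ..., 800.
frame = list(range(1, 801))
srs = simple_random_sample(frame, 40, seed=1)
print(sorted(srs))
```

Drawing without replacement is the default here because that is the usual practice for a simple random sample.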
Lottery Method: Suppose there are N units in a population of interest. Let these N units be identified as 1, 2, 3, ..., N. These identification numbers are written separately on pieces of paper of the same size and color. Each piece of paper is folded uniformly, put in a box and mixed thoroughly. Then, one by one, n pieces of paper are drawn from the box at random to get a sample of n units. The sample thus selected is a simple random sample of size n.

Use of Random Number Table: A random number table is available from the tables provided by Fisher and Yates. A portion of such a table is presented here. The digits are arranged in rows so that numbers of one digit, two digits and so on can be formed. The sample may be drawn (i) with replacement or (ii) without replacement.

Example 1: Suppose that a company wants to select a sample of 40 full-time workers out of a population of 800 full-time employees in order to obtain information on expenditures from a company-sponsored dental plan. How will the sample actually be drawn?

Solution: To select the random sample, we use a table of random numbers. The population frame consists of a listing of the names of all N = 800 full-time employees obtained from the company personnel files. Since the population size (800) is a three-digit number, each assigned code number must also be three digits so that every full-time worker has an equal chance of selection. Thus, a code of 001 is given to the first full-time employee in the population listing, a code of 002 is given to the second, and so on, until a code of 800 is given to the N-th full-time worker in the listing. To select the simple random sample, a random starting point in the table of random numbers is chosen. One such method is to close one's eyes and strike the table of random numbers with a pencil. Suppose we use such a procedure and thereby select row 06, column 05 of the following Table of Random Numbers as the starting point.
Although we can go in any direction in the table, suppose we read from left to right in sequences of three digits without skipping. The individual with code number 003 is the first full-time employee in the sample (row 06, columns 05-07), and the second individual has code number 364 (row 06, columns 08-10). The next sequence is 884; since the highest code for any employee is 800, this number is discarded. Then code numbers 720, 433, 463, 363, 109, 592, 470 and 705 are selected third through tenth respectively (the sequences 919 and 941 met along the way are discarded for the same reason). The selection process continues in a similar manner until the needed sample size of 40 full-time employees is obtained. During the selection process, if any three-digit coded sequence repeats, the repeated sequence is discarded.

Table of Random Numbers

                          Column
Row   12345 67890 12345 67890 12345 67890 12345 67890
01    49280 88924 35779 00283 81163 07275 89863 02348
02    61870 41657 07468 08612 98083 97349 20775 45091
03    43898 65923 25078 86129 78496 97653 91550 08078
04    62993 93912 30454 84598 56095 20664 12872 64647
05    33850 58555 51438 85507 71865 79488 76783 31708
06    97340 03364 88472 04334 63919 36394 11095 92470
07    70543 29776 10087 10072 55980 64688 68239 20461
08    89382 93809 00796 95945 34101 81277 66090 88872
09    37818 72142 67140 50785 22380 16703 53362 44940
10    60430 22834 14130 96593 23298 56203 92671 15925
11    82975 66158 84731 19436 55790 69229 28661 13675
12    39087 71938 40355 54324 08401 26299 46420 59208
13    55700 24586 93247 32596 11865 63397 44251 43189
14    14756 23997 78643 75912 83832 32768 18928 57070
15    32166 53251 70654 92827 63491 04233 33825 69662
16    23236 73751 31888 81718 06546 83246 47651 04877
17    45794 26926 15130 82455 78305 55058 52551 47182
18    09893 20505 14225 68514 46427 56788 96297 78822
19    54382 74598 91499 14523 68479 27686 46162 83554
20    94750 89923 37089 20048 80336 94598 26940 36858
21    70297 34135 53140 33340 42050 82341 44104 82949
22    85157 47954 32979 26575 57600 40881 12250 73742
23    11100 02340 12860 74697 96644 89439 28707 25815
24    36871 50775 30592 57143 17381 68856 25853 35041
25    23913 48357 63308 16090 51690 54607 72407 55538

Suppose that we have 100 employees in a company and wish to interview a randomly chosen sample of 10. Going from the top to the bottom of the columns, beginning with the left-hand column, read only the first three digits in each row of the table of random numbers and use their last two digits as the employee number. Then the sample is: 92, 18, 38, 29, 38, 73, 05, 93, 78, 04.

Example 2: An industrial firm is concerned about the time per week spent by scientists on certain trivial tasks. The time log sheets of a simple random sample of n = 50 employees show that the average amount of time spent on these tasks is 10.31 hours. The company employs N = 750 scientists. Estimate the total number of man-hours lost per week on trivial tasks.

Solution: The population consists of N = 750 employees, from which a random sample of n = 50 time log sheets was obtained. The average amount of time lost for the 50 employees was $\bar{y} = 10.31$ hours per week. Therefore the estimate of the population total is $N\bar{y} = 750(10.31) = 7732.5$ hours. Thus the estimate of total time lost is 7732.5 hours.

The sample may be drawn (i) with replacement or (ii) without replacement. The usual practice is to draw a simple random sample without replacement. This means that any random number which occurs more than once is selected only once.

Illustration: Suppose we wish to select a random sample of size 2 from a population of 5 students, identified as A, B, C, D and E. If we sample with replacement, then there are $N^n = 5^2 = 25$ possible samples, which are listed below:

AA AB AC AD AE
BA BB BC BD BE
CA CB CC CD CE
DA DB DC DD DE
EA EB EC ED EE

Here we see that A appears in 9 out of 25 samples, so that $P(A) = \frac{9}{25}$. Similarly, $P(B) = P(C) = P(D) = P(E) = \frac{9}{25}$. Furthermore, each of the 25 samples has an equal probability of $\frac{1}{25}$ of being selected.

When the sampling is done without replacement and the order is disregarded, there are $\binom{N}{n} = \binom{5}{2} = 10$ possible distinct samples, which are listed as:

AB AC AD AE BC BD BE CD CE DE

Here we see that A appears in 4 out of 10 samples, so that $P(A) = \frac{4}{10}$. Similarly, $P(B) = P(C) = P(D) = P(E) = \frac{4}{10}$. Furthermore, each of the 10 distinct samples has the probability of $\frac{1}{10}$ of being selected.

Example 3: Assume that a population consists of 5 students and the marks obtained by them in a certain statistics examination are 20, 15, 12, 16 and 18. Draw all possible random samples of 2 students when sampling is performed (i) with replacement and (ii) without replacement. Calculate the mean marks for each sample.

Solution: Let the five students be identified as A, B, C, D and E.
(i) The number of possible random samples of 2 students which can be selected with replacement from this population is 25. Let $X_1$ denote the marks of the student selected first and $X_2$ the marks of the student selected on the second draw. Then the possible random samples of size n = 2, with the values of the sample mean $\bar{X}$, are given below:

Sample   Sampled    Sampled marks   Sample mean
number   students   (X1, X2)        marks
1        A, A       20, 20          20
2        A, B       20, 15          17.5
3        A, C       20, 12          16
4        A, D       20, 16          18
5        A, E       20, 18          19
6        B, A       15, 20          17.5
7        B, B       15, 15          15
8        B, C       15, 12          13.5
9        B, D       15, 16          15.5
10       B, E       15, 18          16.5
11       C, A       12, 20          16
12       C, B       12, 15          13.5
13       C, C       12, 12          12
14       C, D       12, 16          14
15       C, E       12, 18          15
16       D, A       16, 20          18
17       D, B       16, 15          15.5
18       D, C       16, 12          14
19       D, D       16, 16          16
20       D, E       16, 18          17
21       E, A       18, 20          19
22       E, B       18, 15          16.5
23       E, C       18, 12          15
24       E, D       18, 16          17
25       E, E       18, 18          18

Mean of sample mean marks = 16.2 = $E(\bar{y})$, which equals the population mean, 16.2.

(ii) The number of random samples of 2 students that can be drawn without replacement is 10. These samples, along with the values of the mean marks, are given below:

Sample   Sampled    Sampled marks   Sample mean
number   students   (X1, X2)        marks
1        A, B       20, 15          17.5
2        A, C       20, 12          16
3        A, D       20, 16          18
4        A, E       20, 18          19
5        B, C       15, 12          13.5
6        B, D       15, 16          15.5
7        B, E       15, 18          16.5
8        C, D       12, 16          14
9        C, E       12, 18          15
10       D, E       16, 18          17

Here also, the mean of the sample mean marks is 16.2, which equals the population mean, 16.2.

Example 4: The following data represent the combined grade point average (CGPA) of 50 students after a semester:
ID No  CGPA    ID No  CGPA    ID No  CGPA    ID No  CGPA    ID No  CGPA
00     3.90    10     3.92    20     2.18    30     2.78    40     2.85
01     2.85    11     2.85    21     3.60    31     2.86    41     2.85
02     2.70    12     2.80    22     3.02    32     2.05    42     2.00
03     2.70    13     2.60    23     3.15    33     3.75    43     3.26
04     2.85    14     2.00    24     3.12    34     3.25    44     2.95
05     2.65    15     2.50    25     2.75    35     3.62    45     3.00
06     2.00    16     2.52    26     2.85    36     2.42    46     2.15
07     2.00    17     2.80    27     2.50    37     2.40    47     2.02
08     2.00    18     2.95    28     2.42    38     2.64    48     3.00
09     3.86    19     2.05    29     3.16    39     2.00    49     2.40
(i) Draw a simple random sample of size n = 10.
(ii) Estimate the average CGPA of all students.

Solution: (i) Given N = 50, we need a sample of size n = 10. To select the sample, let us select 10 random numbers using the first two columns of the Table of Random Numbers. The selected numbers and the sample observations are shown below:

Random numbers   Observations (y_i)
49               2.40
43               3.26
33               3.75
37               2.40
39               2.00
14               2.00
32               2.05
23               3.15
45               3.00
09               3.86

(ii) The estimate of average CGPA is given by

$\bar{y} = \frac{\sum y_i}{n} = \frac{2.40 + 3.26 + \dots + 3.86}{10} = \frac{27.87}{10} = 2.787.$
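The selection in Example 4 can be retraced with a short sketch. It reads the first two digits of each five-digit group in the first column of the Table of Random Numbers, skips values above 49 and repeats, and stops after 10 IDs. The names `COLUMN_1`, `CGPA` and `select_ids` are illustrative assumptions; the data are copied from the tables in these notes.

```python
# First five-digit column of the Table of Random Numbers, rows 01-21.
COLUMN_1 = [
    "49280", "61870", "43898", "62993", "33850", "97340", "70543",
    "89382", "37818", "60430", "82975", "39087", "55700", "14756",
    "32166", "23236", "45794", "09893", "54382", "94750", "70297",
]

# CGPA of the 50 students, indexed by ID No 00-49.
CGPA = [
    3.90, 2.85, 2.70, 2.70, 2.85, 2.65, 2.00, 2.00, 2.00, 3.86,
    3.92, 2.85, 2.80, 2.60, 2.00, 2.50, 2.52, 2.80, 2.95, 2.05,
    2.18, 3.60, 3.02, 3.15, 3.12, 2.75, 2.85, 2.50, 2.42, 3.16,
    2.78, 2.86, 2.05, 3.75, 3.25, 3.62, 2.42, 2.40, 2.64, 2.00,
    2.85, 2.85, 2.00, 3.26, 2.95, 3.00, 2.15, 2.02, 3.00, 2.40,
]


def select_ids(groups, n, max_id):
    """Take the first two digits of each group as a candidate ID.

    Candidates above max_id and repeated candidates are discarded,
    which corresponds to sampling without replacement.
    """
    chosen = []
    for group in groups:
        code = int(group[:2])
        if code <= max_id and code not in chosen:
            chosen.append(code)
        if len(chosen) == n:
            break
    return chosen


ids = select_ids(COLUMN_1, n=10, max_id=49)
observations = [CGPA[i] for i in ids]
mean_cgpa = sum(observations) / len(observations)

print(ids)                  # selected ID numbers
print(round(mean_cgpa, 3))  # estimated average CGPA
```

With the table as printed in these notes, the sketch selects IDs 49, 43, 33, 37, 39, 14, 32, 23, 45 and 09, matching the sample listed in the solution.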
Example 5: The following are the simple random sample observations of size n = 15 drawn from a population of size N = 500. The observations are the production of jute (in tons) of 15 farmers.

Production: 0.5 0.6 1.2 1.7 0.8 0.7 0.5 0.4 0.3 0.6 1.5 2.0 1.6 0.9 0.7

Estimate the average production of jute of farmers.

Solution: Given N = 500 and n = 15, the estimate of average jute production is

$\bar{y} = \frac{\sum y_i}{n} = \frac{14.00}{15} = 0.93$ tons.
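The two estimators used so far, the sample mean $\bar{y}$ and the population total $N\bar{y}$ from Example 2, can be sketched as small helper functions. The function names are illustrative assumptions, and the jute total at the end is not computed in the notes; it simply applies the Example 2 formula to the Example 5 data.

```python
def estimate_mean(sample):
    """Sample mean, used as the estimate of the population mean."""
    return sum(sample) / len(sample)


def estimate_total(sample_mean, population_size):
    """Estimate of the population total: N times the sample mean."""
    return population_size * sample_mean


# Example 5: jute production (tons) of 15 sampled farmers, N = 500.
jute = [0.5, 0.6, 1.2, 1.7, 0.8, 0.7, 0.5, 0.4, 0.3,
        0.6, 1.5, 2.0, 1.6, 0.9, 0.7]
mean_jute = estimate_mean(jute)
print(round(mean_jute, 2))  # average production per farmer

# Example 2: mean of 10.31 hours from n = 50 scientists, N = 750.
total_hours = estimate_total(10.31, 750)
print(round(total_hours, 1))  # total man-hours lost per week

# Illustrative extension only: total jute production of all 500 farmers.
total_jute = estimate_total(mean_jute, 500)
print(round(total_jute, 2))
```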
Example 7: The following are the marks (out of 100) of some students in a Statistics course:

ID      01  02  03  04  05  06  07  08  09  10  11  12  13  14  15  16  17  18  19  20
Marks   82  84  90  40  30  20  50  52  65  66  70  72  74  75  76  70  60  62  55  40

ID      21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
Marks   82  92  90  42  48  36  35  30  18  12  10  50  55  58  60  42  46  40  65  66

ID      41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
Marks   62  64  66  40  44  45  45  45  43  57  59  60  65  62  36  30  38  43  22  10

ID      61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80
Marks   40  40  50  52  38  37  19  12  21  73  97  72  99  30  66  29  91  28  45  68

ID      81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
Marks   92  31  72  50  72  22  67  11  56  82  44  65  79  90  13  11  37  12  10  43

ID     101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
Marks   23  73  59  64  58  25  54  06  31  23  62  83  14  40  12  19  89  73  20  68

Draw a sample of size n = 13 and estimate the proportion of students who secured marks above 60.

Solution: To draw the sample of size n = 13, 13 random numbers of 3 digits are to be selected from the Random Number Table. The random numbers and the corresponding marks are shown below. (The random numbers are selected from columns 1 to 6 of the Table of Random Numbers.)

Random numbers   Marks
098              12
111              62
033              55
023              90
074              30
100              43
007              50
002              84
086              22
043              66
084              50
118              73
072              72
There are 6 students who got marks above 60. So, the proportion of students securing marks above 60 is

$P = \frac{6}{13} = 0.4615.$
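The proportion estimate above amounts to counting the sampled marks that exceed 60 and dividing by the sample size. A minimal sketch, with the sampled IDs and marks copied from the solution (the variable names are illustrative assumptions):

```python
# Sampled student IDs and their marks, as listed in the solution.
sampled_marks = {
    98: 12, 111: 62, 33: 55, 23: 90, 74: 30, 100: 43, 7: 50,
    2: 84, 86: 22, 43: 66, 84: 50, 118: 73, 72: 72,
}

# Students in the sample whose marks are strictly above 60.
above_60 = [sid for sid, mark in sampled_marks.items() if mark > 60]

# Sample proportion, used as the estimate of the population proportion.
proportion = len(above_60) / len(sampled_marks)

print(len(above_60), len(sampled_marks))
print(round(proportion, 4))
```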
(b) Systematic Sampling: In a systematic sample, the N individuals or items in the population frame are partitioned into n groups of K individuals or items each, where K is found by dividing the size of the population frame N by the desired sample size n, that is,

$K = \frac{N}{n}$

where K is rounded to the nearest integer. To obtain a systematic sample, the first individual or item to be selected is chosen at random from the K individuals or items in the first partitioned group in the population frame, and the rest of the sample is obtained by selecting every K-th individual or item thereafter from the entire population frame listing.

Systematic Random Sample: A random starting point is selected and then every K-th member of the population is selected.

Example 6: As in Example 1, suppose that a company wants to select a sample of 32 full-time workers out of a population of 800 full-time employees in order to obtain information on expenditures from a company-sponsored dental plan. If we again assume an (overall) 80% response rate, our survey will be distributed to 40 full-time employees taken from the personnel files of the company to obtain the desired 32 responses. How will the systematic sample actually be selected?

Solution: The population frame consists of a listing of the names and company mailbox numbers of all N = 800 full-time employees obtained from the company personnel files. Since the population size (800) is a three-digit number, each assigned code number must also be three digits, so that every full-time worker has an equal chance of selection. The size of each partitioned group is

$K = \frac{N}{n} = \frac{800}{40} = 20,$

which is itself an integer, so no rounding is needed. We use a table of random numbers to select the first full-time employee in our systematic sample from among the 20 in the first partitioned group. From the given random number table we select row 12, column 01 as the starting point and read from left to right in sequences of two digits without skipping until we obtain a random number that corresponds to one of the 20 full-time employees. From the table we note that the first two-digit sequence is 39 (greater than 20), but the second two-digit sequence is 08 (row 12, columns 03-04). Therefore the first employee to be selected is the one whose code number is 008. Thus the systematic sample would consist of the 40 full-time employees with code numbers

008, 028, 048, 068, 088, 108, 128, 148, 168, 188,
208, 228, 248, 268, 288, 308, 328, 348, 368, 388,
408, 428, 448, 468, 488, 508, 528, 548, 568, 588,
608, 628, 648, 668, 688, 708, 728, 748, 768, 788.

(c) Cluster sampling: A population is divided into clusters using naturally occurring geographic or other boundaries. Then, clusters are randomly selected and a sample is collected by randomly selecting from each cluster.

Cluster: The smallest unit into which the population can be divided is called an element of the population. A group of such elements is known as a cluster.

Suppose you want to determine the views of residents in a particular town about the town and the policies taken towards its development. Selecting a random sample of residents in the town and personally contacting each one would be time consuming and very expensive as well. Moreover, an up-to-date sampling frame may not be available. Instead you could employ cluster sampling by subdividing the town into small units, either wards or blocks. These are often called primary units. Suppose you have divided the town into 12 primary units, then selected at random four wards, 2, 6, 3 and 12, and concentrated your efforts in these primary units. You could take a random sample of the residents in each of these wards and interview them.

Sampling distribution of the sample mean: A probability distribution of all possible sample means of a given sample size is the sampling distribution of the sample mean.
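Looking back at the systematic and cluster procedures described above, the following is a minimal sketch of both, assuming the frame is an ordered list and the clusters are given as a dictionary. The function names, the invented resident labels and the seed are hypothetical; only the employee codes and the starting point 008 come from Example 6.

```python
import random


def systematic_sample(frame, n, start):
    """Select every K-th unit, K = round(N / n), from a 1-based start.

    `start` must lie in the first group of K units. Note that Python's
    round() sends exact halves to the nearest even integer.
    """
    k = round(len(frame) / n)
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and K")
    return frame[start - 1::k][:n]


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick clusters at random, then a random sample within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}


# Example 6: codes 001-800, n = 40, random start 008 read from the table.
codes = [f"{i:03d}" for i in range(1, 801)]
systematic = systematic_sample(codes, 40, start=8)
print(systematic[:5], "...", systematic[-1])

# Town example: 12 wards, 4 chosen at random; the residents are invented.
wards = {w: [f"ward{w}-resident{r}" for r in range(1, 51)]
         for w in range(1, 13)}
picked = cluster_sample(wards, n_clusters=4, n_per_cluster=5, seed=0)
print(sorted(picked))
```

In practice the starting point would itself be drawn at random from 1 to K; it is passed in explicitly here so that the run reproduces the code numbers of Example 6.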
Example 7: A small cottage industry has seven production employees (considered as the population). The hourly earnings of each employee are given in the following table.

Employee   Hourly earnings (in $)
Joe        7
Sam        7
Sue        8
Bob        8
John       7
Ted        8
Tart       9

(a) What is the population mean?
(b) What is the sampling distribution of the sample mean for samples of size 2?
(c) What is the mean of the sampling distribution?

Solution:

(a) The population mean is

$\mu = \frac{7 + 7 + 8 + 8 + 7 + 8 + 9}{7} = 7.71.$

(b) To arrive at the sampling distribution of the sample mean, all possible samples of size 2 are selected from the population without replacement and their means are computed. There are 21 possible samples, found by

$\binom{N}{n} = \binom{7}{2} = 21.$

Sample means for all possible samples of 2 employees

Sample   Employees    Earnings   Sum   Mean
1        Joe, Sam     7, 7       14    7.00
2        Joe, Sue     7, 8       15    7.50
3        Joe, Bob     7, 8       15    7.50
4        Joe, John    7, 7       14    7.00
5        Joe, Ted     7, 8       15    7.50
6        Joe, Tart    7, 9       16    8.00
7        Sam, Sue     7, 8       15    7.50
8        Sam, Bob     7, 8       15    7.50
9        Sam, John    7, 7       14    7.00
10       Sam, Ted     7, 8       15    7.50
11       Sam, Tart    7, 9       16    8.00
12       Sue, Bob     8, 8       16    8.00
13       Sue, John    8, 7       15    7.50
14       Sue, Ted     8, 8       16    8.00
15       Sue, Tart    8, 9       17    8.50
16       Bob, John    8, 7       15    7.50
17       Bob, Ted     8, 8       16    8.00
18       Bob, Tart    8, 9       17    8.50
19       John, Ted    7, 8       15    7.50
20       John, Tart   7, 9       16    8.00
21       Ted, Tart    8, 9       17    8.50

Sampling distribution of sample means for n = 2

Sample mean   Number of means   Probability
7.00          3                 0.1429
7.50          9                 0.4286
8.00          6                 0.2857
8.50          3                 0.1429
Total         21                1.0000

(c) The mean of all the sample means is usually denoted by $\mu_{\bar{X}}$. The $\mu$ reminds us that it is a population value. The subscript $\bar{X}$ indicates that it is the sampling distribution of the sample mean. Moreover,

$\mu_{\bar{X}} = \frac{\text{Sum of all sample means}}{\text{Total number of samples}} = \frac{7.00 + 7.50 + \dots + 8.50}{21} = \frac{162}{21} = 7.71,$

which equals the population mean found in (a).
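The enumeration in Example 7 can be checked with a short sketch that lists every possible sample, tabulates the sample means and averages them. The function names are illustrative assumptions; exact fractions are used so that the probabilities and the mean carry no rounding error.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product


def sampling_distribution(values, n, replace=False):
    """Map each possible sample mean to its probability.

    Without replacement the samples are unordered combinations; with
    replacement they are ordered draws, as in Example 3 (i).
    """
    if replace:
        samples = product(values, repeat=n)
    else:
        samples = combinations(values, n)
    counts = Counter(Fraction(sum(s), n) for s in samples)
    total = sum(counts.values())
    return {m: Fraction(c, total) for m, c in sorted(counts.items())}


def expected_value(dist):
    """Mean of the sampling distribution."""
    return sum(m * p for m, p in dist.items())


# Example 7: hourly earnings of the seven employees.
earnings = [7, 7, 8, 8, 7, 8, 9]
dist = sampling_distribution(earnings, 2)
for sample_mean, prob in dist.items():
    print(float(sample_mean), prob, round(float(prob), 4))
print(round(float(expected_value(dist)), 2))

# Example 3: marks of the five students, with and without replacement.
marks = [20, 15, 12, 16, 18]
mean_with = expected_value(sampling_distribution(marks, 2, replace=True))
mean_without = expected_value(sampling_distribution(marks, 2))
print(float(mean_with), float(mean_without))
```

In both examples the mean of the sampling distribution coincides with the population mean, which is the point the worked tables make by hand.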
Example: The following is a list of shops. Also noted is whether the store is corporate owned (C) or manager owned (M). A sample of four locations is to be selected and inspected for customer convenience, safety, cleanliness and other features.

ID No  Address             Type    ID No  Address             Type
00     12, Wall street     C       12     23, Shyamoli        C
01     14, Dhanmondi       C       13     15, Kalabagan       C
02     16, Gulshan 1       C       14     114, Hatirpul       C
03     18, Gulshan 2       M       15     125, Green Road     C
04     23, Mahakhali       C       16     65, Motijheel       C
05     14, Mirpur          C       17     23, Banani          M
06     2, Mohammadpur      M       18     12, Khilkhet        M
07     15, Mirpur          C       19     13, Wari            C
08     12A, Mogbazar       C       20     603, Shantinagar    M
09     13B, Sobhanbag      C       21     10, Malibagh        C
10     4, Newmarket        M       22     12, Arambagh        C
11     23, Bashundara      M       23     4, Pantapath        M

a. The random numbers selected are 08, 18, 11, 54, 02, 41, and 54. Which stores are selected?
b. Use the table of random numbers to select your own sample of locations.
c. A sample is to consist of every seventh location. The number 03 is the starting point. Which locations will be included in the sample?