In such cases, we gather information about the population based on data derived from a sample of the population.
To gather useful information, the sample must be representative of the population. A sample is described as representative if it is believed that it fairly and comprehensively represents the population from which it is drawn. A sample is biased if it over-represents or under-represents a particular subgroup. The words 'unbiased' and 'representative' are usually interchangeable.
SAMPLING METHODS
Sampling is collecting data from a representative sample of a population. In the following, we introduce 4 types of sampling, namely,
• Random sampling
• Stratified sampling
• Systematic sampling
• Quota sampling
Some general comments before the details of each method:
The first 3 types are referred to as probabilistic sampling, in which every member of the population has an equal chance of being selected. The sample obtained is called a random sample. In each of these sampling methods, a sampling frame (see section below) is needed.
Quota sampling is a type of non-probabilistic sampling and the sample obtained is non-random. A sampling frame is not needed in quota sampling.
We are more likely to represent the population well through probability sampling. In general, researchers prefer probability sampling methods and consider them to be more accurate and rigorous.
Example 1.1
Which of the following is a random sample?
(i) John picks out 2 identical balls from a bag of 21, without replacement.
(ii) Mark picks out 2 identical balls from a bag of 21, with replacement.
(iii) Matthew picks out 2 identical balls from 2 bags, one from a bag of 10 and one from a bag of 11.
(iv) Luke picks out 3 identical balls from 3 bags of 7, one from each bag.
Solution
Recall:
In a sampling method, if every member of the population has an equal chance of being selected, then the sample obtained is a random sample.
(i) The probability of selecting the first ball is 1/21. Since the first ball is not replaced before the second ball is drawn, the probability of selecting the second ball is 1/20. Thus, we see that each member of the population does not have an equal chance of being selected and we conclude that the sample is non-random.
(ii) Since the first ball is replaced before the second ball is drawn, the probability of selecting the second ball is 1/21. Thus, we see that each member of the population has an equal chance of being selected and we conclude that the sample is random.
(iii) The probability of selecting a ball from the bag of 10 is 1/10 and the probability of selecting a ball from the bag of 11 is 1/11. Thus, we see that each member of the population does not have an equal chance of being selected and we conclude that the sample is non-random.
(iv) The probability of selecting each ball from a bag of 7 is 1/7. Thus, we see that each member of the population has an equal chance of being selected and we conclude that the sample is random.
The Sampling Frame
The sampling frame is a complete list (to be ideal), in some form, of all the members in the population, from which a sample can be selected.
Other examples include population registers or electoral registers of countries, registers of schools and records of serial numbers in manufactured goods.
In simple random sampling, each individual in the sample is chosen from the population purely by chance. Each member of the population has an equal chance of being selected.
To obtain a random sample of size n from a population of size N, we
(1) obtain a sampling frame of the population,
Note that random selection of the numbers can be done mechanically through writing each number that represents a member of the population on a slip of paper, mixing them really well in a container and drawing the required number of slips out. Alternatively, we can use computers to generate the random numbers selected.
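The computer-based selection mentioned above can be sketched in Python as follows. This is only an illustration: the frame of 25 numbered members and the function name `simple_random_sample` are assumptions, not part of the notes.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from the sampling frame, each equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: members numbered 1 to 25.
frame = list(range(1, 26))
chosen = simple_random_sample(frame, 5, seed=1)
print(sorted(chosen))
```

`random.sample` draws without repetition, which mirrors drawing slips of paper from a container.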
Advantages
(i) Analysis of data is relatively easy.
(ii) The data collected is not biased.
Disadvantages
(i) It is difficult or impossible to identify every member of the population, to be issued a number (difficulty in obtaining the sampling frame).
(ii) Even if we can identify every member of the population, we may not be able to get access to some members who have been chosen for the sample.
(2) Stratified Sampling
In certain investigations, variables such as age and gender in the population have an influence on the results. For example, in a study on the taste of music, age of the individuals studied will influence the responses significantly. In such a case, it will be useful to ensure that the sample represents the proportions of mutually exclusive subgroups (by age), in the population.
Suppose that in a company the proportions of staff in different age-groups are as follows:
To obtain a sample of 200 staff, we draw random samples from the age-groups with sample size in the same proportion as the size of each age-group.
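A minimal sketch of proportional allocation followed by random selection within each stratum is given below. The age-group labels and sizes are hypothetical, chosen only to make the arithmetic visible, and `stratified_sample` is an illustrative name.

```python
import random

def stratified_sample(strata, total, seed=None):
    """Sample each stratum at random, with size proportional to the stratum size."""
    rng = random.Random(seed)
    population = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total * len(members) / population)  # proportional share
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical company of 1000 staff in three age-groups.
strata = {
    "under 30": [f"A{i}" for i in range(300)],
    "30 to 49": [f"B{i}" for i in range(500)],
    "50 and over": [f"C{i}" for i in range(200)],
}
sample = stratified_sample(strata, 200, seed=0)
sizes = {name: len(members) for name, members in sample.items()}
print(sizes)  # {'under 30': 60, '30 to 49': 100, '50 and over': 40}
```

When the proportional shares are not whole numbers, rounding each share separately may not add up to the required total, so the sizes should be checked and adjusted by hand.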
Advantages
(i) It is more likely to give a good representative sample of the population.
(ii) The data obtained from each stratum can be analyzed separately, and this usually gives more accurate estimates of the population parameters, compared to simple random sampling.
Disadvantages
(i) It may be hard to obtain the sampling frame needed.
(ii) More difficult to conduct as compared with (simple) random sampling. There may not be sufficient information about the population to delineate the subgroups.
(iii) The strata may not be clearly defined.
(iv) It is relatively more time consuming.
(3) Systematic Sampling
Advantages
(i) It is easier to conduct as compared with other types of sampling.
(ii) It is more evenly spread over the population. The use of wide-spacing of k individuals guards against the list consisting of clusters of similar individuals.
Disadvantages
(i) It is not always possible to obtain a valid sampling frame of the population and to number each individual.
(ii) The sample obtained can be biased when the members of the population have a periodic or cyclic pattern of occurrence in the sampling frame. As an example, if a sampling frame consists of a list of married couples in the order of husband, wife, husband, wife, husband, wife, ... etc, then if every tenth person is selected after choosing the first of the husbands, the sample will consist only of males!
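The periodicity problem in the married-couples example can be reproduced with a short sketch. The frame of 50 couples and the function name `systematic_sample` are assumptions made for illustration.

```python
def systematic_sample(frame, k, start):
    """Select every k-th member of the frame, beginning at 1-based position start."""
    return frame[start - 1::k]

# Hypothetical frame of 50 couples listed husband, wife, husband, wife, ...
frame = []
for couple in range(1, 51):
    frame.append(("husband", couple))
    frame.append(("wife", couple))

sample = systematic_sample(frame, k=10, start=1)
roles = {role for role, _ in sample}
print(len(sample), roles)  # 10 {'husband'}
```

Because the list repeats with period 2 and the spacing k = 10 is a multiple of 2, every selected position falls on a husband.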
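(4) Quota Sampling
[Table: example quotas by social class (High, Middle, Low), sex (M, F) and age group (rows include 30-44 and 45-64); the cell values are not recoverable from the source.]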
Advantages
(i) Cost is lower because the sample size can be smaller to meet the objective.
(ii) The information can be collected quickly.
(iii) The sampling does not require a sampling frame.
Disadvantages
(i) It is not a good representative of the population as compared with other types of sampling.
(ii) It is non-random.
(iii) It is biased as the interviewer may simply select those who are easiest to interview.
NB: Quota sampling is very useful and convenient though it is non-random in nature. Information gathered from this type of sample should be treated with caution.
Example 1.3
A sample of 5 is to be selected from a class of 15 boys and 10 girls. Describe how you would choose the committee using
(i) random sampling
(ii) stratified sampling
(iii) quota sampling
(iv) systematic sampling
Solution
(i) To form a sampling frame, assign a number to each student in the class. Write the numbers 1 to 25 on strips of paper, put them in a container, mix them up well and draw 5 numbers out. Select the 5 students assigned to these 5 selected numbers.
(ii) Since the proportions of boys and girls in the class are 15/25 and 10/25 respectively, we need to randomly select (15/25) x 5 = 3 boys and (10/25) x 5 = 2 girls. We assign the numbers 1 to 15 to the boys and randomly select (as in part (i)) 3 numbers. We select the 3 boys that are assigned these 3 numbers. Similarly, we assign the numbers 1 to 10 to the girls and randomly select 2 numbers before selecting the corresponding girls.
(iii) We decide on the proportions of boys and girls in the sample of 5. They need not be 15/25 and 10/25. Suppose we decided to select 3 girls and 2 boys, we can select any 3 girls, from 10 in the class, and any 2 boys, from 15 in the class, that are most cooperative or convenient to us. They are not randomly selected within their subgroups (unlike stratified sampling).
(iv) Number the students in the class from 1 to 25 (forming the sampling frame). Since sample size is 5 and population size is 25, k = 25/5 = 5. Select any number from 1 to k (= 5), say we select 3. Then the students to be selected are those numbered 3, 3+k, 3+2k, ..., i.e. those numbered 3, 8, 13, 18, and 23.
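The systematic selection in part (iv) can be checked with a small sketch. The function name `systematic_numbers` is illustrative, and the starting number 3 is the one used in the solution.

```python
def systematic_numbers(population_size, sample_size, start):
    """Return the member numbers start, start + k, start + 2k, ... with k = N / n."""
    k = population_size // sample_size
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(sample_size)]

selected = systematic_numbers(25, 5, 3)
print(selected)  # [3, 8, 13, 18, 23]
```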
Example 1.4
An employment agency wants to estimate the number of unemployed people in a HDB new town, and so decides to obtain information from a sample of its adults who are of working age. Give one reason why it would not be appropriate to obtain the sample by stopping adults at the town centre's supermarket during one working day.
Solution
It misses out people who do not shop at that supermarket (biased sample).
Example 1.5
In a city, the following information is available from the registry office.
A sample of 40 residents is to be chosen for an interview. Discuss the following sampling methods:
(i) Stratified Sampling
(ii) Quota Sampling
State the advantages and disadvantages of each method.
Solution
(i) In stratified sampling, we need to randomly select (744/930) x 40 Chinese, (93/930) x 40 Malays, (65/930) x 40 Indians and (28/930) x 40 Others. Thus, we randomly select 32 Chinese from 744 Chinese, 4 Malays from 93 Malays, 3 Indians from 65 Indians and 1 person from 28 people of other races.
Advantages:
(1) The sample represents the population better than samples obtained through other sampling methods.
(2) The data from each race group can be analyzed separately to give more details and the results can be more representative of the population.
Disadvantages:
(1) It is more difficult to conduct. The sampling frame may not be available and thus it may be impossible to have random selection within each race group.
(2) It is more time consuming.
(ii) In quota sampling, the proportions of the different races in the sample need not be the same as those in the population. Suppose we decide to select 10 people from each subgroup. The 10 people in each subgroup need not be randomly selected. We can select those that are most available.
Advantages:
(1) Data can be collected faster.
(2) Sampling frame is not required.
Disadvantages:
(1) The sample obtained does not represent the population as well as samples obtained through other methods.
(2) The sample is not randomly selected.
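The stratum sizes quoted in part (i) come from rounding the proportional shares of the 40 interviews. The sketch below reproduces that arithmetic; the function name `proportional_allocation` is an assumption made for illustration.

```python
def proportional_allocation(group_sizes, sample_size):
    """Return the exact and the rounded share of the sample for each group."""
    total = sum(group_sizes.values())
    exact = {g: sample_size * size / total for g, size in group_sizes.items()}
    rounded = {g: round(share) for g, share in exact.items()}
    return exact, rounded

groups = {"Chinese": 744, "Malays": 93, "Indians": 65, "Others": 28}
exact, rounded = proportional_allocation(groups, 40)
for group in groups:
    print(group, round(exact[group], 2), rounded[group])
```

The shares for Indians and Others are about 2.80 and 1.20, which round to 3 and 1, so the four rounded sizes still add up to 40.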
Calculation of unbiased estimates of the population mean and variance from a sample.
A sample statistic is any numerical value describing a characteristic of a sample. It is calculated from the sample data.
When population parameters are unknown, sample statistics are used to estimate these parameters. Such a statistic is called an estimator.
UNBIASED ESTIMATE
Consider a population with an unknown parameter $\theta$. If $T$ is a statistic derived from a random sample of the population, then $T$ is an unbiased estimator of $\theta$ if $E(T) = \theta$.
Unbiased Estimators of the Population Mean $\mu$ and Variance $\sigma^2$
From a population with unknown mean $\mu$ and unknown variance $\sigma^2$, take a random sample of size $n$ and let the sample mean be
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
and the sample variance be
$$\frac{1}{n}\sum (x - \bar{x})^2 = \frac{\sum x^2}{n} - \bar{x}^2 .$$
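To show $\bar{X}$ is an unbiased estimator for $\mu$:
$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] = \frac{1}{n}(n\mu) = \mu$$
Note:
$E(X_i) = \mu$ since each $X_i$ is drawn from the distribution of $X$.
Therefore $\bar{X}$ is an unbiased estimator for $\mu$.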
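The unbiased estimate of the population variance $\sigma^2$ is
$$s^2 = \frac{n}{n-1}(\text{sample variance}) = \frac{n}{n-1}\left[\frac{\sum (x - \bar{x})^2}{n}\right] = \frac{1}{n-1}\left[\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right] \quad \text{(in MF15)} \quad \dots (1)$$
Alternatively,
$$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 \quad \text{(in MF15)} \quad \dots (2)$$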
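Note:
If the sample data given is in the form $\sum (x - c)$ and $\sum (x - c)^2$, where $c$ is a constant, then
$$\bar{x} = \frac{\sum (x - c)}{n} + c \quad \text{and} \quad s^2 = \frac{1}{n-1}\left[\sum (x - c)^2 - \frac{\left(\sum (x - c)\right)^2}{n}\right] \quad \dots (3)$$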
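As a quick check that formulas (1), (2) and (3) give the same estimate, the sketch below applies all three to one small data set. The data values, the constant c = 5 and the function names are hypothetical, chosen only for illustration.

```python
import statistics

def unbiased_from_sums(n, sum_x, sum_x2):
    """Formula (1): uses n, the sum of x and the sum of x squared."""
    return sum_x / n, (sum_x2 - sum_x ** 2 / n) / (n - 1)

def unbiased_from_coded_sums(n, c, sum_d, sum_d2):
    """Formula (3): uses the sums of d = x - c for a constant c."""
    return sum_d / n + c, (sum_d2 - sum_d ** 2 / n) / (n - 1)

# Hypothetical data, chosen only for illustration.
data = [2, 4, 6, 8]
c = 5
n = len(data)

mean1, var1 = unbiased_from_sums(n, sum(data), sum(x * x for x in data))
mean3, var3 = unbiased_from_coded_sums(
    n, c, sum(x - c for x in data), sum((x - c) ** 2 for x in data)
)
# Formula (2) is what statistics.variance computes (divisor n - 1).
var2 = statistics.variance(data)
print(mean1, mean3, var1, var2, var3)
```

All three variance values agree (20/3 for this data), and both means equal 5.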
Example 2.1
Obtain the unbiased estimates of the population mean and variance from each of the following samples drawn:
(i) 19.30, 19.61, 18.27, 18.90, 19.14, 19.90, 18.76, 19.10;
(ii) $n = 50$, $\sum x = 1150$, $\sum x^2 = 28990$
(iii) $n = 12$, $\bar{x} = 24.5$, $\sum (x - \bar{x})^2 = 48.72$
(iv)
Number, x      17  19  22  25  31  33
Frequency, f   31  13   1   4   3   8
(v) $n = 200$, $\sum (x - 300) = 2012$, $\sum (x - 300)^2 = 525262$
(vi) $n = 50$, $\sum x = 1500$, $\sum (x - 20)^2 = 52520$
Soln:
(i) Using GC, enter the values as a list under L1.
Next, using <STAT>, <CALC>, <1: 1-Var Stats>, <L1>, <ENTER>, we obtain
1-Var Stats
x̄ = 19.1225
Σx = 152.98
Σx² = 2927.1422
Sx = .5045719544
σx = .4719938451
n = 8
Unbiased estimate of the population mean is x̄ = 19.1 (3 s.f.) and unbiased estimate of the population variance is Sx² = 0.255 (3 s.f.).
(ii) $\bar{x} = \frac{\sum x}{n} = \frac{1150}{50} = 23$
$s^2 = \frac{1}{49}\left[28990 - \frac{1150^2}{50}\right] = 51.8$
(iii)
Unbiased estimate of the population variance is $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{48.72}{11} = 4.43$
(iv)
Using GC, enter the values of x under L1 and their corresponding frequencies under L2.
Use <STAT>, <CALC>, <1: 1-Var Stats>, <L1>, <,>, <L2>
1-Var Stats
x̄ = 20.88333333
Σx = 1253
Σx² = 28231
Sx = 5.91491
σx = 5.865411798
n = 60
x̄ = 20.9, Sx² = 34.986 ≈ 35.0 (3 s.f.)
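(v) $\bar{x} = \frac{\sum (x - 300)}{n} + 300 = \frac{2012}{200} + 300 = 310.06$
$s^2 = \frac{1}{199}\left[525262 - \frac{2012^2}{200}\right] = \frac{200}{199}(2525.1064) = 2537.80 \approx 2538$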
(vi) $n = 50$, $\sum x = 1500$, $\sum (x - 20)^2 = 52520$
$\bar{x} = \frac{\sum x}{n} = \frac{1500}{50} = 30$
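The graphing-calculator results in this example can be cross-checked with Python's standard `statistics` module. The sketch below covers parts (i), (ii), (iv), (v) and (vi); the variable names are illustrative, and the variance for part (vi) is obtained from formula (3) using $\sum (x - 20) = \sum x - 20n$, a step assumed here rather than shown in the worked solution.

```python
import statistics

# (i) raw data
data = [19.30, 19.61, 18.27, 18.90, 19.14, 19.90, 18.76, 19.10]
mean_i, var_i = statistics.mean(data), statistics.variance(data)

# (ii) n, sum of x, sum of x squared
n, sx, sxx = 50, 1150, 28990
mean_ii, var_ii = sx / n, (sxx - sx ** 2 / n) / (n - 1)

# (iv) frequency table, expanded into a plain list of values
xs = [17, 19, 22, 25, 31, 33]
fs = [31, 13, 1, 4, 3, 8]
expanded = [x for x, f in zip(xs, fs) for _ in range(f)]
mean_iv, var_iv = statistics.mean(expanded), statistics.variance(expanded)

# (v) coded sums with c = 300
n, c, sd, sdd = 200, 300, 2012, 525262
mean_v, var_v = sd / n + c, (sdd - sd ** 2 / n) / (n - 1)

# (vi) sum of x, and sum of (x - 20) squared
n, sx, c, sdd = 50, 1500, 20, 52520
sd = sx - n * c  # sum of (x - 20)
mean_vi, var_vi = sx / n, (sdd - sd ** 2 / n) / (n - 1)

for label, m, v in [("i", mean_i, var_i), ("ii", mean_ii, var_ii),
                    ("iv", mean_iv, var_iv), ("v", mean_v, var_v),
                    ("vi", mean_vi, var_vi)]:
    print(label, round(m, 4), round(v, 4))
```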
1. Distribution of sample means from a normal population.
2. Use of the Central Limit Theorem to treat sample means as having normal distribution when the sample size is sufficiently large.
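$$E(\bar{X}) = E\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] = \frac{1}{n}(n\mu) = \mu$$
$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)\right] = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$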
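The two results $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$ can be verified exactly on a small case by listing every possible sample. The sketch below assumes a hypothetical population (the six faces of a fair die) and samples of size 3 drawn with replacement, so the observations are independent.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4, 5, 6]  # hypothetical population: one fair die
n = 3                        # sample size

mu = Fraction(sum(values), len(values))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

# Every equally likely sample of size n, and the mean of each sample.
means = [Fraction(sum(s), n) for s in product(values, repeat=n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print(mu, mean_of_means)           # 7/2 7/2
print(sigma2 / n, var_of_means)    # 35/36 35/36
```

Exact fractions are used so that the equality is not blurred by rounding.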
Example 3.1
A population consists of all numbers from 0 to 99. Samples of 5 numbers are selected as follows:
1st sample: 51, 77, 27, 46, 40
2nd sample: 42, 33, 12, 90, 44
3rd sample: 46, 62, 16, 28, 98
4th sample: 93, 58, 20, 41, 86
5th sample: 19, 64, 8, 70, 56
Calculate the means of these samples and the mean and standard deviation of these sample means.
Soln:
Using $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$, the respective sample means are 48.2, 44.2, 50.0, 59.6 and 43.4.
Using GC, the mean (of all these sample means) is 49.08 and the standard deviation is 5.80.
(Note: we use σx instead of Sx because in GC, σx is the standard deviation of the data entered into the GC, which is required here.)
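The same figures can be reproduced in Python. In this sketch `statistics.pstdev` plays the role of the calculator's σx (divisor n), which is an assumption about how the calculator output maps onto the library.

```python
import statistics

samples = [
    [51, 77, 27, 46, 40],
    [42, 33, 12, 90, 44],
    [46, 62, 16, 28, 98],
    [93, 58, 20, 41, 86],
    [19, 64, 8, 70, 56],
]

means = [sum(s) / len(s) for s in samples]
overall_mean = statistics.mean(means)
spread = statistics.pstdev(means)  # population standard deviation, like σx

print(means)                   # [48.2, 44.2, 50.0, 59.6, 43.4]
print(round(overall_mean, 2))  # 49.08
print(round(spread, 2))        # 5.8
```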
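Can you see the idea of a distribution of sample means here?
If $X \sim N(\mu, \sigma^2)$, then for a random sample of size $n$,
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
$X_1 + X_2 + \dots + X_n \sim N(n\mu, n\sigma^2)$
If the distribution of $X$ is not normal but $n$ is sufficiently large, then
(1) $X_1 + X_2 + \dots + X_n \sim N(n\mu, n\sigma^2)$, approximately by Central Limit Theorem.
(2) $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, approximately by Central Limit Theorem.
Note:
• The approximation gets better as $n$ gets larger.
• $\frac{\sigma}{\sqrt{n}}$ is known as the standard error of the sample mean.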
Solving problems involving the sampling distribution
Example 4.1
The heights of a particular species of plant follow a normal distribution with mean 21 cm and variance 90 cm². A random sample of 10 plants is taken and the mean height calculated.
Find the probability that this sample mean lies between 18 cm and 27 cm.
Soln:
Let r.v. $X$ denote the height of a plant in cm.
Then $X \sim N(21, 90)$
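One way to finish this calculation, sketched here under the result $\bar{X} \sim N(\mu, \sigma^2/n)$ for a normal population, is to work with $\bar{X} \sim N(21, 90/10)$. The code uses `statistics.NormalDist` from the Python standard library in place of the GC.

```python
from math import sqrt
from statistics import NormalDist

mu, variance, n = 21, 90, 10

# Sample mean of 10 plants: mean 21, variance 90/10 = 9, standard deviation 3.
sample_mean = NormalDist(mu, sqrt(variance / n))
p = sample_mean.cdf(27) - sample_mean.cdf(18)

print(round(p, 4))  # 0.8186
```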
Example 4.3
A large number of random samples of size $n$ are taken from the distribution of $X$ where $X \sim N(74, 36)$ and the sample means are calculated.
If $P(\bar{X} > 72) = 0.854$, find, expressed to the nearest integer, the value of $n$.
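Soln:
Method 1
$X \sim N(74, 36)$, so $\bar{X} \sim N\left(74, \frac{36}{n}\right)$.
Using GC, tabulate $P(\bar{X} > 72)$ against $n$:
n     P(X̄ > 72)
10    0.8541
11    0.8655
12    0.8759
13    0.8853
14    0.8938
15    0.9016
From the table, $n = 10$.
Method 2
$\bar{X} \sim N\left(74, \frac{36}{n}\right)$
$P(\bar{X} > 72) = 0.854$
$P\left(Z > \frac{72 - 74}{6/\sqrt{n}}\right) = 0.854$, where $Z \sim N(0, 1)$
i.e. $P\left(Z < -\frac{\sqrt{n}}{3}\right) = 1 - 0.854 = 0.146$
Using GC, we obtain $-\frac{\sqrt{n}}{3} = -1.0537$
$n \approx 10$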
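Both methods can be mirrored in a short Python sketch, with `statistics.NormalDist` standing in for the GC. The helper name `prob_mean_exceeds_72` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_exceeds_72(n):
    """P(sample mean > 72) when X ~ N(74, 36) and the sample size is n."""
    return 1 - NormalDist(74, 6 / sqrt(n)).cdf(72)

# Method 1: tabulate the probability against n.
table = {n: round(prob_mean_exceeds_72(n), 4) for n in range(10, 16)}
print(table)

# Method 2: P(Z < -sqrt(n)/3) = 0.146, so sqrt(n)/3 = -invNorm(0.146).
z = -NormalDist().inv_cdf(0.146)
n_exact = (3 * z) ** 2
print(round(z, 4), round(n_exact))  # 1.0537 10
```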
Example 4.4 (Do it yourself)
A random sample of 16 observations is to be drawn from a Normal distribution having mean 11 and standard deviation 3. Let $\bar{X}$ denote the sample mean. Find, correct to three decimal places, the value of c for which P
Example 4.5
If a random sample of size 50 is taken from each of the following distributions, find, for each case, the probability that the sample mean exceeds 5.
(a) $X \sim Po(4.5)$,
(b) $X \sim B(9, 0.5)$,
Soln:
(a) $X \sim Po(4.5)$.
$E(X) = 4.5$, $\mathrm{Var}(X) = 4.5$
Using GC, $P(\bar{X} > 5) = 0.00921$
Note: Continuity correction is NOT needed as we are dealing with sample mean (which is a continuous random variable).
Note: For questions involving sample mean results, look for the phrase "probability that the sample mean / arithmetic mean / average value".
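A sketch of the Central Limit Theorem calculation for both distributions in this example is given below. It assumes the approximation $\bar{X} \sim N(\mu, \sigma^2/50)$, with $\mu = \sigma^2 = 4.5$ for the Poisson case and $\mu = np = 4.5$, $\sigma^2 = np(1 - p) = 2.25$ for the binomial case; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_exceeds(mu, variance, n, threshold):
    """P(sample mean > threshold) using the CLT normal approximation."""
    return 1 - NormalDist(mu, sqrt(variance / n)).cdf(threshold)

# (a) X ~ Po(4.5): mean 4.5, variance 4.5
p_a = prob_mean_exceeds(4.5, 4.5, 50, 5)

# (b) X ~ B(9, 0.5): mean 9(0.5) = 4.5, variance 9(0.5)(0.5) = 2.25
p_b = prob_mean_exceeds(9 * 0.5, 9 * 0.5 * 0.5, 50, 5)

print(round(p_a, 4), round(p_b, 5))
```

The value for the binomial case agrees with the figure 0.00921 quoted above.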
Example 4.7
The discrete random variable $X$ is such that $E(X) = 4.5$ and $\mathrm{Var}(X) = 0.75$. Sixty observations of $X$ are taken and $T$ is the total sum of the observations. Find the smallest value of $t$ such that $P(T < t) > 0.95$.
Soln:
$T = X_1 + X_2 + X_3 + \dots + X_{60}$
Since $n = 60$ is large, by CLT, $T \sim N(60 \times 4.5, 60 \times 0.75)$ approximately,
i.e. $T \sim N(270, 45)$
$P(T < t) > 0.95$
Therefore,
$t > 281.034$
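The boundary value 281.034 can be checked with an inverse normal calculation, sketched here with `statistics.NormalDist` standing in for the GC.

```python
from math import sqrt
from statistics import NormalDist

n, mean_x, var_x = 60, 4.5, 0.75

# By the CLT, T is approximately N(60 x 4.5, 60 x 0.75) = N(270, 45).
total = NormalDist(n * mean_x, sqrt(n * var_x))
t_boundary = total.inv_cdf(0.95)

print(round(t_boundary, 3))  # 281.034
```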
Use GC table to check answer.