Note:
Ratio, regression and difference estimation are NOT sampling techniques. They are statistical estimation techniques that can be applied under different sampling methods.
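BASIC IDEAS:
$$\tau_y = N\mu_y, \qquad \tau_x = N\mu_x, \qquad \text{so} \qquad \tau_y = \frac{\mu_y}{\mu_x}\,\tau_x.$$
The population ratio $R = \mu_y/\mu_x = \tau_y/\tau_x$ is estimated from the sample by
$$r = \frac{\bar y}{\bar x} = \frac{n\bar y}{n\bar x} = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}.$$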
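Note:
i) If $N$ is unknown, $\hat\tau_y = N\bar y$ cannot be determined.
ii) If $N$ is known, choose either
$$\hat\tau_y = N\bar y \qquad \text{or} \qquad \hat\tau_y = \frac{\bar y}{\bar x}\,\tau_x.$$
iii) If $x$ and $y$ are highly correlated, i.e. if $x$ contributes information for the prediction of $y$, then $\hat\tau_y = \frac{\bar y}{\bar x}\,\tau_x$ should be better than $\hat\tau_y = N\bar y$, which depends on $\bar y$ alone.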
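a) Population ratio: $\hat R = r = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}$.
b) Population total: $\hat\tau_y = r\tau_x = \frac{\bar y}{\bar x}\,\tau_x$; here $R = \tau_y/\tau_x = \mu_y/\mu_x$ is the rate of change of $y$ per unit of $x$.
c) Population mean: $\hat\mu_y = r\mu_x$.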
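Estimated variances:
a) Population ratio, $r = \frac{\sum y_i}{\sum x_i}$:
$$\hat V(r) = \left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,s_r^2, \qquad \text{where} \qquad s_r^2 = \frac{\sum_{i=1}^n (y_i - r x_i)^2}{n-1}$$
and
$$\sum_{i=1}^n (y_i - r x_i)^2 = \sum_{i=1}^n y_i^2 + r^2\sum_{i=1}^n x_i^2 - 2r\sum_{i=1}^n x_i y_i.$$
If $\mu_x$ is unknown, use $\bar x$ as an estimator of it.
b) Population total, $\hat\tau_y = r\tau_x$:
$$\hat V(\hat\tau_y) = \tau_x^2\,\hat V(r) = \tau_x^2\left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,s_r^2 = N^2\left(\frac{N-n}{nN}\right)s_r^2.$$
c) Population mean, $\hat\mu_y = r\mu_x$:
$$\hat V(\hat\mu_y) = \mu_x^2\,\hat V(r) = \left(\frac{N-n}{nN}\right)s_r^2.$$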
Example 1:
x_i    y_i    y_i - r x_i
6.7    7.1    -0.042330
8.2    8.4    -0.341359
7.9    8.2    -0.221554
6.4    6.9     0.077476
8.3    8.4    -0.447962
7.2    7.9     0.224660
6.0    6.5     0.103884
7.4    7.6    -0.288544
8.1    8.9     0.265242
9.3    9.9    -0.013981
8.2    9.1     0.358642
6.8    7.3     0.051068
7.4    7.8    -0.088543
7.5    8.3     0.304854
8.3    8.9     0.052038
9.1    9.6    -0.100777
8.6    8.7    -0.467768
7.9    8.8     0.378447
6.3    7.0     0.284078
8.9    9.4    -0.087573

Variable       N    Mean     Median   SD
x              20   7.725    7.900    0.947
y              20   8.235    8.350    0.957
y_i - r x_i    20   -0.000   0.0185   0.260
[Figure: Scatterplot of y vs x]
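Solution:
Estimate of $R$:
$$r = \frac{\sum_{i=1}^n y_i}{\sum_{i=1}^n x_i}.$$
From the sample: $\sum x_i = 154.5$, $\sum y_i = 164.7$, $n = 20$, $N = 1000$. Then
$$\sum x_i^2 = 1210.55, \qquad \sum y_i^2 = 1373.71, \qquad \sum x_i y_i = 1288.95,$$
$$r = \frac{164.7}{154.5} = 1.066,$$
$$\sum_{i=1}^n (y_i - r x_i)^2 = \sum y_i^2 + r^2\sum x_i^2 - 2r\sum x_i y_i = 1.284.$$
The estimated variance of $r$ is
$$\hat V(r) = \left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,\frac{\sum_{i=1}^n (y_i - r x_i)^2}{n-1}.$$
Since $\mu_x$ is unknown, use $\bar x$ to approximate $\mu_x$. The bound on the error of estimation is
$$B = 2\sqrt{\hat V(r)} = 2\sqrt{\left(\frac{1000-20}{20(1000)}\right)\frac{1}{(7.725)^2}\,\frac{1.284}{19}} = 0.015 \approx 0.02.$$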
Conclusion:
We estimate the ratio of the current real estate valuation to that of two years ago to be $r = 1.07$. We are quite confident that the error of estimation is less than 0.02, so the true population ratio $R$ should lie between $1.05 < R < 1.09$.
Note: The bound on the error of estimation is quite small, so $r$ should be a fairly accurate estimate of $R$.
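The Example 1 computation can be reproduced directly from the 20 sample pairs. The sketch below is a minimal Python version. It assumes simple random sampling and, as in the solution, uses $\bar x$ in place of the unknown $\mu_x$. The function name `ratio_estimate` is illustrative and is not taken from the source. Computed from the unrounded data, the residual sum of squares $\sum (y_i - r x_i)^2$ is about 1.284, and the bound rounds to 0.015.

```python
import math

def ratio_estimate(x, y, N, mu_x=None):
    """Ratio estimator r = sum(y) / sum(x) under simple random sampling.

    Returns (r, s_r^2, estimated V(r), bound 2*sqrt(V(r))).
    If mu_x is not supplied, the sample mean x-bar is used in its place.
    """
    n = len(x)
    r = sum(y) / sum(x)
    if mu_x is None:
        mu_x = sum(x) / n  # approximate the unknown mu_x by x-bar
    # s_r^2 = sum (y_i - r x_i)^2 / (n - 1)
    s_r2 = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    var_r = (N - n) / (n * N) * s_r2 / mu_x ** 2
    return r, s_r2, var_r, 2 * math.sqrt(var_r)

# Example 1 data: x = valuation two years ago, y = current valuation
x = [6.7, 8.2, 7.9, 6.4, 8.3, 7.2, 6.0, 7.4, 8.1, 9.3,
     8.2, 6.8, 7.4, 7.5, 8.3, 9.1, 8.6, 7.9, 6.3, 8.9]
y = [7.1, 8.4, 8.2, 6.9, 8.4, 7.9, 6.5, 7.6, 8.9, 9.9,
     9.1, 7.3, 7.8, 8.3, 8.9, 9.6, 8.7, 8.8, 7.0, 9.4]

r, s_r2, var_r, bound = ratio_estimate(x, y, N=1000)
print(f"r = {r:.3f}, SS = {s_r2 * (len(x) - 1):.3f}, B = {bound:.3f}")
# r = 1.066, SS = 1.284, B = 0.015
```

For a mean or a total, the same `var_r` is scaled by $\mu_x^2$ or $\tau_x^2$, as in formulas b) and c) above.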
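Expressing $\hat V(r)$ in terms of the sample correlation coefficient $\hat\rho$:
$$\hat\rho = \frac{s_{xy}}{s_x s_y} = \frac{s_{xy}}{\sqrt{s_x^2 s_y^2}},$$
where
$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y), \qquad s_x^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar x)^2, \qquad s_y^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar y)^2.$$
Because $\bar y = r\bar x$,
$$y_i - r x_i = (y_i - \bar y) - r(x_i - \bar x),$$
so
$$(y_i - r x_i)^2 = (y_i - \bar y)^2 - 2r(x_i - \bar x)(y_i - \bar y) + r^2(x_i - \bar x)^2.$$
Summing over $i$ and dividing by $(n-1)$ gives
$$s_r^2 = \frac{\sum_{i=1}^n (y_i - r x_i)^2}{n-1} = s_y^2 + r^2 s_x^2 - 2r s_{xy} = s_y^2 + r^2 s_x^2 - 2r\hat\rho\, s_x s_y.$$
Then
$$\hat V(r) = \frac{1-f}{n\mu_x^2}\left(s_y^2 + r^2 s_x^2 - 2r\hat\rho\, s_x s_y\right), \qquad \text{where } f = \frac{n}{N} \text{ is the sampling fraction.}$$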
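As a quick numerical check of the identity $s_r^2 = s_y^2 + r^2 s_x^2 - 2r\hat\rho\,s_x s_y$, which relies on $\bar y = r\bar x$, the sketch below computes $\hat V(r)$ both ways. The ten man-hour records of Example 4 serve only as test data, with $N = 1000$ and $\mu_x = 16.3$ as given there. The helper name is illustrative.

```python
import math

def var_r_two_ways(x, y, N, mu_x):
    """Return V-hat(r) computed from residuals and from the correlation form."""
    n = len(x)
    r = sum(y) / sum(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # Direct form: s_r^2 = sum (y_i - r x_i)^2 / (n - 1)
    s_r2_direct = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    # Correlation form: s_y^2 + r^2 s_x^2 - 2 r rho s_x s_y
    s_x2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    s_y2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    rho = s_xy / math.sqrt(s_x2 * s_y2)
    s_r2_rho = s_y2 + r ** 2 * s_x2 - 2 * r * rho * math.sqrt(s_x2) * math.sqrt(s_y2)
    f = n / N  # sampling fraction
    scale = (1 - f) / (n * mu_x ** 2)
    return scale * s_r2_direct, scale * s_r2_rho

x = [12, 24, 15, 30, 32, 26, 10, 15, 0, 14]
y = [13, 25, 15, 32, 36, 24, 12, 16, 2, 12]
v_direct, v_rho = var_r_two_ways(x, y, N=1000, mu_x=16.3)
print(v_direct, v_rho)  # the two agree up to floating-point rounding
```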
Example 2:
Orange   Sugar content, y (pounds)   Weight of orange, x (pounds)   y_i - r x_i
1        0.021                        0.40                           -0.0016207
2        0.030                        0.48                            0.0028552
3        0.025                        0.43                            0.0006828
4        0.022                        0.42                           -0.0017517
5        0.033                        0.50                            0.0047241
6        0.027                        0.46                            0.0009862
7        0.019                        0.39                           -0.0030552
8        0.021                        0.41                           -0.0021862
9        0.023                        0.42                           -0.0007517
10       0.025                        0.44                            0.0001172

$$\sum y_i = 0.246, \quad \sum y_i^2 = 0.006224, \quad \sum x_i = 4.35, \quad \sum x_i^2 = 1.9035, \quad \sum x_i y_i = 0.10839.$$
i) $$\hat\tau_y = r\tau_x = \frac{\sum_{i=1}^{10} y_i}{\sum_{i=1}^{10} x_i}\,\tau_x = \frac{0.246}{4.35}(1800) = 101.79 \text{ lbs.}$$
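Note: Since $N$ is unknown, we assume the finite population correction is negligible:
$$\frac{N-n}{N} = 1 - f \approx 1, \quad \text{i.e. } f < 0.05.$$
Then, using $\bar x = 0.435$ in place of $\mu_x$ and $s_r \approx 0.0024$,
$$\hat V(\hat\tau_y) = \tau_x^2\,\hat V(r) = \tau_x^2\left(\frac{N-n}{N}\right)\frac{1}{n\mu_x^2}\,s_r^2 \approx \tau_x^2\,\frac{1}{n\bar x^2}\,\frac{\sum_{i=1}^n (y_i - r x_i)^2}{n-1} = (1800)^2\,\frac{1}{10(0.435)^2}\,(0.0024)^2 = 9.863,$$
$$2\sqrt{\hat V(\hat\tau_y)} = 2\sqrt{9.863} = 6.3.$$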
Summary:
The ratio estimate of the total sugar content of the truckload of oranges is $\hat\tau_y = 101.79$ pounds, with a bound on the error of estimation of about 6.3 pounds.
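The orange calculation can be sketched as follows. As in the note, the sketch assumes the finite population correction is negligible because $N$ is unknown, and it uses $\bar x$ in place of $\mu_x$. With the unrounded $s_r$ (about 0.00241 rather than 0.0024), the estimated variance is about 9.95 instead of 9.863, and the bound still rounds to 6.3.

```python
import math

sugar = [0.021, 0.030, 0.025, 0.022, 0.033, 0.027, 0.019, 0.021, 0.023, 0.025]  # y, lbs
weight = [0.40, 0.48, 0.43, 0.42, 0.50, 0.46, 0.39, 0.41, 0.42, 0.44]          # x, lbs
tau_x = 1800.0  # total weight of the truckload (lbs)

n = len(weight)
r = sum(sugar) / sum(weight)
tau_hat = r * tau_x                     # ratio estimate of total sugar content
xbar = sum(weight) / n                  # stands in for the unknown mu_x
s_r2 = sum((yi - r * xi) ** 2 for xi, yi in zip(weight, sugar)) / (n - 1)
# N unknown: finite population correction (N - n)/N treated as 1
var_tau = tau_x ** 2 * s_r2 / (n * xbar ** 2)
bound = 2 * math.sqrt(var_tau)
print(f"tau_hat = {tau_hat:.2f} lbs, V = {var_tau:.2f}, B = {bound:.1f}")
# tau_hat = 101.79 lbs, V = 9.95, B = 6.3
```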
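Example 3:
Sample summary ($n = 100$):
$$\sum_{i=1}^{100} y_i = 1750, \quad \bar y = 17.50, \quad \sum_{i=1}^{100} x_i = 1200,$$
$$\sum_{i=1}^{100} y_i^2 = 31650, \quad \sum_{i=1}^{100} x_i^2 = 15620, \quad \sum_{i=1}^{100} y_i x_i = 22059.35.$$
Solution:
The estimate of $\mu_y$ is $\hat\mu_y = r\mu_x$, where
$$\mu_x = \frac{\tau_x}{N} = \frac{12500}{1000} = 12.5,$$
so
$$\hat\mu_y = \frac{\sum y_i}{\sum x_i}\,\mu_x = \frac{1750}{1200}(12.5) = 18.23.$$
Also
$$\sum (y_i - r x_i)^2 = \sum y_i^2 + r^2\sum x_i^2 - 2r\sum x_i y_i = 529.80,$$
and the bound on the error of estimation is
$$2\sqrt{\hat V(\hat\mu_y)} = 2\sqrt{\left(\frac{N-n}{nN}\right)\frac{\sum (y_i - r x_i)^2}{n-1}} = 0.44.$$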
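Example 3 reports only sums, so the estimate and its bound can be computed from the summary statistics alone. A minimal sketch, taking $N = 1000$ and $\tau_x = 12500$ from the solution. The residual sum of squares comes out at 529.85, which matches the 529.80 shown up to rounding of $r$.

```python
import math

n, N = 100, 1000
sum_y, sum_x = 1750.0, 1200.0
sum_y2, sum_x2, sum_xy = 31650.0, 15620.0, 22059.35
tau_x = 12500.0

mu_x = tau_x / N                 # 12.5
r = sum_y / sum_x
mu_hat = r * mu_x                # ratio estimate of mu_y
# sum (y_i - r x_i)^2 expanded in terms of the reported sums
ss_r = sum_y2 + r ** 2 * sum_x2 - 2 * r * sum_xy
var_mu = (N - n) / (n * N) * ss_r / (n - 1)
bound = 2 * math.sqrt(var_mu)
print(f"mu_hat = {mu_hat:.2f}, SS = {ss_r:.2f}, B = {bound:.2f}")
# mu_hat = 18.23, SS = 529.85, B = 0.44
```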
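Sample size:
Set $2\sqrt{V(\hat\theta)} = B$, i.e. $2\sqrt{V(r)} = B$ when estimating $R$, and solve for $n$. The population variance $\sigma_r^2$ is approximated by $s_r^2$ calculated from a preliminary sample.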
Example 4:
A manufacturing company wishes to estimate the ratio of change from last year to this year in the number of man-hours lost due to sickness.
A preliminary study of $n^* = 10$ employee records is made. The company records show that the total number of man-hours lost because of sickness in the previous year was $\tau_x = 16300$. Assume $N = 1000$ employees.
Worker   Man-hours lost in previous year, x   Man-hours lost in current year, y
1        12                                   13
2        24                                   25
3        15                                   15
4        30                                   32
5        32                                   36
6        26                                   24
7        10                                   12
8        15                                   16
9        0                                    2
10       14                                   12

Given: $n^* = 10$ employees, $\tau_x = 16300$, $B = 0.01$, $N = 1000$ employees.
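From the sample, determine
$$r = \frac{\sum y_i}{\sum x_i} = \frac{187}{178} = 1.051,$$
$$s_r^2 = \frac{\sum (y_i - r x_i)^2}{n^*-1} = \frac{\sum y_i^2 + r^2\sum x_i^2 - 2r\sum y_i x_i}{n^*-1} = 3.46,$$
which is used as an estimate of $\sigma_r^2$. Also
$$\mu_x = \frac{\tau_x}{N} = \frac{16300}{1000} = 16.3, \qquad D = \frac{B^2\mu_x^2}{4} = \frac{(0.01)^2(16.3)^2}{4} = 0.006642.$$
The required sample size is
$$n = \frac{N\sigma_r^2}{ND + \sigma_r^2} = \frac{1000(3.46)}{1000(0.006642) + 3.46} = 342.5 \approx 343.$$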
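Prove:
Estimating $\tau_y$: with $\hat\tau_y = r\tau_x$,
$$V(\hat\tau_y) = \tau_x^2\,V(r) = \tau_x^2\left(\frac{N-n}{nN}\right)\frac{\sigma_r^2}{\mu_x^2}.$$
Setting $2\tau_x\sqrt{V(r)} = B$ gives $V(r) = \frac{B^2}{4\tau_x^2}$, so
$$\left(\frac{N-n}{nN}\right)\frac{\sigma_r^2}{\mu_x^2} = \frac{B^2}{4\tau_x^2} \;\Longrightarrow\; \left(\frac{N-n}{n}\right)\sigma_r^2 = \frac{N\mu_x^2B^2}{4\tau_x^2} = \frac{NB^2}{4N^2} \quad (\text{since } \tau_x = N\mu_x)$$
$$\Longrightarrow\; \left(\frac{N}{n} - 1\right)\sigma_r^2 = ND \;\Longrightarrow\; n = \frac{N\sigma_r^2}{ND + \sigma_r^2}, \qquad \text{where } D = \frac{B^2}{4N^2}.$$
Estimating $\mu_y$: with $\hat\mu_y = r\mu_x$, set $2\sqrt{V(\hat\mu_y)} = B$, or equivalently $2\mu_x\sqrt{V(r)} = B$, where
$$V(r) = \left(\frac{N-n}{nN}\right)\frac{\sigma_r^2}{\mu_x^2}.$$
The same steps give $n = \frac{N\sigma_r^2}{ND + \sigma_r^2}$ with $D = \frac{B^2}{4}$.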
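With $s_r^2 = 3.46$:
i) Determine the sample size required to estimate $\tau_y$ with a bound on the error of estimation $B = 100$.
$$D = \frac{B^2}{4N^2} = \frac{100^2}{4(1000)^2} = 0.0025, \qquad n = \frac{N\sigma_r^2}{ND + \sigma_r^2} = \frac{1000(3.46)}{1000(0.0025) + 3.46} = 580.5 \approx 581.$$
ii) Determine the sample size required to estimate $\mu_y$ with a bound on the error of estimation $B = 0.05$.
$$D = \frac{B^2}{4} = \frac{0.05^2}{4} = 0.000625, \qquad n = \frac{N\sigma_r^2}{ND + \sigma_r^2} = \frac{1000(3.46)}{1000(0.000625) + 3.46} \approx 848 \text{ (rounding up).}$$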
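The three sample-size rules share the form $n = N\sigma^2/(ND + \sigma^2)$ and differ only in $D$. The small helper below assumes $\sigma^2$ is replaced by the preliminary estimate $s_r^2$ and rounds the result up to the next integer, as the worked numbers do. The function name and its `target` labels are hypothetical.

```python
import math

def ratio_sample_size(N, sigma2, B, target, mu_x=None):
    """Sample size n = N*sigma2 / (N*D + sigma2) for ratio estimation.

    target: "ratio" (estimate R), "total" (estimate tau_y) or "mean" (estimate mu_y).
    mu_x is needed only when target == "ratio".
    """
    if target == "ratio":
        D = B ** 2 * mu_x ** 2 / 4
    elif target == "total":
        D = B ** 2 / (4 * N ** 2)
    elif target == "mean":
        D = B ** 2 / 4
    else:
        raise ValueError("target must be 'ratio', 'total' or 'mean'")
    return math.ceil(N * sigma2 / (N * D + sigma2))  # round up

# Example 4 values: N = 1000, s_r^2 = 3.46, mu_x = 16.3
print(ratio_sample_size(1000, 3.46, 0.01, "ratio", mu_x=16.3))  # 343
print(ratio_sample_size(1000, 3.46, 100, "total"))              # 581
print(ratio_sample_size(1000, 3.46, 0.05, "mean"))              # 848
```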
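Ratio estimation in stratified random sampling (two strata, A and B, with $N = N_A + N_B$):
Separate ratio estimator of $\mu_y$:
$$\hat\mu_{yRS} = \frac{N_A r_A \mu_{xA} + N_B r_B \mu_{xB}}{N}, \qquad \text{where } r_A = \frac{\bar y_A}{\bar x_A}, \quad r_B = \frac{\bar y_B}{\bar x_B},$$
$$\hat V(\hat\mu_{yRS}) = \left(\frac{N_A}{N}\right)^2\left(\frac{N_A - n_A}{N_A n_A}\right)S_A^2 + \left(\frac{N_B}{N}\right)^2\left(\frac{N_B - n_B}{N_B n_B}\right)S_B^2,$$
$$S_A^2 = \frac{\sum_{i=1}^{n_A}(y_i - r_A x_i)^2}{n_A - 1}, \qquad S_B^2 = \frac{\sum_{i=1}^{n_B}(y_i - r_B x_i)^2}{n_B - 1}.$$
Combined ratio estimator of $\mu_y$:
$$\hat\mu_{yRC} = r_c\,\mu_x = \frac{\bar y_{st}}{\bar x_{st}}\,\mu_x, \qquad \text{where } r_c = \frac{\bar y_{st}}{\bar x_{st}},$$
$$\bar y_{st} = \frac{N_A\bar y_A + N_B\bar y_B}{N}, \qquad \bar x_{st} = \frac{N_A\bar x_A + N_B\bar x_B}{N}, \qquad \mu_x = \frac{N_A\mu_{xA} + N_B\mu_{xB}}{N_A + N_B},$$
$$\hat V(\hat\mu_{yRC}) = \left(\frac{N_A}{N}\right)^2\left(\frac{N_A - n_A}{N_A n_A}\right)\frac{\sum_{i=1}^{n_A}\left[(y_i - \bar y_A) - r_c(x_i - \bar x_A)\right]^2}{n_A - 1} + \left(\frac{N_B}{N}\right)^2\left(\frac{N_B - n_B}{N_B n_B}\right)\frac{\sum_{i=1}^{n_B}\left[(y_i - \bar y_B) - r_c(x_i - \bar x_B)\right]^2}{n_B - 1}.$$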
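Example 5:
Sample data for stratum B ($N_B = 1500$, $n_B = 10$):
Worker   Man-hours lost in previous year, x_B   Man-hours lost in current year, y_B
1        10                                     8
2        8                                      0
3        0                                      4
4        14                                     6
5        12                                     10
6        6                                      0
7        4                                      2
8        0                                      4
9        8                                      4
10       16                                     8
Total    78                                     46

$$r_B = \frac{\bar y_B}{\bar x_B} = \frac{4.6}{7.8} = 0.590$$
Solution:
i) Separate ratio estimator:
$$\hat\mu_{yRS} = \frac{1000}{2500}\left(\frac{18.7}{17.8}\right)(16.3) + \frac{1500}{2500}\left(\frac{4.6}{7.8}\right)(8.53) = 9.87, \qquad \hat V(\hat\mu_{yRS}) = 0.40.$$
ii) Combined ratio estimator:
$$\bar y_{st} = 0.4(18.7) + 0.6(4.6) = 10.24, \qquad \bar x_{st} = 0.4(17.8) + 0.6(7.8) = 11.80, \qquad \mu_x = \frac{1000(16.3) + 1500(8.53)}{2500} = 11.638,$$
$$\hat\mu_{yRC} = \frac{\bar y_{st}}{\bar x_{st}}\,\mu_x = \frac{10.24}{11.80}(11.638) = 10.10,$$
$$\hat V(\hat\mu_{yRC}) = \left(\frac{1000}{2500}\right)^2\frac{990}{1000(10)}(2.39)^2 + \left(\frac{1500}{2500}\right)^2\frac{1490}{1500(10)}(4.00)^2 = 0.66,$$
where 2.39 and 4.00 are the standard deviations of $(y_i - \bar y_h) - r_c(x_i - \bar x_h)$ in strata A and B.
iii) Summary:
Estimator type   Estimate   Std. deviation
Separate         9.87       0.631
Combined         10.10      0.812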
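Both stratified estimators can be sketched in a few lines. The sketch assumes stratum A is the ten-worker sample of Example 4, since its means ($\bar x_A = 17.8$, $\bar y_A = 18.7$) match those used in the solution. It takes $\mu_{xA} = 16.3$ and $\mu_{xB} = 8.53$ as in the solution. For the combined estimator it computes residuals about the stratum means, which reproduces the 2.39 and 4.00 above. The helper names are illustrative.

```python
import math

def mean(v):
    return sum(v) / len(v)

def resid_var(x, y, c):
    """Sample variance (divisor n - 1) of (y_i - ybar) - c (x_i - xbar)."""
    xb, yb = mean(x), mean(y)
    return sum(((yi - yb) - c * (xi - xb)) ** 2 for xi, yi in zip(x, y)) / (len(x) - 1)

# (N_h, mu_xh, x sample, y sample) for strata A and B
strata = [
    (1000, 16.3, [12, 24, 15, 30, 32, 26, 10, 15, 0, 14],
                 [13, 25, 15, 32, 36, 24, 12, 16, 2, 12]),
    (1500, 8.53, [10, 8, 0, 14, 12, 6, 4, 0, 8, 16],
                 [8, 0, 4, 6, 10, 0, 2, 4, 4, 8]),
]
N = sum(Nh for Nh, _, _, _ in strata)

# Separate ratio estimator: one ratio r_h per stratum
mu_sep = var_sep = 0.0
for Nh, mxh, x, y in strata:
    n, W = len(x), Nh / N
    rh = sum(y) / sum(x)
    mu_sep += W * rh * mxh
    # ybar_h = r_h * xbar_h, so centred residuals equal y_i - r_h x_i here
    var_sep += W ** 2 * (Nh - n) / (Nh * n) * resid_var(x, y, rh)

# Combined ratio estimator: a single ratio r_c = ybar_st / xbar_st
ybar_st = sum(Nh / N * mean(y) for Nh, _, _, y in strata)
xbar_st = sum(Nh / N * mean(x) for Nh, _, x, _ in strata)
mu_x = sum(Nh / N * mxh for Nh, mxh, _, _ in strata)
rc = ybar_st / xbar_st
mu_comb = rc * mu_x
var_comb = sum((Nh / N) ** 2 * (Nh - len(x)) / (Nh * len(x)) * resid_var(x, y, rc)
               for Nh, _, x, y in strata)

print(f"separate: {mu_sep:.2f} (V = {var_sep:.2f}); combined: {mu_comb:.2f} (V = {var_comb:.2f})")
# separate: 9.87 (V = 0.40); combined: 10.10 (V = 0.66)
```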
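Regression estimation:
The ratio estimator suits a linear relationship that passes through the origin ($y = 0$ when $x = 0$), i.e. $y = Bx$ with $B = y/x$. When the relationship is linear, $y = A + Bx$, but the intercept $A$ is not zero, assume this straight-line model and use regression estimation.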
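The slope $B$ is estimated by the least-squares coefficient
$$b = \frac{S_{xy}}{S_x^2} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}},$$
and $\mu_y$ by the regression estimator $\hat\mu_{yL} = \bar y + b(\mu_x - \bar x)$.
Estimated variance:
For simple random sampling, $V(\bar y) = \frac{S^2}{n}\left(\frac{N-n}{N}\right)$. For the regression estimator, replace $S^2$ by the residual variance:
$$V(\hat\mu_{yL}) = \frac{N-n}{nN}\left[S_y^2 - b^2 S_x^2\right] = \frac{N-n}{nN}\left[S_y^2 - \rho^2 S_y^2\right] = \frac{N-n}{nN}\,S_y^2(1-\rho^2), \qquad \text{since } b = \rho\,\frac{S_y}{S_x}.$$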
Example 6:
A mathematics achievement test was given to 486 students prior to their entering a certain college. A simple random sample of $n = 10$ students was selected, and their final calculus grades are reported below. It is known that $\mu_x = 52$ for all 486 students taking the achievement test.
Estimate $\mu_y$ for this population and place a bound on the error of estimation.
Student   Achievement test score, x   Final calculus grade, y
1         39                          65
2         43                          78
3         21                          52
4         64                          82
5         57                          92
6         47                          89
7         28                          73
8         75                          98
9         34                          56
10        52                          75
Solution:
First step: draw a scatterplot. It shows a strong positive association between $y$ and $x$, so a straight line is a reasonable model for this relationship.
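[Figure: Scatterplot of y vs x]
$$\hat\mu_{yL} = \bar y + b(\mu_x - \bar x),$$
where
$$b = \frac{\sum_{i=1}^n x_i y_i - \frac{(\sum_{i=1}^n x_i)(\sum_{i=1}^n y_i)}{n}}{\sum_{i=1}^n x_i^2 - \frac{(\sum_{i=1}^n x_i)^2}{n}} = 0.766,$$
$$\hat\mu_{yL} = 76 + 0.766(52 - 46) = 80.6,$$
$$\hat V(\hat\mu_{yL}) = \left(\frac{N-n}{Nn}\right)MSE.$$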
Regression output for y on x:
Predictor   Coef     SE Coef   T      P
Constant    40.784   8.507     4.79   0.001
x           0.7656   0.1750    4.38   0.002

S = 8.70363   R-Sq = 70.5%   R-Sq(adj) = 66.8%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       1    1450.0   1450.0   19.14   0.002
Residual Error   8    606.0    75.8
Total            9    2056.0
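Here $S = \sqrt{MSE}$, so $MSE = 75.8$ and
$$\hat V(\hat\mu_{yL}) = \left(\frac{476}{486(10)}\right)(75.8) = 7.42, \qquad B = 2\sqrt{\hat V(\hat\mu_{yL})} = 5.45.$$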
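The Example 6 regression estimate can be reproduced without a statistics package. The sketch below computes $b$ from centred sums, which is algebraically the same as the formula above, then $MSE = SSE/(n-2)$ and the bound. It is a minimal illustration and not the software that produced the output table.

```python
import math

x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]   # achievement test score
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]   # final calculus grade
N, mu_x = 486, 52.0
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = sxy / sxx                       # least-squares slope
a = ybar - b * xbar                 # intercept

mu_L = ybar + b * (mu_x - xbar)     # regression estimate of mu_y
sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
var_L = (N - n) / (N * n) * mse
bound = 2 * math.sqrt(var_L)
print(f"b = {b:.4f}, a = {a:.3f}, mu_L = {mu_L:.1f}, MSE = {mse:.1f}, V = {var_L:.2f}, B = {bound:.2f}")
# b = 0.7656, a = 40.784, mu_L = 80.6, MSE = 75.8, V = 7.42, B = 5.45
```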
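OR, compute the residuals directly:
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i = a + b x_i, \qquad e_i = y_i - \hat y_i = y_i - a - b x_i,$$
$$SSE = \sum (y_i - a - b x_i)^2, \qquad MSE = \frac{SSE}{n-2}.$$
The estimated variance of $\hat\mu_{yL}$ is then $\hat V(\hat\mu_{yL}) = \left(\frac{N-n}{Nn}\right)MSE$.
Regression Estimation:
$$\hat\mu_{yL} = \bar y + b(\mu_x - \bar x).$$
With $b = 1$ this becomes the difference estimator:
$$\hat\mu_{yD} = \bar y + 1(\mu_x - \bar x) = \mu_x + (\bar y - \bar x) = \mu_x + \bar d, \qquad d_i = y_i - x_i.$$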
Example 7:
Auditors are often interested in comparing the audited value of items with the book value. Suppose a population contains 180 inventory items with a stated total book value of $\tau_x = \$13320$.
Let $x_i$ be the book value and $y_i$ the audit value of the $i$th item.
(i) Plot a graph of $y$ versus $x$.
(ii) Estimate the mean audit value $\mu_y$ by the difference method.
(iii) Estimate the variance of $\hat\mu_{yD}$.
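y_i (audit value)   x_i (book value)   d_i = y_i - x_i
9                   10                 -1
14                  12                  2
7                   8                  -1
29                  26                  3
45                  47                 -2
109                 112                -3
40                  36                  4
238                 240                -2
60                  59                  1
170                 167                 3

$\bar y = 72.1$, $\bar x = 71.7$, $\mu_x = 13320/180 = 74.0$, $n = 10$.
$$\hat\mu_{yD} = \bar y + (\mu_x - \bar x) = \mu_x + \bar d = 74.0 + 0.4 = 74.4,$$
$$\hat V(\hat\mu_{yD}) = \left(\frac{N-n}{nN}\right)\frac{\sum_{i=1}^n (d_i - \bar d)^2}{n-1}, \qquad \frac{\sum_{i=1}^n (d_i - \bar d)^2}{n-1} = \frac{\sum_{i=1}^n d_i^2 - n\bar d^2}{n-1} = 6.27,$$
$$\hat V(\hat\mu_{yD}) = \left(\frac{180 - 10}{10(180)}\right)(6.27) = 0.59.$$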
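The difference-estimator arithmetic for Example 7 is short enough to sketch directly. The sketch also checks numerically the identity $s_d^2 = s_y^2 + s_x^2 - 2s_{xy}$, which is derived next. The bound $2\sqrt{\hat V}$ printed at the end is not part of the original solution; it follows from the same variance.

```python
import math

y = [9, 14, 7, 29, 45, 109, 40, 238, 60, 170]   # audit values
x = [10, 12, 8, 26, 47, 112, 36, 240, 59, 167]  # book values
N, tau_x = 180, 13320.0
n = len(x)

mu_x = tau_x / N                        # 74.0
d = [yi - xi for xi, yi in zip(x, y)]   # d_i = y_i - x_i
dbar = sum(d) / n
mu_D = mu_x + dbar                      # same as ybar + (mu_x - xbar)
s_d2 = sum((di - dbar) ** 2 for di in d) / (n - 1)
var_D = (N - n) / (n * N) * s_d2
bound = 2 * math.sqrt(var_D)

# Check s_d^2 = s_y^2 + s_x^2 - 2 s_xy
xbar, ybar = sum(x) / n, sum(y) / n
s_x2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
assert math.isclose(s_d2, s_y2 + s_x2 - 2 * s_xy, rel_tol=1e-9, abs_tol=1e-6)

print(f"mu_D = {mu_D:.1f}, s_d^2 = {s_d2:.2f}, V = {var_D:.2f}, B = {bound:.2f}")
# mu_D = 74.4, s_d^2 = 6.27, V = 0.59, B = 1.54
```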
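Since $d_i - \bar d = (y_i - \bar y) - (x_i - \bar x)$,
$$s_d^2 = \frac{\sum (y_i - x_i - \bar y + \bar x)^2}{n-1} = \frac{\sum (y_i - \bar y)^2}{n-1} - 2\,\frac{\sum (y_i - \bar y)(x_i - \bar x)}{n-1} + \frac{\sum (x_i - \bar x)^2}{n-1}$$
$$= S_y^2 - 2S_{xy} + S_x^2 = S_y^2 + S_x^2 - 2\rho S_x S_y.$$
Estimators of $\mu_y$:
i) Simple mean per element: $\hat\mu_y = \bar y$.
ii) Ratio estimator: $\hat\mu_{yR} = r\mu_x = \frac{\bar y}{\bar x}\,\mu_x$.
iii) Regression estimator: $\hat\mu_{yL} = \bar y + b(\mu_x - \bar x)$.
iv) Difference estimator: $\hat\mu_{yD} = \bar y + (\mu_x - \bar x) = \mu_x + \bar d$.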
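Let $\hat\theta_1$ and $\hat\theta_2$ be two estimators of $\mu_y$ such that
(i) $E(\hat\theta_1) = E(\hat\theta_2) = \mu_y$;
(ii) both are based on equal sample sizes.
Note: Variances usually decrease as $n$ increases. It is convenient to describe the relative size of the two variances by looking at their ratio. The relative efficiency of $\hat\theta_1$ to $\hat\theta_2$ (or of $\hat\theta_1$ with respect to $\hat\theta_2$) is given by
$$RE\left(\frac{\hat\theta_1}{\hat\theta_2}\right) = \frac{V(\hat\theta_2)}{V(\hat\theta_1)}.$$
If $RE > 1$, then $V(\hat\theta_2) > V(\hat\theta_1)$ and $\hat\theta_1$ is the more efficient (favorable) estimator of $\mu_y$.
If $RE = 2$, then $V(\hat\theta_2) = 2V(\hat\theta_1)$, i.e. $\hat\theta_1$ is twice as efficient as $\hat\theta_2$.
If $RE < 1$, then $\hat\theta_2$ is the more efficient estimator.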
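Note:
Comparing the Ratio Estimator to the Simple Mean per Element $\bar y$:
SRS:
$$V(\bar y) = \frac{S_y^2}{n}\left(\frac{N-n}{N}\right)$$
Ratio:
$$V(\hat\mu_{yR}) = \frac{N-n}{nN}\left(S_y^2 + r^2 S_x^2 - 2r\rho S_x S_y\right)$$
$$RE\left(\frac{\hat\mu_{yR}}{\bar y}\right) = \frac{V(\bar y)}{V(\hat\mu_{yR})} = \frac{S_y^2}{S_y^2 + r^2 S_x^2 - 2r\rho S_x S_y}.$$
$RE > 1$ if $S_y^2 + r^2 S_x^2 - 2r\rho S_x S_y < S_y^2$, or $r^2 S_x^2 < 2r\rho S_x S_y$, or (assuming $r > 0$) $rS_x < 2\rho S_y$, i.e.
$$\rho > \frac{rS_x}{2S_y} = \frac{1}{2}\,\frac{\bar y S_x}{\bar x S_y} = \frac{1}{2}\,\frac{S_x/\bar x}{S_y/\bar y} = \frac{1}{2}\,\frac{CV(x)}{CV(y)}.$$
If $CV(x) = CV(y)$, the ratio estimator is more efficient when $\rho > \frac{1}{2}$.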
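Comparing the Simple Mean per Element $\bar y$ to the Regression Estimator $\hat\mu_{yL} = \bar y + b(\mu_x - \bar x)$:
$$\hat V(\hat\mu_{yL}) = \frac{N-n}{Nn}\cdot\frac{1}{n-2}\left[\sum_{i=1}^n (y_i - \bar y)^2 - b^2\sum_{i=1}^n (x_i - \bar x)^2\right],$$
which in population terms is approximately
$$V(\hat\mu_{yL}) = \frac{N-n}{Nn}\left(S_y^2 - b^2 S_x^2\right).$$
Since $b = \rho\,\frac{S_y}{S_x}$, this becomes
$$V(\hat\mu_{yL}) = \frac{N-n}{Nn}\left(S_y^2 - \rho^2 S_y^2\right) = \frac{N-n}{Nn}\,S_y^2(1-\rho^2),$$
$$RE\left(\frac{\hat\mu_{yL}}{\bar y}\right) = \frac{S_y^2}{S_y^2(1-\rho^2)} = \frac{1}{1-\rho^2} \ge 1,$$
with equality only when $\rho = 0$.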
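Comparing the Regression Estimator to the Ratio Estimator:
Reg:
$$V(\hat\mu_{yL}) = \frac{N-n}{Nn}\,S_y^2(1-\rho^2)$$
Ratio:
$$V(\hat\mu_{yR}) = \frac{N-n}{nN}\left(S_y^2 + r^2 S_x^2 - 2r\rho S_x S_y\right)$$
Then
$$RE\left(\frac{\hat\mu_{yL}}{\hat\mu_{yR}}\right) = \frac{S_y^2 + r^2 S_x^2 - 2r\rho S_x S_y}{S_y^2(1-\rho^2)} > 1$$
if $r^2 S_x^2 - 2r\rho S_x S_y > -\rho^2 S_y^2$, or $(\rho S_y - rS_x)^2 > 0$. Since $\rho S_y = bS_x$, this is $(bS_x - rS_x)^2 > 0$, i.e. $(b - r)^2 > 0$. Hence the regression estimator is more efficient than the ratio estimator whenever $b \ne r$.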
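Comparing the Difference Estimator to the Simple Mean per Element $\bar y$:
Diff:
$$\hat V(\hat\mu_{yD}) = \frac{N-n}{Nn}\cdot\frac{\sum_{i=1}^n (d_i - \bar d)^2}{n-1}, \qquad V(\hat\mu_{yD}) = \frac{N-n}{Nn}\left(S_y^2 + S_x^2 - 2\rho S_x S_y\right),$$
$$RE\left(\frac{\hat\mu_{yD}}{\bar y}\right) = \frac{S_y^2}{S_y^2 + S_x^2 - 2\rho S_x S_y} > 1$$
if $2\rho S_x S_y > S_x^2$, or $\rho > \frac{S_x}{2S_y}$. If $S_x = S_y$, this requires $\rho > \frac{1}{2}$.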
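Comparing the Regression Estimator $\hat\mu_{yL}$ to the Difference Estimator $\hat\mu_{yD}$:
$$V(\hat\mu_{yL}) = \frac{N-n}{Nn}\,S_y^2(1-\rho^2), \qquad V(\hat\mu_{yD}) = \frac{N-n}{Nn}\left(S_y^2 + S_x^2 - 2\rho S_x S_y\right),$$
$$RE\left(\frac{\hat\mu_{yL}}{\hat\mu_{yD}}\right) = \frac{S_y^2 + S_x^2 - 2\rho S_x S_y}{S_y^2(1-\rho^2)} > 1$$
if $S_x^2 - 2\rho S_x S_y + \rho^2 S_y^2 > 0$, or $(S_x - \rho S_y)^2 > 0$, i.e. $S_x \ne \rho S_y$ (equivalently $b \ne 1$).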
(i) If
$$\rho > \frac{1}{2}\,\frac{CV(x)}{CV(y)} = \frac{1}{2}\,\frac{S_x/\bar x}{S_y/\bar y},$$
then $V(\hat\mu_{yR}) < V(\bar y)$.
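To see these comparisons on actual numbers, the sketch below plugs the sample quantities $s_x^2$, $s_y^2$, $\hat\rho$ and $r = \bar y/\bar x$ from Example 6 into the variance expressions above. Substituting sample values for the population parameters is an assumption made only for illustration. The common factor $(N-n)/(nN)$ is dropped because it cancels in every ratio. The function name is hypothetical.

```python
import math

def estimated_relative_efficiencies(x, y):
    """RE of ratio, regression and difference estimators relative to ybar,
    with sample moments plugged in for the population quantities."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_x2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    s_y2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    s_x, s_y = math.sqrt(s_x2), math.sqrt(s_y2)
    rho = s_xy / (s_x * s_y)
    r = ybar / xbar
    k_ratio = s_y2 + r ** 2 * s_x2 - 2 * r * rho * s_x * s_y
    k_reg = s_y2 * (1 - rho ** 2)
    k_diff = s_y2 + s_x2 - 2 * rho * s_x * s_y
    # ratio-vs-SRS threshold: rho > (1/2) CV(x)/CV(y)
    threshold = 0.5 * (s_x / xbar) / (s_y / ybar)
    return {"ratio": s_y2 / k_ratio, "regression": s_y2 / k_reg,
            "difference": s_y2 / k_diff, "rho": rho, "threshold": threshold}

x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
res = estimated_relative_efficiencies(x, y)
for name, value in res.items():
    print(f"{name}: {value:.3f}")
```

For these data $\hat\rho \approx 0.84$ falls below the threshold $\frac{1}{2}CV(x)/CV(y) \approx 0.91$, so the ratio estimator does worse than $\bar y$ (RE about 0.81). The regression (about 3.39) and difference (about 2.77) estimators both do better. With sample moments, the regression form is never worse than the other two, which agrees with the $(b - r)^2$ and $(S_x - \rho S_y)^2$ arguments above.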
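5.7 SUMMARY
a) If the relationship between the $y$'s and the $x$'s is linear, use ratio, difference or regression estimation.
b) If $r_L \le \frac{y_i}{x_i} \le r_U$ (the ratios $y_i/x_i$ lie in a narrow range), the line $y = A + Bx$ passes through the origin: $A = 0$, so $y = Bx$. Use ratio estimation.
c) If the ratios $\frac{y_i}{x_i}$ vary over a wide range:
(i) If $B = 1$, use difference estimation: $y - x = A$.
(ii) If $B \ne 1$, use regression estimation: $y = A + Bx$.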