5 - Ratio Regression and Difference Estimation - Revised

5. RATIO, REGRESSION AND DIFFERENCE ESTIMATION
So far, estimation of the population mean and total has been based on a sample of response measurements $y_1, y_2, \ldots, y_n$ obtained by simple random sampling (SRS) or stratified random sampling.
Suppose that each unit in the population also carries a measurement x that is highly correlated with y, the variable of interest. By measuring y together with one or more subsidiary variables x, we can obtain additional information for estimating the population mean $\mu_y$. Assume that the population mean $\mu_x$ is known.
The use of a subsidiary variable is basic to the concept of correlation and provides a means for developing a prediction equation relating y and x by the method of least squares.
Three methods of estimation are based on the use of a subsidiary variable x:
1) Ratio
2) Regression
3) Difference
Note:
Ratio, regression and difference estimation are NOT sampling techniques. They are statistical estimation techniques that can be applied under different sampling methods.
BASIC IDEAS:
The estimated ratio of one variable to another related variable (the auxiliary variable) can be used to obtain a more precise estimate (i.e. one with smaller sampling variance) of the population mean or total than can be obtained from the one variable alone, provided the population value of the auxiliary variable is known or can be measured easily.
One auxiliary variable is used.
The method works well when the auxiliary variable is highly correlated with the variable under investigation.
Example of ratio estimation:

To estimate the amount of juice that can be produced from a truckload of oranges, a sample of oranges can be squeezed to determine the ratio of juice to weight. The estimated ratio is then multiplied by the total weight of the oranges to estimate the total amount of juice.
It is easier to weigh the oranges in the truck than to count them. The same reasoning applies to sugar content: we can avoid the need to know N by noting the following two facts.
1. The sugar content of an individual orange, y, is closely related to its weight, x.
2. The ratio of the total sugar content $\tau_y$ to the total weight of the truckload $\tau_x$ is equal to the ratio of the mean sugar content per orange, $\mu_y$, to the mean weight, $\mu_x$.
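$$\frac{\tau_y}{\tau_x} = \frac{N\mu_y}{N\mu_x} = \frac{\mu_y}{\mu_x} \quad\Longrightarrow\quad \tau_y = \frac{\mu_y}{\mu_x}\,\tau_x$$
Let $y_1, y_2, \ldots, y_n$ be an SRS of size n.

Estimate $\mu_y$ by $\bar{y}$ and $\mu_x$ by $\bar{x}$. Then
$$\hat{\tau}_y = \frac{\bar{y}}{\bar{x}}\,\tau_x = \frac{n\bar{y}}{n\bar{x}}\,\tau_x = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}\,\tau_x$$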
In this case, the number of elements in the population, N, is unknown, so we cannot use the simple estimator $N\bar{y}$ of the population total $\tau_y$.
Note:
i) If N is unknown, $\hat{\tau}_y = N\bar{y}$ cannot be determined.
ii) If N is known, choose either $\hat{\tau}_y = N\bar{y}$ or $\hat{\tau}_y = \frac{\bar{y}}{\bar{x}}\,\tau_x$.
iii) If x and y are highly correlated, i.e. if x contributes information for the prediction of y, then $\hat{\tau}_y = \frac{\bar{y}}{\bar{x}}\,\tau_x$ should be better than $\hat{\tau}_y = N\bar{y}$, which depends on $\bar{y}$ alone.
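5.1 RATIO ESTIMATION

RATIO ESTIMATION USING SIMPLE RANDOM SAMPLING
Population size: N elements.
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$: SRS of size n drawn from the finite population.
x and y: correlated.
Let
$\bar{x}$ = sample mean for x,
$\bar{y}$ = sample mean for y,
$\sum x_i$ = sample total for x,
$\sum y_i$ = sample total for y.
We wish to perform statistical inference for

a) $R = \tau_y/\tau_x = \mu_y/\mu_x$, the population ratio (the rate of change of y relative to x),
b) $\mu_y$, the population mean of y,
c) $\tau_y$, the population total of y.
Definition:
Ratio estimators for:
a) Population ratio: $r = \hat{R} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}$
b) Population mean: $\hat{\mu}_y = r\,\mu_x = \frac{\bar{y}}{\bar{x}}\,\mu_x$
c) Population total: $\hat{\tau}_y = r\,\tau_x = \frac{\bar{y}}{\bar{x}}\,\tau_x$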
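Variance estimators for:

a) Population ratio:
$$\hat{V}(r) = \hat{V}\!\left(\frac{\sum y_i}{\sum x_i}\right) = \left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,S_r^2, \qquad \text{where } S_r^2 = \frac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1}$$
Note: If the population mean of x, $\mu_x$, is unknown, use $\bar{x}$ to approximate $\mu_x$.
b) Population total:
$$\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r) = \tau_x^2\left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,S_r^2$$
and
$$\sum_{i=1}^{n}(y_i - r x_i)^2 = \sum y_i^2 + r^2\sum x_i^2 - 2r\sum x_i y_i$$
Note: If $\tau_x$ is unknown, use $N\bar{x}$ as an estimator.
c) Population mean:
$$\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r) = \left(\frac{N-n}{nN}\right)S_r^2$$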
Example 1:
In a survey to examine trends in real estate, we are interested in the relative change over a two-year period in the assessed value of homes in a particular community. A simple random sample of n = 20 homes is selected from N = 1000 homes.
The two variables x and y are:
y = assessed value this year,
x = corresponding value two years ago.
(a) Estimate R, the relative change in real estate valuation over the given two-year period.
(b) Place a bound on the error of estimation.
Data and calculations for the real estate valuation survey (units of $10,000)

Home    x_i    y_i    y_i - r x_i
1       6.7    7.1    -0.042330
2       8.2    8.4    -0.341359
3       7.9    8.2    -0.221554
4       6.4    6.9     0.077476
5       8.3    8.4    -0.447962
6       7.2    7.9     0.224660
7       6.0    6.5     0.103884
8       7.4    7.6    -0.288544
9       8.1    8.9     0.265242
10      9.3    9.9    -0.013981
11      8.2    9.1     0.358642
12      6.8    7.3     0.051068
13      7.4    7.8    -0.088543
14      7.5    8.3     0.304854
15      8.3    8.9     0.052038
16      9.1    9.6    -0.100777
17      8.6    8.7    -0.467768
18      7.9    8.8     0.378447
19      6.3    7.0     0.284078
20      8.9    9.4    -0.087573

Variable       n     Mean     Median   SD
x              20    7.725    7.900    0.947
y              20    8.235    8.350    0.957
y_i - r x_i    20    -0.000   0.0185   0.260
[Figure: scatterplot of y vs x]
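Solution:
Estimate of R:
$$r = \hat{R} = \frac{\sum y_i}{\sum x_i} = \frac{\text{total current valuation of the 20 homes}}{\text{total valuation of the 20 homes two years ago}}$$
From the sample:
$\sum x_i = 154.5$, $\sum y_i = 164.7$, $\sum x_i^2 = 1210.5$, $\sum y_i^2 = 1373.71$, $\sum x_i y_i = 1288.95$, n = 20, N = 1000.
Then
$$r = \frac{164.7}{154.5} = 1.066$$
We estimate that real estate valuation has increased by approximately 7% over the two-year period in the area.
$$\sum_{i=1}^{20}(y_i - r x_i)^2 = \sum y_i^2 + r^2\sum x_i^2 - 2r\sum x_i y_i = 1.3157$$
The bound on the error of estimation is
$$B = 2\sqrt{\hat{V}(r)} = 2\sqrt{\left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,\frac{\sum (y_i - r x_i)^2}{n-1}}$$
Since $\mu_x$ is unknown, use $\bar{x} = 7.725$ to approximate $\mu_x$:
$$B = 2\sqrt{\left(\frac{1000-20}{20(1000)}\right)\frac{1}{(7.725)^2}\,\frac{1.3157}{19}} = 0.015 \approx 0.02$$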
Conclusion:
We estimate the ratio of current real estate valuation to that of two years ago to be r = 1.07, and we are quite confident that the error of estimation is less than 0.02. The true ratio R for the population should lie in the interval 1.05 < R < 1.09.
Note: The bound on the error of estimation is quite small, so r should be a fairly accurate estimate of R.
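The calculation in Example 1 can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name `ratio_estimate` is a hypothetical helper, and the sample mean of x stands in for the unknown $\mu_x$, as in the solution above. Because the code works from the unrounded data, its sum of squared residuals differs slightly from the hand-computed 1.3157, but the bound still rounds to 0.015.

```python
import math

# Assessed home values in units of $10,000 (data table of Example 1).
x = [6.7, 8.2, 7.9, 6.4, 8.3, 7.2, 6.0, 7.4, 8.1, 9.3,
     8.2, 6.8, 7.4, 7.5, 8.3, 9.1, 8.6, 7.9, 6.3, 8.9]   # two years ago
y = [7.1, 8.4, 8.2, 6.9, 8.4, 7.9, 6.5, 7.6, 8.9, 9.9,
     9.1, 7.3, 7.8, 8.3, 8.9, 9.6, 8.7, 8.8, 7.0, 9.4]   # this year
N = 1000  # population size


def ratio_estimate(x, y, N, mu_x=None):
    """Return (r, estimated variance of r) for an SRS of paired values.

    Hypothetical helper: if mu_x is None, the sample mean of x is used
    as an approximation to the population mean of x.
    """
    n = len(x)
    r = sum(y) / sum(x)
    # S_r^2 = sum (y_i - r x_i)^2 / (n - 1)
    s_r2 = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    if mu_x is None:
        mu_x = sum(x) / n
    var_r = (N - n) / (n * N) * s_r2 / mu_x ** 2
    return r, var_r


r, var_r = ratio_estimate(x, y, N)
bound = 2 * math.sqrt(var_r)
print(f"r = {r:.3f}, bound on the error of estimation = {bound:.3f}")
```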
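The $100(1-\alpha)\%$ confidence interval for the ratio R:
$$r \pm Z_{\alpha/2}\sqrt{\hat{V}(r)}$$
Note: $\hat{V}(r)$ can be written in a form that involves the sample correlation coefficient between x and y,
$$\hat{\rho} = \frac{S_{xy}}{S_x S_y}$$
where
$$S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \qquad S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad S_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$$
so that $\hat{\rho}^2 = \frac{S_{xy}^2}{S_x^2 S_y^2}$.
But
$$\frac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n-1} = S_y^2 - 2r S_{xy} + r^2 S_x^2 \quad \text{and} \quad S_{xy} = \hat{\rho}\,S_x S_y$$
because $\bar{y} - r\bar{x} = 0$ and therefore
$$\sum (y_i - r x_i)^2 = \sum (y_i - \bar{y} - r x_i + r\bar{x})^2 = \sum \left[(y_i - \bar{y}) - r(x_i - \bar{x})\right]^2$$
$$= \sum (y_i - \bar{y})^2 - 2r\sum (x_i - \bar{x})(y_i - \bar{y}) + r^2\sum (x_i - \bar{x})^2$$
(divide by (n - 1)).
Then
$$\hat{V}(r) = \left(\frac{1-f}{n}\right)\frac{1}{\mu_x^2}\left(S_y^2 + r^2 S_x^2 - 2r\hat{\rho}\,S_x S_y\right)$$
where $f = n/N$ is the sampling fraction.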
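The identity behind the correlation form of $\hat{V}(r)$ can be checked numerically. The sketch below reuses the Example 1 data and compares the direct residual form of $S_r^2$ with the form written in terms of $S_x$, $S_y$ and $\hat{\rho}$; the variable names are illustrative only.

```python
import math

# Example 1 data (units of $10,000).
x = [6.7, 8.2, 7.9, 6.4, 8.3, 7.2, 6.0, 7.4, 8.1, 9.3,
     8.2, 6.8, 7.4, 7.5, 8.3, 9.1, 8.6, 7.9, 6.3, 8.9]
y = [7.1, 8.4, 8.2, 6.9, 8.4, 7.9, 6.5, 7.6, 8.9, 9.9,
     9.1, 7.3, 7.8, 8.3, 8.9, 9.6, 8.7, 8.8, 7.0, 9.4]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
r = ybar / xbar

# Sample variances and covariance with (n - 1) divisors.
s_x2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
rho = s_xy / math.sqrt(s_x2 * s_y2)

# Direct form: sum (y_i - r x_i)^2 / (n - 1)
direct = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
# Correlation form: S_y^2 + r^2 S_x^2 - 2 r rho S_x S_y
via_rho = s_y2 + r ** 2 * s_x2 - 2 * r * rho * math.sqrt(s_x2) * math.sqrt(s_y2)

print(f"rho = {rho:.3f}, direct = {direct:.6f}, via rho = {via_rho:.6f}")
```

The two forms agree up to floating-point rounding because $\bar{y} - r\bar{x} = 0$ by construction.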
Example 2:
In a study to estimate the total sugar content of a truckload of oranges, a random sample of n = 10 oranges was juiced and weighed. The total weight of all the oranges, obtained by weighing the truck first loaded and then unloaded, is 1800 pounds.

Orange   Sugar content, y (pounds)   Weight of orange, x (pounds)   y_i - r x_i
1        0.021                       0.40                           -0.0016207
2        0.030                       0.48                            0.0028552
3        0.025                       0.43                            0.0006828
4        0.022                       0.42                           -0.0017517
5        0.033                       0.50                            0.0047241
6        0.027                       0.46                            0.0009862
7        0.019                       0.39                           -0.0030552
8        0.021                       0.41                           -0.0021862
9        0.023                       0.42                           -0.0007517
10       0.025                       0.44                            0.0001172

(i) Estimate $\tau_y$, the total sugar content of the oranges.

(ii) Determine a bound on the error of estimation.
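Solution:
n = 10
$\tau_x = 1800$ lbs. (total weight of all oranges)
From the sample:
$\sum y_i = 0.246$, $\sum y_i^2 = 0.006224$, $\sum x_i = 4.35$, $\sum x_i^2 = 1.9035$, $\sum x_i y_i = 0.10839$
i)
$$\hat{\tau}_y = r\,\tau_x = \frac{\sum_{i=1}^{10} y_i}{\sum_{i=1}^{10} x_i}\,\tau_x = \frac{0.246}{4.35}(1800) = 101.79 \text{ lbs.}$$
Note:
N is unknown, so we assume the finite population correction is negligible:
$$1 - f = \frac{N-n}{N} \approx 1, \quad \text{i.e. } f < 0.05$$
The sample mean $\bar{x}$ must be used in place of $\mu_x$.

ii) Then
$$\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r) = \tau_x^2\left(\frac{N-n}{N}\right)\frac{1}{n}\,\frac{1}{\mu_x^2}\,\frac{\sum (y_i - r x_i)^2}{n-1} \approx \tau_x^2\,\frac{1}{n}\,\frac{1}{\bar{x}^2}\,\frac{\sum (y_i - r x_i)^2}{n-1}$$
$$= (1800)^2\left(\frac{1}{10}\right)\frac{1}{(0.435)^2}\,(0.0024)^2 = 9.863$$
The bound on the error of estimation:
$$2\sqrt{\hat{V}(\hat{\tau}_y)} = 6.3$$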
Summary:
The ratio estimate of the total sugar content of the truckload of oranges is $\hat{\tau}_y = 101.79$ lbs., with a bound on the error of estimation of 6.3 lbs.

We are confident that the total sugar content $\tau_y$ lies in the interval $101.79 \pm 6.3$, i.e. 95.49 to 108.09 lbs.
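As a cross-check, Example 2 can be reproduced from the raw data. The sketch below assumes, as the solution does, that the finite population correction is ignored and that $\bar{x}$ replaces $\mu_x$; the variable names are illustrative. Working from unrounded residuals gives a variance slightly different from the hand-computed 9.863 (which rounds $S_r$ to 0.0024), but the bound still rounds to 6.3.

```python
import math

# Sugar content y and weight x per sampled orange, in pounds (Example 2).
sugar = [0.021, 0.030, 0.025, 0.022, 0.033, 0.027, 0.019, 0.021, 0.023, 0.025]
weight = [0.40, 0.48, 0.43, 0.42, 0.50, 0.46, 0.39, 0.41, 0.42, 0.44]
tau_x = 1800.0  # total weight of the truckload, in pounds

n = len(sugar)
r = sum(sugar) / sum(weight)      # estimated sugar content per pound
tau_hat = r * tau_x               # ratio estimate of total sugar content

xbar = sum(weight) / n            # stands in for the unknown mu_x
s_r2 = sum((y - r * x) ** 2 for x, y in zip(weight, sugar)) / (n - 1)

# N is unknown, so the factor (N - n) / N is taken to be 1.
var_tau = tau_x ** 2 * s_r2 / (n * xbar ** 2)
bound = 2 * math.sqrt(var_tau)

print(f"tau_hat = {tau_hat:.2f} lbs, bound = {bound:.1f} lbs")
```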
Example 3:
A company wishes to estimate the average amount of money $\mu_y$ paid to employees for medical expenses during the first 3 months of the current year. A random sample of 100 employee records is taken from the population of 1000 employees. The sample results are summarized below.
(i) Estimate $\mu_y$.
(ii) Place a bound on the error of estimation.
n = 100; N = 1000.
Given:
Total for the current quarter: $\sum_{i=1}^{100} y_i = 1750$, so $\bar{y} = 17.50$.
Total for the corresponding quarter of the previous year: $\sum_{i=1}^{100} x_i = 1200$.
Population total for the corresponding quarter of the previous year: $\tau_x = 12500$.
$\sum_{i=1}^{100} y_i^2 = 31650$, $\sum_{i=1}^{100} x_i^2 = 15620$, $\sum_{i=1}^{100} x_i y_i = 22059.35$.
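Solution:
The estimate of $\mu_y$ is $\hat{\mu}_y = r\,\mu_x$, where
$$\mu_x = \frac{\tau_x}{N} = \frac{12500}{1000} = 12.5$$
$$\hat{\mu}_y = \frac{\sum y_i}{\sum x_i}\,\mu_x = \frac{1750}{1200}(12.5) = 18.23$$
$$\sum_{i=1}^{100}(y_i - r x_i)^2 = \sum y_i^2 + r^2\sum x_i^2 - 2r\sum x_i y_i = 529.80$$
Bound on the error of estimation:
$$2\sqrt{\hat{V}(\hat{\mu}_y)} = 2\sqrt{\left(\frac{N-n}{Nn}\right)\frac{\sum (y_i - r x_i)^2}{n-1}} = 2\sqrt{\left(\frac{1000-100}{1000(100)}\right)\frac{529.80}{99}} = 0.44$$
We estimate the average amount of money paid to employees for medical expenses to be RM18.23, with a bound on the error of estimation of RM0.44.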
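Example 3 supplies only summary statistics, so the estimate and its bound can be computed directly from the sums. The following is a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
import math

# Summary statistics given in Example 3.
n, N = 100, 1000
sum_y, sum_x = 1750.0, 1200.0
sum_y2, sum_x2, sum_xy = 31650.0, 15620.0, 22059.35
tau_x = 12500.0

r = sum_y / sum_x
mu_x = tau_x / N
mu_hat = r * mu_x                     # ratio estimate of the mean

# sum (y_i - r x_i)^2 expanded in terms of the given sums
ss_resid = sum_y2 + r ** 2 * sum_x2 - 2 * r * sum_xy
var_mu = (N - n) / (N * n) * ss_resid / (n - 1)
bound = 2 * math.sqrt(var_mu)

print(f"mu_hat = {mu_hat:.2f}, SS = {ss_resid:.2f}, bound = {bound:.2f}")
```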
5.2 SELECTING THE SAMPLE SIZE

The amount of information contained in the sample depends on:

i) the variation in the data,
ii) the number of observations n included.
Once the sampling procedure (design) has been chosen, the investigator must determine the number of elements to be drawn.
Consider the sample size required to estimate a population parameter R, $\mu_y$ or $\tau_y$ to within B units for simple random sampling using ratio estimators. The number of observations required to estimate R, $\mu_y$ or $\tau_y$ with a bound B on the error of estimation is determined by solving the following equation for n:
$$2\sqrt{V(\text{estimator})} = B, \quad \text{e.g. } 2\sqrt{V(r)} = B \quad (\text{solve for } n)$$
Sample size required to estimate R with a bound on the error of estimation B:
$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad \text{where } D = \frac{B^2\mu_x^2}{4}$$
If $\sigma^2$ is unknown, i.e. if no past information is available to calculate $S_r^2$ as an estimate of $\sigma^2$, we take a preliminary sample of size $n^*$ and compute $S_r^2$ from it.
If $\mu_x$ is unknown, it is replaced by the sample mean $\bar{x}$ calculated from the $n^*$ preliminary observations.
Example 4:
A manufacturing company wishes to estimate the ratio of change from last year to this year in the number of man-hours lost due to sickness.
A preliminary study of $n^* = 10$ employee records is made. The company records show that the total number of man-hours lost because of sickness for the previous year was $\tau_x = 16300$. Assume N = 1000 employees.

Worker   Man-hours lost in previous year, x   Man-hours lost in current year, y
1        12                                   13
2        24                                   25
3        15                                   15
4        30                                   32
5        32                                   36
6        26                                   24
7        10                                   12
8        15                                   16
9        0                                    2
10       14                                   12

Determine the sample size n required to estimate R, the rate of change for the company, with a bound on the error of estimation B = 0.01.
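Solution:
$n^* = 10$ employees, $\tau_x = 16300$, B = 0.01, N = 1000 employees.
From the sample, determine
$$r = \frac{\sum y_i}{\sum x_i} = \frac{187}{178} = 1.051$$
where $\sum y_i = 187$ hours, $\sum x_i = 178$ hours and $\sum x_i y_i = 4245$.
$$\hat{\sigma}^2 = S_r^2 = \frac{\sum (y_i - r x_i)^2}{n^* - 1} = \frac{\sum y_i^2 + r^2\sum x_i^2 - 2r\sum x_i y_i}{n^* - 1} = 3.46$$
$$\mu_x = \frac{\tau_x}{N} = \frac{16300}{1000} = 16.3$$
$$D = \frac{B^2\mu_x^2}{4} = \frac{(0.01)^2(16.3)^2}{4} = 0.006642$$
$$n = \frac{N\sigma^2}{ND + \sigma^2} = \frac{1000(3.46)}{1000(0.006642) + 3.46} = 342.5 \approx 343$$
We should sample approximately 343 employee records to estimate R with a bound on the error of estimation of 0.01.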
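Sample size required to estimate $\mu_y$ with a bound on the error of estimation B:
$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad \text{where } D = \frac{B^2}{4}$$
Proof:
$\hat{\mu}_y = r\,\mu_x$, and the variance of the ratio estimator is
$$V(r) = \left(\frac{N-n}{nN}\right)\frac{1}{\mu_x^2}\,\sigma^2$$
Therefore $V(\hat{\mu}_y) = \mu_x^2\,V(r)$, and setting $2\sqrt{V(\hat{\mu}_y)} = 2\mu_x\sqrt{V(r)} = B$ gives
$$V(r) = \frac{B^2}{4\mu_x^2}$$
$$\left(\frac{N-n}{nN}\right)\frac{\sigma^2}{\mu_x^2} = \frac{B^2}{4\mu_x^2}$$
$$\left(\frac{N-n}{n}\right)\frac{\sigma^2}{\mu_x^2} = \frac{NB^2}{4\mu_x^2} \quad (\text{multiply both sides by } N)$$
$$\frac{N}{n} - 1 = \frac{N\mu_x^2 B^2}{4\mu_x^2\sigma^2} = \frac{NB^2}{4\sigma^2}$$
$$\frac{N}{n} = \frac{NB^2}{4\sigma^2} + 1$$
$$n = \frac{N}{\dfrac{NB^2}{4\sigma^2} + 1} = \frac{N\sigma^2}{ND + \sigma^2}, \qquad \text{where } D = \frac{B^2}{4}$$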
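Sample size required to estimate $\tau_y$ with a bound on the error of estimation B:
$$n = \frac{N\sigma^2}{ND + \sigma^2}, \qquad \text{where } D = \frac{B^2}{4N^2}$$
This can be found by solving for n:
$$\hat{\tau}_y = r\,\tau_x, \qquad 2\sqrt{V(\hat{\tau}_y)} = B, \qquad V(r) = \left(\frac{N-n}{nN}\right)\frac{\sigma^2}{\mu_x^2}$$
or
$$2\tau_x\sqrt{V(r)} = B$$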
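Example 5: (Refer to Example 4)

$\hat{\sigma}^2 = S_r^2 = 3.46$
i) Determine the sample size required to estimate $\tau_y$ with a bound on the error of estimation B = 100.
$$D = \frac{B^2}{4N^2} = \frac{100^2}{4(1000)^2} = 0.0025$$
$$n = \frac{N\sigma^2}{ND + \sigma^2} = \frac{1000(3.46)}{1000(0.0025) + 3.46} \approx 581$$
ii) Determine the sample size required to estimate $\mu_y$ with a bound on the error of estimation B = 0.05.
$$D = \frac{B^2}{4} = \frac{0.05^2}{4} = 0.000625$$
$$n = \frac{N\sigma^2}{ND + \sigma^2} = \frac{1000(3.46)}{1000(0.000625) + 3.46} \approx 848$$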
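The three sample-size formulas differ only in the definition of D, so one small function covers Examples 4 and 5. This is a sketch under the assumption that $\sigma^2$ is estimated by the value $S_r^2 = 3.46$ quoted above and that n is always rounded up; the function name `ratio_sample_size` is hypothetical.

```python
import math


def ratio_sample_size(N, sigma2, D):
    """Solve n = N * sigma2 / (N * D + sigma2) and round up to an integer."""
    return math.ceil(N * sigma2 / (N * D + sigma2))


N = 1000
sigma2 = 3.46          # S_r^2 from the preliminary sample of Example 4
mu_x = 16300 / N       # population mean of x

# Example 4: estimate R with B = 0.01, D = B^2 * mu_x^2 / 4
n_R = ratio_sample_size(N, sigma2, 0.01 ** 2 * mu_x ** 2 / 4)

# Example 5 (i): estimate the total with B = 100, D = B^2 / (4 N^2)
n_tau = ratio_sample_size(N, sigma2, 100 ** 2 / (4 * N ** 2))

# Example 5 (ii): estimate the mean with B = 0.05, D = B^2 / 4
n_mu = ratio_sample_size(N, sigma2, 0.05 ** 2 / 4)

print(n_R, n_tau, n_mu)
```

Rounding up rather than to the nearest integer guarantees that the bound B is met.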
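5.3 RATIO ESTIMATION IN STRATIFIED RANDOM SAMPLING

There are two different methods for constructing estimators of a ratio in stratified sampling:
i) Separate ratio estimator.
ii) Combined ratio estimator.
The formulas below are written for two strata, A and B, with $N = N_A + N_B$.
Separate ratio estimator:
Estimate the ratio of $\mu_y$ to $\mu_x$ within each stratum and then form a weighted average of these separate estimates as a single estimate.
i) Estimator of the mean $\mu_y$:
$$\hat{\mu}_{yRS} = \left(\frac{N_A}{N}\right) r_A\,\mu_{xA} + \left(\frac{N_B}{N}\right) r_B\,\mu_{xB}, \qquad \text{where } r_A = \frac{\bar{y}_A}{\bar{x}_A}, \quad r_B = \frac{\bar{y}_B}{\bar{x}_B}$$
ii) Estimated variance of $\hat{\mu}_{yRS}$:
$$\hat{V}(\hat{\mu}_{yRS}) = \left(\frac{N_A}{N}\right)^2\left(\frac{N_A - n_A}{N_A n_A}\right) S_A^2 + \left(\frac{N_B}{N}\right)^2\left(\frac{N_B - n_B}{N_B n_B}\right) S_B^2$$
where
$$S_A^2 = \frac{\sum_{i=1}^{n_A}(y_i - r_A x_i)^2}{n_A - 1}, \qquad S_B^2 = \frac{\sum_{i=1}^{n_B}(y_i - r_B x_i)^2}{n_B - 1}$$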
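Combined ratio estimator:

First estimate $\mu_y$ by $\bar{y}_{st}$ and similarly estimate $\mu_x$ by $\bar{x}_{st}$. Then $r_c = \bar{y}_{st}/\bar{x}_{st}$ can be used as an estimator of $R = \mu_y/\mu_x$, where
$$\bar{y}_{st} = \frac{N_A\bar{y}_A + N_B\bar{y}_B}{N_A + N_B}, \qquad \bar{x}_{st} = \frac{N_A\bar{x}_A + N_B\bar{x}_B}{N_A + N_B}$$
i) Estimator of the mean $\mu_y$:
$$\hat{\mu}_{yRC} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\mu_x = \frac{N_A\bar{y}_A + N_B\bar{y}_B}{N_A\bar{x}_A + N_B\bar{x}_B}\,\mu_x, \qquad \text{where } \mu_x = \frac{\tau_{xA} + \tau_{xB}}{N_A + N_B}$$
ii) Estimated variance of $\hat{\mu}_{yRC}$:
$$\hat{V}(\hat{\mu}_{yRC}) = \left(\frac{N_A}{N}\right)^2\left(\frac{N_A - n_A}{N_A n_A}\right)\frac{\sum_{i=1}^{n_A}\left[(y_i - \bar{y}_A) - r_c(x_i - \bar{x}_A)\right]^2}{n_A - 1} + \left(\frac{N_B}{N}\right)^2\left(\frac{N_B - n_B}{N_B n_B}\right)\frac{\sum_{i=1}^{n_B}\left[(y_i - \bar{y}_B) - r_c(x_i - \bar{x}_B)\right]^2}{n_B - 1}$$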
Example 6: (Refer to Example 4)

Let the 10 observations on man-hours lost due to sickness given in Example 4 be a simple random sample from company A:
$n_A = 10$, $\bar{y}_A = 18.7$, $\bar{x}_A = 17.8$, $r_A = 1.05$, $N_A = 1000$ and $\tau_{xA} = 16300$.
A simple random sample of $n_B = 10$ measurements was taken from company B within the same industry. (Assume companies A and B form the population of workers of interest.) The data are given below. It is known that $N_B = 1500$ employees and $\tau_{xB} = 12800$.
i) Find the separate ratio estimate of $\mu_y$ and its estimated variance.
ii) Find the combined ratio estimate of $\mu_y$ and its estimated variance.
iii) Compare (i) and (ii).

Employee   Man-hours lost in previous year, x_B   Man-hours lost in current year, y_B
1          10                                     8
2          8                                      0
3          0                                      4
4          14                                     6
5          12                                     10
6          6                                      0
7          4                                      2
8          0                                      4
9          8                                      4
10         16                                     8
Total      78                                     46

$n_B = 10$; $\bar{y}_B = 4.6$; $\bar{x}_B = 7.8$; $\tau_{xB} = 12800$; $r_B = \bar{y}_B/\bar{x}_B = 0.590$
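Solution:
i) Separate ratio estimator:
$$\hat{\mu}_{yRS} = \left(\frac{1000}{2500}\right)\left(\frac{18.7}{17.8}\right)(16.3) + \left(\frac{1500}{2500}\right)\left(\frac{4.6}{7.8}\right)(8.53) = 9.87$$
$$\hat{V}(\hat{\mu}_{yRS}) = 0.40$$
ii) Combined ratio estimator:
$$\hat{\mu}_{yRC} = \frac{\bar{y}_{st}}{\bar{x}_{st}}\,\mu_x = \frac{0.4(18.7) + 0.6(4.6)}{0.4(17.8) + 0.6(7.8)}\left(\frac{16300 + 12800}{2500}\right) = 10.10$$
$$\hat{V}(\hat{\mu}_{yRC}) = \left(\frac{1000}{2500}\right)^2\left(\frac{990}{1000(10)}\right)(2.39)^2 + \left(\frac{1500}{2500}\right)^2\left(\frac{1490}{1500(10)}\right)(4.00)^2 = 0.66$$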
iii) Comparison:

Estimator type   Estimate   Std. deviation
Separate         9.87       0.631
Combined         10.10      0.821

The combined ratio estimator gives the larger estimated variance, so here we should employ the separate ratio estimator.
However, the separate ratio estimator may have a larger bias, since each stratum ratio estimate contributes to that bias.
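The two stratified estimators can be compared on the company A and company B data with the sketch below. It assumes the residual variance in the combined case is computed from centered deviations $(y_i - \bar{y}_h) - r_c(x_i - \bar{x}_h)$, which reproduces the value 0.66 obtained above; the helper names `mean` and `resid_var` are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def resid_var(x, y, ratio):
    """Sum of [(y - ybar) - ratio * (x - xbar)]^2 divided by (n - 1)."""
    xb, yb = mean(x), mean(y)
    return sum(((yi - yb) - ratio * (xi - xb)) ** 2
               for xi, yi in zip(x, y)) / (len(x) - 1)


# Stratum data: (N_h, tau_xh, x values, y values)
strata = {
    "A": (1000, 16300, [12, 24, 15, 30, 32, 26, 10, 15, 0, 14],
          [13, 25, 15, 32, 36, 24, 12, 16, 2, 12]),
    "B": (1500, 12800, [10, 8, 0, 14, 12, 6, 4, 0, 8, 16],
          [8, 0, 4, 6, 10, 0, 2, 4, 4, 8]),
}
N = sum(s[0] for s in strata.values())
mu_x = sum(s[1] for s in strata.values()) / N

# Combined ratio: ratio of the stratified sample means.
y_st = sum(Nh * mean(y) for Nh, _, x, y in strata.values()) / N
x_st = sum(Nh * mean(x) for Nh, _, x, y in strata.values()) / N
r_c = y_st / x_st

est_sep = var_sep = var_comb = 0.0
for Nh, tau_xh, x, y in strata.values():
    nh = len(x)
    weight = (Nh / N) ** 2 * (Nh - nh) / (Nh * nh)
    r_h = mean(y) / mean(x)
    est_sep += (Nh / N) * r_h * (tau_xh / Nh)
    # With the stratum's own ratio, centering changes nothing because
    # ybar_h - r_h * xbar_h = 0, so this equals sum (y - r_h x)^2 / (n_h - 1).
    var_sep += weight * resid_var(x, y, r_h)
    var_comb += weight * resid_var(x, y, r_c)
est_comb = r_c * mu_x

print(f"separate: {est_sep:.2f} (variance {var_sep:.2f})")
print(f"combined: {est_comb:.2f} (variance {var_comb:.2f})")
```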
Summary:
i) If the stratum sample sizes are large enough (about 20 or more) that the separate ratio estimates do not have large biases and the variance approximations work adequately, use the separate ratio estimator.
ii) If the stratum sample sizes are very small, or if the within-stratum ratios are all approximately equal, use the combined ratio estimator.
Estimator of the population total:
$$\hat{\tau}_{yRS} = N\hat{\mu}_{yRS} \quad \text{or} \quad \hat{\tau}_{yRC} = N\hat{\mu}_{yRC}$$
5.4 REGRESSION ESTIMATION

Ratio estimator of the mean:
- most appropriate when the relationship between y and x is linear through the origin.
Regression estimator of the mean:
- appropriate when there is a linear relationship between y and x that does not necessarily pass through the origin. The extra information provided by the auxiliary variable x may still be taken into account.
Assume the population relationship
$$\mu_y = A + B\mu_x$$
If the line passes through the origin (y = 0 when x = 0), then A = 0 and $B = \mu_y/\mu_x = \tau_y/\tau_x$.
The estimator given next assumes that:

the x's are fixed in advance (already observed, e.g. last year);
the y's are random variables (the response variable, yet to be observed).
Regression estimator of a population mean $\mu_y$
(the subscript L denotes linear regression):
$$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$$
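where
$$b = \frac{S_{xy}}{S_x^2} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}$$
Estimated variance of $\hat{\mu}_{yL}$:
$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\text{MSE}, \qquad \text{MSE} = \frac{1}{n-2}\left[\sum (y_i - \bar{y})^2 - b^2\sum (x_i - \bar{x})^2\right]$$
where MSE = mean square error of the residuals from the standard simple linear regression of y on x.
For simple random sampling, $\hat{V}(\bar{y}) = \frac{S^2}{n}\left(\frac{N-n}{N}\right)$; replace $S^2$ with MSE to obtain $\hat{V}(\hat{\mu}_{yL})$.

Replacing (n - 2) by (n - 1):
$$\hat{V}(\hat{\mu}_{yL}) \approx \left(\frac{N-n}{nN}\right)\left[S_y^2 - b^2 S_x^2\right] = \left(\frac{N-n}{Nn}\right)\left[S_y^2 - \hat{\rho}^2 S_y^2\right] \quad \left[\text{since } b = \hat{\rho}\,\frac{S_y}{S_x}\right]$$
$$= \left(\frac{N-n}{Nn}\right) S_y^2\,(1 - \hat{\rho}^2)$$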
Example 6:
A mathematics achievement test was given to 486 students prior to their entering a certain college. A simple random sample of n = 10 students was selected and their final calculus grades are reported below. It is known that $\mu_x = 52$ for all 486 students taking the achievement test.
Estimate $\mu_y$ for this population and place a bound on the error of estimation.

Student   Achievement test score, x   Final calculus grade, y
1         39                          65
2         43                          78
3         21                          52
4         64                          82
5         57                          92
6         47                          89
7         28                          73
8         75                          98
9         34                          56
10        52                          75
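Solution:
First step: draw a scatter plot.
There is a strong positive association between y and x.
A straight line is a reasonable model for this relationship.
[Figure: scatterplot of y vs x]
$$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$$
where
$$b = \frac{\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2} = 0.766$$
$$\hat{\mu}_{yL} = 76 + 0.766(52 - 46) = 80.6$$
$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\text{MSE}$$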
Regression Analysis: y versus x

The regression equation is
y = 40.8 + 0.766 x

Predictor   Coef     SE Coef   T      P
Constant    40.784   8.507     4.79   0.001
x           0.7656   0.1750    4.38   0.002

S = 8.70363   R-Sq = 70.5%   R-Sq(adj) = 66.8%

Analysis of Variance
Source           DF   SS       MS       F       P
Regression       1    1450.0   1450.0   19.14   0.002
Residual Error   8    606.0    75.8
Total            9    2056.0

$S = \sqrt{\text{MSE}}$
[MSE can be obtained from the ANOVA table.]
$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{476}{486(10)}\right)(75.8) = 7.42$$
$$B = 2\sqrt{\hat{V}(\hat{\mu}_{yL})} = 5.45$$
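OR, computing MSE directly from the fitted line:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i = a + b x_i$$
$$e_i = y_i - \hat{y}_i = y_i - a - b x_i$$
$$\text{SSE} = \sum (y_i - a - b x_i)^2, \qquad \text{MSE} = \frac{\text{SSE}}{n-2}$$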
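The regression estimate for the calculus example can be reproduced without a statistics package. The sketch below fits the least-squares line by hand and computes MSE from the residuals, as in the alternative route just described; the variable names are illustrative only.

```python
import math

# Achievement test score x and final calculus grade y for the 10 students.
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]
N = 486      # population size
mu_x = 52    # known population mean of x

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
s_xx = sum(xi ** 2 for xi in x) - n * xbar ** 2
b = s_xy / s_xx
a = ybar - b * xbar

# Regression estimator of the population mean of y.
mu_L = ybar + b * (mu_x - xbar)

# MSE from the residuals, with n - 2 degrees of freedom.
sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
var_L = (N - n) / (N * n) * mse
bound = 2 * math.sqrt(var_L)

print(f"b = {b:.4f}, mu_L = {mu_L:.1f}, MSE = {mse:.1f}, bound = {bound:.2f}")
```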
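5.5 DIFFERENCE ESTIMATION

If b = 1, use the difference estimator.
The difference method is easier to employ than the regression method and frequently works just as well.
It is commonly employed in auditing procedures.
Difference estimator of a population mean $\mu_y$:
$$\hat{\mu}_{yD} = \bar{y} + (\mu_x - \bar{x}) = \mu_x + \bar{d}, \qquad \text{where } \bar{d} = \bar{y} - \bar{x}$$
Estimated variance of $\hat{\mu}_{yD}$:
$$\hat{V}(\hat{\mu}_{yD}) = \left(\frac{N-n}{Nn}\right)\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1}, \qquad \text{where } d_i = y_i - x_i$$
This is the regression estimator with b = 1:
$$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x}) = \bar{y} + 1(\mu_x - \bar{x}) = \mu_x + (\bar{y} - \bar{x}) = \mu_x + \bar{d}$$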
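Example 7:
Auditors are often interested in comparing the audited value of items with the book value. Suppose a population contains 180 inventory items with a stated book value of $\tau_x = \$13320$.
Let $x_i$ = the book value and $y_i$ = the audit value of the ith item.
(i) Plot a graph of y versus x.
(ii) Estimate the mean audit value $\mu_y$ by the difference method.
(iii) Estimate the variance of $\hat{\mu}_{yD}$.

Sample   Audit value, y_i   Book value, x_i   d_i = y_i - x_i
1        9                  10                -1
2        14                 12                 2
3        7                  8                 -1
4        29                 26                 3
5        45                 47                -2
6        109                112               -3
7        40                 36                 4
8        238                240               -2
9        60                 59                 1
10       170                167                3

Solution:
$\bar{y} = 72.1$, $\bar{x} = 71.7$, $\mu_x = \tau_x/N = 13320/180 = 74.0$, n = 10.
$$\hat{\mu}_{yD} = \mu_x + \bar{d} = 74.0 + (72.1 - 71.7) = 74.4$$
$$S_d^2 = \frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1} = \frac{\sum_{i=1}^{n} d_i^2 - n\bar{d}^2}{n-1} = 6.27$$
$$\hat{V}(\hat{\mu}_{yD}) = \left(\frac{N-n}{nN}\right)\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1} = \left(\frac{180-10}{10(180)}\right)(6.27) = 0.59$$
Note that
$$\frac{\sum (d_i - \bar{d})^2}{n-1} = \frac{\sum (y_i - x_i - \bar{y} + \bar{x})^2}{n-1} = \frac{\sum \left[(y_i - \bar{y}) - (x_i - \bar{x})\right]^2}{n-1}$$
$$= \frac{\sum (y_i - \bar{y})^2}{n-1} - \frac{2\sum (y_i - \bar{y})(x_i - \bar{x})}{n-1} + \frac{\sum (x_i - \bar{x})^2}{n-1} = S_y^2 - 2S_{xy} + S_x^2 = S_y^2 + S_x^2 - 2\hat{\rho}\,S_x S_y$$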
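The audit example reduces to a few lines of arithmetic on the differences $d_i = y_i - x_i$. The sketch below is a minimal illustration; the variable names are not part of the original notes.

```python
# Audit values y and book values x for the 10 sampled inventory items.
audit = [9, 14, 7, 29, 45, 109, 40, 238, 60, 170]
book = [10, 12, 8, 26, 47, 112, 36, 240, 59, 167]
N = 180          # number of inventory items in the population
tau_x = 13320    # total stated book value

n = len(audit)
mu_x = tau_x / N

# Differences d_i = y_i - x_i and their mean.
d = [yi - xi for xi, yi in zip(book, audit)]
dbar = sum(d) / n

# Difference estimator of the mean audit value.
mu_D = mu_x + dbar

# Sample variance of the differences and estimated variance of mu_D.
s_d2 = sum((di - dbar) ** 2 for di in d) / (n - 1)
var_D = (N - n) / (N * n) * s_d2

print(f"mu_D = {mu_D:.1f}, S_d^2 = {s_d2:.2f}, V(mu_D) = {var_D:.2f}")
```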
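5.6 RELATIVE EFFICIENCY OF ESTIMATORS

The estimators of the population mean $\mu_y$:
i) Sample mean: $\bar{y}$
ii) Ratio estimator: $\hat{\mu}_y = r\,\mu_x = \frac{\bar{y}}{\bar{x}}\,\mu_x$
iii) Regression estimator: $\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$
iv) Difference estimator: $\hat{\mu}_{yD} = \bar{y} + (\mu_x - \bar{x}) = \mu_x + \bar{d}$
Which is the best estimator for a particular sampling situation?

There are guidelines that compare the properties of the estimators, expressed in terms of their relative efficiency.
Let $\hat{\mu}_1$ and $\hat{\mu}_2$ be two estimators of the mean $\mu_y$, and assume:
(i) $E(\hat{\mu}_1) = E(\hat{\mu}_2) = \mu_y$;
(ii) equal sample sizes for both estimators.
Note: Variances usually decrease as n increases. It is convenient to describe the relative size of the two variances by looking at their ratio.
The relative efficiency of $\hat{\mu}_1$ to (or with respect to) $\hat{\mu}_2$ is given by
$$RE\left(\frac{\hat{\mu}_1}{\hat{\mu}_2}\right) = \frac{V(\hat{\mu}_2)}{V(\hat{\mu}_1)}$$
If $RE(\hat{\mu}_1/\hat{\mu}_2) > 1$, then $V(\hat{\mu}_2) > V(\hat{\mu}_1)$ and $\hat{\mu}_1$ is the more favorable estimator of $\mu_y$.
If $RE(\hat{\mu}_1/\hat{\mu}_2) = 2$, then $V(\hat{\mu}_2) = 2V(\hat{\mu}_1)$: a favorable case for $\hat{\mu}_1$. The sample size for $\hat{\mu}_2$ would have to be twice that for $\hat{\mu}_1$ in order to make $\hat{\mu}_1$ and $\hat{\mu}_2$ equivalent in terms of variance.
If $RE(\hat{\mu}_1/\hat{\mu}_2) = 1$, the two estimators are equivalent. It does not matter which one is used.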
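Note:
Comparing the ratio estimator $\hat{\mu}_y = r\,\mu_x$ to the simple mean per element $\bar{y}$:
SRS:
$$\hat{V}(\bar{y}) = \frac{S_y^2}{n}\left(\frac{N-n}{N}\right)$$
Ratio:
$$\hat{V}(\hat{\mu}_y) = \left(\frac{N-n}{nN}\right)\left(S_y^2 + r^2 S_x^2 - 2r\hat{\rho}\,S_x S_y\right)$$
$$RE\left(\frac{\hat{\mu}_y}{\bar{y}}\right) = \frac{\hat{V}(\bar{y})}{\hat{V}(\hat{\mu}_y)} = \frac{S_y^2}{S_y^2 + r^2 S_x^2 - 2r\hat{\rho}\,S_x S_y}$$
$RE(\hat{\mu}_y/\bar{y}) > 1$
if $S_y^2 + r^2 S_x^2 - 2r\hat{\rho}\,S_x S_y < S_y^2$
or $r^2 S_x^2 < 2r\hat{\rho}\,S_x S_y$
or $r S_x < 2\hat{\rho}\,S_y$ (assuming r > 0 and $S_x > 0$)
or
$$\hat{\rho} > \frac{1}{2}\,\frac{r S_x}{S_y} = \frac{1}{2}\,\frac{\bar{y}\,S_x}{\bar{x}\,S_y} = \frac{1}{2}\,\frac{S_x/\bar{x}}{S_y/\bar{y}} = \frac{1}{2}\,\frac{CV(x)}{CV(y)}$$
where $CV(x) = S_x/\bar{x}$ is known as the coefficient of variation of the x values.

If $\frac{S_x/\bar{x}}{S_y/\bar{y}} \approx 1$, then the ratio estimator is more efficient than the simple mean per element estimator if $\hat{\rho} > \frac{1}{2}$.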
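Comparing the simple mean per element $\bar{y}$ to the regression estimator $\hat{\mu}_{yL}$:

Regression: $\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$
$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right)\frac{1}{n-2}\left[\sum_{i=1}^{n}(y_i - \bar{y})^2 - b^2\sum_{i=1}^{n}(x_i - \bar{x})^2\right]$$
Making the slight change of replacing n - 2 by n - 1 in the denominator,
$$\hat{V}(\hat{\mu}_{yL}) \approx \left(\frac{N-n}{Nn}\right)\left(S_y^2 - b^2 S_x^2\right)$$
and since $b = \hat{\rho}\,S_y/S_x$, this becomes
$$\hat{V}(\hat{\mu}_{yL}) \approx \left(\frac{N-n}{Nn}\right)\left(S_y^2 - \hat{\rho}^2 S_y^2\right) = \left(\frac{N-n}{Nn}\right) S_y^2\left(1 - \hat{\rho}^2\right)$$
$$RE\left(\frac{\hat{\mu}_{yL}}{\bar{y}}\right) = \frac{S_y^2}{S_y^2\left(1 - \hat{\rho}^2\right)} = \frac{1}{1 - \hat{\rho}^2}$$
which is always greater than 1 if $\hat{\rho} \neq 0$.

Thus $\hat{\mu}_{yL}$ is always more efficient than $\bar{y}$ as an estimator of $\mu_y$, provided the regression of y on x is linear; if it is not, the regression estimator can have a serious bias problem.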
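Comparing the regression estimator $\hat{\mu}_{yL}$ to the ratio estimator $\hat{\mu}_y$:

Regression:
$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right) S_y^2\left(1 - \hat{\rho}^2\right)$$
Ratio:
$$\hat{V}(\hat{\mu}_y) = \left(\frac{N-n}{nN}\right)\left(S_y^2 + r^2 S_x^2 - 2r\hat{\rho}\,S_x S_y\right)$$
Then
$$RE\left(\frac{\hat{\mu}_{yL}}{\hat{\mu}_y}\right) = \frac{S_y^2 + r^2 S_x^2 - 2r\hat{\rho}\,S_x S_y}{S_y^2\left(1 - \hat{\rho}^2\right)} > 1$$
if $r^2 S_x^2 - 2r\hat{\rho}\,S_x S_y > -\hat{\rho}^2 S_y^2$
or $(\hat{\rho}\,S_y - r S_x)^2 > 0$.
Since $\hat{\rho}\,S_y = b S_x$, this is
$(b S_x - r S_x)^2 > 0$, i.e. $(b - r)^2 > 0$.
The regression estimator is more efficient than the ratio estimator unless b = r. The case b = r occurs when the regression of y on x is linear through the origin and the variance of y is proportional to x.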
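Comparing the difference estimator $\hat{\mu}_{yD}$ to the sample mean per element $\bar{y}$:

Difference:
$$\hat{V}(\hat{\mu}_{yD}) = \left(\frac{N-n}{Nn}\right)\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1} = \left(\frac{N-n}{Nn}\right)\frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i - \bar{y}) - (x_i - \bar{x})\right]^2$$
$$= \left(\frac{N-n}{Nn}\right)\left(S_y^2 + S_x^2 - 2\hat{\rho}\,S_x S_y\right)$$
$$RE\left(\frac{\hat{\mu}_{yD}}{\bar{y}}\right) = \frac{S_y^2}{S_y^2 + S_x^2 - 2\hat{\rho}\,S_x S_y} > 1$$
if $2\hat{\rho}\,S_x S_y > S_x^2$
or $\hat{\rho} > \frac{S_x}{2S_y}$.
If the variation in the x's and y's is about the same, the difference estimator will be more efficient than $\bar{y}$ when the correlation between x and y is greater than 1/2 ($\hat{\rho} > \frac{1}{2}$).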
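Comparing the regression estimator $\hat{\mu}_{yL}$ with the difference estimator $\hat{\mu}_{yD}$:

Regression:
$$\hat{V}(\hat{\mu}_{yL}) = \left(\frac{N-n}{Nn}\right) S_y^2\left(1 - \hat{\rho}^2\right)$$
Difference:
$$\hat{V}(\hat{\mu}_{yD}) = \left(\frac{N-n}{Nn}\right)\left(S_y^2 + S_x^2 - 2\hat{\rho}\,S_x S_y\right)$$
$$RE\left(\frac{\hat{\mu}_{yL}}{\hat{\mu}_{yD}}\right) = \frac{S_y^2 + S_x^2 - 2\hat{\rho}\,S_x S_y}{S_y^2\left(1 - \hat{\rho}^2\right)} > 1$$
if $S_x^2 - 2\hat{\rho}\,S_x S_y > -\hat{\rho}^2 S_y^2$
or $(S_x - \hat{\rho}\,S_y)^2 > 0$.
Since $b S_x = \hat{\rho}\,S_y$, the regression estimator will be equivalent to the difference estimator when b = 1.
Otherwise, the regression estimator will be more efficient than the difference estimator.
The ratio estimation procedure usually provides more precise estimators of $\mu_y$ and $\tau_y$ than $\bar{y}$ and $N\bar{y}$.
Let $\hat{\rho}$ = correlation coefficient between x and y:
$$\hat{\rho} = \frac{S_{xy}}{S_x S_y}$$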
(i) If
$$\hat{\rho} > \frac{1}{2}\,\frac{CV(x)}{CV(y)} = \frac{1}{2}\,\frac{S_x/\bar{x}}{S_y/\bar{y}}$$
then the ratio estimator is more suitable because $\hat{V}(\hat{\mu}_y) < \hat{V}(\bar{y})$.

(ii) If $CV(x) \approx CV(y)$, then the ratio estimator is more suitable if $\hat{\rho} > \frac{1}{2}$, since then $\hat{V}(\hat{\mu}_y) < \hat{V}(\bar{y})$.
Situations for choosing the best estimator:

Choose the estimator giving the shortest confidence interval.
The ratio estimator is the better choice if the relationship between the y's and x's is a straight line through the origin (A = 0) and the variance of the y's about this line is proportional to x.
The regression estimator is a good choice if the relationship between the y's and x's is a straight line not through the origin and the slope $B \neq 1$.
The difference estimator works well when the points in the plot of y versus x lie uniformly close to a straight line with slope B = 1.
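The comparisons in this section can be tried on data. The sketch below uses the calculus-grade sample from the regression example and the (n - 1) approximations given above; the common factor (N - n)/(Nn) is omitted because it cancels in every ratio. The variable names are illustrative only.

```python
import math

# Achievement test score x and final calculus grade y (regression example).
x = [39, 43, 21, 64, 57, 47, 28, 75, 34, 52]
y = [65, 78, 52, 82, 92, 89, 73, 98, 56, 75]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
s_x2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
s_y2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
rho = s_xy / math.sqrt(s_x2 * s_y2)
r = ybar / xbar

# Variance factors of the four estimators (common multiplier dropped).
v_mean = s_y2
v_ratio = s_y2 + r ** 2 * s_x2 - 2 * r * s_xy
v_reg = s_y2 * (1 - rho ** 2)
v_diff = s_y2 + s_x2 - 2 * s_xy

re_ratio_mean = v_mean / v_ratio
re_reg_mean = v_mean / v_reg
re_diff_mean = v_mean / v_diff

# Rule of thumb: ratio beats the sample mean when rho > CV(x) / (2 CV(y)).
cv_x = math.sqrt(s_x2) / xbar
cv_y = math.sqrt(s_y2) / ybar
ratio_preferred = rho > 0.5 * cv_x / cv_y

print(f"RE(ratio/mean) = {re_ratio_mean:.2f}")
print(f"RE(regression/mean) = {re_reg_mean:.2f}")
print(f"RE(difference/mean) = {re_diff_mean:.2f}")
print(f"ratio preferred over mean: {ratio_preferred}")
```

For this sample the fitted line has a large intercept (about 40.8), so the ratio estimator comes out less efficient than the sample mean, while the regression estimator is the most efficient of the four. This is consistent with the guidelines above.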
5.7 SUMMARY
a) If the relationship between the y's and x's is linear:
Use ratio, difference or regression.
b) If $r_L \leq \frac{y_i}{x_i} \leq r_U$ (small range) and if y = A + Bx with A = 0, i.e. y = Bx:
Use ratio.
c) If $r_L \leq \frac{y_i}{x_i} \leq r_U$ (wide range):
Use regression or difference.

(i) If B = 1, use difference: y = x + A.
(ii) If $B \neq 1$, use regression: y = A + Bx.
