
STAT 332 Exercises: Ratio and Regression Estimation

Chapter 6 of the Course Notes covers ratio and regression estimation. These exercises will
show you how to generate datasets for yourself with which to practice calculating these kinds
of estimates. This will also help you become more familiar with programming in R, and see
how changing certain properties of our data generation process can affect our estimates.
Exercise 1:

• Download the files ‘Ex_2_Gen.R’ and ‘Ex_2_Sol.R’.

• Open R and change your working directory to wherever you saved these two files. (If
you’re not sure how to do this, see the R tutorial file called ‘IntroductionToR’, under
‘Additional Notes’, on Learn.)

• Run the file ‘Ex_2_Gen.R’ by using the following command:

source('Ex_2_Gen.R')

This should produce output that looks something like

Population mean of X = 9.977


Population mean of Y = 9.96

Make a note of these: they are the population mean of the X variable (µX in the lecture
notes) and of the Y variable (µY). In reality we’d never know µY, but it’s useful to see
what the ‘truth’ is when comparing our estimates later!

• This should also have generated a data file called ‘Ex_2_Data.csv’; check that this
appears in the folder you’ve been working in.

• You now have a dataset (‘Ex_2_Data.csv’), and know the population mean µX. You
can now use this dataset to do the following:

(a) Prepare a scatterplot of y versus x (see the R tutorial file on linear regression on
Learn).

(b) Estimate µY using the sample average, the ratio estimate, and the regression
estimate.

(c) Find 95% confidence intervals based on each estimate.

N.B. The code is not using simple random sampling without replacement, but you
should calculate these as if it is.
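
As a rough guide to parts (b) and (c), here is one way the three estimates and their
intervals could be computed when the sample is treated as SRSWOR. This is only a sketch:
the function name estimate_mu_y and the simulated data are made up for illustration and
are not taken from the solution file. The residual variances use the divisor n - 1; some
texts use n - 2 for the regression estimate, so check the Course Notes for the convention
used there.

```r
# Sketch only: estimates of mu_Y treated as if the sample were SRSWOR.
# The function name and arguments are hypothetical, not taken from Ex_2_Sol.R.
estimate_mu_y <- function(x, y, mu_x, N, level = 0.95) {
  n <- length(y)
  f <- n / N                       # sampling fraction
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval

  x_bar <- mean(x)
  y_bar <- mean(y)

  # Ratio estimate: theta_hat * mu_x, with theta_hat = y_bar / x_bar
  theta_hat <- y_bar / x_bar
  res_ratio <- y - theta_hat * x

  # Regression estimate: y_bar + beta_hat * (mu_x - x_bar)
  beta_hat <- cov(x, y) / var(x)
  res_reg <- (y - y_bar) - beta_hat * (x - x_bar)

  est <- c(average = y_bar,
           ratio = theta_hat * mu_x,
           regression = y_bar + beta_hat * (mu_x - x_bar))

  # Residual variances with divisor n - 1 (an assumed convention)
  s2 <- c(var(y),
          sum(res_ratio^2) / (n - 1),
          sum(res_reg^2) / (n - 1))
  se <- sqrt((1 - f) * s2 / n)

  data.frame(estimate = est, lower = est - z * se, upper = est + z * se)
}

# Simulated stand-in for the exercise data (all values here are assumptions)
set.seed(332)
N <- 1000
X <- rnorm(N, 10, 2)
Y <- 0 + 1 * X + rnorm(N, 0, 1)
idx <- sample(N, 50)
result <- estimate_mu_y(X[idx], Y[idx], mu_x = mean(X), N = N)
print(round(result, 3))
```

With the exercise data you would instead pass the x and y columns of the CSV file, along
with the population mean of X printed by the generation script. The column names depend
on the file, so check them with names() after reading it in.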

Once you’ve tried to do all this yourself, you can run the solution file ‘Ex_2_Sol.R’.
Important: do not make any changes to the file ‘Ex_2_Gen.R’ before running the
solution file! Run it by typing

source('Ex_2_Sol.R')

You should see a plot appear, and messages similar to the following (with the dots replaced
by numbers):

Population mean of X = 9.977


Population mean of Y = 9.96
-----------------------
Mean estimates of Y:
Sample average: ......
Ratio estimate: ......
Regression estimate: ......
-----------------------
95% confidence intervals:
Sample average: [......,......]
Ratio estimate: [......,......]
Regression estimate: [......,......]

Check that your answers match the R output above. If they don’t, there’s a problem
either with your code/calculations or with mine. If you think it might be the latter,
please let me know ASAP!

Exercise 2:

If the above went well for you, now’s the time to start creating your own unique datasets.
To do this, open up the file ‘Ex_2_Gen.R’ in a text editor (such as Notepad on Windows),
and then do the following:

• Look over the code and try to make sense of it. I’ve provided a lot of comments that
should help explain most of what’s going on. You don’t need to know how to generate
data like this yourself for STAT 332, but simulating data is incredibly common in applied
analysis (and theoretical work!).

• Now try changing some of the parameters we’ve used to generate the dataset. You
can change the following variables:
- N and n: the population size and the sample size.
- mu and sigma: the mean and standard deviation of the distribution we draw our
population for X from.
- alpha and beta: the intercept and slope of the model relating X and Y.

As a starting point, I’d suggest changing alpha from 0 to 10. What effect should this have
on your ratio and regression estimates?

• If you want to get even more involved, you can try changing:
- low and high: these set which units can be sampled, counted by their rank when the
population is ordered from smallest to largest X. By default they’re set to 1 and 250,
which means that our sample will only be taken from the 250 units with the smallest
values of X. You should see this reflected in the sample estimates of µX and µY being
lower than their true values. If we set low to be 1 and high to be 1000, this would
sample from the full population. If we set low to be 750 and high to be 1000, this would
sample from the units with the largest values of X.

- X <- rnorm(N, mu, sigma): the data generation process for X. It may be interesting
to try probability distributions other than the normal/Gaussian. For example, our X
could come from a Uniform[0,10] distribution using the command

X <- runif(N, 0, 10)

- Y <- alpha + beta*X + rnorm(N,0,1): this generates Y. The rnorm(N,0,1) part
corresponds to the residual terms in our model. If you change the 1 to a larger number
(i.e., larger residual variance), or a smaller number (i.e., smaller residual variance),
this should have an impact on what your plot looks like, and the accuracy of your
ratio and regression estimates.
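
To see how these parameters fit together, here is a rough sketch of a generation process
like the one described above. It is not the contents of the generation file: the values
of n, mu, sigma and beta are assumptions, and the way the sample is drawn (ordering the
population by X, then sampling from ranks low to high) is inferred from the description
of low and high.

```r
# Hypothetical sketch of the generation process described above;
# the real generation script may differ in its details.
set.seed(1)

N <- 1000; n <- 50        # population and sample sizes (n is an assumed value)
mu <- 10; sigma <- 2      # distribution of X (assumed values)
alpha <- 0; beta <- 1     # intercept and slope of the model for Y (beta assumed)
low <- 1; high <- 250     # ranks of X from which the sample is drawn

X <- rnorm(N, mu, sigma)
Y <- alpha + beta * X + rnorm(N, 0, 1)

# Order the population by X, then sample n units with ranks low..high
ord <- order(X)
idx <- ord[sample(low:high, n)]
samp <- data.frame(x = X[idx], y = Y[idx])

cat("Population mean of X =", round(mean(X), 3), "\n")
cat("Population mean of Y =", round(mean(Y), 3), "\n")
cat("Sample mean of x     =", round(mean(samp$x), 3), "\n")
cat("Sample mean of y     =", round(mean(samp$y), 3), "\n")
```

Because only the units with the smallest X values can be sampled here, the sample means
should come out below the population means. Setting high to N removes that restriction.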

After you’ve made your changes, follow the steps in Exercise 1 (starting at the third bullet
point: running the data generation file). This will generate a new dataset based on your
changes, and you can try to construct estimates and confidence intervals based on these
new data. Once you’ve finished, running the solution file ‘Ex_2_Sol.R’ should, once again,
generate solutions that match up with your own calculations.

You can use these two programs to generate an infinite number of datasets to practice with!

Exercise 3:

Once you’re happy you’ve had enough practice calculating these estimates yourself, you can
try just changing the various parameters in the data generation file, then running the
solution file to see what happens. You can follow this procedure:

1. Make any changes you like to ‘Ex_2_Gen.R’. I’d suggest only changing one parameter
at a time.

2. Run the solution file by typing:

source('Ex_2_Sol.R')

3. Have a look at the various estimates (and the plot), and see if you notice any patterns.
For example, if you set alpha to be large (so that a line through the origin does not
explain the data very well), you should find that your ratio estimate performs poorly.
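
A single sample can be misleading, so one way to look for patterns is to repeat the
sampling many times and compare how far each estimator tends to fall from the true µY.
The sketch below does this for alpha = 0 and alpha = 10. It is an illustration under
assumptions, not part of the course files: compare_estimators and all of its default
values are made up, and it draws simple random samples from the full population rather
than using low and high.

```r
# Sketch: repeated SRSWOR samples from one simulated population, comparing the
# root mean squared error (RMSE) of the three estimators of mu_Y.
# compare_estimators() and its defaults are hypothetical, not from the course files.
compare_estimators <- function(alpha, beta = 1, N = 1000, n = 50, reps = 2000) {
  X <- rnorm(N, 10, 2)
  Y <- alpha + beta * X + rnorm(N, 0, 1)
  mu_x <- mean(X)
  mu_y <- mean(Y)

  # Each replicate returns the three estimates from one sample (3 x reps matrix)
  est <- replicate(reps, {
    idx <- sample(N, n)
    x <- X[idx]
    y <- Y[idx]
    b <- cov(x, y) / var(x)
    c(average = mean(y),
      ratio = mean(y) / mean(x) * mu_x,
      regression = mean(y) + b * (mu_x - mean(x)))
  })

  sqrt(rowMeans((est - mu_y)^2))  # RMSE of each estimator
}

set.seed(332)
rmse_alpha0 <- compare_estimators(alpha = 0)
rmse_alpha10 <- compare_estimators(alpha = 10)
print(round(rbind(alpha_0 = rmse_alpha0, alpha_10 = rmse_alpha10), 3))
```

Comparing the two rows shows how the ratio estimate depends on the line passing close to
the origin, while the regression estimate adjusts for the intercept.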
