
AQA Statistics 1 Correlation and regression

1 of 2 27/02/13 MEI
Section 2: Regression

Notes and Examples

These notes contain subsections on:
- The least squares regression line
- Using a regression line to make predictions
- Residuals


The least squares regression line

If there is a linear connection between two sets of data, it may be appropriate
to draw a line of best fit which can sometimes be used for predicting values.
At GCSE level you may have drawn lines of best fit by eye: this is not very
satisfactory as it may be difficult to judge which of the possible lines is the
best. If several people all draw lines of best fit for the same data set by eye,
and then find their equations, they may be quite different, even if they all look
fairly convincing when drawn!

In this section you will learn how to calculate the equation of the line of best fit (the
least squares regression line). This is the line for which the sum of the
squares of the residuals (the vertical distances between each point and the line) is
as small as possible.
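If the regression line is written as $y = a + bx$, the residual of the $i$th point $(x_i, y_i)$ is $e_i = y_i - (a + bx_i)$, and the least squares line is the choice of $a$ and $b$ that makes $S(a, b)$ below as small as possible. A sketch of the minimisation using partial differentiation (which may go beyond what the course requires) is:

$$
S(a, b) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a - b x_i\right)^2
$$

$$
\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n}\left(y_i - a - b x_i\right) = 0 \quad\Longrightarrow\quad a = \bar{y} - b\bar{x}
$$

$$
\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} x_i\left(y_i - a - b x_i\right) = 0 \quad\Longrightarrow\quad b = \frac{S_{xy}}{S_{xx}}
$$

where $S_{xx} = \sum (x_i - \bar{x})^2$ and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$. The second result is obtained by substituting the value of $a$ from the first. The first condition also shows that the residuals of the least squares line sum to zero.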

There are several different ways in which the equation of the regression line
may be written: one of the simplest is given on page 135, and is also given in
similar form in your formula book:
$$
y = a + bx, \quad \text{where} \quad b = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad a = \bar{y} - b\bar{x}.
$$
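The formulas for $a$ and $b$ can be checked with a short calculation. The sketch below is a minimal illustration in Python, assuming the data are given as two equal-length lists; the function name `least_squares_line` and the data values are made up for this example and are not from the notes.

```python
from statistics import fmean

def least_squares_line(xs, ys):
    """Return (a, b) for the least squares regression line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_bar = fmean(xs)
    y_bar = fmean(ys)
    # S_xx = sum of (x - x_bar)^2,  S_xy = sum of (x - x_bar)(y - y_bar)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0:
        raise ValueError("all x values are equal, so the gradient is undefined")
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # intercept, so the line passes through (x_bar, y_bar)
    return a, b

# Made-up illustrative data (not from the notes)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")   # y = 2.20 + 0.60x
```

Here $\bar{x} = 3$, $\bar{y} = 4$, $S_{xx} = 10$ and $S_{xy} = 6$, so $b = 0.6$ and $a = 4 - 0.6 \times 3 = 2.2$, which you can confirm by hand.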


Try the Regression matching activity, in which you match up scatter
diagrams with the equation of a regression line.


Using a regression line to make predictions

A regression line may sometimes be used for predicting values, but you must
be careful to consider whether or not this is appropriate.
Firstly, you need to think about how good the linear model is for the data. This
can often be judged from a scatter diagram:

- the data may not really look linear

- the line might look like a good model within part of the range of the
data but not for other parts of the range



To make a prediction you may need to extrapolate, that is, use the regression
line for values of x outside the range of the data. This may or may not be
appropriate: you must think about the context.
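Whether a prediction is an extrapolation can at least be flagged automatically, even though deciding whether it is sensible still depends on the context. The sketch below assumes the hypothetical line $y = 2.2 + 0.6x$ fitted to made-up data with x values from 1 to 5; the function name `predict` is also hypothetical.

```python
def predict(a, b, x, x_min, x_max):
    """Return the predicted y from y = a + b*x and whether x lies outside [x_min, x_max]."""
    y = a + b * x
    # True when the line is being used beyond the range of the observed x values
    extrapolating = not (x_min <= x <= x_max)
    return y, extrapolating

# Hypothetical line y = 2.2 + 0.6x, fitted to made-up data with x from 1 to 5
a, b = 2.2, 0.6
x_min, x_max = 1, 5

for x in [3.5, 20]:
    y, extrapolating = predict(a, b, x, x_min, x_max)
    note = "extrapolation: check the context" if extrapolating else "within the data range"
    print(f"x = {x}: predicted y = {y:.1f} ({note})")
```

This prints a prediction of 4.3 for x = 3.5 (within the data) and 14.2 for x = 20, which is flagged because it lies far outside the observed range. The code cannot tell you whether the linear model still holds there; that judgement is yours.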


Residuals

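As mentioned before, the residual for each data point is the vertical distance
between the point and the regression line, measured as the observed value of y
minus the value predicted by the line. Points above the line therefore have
positive residuals and points below it have negative residuals. The sum of all
the residuals is zero, so instead the sum of the squares of the residuals is used
as a measure of how close the points are to the regression line.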
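The facts above can be checked numerically. This minimal sketch uses made-up data whose least squares line is $y = 2.2 + 0.6x$; the function name `residuals` is hypothetical.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: observed y minus the y predicted by y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Made-up data; y = 2.2 + 0.6x is its least squares regression line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
e = residuals(xs, ys, 2.2, 0.6)

print([round(r, 2) for r in e])         # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(e)) < 1e-9)               # True: the residuals sum to zero (up to rounding)
print(round(sum(r * r for r in e), 2))  # 2.4: the sum of squared residuals
```

The positive and negative residuals cancel exactly, which is why their plain sum says nothing about the fit, while the sum of their squares (2.4 here) does.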

It is important to realise that amongst all possible straight lines, the regression
line is the one which minimises the sum of the squares of the residuals. You
may be asked in an examination about the significance of the residuals.


The Bivariate data interactive spreadsheet demonstrates that the sum of
the squares of the residuals is minimised by the regression line. The
regression line is initially shown on the spreadsheet. You can vary the
gradient of the line (which always passes through $(\bar{x}, \bar{y})$) and you will see that
the sum of the squares of the residuals is at a minimum for the regression
line. You can also change the data values if you wish.
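A rough stand-in for the spreadsheet experiment (not the spreadsheet itself) is to fix the line through $(\bar{x}, \bar{y})$, try several gradients, and compare the sums of squared residuals. The sketch below assumes made-up data whose regression gradient is $b = 0.6$; the function name `sum_sq_residuals` is hypothetical.

```python
from statistics import fmean

def sum_sq_residuals(xs, ys, m):
    """Sum of squared residuals for the line of gradient m through (x_bar, y_bar)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    return sum((y - (y_bar + m * (x - x_bar))) ** 2 for x, y in zip(xs, ys))

# Made-up data; its regression gradient is b = 0.6
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

gradients = [0.2, 0.4, 0.6, 0.8, 1.0]
for m in gradients:
    print(f"gradient {m:.1f}: sum of squared residuals = {sum_sq_residuals(xs, ys, m):.2f}")

best = min(gradients, key=lambda m: sum_sq_residuals(xs, ys, m))
print("smallest at gradient", best)   # smallest at gradient 0.6
```

The sums come out as 4.00, 2.80, 2.40, 2.80 and 4.00, so among these gradients the regression gradient 0.6 gives the smallest value, and moving away from it in either direction increases the sum.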

You can also try the GeoGebra resource Regression, which can be used in a
similar way.

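[Figure (scatter diagram for data that do not look linear, under Using a regression line to make predictions): these data look as if they fit a curve rather than a line, so the regression line will not give very reliable predictions.]

[Figure (scatter diagram for a line that fits only part of the range, under Using a regression line to make predictions): the line is a very good fit up to about x = 2.5, so for values of x between 0 and 2.5 the regression line could be used to make predictions. However, for larger values of x the model looks less appropriate, so predictions become unreliable.]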
