Satya Mandal, KU
Suppose
\[
(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_N, y_N)
\]
are N data points. We wish to determine the line
\[
y = mx + c
\]
that fits the data best according to the method of Least Squares. We describe the method as follows:
2. The sum of squares of these distances is
\[
L = L(m, c) = D_1^2 + D_2^2 + \cdots + D_N^2 = \sum_{i=1}^{N} (mx_i + c - y_i)^2.
\]
3. The line for which L is least will be called the least squares line, also called the regression line.
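As a minimal sketch of this objective (the function and variable names here are mine, not from the notes), L can be evaluated directly for any candidate line:

```python
# Sketch: evaluate the least-squares objective L(m, c) = sum((m*x_i + c - y_i)^2).
# Names are illustrative, not from the notes.

def L(m, c, points):
    """Sum of squared vertical distances from each (x_i, y_i) to y = m*x + c."""
    return sum((m * x + c - y) ** 2 for x, y in points)

pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # data lying exactly on y = 2x + 1
print(L(2.0, 1.0, pts))  # 0.0 for the exact fit
print(L(1.0, 0.0, pts))  # 14.0 for a worse line
```

The least squares line is the choice of (m, c) making this quantity smallest.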
and
\[
\sum (mx_i + c - y_i) = 0. \qquad \text{(Eqn II)}
\]
(c) The covariance of x and y is defined as
\[
\operatorname{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N}.
\]
We use these statistical notations and continue with our optimization
as follows.
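This population form of the covariance (denominator N) can be computed directly; a small sketch with made-up data, noting that numpy's `np.cov` divides by N−1 by default:

```python
import numpy as np

# Covariance as defined above: mean of the products of deviations (denominator N).
# Data values are illustrative only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

# np.cov uses denominator N-1 by default; bias=True gives the N form used here.
assert np.isclose(cov_xy, np.cov(x, y, bias=True)[0, 1])
print(cov_xy)  # 2.375
```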
6. Dividing Eqn II by N, we get
\[
m\bar{x} + c - \bar{y} = 0 \qquad \text{or} \qquad c = \bar{y} - m\bar{x}.
\]
\[
= \sum (x_i - \bar{x})^2 + 2\bar{x}(N\bar{x}) - N\bar{x}^2 = \sum (x_i - \bar{x})^2 + N\bar{x}^2.
\]
(b)
\[
\sum x_i y_i = \sum (x_i - \bar{x})(y_i - \bar{y}) + N\bar{x}\bar{y}.
\]
Proof. This proof is similar to the above. Here we will use both $\sum x_i = N\bar{x}$ and $\sum y_i = N\bar{y}$. We have
\begin{align*}
\sum x_i y_i &= \sum \left[(x_i - \bar{x})(y_i - \bar{y}) + \bar{x}y_i + x_i\bar{y} - \bar{x}\bar{y}\right] \\
&= \sum (x_i - \bar{x})(y_i - \bar{y}) + \bar{x}\sum y_i + \bar{y}\sum x_i - N\bar{x}\bar{y} \\
&= \sum (x_i - \bar{x})(y_i - \bar{y}) + \bar{x}(N\bar{y}) + \bar{y}(N\bar{x}) - N\bar{x}\bar{y}
= \sum (x_i - \bar{x})(y_i - \bar{y}) + N\bar{x}\bar{y}.
\end{align*}
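The two identities just proved can also be checked numerically; a minimal sketch on random data (the notes give only the algebraic proofs, so all names here are mine):

```python
import numpy as np

# Numerical check of the identity for sum x_i^2 and of identity (b) above.
rng = np.random.default_rng(0)
N = 50
x = rng.normal(size=N)
y = rng.normal(size=N)
xbar, ybar = x.mean(), y.mean()

# sum x_i^2 = sum (x_i - xbar)^2 + N * xbar^2
assert np.isclose((x ** 2).sum(), ((x - xbar) ** 2).sum() + N * xbar ** 2)
# (b): sum x_i y_i = sum (x_i - xbar)(y_i - ybar) + N * xbar * ybar
assert np.isclose((x * y).sum(), ((x - xbar) * (y - ybar)).sum() + N * xbar * ybar)
print("both identities hold")
```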
9. Now Equation (III) is rewritten as
\[
m\left[\sum (x_i - \bar{x})^2 + N\bar{x}^2\right] + c\sum x_i - \sum x_i y_i = 0.
\]
OR
\[
m\left[\sum (x_i - \bar{x})^2 + N\bar{x}^2\right] + cN\bar{x} - \sum x_i y_i = 0.
\]
OR
\[
m\left[\sum (x_i - \bar{x})^2 + N\bar{x}^2\right] + cN\bar{x} - \left[\sum (x_i - \bar{x})(y_i - \bar{y}) + N\bar{x}\bar{y}\right] = 0.
\]
Dividing by N, we get
\[
m\left[\sigma_x^2 + \bar{x}^2\right] + c\bar{x} - \left[\operatorname{cov}(x, y) + \bar{x}\bar{y}\right] = 0,
\]
where $\sigma_x^2 = \frac{1}{N}\sum (x_i - \bar{x})^2$ is the variance of x. Substituting $c = \bar{y} - m\bar{x}$, this reduces to
\[
m\sigma_x^2 - \operatorname{cov}(x, y) = 0 \qquad \text{hence} \qquad m = \frac{\operatorname{cov}(x, y)}{\sigma_x^2}.
\]
Therefore,
\[
c = \bar{y} - m\bar{x} = \bar{y} - \frac{\operatorname{cov}(x, y)}{\sigma_x^2}\,\bar{x}.
\]
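The closed-form slope and intercept just derived can be sketched in code and cross-checked against numpy's degree-1 polynomial fit (the data values below are made up for illustration):

```python
import numpy as np

# Least squares line via the derived formulas: m = cov(x,y)/var(x), c = ybar - m*xbar.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

xbar, ybar = x.mean(), y.mean()
cov_xy = np.mean((x - xbar) * (y - ybar))
var_x = np.mean((x - xbar) ** 2)  # sigma_x^2, denominator N

m = cov_xy / var_x
c = ybar - m * xbar

# Cross-check: np.polyfit with degree 1 computes the same least squares line.
m_np, c_np = np.polyfit(x, y, 1)
assert np.isclose(m, m_np) and np.isclose(c, c_np)
print(m, c)
```

Note that the N vs. N−1 convention cancels in m, since the same denominator appears in both cov(x, y) and σ_x².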
$y = mx + c$ is a line and $p_i$ is the perpendicular distance of the data point $(x_i, y_i)$ from the line. Consider
\[
P = \sum p_i^2.
\]
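Since the perpendicular distance of $(x_i, y_i)$ from the line $mx - y + c = 0$ is $p_i = |mx_i + c - y_i|/\sqrt{m^2 + 1}$, the quantity P is just the earlier vertical-distance sum divided by $m^2 + 1$. A minimal sketch (names are mine, not from the notes):

```python
# Sketch: P = sum of squared perpendicular distances from points to y = m*x + c.
# Uses p_i = |m*x_i + c - y_i| / sqrt(m^2 + 1); squaring removes the sqrt.

def P(m, c, points):
    return sum((m * x + c - y) ** 2 for x, y in points) / (m ** 2 + 1)

pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
print(P(0.0, 0.0, pts))  # 1.0: distances to the x-axis are 0, 1, 0
```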