
Assignment 1

Due on August 31, 2016

For all programs, use a subset of the following data as specified.


X1     X2    X3      Y1     Y2      Y3
2.0    100   0.001   5.1    102.1   4.1
2.5    200   0.002   6.1    202.4   6.0
3.0    300   0.003   6.9    303.0   9.2
3.5    400   0.004   7.8    403.4   12.0
4.0    500   0.005   9.2    504.2   17.0
4.5    600   0.006   9.9    604.8   20.0
5.0    700   0.007   11.5   704.8   25.5
5.5    800   0.008   12.0   805.7   31.0
6.0    900   0.009   12.8   905.7   36.4
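
The code in the following sections refers to vectors x1, x2, x3, y1, y2 and y3 without showing their definitions; a minimal sketch of the definitions it assumes:

%data vectors used by the code in the remaining sections
x1 = [2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0];
x2 = [100 200 300 400 500 600 700 800 900];
x3 = [0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009];
y1 = [5.1 6.1 6.9 7.8 9.2 9.9 11.5 12.0 12.8];
y2 = [102.1 202.4 303.0 403.4 504.2 604.8 704.8 805.7 905.7];
y3 = [4.1 6.0 9.2 12.0 17 20 25.5 31 36.4];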

1. Do the following.
a. Model Y1 as a linear function of X1. Use linear regression to learn the model parameters.

Assuming θ0 to be zero and θ1 to vary from -10 to 15, here is the graph for the cost function J(θ).

[Figure: cost function J(θ) plotted against θ1]

The Matlab code that generates this graph is


%initialize variables
m = 9;
teta0 = 0;
k = 0;
J = zeros(1, 26);
teta1_array = zeros(1, 26);

%varying theta1 from -10 to 15
for teta1 = -10:1:15

    %calculating the summation of squared distances
    sum_sq = 0;
    for i = 1:m
        sum_sq = sum_sq + ( ( (teta0 + teta1*x1(i)) - y1(i) )^2 );
    end

    %saving values in the arrays
    k = k + 1;
    J(k) = (1/(2*m))*sum_sq;
    teta1_array(k) = teta1;
end

%plot graph
plot(teta1_array, J)
xlabel('Theta 1'); ylabel('J(theta)');


By using the gradient descent method with α = 0.01 we have the following result.

After 6000 iterations, θ0 = 1.0556 and θ1 = 1.9943. Therefore, knowing the values of θ0 and θ1, it is possible to plot the learned curve.

[Figure: original data and learned line for Y1 versus X1]

Here is the Matlab code that learns the values of θ0 and θ1.


%initialize variables
m = 9;
teta0 = 0;
teta1 = 0;
teta0_array = zeros(1, 6000);
teta1_array = zeros(1, 6000);
J = zeros(1, 6000);
alpha = 0.01;

%loop for the 6000 iterations
for k = 1:6000

    %initializing the summations for J, teta0 and teta1
    sum_J = 0;
    sum_teta0 = 0;
    sum_teta1 = 0;

    %performing the summations
    for i = 1:m
        %gradient with respect to teta0
        sum_teta0 = sum_teta0 + ((teta0 + teta1*x1(i)) - y1(i));

        %gradient with respect to teta1
        sum_teta1 = sum_teta1 + (((teta0 + teta1*x1(i)) - y1(i))*x1(i));

        %cost function
        sum_J = sum_J + (((teta0 + teta1*x1(i)) - y1(i))^2);
    end

    %calculating the next teta
    teta0 = teta0 - (alpha/m)*sum_teta0;
    teta1 = teta1 - (alpha/m)*sum_teta1;

    %storing the values
    teta0_array(k) = teta0;
    teta1_array(k) = teta1;
    J(k) = (1/(2*m))*sum_J;
end

%calculating the array with the predicted Y
y_tetas = zeros(1, m);
for i = 1:m
    y_tetas(i) = teta0 + teta1*x1(i);
end
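
The learned curve in the figure above can be reproduced with a short plotting sketch like the following (assuming the variables computed by the code above):

%plot the original data points and the learned line
plot(x1, y1, 'o');          %original data
hold on;
plot(x1, y_tetas, '-');     %predictions from the learned parameters
xlabel('X1'); ylabel('Y1');
legend('Original data', 'Learned curve');
hold off;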
b. Predict output for X1 = 4.10 and X1 = 6.5.

After knowing θ0 and θ1 it becomes easy to predict the values, as the learned curve equation becomes:

h(x) = θ0 + θ1x = 1.0556 + 1.9943x

Therefore, h(4.10) = 9.2322 and h(6.5) = 14.0185
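
A quick check of these predictions in Matlab (using the learned parameters as an anonymous function):

%quick check of the predictions above
h = @(x) 1.0556 + 1.9943*x;   %learned hypothesis
h(4.10)   %approximately 9.23
h(6.5)    %approximately 14.02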


c. Repeat gradient descent learning for α = 0.01, 0.1, 1.0, and 100. Plot J for the learning
duration.

[Figure: J(θ) versus the number of iterations for each value of α]
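
A minimal sketch of how such a comparison could be generated (assuming x1 and y1 as defined earlier; a shorter run and a log scale are used here only so the diverging cases remain visible):

%compare convergence of J(theta) for several learning rates
alphas = [0.01 0.1 1.0 100];
iters = 100;
m = 9;
J_hist = zeros(length(alphas), iters);
for a = 1:length(alphas)
    alpha = alphas(a);
    teta0 = 0; teta1 = 0;
    for k = 1:iters
        sum_teta0 = 0; sum_teta1 = 0; sum_J = 0;
        for i = 1:m
            err = (teta0 + teta1*x1(i)) - y1(i);
            sum_teta0 = sum_teta0 + err;
            sum_teta1 = sum_teta1 + err*x1(i);
            sum_J = sum_J + err^2;
        end
        teta0 = teta0 - (alpha/m)*sum_teta0;
        teta1 = teta1 - (alpha/m)*sum_teta1;
        J_hist(a, k) = (1/(2*m))*sum_J;
    end
    semilogy(1:iters, J_hist(a, :)); hold on;   %log scale because J blows up for large alpha
end
xlabel('N of iterations'); ylabel('J(theta)');
legend('alpha = 0.01', 'alpha = 0.1', 'alpha = 1.0', 'alpha = 100');
hold off;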

d. Interpret the results in c.

The convex (bowl-shaped) cost function is very steep, which means that gradient descent converges
rapidly to the minimum for small learning rates. However, with an α larger than 0.1, the value of θ1
starts to jump from one side of the bowl to the other instead of converging to the minimum value.
2. Do the following.
a. Model Y2 as a linear function of X1 and X2. Use linear regression to learn the model
parameters without scaling features. Use an appropriate value for α.

Assuming θ0 to be zero and varying θ1 and θ2 from -10 to 15, the graph for the cost function lies in a
three-dimensional space.
[Figure: cost function J(θ) with two independent variables, plotted as a surface over θ1 and θ2]
The Matlab code that generates this graph is


%initialize variables
m = 9;
teta0 = 0;
k = 0;
J = zeros(1, 26*26);
teta1_array = zeros(1, 26*26);
teta2_array = zeros(1, 26*26);

%varying theta1 and theta2 from -10 to 15
for teta1 = -10:1:15
    for teta2 = -10:1:15

        %calculating the summation of squared distances
        sum_sq = 0;
        for i = 1:m
            sum_sq = sum_sq + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )^2 );
        end

        %saving values in the arrays
        k = k + 1;
        J(k) = (1/(2*m))*sum_sq;
        teta1_array(k) = teta1;
        teta2_array(k) = teta2;
    end
end

%plot the cost surface (reshape J onto the 26x26 grid of theta values)
surf(-10:1:15, -10:1:15, reshape(J, 26, 26));
xlabel('Theta 1'); ylabel('Theta 2'); zlabel('J(theta)');

By using the gradient descent method with α = 0.000001 and 6000000 iterations we have the following result.

[Figure: convergence curves for θ0, θ1 and θ2 over the iterations]

After 6000000 iterations, θ0 = 0.4727, θ1 = 0.7141 and θ2 = 1.0014. Therefore, by knowing the values of
θ0, θ1 and θ2 it is possible to plot the learned curve h(x) = θ0 + θ1X1 + θ2X2.
[Figure: learned curve and original data set, with y2 plotted against x1 and x2]
Here is the Matlab code that learns the values of θ0, θ1 and θ2.


%initialize variables
m = 9;
teta0 = 0;
teta1 = 0;
teta2 = 0;
teta0_array = zeros(1, 6000000);
teta1_array = zeros(1, 6000000);
teta2_array = zeros(1, 6000000);
J = zeros(1, 6000000);
alpha = 0.000001;

%loop for the 6000000 iterations
for k = 1:6000000

    %initializing the summations for J, teta0, teta1 and teta2
    sum_J = 0;
    sum_teta0 = 0;
    sum_teta1 = 0;
    sum_teta2 = 0;

    %performing the summations
    for i = 1:m
        %summation for the gradient with respect to teta0
        sum_teta0 = sum_teta0 + ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) );

        %summation for the gradient with respect to teta1
        sum_teta1 = sum_teta1 + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )*x1(i) );

        %summation for the gradient with respect to teta2
        sum_teta2 = sum_teta2 + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )*x2(i) );

        %cost function
        sum_J = sum_J + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )^2 );
    end

    %calculating the next teta
    teta0 = teta0 - (alpha/m)*sum_teta0;
    teta1 = teta1 - (alpha/m)*sum_teta1;
    teta2 = teta2 - (alpha/m)*sum_teta2;

    %storing the values
    teta0_array(k) = teta0;
    teta1_array(k) = teta1;
    teta2_array(k) = teta2;
    J(k) = (1/(2*m))*sum_J;
end

%calculating the array with the predicted Y
y_tetas = zeros(1, m);
for i = 1:m
    y_tetas(i) = teta0 + teta1*x1(i) + teta2*x2(i);
end
b. Plot J for the learning duration.
[Figure: cost function J(θ) for the first 60 iterations, plotted against the number of iterations]

c. Repeat a and b by scaling features.

By scaling the features x1 and x2 it is possible to converge to values for θ0, θ1 and θ2 with a larger α and
far fewer iterations. The scaling method chosen was to subtract the mean of the data set and divide by the range.
The following Matlab code was added to calculate the new scaled data set:
%feature scaling: subtract the mean and divide by the range of each feature
for i = 1:m
    x1_scaled(i) = (x1(i) - mean(x1))/range(x1);
    x2_scaled(i) = (x2(i) - mean(x2))/range(x2);
end

The number of iterations was reduced to 300 and α was changed to 0.1. Below are the cost function and
the convergence curves for θ0, θ1 and θ2.

Theta 1
J(theta)

Theta 0
Theta 2
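
One detail worth noting: since the parameters were learned on the scaled features, a new input has to be scaled with the same mean and range before prediction. A minimal sketch (the X2 value here is hypothetical, chosen only for illustration):

%scale a new input with the same mean and range used during training
x1_new = 4.10;
x2_new = 450;   %hypothetical X2 value, for illustration only
x1_new_scaled = (x1_new - mean(x1))/range(x1);
x2_new_scaled = (x2_new - mean(x2))/range(x2);
y_pred = teta0 + teta1*x1_new_scaled + teta2*x2_new_scaled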

d. Find the parameter vector θ using the standard mathematical approach.

In order to compute the solution using the mathematical approach, the following formula (the normal equation) was used:

θ = (XᵀX)⁻¹XᵀY

where X is a matrix in which the first column is filled with ones and the remaining columns are the training
sample data (one row per sample), and Y is a column vector with all values of y from the training sample. This solution relies on
XᵀX being invertible. It will not be invertible in case of redundant features or too many features for the
size of the training sample data (n >> m).

The following Matlab code solves the equation to find the θ vector:

x_matrix = 0;
y_matrix = 0;

for i = 1:m
    x_matrix(i,1) = 1;
    x_matrix(i,2) = x1(i);
    x_matrix(i,3) = x2(i);
    y_matrix(i,1) = y2(i);
end

A = inv(transpose(x_matrix)*x_matrix)
B = transpose(x_matrix)*y_matrix
theta_matrix = A*B

This is the output for the first problem (model Y1 as a linear function of X1):

x_matrix =
    1.0000    2.0000
    1.0000    2.5000
    1.0000    3.0000
    1.0000    3.5000
    1.0000    4.0000
    1.0000    4.5000
    1.0000    5.0000
    1.0000    5.5000
    1.0000    6.0000

y_matrix =
    5.1000
    6.1000
    6.9000
    7.8000
    9.2000
    9.9000
   11.5000
   12.0000
   12.8000

A =
    1.1778   -0.2667
   -0.2667    0.0667

B =
   81.3000
  355.1000

theta_matrix =
    1.0600
    1.9933

The computation gives the correct values of θ0 = 1.0600 and θ1 = 1.9933, which are similar to the values
found using the machine learning approach.
For the second problem, however, the data has redundant features (X2 is an exact linear function of X1, X2 = 200·X1 − 300),
which makes XᵀX not invertible. Therefore, the final result is wrong. Here is the output of the Matlab program
when modeling Y2 as a linear function of X1 and X2:
x_matrix =
    1.0000    2.0000  100.0000
    1.0000    2.5000  200.0000
    1.0000    3.0000  300.0000
    1.0000    3.5000  400.0000
    1.0000    4.0000  500.0000
    1.0000    4.5000  600.0000
    1.0000    5.0000  700.0000
    1.0000    5.5000  800.0000
    1.0000    6.0000  900.0000

y_matrix =
  102.1000
  202.4000
  303.0000
  403.4000
  504.2000
  604.8000
  704.8000
  805.7000
  905.7000

Warning: Matrix is close to singular or badly scaled. Results may be
inaccurate. RCOND = 9.247842e-22.
> In mathematical_solution (line 13)

A =
   1.0e+14 *
   -2.2518    1.5012   -0.0075
    1.5012   -1.0008    0.0050
   -0.0075    0.0050   -0.0000

B =
   1.0e+06 *
    0.0045
    0.0212
    2.8710

theta_matrix =
   512
     0
     5
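
When XᵀX is singular, a pseudoinverse can still give a usable least-squares answer. MATLAB's pinv computes the Moore-Penrose pseudoinverse and returns the minimum-norm least-squares solution, and the backslash operator solves the least-squares problem directly (a sketch, reusing the same x_matrix and y_matrix):

%pseudoinverse-based solution; works even when transpose(X)*X is singular
theta_pinv = pinv(x_matrix)*y_matrix

%least squares via the backslash operator (MATLAB warns about rank deficiency)
theta_backslash = x_matrix\y_matrix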

3. Do the following.
a. Model Y3 as a quadratic function of X1. Use regression to learn the model parameters.

Any polynomial hypothesis can be mapped to a linear regression problem in a higher dimensional
feature space. Therefore, by transforming h(x) = θ0 + θ1X1 + θ2X1² into h(Z) = θ0 + θ1Z1 + θ2Z2, where Z1 is
equal to X1 and Z2 is equal to X1², the polynomial problem becomes a linear multivariable regression.
The same Matlab code for multiple linear regression was used with a little addition:

%convert a quadratic hypothesis function to a higher dimensional feature space:
%h(x) = teta0 + teta1*x1 + teta2*x1^2  =>  h(Z) = teta0 + teta1*Z1 + teta2*Z2
%Therefore, Z1 = x1 and Z2 = x1^2
for i = 1:m
    z1(i) = x1(i);
    z2(i) = x1(i)^2;
end
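
The learned curve shown below can be drawn by evaluating the hypothesis on a fine grid of X1 values; a minimal sketch (assuming teta0, teta1 and teta2 hold the values learned by the regression above):

%evaluate the learned quadratic hypothesis on a fine grid for plotting
x_grid = 2:0.05:6;
y_fit = teta0 + teta1*x_grid + teta2*x_grid.^2;
plot(x1, y3, 'o');        %original data
hold on;
plot(x_grid, y_fit, '-'); %learned quadratic curve
xlabel('X1'); ylabel('Y3');
hold off;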

Here are the convergence curves for θ0, θ1 and θ2, and the learned curve:

[Figure: convergence curves for θ0, θ1 and θ2, and the learned quadratic curve for y3 versus X1]

b. Plot J for the learning duration.

[Figure: cost function J(θ) for 60 iterations, plotted against the number of iterations]
