
Assignment 1

Due on August 31, 2016

For all programs, use a subset of the following data as specified.


X1     X2    X3      Y1     Y2      Y3
2.0    100   0.001   5.1    102.1   4.1
2.5    200   0.002   6.1    202.4   6.0
3.0    300   0.003   6.9    303.0   9.2
3.5    400   0.004   7.8    403.4   12.0
4.0    500   0.005   9.2    504.2   17.0
4.5    600   0.006   9.9    604.8   20.0
5.0    700   0.007   11.5   704.8   25.5
5.5    800   0.008   12.0   805.7   31.0
6.0    900   0.009   12.8   905.7   36.4
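
The code in the following sections refers to vectors x1, x2, x3, y1, y2 and y3 without showing their definitions; a minimal sketch of the definitions it assumes:

%data vectors used by the code in the remaining sections
x1 = [2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0];
x2 = [100 200 300 400 500 600 700 800 900];
x3 = [0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009];
y1 = [5.1 6.1 6.9 7.8 9.2 9.9 11.5 12.0 12.8];
y2 = [102.1 202.4 303.0 403.4 504.2 604.8 704.8 805.7 905.7];
y3 = [4.1 6.0 9.2 12.0 17 20 25.5 31 36.4];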

1. Do the following.
a. Model Y1 as a linear function of X1. Use linear regression to learn the model parameters.

Assuming θ0 to be zero and θ1 to vary from -10 to 15, here is the graph for the cost function J(θ).

[Figure: cost function J(θ) plotted against θ1]

The Matlab code that generates this graph is


%initialize variables
m = 9;
teta0 = 0;
k = 0;
J = zeros(1, 26);
teta1_array = zeros(1, 26);

%varying theta1 from -10 to 15
for teta1 = -10:1:15

    %calculating the summation of squared distances
    sum_sq = 0;
    for i = 1:m
        sum_sq = sum_sq + ( ( (teta0 + teta1*x1(i)) - y1(i) )^2 );
    end

    %saving values in the arrays
    k = k + 1;
    J(k) = (1/(2*m))*sum_sq;
    teta1_array(k) = teta1;
end

%plot graph
plot(teta1_array, J)
xlabel('Theta 1'); ylabel('J(theta)');


By using the gradient descent method with α = 0.01 we have the following result.

After 6000 iterations, θ0 = 1.0556 and θ1 = 1.9943. Therefore, knowing the values of θ0 and θ1, it is possible to plot the learned curve.

[Figure: original data and learned line for Y1 versus X1]

Here is the Matlab code that learns the values of θ0 and θ1.


%initialize variables
m = 9;
teta0 = 0;
teta1 = 0;
teta0_array = zeros(1, 6000);
teta1_array = zeros(1, 6000);
J = zeros(1, 6000);
alpha = 0.01;

%loop for the 6000 iterations
for k = 1:6000

    %initializing the summations for J, teta0 and teta1
    sum_J = 0;
    sum_teta0 = 0;
    sum_teta1 = 0;

    %performing the summations
    for i = 1:m
        %gradient with respect to teta0
        sum_teta0 = sum_teta0 + ((teta0 + teta1*x1(i)) - y1(i));

        %gradient with respect to teta1
        sum_teta1 = sum_teta1 + (((teta0 + teta1*x1(i)) - y1(i))*x1(i));

        %cost function
        sum_J = sum_J + (((teta0 + teta1*x1(i)) - y1(i))^2);
    end

    %calculating the next teta
    teta0 = teta0 - (alpha/m)*sum_teta0;
    teta1 = teta1 - (alpha/m)*sum_teta1;

    %storing the values
    teta0_array(k) = teta0;
    teta1_array(k) = teta1;
    J(k) = (1/(2*m))*sum_J;
end

%calculating the array with the predicted Y
y_tetas = zeros(1, m);
for i = 1:m
    y_tetas(i) = teta0 + teta1*x1(i);
end
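
The learned curve in the figure above can be reproduced with a short plotting sketch like the following (assuming the variables computed by the code above):

%plot the original data points and the learned line
plot(x1, y1, 'o');          %original data
hold on;
plot(x1, y_tetas, '-');     %predictions from the learned parameters
xlabel('X1'); ylabel('Y1');
legend('Original data', 'Learned curve');
hold off;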
b. Predict output for X1 = 4.10 and X1 = 6.5.

After knowing θ0 and θ1 it becomes easy to predict the values, as the learned curve equation becomes:

h(x) = θ0 + θ1x = 1.0556 + 1.9943x

Therefore, h(4.10) = 9.2322 and h(6.5) = 14.0185
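
A quick check of these predictions in Matlab (using the learned parameters as an anonymous function):

%quick check of the predictions above
h = @(x) 1.0556 + 1.9943*x;   %learned hypothesis
h(4.10)   %approximately 9.23
h(6.5)    %approximately 14.02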


c. Repeat gradient descent learning for α = 0.01, 0.1, 1.0, and 100. Plot J for the learning
duration.

[Figure: J(θ) versus the number of iterations for each value of α]
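
A minimal sketch of how such a comparison could be generated (assuming x1 and y1 as defined earlier; a shorter run and a log scale are used here only so the diverging cases remain visible):

%compare convergence of J(theta) for several learning rates
alphas = [0.01 0.1 1.0 100];
iters = 100;
m = 9;
J_hist = zeros(length(alphas), iters);
for a = 1:length(alphas)
    alpha = alphas(a);
    teta0 = 0; teta1 = 0;
    for k = 1:iters
        sum_teta0 = 0; sum_teta1 = 0; sum_J = 0;
        for i = 1:m
            err = (teta0 + teta1*x1(i)) - y1(i);
            sum_teta0 = sum_teta0 + err;
            sum_teta1 = sum_teta1 + err*x1(i);
            sum_J = sum_J + err^2;
        end
        teta0 = teta0 - (alpha/m)*sum_teta0;
        teta1 = teta1 - (alpha/m)*sum_teta1;
        J_hist(a, k) = (1/(2*m))*sum_J;
    end
    semilogy(1:iters, J_hist(a, :)); hold on;   %log scale because J blows up for large alpha
end
xlabel('N of iterations'); ylabel('J(theta)');
legend('alpha = 0.01', 'alpha = 0.1', 'alpha = 1.0', 'alpha = 100');
hold off;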

d. Interpret the results in c.

The convex (bowl-shaped) cost function is very steep, which means that gradient descent converges
rapidly to the minimum for small learning rates. However, with an α larger than 0.1, the value of θ1
starts to jump from one side of the bowl to the other instead of converging to the minimum value.
2. Do the following.
a. Model Y2 as a linear function of X1 and X2. Use linear regression to learn the model
parameters without scaling features. Use an appropriate value for α.

Assuming θ0 to be zero and varying θ1 and θ2 from -10 to 15, the graph for the cost function lies in a
three-dimensional space.
[Figure: cost function J(θ) with two independent variables, plotted as a surface over θ1 and θ2]
The Matlab code that generates this graph is


%initialize variables
m = 9;
teta0 = 0;
k = 0;
J = zeros(1, 26*26);
teta1_array = zeros(1, 26*26);
teta2_array = zeros(1, 26*26);

%varying theta1 and theta2 from -10 to 15
for teta1 = -10:1:15
    for teta2 = -10:1:15

        %calculating the summation of squared distances
        sum_sq = 0;
        for i = 1:m
            sum_sq = sum_sq + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )^2 );
        end

        %saving values in the arrays
        k = k + 1;
        J(k) = (1/(2*m))*sum_sq;
        teta1_array(k) = teta1;
        teta2_array(k) = teta2;
    end
end

%plot the cost surface (reshape J onto the 26x26 grid of theta values)
surf(-10:1:15, -10:1:15, reshape(J, 26, 26));
xlabel('Theta 1'); ylabel('Theta 2'); zlabel('J(theta)');

By using the gradient descent method with α = 0.000001 and 6000000 iterations we have the following result.

[Figure: convergence curves for θ0, θ1 and θ2 over the iterations]

After 6000000 iterations, θ0 = 0.4727, θ1 = 0.7141 and θ2 = 1.0014. Therefore, by knowing the values of
θ0, θ1 and θ2 it is possible to plot the learned curve h(x) = θ0 + θ1X1 + θ2X2.
[Figure: learned curve and original data set, with y2 plotted against x1 and x2]
Here is the Matlab code that learns the values of θ0, θ1 and θ2.


%initialize variables
m = 9;
teta0 = 0;
teta1 = 0;
teta2 = 0;
teta0_array = zeros(1, 6000000);
teta1_array = zeros(1, 6000000);
teta2_array = zeros(1, 6000000);
J = zeros(1, 6000000);
alpha = 0.000001;

%loop for the 6000000 iterations
for k = 1:6000000

    %initializing the summations for J, teta0, teta1 and teta2
    sum_J = 0;
    sum_teta0 = 0;
    sum_teta1 = 0;
    sum_teta2 = 0;

    %performing the summations
    for i = 1:m
        %summation for the gradient with respect to teta0
        sum_teta0 = sum_teta0 + ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) );

        %summation for the gradient with respect to teta1
        sum_teta1 = sum_teta1 + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )*x1(i) );

        %summation for the gradient with respect to teta2
        sum_teta2 = sum_teta2 + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )*x2(i) );

        %cost function
        sum_J = sum_J + ( ( (teta0 + teta1*x1(i) + teta2*x2(i)) - y2(i) )^2 );
    end

    %calculating the next teta
    teta0 = teta0 - (alpha/m)*sum_teta0;
    teta1 = teta1 - (alpha/m)*sum_teta1;
    teta2 = teta2 - (alpha/m)*sum_teta2;

    %storing the values
    teta0_array(k) = teta0;
    teta1_array(k) = teta1;
    teta2_array(k) = teta2;
    J(k) = (1/(2*m))*sum_J;
end

%calculating the array with the predicted Y
y_tetas = zeros(1, m);
for i = 1:m
    y_tetas(i) = teta0 + teta1*x1(i) + teta2*x2(i);
end
b. Plot J for the learning duration.
[Figure: cost function J(θ) for the first 60 iterations, plotted against the number of iterations]

c. Repeat a and b by scaling features.

By scaling the features x1 and x2 it is possible to converge to values for θ0, θ1 and θ2 with a larger α and
far fewer iterations. The scaling method chosen was to subtract the mean of the data set and divide by the range.
The following Matlab code was added to calculate the new scaled data set:
%feature scaling: subtract the mean and divide by the range of each feature
for i = 1:m
    x1_scaled(i) = (x1(i) - mean(x1))/range(x1);
    x2_scaled(i) = (x2(i) - mean(x2))/range(x2);
end

The number of iterations was reduced to 300 and α was changed to 0.1. Below are the cost function and
the convergence curves for θ0, θ1 and θ2.

Theta 1
J(theta)

Theta 0
Theta 2
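
One detail worth noting: since the parameters were learned on the scaled features, a new input has to be scaled with the same mean and range before prediction. A minimal sketch (the X2 value here is hypothetical, chosen only for illustration):

%scale a new input with the same mean and range used during training
x1_new = 4.10;
x2_new = 450;   %hypothetical X2 value, for illustration only
x1_new_scaled = (x1_new - mean(x1))/range(x1);
x2_new_scaled = (x2_new - mean(x2))/range(x2);
y_pred = teta0 + teta1*x1_new_scaled + teta2*x2_new_scaled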

d. Find the parameter vector θ using the standard mathematical approach.

In order to compute the solution using the mathematical approach, the following formula (the normal equation) was used:

θ = (XᵀX)⁻¹XᵀY

where X is a matrix in which the first column is filled with ones and the remaining columns are the training
sample data (one row per sample), and Y is a column vector with all values of y from the training sample. This solution relies on
XᵀX being invertible. It will not be invertible in case of redundant features or too many features for the
size of the training sample data (n >> m).

The following Matlab code solves the equation to find the θ vector:

x_matrix = 0;
y_matrix = 0;

for i = 1:m
    x_matrix(i,1) = 1;
    x_matrix(i,2) = x1(i);
    x_matrix(i,3) = x2(i);
    y_matrix(i,1) = y2(i);
end

A = inv(transpose(x_matrix)*x_matrix)
B = transpose(x_matrix)*y_matrix
theta_matrix = A*B

This is the output for the first problem (model Y1 as a linear function of X1):

x_matrix =
    1.0000    2.0000
    1.0000    2.5000
    1.0000    3.0000
    1.0000    3.5000
    1.0000    4.0000
    1.0000    4.5000
    1.0000    5.0000
    1.0000    5.5000
    1.0000    6.0000

y_matrix =
    5.1000
    6.1000
    6.9000
    7.8000
    9.2000
    9.9000
   11.5000
   12.0000
   12.8000

A =
    1.1778   -0.2667
   -0.2667    0.0667

B =
   81.3000
  355.1000

theta_matrix =
    1.0600
    1.9933

The computation gives the correct values of θ0 = 1.0600 and θ1 = 1.9933, which are similar to the values
found using the machine learning approach.
For the second problem, however, the data has redundant features (X2 is an exact linear function of X1, X2 = 200·X1 − 300),
which makes XᵀX not invertible. Therefore, the final result is wrong. Here is the output of the Matlab program
when modeling Y2 as a linear function of X1 and X2:
x_matrix =
    1.0000    2.0000  100.0000
    1.0000    2.5000  200.0000
    1.0000    3.0000  300.0000
    1.0000    3.5000  400.0000
    1.0000    4.0000  500.0000
    1.0000    4.5000  600.0000
    1.0000    5.0000  700.0000
    1.0000    5.5000  800.0000
    1.0000    6.0000  900.0000

y_matrix =
  102.1000
  202.4000
  303.0000
  403.4000
  504.2000
  604.8000
  704.8000
  805.7000
  905.7000

Warning: Matrix is close to singular or badly scaled. Results may be
inaccurate. RCOND = 9.247842e-22.
> In mathematical_solution (line 13)

A =
   1.0e+14 *
   -2.2518    1.5012   -0.0075
    1.5012   -1.0008    0.0050
   -0.0075    0.0050   -0.0000

B =
   1.0e+06 *
    0.0045
    0.0212
    2.8710

theta_matrix =
   512
     0
     5
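
When XᵀX is singular, a pseudoinverse can still give a usable least-squares answer. MATLAB's pinv computes the Moore-Penrose pseudoinverse and returns the minimum-norm least-squares solution, and the backslash operator solves the least-squares problem directly (a sketch, reusing the same x_matrix and y_matrix):

%pseudoinverse-based solution; works even when transpose(X)*X is singular
theta_pinv = pinv(x_matrix)*y_matrix

%least squares via the backslash operator (MATLAB warns about rank deficiency)
theta_backslash = x_matrix\y_matrix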

3. Do the following.
a. Model Y3 as a quadratic function of X1. Use regression to learn the model parameters.

Any polynomial hypothesis can be mapped to a linear regression problem in a higher dimensional
feature space. Therefore, by transforming h(x) = θ0 + θ1X1 + θ2X1² into h(Z) = θ0 + θ1Z1 + θ2Z2, where Z1 is
equal to X1 and Z2 is equal to X1², the polynomial problem becomes a linear multivariable regression.
The same Matlab code for multiple linear regression was used with a little addition:

%convert a quadratic hypothesis function to a higher dimensional feature space:
%h(x) = teta0 + teta1*x1 + teta2*x1^2  =>  h(Z) = teta0 + teta1*Z1 + teta2*Z2
%Therefore, Z1 = x1 and Z2 = x1^2
for i = 1:m
    z1(i) = x1(i);
    z2(i) = x1(i)^2;
end
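
The learned curve shown below can be drawn by evaluating the hypothesis on a fine grid of X1 values; a minimal sketch (assuming teta0, teta1 and teta2 hold the values learned by the regression above):

%evaluate the learned quadratic hypothesis on a fine grid for plotting
x_grid = 2:0.05:6;
y_fit = teta0 + teta1*x_grid + teta2*x_grid.^2;
plot(x1, y3, 'o');        %original data
hold on;
plot(x_grid, y_fit, '-'); %learned quadratic curve
xlabel('X1'); ylabel('Y3');
hold off;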

Here are the convergence curves for θ0, θ1 and θ2, and the learned curve:

[Figure: convergence curves for θ0, θ1 and θ2, and the learned quadratic curve for y3 versus X1]

b. Plot J for the learning duration.

[Figure: cost function J(θ) for 60 iterations, plotted against the number of iterations]
