
University of Skövde

School of Informatics
Master of Data Science
Haftamu Hailu Tefera
al17hafte@student.his.e

Advanced Artificial Intelligence - IT714A

Evaluation of Machine Learning Algorithms and Bias-Variance Trade-off Analysis

The main goal of this assignment is to evaluate machine learning algorithms on a particular data set,
called abalone, and to analyse the bias-variance trade-off as the complexity of the model used to fit
the data increases, using polynomial regressions of different degrees.

The selected dataset consists of different attributes of the abalone, a marine mollusc. My experiment
predicts the age of the animal from physical measurements such as length, diameter, height and
others. In this experiment, I used the diameter attribute of the data set to predict the age.
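As a minimal sketch, the data could be loaded as follows; the column names follow the UCI repository documentation for the abalone data set, while the local file name is an assumption:

import pandas as pd

# Column names as documented in the UCI repository; the file
# location "abalone.data" is an assumption.
cols = ["Sex", "Length", "Diameter", "Height", "Whole weight",
        "Shucked weight", "Viscera weight", "Shell weight", "Rings"]
df = pd.read_csv("abalone.data", names=cols)
X = df[["Diameter"]].values  # the single feature used here
y = df["Rings"].values       # ring count, a proxy for age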

For this assignment, I used Python as the programming language, together with polynomial
regression and the collection of functions and classes defined in the sklearn machine learning
library. Firstly, I modelled the relationship between the independent and response variables using a
simple linear model, but later used polynomial regression to fit the data because the relationship
between the independent variable (diameter) and the response variable is not linear.
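The code below relies on two helper functions, fit_poly and apply_poly, whose definitions are not shown in this report. A minimal sketch of how such helpers could be implemented with sklearn's PolynomialFeatures and LinearRegression follows; the exact implementation used in the experiment may differ:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

def fit_poly(X, y, degree):
    # Expand the single feature into polynomial terms of the given
    # degree, then fit an ordinary least-squares model on them.
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(X), y)
    return poly, model

def apply_poly(fitted, X):
    # Apply the stored polynomial expansion and predict.
    poly, model = fitted
    return model.predict(poly.transform(X))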

[Figure: actual data vs. data predicted by the model]

Firstly, the data is divided into train and test sets. I trained the model on the training set, then
applied it to the unseen data (the test set), and finally plotted the graphs shown below.
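A minimal sketch of this split using sklearn's train_test_split; the test fraction and random seed are assumptions, as the report does not state them:

from sklearn.model_selection import train_test_split

# 70/30 split; test_size and random_state are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)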

[Figure: fitting the data using different model complexities]



Comparison between real data and predicted data

Diameter    Actual data    Predicted (model) data    Observed error
0.455       9              10.92075297               1.92
0.440       8              10.71917958               2.7
0.445       16             10.78824462               5.3
0.490       9              11.34109616               2.3
0.385       14             9.81603842                4.19
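A table like the one above could be produced with a fragment like the following sketch; the polynomial degree behind the tabulated predictions is not stated in the report, so degree 1 here is an assumption:

# Degree 1 is an assumption; the report does not state which model
# produced the tabulated predictions.
model = fit_poly(X_train, y_train, 1)
pred = apply_poly(model, X_test)
for x, a, p in zip(X_test[:5], y_test[:5], pred[:5]):
    print(f"{x[0]:.3f}  {a}  {p:.8f}  {abs(p - a):.2f}")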

Even though I used polynomial regressions of various degrees, there is still an error between the
actual data points and the values predicted by the model. Simple models do not represent the actual
relationship between the response variable (rings) and the feature variable (diameter), while
complex models are sensitive to small changes in the data. Therefore, to decide which model is best
for my data set, I performed a bias-variance analysis by varying the complexity, and I obtained the
following complexity vs. bias-variance graph.
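For reference, the textbook decomposition that motivates this analysis (a standard identity, not derived in this report) is

    E[(y - ŷ)²] = Bias(ŷ)² + Var(ŷ) + σ²

where σ² is the irreducible noise in the data: simple models tend toward high bias and low variance, while complex models tend toward low bias and high variance.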

[Figure: bias and variance vs. model complexity]

As we can see from the graph, the variance changes slightly from one complexity to the next, while
the bias remains almost the same.
According to the Occam's razor principle, the best choice for this experiment is complexity 1, where
bias and variance are both low and the two sources of error are balanced.



I calculated the bias and variance for polynomial degrees up to 15, with 10 models per degree, as
follows. The bias and variance values are stored iteratively in arrays; five of the resulting bias
and variance values are shown below.

Degree of the polynomial    Bias                    Variance
1                           1.0825238483546171      0.37027046004464464
2                           1.0825232368634246      0.36722314421445895
3                           1.0825262515682708      0.37065378285258632
4                           1.082521012912604       0.37143326266730792
5                           1.0825227151319923      0.37557117740903923

Table of bias-variance results

Code for the experiment

1. Code for plotting predicted values and real data on the test data

import matplotlib.pyplot as plt

# Plot the actual test data as red dots.
plt.plot(X_test, y_test, 'ro', label="actual")

# Fit polynomial models of degree 1 to 5 and plot their predictions.
max_degree = 5
for d in range(1, max_degree + 1):
    m = fit_poly(X_train, y_train, d)
    pred = apply_poly(m, X_test)
    plt.plot(X_test, pred, label="fit, poly degree " + str(d))

plt.legend(bbox_to_anchor=(0.0, 1.02, 1.0, 0.102), loc=4,
           ncol=2, mode="expand", borderaxespad=0.0)
plt.xlabel("Diameter")
plt.ylabel("Age (Rings)")
plt.grid()
plt.show()

2. Code for the bias function

import numpy as np

def bias(Pred, actual):
    # Average prediction of the model.
    result = float(sum(Pred)) / len(actual)
    # Squared bias: mean squared difference between the average
    # prediction and the actual values.
    return ((result - np.asarray(actual)) ** 2).mean()

3. Code for variance function


import numpy as np

def variance(pred, avg):
    # Mean of the reference predictions.
    result = np.mean(avg)
    # Variance: mean squared deviation of the predictions from that mean.
    return np.mean((pred - result) ** 2)



4. Code for displaying the above bias and variance graphs

import matplotlib.pyplot as plt

n_models = 10
max_degree = 15
var_values = []
bias_values = []
for degree in range(1, max_degree):
    for m in range(n_models):
        # Training the model.
        model = fit_poly(X_train, y_train, degree)
        # Testing the model on the test data.
        Pred = apply_poly(model, X_test)
    # Store one bias and one variance value per degree.
    bias_values.append(bias(Pred, y_test) / n_models)
    var_values.append(variance(Pred, Pred) / n_models)
plt.plot(range(1, max_degree), bias_values, label="bias")
plt.plot(range(1, max_degree), var_values, label="variance")
plt.xlabel("Complexity")
plt.ylabel("Bias / Variance")
plt.grid()
plt.legend()
plt.show()

References
1. scikit-learn: machine learning in Python, http://scikit-learn.org/stable/
2. Class lecture notes on machine learning (IT714A)

