Machine Learning (Fall 2013)
Heng Huang, Ph.D., Department of Computer Science and Engineering
Supervised Learning (Ref: Milos Hauskrecht)

Linear Regression (Ref: Milos Hauskrecht)

Examples of regression problems (input -> output to predict):
- Voltage -> Temperature
- Stock prediction -> Money
- Processes, memory -> Power consumption
- Protein structure -> Energy
- Robot arm controls -> Torque at effector
- Location, industry, past losses -> Premium
[Figure: training examples plotted with Temperature on the vertical axis]
Given examples $(x_i, y_i)$, $i = 1, \dots, n$, predict $y$ for a new point $x$.

[Figure: training examples with a fitted line; predictions shown at new input points, Temperature on the vertical axis]
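As a concrete sketch of this task (the numbers are made up for illustration), fitting a line to a handful of example points and predicting at a new input might look like:

```python
import numpy as np

# Made-up training examples (x_i, y_i), e.g. sensor reading -> temperature
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([20.5, 21.9, 23.2, 24.8, 26.1])

# Fit a line y = w1*x + w0 by least squares
w1, w0 = np.polyfit(x, y, deg=1)

# Predict at a new point
x_new = 6.0
y_pred = w1 * x_new + w0
print(f"predicted y at x={x_new}: {y_pred:.2f}")
```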
Ordinary Least Squares (OLS)

For each training example, the error (residual) is the difference between the observation $y_i$ and the prediction $f(x_i, \mathbf{w})$. OLS fits the weights by minimizing the sum of squared errors:
$$E(\mathbf{w}) = \sum_{i=1}^{n} \big(y_i - f(x_i, \mathbf{w})\big)^2.$$
[Figure: a fitted line with the residual between one observation and its prediction marked]

Linear Regression: Optimization (Ref: Milos Hauskrecht)

For a linear model $f(x, \mathbf{w}) = \mathbf{w}^T \mathbf{x}$ (with a constant bias feature $x_0 = 1$), $E(\mathbf{w})$ is a convex quadratic function of $\mathbf{w}$, so it can be minimized either in closed form or iteratively.

Solving Linear Regression (Ref: Milos Hauskrecht)

Setting the gradient $\nabla E(\mathbf{w})$ to zero yields a system of linear equations (the normal equations), whose solution is the least-squares estimate; the matrix form is derived below.

Gradient Descent Method (Ref: Milos Hauskrecht)

Alternatively, start from an initial guess and repeatedly step against the gradient,
$$\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla E(\mathbf{w}),$$
with learning rate $\alpha > 0$, until the weights converge; a sketch follows below.

Online Gradient Algorithm (Ref: Milos Hauskrecht)

Rather than computing the gradient over the whole data set, the online version updates the weights after seeing each example $(x_i, y_i)$:
$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \big(y_i - \mathbf{w}^T \mathbf{x}_i\big)\, \mathbf{x}_i,$$
which suits streaming data and large data sets; a second sketch follows below.
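A minimal NumPy sketch of batch gradient descent for the sum-of-squares error (the data, learning rate, and iteration count here are made-up illustrations, not values from the slides):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=5000):
    """Batch gradient descent for linear least squares.

    X: (n, d) design matrix whose first column is all ones (bias feature).
    y: (n,) observed targets.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        residuals = X @ w - y                 # predictions minus observations
        grad = 2 * X.T @ residuals / len(y)   # gradient of the mean squared error
        w -= alpha * grad                     # step against the gradient
    return w

# Made-up example: y ~ 2 + 3x plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(0, 0.5, size=50)
X = np.column_stack([np.ones_like(x), x])
print(gradient_descent(X, y))  # approx [2, 3]
```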
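And a corresponding sketch of the online update, processing one example at a time; again the data stream and step size are illustrative assumptions:

```python
import numpy as np

def online_regression(stream, d, alpha=0.01):
    """Online gradient updates: w += alpha * (y - w.x) * x for each example."""
    w = np.zeros(d)
    for x, y in stream:
        error = y - w @ x       # residual on the current example only
        w += alpha * error * x  # gradient step based on this one example
    return w

# Made-up stream of examples from y ~ 2 + 3x plus noise
rng = np.random.default_rng(1)
examples = []
for _ in range(5000):
    xi = rng.uniform(0, 10)
    examples.append((np.array([1.0, xi]),           # bias feature + input
                     2 + 3 * xi + rng.normal(0, 0.5)))
print(online_regression(examples, d=2))  # approx [2, 3]
```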
Fitting Model by Maximum Likelihood

To enter the probabilistic world, let's say our model $y(x, \mathbf{w})$ predicts $t$ with an error that is modeled as a Gaussian random variable with precision (inverse variance) $\beta$. The likelihood for all data samples is
$$p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\big(t_n \mid y(x_n, \mathbf{w}),\, \beta^{-1}\big).$$
We can let the dependence on $x$ be implicit, because we are not modeling the distribution of $x$. (Ref: Chuck Anderson)

Taking the logarithm of the likelihood, we get
$$\ln p(\mathbf{t} \mid \mathbf{w}, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi).$$
Then take the derivative (actually the gradient) with respect to $\mathbf{w}$; only the sum-of-squares term depends on $\mathbf{w}$:
$$\nabla_{\mathbf{w}} \ln p(\mathbf{t} \mid \mathbf{w}, \beta) = -\beta \sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)\, \nabla_{\mathbf{w}}\, y(x_n, \mathbf{w}).$$
Setting this equal to zero, we can solve for $\mathbf{w}$. For a linear model $y(x, \mathbf{w}) = \mathbf{w}^T \mathbf{x}$ this gives
$$\sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_n^T\, \mathbf{w} = \sum_{n=1}^{N} t_n\, \mathbf{x}_n.$$
(Ref: Chuck Anderson; similar to equation (1.54).)

These sums can be expressed as matrix operations if we define
$$X = \begin{pmatrix} \mathbf{x}_1^T \\ \vdots \\ \mathbf{x}_N^T \end{pmatrix}, \qquad \mathbf{t} = \begin{pmatrix} t_1 \\ \vdots \\ t_N \end{pmatrix}.$$
Now the above equation becomes
$$X^T X\, \mathbf{w} = X^T \mathbf{t},$$
and the solution for $\mathbf{w}$ follows:
$$\mathbf{w}_{ML} = (X^T X)^{-1} X^T \mathbf{t}.$$
(Ref: Chuck Anderson)
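As a sketch of this closed-form solution in NumPy (the data are made up, and np.linalg.lstsq is used in place of an explicit inverse for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.uniform(0, 10, size=N)
t = 2 + 3 * x + rng.normal(0, 0.5, size=N)   # targets with Gaussian noise

# Design matrix: one row x_n^T per sample, with a bias feature
X = np.column_stack([np.ones(N), x])

# Solve X^T X w = X^T t (least squares = maximum likelihood here)
w_ml, *_ = np.linalg.lstsq(X, t, rcond=None)

# ML estimate of the noise precision: 1/beta = mean squared residual
residuals = t - X @ w_ml
beta_ml = 1.0 / np.mean(residuals ** 2)

print(w_ml)                   # approx [2, 3]
print(1 / np.sqrt(beta_ml))   # approx 0.5 (the noise standard deviation)
```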
Regularization

To avoid overfitting, we want to limit the complexity of the model. The simplest linear model is the constant model, with all parameters $\mathbf{w}$ equal to zero except for $w_0$. We can include this preference in the error function by adding the sum of squared weights to the squared-error term (disregarding all other terms not dependent on $\mathbf{w}$):
$$\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2 + \frac{\lambda}{2} \|\mathbf{w}\|^2.$$
Again, taking the gradient, setting it equal to zero, and solving for $\mathbf{w}$, we get
$$\mathbf{w} = (\lambda I + X^T X)^{-1} X^T \mathbf{t}.$$
(Ref: Chuck Anderson)

Other Regularizers

We do not need to use the squared weights as the penalty, provided we are willing to do more computation. Other powers of the weights can be used:
$$\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big(y(x_n, \mathbf{w}) - t_n\big)^2 + \frac{\lambda}{2} \sum_{j} |w_j|^q.$$
For $q = 2$ this is the quadratic (ridge) penalty above; for $q = 1$ the penalty still shrinks the weights but there is no longer a closed-form solution, so an iterative optimizer is needed.
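A minimal sketch of the regularized ($q = 2$, ridge) closed form on made-up data; the choice of $\lambda$ below is arbitrary, and for $q \neq 2$ one would instead minimize the penalized error numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
x = rng.uniform(0, 10, size=N)
t = 2 + 3 * x + rng.normal(0, 0.5, size=N)
X = np.column_stack([np.ones(N), x])

def ridge_fit(X, t, lam):
    """Solve (lam*I + X^T X) w = X^T t for the ridge-regularized weights."""
    d = X.shape[1]
    return np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ t)

print(ridge_fit(X, t, lam=0.0))   # plain least squares
print(ridge_fit(X, t, lam=10.0))  # weights shrunk toward zero
```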