
Machine Learning

Math Essentials

Jeff Howbert Introduction to Machine Learning Winter 2012 1

Areas of math essential to machine learning

Machine learning is part of both statistics and computer science
– Probability
– Statistical inference
– Validation
– Estimates of error, confidence intervals
Linear algebra
– Hugely useful for compact representation of linear
transformations on data
– Dimensionality reduction techniques
Optimization theory


Why worry about the math?

There are lots of easy-to-use machine learning packages out there.
After this course, you will know how to apply several of the most
general-purpose algorithms.

To get really useful results, you need good mathematical intuitions
about certain general machine learning principles, as well as the inner
workings of the individual algorithms.


Why worry about the math?

These intuitions will allow you to:

– Choose the right algorithm(s) for the problem
– Make good choices on parameter settings,
validation strategies
– Recognize over- or underfitting
– Troubleshoot poor / ambiguous results
– Put appropriate bounds of confidence /
uncertainty on results
– Do a better job of coding algorithms or
incorporating them into more complex
analysis pipelines

Notation

a ∈ A      set membership: a is a member of set A
|B|        cardinality: number of items in set B
|| v ||    norm: length of vector v
∑          summation
∫          integral
ℜ          the set of real numbers
ℜ^n        real number space of dimension n
           n = 2 : plane or 2-space
           n = 3 : 3-(dimensional) space
           n > 3 : n-space or hyperspace



x, y, z, u, v    vector (bold, lower case)
A, B, X          matrix (bold, upper case)
y = f( x )       function (map): assigns unique value in
                 range of y to each value in domain of x
dy / dx          derivative of y with respect to single
                 variable x
y = f( x )       function on multiple variables, i.e. a
                 vector of variables; function in n-space
∂y / ∂x_i        partial derivative of y with respect to
                 element i of vector x
Linear algebra applications

1) Operations on or between vectors and matrices

2) Coordinate transformations
3) Dimensionality reduction
4) Linear regression
5) Solution of linear systems of equations
6) Many others

Applications 1) – 4) are directly relevant to this course. Today we’ll start with 1).


Why vectors and matrices?

Most common form of data organization for machine learning is a 2D array, where
– rows represent samples (records, items, datapoints)
– columns represent attributes (features, variables)

Refund   Marital Status   Taxable Income   Cheat
Yes      Single           125K             No
Yes      Married          120K             No
No       Divorced         95K              Yes
No       Married          60K              No
Yes      Divorced         220K             No
No       Single           85K              Yes
No       Married          75K              No
No       Single           90K              Yes

Natural to think of each sample (one row) as a vector of attributes, and the whole array as a matrix.
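As a rough illustration of the row/column view, the sketch below stores three of the table rows as a list of lists in plain Python. The names `header`, `samples`, and `income_column` are chosen here for illustration and do not come from the slides.

```python
# Three rows taken from the table; each inner list is one sample,
# i.e. a vector of attribute values.
header = ["Refund", "Marital Status", "Taxable Income", "Cheat"]
samples = [
    ["Yes", "Single", "125K", "No"],
    ["Yes", "Married", "120K", "No"],
    ["No", "Divorced", "95K", "Yes"],
]

# The whole array is a matrix with one row per sample
# and one column per attribute.
n_samples = len(samples)
n_attributes = len(samples[0])

# A column gathers a single attribute across all samples.
income_index = header.index("Taxable Income")
income_column = [row[income_index] for row in samples]

print(n_samples, n_attributes, income_column)
```

In practice the categorical and string values would be encoded numerically before any linear algebra is applied; that step is left out of this sketch.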



Vectors

Definition: an n-tuple of values (usually real numbers).
– n referred to as the dimension of the vector
– n can be any positive integer, from 1 to infinity
Can be written in column form or row form
– Column form is conventional
– Vector elements referenced by subscript

$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \mathbf{x}^T = ( x_1 \; \cdots \; x_n )$$

where the superscript T means "transpose"

Can think of a vector as:

– a point in space or
– a directed line segment with a magnitude and direction


Vector arithmetic

Addition of two vectors

– add corresponding elements
$$\mathbf{z} = \mathbf{x} + \mathbf{y} = ( x_1 + y_1 \; \cdots \; x_n + y_n )$$

– result is a vector

Scalar multiplication of a vector

– multiply each element by scalar
$$\mathbf{y} = a\mathbf{x} = ( a x_1 \; \cdots \; a x_n )$$

– result is a vector
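The two operations above can be sketched in a few lines of plain Python, with vectors held as lists of floats. The function names `add` and `scale` are illustrative choices, not names used in the course.

```python
def add(x, y):
    """Element-wise sum of two vectors of the same dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return [xi + yi for xi, yi in zip(x, y)]


def scale(a, x):
    """Multiply every element of vector x by the scalar a."""
    return [a * xi for xi in x]


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

z = add(x, y)       # [5.0, 7.0, 9.0]
w = scale(2.0, x)   # [2.0, 4.0, 6.0]
print(z, w)
```

Both results have the same dimension as the inputs, matching "result is a vector".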


Vector arithmetic

Dot product of two vectors

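– multiply corresponding elements, then add products
$$a = \mathbf{x} \cdot \mathbf{y} = \sum_{i=1}^{n} x_i y_i$$

– result is a scalar
Dot product alternative form
$$a = \mathbf{x} \cdot \mathbf{y} = \| \mathbf{x} \| \, \| \mathbf{y} \| \cos(\theta)$$
where θ is the angle between x and y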
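A small sketch, assuming vectors are plain Python lists, that computes the dot product from the summation form and then recovers the angle θ from the alternative form. The helper names `dot`, `norm`, and `angle` are hypothetical.

```python
import math


def dot(x, y):
    """Sum of products of corresponding elements."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """Length of x, computed as the square root of x . x."""
    return math.sqrt(dot(x, x))


def angle(x, y):
    """Angle between x and y, from x . y = ||x|| ||y|| cos(theta)."""
    cos_theta = dot(x, y) / (norm(x) * norm(y))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


x = [1.0, 0.0]
y = [1.0, 1.0]

a = dot(x, y)          # 1.0
theta = angle(x, y)    # approximately pi / 4
print(a, theta)
```

Multiplying `norm(x) * norm(y) * math.cos(theta)` reproduces `a` up to round-off, which is a quick check that the two forms agree.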



Matrices

Definition: an m x n two-dimensional array of values (usually real numbers).
– m rows
– n columns
Matrix referenced by two-element subscript
– first element in subscript is row
– second element in subscript is column
– example: A24 or a24 is element in second row,
fourth column of A

$$\mathbf{A} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}$$

A vector can be regarded as special case of a matrix, where one of matrix dimensions = 1.
Matrix transpose (denoted T)
– swap columns and rows
row 1 becomes column 1, etc.
– m x n matrix becomes n x m matrix
– example:

$$\mathbf{A} = \begin{pmatrix} 2 & 7 & -1 & 0 & 3 \\ 4 & 6 & -3 & 1 & 8 \end{pmatrix}, \qquad \mathbf{A}^T = \begin{pmatrix} 2 & 4 \\ 7 & 6 \\ -1 & -3 \\ 0 & 1 \\ 3 & 8 \end{pmatrix}$$
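The transpose example can be reproduced with a short sketch that stores a matrix as a list of rows. The function name `transpose` is an illustrative choice; the values are the ones from the example.

```python
def transpose(A):
    """Swap rows and columns: row i of A becomes column i of the result."""
    return [list(column) for column in zip(*A)]


# The 2 x 5 matrix from the example.
A = [
    [2, 7, -1, 0, 3],
    [4, 6, -3, 1, 8],
]

AT = transpose(A)   # 5 x 2
for row in AT:
    print(row)
```

Transposing twice returns the original matrix, so `transpose(AT)` equals `A`.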
Matrix arithmetic

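Addition of two matrices
– matrices must be same size
– add corresponding elements: $c_{ij} = a_{ij} + b_{ij}$
– result is a matrix of same size

$$\mathbf{C} = \mathbf{A} + \mathbf{B} = \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{pmatrix}$$

Scalar multiplication of a matrix
– multiply each element by scalar: $b_{ij} = d \cdot a_{ij}$
– result is a matrix of same size

$$\mathbf{B} = d \cdot \mathbf{A} = \begin{pmatrix} d \cdot a_{11} & \cdots & d \cdot a_{1n} \\ \vdots & \ddots & \vdots \\ d \cdot a_{m1} & \cdots & d \cdot a_{mn} \end{pmatrix}$$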
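A minimal sketch of both element-wise operations, again with matrices as lists of rows. The names `mat_add` and `mat_scale` and the sample values are assumptions made for illustration.

```python
def mat_add(A, B):
    """Element-wise sum c_ij = a_ij + b_ij; A and B must be the same size."""
    same_rows = len(A) == len(B)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must be the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_scale(d, A):
    """Scalar multiple b_ij = d * a_ij."""
    return [[d * a for a in row] for row in A]


A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]

C = mat_add(A, B)     # [[11, 22], [33, 44]]
D = mat_scale(3, A)   # [[3, 6], [9, 12]]
print(C, D)
```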


Matrix arithmetic

Matrix-matrix multiplication
– vector-matrix multiplication just a special case

Multiplication is associative
Multiplication is not commutative
$$\mathbf{A} \cdot \mathbf{B} \neq \mathbf{B} \cdot \mathbf{A} \quad \text{(generally)}$$
Transposition rule:
$$( \mathbf{A} \cdot \mathbf{B} )^T = \mathbf{B}^T \cdot \mathbf{A}^T$$
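The properties listed above can be checked numerically on small matrices. The sketch below assumes the standard definition of the product, c_ij = Σ_k a_ik b_kj, which is not spelled out in the surviving text of the slide; the helper names and the sample matrices are illustrative.

```python
def transpose(A):
    """Swap rows and columns."""
    return [list(column) for column in zip(*A)]


def matmul(A, B):
    """Matrix product with c_ij = sum over k of a_ik * b_kj."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must match rows of B")
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 1], [0, 1]]

AB = matmul(A, B)   # [[2, 1], [4, 3]]
BA = matmul(B, A)   # [[3, 4], [1, 2]]

# Not commutative in general.
print(AB != BA)
# Associative: (A B) C equals A (B C).
print(matmul(AB, C) == matmul(A, matmul(B, C)))
# Transposition rule: (A B)^T equals B^T A^T.
print(transpose(AB) == matmul(transpose(B), transpose(A)))
# Vector-matrix multiplication as a special case: a 1 x 2 row vector times A.
row_times_A = matmul([[1, 1]], A)   # [[4, 6]]
print(row_times_A)
```

Integer entries are used so that the equality checks are exact; with floating-point data a tolerance-based comparison would be the safer choice.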


Matrix arithmetic

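RULE: In any chain of matrix multiplications, the column dimension of one matrix in the chain must match the row dimension of the following matrix in the chain.

Examples, with A of size 3 x 5, B of size 5 x 5, and C of size 3 x 1:
Valid:    $A \cdot B \cdot A^T$    $C^T \cdot A \cdot B$    $A^T \cdot A \cdot B$    $C \cdot C^T \cdot A$
Invalid:  $A \cdot B \cdot A$    $C \cdot A \cdot B$    $A \cdot A^T \cdot B$    $C^T \cdot C \cdot A$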
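The rule only depends on shapes, so it can be checked without any matrix entries. The sketch below represents each matrix by a (rows, columns) tuple; `chain_shape` and `T` are hypothetical helper names introduced here.

```python
def chain_shape(shapes):
    """Return the (rows, cols) of a product of matrices with these shapes.

    Raises ValueError as soon as the column dimension of one matrix
    does not match the row dimension of the next.
    """
    rows, cols = shapes[0]
    for next_rows, next_cols in shapes[1:]:
        if cols != next_rows:
            raise ValueError(f"cannot multiply: {cols} columns vs {next_rows} rows")
        cols = next_cols
    return rows, cols


def T(shape):
    """Shape of the transpose: an m x n matrix becomes n x m."""
    return (shape[1], shape[0])


# Shapes from the slide: A is 3 x 5, B is 5 x 5, C is 3 x 1.
shape_A, shape_B, shape_C = (3, 5), (5, 5), (3, 1)

print(chain_shape([shape_A, shape_B, T(shape_A)]))    # (3, 3)
print(chain_shape([T(shape_C), shape_A, shape_B]))    # (1, 5)

try:
    chain_shape([shape_A, shape_B, shape_A])
except ValueError as err:
    print("invalid chain:", err)
```

The result of a valid chain takes its row count from the first matrix and its column count from the last.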
Vector projection

Orthogonal projection of y onto x

– Can take place in any space of dimensionality ≥ 2
– Unit vector in direction of x is
$$\mathbf{x} / \| \mathbf{x} \|$$
– Length of projection of y in direction of x is
$$\| \mathbf{y} \| \cos(\theta)$$
– Orthogonal projection of y onto x is the vector
$$\mathrm{proj}_{\mathbf{x}}( \mathbf{y} ) = \mathbf{x} \cdot \| \mathbf{y} \| \cos(\theta) / \| \mathbf{x} \| = \left[ ( \mathbf{x} \cdot \mathbf{y} ) / \| \mathbf{x} \|^2 \right] \mathbf{x}$$
(using dot product alternate form)
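The final formula needs only dot products, which makes it easy to sketch in plain Python. The function names `dot` and `project` and the example vectors are assumptions for illustration.

```python
def dot(x, y):
    """Sum of products of corresponding elements."""
    return sum(xi * yi for xi, yi in zip(x, y))


def project(y, x):
    """Orthogonal projection of y onto x: [(x . y) / ||x||^2] x."""
    x_dot_x = dot(x, x)   # equals ||x||^2
    if x_dot_x == 0:
        raise ValueError("cannot project onto the zero vector")
    coefficient = dot(x, y) / x_dot_x
    return [coefficient * xi for xi in x]


x = [2.0, 0.0]
y = [3.0, 4.0]

p = project(y, x)                              # [3.0, 0.0]
residual = [yi - pi for yi, pi in zip(y, p)]   # [0.0, 4.0]

print(p, residual, dot(residual, x))
```

The residual `y - p` has zero dot product with `x`, which is what makes the projection orthogonal.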


Optimization theory topics

Maximum likelihood
Expectation maximization
Gradient descent

