
Machine Learning

Math Essentials

Jeff Howbert, Introduction to Machine Learning, Winter 2012


Areas of math essential to machine learning

Machine learning is part of both statistics and computer science.
– Probability
– Statistical inference
– Validation
– Estimates of error, confidence intervals

Linear algebra
– Hugely useful for compact representation of linear transformations on data
– Dimensionality reduction techniques

Optimization theory



Why worry about the math?

There are lots of easy-to-use machine learning packages out there.
After this course, you will know how to apply several of the most general-purpose algorithms.

HOWEVER

To get really useful results, you need good mathematical intuitions about certain general machine learning principles, as well as the inner workings of the individual algorithms.



Why worry about the math?

These intuitions will allow you to:
– Choose the right algorithm(s) for the problem
– Make good choices on parameter settings, validation strategies
– Recognize over- or underfitting
– Troubleshoot poor / ambiguous results
– Put appropriate bounds of confidence / uncertainty on results
– Do a better job of coding algorithms or incorporating them into more complex analysis pipelines
Notation

a ∈ A      set membership: a is a member of set A
|B|        cardinality: number of items in set B
|| v ||    norm: length of vector v
∑          summation
∫          integral
ℝ          the set of real numbers
ℝ^n        real number space of dimension n
             n = 2 : plane or 2-space
             n = 3 : 3-(dimensional) space
             n > 3 : n-space or hyperspace



Notation

x, y, z, u, v   vector (bold, lower case)
A, B, X         matrix (bold, upper case)
y = f( x )      function (map): assigns unique value in range of y to each value in domain of x
dy / dx         derivative of y with respect to single variable x
y = f( x )      function on multiple variables, i.e. a vector of variables; function in n-space
∂y / ∂x_i       partial derivative of y with respect to element i of vector x
Linear algebra applications

1) Operations on or between vectors and matrices
2) Coordinate transformations
3) Dimensionality reduction
4) Linear regression
5) Solution of linear systems of equations
6) Many others

Applications 1) – 4) are directly relevant to this course. Today we’ll start with 1).



Why vectors and matrices?

Most common form of data organization for machine learning is a 2D array, where
– rows represent samples (records, items, datapoints)
– columns represent attributes (features, variables)

Refund   Marital Status   Taxable Income   Cheat
Yes      Single           125K             No
No       Married          100K             No
No       Single           70K              No
Yes      Married          120K             No
No       Divorced         95K              Yes
No       Married          60K              No
Yes      Divorced         220K             No
No       Single           85K              Yes
No       Married          75K              No
No       Single           90K              Yes

Natural to think of each sample as a vector of attributes, and the whole array as a matrix. (See the sketch below.)
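A minimal sketch of this layout in Python with NumPy (the numeric encoding of the categorical columns below is an illustrative assumption, not part of the original slides):

```python
import numpy as np

# Each row is one sample (record), each column one attribute (feature).
# Categorical attributes are encoded numerically purely for illustration:
# Refund: Yes=1 / No=0; Marital Status: Single=0, Married=1, Divorced=2.
X = np.array([
    [1, 0, 125],   # Yes, Single,  125K
    [0, 1, 100],   # No,  Married, 100K
    [0, 0,  70],   # No,  Single,   70K
])

print(X.shape)   # (3, 3): 3 samples (rows) x 3 attributes (columns)
print(X[0])      # one sample as a vector of attributes
print(X[:, 2])   # one attribute (Taxable Income) across all samples
```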



Vectors

Definition: an n-tuple of values (usually real numbers).
– n referred to as the dimension of the vector
– n can be any positive integer, from 1 to infinity
Can be written in column form or row form
– Column form is conventional
– Vector elements referenced by subscript

  x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}        x^T = ( x_1 \; \cdots \; x_n )

  (superscript T means "transpose")
Vectors

Can think of a vector as:


– a point in space or
– a directed line segment with a magnitude and
direction



Vector arithmetic

Addition of two vectors
– add corresponding elements
    z = x + y = ( x_1 + y_1 \; \cdots \; x_n + y_n )^T
– result is a vector

Scalar multiplication of a vector
– multiply each element by scalar
    y = a x = ( a x_1 \; \cdots \; a x_n )^T
– result is a vector

(Both operations are sketched in code below.)
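A minimal NumPy sketch of both operations:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Vector addition: corresponding elements are added; result is a vector.
z = x + y        # array([5., 7., 9.])

# Scalar multiplication: every element is scaled; result is a vector.
a = 2.0
w = a * x        # array([2., 4., 6.])

print(z, w)
```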



Vector arithmetic

Dot product of two vectors
– multiply corresponding elements, then add products
    a = x ⋅ y = ∑_{i=1}^{n} x_i y_i
– result is a scalar

Dot product alternative form
    a = x ⋅ y = || x || ⋅ || y || ⋅ cos(θ)
(diagram: vectors x and y with angle θ between them)
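A short NumPy sketch checking that the two forms agree:

```python
import numpy as np

x = np.array([3.0, 0.0])
y = np.array([2.0, 2.0])

# Elementwise form: multiply corresponding elements, then add the products.
a1 = np.dot(x, y)                                    # 6.0

# Alternative form: ||x|| ||y|| cos(theta).
cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
a2 = np.linalg.norm(x) * np.linalg.norm(y) * cos_theta

print(np.isclose(a1, a2))                            # True
```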



Matrices

Definition: an m x n two-dimensional array of values (usually real numbers).
– m rows
– n columns

  A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}

Matrix referenced by two-element subscript
– first element in subscript is row
– second element in subscript is column
– example: A_{24} or a_{24} is element in second row, fourth column of A
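A tiny NumPy sketch of shape and subscript indexing (note that NumPy indices are 0-based, so the math notation a_{24} corresponds to A[1, 3]):

```python
import numpy as np

A = np.array([[2, 7, -1, 0, 3],
              [4, 6, -3, 1, 8]])   # m = 2 rows, n = 5 columns

print(A.shape)   # (2, 5)
print(A[1, 3])   # second row, fourth column (math notation a_24): 1
```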
Matrices

A vector can be regarded as a special case of a matrix, where one of the matrix dimensions = 1.
Matrix transpose (denoted ^T)
– swap columns and rows: row 1 becomes column 1, etc.
– m x n matrix becomes n x m matrix
– example:

  A = \begin{pmatrix} 2 & 7 & -1 & 0 & 3 \\ 4 & 6 & -3 & 1 & 8 \end{pmatrix}        A^T = \begin{pmatrix} 2 & 4 \\ 7 & 6 \\ -1 & -3 \\ 0 & 1 \\ 3 & 8 \end{pmatrix}
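The same example as a NumPy sketch:

```python
import numpy as np

A = np.array([[2, 7, -1, 0, 3],
              [4, 6, -3, 1, 8]])

# Transpose: rows become columns; the 2 x 5 matrix becomes 5 x 2.
print(A.T)
print(A.shape, A.T.shape)   # (2, 5) (5, 2)
```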
Matrix arithmetic

Addition of two matrices
– matrices must be same size
– add corresponding elements: c_{ij} = a_{ij} + b_{ij}
– result is a matrix of same size

  C = A + B = \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{pmatrix}

Scalar multiplication of a matrix
– multiply each element by scalar: b_{ij} = d \cdot a_{ij}
– result is a matrix of same size

  B = d \cdot A = \begin{pmatrix} d \cdot a_{11} & \cdots & d \cdot a_{1n} \\ \vdots & \ddots & \vdots \\ d \cdot a_{m1} & \cdots & d \cdot a_{mn} \end{pmatrix}
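Both operations as a NumPy sketch; NumPy applies them elementwise:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Matrix addition: same-size matrices, corresponding elements added.
C = A + B        # [[ 6.,  8.], [10., 12.]]

# Scalar multiplication: each element multiplied by the scalar.
d = 3.0
D = d * A        # [[ 3.,  6.], [ 9., 12.]]

print(C, D, sep="\n")
```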



Matrix arithmetic

Matrix-matrix multiplication
– vector-matrix multiplication just a special case

TO THE BOARD!!

Multiplication is associative
  A ⋅ ( B ⋅ C ) = ( A ⋅ B ) ⋅ C
Multiplication is not commutative
  A ⋅ B ≠ B ⋅ A (generally)
Transposition rule:
  ( A ⋅ B )^T = B^T ⋅ A^T
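A quick numerical check of all three properties (a sketch using random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 4))
C = rng.standard_normal((4, 2))
D = rng.standard_normal((4, 4))

# Associative: A(BC) = (AB)C.
print(np.allclose(A @ (B @ C), (A @ B) @ C))   # True

# Not commutative (generally): BD != DB even for square matrices.
print(np.allclose(B @ D, D @ B))               # False

# Transposition rule: (AB)^T = B^T A^T.
print(np.allclose((A @ B).T, B.T @ A.T))       # True
```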



Matrix arithmetic

RULE: In any chain of matrix multiplications, the


column dimension of one matrix in the chain must
match the row dimension of the following matrix
in the chain.
Examples, with A 3 x 5, B 5 x 5, C 3 x 1:
Right:
  A ⋅ B ⋅ A^T    C^T ⋅ A ⋅ B    A^T ⋅ A ⋅ B    C ⋅ C^T ⋅ A
Wrong:
  A ⋅ B ⋅ A    C ⋅ A ⋅ B    A ⋅ A^T ⋅ B    C^T ⋅ C ⋅ A
(See the sketch below.)
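A NumPy sketch of the rule; a "wrong" chain raises a shape error:

```python
import numpy as np

A = np.ones((3, 5))
B = np.ones((5, 5))
C = np.ones((3, 1))

print((A @ B @ A.T).shape)   # (3, 3): 3x5 . 5x5 . 5x3
print((C.T @ A @ B).shape)   # (1, 5): 1x3 . 3x5 . 5x5

try:
    A @ B @ A                # (3x5 . 5x5) gives 3x5; 3x5 . 3x5 mismatches
except ValueError as e:
    print("shape mismatch:", e)
```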
Vector projection

Orthogonal projection of y onto x
– Can take place in any space of dimensionality ≥ 2
– Unit vector in direction of x is x / || x ||
– Length of projection of y in direction of x is || y || ⋅ cos(θ)
– Orthogonal projection of y onto x is the vector
    proj_x( y ) = x ⋅ || y || ⋅ cos(θ) / || x || = [ ( x ⋅ y ) / || x ||^2 ] x
  (using dot product alternative form)
(diagram: y and x with angle θ between them; proj_x( y ) lies along x)
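A minimal NumPy sketch of the projection formula:

```python
import numpy as np

x = np.array([3.0, 0.0])
y = np.array([2.0, 2.0])

# proj_x(y) = [ (x . y) / ||x||^2 ] x
proj = (np.dot(x, y) / np.dot(x, x)) * x
print(proj)                                    # [2. 0.]

# The residual y - proj is orthogonal to x, as a projection should be.
print(np.isclose(np.dot(y - proj, x), 0.0))    # True
```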



Optimization theory topics

Maximum likelihood
Expectation maximization
Gradient descent

