Machine Learning Srihari
Scalar
• Single number
– In contrast to other objects in linear algebra,
which are usually arrays of numbers
• Represented in lower-case italic x
– They can be real-valued or integer-valued
• E.g., let x ∈ ℝ be the slope of the line
– Defining a real-valued scalar
• E.g., let n ∈ ℕ be the number of units
– Defining a natural-number scalar
Vector
• An array of numbers arranged in order
• Each number is identified by an index
• Written in lower-case bold such as x
– its elements are in italics lower case, subscripted
x = [x1, x2, …, xn]^T
• If each element is in ℝ then x is in ℝ^n
• We can think of vectors as points in space
– Each element gives coordinate along an axis
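A minimal NumPy sketch of a vector as an ordered, indexable array (the values are arbitrary):

```python
import numpy as np

# A vector: a 1-D array of numbers arranged in order
x = np.array([1.0, 2.0, 3.0])

print(x.shape)  # (3,): x is in R^3
print(x[0])     # 1.0: elements are identified by index (0-based in NumPy)
```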
Matrices
• 2-D array of numbers
– So each element is identified by two indices
• Denoted by bold typeface A
– Elements indicated by name in italic but not bold
• A1,1 is the top-left entry and Am,n is the bottom-right entry
• We can identify the numbers in vertical column j by writing A:,j, using : for the horizontal coordinate
• E.g., a 2×2 matrix:
A = ⎡ A1,1 A1,2 ⎤
    ⎣ A2,1 A2,2 ⎦
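A sketch of this indexing in NumPy (note that NumPy indices start at 0, so A1,1 is A[0, 0]):

```python
import numpy as np

# A 2x2 matrix: each element is identified by two indices
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(A[0, 0])  # A_{1,1}, the top-left entry
print(A[:, 1])  # column j = 2, using ':' for the horizontal coordinate
print(A[0, :])  # row i = 1
```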
Tensor
• Sometimes need an array with more than two
axes
– E.g., an RGB color image has three axes
• A tensor is an array of numbers arranged on a regular grid with a variable number of axes
– See figure next
• Denote a tensor with this bold typeface: A
• Element (i,j,k) of tensor denoted by Ai,j,k
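A NumPy sketch, using a hypothetical 480×640 RGB image as a tensor with three axes:

```python
import numpy as np

# An RGB color image is a tensor with three axes: height x width x channel
image = np.zeros((480, 640, 3))  # a hypothetical 480x640 image

print(image.ndim)        # 3 axes
print(image[10, 20, 0])  # element (i, j, k): channel 0 of pixel (10, 20)
```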
Shapes of Tensors
[Figure showing the shapes of tensors with increasing numbers of axes]
Transpose of a Matrix
• An important operation on matrices
• The transpose of a matrix A is denoted A^T
• Defined as
(A^T)i,j = Aj,i
– The mirror image across a diagonal line
• Called the main diagonal, running down and to the right starting from the upper-left corner
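A small NumPy check of the definition (A^T)i,j = Aj,i:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

# (A^T)_{i,j} = A_{j,i}: mirror image across the main diagonal
print(A.T)                     # shape (3, 2)
print(A.T[2, 0] == A[0, 2])    # True
```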
Matrix Addition
• We can add matrices to each other if they have
the same shape, by adding corresponding
elements
– If A and B have same shape (height m, width n)
C = A + B ⇒ Ci,j = Ai,j + Bi,j
Multiplying Matrices
• For the product C = AB to be defined, A must have the same number of columns as B has rows
– If A is of shape m×n and B is of shape n×p, then the matrix product C is of shape m×p
Multiplying Vectors
• The dot product between two vectors x and y of the same dimensionality is the matrix product x^T y
• We can think of the matrix product C = AB as computing Ci,j as the dot product of row i of A and column j of B
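A NumPy sketch covering matrix addition, the matrix product, and the dot product (the values are arbitrary):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

C = A + B   # addition: same shape, element-wise
P = A @ B   # matrix product: A's columns must match B's rows

# C_{i,j} of the product is the dot product of row i of A and column j of B
print(P[0, 1] == A[0, :] @ B[:, 1])   # True

x = np.array([1., 2.])
y = np.array([3., 4.])
print(x @ y)   # dot product x^T y = 11.0
```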
[Figure: vector x is converted into vector y by multiplying x by a matrix W]
Linear Transformation
• Ax = b
– where A ∈ ℝ^(n×n) and b ∈ ℝ^n
– More explicitly:
A1,1 x1 + A1,2 x2 + … + A1,n xn = b1
A2,1 x1 + A2,2 x2 + … + A2,n xn = b2      (n equations in n unknowns)
⋮
An,1 x1 + An,2 x2 + … + An,n xn = bn
A = ⎡ A1,1 … A1,n ⎤   x = ⎡ x1 ⎤   b = ⎡ b1 ⎤
    ⎢  ⋮   ⋱   ⋮  ⎥       ⎢ ⋮  ⎥       ⎢ ⋮  ⎥
    ⎣ An,1 … An,n ⎦       ⎣ xn ⎦       ⎣ bn ⎦
     (n×n)               (n×1)        (n×1)
– We can view A as a linear transformation of vector x to vector b
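A small NumPy example of viewing A as a transformation of x to b (the matrix is arbitrary):

```python
import numpy as np

# Ax = b: multiplying by A transforms the vector x into the vector b
A = np.array([[2., 1.],
              [1., 3.]])
x = np.array([1., 1.])

b = A @ x
print(b)   # [3. 4.]
```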
Matrix Inverse
• The inverse of a square matrix A is defined as
A^(-1) A = In
• We can now solve Ax = b as follows:
Ax = b
A^(-1) A x = A^(-1) b
In x = A^(-1) b
x = A^(-1) b
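The same system solved in NumPy, first with an explicit inverse and then with np.linalg.solve, which is preferred in practice because it avoids forming A^(-1):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 4.])

x = np.linalg.inv(A) @ b        # x = A^(-1) b, with an explicit inverse
print(x)                        # [1. 1.]

print(np.linalg.solve(A, b))    # same answer without forming A^(-1)
```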
Example: System of Linear Equations
y(x, w) = ∑_{i=1}^{m} wi xi
Closed-form solutions
• Two closed-form solutions
1. Matrix inversion: x = A^(-1) b
2. Gaussian elimination
2. Gaussian Elimination
followed by back-substitution
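A minimal sketch of Gaussian elimination with partial pivoting followed by back-substitution; production solvers (e.g., LAPACK) are far more careful about stability:

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b by forward elimination with partial pivoting,
    then back-substitution. A minimal sketch, not production code."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: reduce A to upper-triangular form
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # partial pivoting
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back-substitution: solve the triangular system from the bottom up
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2., 1.], [1., 3.]])
b = np.array([3., 4.])
print(gaussian_elimination(A, b))   # [1. 1.]
```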
Norms
• Used for measuring the size of a vector
• Norms map vectors to non-negative values
• Norm of vector x = [x1, …, xn]^T is the distance from the origin to x
– It is any function f that satisfies:
f(x) = 0 ⇒ x = 0
f(x + y) ≤ f(x) + f(y)   (Triangle Inequality)
∀α ∈ ℝ, f(αx) = |α| f(x)
Lp Norm
• Definition:
||x||p = ( ∑i |xi|^p )^(1/p)
– L2 Norm
• Called the Euclidean norm
– Simply the Euclidean distance between the origin and the point x
– Written simply as ||x||
– Squared Euclidean norm is the same as x^T x
– E.g., for x = (2, 2): ||x|| = √(2² + 2²) = √8 = 2√2
– L1 Norm: ||x||1 = ∑i |xi|
• Useful when 0 and non-zero have to be distinguished
– Note that the squared L2 norm increases slowly near the origin, e.g., 0.1² = 0.01
– L∞ Norm (max norm): ||x||∞ = maxi |xi|
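The three norms computed with NumPy for the x = (2, 2) example above:

```python
import numpy as np

x = np.array([2., 2.])

print(np.linalg.norm(x, 1))       # L1 norm: |2| + |2| = 4
print(np.linalg.norm(x))          # L2 norm: sqrt(2^2 + 2^2) = 2*sqrt(2)
print(np.linalg.norm(x, np.inf))  # L-infinity norm: max |x_i| = 2
print(x @ x)                      # squared L2 norm as x^T x = 8.0
```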
• Loss Function (uses the squared L2 norm as a regularizer):
E(w) = ½ ∑_{n=1}^{N} { y(xn, w) − tn }² + (λ/2) ||w||²
• Frobenius Norm
||A||F = ( ∑i,j Ai,j² )^(1/2)
– E.g., A = ⎡ 2 −1 5 ⎤
            ⎢ 0  2 1 ⎥   ||A||F = √(4 + 1 + 25 + … + 1) = √46
            ⎣ 3  1 1 ⎦
• Frobenius norm in ML
– Layers of a neural network involve matrix multiplication: net = I V, where the input I has shape 1×(I+1) and the weight matrix V has shape (I+1)×J; then hj = f(netj) with f(x) = 1/(1 + e^(−x))
– Regularization: minimize the Frobenius norm of the weight matrices ||W(i)|| over the L layers
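A quick check of the √46 example with NumPy:

```python
import numpy as np

A = np.array([[2., -1., 5.],
              [0.,  2., 1.],
              [3.,  1., 1.]])

print(np.linalg.norm(A, 'fro'))   # sqrt(46) ~ 6.78
print(np.sqrt(np.sum(A ** 2)))    # same value, straight from the definition
```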
N(x | µ, Σ) = 1 / ( (2π)^(D/2) |Σ|^(1/2) ) exp{ −½ (x − µ)^T Σ^(-1) (x − µ) }
If Cov(X,Y)=0 then E(XY)=E(X)E(Y)
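A sketch evaluating the Gaussian density directly from the formula above, using the matrix inverse and determinant; for real use a library routine such as scipy.stats.multivariate_normal is preferable:

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Evaluate N(x | mu, Sigma) directly from the formula above."""
    D = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

print(gaussian_density(np.zeros(2), np.zeros(2), np.eye(2)))  # 1/(2*pi) ~ 0.159
```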
• Orthogonal Vectors
– A vector x and a vector y are orthogonal to each other if x^T y = 0
• If the vectors have nonzero norm, they are at 90 degrees to each other
– Orthonormal Vectors
• Vectors that are orthogonal and have unit norm
• Orthogonal Matrix
– A square matrix whose rows are mutually orthonormal: A^T A = A A^T = I
– A^(-1) = A^T
– Orthogonal matrices are of interest because their inverse is very cheap to compute
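A NumPy check using a rotation matrix, a standard example of an orthogonal matrix:

```python
import numpy as np

theta = np.pi / 4
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a 2-D rotation matrix

print(np.allclose(A.T @ A, np.eye(2)))      # True: A^T A = I
print(np.allclose(np.linalg.inv(A), A.T))   # True: the inverse is just the transpose
```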
Matrix decomposition
• Matrices can be decomposed into factors to
learn universal properties, just like integers:
– Properties not discernible from their representation
1. Decomposition of integer into prime factors
• From 12 = 2×2×3 we can discern that
– 12 is not divisible by 5, and
– any multiple of 12 is divisible by 3
– But representations of 12 in binary or decimal are different
2. Decomposition of matrix A as A = V diag(λ) V^(-1)
• where V is formed of eigenvectors and λ are the eigenvalues, e.g.,
A = [2 1; 1 2] has eigenvalues λ = 1 and λ = 3, and eigenvectors v(λ=1) = [1, −1]^T and v(λ=3) = [1, 1]^T
Eigenvector
• An eigenvector of a square matrix
A is a non-zero vector v such that
multiplication by A only changes
the scale of v
Av = λv
– The scalar λ is known as the eigenvalue
• If v is an eigenvector of A, so is any rescaled vector sv; moreover, sv still has the same eigenvalue
[Figure: eigenvector illustration, from Wikipedia]
• Consider Av = w, with
A = ⎡ A1,1 … A1,n ⎤   v = ⎡ v1 ⎤   w = ⎡ w1 ⎤
    ⎢  ⋮   ⋱   ⋮  ⎥       ⎢ ⋮  ⎥       ⎢ ⋮  ⎥
    ⎣ An,1 … An,n ⎦       ⎣ vn ⎦       ⎣ wn ⎦
Example of Eigenvalue/Eigenvector
• Consider the matrix
A = ⎡ 2 1 ⎤
    ⎣ 1 2 ⎦
• Taking the determinant of (A − λI), the characteristic polynomial is
| A − λI | = (2 − λ)(2 − λ) − 1 = 3 − 4λ + λ²
• Setting it to zero gives the eigenvalues λ = 1 and λ = 3
Eigendecomposition
• Suppose that matrix A has n linearly
independent eigenvectors {v(1),..,v(n)} with
eigenvalues {λ1,..,λn}
• Concatenate eigenvectors to form matrix V
• Concatenate eigenvalues to form vector
λ=[λ1,..,λn]
• The eigendecomposition of A is given by
A = V diag(λ) V^(-1)
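A sketch using NumPy on the matrix from the earlier example:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

lam, V = np.linalg.eig(A)   # eigenvalues, and eigenvectors as columns of V
print(lam)                  # [3. 1.] (order may vary)

# Reconstruct A = V diag(lambda) V^(-1)
print(np.allclose(V @ np.diag(lam) @ np.linalg.inv(V), A))   # True
```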
SVD Definition
• Write A as a product of three matrices: A = U D V^T
– If A is m×n, then U is m×m, D is m×n, and V is n×n
• Each of these matrices has a special structure
• U and V are orthogonal matrices
• D is a diagonal matrix, not necessarily square
– Elements of the diagonal of D are called the singular values of A
– Columns of U are called the left-singular vectors
– Columns of V are called the right-singular vectors
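A NumPy sketch; note that np.linalg.svd returns V^T directly, and the non-square diagonal D must be rebuilt from the vector of singular values:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # m x n with m=2, n=3

U, s, Vt = np.linalg.svd(A)         # NumPy returns V^T and the singular values s
D = np.zeros_like(A)                # D is m x n, diagonal but not square
D[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ D @ Vt, A))   # True: A = U D V^T
```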
Use of SVD in ML
1. SVD is used in generalizing matrix inversion
– Moore-Penrose inverse (discussed next)
2. Used in Recommendation systems
– Collaborative filtering (CF)
• Method to predict a rating for a user-item pair based on the
history of ratings given by the user and given to the item
• Most CF algorithms are based on user-item rating matrix
where each row represents a user, each column an item
– Entries of this matrix are ratings given by users to items
• SVD reduces the number of features of a data set by reducing the space dimensionality from N to K, where K < N
Moore-Penrose Pseudoinverse
• Most useful feature of SVD is that it can be
used to generalize matrix inversion to non-
square matrices
• Practical algorithms for computing the
pseudoinverse of A are based on SVD
A^+ = V D^+ U^T
– where U, D, V are the SVD of A
• The pseudoinverse D^+ of D is obtained by taking the reciprocal of its nonzero elements, then taking the transpose of the resulting matrix
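A sketch constructing A^+ from the SVD exactly as described, checked against np.linalg.pinv; the example matrix is arbitrary and full rank, so all singular values are nonzero:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])   # non-square, so it has no ordinary inverse

U, s, Vt = np.linalg.svd(A)
D_plus = np.zeros((A.shape[1], A.shape[0]))   # transpose shape of D
D_plus[:len(s), :len(s)] = np.diag(1.0 / s)   # reciprocal of the nonzero elements
A_plus = Vt.T @ D_plus @ U.T                  # A^+ = V D^+ U^T

print(np.allclose(A_plus, np.linalg.pinv(A)))   # True: matches NumPy's pinv
```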
Trace of a Matrix
• Trace operator gives the sum of the elements
along the diagonal
Tr(A) = ∑i Ai,i
• The Frobenius norm can be written with the trace operator:
||A||F = ( Tr(A A^T) )^(1/2)
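A quick NumPy check of both identities on the earlier 3×3 example:

```python
import numpy as np

A = np.array([[2., -1., 5.],
              [0.,  2., 1.],
              [3.,  1., 1.]])

print(np.trace(A))                  # 2 + 2 + 1 = 5
print(np.sqrt(np.trace(A @ A.T)))   # sqrt(46): the Frobenius norm via the trace
```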
Determinant of a Matrix
• Determinant of a square matrix det(A) is a
mapping to a scalar
• It is equal to the product of all eigenvalues of
the matrix
• Measures how much multiplication by the
matrix expands or contracts space
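A NumPy check on the earlier 2×2 example, whose eigenvalues are 1 and 3:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

vals, _ = np.linalg.eig(A)
print(np.prod(vals))     # ~3.0: the product of the eigenvalues 1 and 3
print(np.linalg.det(A))  # 3.0: the determinant agrees
```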
Example: PCA
• A simple ML algorithm is Principal Components
Analysis
• It can be derived using only knowledge of basic
linear algebra
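A minimal PCA sketch under the usual formulation: eigendecomposition of the sample covariance matrix, keeping the top-K eigenvectors (synthetic data; K = 2 is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # synthetic data: 200 points in R^3
X = X - X.mean(axis=0)                 # center the data

C = (X.T @ X) / (len(X) - 1)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # eigh: for symmetric matrices

order = np.argsort(eigvals)[::-1]      # sort by decreasing eigenvalue
W = eigvecs[:, order[:2]]              # top-2 principal directions

Z = X @ W                              # project onto the principal subspace
print(Z.shape)                         # (200, 2)
```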