Teacher:
Correct output for trial 1 is 1.
Correct output for trial 2 is 1.
Correct output for trial 3 is 0.
...
Correct output for trial n is 0.
Perceptron with 2 inputs:
[Diagram: inputs feeding a threshold unit]
Points in x space
Let $x_A$ and $x_B$ be two points on the hyperplane.
Then $y(x_A) = 0 = y(x_B)$.
Therefore:
$$w^T (x_A - x_B) = 0$$
Bishop
$x_A - x_B$ is parallel to the decision boundary.
(Interpretation of vector subtraction.)
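A minimal numeric check of this property (the weights w and bias w0 here are made-up values, not from the slides):

```python
import numpy as np

# Hypothetical hyperplane y(x) = w.x + w0 = 0 (made-up numbers).
w = np.array([2.0, -1.0])
w0 = 3.0

# Two points chosen so that y(x) = 0 exactly.
x_A = np.array([0.0, 3.0])     # 2*0 - 1*3 + 3 = 0
x_B = np.array([-1.5, 0.0])    # 2*(-1.5) - 1*0 + 3 = 0

# Their difference lies along the boundary, hence is orthogonal to w.
print(w @ (x_A - x_B))         # 0.0
```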
Distance to hyperplane:
$$l = \|x\| \cos\theta = \|x\| \, \frac{w^T x}{\|x\|\,\|w\|} = \frac{w_0}{\|w\|}$$
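Continuing the same made-up example, the distance from the origin to the hyperplane follows directly from the formula:

```python
import numpy as np

w = np.array([2.0, -1.0])
w0 = 3.0

# l = w0 / ||w||, taking the magnitude to ignore the sign convention.
l = abs(w0) / np.linalg.norm(w)
print(l)                       # 3 / sqrt(5) ≈ 1.342
```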
SVM Kernel
Winston
Digit Segment Perceptron
Winston
Perceptron learning does not generalize to multilayer networks.
Winston
An early model of an error signal: 1960s
Descriptive complexity is low.
Hinton
Why the simple system does not work
• A two-layer network with a single winner in the top layer is equivalent to
having a rigid template for each shape.
– The winner is the template that has the biggest overlap with the ink.
• The ways in which shapes vary are much too complicated to be captured by
simple template matches of whole shapes.
– To capture all the allowable variations of a shape we need to learn the
features that it is composed of.
Hinton
Examples of handwritten digits from a test set
General layered feedforward network
[Diagram: input vector feeding into hidden layers]
Hinton
Matrix representations of networks
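As a sketch of the idea (layer sizes and the logistic activation are assumptions, not taken from the slides): each layer of a feedforward network is a weight matrix, and the forward pass is a chain of matrix-vector products.

```python
import numpy as np

def logsig(h):
    return 1.0 / (1.0 + np.exp(-h))

# Assumed sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(4, 3))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(2, 4))   # hidden -> output weights

x = np.array([0.5, -1.0, 2.0])            # one input vector

# Each layer: matrix-vector product, then the nonlinearity.
hidden = logsig(W1 @ x)
output = logsig(W2 @ hidden)
print(output)
```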
Need for supervised training
[Plots: the sigmoid and its derivative]
Simple transformations on functions
Odd and Even Functions
Odd: $f(x) = -f(-x)$; flipping about the y-axis and then about the x-axis gives back the same graph.
Data is presented as a scatter plot.
$$\mathrm{logsig}'(x) = \mathrm{logsig}(x)\,(1 - \mathrm{logsig}(x))$$
$$\tanh'(x) = 1 - \tanh^2(x)$$
If you know the value of the function at a particular point, you can quickly compute its derivative there.
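A small sketch verifying both identities against a finite-difference estimate (the test point x = 0.3 is arbitrary):

```python
import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

x = 0.3                        # arbitrary test point
s, t = logsig(x), np.tanh(x)

ds = s * (1.0 - s)             # logsig'(x) from the value s alone
dt = 1.0 - t**2                # tanh'(x) from the value t alone

# Finite-difference check of both derivatives.
eps = 1e-6
print(ds, (logsig(x + eps) - logsig(x - eps)) / (2 * eps))
print(dt, (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps))
```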
Error Signal (Cost function)
For regression: sum of squares.
For classification: cross entropy,
$$E = -\sum_{t=1}^{S} \sum_{k=1}^{c} d_k(t)\,\ln o_k(t)$$
where $S$ is the number of cases and $c$ the number of classes.
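A minimal sketch of the cross-entropy sum, with a made-up batch of S = 2 cases and c = 3 classes (d are the desired one-hot targets, o the network outputs):

```python
import numpy as np

d = np.array([[1.0, 0.0, 0.0],   # desired one-hot targets
              [0.0, 1.0, 0.0]])
o = np.array([[0.7, 0.2, 0.1],   # network output probabilities
              [0.3, 0.6, 0.1]])

# E = -sum_t sum_k d_k(t) ln o_k(t)
E = -np.sum(d * np.log(o))
print(E)                         # -(ln 0.7 + ln 0.6) ≈ 0.867
```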
Error Measure for Linear Unit
[Figure: f(x) plotted against x, showing an increment Δx from x to x + Δx]
Principle of gradient descent
Positive gradient.
S denotes the number of cases in a batch.
Number of cases in a batch = 4.
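A minimal sketch of batch gradient descent on a linear unit with sum-of-squares error, using the AND function as the batch of S = 4 cases (the learning rate and epoch count are arbitrary choices):

```python
import numpy as np

# Batch of S = 4 cases: the AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)
b = 0.0
eta = 0.1                      # learning rate (arbitrary)

for epoch in range(200):
    o = X @ w + b              # linear unit outputs for the whole batch
    err = o - d
    # Step against the gradient of the sum-of-squares error.
    w -= eta * (X.T @ err)
    b -= eta * err.sum()

print(w, b)                    # approaches the least-squares solution
```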
Composite error surface for linear unit learning AND
Formula for the sigmoidal unit, $g(x)$.
Chain rule
Derivative of sigmoid.
Error surface with a sigmoidal unit
[Surface plot showing a plateau]
Backpropagation of errors
New form:
$$\Delta w_{ji} = \eta\, o_i\, \delta_j(h_j)$$
Arbitrary location in the network.
For the output layer:
$$\delta_n = (d_n - o_n)\, g'(h_n)$$
3. Small values are chosen to avoid the saturated regions of the sigmoid, where the slope is near zero.
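Putting the two delta rules together, a minimal backprop sketch for a two-layer sigmoid network on a single training case (the sizes, learning rate, and 0.1 initialization scale are assumptions; note the small initial weights per point 3 above):

```python
import numpy as np

def g(h):
    return 1.0 / (1.0 + np.exp(-h))   # logistic sigmoid

rng = np.random.default_rng(0)
# Small initial values keep the sigmoids out of their saturated regions.
W1 = rng.normal(scale=0.1, size=(2, 2))
W2 = rng.normal(scale=0.1, size=(1, 2))
eta = 0.5

x = np.array([1.0, 0.0])              # one training case
d = np.array([1.0])                   # desired output

for step in range(1000):
    # Forward pass.
    h1 = W1 @ x;  o1 = g(h1)
    h2 = W2 @ o1; o2 = g(h2)
    # Output layer: delta_n = (d_n - o_n) g'(h_n), with g' = g(1 - g).
    delta2 = (d - o2) * o2 * (1 - o2)
    # Hidden layer: backpropagate the deltas through W2.
    delta1 = (W2.T @ delta2) * o1 * (1 - o1)
    # Weight update: Delta w_ji = eta * o_i * delta_j.
    W2 += eta * np.outer(delta2, o1)
    W1 += eta * np.outer(delta1, x)

print(o2)                             # approaches the target 1.0
```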
Time complexity of backprop
Ashish Gupta
Sparse Autoencoder
The number of hidden units is greater than the number of input units.
The cost function is chosen to make most of the hidden units inactive:
minimize the SS error, minimize the weight magnitude, and minimize the number of active units.
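A sketch of such a cost function (the penalty weights lam and beta and the sparsity target rho are illustrative hyperparameters; the KL-divergence form of the sparsity penalty is one common choice, not necessarily the one in the slides):

```python
import numpy as np

def logsig(h):
    return 1.0 / (1.0 + np.exp(-h))

def cost(W1, W2, X, lam=1e-3, beta=3.0, rho=0.05):
    A = logsig(X @ W1)           # hidden activations
    Xhat = A @ W2                # linear reconstruction of the input
    ss_error = 0.5 * np.sum((Xhat - X) ** 2)          # SS error term
    wt_penalty = 0.5 * lam * (np.sum(W1**2) + np.sum(W2**2))
    # Sparsity: push each unit's mean activation toward a small rho.
    rho_hat = A.mean(axis=0)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return ss_error + wt_penalty + beta * kl

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))             # 10 cases, 4 inputs
W1 = rng.normal(scale=0.1, size=(4, 8))  # 8 hidden units > 4 inputs
W2 = rng.normal(scale=0.1, size=(8, 4))
print(cost(W1, W2, X))
```

The resulting objective is what an optimizer such as the limited-memory BFGS of the next slide would minimize.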
Limited-memory BFGS
[Diagram: input vector feeding into hidden layers]
Hinton
Training Procedure: Make sure lab results apply to the application domain.
Otherwise:
[Figure: feature spaces of dimension D=2 (axes x1, x2) and D=3 (axes x1, x2, x3)]
A feature preprocessing example
80 hidden units
Hinton
Synaptic noise sources:
1. Probability of vesicular release.
2. Magnitude of response in case of vesicular release.
Possible biological source of a quasi-global difference-of-reward signal
Definition:
1. Model class-conditional densities $p(x \mid C_k)$.
Motivation:
Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer
in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is
taht frist and lsat ltteer is at the rghit pclae. The rset can be a
toatl mses and you can sitll raed it wouthit a porbelm. Tihs is
bcuseae we do not raed ervey lteter by it slef, but the wrod as a wlohe.
Top-down or Context Effects
Kanizsa figures