Beruflich Dokumente
Kultur Dokumente
Neural Networks
What are Neural Networks?
• An extremely simplified version of
the brain
• Essentially a function approximator
Transform inputs into outputs to the
best of its ability
inputs outputs
inputs NN outputs
What are Neural Networks?
What are Neural Networks?
• For the most part, neural networks are useful in
situations where multiple regression is used.
• In a neural network
• the independent variables are called input cells
• the dependent variable is called an output cell
(depending on the network model there might be
more than one output).
• As in regression there is a certain number of
observations (say N).
• Each observation contains a value for each input cell
(or independent variable) and the output cell (or
dependent variable).
In an illustration of neural network, each circle
is a cell of the network. There are three
different types of cells:
1 8
5
2
6
3
7
4
The figure above has 4 input variables and one hidden layer. With the
exception of cell 0, each cell in the network is connected by an arc to
each cell in the next network layer. There is an arc connecting cell 0
with each cell in a Hidden or Output Layer.
INPUT and OUTPUT VALUES
For any observation, each cell in the network
has an associated input value and output value.
INP(8)=w08(1)+w58OUT(5)+w68OUT(6)+ 78OUT(7)
OUT(8) = f(INP(8))
(OUT (8) O )
j 1
j j
2
SCALING DATA
input
e einput
f (input ) input input
e e
Note that for this function an input near -
yields an output near 0 and an input near +
yields an output near 1.
REMARKS
1. The sigmoid function is often called the
squashing function because it "squashes" values
on the interval [-, +] to the unit interval [0 1].
2. The slope of the sigmoid function for the [0
1] interval is given by
f'(Input) = f(Input)(1-f(Input))
And
and
Define;
R2(Train) = SSR(Train)/SST(Train)
If the network is to be useful for forecasting,
the R2 computed from the Test portion of the
data should be close to R2(Train).
CONTINUOUS AND BINARY DATA
If your dependent variable assumes only
two values (say 0 and 1) we say we have
binary data.
In this case the usual procedure is to try and
train the network until as many outputs as
possible are below 0.1 and above 0.9.
Then those observations with predictions
less than .1 are classified as 0 and those
observations with predictions larger than .9
are classified as 1.
If we do not have binary data we say that
the data is continuous.
Examples of the Use of
Neural Networks
Example 1: The efficient market hypothesis of
financial markets states that the "past history"
of a stock's returns yields no information about
the future return of the stock. White (1988)
examines returns on IBM to see if the market
is efficient. He begins by running a multiple
regression where the dependent variable is the
next day's return on IBM stock and the five
independent variables are the return on IBM
during each of the last five days.
This regression yielded R2 = .0079, which is
consistent with the efficient market hypothesis.
White then ran a neural network (containing one
hidden layer) with the output cell corresponding to
the next day's return on IBM and 5 input cells
corresponding to the last five days' return on IBM.
This neural network yielded R2 = .179. This implies
that the past five days of IBM returns do contain
information that can be used to make predictions
about tomorrow's return on IBM!
According to the October 9, 1993 Economist,
Fidelity manages 2.6 billion dollars in assets using
neural nets. One of the neural net funds has beat
the S&P 500 by 2-7% a quarter for more than
three years.
Example 2: Researchers at Carnegie-Mellon
University have developed ALVINN 1
(Automated Land Vehicle in a Neural Network),
a neural network that can drive a car! It can tell
if cars are nearby and then slow down the car.
Within ten years a neural network may be
driving your car!
Example 3: In finance and accounting it is
important to accurately predict whether or not
a company will go bankrupt during the next year.
Altman(1968) developed a method (Altman's Z-
statistic) to predict whether or not a firm will
go bankrupt during the next year based on the
firm's financial ratios. This method used a
version of regression called discriminant
analysis. Neural networks using financial ratios
as input cells have outperformed Altman's Z.
Example 4: The September 22, 1993 New York
Times reported that Otis Elevator uses neural
networks to direct elevators. For example, if
elevator 1 is on floor 10 and going up, elevator 2
is on floor 6 going down, and elevator 3 is on
floor 2 and going up, which elevator should
answer a call to go down from floor 7?
Example 5: Many banks (Mellon and Chase are
two examples) and credit card companies use
neural networks to predict (on the basis of past
usage patterns) whether or not a credit card
transaction should be disallowed. AVCO
Financial used a neural net to determine
whether or not to lend people money. They
increased their loan volume by 25% and
Example 6: "Pen" computers and personal digital
assistants often use neural nets to read the
user's handwriting. The "inputs" to the network
are a binary representation of what the user
has written. For example, let a 1 = place where
we have written something and a 0 = a place
where we have not written something. An input
to the network might look like
0001111
0001000
0001110
00000l0
0001110
The neural net must decide to classify this
input as an "s", a "5" or something else.
1 3
We assume that for some θ the transfer function for the output cell is defined
by f(x) = 1 if x and f(x) = 0 if x<. Given this transfer function, it is natural to
ask whether there are any values for and the wij's that will enable the above
network to make the correct predictions for the data set in the table in
previous slide.
From the figure in previous slide we find
Observation 1: w03<
Observation 2: w03 + w13
Observation 3: w03 + w23
Observation 4: w03 + w13 + w23<
There are no values of w03, w13, w23 and that satisfy
the following inequalities:
Observation 1: w03<
Observation 4: w03 + w13 + w23<
To see this note that together
Observation 3: w03 + w23
Observation 4: w03 + w13 + w23<
imply w13<0.
Adding w13<0 to Observation 1: w03<
implies that w03 + w13< θ, which contradicts the following
inequality
Observation 2: w03 + w13
Thus we have seen that there is no way the neural network
of this figure can correctly predict the output for each
observation.
Suppose we add a hidden layer with two nodes
to the previous figure’s neural net.This yields
the neural network in the following figure.
0 5
3
1
4
2
We define the transfer function fi at cell i by fi(x) = 1 if xi and fi(x) = 0 for x<i. If
we choose 3 = .4, 4 = 1.2, 5 = .5, w03 = w04 = w05 = 0, w13 = w14 = w23 = w24 =
1, w35 = .6, w45 = -.2, then the neural network for the figure above will yield
correct predictions for each observation. We now verify that this is the case.
Observation 1: