
Classical Conditioning (RW Model) and Operant Conditioning (Static Choice Model)


Anup Singh, Adam Gross, Niral Desai, Maulik Desai
April 2, 2015

Classical Conditioning Model

One of the earlier theories of classical conditioning was proposed by Rescorla and Wagner in 1972. In their original paper, they introduce the notion of the associative strength of a conditioned stimulus (CS), which quantifies the association between the CS and the conditioned response (CR). For an organism to learn something, the associative strength V must increase over subsequent trials, where ΔV = αβ(λ - V) defines the change in V after each trial. The parameters α and β are best understood as learning rates, where α represents the salience of the CS (that is, how likely it is to induce a CR) and β represents the strength of the unconditioned stimulus (US). Both α and β are defined to be between zero and one, and their product equals the overall learning rate. However, learning to associate a CS with a CR must be accomplished by a certain point (otherwise, why bother?), so a final parameter λ is introduced that quantifies the asymptotic level of V: as V approaches λ, we have ΔV → 0 and learning effectively ends. Finally, Rescorla and Wagner extended this idea to encompass multiple CS, proposing a simple sum Vtot = Va + Vx for two conditioned stimuli. Thus, the final model for a compound CS becomes:
ΔVa = αa β(λ - Vtot)
ΔVx = αx β(λ - Vtot)
Vtot = Va + Vx
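As a minimal sketch of how the single-CS update above could be simulated (shown in Python for illustration; the parameter values are arbitrary):

```python
# Sketch of the Rescorla-Wagner update for a single CS.
# alpha (CS salience), beta (US strength), and lam (the asymptote lambda)
# follow the definitions above; the numeric values are illustrative.
def rw_update(V, alpha, beta, lam):
    """Return the associative strength after one trial."""
    return V + alpha * beta * (lam - V)

V = 0.0
history = []
for trial in range(50):
    V = rw_update(V, alpha=0.3, beta=0.5, lam=1.0)
    history.append(V)
# V rises quickly at first and levels off as it approaches lambda,
# at which point the per-trial change is effectively zero.
```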
To implement the R-W Model in a computer environment, we plan to create two simple stimuli. Both will represent a light either flashing on or staying off at two different rates fa and fx, so it will be a slight extension of the model. To do this, we will run a simulation for a predetermined time, observing when a particular CS becomes dominant by having a higher associative strength, which Rescorla and Wagner denote as overshadowing. We also plan to vary the parameters α, β, and λ to generate curves that show how changing the parameters affects the rate of learning. We can compare our results to a control white-noise stimulus.
In the paper Assessment of the Rescorla-Wagner Model (Miller et al.), the R-W Model's consistency in predicting results from compound conditioning is highlighted as one of its major successes. Thus, for our project, we want to look into two interesting compound conditioning phenomena known as overshadowing and blocking, and how the RW Model accounts for these surprising features. In overshadowing, two neutral elements combine to form a compound neutral stimulus, which is then conditioned to form a compound CS. However, when the elements that make up this compound CS are tested individually, one of the elements elicits a stronger response than the other. An example that illustrates this overshadowing phenomenon is that rats naturally respond to sound stimuli rather than light stimuli, meaning that sound overshadows the light (which is likely due to the nature of rats' lives and what they rely on more for survival). However, these overshadowing stimuli can be reversed with experimentation, as expected through conditioning. To model overshadowing through the RW Model, we would initially set the neutral stimulus elements Va = Vb = 0 and Vab = Va + Vb = 0, and then give one of the two elements a higher salience in the sense that it is favored (i.e., the sound over the light), so αa = 0.7 and αb = 0.3. Then, as we generate the model of the associative strength (as outlined in the first paragraph), we will be able to observe the overshadowing as the Va stimulus produces a stronger conditioned response.
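A sketch of this overshadowing simulation (Python for illustration; the β and λ values are illustrative, the saliences are the ones given above):

```python
# Overshadowing sketch: a compound CS whose elements have unequal saliences.
# alpha_a = 0.7 (e.g., sound) and alpha_b = 0.3 (e.g., light), as above;
# beta and lam are illustrative choices.
def simulate_overshadowing(alpha_a=0.7, alpha_b=0.3, beta=0.5, lam=1.0, trials=100):
    Va = Vb = 0.0
    for _ in range(trials):
        Vtot = Va + Vb
        dVa = alpha_a * beta * (lam - Vtot)  # both elements share the same
        dVb = alpha_b * beta * (lam - Vtot)  # prediction error (lam - Vtot)
        Va, Vb = Va + dVa, Vb + dVb
    return Va, Vb

Va, Vb = simulate_overshadowing()
# The more salient element captures most of the total associative strength,
# so Va ends well above Vb: sound overshadows light.
```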

Blocking is another phenomenon from compound conditioning; it describes the situation in which one of the stimuli is already fully conditioned, so its counterpart stimulus (i.e., within the compound) won't elicit a response. This can be simply explained by extending Pavlov's dog experiment. If Pavlov had extended the experiment to make a compound stimulus, adding a light to the bell sound after already fully conditioning the dog to salivate to the bell, then exposing the dog to the light alone would not make it salivate. Thus, since the bell is fully conditioned, it is said to block conditioning by the light, because the light does not elicit any response beyond the already fully-conditioned response to the bell. To implement this phenomenon of blocking, using the same variables as with overshadowing, we would set Va = 0 and Vb = 1 (100%, representing the fully conditioned stimulus) and let αa = αb = 0.5, assuming there is no preference for either stimulus. Then, as we go through the trials, as expected, the compound CS value will remain the same; specifically, the neutral stimulus will maintain its value and the fully-conditioned stimulus will stay at its 100% value (i.e., staying fully conditioned, thus showing the blocking phenomenon).
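The blocking setup can be sketched the same way (Python for illustration), pre-setting Vb at the asymptote:

```python
# Blocking sketch: Vb starts fully conditioned (at lambda), Va starts neutral.
# Equal saliences alpha_a = alpha_b = 0.5, as described above; beta is illustrative.
def simulate_blocking(beta=0.5, lam=1.0, trials=100):
    Va, Vb = 0.0, 1.0        # Vb = 100%: already fully conditioned
    alpha_a = alpha_b = 0.5
    for _ in range(trials):
        Vtot = Va + Vb
        Va += alpha_a * beta * (lam - Vtot)
        Vb += alpha_b * beta * (lam - Vtot)
    return Va, Vb

Va, Vb = simulate_blocking()
# Because Vtot already equals lambda, the prediction error is zero on every
# trial: Va never grows (the bell blocks the light) and Vb stays at 100%.
```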
Overall, for classical conditioning, we aim to implement the general RW model with its given parameters in MATLAB, as described in the first section. Then, as described in the preceding paragraphs, we want to highlight some of its major successes (specifically within compound conditioning) by using the model to demonstrate the phenomena of overshadowing and blocking, following the procedures outlined above.

Operant Conditioning Model

Indirect and direct actor models of operant conditioning are two ways to represent behaviors elicited through
static action choice experiments. Static action choice models describe an animal's choice between a number of available options as a random choice according to a probability distribution defined by parameters mi,
called action values. An example in nature is that of bees choosing to visit various colors of flowers to collect
nectar. Typically for static action choice models, the probability of the animal choosing response A is
exp(mA )
P
.
exp(mi )
indicates variability of the animals choices; a small indicates a reluctance to environmental adaptation,
where high gives a highly unstable but quickly adapting behavior to sudden changes in environment.
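This softmax choice rule can be sketched as follows (Python for illustration; the action values used below are arbitrary):

```python
import math

def choice_probabilities(m, beta):
    """Softmax over action values m with variability parameter beta."""
    expd = [math.exp(beta * mi) for mi in m]
    z = sum(expd)
    return [e / z for e in expd]

# With beta near zero, choices are nearly uniform regardless of action values;
# with large beta, the highest-valued action is chosen almost deterministically.
low = choice_probabilities([1.0, 0.0], beta=0.1)
high = choice_probabilities([1.0, 0.0], beta=10.0)
```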
The indirect actor model of static action choice weights the probabilities of actions according to the
expectation of rewards for the action. That is, we set the action values of each possible choice equal to the
expected reward of that choice. The animal population dynamically adapts to the result of a trial by altering
the probabilities of taking any particular action by updating the values of mi according to a prediction error r - mi, where r is the response received. We plan to implement this model in MATLAB by designing a basic binary-choice indirect actor experiment where the two choices give unequal responses, one delivered probabilistically and the other consistently. We will then vary the environmental responses of the actions in time at different rates, so that the animal populations must adapt to dynamic environments. We could vary the returns of each choice slowly, or change how stochastic the response of each choice is (for example, drawing the response from various distributions, such as changing from Gaussian to Poisson distributions).
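A sketch of such a binary-choice indirect actor experiment (Python for illustration; the reward magnitudes, learning rate eps, and β value are all illustrative assumptions, not a fixed design):

```python
import math
import random

def indirect_actor(reward_fns, trials=2000, eps=0.05, beta=2.0, seed=0):
    """Indirect actor: action values m track expected rewards via the
    prediction-error update m[a] += eps * (r - m[a])."""
    rng = random.Random(seed)
    m = [0.0] * len(reward_fns)
    for _ in range(trials):
        # softmax action selection over the current action values
        expd = [math.exp(beta * mi) for mi in m]
        z = sum(expd)
        u, a, acc = rng.random(), len(m) - 1, 0.0
        for i, e in enumerate(expd):
            acc += e / z
            if u < acc:
                a = i
                break
        r = reward_fns[a](rng)
        m[a] += eps * (r - m[a])   # prediction-error update
    return m

# One choice responds consistently (always 1.0), the other probabilistically
# (2.0 with probability 0.25, else nothing); the action values should approach
# the expected rewards of the two choices.
m = indirect_actor([lambda rng: 1.0,
                    lambda rng: 2.0 if rng.random() < 0.25 else 0.0])
```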
Further, we would like to build an evolving population of individuals with various values of β ranging from 0 to 1 and investigate how the population can evolve under various environmental stresses (types
of changes and frequency of changes in the responses to choices). Types of environmental stresses can
include instantaneously swapping distributions of responses that each choice gives. Altering parameters in
distributions smoothly over consecutive iterations of trials is a way to introduce slow environmental changes.
Through these models, we can view how explorative individuals (with very high β and hence a disposition to rapidly change choice policies) and stubborn individuals (with low β and hence hesitance to adjust choice
policies) respond to the changes in environmental responses. Evolution can be simulated by assigning a
chance of reproduction according to a simple probability distribution dependent on whether individuals with
varying parameters successfully acquire nectar consistently during periods of environmental changes.
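One way this evolutionary scheme could be sketched (Python for illustration; the population size, trial count, fitness rule, and mid-run swap are all illustrative assumptions, not a fixed design):

```python
import math
import random

def run_generation(betas, rng, trials=200, eps=0.1):
    """Score each individual in a two-choice task whose better option swaps
    halfway through (a sudden environmental change), then reproduce each
    individual with chance proportional to nectar collected."""
    scores = []
    for beta in betas:
        m = [0.0, 0.0]
        nectar = 0.0
        for t in range(trials):
            good = 0 if t < trials // 2 else 1   # instantaneous swap of responses
            p0 = math.exp(beta * m[0]) / (math.exp(beta * m[0]) + math.exp(beta * m[1]))
            a = 0 if rng.random() < p0 else 1
            r = 1.0 if a == good else 0.0
            m[a] += eps * (r - m[a])             # prediction-error update
            nectar += r
        scores.append(nectar)
    # fitness-proportional selection: offspring drawn with weights = scores
    return rng.choices(betas, weights=scores, k=len(betas))

rng = random.Random(1)
betas = [rng.random() for _ in range(30)]        # beta values in [0, 1]
for _ in range(10):
    betas = run_generation(betas, rng)
```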

Literature Consulted
1. Classical Conditioning
(a) Original Rescorla & Wagner Paper from 1972:
http://www.ualberta.ca/ egray/teaching/Rescorla%20&%20Wagner%201972.pdf
(b) http://users.ipfw.edu/abbott/314/Rescorla2.htm
(c) Paper assessing RW Model (with compound conditioning focus): http://graulab.tamu.edu/JGrau/Psyc606/Papers/Miller1993.pdf
2. Operant Conditioning
(a) Learning Automata: A Survey (Narendra & Thathachar, 1974)
(b) http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5408453
(c) Williams' paper on gradient-following algorithms for connectionist reinforcement learning:
(d) http://incompleteideas.net/sutton/williams-92.pdf
