Prior: Griffiths and Tenenbaum (2006) use publicly available real-world data to identify the true prior distribution p(ttotal) over life spans (shown in Figure 1A).

Likelihood: The likelihood p(t|ttotal) is the probability of encountering a person at age t given that their total life span is ttotal. Griffiths and Tenenbaum (2006) assume for simplicity that we are equally likely to meet a person at any point in his or her life. As a result, this probability is uniform, p(t|ttotal) = 1/ttotal, for all t < ttotal (and 0 for t ≥ ttotal).

Prediction function: Combining the prior with the likelihood according to Equation 1 yields a probability distribution p(ttotal|t) over all possible life spans ttotal for a person encountered at age t. As is standard in Bayesian prediction, Griffiths and Tenenbaum (2006) use the median of this distribution (the point at which it is equally likely that the true life span is either longer or shorter) as the estimate for ttotal. This identifies a prediction function that specifies a predicted value of ttotal for each observed value of t.

Results: Results obtained by Griffiths and Tenenbaum (2006) through this Bayesian model are shown in Figure 1B.

…which are used to construct large-scale neural models. The first two principles are described in the following sections. The principle of representation also describes how probability distributions can be represented using spiking neurons. The third principle is not required for this paper, and its details can be found elsewhere (Eliasmith & Anderson, 2003).

Principle 1: Representation

In the NEF, information is represented as time-varying vectors of real numbers by populations of neurons. We say that a population of neurons has activities a_i(x), which encode an n-dimensional stimulus vector, x = [x_1, x_2, ..., x_n], by defining the encoding:

a_i(x) = G_i[J_i(x)],  (3)

where G_i[·] is the nonlinear transfer function describing the neuron's spiking response, and J_i(x) is the current entering the soma of the neuron. For the purpose of our model, we have chosen G_i[·] to be the leaky integrate-and-fire (LIF) neuron model. The soma current is defined by:

J_i(x) = α_i⟨e_i, x⟩_n + J_i^bias,  (4)

where J_i(x) is the current in the soma, α_i is a gain and conversion factor, x is the stimulus vector to be encoded, e_i is the encoding vector corresponding to the preferred stimulus of the neuron (consistent with the standard idea of a preferred direction vector; Schwartz, Kettner, & Georgopoulos, 1988), and J_i^bias is a bias current that accounts for background activity. The notation ⟨·,·⟩_n indicates an n-dimensional dot product.

Given this encoding, the original stimulus vector can be estimated by decoding those activities as follows:

x̂ = Σ_i a_i(x) d_i.  (5)
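The representation defined by Eqs. 3-5 can be sketched numerically. The following is a minimal rate-mode sketch; the tuning parameters (encoders, gains, biases) and the regularized least-squares solve for the decoders are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Minimal rate-mode sketch of Principle 1 (Eqs. 3-5). All tuning
# parameters (encoders, gains, biases) and the regularization level are
# illustrative assumptions, not the paper's configuration.

rng = np.random.default_rng(0)
N = 200                                   # neurons in the population

def lif_rate(J, tau_ref=0.002, tau_rc=0.02):
    """LIF rate response G[J]; zero below the threshold current J = 1."""
    rate = np.zeros_like(J)
    m = J > 1
    rate[m] = 1.0 / (tau_ref + tau_rc * np.log1p(1.0 / (J[m] - 1.0)))
    return rate

# Encoders e_i (preferred directions for a 1-D stimulus), gains alpha_i,
# and bias currents J_bias, as in Eq. 4.
e = rng.choice([-1.0, 1.0], size=(N, 1))
alpha = rng.uniform(0.5, 2.0, size=(N, 1))
J_bias = rng.uniform(0.5, 1.5, size=(N, 1))

# Activities a_i(x) = G_i[alpha_i <e_i, x> + J_bias] at sample points.
x = np.linspace(-1.0, 1.0, 101)
A = lif_rate(alpha * e * x[None, :] + J_bias)   # shape (N, len(x))

# Representational decoders d_i (Eq. 5) from regularized least squares,
# minimizing |x - sum_i a_i(x) d_i|^2 over the sample points.
reg = 0.1 * A.max()
d = np.linalg.solve(A @ A.T + reg**2 * len(x) * np.eye(N), A @ x)

x_hat = d @ A                                   # decoded estimate of x
```

The regularization term keeps the decoders from over-fitting the sampled activities; in a spiking simulation the same decoders would weight filtered spike trains rather than static rates.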
…have chosen these filters (h_i(t)) to be the postsynaptic currents (PSCs) induced in the subsequent neuron by the arrival of a spike. Eliasmith and Anderson (2003) have shown that this assumption causes minimal information loss, which can be further reduced by increasing the population size.

This temporal code can be combined with the population code defined before (Eqs. 3, 4, 5) to provide a general population temporal code for vectors. The encoding and decoding equations for such a code are given by Eq. 7 and Eq. 8:

δ(t − t_im) = G_i[α_i⟨e_i, x⟩_n + J_i^bias]  (7)

x̂ = Σ_{i,m} h_i(t − t_im) d_i.  (8)

Representing probability distributions: Probability distributions are essentially functions of some parameters. Having described how to represent vectors using the NEF, we consider the relationship between vector and function representation. For any representation, we need to specify the domain of that representation. In the case of vectors, the domain is the subspace of the vector space that is represented by the neurons (e.g., the x vector). We define the relevant function domain by parameterizing the set of represented functions by an n-dimensional vector of coefficients k = [k_1, k_2, ..., k_n]. These define any function of interest over a fixed set of basis functions φ(ν) as follows:

x(ν; k) = Σ_{j=1}^n k_j φ_j(ν), for k ∼ p(k).  (9)

Thus we define a particular probability distribution p(k) by limiting the space spanned by the basis φ(ν) to some subspace of interest depending on the application. This is also the domain over which the optimization to find the decoders in Eq. 5 is performed.

Next, we define population encoding and decoding analogous to that in Eqs. 3 and 5 for functions:

a_i(x(ν; k)) = a_i(k) = G_i[α_i⟨e_i(ν), x(ν; k)⟩_n + J_i^bias]  (10)

x̂(ν; k) = Σ_i a_i(k) d_i(ν),  (11)

where e_i(ν) and d_i(ν) are the encoding and decoding functions of the neurons. We project these functions onto the same basis φ(ν) used to identify the function space. For simplicity, we assume that φ(ν) is an orthonormal basis; an analogous derivation for a bi-orthonormal set can be found elsewhere (Eliasmith & Martens, 2011). Hence, we get the following encoding and decoding functions:

e_i(ν) = Σ_{j=1}^n e_ij φ_j(ν)  (12)

d_i(ν) = Σ_{j=1}^n d_ij φ_j(ν),  (13)

where e_ij and d_ij identify the n coefficients that represent the encoding and decoding functions in the φ(ν) basis for each neuron. We now substitute these into Eq. 10:

a_i(x(ν; k)) = G_i[α_i Σ_{m,n} k_n e_im ⟨φ_n(ν), φ_m(ν)⟩ + J_i^bias]
             = G_i[α_i Σ_{m,n} k_n e_im δ_mn + J_i^bias]
             = G_i[α_i Σ_n k_n e_in + J_i^bias]
             = G_i[α_i⟨e_i, k⟩_n + J_i^bias].  (14)

This way, function encoding is expressed as vector encoding identical to Eq. 7. Similarly, function decoding can also be expressed as vector decoding as follows:

k̂ = Σ_i a_i(k) d_i.  (15)

To summarize, we have shown that it is mathematically equivalent to talk in terms of (finite-dimensional) function spaces or (finite-dimensional) vector spaces. Since probability distributions are most generally functions, we can approximate them as high-dimensional vectors over a fixed set of basis functions using the NEF.

Principle 2: Transformation

Transformations of neural representations are functions of the vector variables represented by neural populations.

To perform a transformation f(x) in the NEF, instead of finding the representational decoders d_i to extract the originally encoded variable x, we can re-weight the decoding to specify some function f(x) other than identity. In other words, we can find the decoders d_i^f(x) (also known as transformational decoders) by using least-squares optimization to minimize the difference between the decoded estimate of f(x) and the actual f(x), which results in the transformation:

f̂(x) = Σ_i a_i(x) d_i^f(x).  (16)

Both linear and nonlinear functions of the encoded vector variable can be computed in this manner (Eliasmith & Anderson, 2003). In the NEF, connection weights between neurons can be defined in terms of encoders and decoders as ω_ij = α_j e_j d_i^f(x), where i indexes the presynaptic population, j indexes the postsynaptic population, and d_i^f(x) are representational or transformational decoders.

Neural model of life span prediction

Figure 2 shows the architecture of the neural model for life span inference built using the NEF. All neural ensembles (populations of neurons; symbolically represented by five circles) are 20 dimensional and contain 200 LIF neurons each, except the Normalized Posterior ensemble, which is 120 dimensional and contains 800 LIF neurons.
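The collapse from function-space to vector-space encoding in Eq. 14 can be checked numerically. This sketch builds an orthonormal basis on a discretized domain (the monomial starting set, grid, and dimensionality are illustrative assumptions) and verifies that the function-space inner product equals the coefficient dot product:

```python
import numpy as np

# Numerical check of Eq. 14: for an orthonormal basis phi_j(nu), the
# function-space inner product <e_i(nu), x(nu; k)> collapses to the
# coefficient dot product <e_i, k>. The monomial starting set, grid,
# and dimensionality are illustrative assumptions.

nu = np.linspace(-1.0, 1.0, 2001)
dnu = nu[1] - nu[0]
n = 5

# Gram-Schmidt on monomials, orthonormal under the discrete inner
# product <f, g> = sum(f * g) * dnu.
phi = np.vstack([nu**j for j in range(n)])
for j in range(n):
    for l in range(j):
        phi[j] -= (phi[j] @ phi[l]) * dnu * phi[l]
    phi[j] /= np.sqrt((phi[j] @ phi[j]) * dnu)

rng = np.random.default_rng(1)
k = rng.standard_normal(n)         # coefficients of x(nu; k)  (Eq. 9)
e_i = rng.standard_normal(n)       # coefficients of e_i(nu)   (Eq. 12)

x_fn = k @ phi                     # x(nu; k) = sum_j k_j phi_j(nu)
e_fn = e_i @ phi                   # e_i(nu)  = sum_j e_ij phi_j(nu)

lhs = (e_fn * x_fn).sum() * dnu    # inner product over functions
rhs = e_i @ k                      # dot product over coefficients
```

Because the basis is orthonormalized under the same discrete inner product used to compare the functions, the two quantities agree to floating-point precision, which is exactly the reduction used in Eq. 14.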
Figure 2: A schematic diagram of the neural model. Here Likelihood and Prior contain 200 neurons each, the Product network contains 4000 neurons, and Normalized Posterior contains 800 neurons.

The product network computes an element-wise product of its inputs. Though multiplication is nonlinear, it has a well-characterized implementation in neurons that does not require nonlinear interactions, and can be implemented accurately with the NEF (Gosmann, 2015). The product network makes use of this characterization. It has 40 neural ensembles of 100 neurons each, for a total of 4,000 neurons. The entire model contains 5,200 neurons.

To represent the probability distributions (prior and likelihood) needed to perform the task, we define a basis φ_20(ν) to span the space of each distribution. To compute the basis, we sample from a family of 120 dimensional distributions and perform Singular Value Decomposition to obtain a 20 dimensional basis. This basis is used to determine the encoders (as given by Eq. 12) used in the NEF simulation. The same basis is used for the optimization to find the neuron decoders (as given by Eq. 13) that are needed to perform the desired computations. Similar to the encoding and decoding functions, the 120 dimensional prior and likelihood functions are also projected to the 20 dimensional space through weights over the basis. Refer to the supplemental material for details.

The likelihood input and prior input are nodes that provide the named 20 dimensional inputs to the neural ensembles Likelihood and Prior, respectively. The product network receives input from these ensembles and computes the posterior distribution (in the 20 dimensional space). The output connection from the product network to Normalized Posterior reconstructs the posterior back to 120 dimensional space and computes the normalization function using Principle 2 of the NEF. Thus, the Normalized Posterior ensemble represents the normalized posterior distribution. Next, we approximate the median of this distribution on the connection between the Normalized Posterior ensemble and the Prediction node (again using Principle 2). We read out the model prediction from the Prediction node.

Figure 3 shows the inference results obtained from the spiking neural network run in the Nengo (Bekolay et al., 2014) software package. Model predictions are plotted for current ages (t) from 1 to 100.

Figure 3: Inference results from the neural model (95% confidence intervals), compared to human predictions and to Direct mode, our model with computations in low-dimensional (20 dimensional basis) space, but without neurons. [Plot: Predicted ttotal (65 to 105) versus t values (0 to 100); series: Human predictions, Direct mode (no neurons), Neural model predictions.]

The difference between the results in Direct mode and Neuron mode is due to the limited number of neurons in the Normalized Posterior ensemble. As the number of neurons in this ensemble increases, the results approach the Direct mode results (800 neurons provide the best fit to human data). Thus, neural results match the human data better due to the approximate representation of the normalized posterior by the neurons in the Normalized Posterior ensemble. The tuning curves of the neurons in this ensemble were fit to a function space consisting of a family of distributions that have three parameters (similar to the parameters in the prior) and also depend on the current age t (similar to the likelihood function). Of the three parameters, a (the skewness parameter) was varied from -7 to -4, scale (which scales the distribution) was varied from 26 to 29, and loc (which shifts the distribution) was varied from 49 to 101. The current age t was varied in a range of ±5 around the given age in a trial, except for ages below 5, for which the range was taken to be [1, 10]. This provides the function space that was used to sample the encoders for the Normalized Posterior ensemble.

We use the Kolmogorov-Smirnov (K-S) test to examine the goodness of fit of the neural model predictions relative to the Griffiths and Tenenbaum (2006) model. The data used for the K-S test are shown in Figure 4b. The dissimilarity of the Griffiths and Tenenbaum (2006) model relative to human predictions is 9.628, while that of the neural model is 1.959, indicating the much closer fit of the neural model to the human data. Figure 4a shows a comparison between the Griffiths and Tenenbaum (2006) model, the computational model (our replication of their model), and Direct mode (our model with computations in a compressed 20 dimensional space, but without neurons). Since the results obtained from the Direct mode are the same as those of the computational model, the low dimensional embedding is not losing any information. However, we expect some error due to this constraint for more complex priors (though we have not explored the minimum dimensionality for this prior).
Overall, our results suggest that the closer fit of the neural data can be solely attributed to fitting the neuron tuning curves in the Normalized Posterior ensemble, where 800 neurons provide the best match to human performance. Since the low-dimensional neural implementation can be made to match the human data, this is some evidence in support of the hypothesis that human brains represent low-dimensional …

Figure 4: (a) Results from the Griffiths and Tenenbaum (2006) model (only data corresponding to human data), the Computational model (i.e., our replication of their model), and Direct mode (i.e., our model with computations in low-dimensional space, but without neurons); no error is introduced by the low dimensional embedding. (b) Kolmogorov-Smirnov (K-S) test results; data used for the goodness of fit test. Dissimilarity relative to human predictions: Griffiths and Tenenbaum (2006) model, 9.628; neural model, 1.959. Neural model and human data are median predictions. Note: Griffiths and Tenenbaum (2006) model data and human data were obtained from Figure 1B through a web plot digitizer. [Both panels plot Predicted ttotal against t values.]

…the likelihood of the observed data, L(θ; X) (or equivalently the log-likelihood for numerical stability):

θ* = argmax_θ L(θ; X) = argmax_θ Σ_{i=1}^n log Σ_{z_i} p(x_i, z_i | θ).
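One way to make this optimization concrete is the EM procedure whose updates appear in Eq. 24. The sketch below is a toy reading under stated assumptions: the latent z_i (true life span) is given a Gaussian prior on a discrete grid, the observed age x_i is uniform on [0, z_i] as in the likelihood, and all parameter values are illustrative:

```python
import numpy as np

# Toy sketch of the EM updates in Eq. 24 on a discretized latent
# variable. Illustrative assumptions throughout: the latent z_i (true
# life span) has a Gaussian prior p(z | theta) on a discrete grid, the
# observed age x_i is uniform on [0, z_i] (matching the likelihood
# p(t | ttotal) = 1/ttotal), and all parameter values are arbitrary.

rng = np.random.default_rng(3)
z_grid = np.arange(1.0, 121.0)                 # discrete support for z

def joint(x, mu, sigma):
    """p(z, x | theta) on the grid, one row per observation x_i."""
    prior = np.exp(-0.5 * ((z_grid - mu) / sigma) ** 2) / sigma
    lik = np.where(z_grid[None, :] >= x[:, None],
                   1.0 / z_grid[None, :], 0.0)
    return lik * prior[None, :]

def log_likelihood(x, mu, sigma):
    """log L(theta; X) = sum_i log sum_z p(x_i, z | theta), up to a constant."""
    return float(np.log(joint(x, mu, sigma).sum(axis=1)).sum())

def em_step(x, mu, sigma):
    """One update theta^(t) -> theta^(t+1) via responsibilities T(x_i, z_i)."""
    J = joint(x, mu, sigma)
    T = J / J.sum(axis=1, keepdims=True)       # T = p(z_i | x_i, theta^(t))
    mu_new = (T @ z_grid).mean()               # mean update in Eq. 24
    sigma_new = np.sqrt((T @ (z_grid - mu_new) ** 2).mean())  # sd update
    return mu_new, sigma_new

# Synthetic observed ages from the toy generative process.
z_true = rng.normal(75.0, 8.0, size=500).clip(1.0, 120.0)
x = rng.uniform(0.0, z_true)

mu, sigma = 50.0, 20.0                         # initial guess theta^(0)
for _ in range(30):
    mu, sigma = em_step(x, mu, sigma)
# Each iteration does not decrease log L(theta; X), and (mu, sigma)
# drift toward the values used to generate the data.
```

A few iterations suffice in practice, mirroring the derivation; with real data, the joint p(z_i, x_i | θ) would come from Eq. 1 rather than this toy family.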
…and with respect to σ²:

∂ log p(z_i | θ) / ∂σ² = ((z_i − μ)² − σ²) / (2σ⁴).  (21)

By linearity of differentiation, we then know that the derivatives of Q(θ | θ^(t)) are zero when:

∂Q(θ | θ^(t))/∂μ = Σ_{i=1}^n Σ_{z_i} T(x_i, z_i)(z_i − μ)/σ² = 0
⟹ μ = [Σ_{i=1}^n Σ_{z_i} z_i T(x_i, z_i)] / [Σ_{i=1}^n Σ_{z_i} T(x_i, z_i)],  (22)

and similarly:

∂Q(θ | θ^(t))/∂σ² = Σ_{i=1}^n Σ_{z_i} T(x_i, z_i)((z_i − μ)² − σ²)/(2σ⁴) = 0
⟹ σ² = [Σ_{i=1}^n Σ_{z_i} (z_i − μ)² T(x_i, z_i)] / [Σ_{i=1}^n Σ_{z_i} T(x_i, z_i)].  (23)

Finally, by the generalized Bayes rule, we know:

T(x_i, z_i) = p(z_i | x_i, θ^(t)) = p(z_i, x_i | θ^(t)) / Σ_{z_i} p(z_i, x_i | θ^(t)),

which we may compute via Eq. 1. We also note that, since T(·) is a probability density function over z_i,

Σ_{i=1}^n Σ_{z_i} T(x_i, z_i) = Σ_{i=1}^n 1 = n.

Therefore, each EM iteration must make the update θ^(t+1) = (μ^(t+1), σ^(t+1)), where:

μ^(t+1) = (1/n) Σ_{i=1}^n [Σ_{z_i} z_i p(z_i, x_i | θ^(t))] / [Σ_{z_i} p(z_i, x_i | θ^(t))]  (24)

σ^(t+1) = √( (1/n) Σ_{i=1}^n [Σ_{z_i} (z_i − μ^(t+1))² p(z_i, x_i | θ^(t))] / [Σ_{z_i} p(z_i, x_i | θ^(t))] ).

This converges to some locally optimal estimate of the hyperparameters. For an initial θ^(0) chosen sufficiently close to the global optimum given by Eq. 17, it converges to that optimum. This provides a tractable procedure for updating the prior. In particular, we begin with some initial guess at the hyperparameters and then update them iteratively to better explain the observed data. In practice, only a few iterations are required (results not shown). Once we have an estimate of the hyperparameters θ, we then know the prior p(ttotal | θ). This prior can be used directly by the previously described model to provide a good prediction. In fact, it is possible to run both the prior optimization and inference at the same time, and both will become progressively more accurate over time.

Conclusions

We have presented a spiking neural network able to effectively perform Bayesian inference in a manner that more accurately matches human behavior than an ideal Bayesian computation. We constructed the network using the NEF to map function spaces into vector spaces and approximate the necessary computations. We suggested a means of estimating the prior for the life span task that can be implemented using these same methods.

Notes

Supplemental material (scripts and derivations) can be found at https://github.com/ctn-waterloo/cogsci17-infer.

Acknowledgments

This work was supported by CFI and OIT infrastructure funding, the Canada Research Chairs program, NSERC Discovery grant 261453, ONR grant N000141310419, AFOSR grant FA8655-13-1-3084, OGS, and NSERC CGS-D.

References

Bekolay, T., Bergstra, J., Hunsberger, E., DeWolf, T., Stewart, T. C., Rasmussen, D., ... Eliasmith, C. (2014). Nengo: a Python tool for building large-scale functional brain models. Frontiers in Neuroinformatics, 7, 48.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 1-38.
Doya, K. (2007). Bayesian brain: Probabilistic approaches to neural coding. MIT Press.
Eliasmith, C. (2013). How to build a brain: A neural architecture for biological cognition. Oxford University Press.
Eliasmith, C., & Anderson, C. H. (2003). Neural engineering: Computation, representation, and dynamics in neurobiological systems. MIT Press.
Eliasmith, C., & Martens, J. (2011). Normalization for probabilistic inference with neurons. Biological Cybernetics, 104(4), 251-262.
Gosmann, J. (2015). Precise multiplications with the NEF (Technical Report). University of Waterloo, Waterloo, Ontario, Canada. Retrieved from http://dx.doi.org/10.5281/zenodo.35680
Griffiths, T. L., Chater, N., Norris, D., & Pouget, A. (2012). How the Bayesians got their beliefs (and what those beliefs actually are): Comment on Bowers and Davis (2012).
Griffiths, T. L., & Tenenbaum, J. B. (2006). Optimal predictions in everyday cognition. Psychological Science, 17(9), 767-773.
Jacobs, R. A., & Kruschke, J. K. (2011). Bayesian learning theory applied to human cognition. Wiley Interdisciplinary Reviews: Cognitive Science, 2(1), 8-21.
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nature Neuroscience, 9(11), 1432-1438.
Schwartz, A. B., Kettner, R. E., & Georgopoulos, A. P. (1988). Primate motor cortex and free arm movements to visual targets in three-dimensional space. I. Relations between single cell discharge and direction of movement. The Journal of Neuroscience, 8(8), 2913-2927.
Xu, F., & Tenenbaum, J. B. (2007). Word learning as Bayesian inference. Psychological Review, 114(2), 245.