this post was submitted on 04 May 2015
45 points (87% upvoted)
shortlink: http://redd.it/34u0go
/r/MachineLearning (38,155 readers)
all 27 comments
sorted by: best
[–] rcwll 19 points 1 month ago
[…] of research that extends well beyond ELMs and is definitely worth knowing about.
Simple is not stupid; in fact, one could argue that backprop is stupid if it's unnecessarily complex. Most current research into ANNs does not respect the biological constraints of real neural networks under evolutionary pressure. Backprop is simply not biologically plausible.
The idea behind ELM goes back to the 1950s (see Wigner matrices, and also random representations, random projections, sketches, SDM, compressed sensing, random weights, Monte Carlo methods applied to neural networks, correlation matrix memories, and correlation in real neural networks). It's not just faster; you can also build deeper models with the same amount of resources. The SVD does the bulk of the work. There is a long thread of experimental evidence suggesting that something similar is going on in the nervous system.
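To make the basic recipe concrete, here is a minimal sketch (not from the thread; all names and hyperparameters are illustrative) of an ELM-style model in the sense described above: a fixed random hidden layer, with only the linear readout solved, e.g. by a regularized least-squares / SVD-type solve.

```python
import numpy as np

def train_elm(X, Y, n_hidden=500, reg=1e-3, seed=0):
    """Minimal ELM-style sketch: fixed random hidden layer, learned linear readout.

    X: (n_samples, n_features) inputs; Y: (n_samples, n_outputs) targets.
    Only beta (the readout) is learned; W and b stay random and untouched.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    # Ridge-regularized least squares for the readout: a single linear solve
    # (equivalently, the SVD / pseudo-inverse of H does the bulk of the work).
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```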
The ELM author (Guang-Bin Huang) brought up this parable in his talk (in funny English): "Say you want to fill up a Chinese lake with rocks until you get a horizontal surface (but filled with stones instead of water), and you can see the bottom of the empty lake; it's a curve (a function). There are two types of people who might try to fill the lake: an engineer and a Chinese farmer. The engineer tries to carefully calculate the size of the lake and the sizes of the stones needed to fill it. He thinks: how many stones do I need to bring - 100, 500, or 1000 (hidden nodes)? And how would each stone fill a certain portion of the lake without "spilling" beyond the lake's boundaries? Each stone's size needs to be carefully tuned, and the engineer spends a lot of time trying to adjust his cohort of stones to fill the lake optimally. Now a Chinese farmer comes along, who knows nothing about calculations; he looks at the nearby Himalayan mountain, decides to blow up the mountain, and creates a whole bunch of rocks of random sizes. Then he picks up a random number of rocks of different sizes and starts throwing them all over the lake until it is full. Now, when the farmer picks up a stone (hidden neuron), does he need to know the size of the lake or the size of the stones? No - he just uses a random generator and spreads the rocks all over. So which method is faster? The engineer needs to calculate the dimensions of the lake, the height of each stone, and the number of stones, and make sure the aggregate of the multiple hidden neurons matches the boundaries of the lake. The farmer, on the other hand, doesn't care: if a large stone sticks up above the surface of the lake he simply smashes it with a hammer (the beta parameter), the number of rocks - 100 or 1000 - doesn't matter much either, and he gets the job done much faster."
I agree, it's not 'stupid' if it works; that's why I kept putting the word 'stupid' in scare quotes.
Though I'm not sure I agree that ELMs are simple, either: they do use a very large number of hidden neurons, and regularizing the second layer is very important to getting good generalization out of them, which to me implies that the first layer is in fact quite complex!
And yes, the theory behind them goes back much farther than the ELM; I suppose I should have specified that the '92 paper is the first one I know of that casts it in the form of a neural network problem, and it is an almost exact match to ELMs. They discuss setting the weights either to completely random values (as in the initial ELM) or to a random subsample of the data set (which is similar to some of the later extensions to ELMs); in either case they solve the resulting linear problem directly and show that the performance is competitive with standard backprop. It's hardly a new idea, at any rate.
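As a small illustration of the two weight-choosing schemes mentioned above (purely random values vs. a random subsample of the data), here is a hedged sketch; the function names are mine, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hidden_weights(n_features, n_hidden):
    # Scheme 1: completely random first-layer weights (as in the initial ELM).
    return rng.standard_normal((n_features, n_hidden))

def subsampled_hidden_weights(X, n_hidden):
    # Scheme 2: first-layer weights copied from a random subsample of the data,
    # so each hidden unit responds most strongly to inputs near one example.
    idx = rng.choice(X.shape[0], size=n_hidden, replace=False)
    return X[idx].T

# In either case only the linear readout on top of the hidden layer is solved
# (e.g. with np.linalg.lstsq), as in the ELM sketch earlier in the thread.
```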
With respect to the deeper models, are you referring to the stacked ELM-AEs, or
is there some other method?
I've tried to clarify the point above. See Rolls' text on biologically plausible
learning.
As far as I understand, the hashing trick in Vowpal Wabbit applies to feature names, not values.
The feature values get their value multiplied by +/-1 and added together if the
hashes (edit: hashes of the feature names, I should say) collide and both
features are populated for the given example. You can look at it as a sparse
random sign matrix; but it's always completely independent of the data, hence
the comparison to ELMs.
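For readers who haven't seen it, here is a rough sketch of the hashing trick as described in that comment (illustrative only, not Vowpal Wabbit's actual implementation): each feature name is hashed to a bucket and a +/-1 sign, and values whose names collide are summed.

```python
import numpy as np

def hash_features(named_values, n_buckets=2**10):
    """Hashing-trick sketch: map {feature_name: value} to a fixed-size vector.

    Each name is hashed to a bucket index and a +/-1 sign; values that land in
    the same bucket get added together. This is equivalent to applying a sparse
    random sign matrix that is completely independent of the data.
    """
    x = np.zeros(n_buckets)
    for name, value in named_values.items():
        h = hash(name)                      # stand-in for a real hash such as MurmurHash
        sign = 1.0 if (h >> 1) & 1 else -1.0
        x[h % n_buckets] += sign * value
    return x

# Two features whose names hash to the same bucket are combined with +/-1 signs:
x = hash_features({"word=extreme": 1.0, "word=learning": 1.0, "bias": 1.0})
```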
Yes, sorry. The way I wrote it was not clear; should have said "The
feature values get their value multiplied by +/-1 and added together if the
Really interesting how things get rediscovered over time. I thought everyone knew
about the hashing trick, but I guess that's my personal bias.
I think he's acknowledging their usefulness in SVM contexts, but asking why, in that case, you wouldn't just use an SVM.
The reason you wouldn't just use an SVM is to avoid building the full kernel
matrix. When you have lots of data this is really expensive, but using random
features means you can easily train kernelized SVMs with SGD.
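A hedged sketch of that approach, in the spirit of Rahimi and Recht's random features / "kitchen sinks" (the dimensions, bandwidth, and function names below are illustrative): map the data through random features once, then train an ordinary linear SVM with SGD on the mapped data, never forming the n x n kernel matrix.

```python
import numpy as np

def rff_map(X, n_features=1000, gamma=1.0, seed=0):
    """Random Fourier features approximating the RBF kernel exp(-gamma * ||x - y||^2).

    After this map, a plain linear classifier trained with SGD (e.g. hinge loss)
    behaves approximately like a kernelized SVM, without the full kernel matrix.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# e.g. Z = rff_map(X); then fit a linear SGD classifier with hinge loss on (Z, y).
```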
Can you post a link to the source? Google doesn't seem to know it (yet). The quote
doesn't sound like it's from a private email.
An ELM is basically a 2-layer neural net in which the first layer is fixed and random,
and the second layer is trained. There is a number of issues with this idea.
I don't know if Yann's message predates the IEEE ELM paper. It shows ML-ELM (stacked ELM-AEs and one ELM) getting 0.97% error on PI-MNIST after 444 seconds of CPU training.
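For context, here is a rough, heavily hedged sketch of the stacked ELM-autoencoder (ELM-AE) idea behind ML-ELM as I understand it; published versions differ in details (e.g. orthogonalizing the random weights), and all names below are illustrative:

```python
import numpy as np

def elm_ae_layer(X, n_hidden, reg=1e-3, seed=0):
    """One ELM-AE layer: random hidden weights, least-squares output weights
    trained to reconstruct the input; the transposed output weights then act
    as the (fixed) transformation feeding the next layer."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)  # H @ beta ~ X
    return np.tanh(X @ beta.T)  # representation passed to the next layer

# Stacking a few such layers and training an ordinary ELM classifier on the last
# representation is, roughly, the ML-ELM referred to above.
```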
https://www.facebook.com/yann.lecun/posts/10152872571572143
Attached please find a new paper entitled "What are Extreme Learning Machines? Filling the Gap between Frank Rosenblatt's Dream and John von Neumann's Puzzle". Although we have had these thoughts in mind for several years, we did not plan to publish them so early. In case you have received a malign and personal attack letter which has been circulating recently, it may be necessary to share this paper with you as a timely response to that malign and personal attack. I would feel deeply sorry for any inconvenience caused.
As many of you know, someone has been working very hard to destroy ELM research work over the past years. Recently, two or three researchers have also broadcast a malign and personal attack letter.
This malign letter basically stated that ELM is the same as RVFL, RBF networks, LS-SVM, etc., without referring to them. This is misleading, in the sense that it tells other researchers that RVFL, RBF networks, LS-SVM, etc. are all the same as one another. The differences and relationships between ELM and these earlier works have been discussed in ELM papers as early as 2007.
This malign letter was originally submitted to an IEEE journal. We actually encouraged the journal to accept it (after removing the malign sentences and keeping the technical content) for publication together with a response from the ELM community, but the authors of the letter did not agree to let the journal publish a response from the ELM community. It is normal practice to allow a response; the journal made a great effort and agreed to accept their paper, but to publish it alongside a response from the ELM community. After the anonymous authors read our response, they withdrew the letter at the last minute, as they knew they would not be able to fool other researchers if it were published alongside a response from the ELM community. They then circulated it, unfortunately misusing other researchers' names, including that of the respected and renowned pioneer Yoh-Han Pao, who passed away several years ago, to thousands of researchers, various societies, and journal editors. The objective is clear: it is not professional discussion, but simply an attempt to damage others and destroy a harmonious academic and research environment. They even made rude requests and asked journals not to accept ELM papers any more. Of course, their objectives will not be achieved, as we believe that research works are evaluated by professional justification rather than by malign letters. True researchers would not give up seeking the truth of nature even under such threats. Otherwise, we would live in a horrible academic and research environment, and every researcher could become a target to malign and threaten.
Thus, in response to this malign attack letter and abnormal behavior, I was invited by numerous researchers to write a paper explaining the overall ideas and philosophy behind ELM. It also explains why ELM resembles biological learning. I tried my best to write it in a positive way, and I hope that any discussion of academic research can be conducted in a peaceful and professional manner. Different opinions, academic criticism, and professional controversy are indeed welcome; then we can brainstorm and stay motivated to discover the truth of nature.
In the future, you may wish to add a list of non-preferred reviewers when you submit a paper related to ELM. If you run into unfair comments, you may refer to the attached paper to support your work if necessary.
(Please also find attached the other two papers, which show how ELM works in multi-layer cases, in contrast to the single-hidden-layer cases mentioned in the anonymous malign letter.)
1. What are Extreme Learning Machines? Filling the Gap between Frank Rosenblatt's Dream and John von Neumann's Puzzle
2. Extreme Learning Machine for Multilayer Perceptron
3. Local Receptive Fields Based Extreme Learning Machine
Welcome to Reddit! You can't attach files to comments here. You have to submit
http://www.reddit.com/r/MachineLearning/comments/3500xx/what_are_extreme_le
When you say that bit-float multiplication is 5 times faster, you are comparing to a naive implementation, right?
How many bit-float multiplications do you do per second (when you do matrix-matrix products)?
You use full-batch L-BFGS, so you must need a matrix-matrix product (which, for float32-float32 products, is compute-bound for good implementations). Doing a matrix product as a sequence of matrix-vector products would be much slower.
Are you saying that the above changes with bit-float32 products?
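To make the "bit-float" question concrete, here is a purely illustrative sketch (an assumption on my part, not anyone's actual implementation): with weights constrained to +/-1 and stored as bits, a dot product against a float vector reduces to signed additions, which is where a speedup over a naive multiply-accumulate loop would come from.

```python
import numpy as np

def bit_float_dot(sign_bits, x):
    # sign_bits: boolean array, True meaning weight +1 and False meaning -1.
    # Each bit only selects whether to add or subtract x[i]; no multiplies needed.
    return x[sign_bits].sum() - x[~sign_bits].sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
bits = rng.random(1024) < 0.5
w = np.where(bits, 1.0, -1.0).astype(np.float32)
assert np.isclose(bit_float_dot(bits, x), w @ x, atol=1e-3)

# Whether this is actually ~5x faster depends on vectorization; a tuned
# matrix-matrix kernel (as full-batch L-BFGS requires) is the fair baseline,
# which is what the question above is getting at.
```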
To me, it is just a random feature expansion. It is actually very similar to doing kernel approximation with kitchen sinks, only without a theoretical motivation. But I am worried about crediting the ELM authors with using random weights. There has been earlier and better work, and ELM is mostly published in "barely peer-reviewed" journals.
We are now adding it to scikit-learn, because it is simple and often works. But the 2000+ citation "Extreme learning machine" paper has very faulty math and I don't like to give any credit to it. Theorem 2.1 says "we will now prove that this one-dimensional vector doesn't lie in any N-1 dimensional subspace". The statement of Theorem 2.2 is "any epsilon > 0 is positive".
See for yourself: http://www.di.unito.it/~cancelli/retineu11_12/ELM-NC-2006.pdf
I really don't have a problem with "no theoretical justification". Imho, random
forests don't have a decent one. I'm not sure why people trust them more than
neural nets. What I dislike is publishing clearly made-up mathematics, pretending
to do science, and people citing it. That is clearly unethical, as you say.
In the literature, many models such as CNNs and random forests do not have rigorous theoretical justification; ELM's universal approximation and classification capabilities, however, have been proven rigorously: G.-B. Huang, L. Chen and C.-K. Siew, "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006. More proofs have been made for its variants, such as Random Kitchen Sinks.