this post was submitted on 04 May 2015
45 points (87% upvoted)
shortlink: http://redd.it/34u0go
/r/MachineLearning (38,155 readers)
all 27 comments
sorted by: best
[–] rcwll 19 points 1 month ago
[…] of research that extends well beyond ELMs and is definitely worth knowing about.
Simple is not stupid; in fact, one could argue that backprop is stupid if it's unnecessarily complex. Most current research into ANNs does not respect the biological constraints of real neural networks under evolutionary pressure. Backprop is simply not biologically plausible.
The idea behind ELM goes back to the 1950s (see Wigner matrices, and also random representations, random projections, sketches, SDM, compressed sensing, random weights, Monte Carlo methods applied to neural networks, correlation matrix memories, and correlation in real neural networks). It's not just faster; you can also build deeper models with the same amount of resources. The SVD does the bulk of the work. There is a long thread of experimental evidence suggesting that something similar is going on in the nervous system.
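To make the basic recipe concrete, here is a minimal sketch (not from the thread; all names and hyperparameters are illustrative) of an ELM-style model in the sense described above: a fixed random hidden layer, with only the linear readout solved, e.g. by a regularized least-squares / SVD-type solve.

```python
import numpy as np

def train_elm(X, Y, n_hidden=500, reg=1e-3, seed=0):
    """Minimal ELM-style sketch: fixed random hidden layer, learned linear readout.

    X: (n_samples, n_features) inputs; Y: (n_samples, n_outputs) targets.
    Only beta (the readout) is learned; W and b stay random and untouched.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random biases
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    # Ridge-regularized least squares for the readout: a single linear solve
    # (equivalently, the SVD / pseudo-inverse of H does the bulk of the work).
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```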
The ELM author (Guang-Bin Huang) brought up this parable in his talk (in funny English): "Say you want to fill up a Chinese lake with rocks until you get a horizontal surface (but filled with stones instead of water), and you can see the bottom of the empty lake; it's a curve (a function). There are two types of people who might try to fill the lake: an engineer and a Chinese farmer. The engineer tries to carefully calculate the size of the lake and the sizes of the stones needed to fill it. He thinks: how many stones do I need to bring - 100, 500, or 1000 (hidden nodes)? And how would each stone fill a certain portion of the lake without "spilling" beyond the lake's boundaries? Each stone's size needs to be carefully tuned, and the engineer spends a lot of time trying to adjust his cohort of stones to fill the lake optimally. Now a Chinese farmer comes along, who knows nothing about calculations; he looks at the nearby Himalayan mountain, decides to blow up the mountain, and creates a whole bunch of rocks of random sizes. Then he picks up a random number of rocks of different sizes and starts throwing them all over the lake until it is full. Now, when the farmer picks up a stone (hidden neuron), does he need to know the size of the lake or the size of the stones? No - he just uses a random generator and spreads the rocks all over. So which method is faster? The engineer needs to calculate the dimensions of the lake, the height of each stone, and the number of stones, and make sure the aggregate of the multiple hidden neurons matches the boundaries of the lake. The farmer, on the other hand, doesn't care: if a large stone sticks up above the surface of the lake he simply smashes it with a hammer (the beta parameter), the number of rocks - 100 or 1000 - doesn't matter much either, and he gets the job done much faster."
I agree, it's not 'stupid' if it works; that's why I kept putting the word 'stupid' in scare quotes.
Though I'm not sure I agree that ELMs are simple, either: they do use a very large number of hidden neurons, and regularizing the second layer is very important to getting good generalization out of them, which to me implies that the first layer is in fact quite complex!
And yes, the theory behind them goes back much farther than the ELM; I suppose I should have specified that the '92 paper is the first one I know of that casts it in the form of a neural network problem, and it is an almost exact match to ELMs. They discuss setting the weights either to completely random values (as in the initial ELM) or to a random subsample of the data set (which is similar to some of the later extensions to ELMs); in either case they solve the resulting linear problem directly and show that the performance is competitive with standard backprop. It's hardly a new idea, at any rate.
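As a small illustration of the two weight-choosing schemes mentioned above (purely random values vs. a random subsample of the data), here is a hedged sketch; the function names are mine, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hidden_weights(n_features, n_hidden):
    # Scheme 1: completely random first-layer weights (as in the initial ELM).
    return rng.standard_normal((n_features, n_hidden))

def subsampled_hidden_weights(X, n_hidden):
    # Scheme 2: first-layer weights copied from a random subsample of the data,
    # so each hidden unit responds most strongly to inputs near one example.
    idx = rng.choice(X.shape[0], size=n_hidden, replace=False)
    return X[idx].T

# In either case only the linear readout on top of the hidden layer is solved
# (e.g. with np.linalg.lstsq), as in the ELM sketch earlier in the thread.
```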
With respect to the deeper models, are you referring to the stacked ELM-AEs, or
is there some other method?
I've tried to clarify the point above. See Rolls' text on biologically plausible
learning.
As far as I understand, the hashing trick in Vowpal Wabbit applies to feature names, not values.
The feature values get their value multiplied by +/-1 and added together if the
hashes (edit: hashes of the feature names, I should say) collide and both
features are populated for the given example. You can look at it as a sparse
random sign matrix; but it's always completely independent of the data, hence
the comparison to ELMs.
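For readers who haven't seen it, here is a rough sketch of the hashing trick as described in that comment (illustrative only, not Vowpal Wabbit's actual implementation): each feature name is hashed to a bucket and a +/-1 sign, and values whose names collide are summed.

```python
import numpy as np

def hash_features(named_values, n_buckets=2**10):
    """Hashing-trick sketch: map {feature_name: value} to a fixed-size vector.

    Each name is hashed to a bucket index and a +/-1 sign; values that land in
    the same bucket get added together. This is equivalent to applying a sparse
    random sign matrix that is completely independent of the data.
    """
    x = np.zeros(n_buckets)
    for name, value in named_values.items():
        h = hash(name)                      # stand-in for a real hash such as MurmurHash
        sign = 1.0 if (h >> 1) & 1 else -1.0
        x[h % n_buckets] += sign * value
    return x

# Two features whose names hash to the same bucket are combined with +/-1 signs:
x = hash_features({"word=extreme": 1.0, "word=learning": 1.0, "bias": 1.0})
```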
Yes, sorry. The way I wrote it was not clear; should have said "The
feature values get their value multiplied by +/-1 and added together if the
Really interesting how things get rediscovered over time. I thought everyone knew
about the hashing trick, but I guess that's my personal bias.
I think he's acknowledging their usefulness in SVM contexts, but asking why, in that case, you wouldn't just use an SVM.
The reason you wouldn't just use an SVM is to avoid building the full kernel
matrix. When you have lots of data this is really expensive, but using random
features means you can easily train kernelized SVMs with SGD.
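A hedged sketch of that approach, in the spirit of Rahimi and Recht's random features / "kitchen sinks" (the dimensions, bandwidth, and function names below are illustrative): map the data through random features once, then train an ordinary linear SVM with SGD on the mapped data, never forming the n x n kernel matrix.

```python
import numpy as np

def rff_map(X, n_features=1000, gamma=1.0, seed=0):
    """Random Fourier features approximating the RBF kernel exp(-gamma * ||x - y||^2).

    After this map, a plain linear classifier trained with SGD (e.g. hinge loss)
    behaves approximately like a kernelized SVM, without the full kernel matrix.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# e.g. Z = rff_map(X); then fit a linear SGD classifier with hinge loss on (Z, y).
```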
Can you post a link to the source? Google doesn't seem to know it (yet). The quote
doesn't sound like it's from a private email.
An ELM is basically a 2-layer neural net in which the first layer is fixed and random,
and the second layer is trained. There is a number of issues with this idea.
I don't know if Yann's message predates the IEEE ELM paper. It shows ML-ELM (stacked ELM-AEs and one ELM) getting 0.97% error on PI-MNIST after 444 seconds of CPU training.
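For context, here is a rough, heavily hedged sketch of the stacked ELM-autoencoder (ELM-AE) idea behind ML-ELM as I understand it; published versions differ in details (e.g. orthogonalizing the random weights), and all names below are illustrative:

```python
import numpy as np

def elm_ae_layer(X, n_hidden, reg=1e-3, seed=0):
    """One ELM-AE layer: random hidden weights, least-squares output weights
    trained to reconstruct the input; the transposed output weights then act
    as the (fixed) transformation feeding the next layer."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)  # H @ beta ~ X
    return np.tanh(X @ beta.T)  # representation passed to the next layer

# Stacking a few such layers and training an ordinary ELM classifier on the last
# representation is, roughly, the ML-ELM referred to above.
```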
https://www.facebook.com/yann.lecun/posts/10152872571572143
Attached please find a new paper entitled "What are Extreme Learning Machines? Filling the Gap between Frank Rosenblatt's Dream and John von Neumann's Puzzle". Although we have had these thoughts in mind for several years, we did not plan to publish them so early. In case you have received a malign and personal attack letter which has been circulating recently, it may be necessary to share this paper with you as a timely response to that malign and personal attack. I would feel deeply sorry for any inconvenience caused.
As many of you know, someone has been working very hard to destroy ELM research work over the past years. Recently, two or three researchers have also broadcast a malign and personal attack letter.
This malign letter basically stated that ELM is the same as RVFL, RBF networks, LS-SVM, etc., without referring to them. This is misleading, in the sense that it tells other researchers that RVFL, RBF networks, LS-SVM, etc. are all the same as one another. The differences and relationships between ELM and these earlier works have been discussed in ELM papers as early as 2007.
This malign letter was originally submitted to an IEEE journal. We actually encouraged the journal to accept it (after removing the malign sentences and keeping the technical content) for publication together with a response from the ELM community, but the authors of the letter did not agree to let the journal publish a response from the ELM community. It is normal practice to allow a response; the journal made a great effort and agreed to accept their paper, but to publish it alongside a response from the ELM community. After the anonymous authors read our response, they withdrew the letter at the last minute, as they knew they would not be able to fool other researchers if it were published alongside a response from the ELM community. They then circulated it, unfortunately misusing other researchers' names, including that of the respected and renowned pioneer Yoh-Han Pao, who passed away several years ago, to thousands of researchers, various societies, and journal editors. The objective is clear: it is not professional discussion, but simply an attempt to damage others and destroy a harmonious academic and research environment. They even made rude requests and asked journals not to accept ELM papers any more. Of course, their objectives will not be achieved, as we believe that research works are evaluated by professional justification rather than by malign letters. True researchers would not give up seeking the truth of nature even under such threats. Otherwise, we would live in a horrible academic and research environment, and every researcher could become a target to malign and threaten.
Thus, in response to this malign attack letter and abnormal behavior, I was invited by numerous researchers to write a paper explaining the overall ideas and philosophy behind ELM. It also explains why ELM resembles biological learning. I tried my best to write it in a positive way, and I hope that any discussion of academic research can be conducted in a peaceful and professional manner. Different opinions, academic criticism, and professional controversy are indeed welcome; then we can brainstorm and stay motivated to discover the truth of nature.
In the future, you may wish to add a list of non-preferred reviewers when you submit a paper related to ELM. If you run into unfair comments, you may refer to the attached paper to support your work if necessary.
(Please also find attached the other two papers, which show how ELM works in multi-layer cases, in contrast to the single-hidden-layer cases mentioned in the anonymous malign letter.)
1. What are Extreme Learning Machines? Filling the Gap between Frank Rosenblatt's Dream and John von Neumann's Puzzle
2. Extreme Learning Machine for Multilayer Perceptron
3. Local Receptive Fields Based Extreme Learning Machine
Welcome to Reddit! You can't attach files to comments here. You have to submit
http://www.reddit.com/r/MachineLearning/comments/3500xx/what_are_extreme_le
When you say that bit-float multiplication is 5 times faster, you are comparing to a naive implementation, right?
How many bit-float multiplications do you do per second (when you do matrix-matrix products)?
You use full-batch L-BFGS, so you must need a matrix-matrix product (which, for float32-float32 products, is compute-bound for good implementations). Doing a matrix product as a sequence of matrix-vector products would be much slower.
Are you saying that the above changes with bit-float32 products?
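To make the "bit-float" question concrete, here is a purely illustrative sketch (an assumption on my part, not anyone's actual implementation): with weights constrained to +/-1 and stored as bits, a dot product against a float vector reduces to signed additions, which is where a speedup over a naive multiply-accumulate loop would come from.

```python
import numpy as np

def bit_float_dot(sign_bits, x):
    # sign_bits: boolean array, True meaning weight +1 and False meaning -1.
    # Each bit only selects whether to add or subtract x[i]; no multiplies needed.
    return x[sign_bits].sum() - x[~sign_bits].sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
bits = rng.random(1024) < 0.5
w = np.where(bits, 1.0, -1.0).astype(np.float32)
assert np.isclose(bit_float_dot(bits, x), w @ x, atol=1e-3)

# Whether this is actually ~5x faster depends on vectorization; a tuned
# matrix-matrix kernel (as full-batch L-BFGS requires) is the fair baseline,
# which is what the question above is getting at.
```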
To me, it is just a random feature expansion. It is actually very similar to doing kernel approximation with kitchen sinks, only without a theoretical motivation. But I am worried about crediting the ELM authors with using random weights. There has been earlier and better work, and ELM is mostly published in "barely peer-reviewed" journals.
We are now adding it to scikit-learn, because it is simple and often works. But the 2000+ citation "Extreme learning machine" paper has very faulty math and I don't like to give any credit to it. Theorem 2.1 says "we will now prove that this one-dimensional vector doesn't lie in any N-1 dimensional subspace". The statement of Theorem 2.2 is "any epsilon > 0 is positive".
See for yourself: http://www.di.unito.it/~cancelli/retineu11_12/ELM-NC-2006.pdf
I really don't have a problem with "no theoretical justification". Imho, random
forests don't have a decent one. I'm not sure why people trust them more than
neural nets. What I dislike is publishing clearly made-up mathematics, pretending
to do science, and people citing it. That is clearly unethical, as you say.
In the literature, many models such as CNNs and random forests do not have rigorous theoretical justification; ELM's universal approximation and classification capabilities, however, have been proven rigorously: G.-B. Huang, L. Chen and C.-K. Siew, "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006. More proofs have been made for its variants, such as Random Kitchen Sinks.