
Large-Scale Machine Learning

John Langford
Microsoft Research

Yann LeCun
Courant Institute

What is Data Science?
Data Science: automatically extracting knowledge from data
Mathematics & Statistics
Machine Learning
Domain Expertise
Applications in Business
Lots and lots
Applications in the Sciences
Astronomy, Cosmology
High-energy Physics
Biology, Genomics
Neuroscience
The Social Sciences

Medicine
Government

[Figure: Venn diagram with regions labeled Mathematics & Statistics, Machine Learning, Computation, Data Science, conventional research, Danger Zone!, and Domain Expertise; after Drew Conway's Data Science Venn Diagram]

Large Scale Machine Learning
Class website:
http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:start
http://cilvr.cs.nyu.edu > courses > big data
Forum, discussion, Q&A on Piazza
https://piazza.com/class#spring2013/csciga3033002
Evaluation:
Programming assignments
Project
Final exam
Computing infrastructure
100-node cluster, 8 CPUs/node, Hadoop (donated by Yahoo! Labs)
Software
Torch: http://www.torch.ch/
Vowpal Wabbit:
https://github.com/JohnLangford/vowpal_wabbit/wiki

Big Data?
Data often comes in the form of a table
N: dimension of each vector (possibly very sparse)
T: number of training samples (possibly infinite)
Big Data is large T, or large N, or both
Large T, small N: great!
Infinite T, small N: on-line / streaming
Small T, large N: hell!
Problems:
(distributed) data storage and access
can't use algorithms that are super-linear in T
Large N: overfitting
Parallelizing
Dealing with unbalanced data sets
Representing high-dimensional data
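
A minimal sketch of the "infinite T" streaming regime and of the constraint that cost must stay linear in T: each sample is seen once, updated on, and discarded. Rows are stored as sparse index-to-value maps, so the work per sample depends on the number of non-zero features rather than on N. The synthetic stream, the function names, and the learning rate are all assumptions made for illustration; this is plain logistic regression trained by stochastic gradient descent, not the method of any particular tool named in these slides.

```python
import itertools
import math
import random
from typing import Dict, Iterable, Iterator, Tuple

# Sparse row: feature index -> value. Indices that are absent are zero.
SparseVec = Dict[int, float]


def sample_stream(n_features: int, seed: int = 0) -> Iterator[Tuple[SparseVec, int]]:
    """Endless synthetic stream (hypothetical data, for illustration only)."""
    rng = random.Random(seed)
    while True:
        # Each sample has only 5 non-zero entries out of n_features.
        idx = rng.sample(range(n_features), 5)
        x = {i: 1.0 for i in idx}
        # Label is 1 when low-index features outnumber high-index ones.
        score = sum(1.0 if i < n_features // 2 else -1.0 for i in idx)
        yield x, (1 if score > 0 else 0)


def predict(w: SparseVec, x: SparseVec) -> float:
    """Probability of label 1 under a logistic model with sparse weights."""
    z = sum(w.get(i, 0.0) * v for i, v in x.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def online_logistic_sgd(
    stream: Iterable[Tuple[SparseVec, int]], n_samples: int, lr: float = 0.1
) -> SparseVec:
    """One pass over the stream: time linear in the number of samples."""
    w: SparseVec = {}
    for x, y in itertools.islice(stream, n_samples):
        g = predict(w, x) - y  # gradient of the log loss w.r.t. the score
        for i, v in x.items():  # touch only the non-zero features
            w[i] = w.get(i, 0.0) - lr * g * v
    return w


if __name__ == "__main__":
    weights = online_logistic_sgd(sample_stream(1000, seed=1), n_samples=20000)
    held_out = list(itertools.islice(sample_stream(1000, seed=2), 2000))
    correct = sum((predict(weights, x) > 0.5) == (y == 1) for x, y in held_out)
    print(f"held-out accuracy: {correct / len(held_out):.3f}")
```

Memory use is bounded by the number of distinct features seen, not by T, which is why this style of update suits a stream that never ends.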


Syllabus
Intro
Online Linear learning
2nd order optimization methods
LBFGS
Online Non-linear learning
Boosted Decision Trees
Hadoop, Allreduce
Parallel learning, OpenMP, CUDA
Inverted Indices & Predictive Indexing
Hashing, LSH, linear/non-linear dimensionality reduction
Feature Learning, deep learning
Many Classes
Active Learning
Exploration and Learning