Online Learning With Stream Mining

Mikio L. Braun, @mikiobraun http://blog.mikiobraun.de TWIMPACT http://twimpact.com
Machine Learning Meetup San Francisco, April 24, 2013 (c) 2013 by TWIMPACT
Event Data
Finance
Gaming
Monitoring
Advertisement
Sensor Networks
Social Media
Attribution: flickr users kenteegardin, fguillen, torkildr, Docklandsboy, brewbooks, ellbrown, JasonAHowie
Online Learning
Isn't all learning online?

Isn't Machine Learning easily Online?
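Stochastic gradient descent: $w_{t+1} = w_t - \gamma_t \nabla_w \ell(w_t; x_t, y_t)$
converges, e.g. if $\sum_t \gamma_t = \infty$ and $\sum_t \gamma_t^2 < \infty$
http://leon.bottou.org/research/stochastic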
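A minimal sketch of stochastic gradient descent on a least-squares problem, processing one example at a time. The function names, the synthetic data stream, and the 1/t-style step-size schedule are assumptions made for illustration; they are not taken from the talk.

```python
import random

def sgd_least_squares(stream, dim, gamma0=0.1):
    """Fit w so that dot(w, x) ~ y, seeing each (x, y) exactly once."""
    w = [0.0] * dim
    for t, (x, y) in enumerate(stream):
        gamma = gamma0 / (1.0 + 0.01 * t)   # decreasing step size (assumed schedule)
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        # The gradient of 0.5 * err**2 with respect to w is err * x.
        w = [wi - gamma * err * xi for wi, xi in zip(w, x)]
    return w

def make_stream(n, true_w, noise=0.01, seed=0):
    """Synthetic stationary stream: y = dot(true_w, x) + small Gaussian noise."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0, noise)
        yield x, y

w = sgd_least_squares(make_stream(20000, [2.0, -1.0]), dim=2)
print(w)   # close to [2.0, -1.0] on this stationary stream
```

On a stationary stream like this one the shrinking step size is harmless. Once the data drifts, the same schedule makes the learner progressively deaf to change.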
Online vs Batch: Non-stationarities
Very good tutorial by Albert Bifet et al. on these issues at http://sites.google.com/site/advancedstreamingtutorial

Time horizons vs. Learning rate
You can't just do online learning on event data!

Also, Event Data is huge
The problem: You easily get A LOT OF DATA!

100 events per second
360k events per hour
8.6M events per day
260M events per month
3.2B events per year
So, online learning challenges:

So much data!
Concept Drift
Online (as in not batch) is not the whole story.
Digging into Least Squares

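Idea: Use a batch method like least squares on a recent portion of the data.
$\hat w = (X^\top X)^{-1} X^\top y$
$X$ is $n \times d$: this could be huge!
$X^\top y$ is a vector with $d$ entries and $X^\top X$ is $d \times d$: this is probably ok.
But: It's just a sum! $X^\top X = \sum_{i=1}^n x_i x_i^\top$ and $X^\top y = \sum_{i=1}^n x_i y_i$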
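Because both quantities are plain sums over examples, they can be accumulated one event at a time without storing the data. The sketch below keeps a d x d matrix `A` (the sum of x x^T) and a length-d vector `b` (the sum of x y) and solves for the weights on demand. All names are illustrative, and the small Gaussian-elimination solver only exists to keep the example free of dependencies.

```python
def new_stats(d):
    """Running sums: A accumulates x x^T (d x d), b accumulates x * y (d entries)."""
    return {"A": [[0.0] * d for _ in range(d)], "b": [0.0] * d}

def add_example(stats, x, y):
    """Add one (x, y) pair to the running sums."""
    d = len(x)
    for j in range(d):
        stats["b"][j] += x[j] * y
        for k in range(d):
            stats["A"][j][k] += x[j] * x[k]

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular; no regularization is applied."""
    d = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, d):
            f = M[r][col] / M[col][col]
            for c in range(col, d + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * d
    for r in reversed(range(d)):
        s = sum(M[r][c] * w[c] for c in range(r + 1, d))
        w[r] = (M[r][d] - s) / M[r][r]
    return w

stats = new_stats(2)
for x, y in [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]:
    add_example(stats, x, y)
w = solve(stats["A"], stats["b"])
print(w)   # the data satisfy y = 2*x1 - x2 exactly
```

Memory use depends only on d, not on the number of events seen.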
Another Problem: High-dimensional Spaces
Potentially large spaces:

distinct words: >100k
IP addresses: >100M
users in a social network: >10M
http://wordle.net
http://www.flickr.com/photos/arenamontanus/269158554/
Stream Mining to the rescue
Stream mining algorithms: answer stream queries with finite resources
Typical examples:
how often does an item appear in a stream?
how many distinct elements are in the stream?
what are the top-k most frequent items?
[Diagram: Continuous Stream of Data -> Bounded Resource Analyzer -> Stream Queries]
The Trade-Off
[Diagram: trade-off between Big Data, Fast, and Exact, locating Stream Mining and Map Reduce and friends]
First seen here: http://www.slideshare.net/acunu/realtime-analytics-with-apache-cassandra
Heavy Hitters (a.k.a. Top-k)
Count activities over large item sets (millions or more, e.g. IP addresses, Twitter users).
Interested in the most active elements only.
Fixed tables of counts:
Case 1: element already in database (e.g. 142): increment its count.
Case 2: new element (e.g. 713): it replaces the entry with the smallest count (023, count 2) and takes over that count plus one (3).
Metwally, Agrawal, Abbadi, Efficient Computation of Frequent and Top-k Elements in Data Streams, International Conference on Database Theory, 2005
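The two cases can be sketched as a small fixed-capacity table in the spirit of the Space-Saving scheme from the paper cited above. The class and method names are made up for this example, and the linear scan for the smallest count is a simplification; the paper describes a dedicated data structure that avoids it.

```python
class SpaceSaving:
    """Fixed-size table of counts for approximate top-k (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def update(self, item, count=1):
        if item in self.counts:                    # Case 1: already in the table
            self.counts[item] += count
        elif len(self.counts) < self.capacity:     # table not yet full
            self.counts[item] = count
        else:                                      # Case 2: new element
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)        # evict the smallest count ...
            self.counts[item] = floor + count      # ... and inherit it

    def top(self, k):
        """Return the k entries with the largest (over-)estimated counts."""
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:k]

hh = SpaceSaving(capacity=3)
for item in ["a", "b", "a", "c", "a", "b", "d"]:
    hh.update(item)
print(hh.top(1))
```

Inheriting the evicted count means a stored count can overestimate but never underestimate the true count of an item that is in the table.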
Count-Min Sketches

Summarize histograms over large feature sets.
Like Bloom filters, but better.
[Figure: table of counters with m bins per row and one row for each of n different hash functions]
Update for a new entry: each hash function picks one bin in its row, and that bin is incremented.
Query: take the minimum over all hash functions (query result in the pictured example: 1).

G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. LATIN 2004; J. Algorithms 55(1): 58-75 (2005).
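A minimal count-min sketch, assuming that salting a SHA-1 digest with the row index is an acceptable stand-in for n independent hash functions. The class name, default sizes, and hashing scheme are choices made for this example, not details from the talk or the paper.

```python
import hashlib

class CountMinSketch:
    """n rows of m counters; each row uses its own hash function."""

    def __init__(self, n_hashes=4, m_bins=1024):
        self.n, self.m = n_hashes, m_bins
        self.table = [[0] * m_bins for _ in range(n_hashes)]

    def _bins(self, item):
        # One bin per row. Salting the item with the row index stands in
        # for n different hash functions (an assumption of this sketch).
        for row in range(self.n):
            digest = hashlib.sha1(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.m

    def update(self, item, count=1):
        for row, col in self._bins(item):
            self.table[row][col] += count

    def query(self, item):
        # Collisions only ever add to a bin, so every row overestimates
        # and the minimum over rows is the tightest estimate available.
        return min(self.table[row][col] for row, col in self._bins(item))

cms = CountMinSketch()
for word in ["a"] * 5 + ["b"] * 2:
    cms.update(word)
print(cms.query("a"), cms.query("b"))
```

The estimate is never below the true count. How far it can exceed it depends on the number of bins and hash functions relative to the stream.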
Clustering with count-min Sketches
Online clustering
For each data point:

Map to closest centroid (compute distances)
Update centroid
Use count-min sketches to represent the sum over all vectors in a class.
Aggarwal, A Framework for Clustering Massive-Domain Data Streams, IEEE International Conference on Data Engineering, 2009
Heavy Hitters over Time-Window
[Diagram: time window over events stored in a DB]
Keep quite a big log (a month?)
Constant write/erase in database
Alternative: Exponential decay
Exponential Decay
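Instead of a fixed window, use exponential timestamp decay:
$s(t) = \sum_i c_i \, 2^{-(t - t_i)/h}$, where $s(t)$ is the score at time $t$, $c_i$ is the count added at time $t_i$, and $h$ is the halftime.
The beauty: updates are recursive. For a new count $c'$ arriving at time $t' \ge t$:
$s(t') = 2^{-(t' - t)/h} \, s(t) + c'$, where $2^{-(t' - t)/h}$ is the time shift term.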
Exponential Decay
Collect stats by a table of expdecay counters

    counters[item]   # counters
    ts[item]         # last timestamp

    update(C, item, timestamp, count)   # update counts
        C.counters[item] = count + weight(timestamp, C.ts[item]) * C.counters[item]
        C.ts[item] = timestamp
        C.lastupdate = timestamp

    score(C, item)   # return score
        return weight(C.lastupdate, C.ts[item]) * C.counters[item]
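A runnable sketch of such a counter table. The decay function `weight` is not spelled out in the surviving text, so this example assumes a halving of the weight every `halftime` time units. It also assumes timestamps arrive in non-decreasing order and treats unseen items as having score zero.

```python
class ExpDecayCounters:
    """Table of exponentially decayed counters (illustrative sketch)."""

    def __init__(self, halftime):
        self.halftime = halftime
        self.counters = {}      # decayed count per item, as of ts[item]
        self.ts = {}            # timestamp of the item's last update
        self.lastupdate = None  # timestamp of the most recent update overall

    def weight(self, t_new, t_old):
        # Assumed decay: the weight halves every `halftime` time units.
        return 2.0 ** (-(t_new - t_old) / self.halftime)

    def update(self, item, timestamp, count=1.0):
        if item in self.counters:
            decayed = self.weight(timestamp, self.ts[item]) * self.counters[item]
        else:
            decayed = 0.0
        self.counters[item] = count + decayed
        self.ts[item] = timestamp
        self.lastupdate = timestamp

    def score(self, item):
        if item not in self.counters:
            return 0.0
        return self.weight(self.lastupdate, self.ts[item]) * self.counters[item]

c = ExpDecayCounters(halftime=10.0)
c.update("x", 0.0)
c.update("x", 10.0)    # counter for x becomes 1 + 0.5 * 1 = 1.5
c.update("y", 20.0)    # moves lastupdate to 20
print(c.score("x"), c.score("y"))
```

Only the touched item is rewritten on each update; every other counter is brought up to date lazily when it is scored.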
Least Squares Revisited

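Need to compute $X^\top X$ and $X^\top y$ from exponentially decayed counters.
For each example $(x_i, y_i)$ arriving at time $t_i$ do

    update(C, (j, k), t_i, x_ij * x_ik)   for all j, k
    update(C, j, t_i, x_ij * y_i)         for all j

Then, reconstruct $X^\top X$ and $X^\top y$ from the scores.
As a reminder: $\hat w = (X^\top X)^{-1} X^\top y$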
More: Maximum-Likelihood
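Estimate probabilistic models, e.g. a Gaussian with mean $\hat\mu = \frac{1}{n} \sum_{i=1}^n x_i$,
based on running sums, with variance $\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^n x_i^2 - \hat\mu^2$,
which is slightly biased, but simpler.
But wait, how do I get 1/n with randomly spaced events?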
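One possible answer, sketched under assumptions: decay the event count with the same weights as the sums, so that dividing one decayed quantity by the other gives a weighted average however irregularly the events arrive. The class below tracks a decayed count, a decayed sum, and a decayed sum of squares. The names and the power-of-two decay are choices made for this example.

```python
class DecayedMoments:
    """Exponentially decayed estimates of mean and variance (illustrative)."""

    def __init__(self, halftime):
        self.halftime = halftime
        self.n = 0.0     # decayed number of events (plays the role of n)
        self.sx = 0.0    # decayed sum of x
        self.sxx = 0.0   # decayed sum of x**2
        self.t = None    # time of the last update

    def update(self, x, t):
        if self.t is not None:
            w = 2.0 ** (-(t - self.t) / self.halftime)
            self.n, self.sx, self.sxx = w * self.n, w * self.sx, w * self.sxx
        self.n += 1.0
        self.sx += x
        self.sxx += x * x
        self.t = t

    def mean(self):
        return self.sx / self.n

    def variance(self):
        # 1/n-style estimate: slightly biased, but simple.
        m = self.mean()
        return self.sxx / self.n - m * m

slow = DecayedMoments(halftime=1e9)   # practically no forgetting
slow.update(1.0, 0.0)
slow.update(3.0, 5.0)
print(slow.mean(), slow.variance())   # approximately 2.0 and 1.0

fast = DecayedMoments(halftime=1.0)   # forgets quickly
fast.update(0.0, 0.0)
fast.update(10.0, 10.0)
print(fast.mean())                    # dominated by the recent value
```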
Outlier detection
Once you have a model, you can compute p-values (based on recent time frames!)
TF-IDF
estimate word document frequencies
for each word: update(word, t, 1.0)
for each document: update(#docs, t, 1.0)
query: score(word) / score(#docs)
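The recipe above can be sketched with a pair of dictionaries acting as decayed counters. Everything here is an assumption made for illustration: the one-hour halftime, the choice to count each word once per document, an explicit `now` argument to `score`, and the use of log(1/df) as the inverse document frequency.

```python
import math

HALFTIME = 3600.0            # seconds; an arbitrary choice for this sketch
counters, stamps = {}, {}

def update(key, t, count):
    """Decay the stored count to time t, then add the new count."""
    old = counters.get(key, 0.0) * 2.0 ** (-(t - stamps.get(key, t)) / HALFTIME)
    counters[key] = old + count
    stamps[key] = t

def score(key, now):
    """Decayed count of key as of time `now`."""
    if key not in counters:
        return 0.0
    return counters[key] * 2.0 ** (-(now - stamps[key]) / HALFTIME)

def add_document(words, t):
    for word in set(words):      # document frequency: count each word once
        update(word, t, 1.0)
    update("#docs", t, 1.0)

def idf(word, now):
    df = score(word, now) / score("#docs", now)   # decayed document frequency
    return math.log(1.0 / df) if df > 0 else float("inf")

add_document(["stream", "mining"], 0.0)
add_document(["stream", "learning"], 0.0)
print(idf("stream", 0.0), idf("mining", 0.0))
```

Both the numerator and the denominator decay at the same rate, so the ratio reflects recent documents rather than the entire history.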
Extracting a relevant subset
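Classification with Naive Bayes
Naive Bayes is also just counting, right?

Multinomial naive Bayes:
$\hat c(d) = \arg\max_c \Big[ \log p(c) + \sum_w f_w \log \frac{N_{cw}}{N_c} \Big]$
$f_w$: frequency of word $w$ in document $d$
$N_{cw}$: number of times word $w$ appears in class $c$
$N_c$: total number of words in class $c$
$p(c)$: class priors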
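A minimal multinomial naive Bayes classifier that learns purely by counting, one document at a time. The class name and the Laplace smoothing term `alpha` are additions of this sketch, included so that unseen words do not produce log(0). The counts could equally be held in decayed counters to track drift.

```python
import math
from collections import defaultdict

class MultinomialNB:
    """Naive Bayes from running counts (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha   # Laplace smoothing (an addition of this sketch)
        self.word_counts = defaultdict(lambda: defaultdict(float))  # word-in-class counts
        self.total_words = defaultdict(float)   # total words per class
        self.class_docs = defaultdict(float)    # documents per class, for the priors
        self.vocab = set()

    def update(self, words, label):
        self.class_docs[label] += 1.0
        for w in words:
            self.word_counts[label][w] += 1.0
            self.total_words[label] += 1.0
            self.vocab.add(w)

    def predict(self, words):
        n_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for c in self.class_docs:
            s = math.log(self.class_docs[c] / n_docs)          # log prior
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            for w in words:   # repeated words contribute once per occurrence
                s += math.log((self.word_counts[c].get(w, 0.0) + self.alpha) / denom)
            if s > best_score:
                best, best_score = c, s
        return best

nb = MultinomialNB()
nb.update(["cheap", "pills", "cheap"], "spam")
nb.update(["meeting", "tomorrow"], "ham")
print(nb.predict(["cheap", "pills"]), nb.predict(["meeting"]))
```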
Classification with Naive Bayes
ICML 2003
Classification with Naive Bayes
7 Steps to improve NB:

1. Transform TF to log(TF + 1)
2. IDF-style normalization
3. Square length normalization
4. Use complement probability
5. Another log
6. Normalize those weights again
7. Predict linearly using those weights
What about non-parametric methods and Kernel Methods?
Problem here: no real accumulation of information in statistics, e.g. SVMs:
$f(x) = \sum_{i=1}^n \alpha_i k(x_i, x) + b$ is a sum over all elements!
Could still use streamdrill to extract a representative subset.
Streamdrill

Heavy Hitters counting + exponential decay
Instant counts & top-k results over time windows.
Indices!
Snapshots for historical analysis
Beta demo available at http://streamdrill.com, launch imminent
Architecture Overview
REST API
Create a trend
    /1/create/plays/user:song:location?size=1000&timescales=day,hour
Update a word
    /1/update/plays/frank:123123:San+Francisco
Another word (with timestamp)
    /1/update/plays/paul:145323:Berlin?ts=131341354135
Get most played songs for SF or Paul
    /1/query/plays?city=San+Francisco
    /1/query/plays?user=paul
Get score for a word
    /1/query/score/hello
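A small sketch of how a client might assemble these request URLs. The base address is hypothetical, the helper names are made up, and no request is sent. The paths follow the slide text only, so the actual streamdrill API may differ.

```python
from urllib.parse import quote_plus, urlencode

BASE = "http://localhost:8080"   # hypothetical host and port

def update_url(trend, keys, ts=None):
    """URL for updating one entry; keys are joined with ':' as on the slide."""
    path = f"/1/update/{trend}/" + ":".join(quote_plus(str(k)) for k in keys)
    return BASE + path + (f"?ts={ts}" if ts is not None else "")

def query_url(trend, **filters):
    """URL for querying a trend, filtered by the given key values."""
    return f"{BASE}/1/query/{trend}?" + urlencode(filters)

print(update_url("plays", ["frank", 123123, "San Francisco"]))
print(update_url("plays", ["paul", 145323, "Berlin"], ts=131341354135))
print(query_url("plays", city="San Francisco"))
```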
Example: Twitter Stock Analysis
http://play.streamdrill.com/vis/
Trends:

Trend                  Example
symbol:combinations    $AAPL:$GOOG
symbol:hashtag         $AAPL:#trading
symbol:keywords        $GOOG:disruption
symbol:mentions        $GOOG:WallStreetCom
symbol trend           $AAPL
symbol:url             $FB:http://on.wsj.com/15fHaZW

[Diagram: Twitter sends tweets to the Tweet Analyzer, which sends updates to streamdrill; JavaScript accesses streamdrill via REST]
Summary

Doesn't always have to be scaling!
Stream mining: Approximate results with finite resources.
streamdrill: stream analysis engine
