
SCALING INSTAGRAM

AIRBNB OPEN AIR SUMMIT 2015

Mike Krieger
INSTAGRAM

co-founder, technical lead

São Paulo, Brazil

photo: Diego Torres Silvestre

Stanford SymSys

photo: Waqas Mustafeez

@mikeyk
mike@instagram.com

LAST TIME I GAVE A TALK AT AIRBNB

it was April 2012

we were 2 years old

2 product guys with no backend experience

@goldenretrieverbailey

we had been acquired the week before

I had not slept much

we had an engineering
team of 4 people

we had about
30 million monthly actives

Taylor Swift CMA

TODAY...

we're 5 years old

sleeping (slightly) more

hired better coders than me

we have an eng
team of 95 people

we have over
300 million monthly actives

Taylor Swift Grammy

THIS TALK

how is Instagram infra different in 2015?

what guides our evolution?

how we adapted to
infra, team, and product changes

ORIGINAL PHILOSOPHY

do the simple
thing first

aka YAGNI

aka Use Boring Technology

boring means
operationally quiet, too

nginx &
redis &
memcached &
postgres &
gearman &
django

2015 EDITION

nginx &
redis &
memcached &
postgres &
gearman &
django

nginx &
cassandra &
memcached &
postgres &
rabbitmq &
django

nginx &
cassandra &
memcached &
postgres &
rabbitmq &
django &
unicorn &
proxygen &
scribe &
thrift

do the simple thing first
until your {scale, team, product} changes

scaling = replacing all components of a car while driving at 100mph

which components to
replace & when

DEEPER DIVE

Async Tasks (site scale)
Code Deployment (team scale)
Search (product scale)

ASYNC TASKS

requests should take < 3s

fan-out delivery to all your followers' feeds

especially popular users

post to external services (e.g. FB & Twitter)

v1: Gearman

async task broker

1 gearman broker
4 app servers
1 async worker box

dead simple to set up

memcached-like in simplicity

got us through 1.5 years of growth

photo: MAMJODH

messy to add/deploy new workers

single core, 60ms mean submission time

1s+ enqueue time under load

8 gearman brokers
400 app servers
12,000+ threads
32 async worker boxes

v2: sharded gearman

BROKERS[node_index % len(BROKERS)]
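
A minimal sketch of the modulo sharding on this slide, assuming each app server has an integer node_index and BROKERS is a static list. The hostnames and the helper moved_after_removal are made up for illustration. It also shows one plausible reason failover is not graceful with this scheme: dropping a broker from the list changes the mapping for most app servers, not only those on the dead broker.

```python
# Hypothetical broker addresses; the talk does not name the real hosts.
BROKERS = ["gearman1:4730", "gearman2:4730", "gearman3:4730", "gearman4:4730"]


def pick_broker(node_index, brokers=BROKERS):
    """Map an app server to one broker by simple modulo sharding."""
    return brokers[node_index % len(brokers)]


def moved_after_removal(node_count, brokers, dead):
    """Count app servers whose broker changes if `dead` is dropped from the list."""
    survivors = [b for b in brokers if b != dead]
    return sum(
        1
        for n in range(node_count)
        if pick_broker(n, brokers) != pick_broker(n, survivors)
    )
```

In this toy setup, removing one of four brokers remaps 300 of 400 app servers.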


no graceful failover

# of app servers growing quickly

persistence was more dangerous than not persisting

simple thing was waking us up & becoming operational burden

operating at new scale

time to move on

your infra

please thank all your soon-to-be-decommissioned infra pieces

basically didn't think about Gearman until we had to

do the simple thing next

roll your own

rewrite gearman

v3: celery and rabbitmq

celery: for much simpler worker code

rabbitmq: low(ish) maintenance

any dev can add async task with one @task decorator

kick off with function.delay()
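
To make the decorator-plus-delay() pattern concrete, here is a toy stand-in built only on the standard library. It is not Celery's implementation: the in-process queue plays the role of the broker, a thread plays the role of a worker box, and fan_out is a hypothetical task. With Celery itself the call site looks the same, function.delay(...), but the work is shipped through RabbitMQ to separate worker processes.

```python
import queue
import threading

_jobs = queue.Queue()  # stand-in for the broker (RabbitMQ in the talk)


def task(func):
    """Toy @task decorator: adds .delay() that enqueues instead of calling."""
    def delay(*args, **kwargs):
        _jobs.put((func, args, kwargs))
    func.delay = delay
    return func


def worker():
    """Stand-in for a worker process: run jobs until a None sentinel arrives."""
    while True:
        job = _jobs.get()
        if job is None:
            break
        func, args, kwargs = job
        func(*args, **kwargs)


delivered = []


@task
def fan_out(photo_id, follower_ids):
    # Hypothetical task: deliver one photo to each follower's feed.
    for follower_id in follower_ids:
        delivered.append((follower_id, photo_id))


def run_demo():
    t = threading.Thread(target=worker)
    t.start()
    fan_out.delay(42, [1, 2, 3])  # returns immediately; the worker does the work
    _jobs.put(None)               # tell the worker to stop after draining
    t.join()
    return delivered
```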

replication + failover + persistence

5ms mean
10ms P90

opportunity to gain both operational & dev efficiency

more details:
http://bit.ly/igcelery

DEPLOYMENT

the art of getting code to prod

v1: fab and git pull

fabric: Python remote scripting

> fab djangos update_git
> fab djangos restart_django

great for 2 engineers

past 12 machines = pain

v2: fab parallel mode to the rescue

> fab -z20 djangos update_git
> fab -z20 djangos restart_django
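
The idea behind a pool size of 20 can be sketched with the standard library: run the same per-host step everywhere, with at most 20 hosts in flight at once. This is an illustration of bounded parallelism, not Fabric's code; deploy_to is a hypothetical placeholder for the real git pull and restart steps.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_to(host):
    """Placeholder for the per-host work (git pull, restart); returns a status."""
    return (host, "ok")


def rollout(hosts, pool_size=20):
    """Run deploy_to on every host, at most pool_size hosts at a time."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(deploy_to, hosts))
```

pool.map returns results in input order, so a failed host is easy to line up against the host list.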

worked up to 70 machines

the year of the GitHub DDOSs

swear it wasn't us deploying

v3: fab rollout

> fab -z20 djangos rollout:server
...doing fresh git fetch
...zipping up origin/master
...uploading to S3
...pulling down zip
...unpacking zip
...mapping 'current' symlink
...restarting Django
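
The 'current' symlink step is the part that makes a release switch nearly instantaneous. A minimal sketch, assuming releases are unpacked into their own directories and the app server runs from a 'current' link. The paths and function names are hypothetical, and the talk does not show how the real rollout task did this.

```python
import os
import tempfile


def point_current_at(release_dir, current_link):
    """Repoint `current_link` at `release_dir` without a window where it is missing."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)  # rename over the old link (atomic on POSIX)


def demo():
    root = tempfile.mkdtemp()
    old = os.path.join(root, "releases", "abc123")
    new = os.path.join(root, "releases", "def456")
    os.makedirs(old)
    os.makedirs(new)
    current = os.path.join(root, "current")
    point_current_at(old, current)
    point_current_at(new, current)
    return os.path.realpath(current) == os.path.realpath(new)
```

Keeping the previous release directory around also makes a rollback the same operation pointed at the older directory.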

lasted us another 1.5 years

IG infra: 2 to 10 eng

hey, can I roll out?
wait! I'm already rolling

v4: enter Sauron
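
The talk does not describe Sauron's internals. Going by the later "grabbing lock from Sauron" step, one piece of it is a deploy lock, so that only one engineer rolls out at a time. A hypothetical sketch of such a lock, using exclusive file creation; DeployLock and its methods are invented names, and a real service would more likely keep this state in a shared store.

```python
import os
import tempfile


class DeployLock:
    """One-holder-at-a-time lock backed by exclusive file creation."""

    def __init__(self, path):
        self.path = path

    def acquire(self, owner):
        """Return True if `owner` got the lock, False if someone else holds it."""
        try:
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as f:
            f.write(owner)
        return True

    def holder(self):
        """Name of the current holder, or None if the lock is free."""
        try:
            with open(self.path) as f:
                return f.read()
        except FileNotFoundError:
            return None

    def release(self, owner):
        """Release only if `owner` is the current holder."""
        if self.holder() == owner:
            os.remove(self.path)
            return True
        return False


def demo():
    lock = DeployLock(os.path.join(tempfile.mkdtemp(), "deploy.lock"))
    first = lock.acquire("alice")   # alice starts rolling
    second = lock.acquire("bob")    # bob has to wait
    return first, second, lock.holder()
```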

lasted us another 1.5 years

v5: scaling institutional knowledge

did you remember to roll to a canary?
don't roll to the workers with a -z of > 40!
did you tail the error logs?
did you catch that new tier we deployed?

> fab -z20 djangos rollout:server
...grabbing lock from Sauron
...doing fresh git fetch
...zipping up origin/master
...uploading to S3
...pulling down zip to canary 1
...unpacking zip on canary 1
...mapping 'current' symlink on canary 1
...restarting Django on canary 1
...tailing error logs on canary 1
...ok, 200 responses are even
...deploying to async worker 1
...measuring success rate on worker 1
...looks good, deploying widely
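
The canary steps above amount to a gate: deploy to one box, compare its health to the rest of the fleet, and only then go wide. A rough sketch under stated assumptions: health is measured as the fraction of 5xx responses, the 1% tolerance is arbitrary, and deploy and sample are hypothetical callables supplied by the caller. The talk only says the 200 rates should be "even".

```python
def error_rate(status_codes):
    """Fraction of responses that are 5xx."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


def canary_ok(canary_codes, baseline_codes, tolerance=0.01):
    """Canary passes if its error rate is within `tolerance` of the baseline's."""
    return error_rate(canary_codes) <= error_rate(baseline_codes) + tolerance


def rollout(hosts, canary, deploy, sample, baseline_codes):
    """Deploy to the canary first; touch the rest only if the canary looks healthy."""
    deploy(canary)
    if not canary_ok(sample(canary), baseline_codes):
        return [canary]  # stop here; a human should look at the logs
    deployed = [canary]
    for host in hosts:
        if host != canary:
            deploy(host)
            deployed.append(host)
    return deployed
```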

hold on, aren't you basically doing continuous deployment, but not?

backend committers++

human lock contention

v5: continuous deployment

extended Sauron with Jenkins integration

take human procedure, automate

deeply understood every step of our deploy

has scaled to 50+ committers on backend codebase

SEARCH

v1: minimize moving parts

SELECT id FROM users WHERE full_name LIKE ...

postgres & search, sittin' in a b-tree

prefix-only, plz
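
A small runnable sketch of prefix-only search, with SQLite standing in for Postgres so it runs anywhere. The schema, index name and escaping helper are assumptions, not the original code. The point is the shape of the pattern: the wildcard sits only at the end, which is what lets an ordered index narrow the scan. In Postgres, whether a b-tree index can serve such a LIKE depends on the collation and operator class in use.

```python
import sqlite3


def build_db(names):
    """In-memory table with an index on full_name (SQLite standing in for Postgres)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
    db.execute("CREATE INDEX users_full_name ON users (full_name)")
    db.executemany("INSERT INTO users (id, full_name) VALUES (?, ?)", names)
    return db


def prefix_search(db, prefix, limit=10):
    """Prefix-only match: the wildcard goes at the end, never the front."""
    # Escape LIKE metacharacters so user input cannot add its own wildcards.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = db.execute(
        "SELECT id FROM users WHERE full_name LIKE ? ESCAPE '\\' "
        "ORDER BY id LIMIT ?",
        (escaped + "%", limit),
    )
    return [row[0] for row in rows]
```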

haystack was pretty small

ok, but Bieber
# ACTUAL CODE :(
CELEBRITY_OVERRIDES = {
    'taylor swift': 19151555,
    'taylorswift': 19151555,
    'justinbieber': 6860189,
    'justin bieber': 6860189,
}

ok, but Selena & Taylor & Harry & Zayn & ...

aka product needs have evolved

v2: Solr

Lucene-based
HTTP/JSON interface
great indexing options

curl -XPUT 'http://solr/update/json' -d '{
  "add": {
    "doc": {
      "username": "justinbieber",
      "followed_by": 12345678
    }
  }
}'

- CELEBRITY_OVERRIDES = {
- 'taylor swift': 19151555,
- 'taylorswift': 19151555,
- 'justin bieber': 6860189
- }

<1 month to transfer over

launch Android

4x the queries

no SolrCloud yet

index twice?
partition by prefix?

scale had changed

v3: ElasticSearch

curl -XPUT 'http://es:9200/users/user/6860189' -d '{
  "username": "justinbieber",
  "followed_by": 12345678
}'

also Lucene based
easy query API
out-of-box cluster support

very simple to set up

in a steady state, worked beautifully

but (at least in 2013) had high operational overhead

split brain

AWS autodiscovery

had to keep queries simple

not enough engineers to fully staff search team

meanwhile, instagration

v4: Unicorn

FB's graph search system

core idea: use social edges as part of the search

// people who I follow named Justin
(and (term justin*)
     (term followedby:4))

// people followed by the people I follow, named Justin
(and (term justin*)
     (apply followedby:(term followedby:4)))

// people named Justin, prioritizing the people I follow
(weak-and (term followedby:4 :optional-hits 2)
          (term justin*))
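
One way to read the first two queries is as plain set operations over posting lists. The sketch below is a toy model of that reading, not Unicorn itself: the follow graph, names and function names are invented, `and` becomes set intersection, and `apply` becomes "run the inner query, then union the followedby lists of every result".

```python
# Toy follow graph: user id -> ids that user follows. All data is invented.
FOLLOWS = {4: {10, 11}, 10: {20, 21}, 11: {21, 22}}
NAMES = {10: "justin a", 11: "maria", 20: "justin b", 21: "justin c", 22: "zoe"}


def term_name_prefix(prefix):
    """(term justin*): ids whose name starts with the prefix."""
    return {uid for uid, name in NAMES.items() if name.startswith(prefix)}


def term_followedby(user_id):
    """(term followedby:N): ids that user N follows."""
    return set(FOLLOWS.get(user_id, set()))


def apply_followedby(inner_ids):
    """(apply followedby: inner): union of followedby:<id> over each inner result."""
    result = set()
    for uid in inner_ids:
        result |= term_followedby(uid)
    return result


def justins_i_follow(me):
    return term_name_prefix("justin") & term_followedby(me)


def justins_followed_by_people_i_follow(me):
    return term_name_prefix("justin") & apply_followedby(term_followedby(me))
```

The weak-and query is left out of this model because it affects ranking rather than membership.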

double-digit % increase in search clicks per daily active

bonus: new Explore photos

v1: most liked, globally

trying to be everything to everyone

v2: photos liked by people I follow

let's get social

// photos I haven't liked, but the people I follow liked
(difference
  (or likedby:friendA likedby:friendB)
  likedby:4)

who I follow is not always who has my taste

// photos I haven't liked yet, liked by people whose photos
// I already liked
(difference
  (apply liker:
    (extract owner: liker:4))
  liker:4)
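
The taste-based query can be modeled with plain sets too. This is a toy illustration with invented data and function names, assuming `extract owner:` maps photos to their owners and `apply liker:` unions the photos liked by each of those owners. It does not reflect how Unicorn executes the query.

```python
# Toy data, invented for illustration: photo id -> owner, photo id -> likers.
OWNER = {100: "ana", 101: "ana", 102: "ben", 103: "cy"}
LIKERS = {100: {"me"}, 101: {"dee"}, 102: {"me", "ana"}, 103: {"ana", "ben"}}


def liked_by(user):
    """liker:<user>: photos that `user` has liked."""
    return {photo for photo, likers in LIKERS.items() if user in likers}


def extract_owner(photos):
    """(extract owner: ...): owners of the given photos."""
    return {OWNER[photo] for photo in photos}


def apply_liker(users):
    """(apply liker: ...): union of liker:<user> over each user."""
    result = set()
    for user in users:
        result |= liked_by(user)
    return result


def explore_candidates(me):
    """Photos liked by owners of photos I liked, minus photos I already liked."""
    mine = liked_by(me)
    return apply_liker(extract_owner(mine)) - mine
```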

6x increase in taps into photos on Explore

http://bit.ly/fbunicorn

TAKEAWAYS

do the simple thing first
until your {scale, team, product} changes

ground your evolution in problem-solving

then do the next simplest thing

get in touch:
mike@instagram.com
