
mixi.jp
scaling out with open source

Batara Kesuma
mixi, Inc.
bkesuma@mixi.co.jp
Introduction
•Batara Kesuma
•CTO of mixi, Inc.
What is mixi?
•Social networking service
• Diary, community, message, review, photo
album, etc.
• Invitation only
•Largest and fastest growing
SNS in Japan
[Screenshot: the mixi home page, showing latest information (friends' new diaries, comment history, community topics, friends' new reviews, friends' new albums), my latest diaries and reviews, a community listing, and user testimonials]
History of mixi
•Development started in
December 2003
• Only 1 engineer (me)
• 4 months of coding
•Opened in February 2004
Two months later
•10,000 users
•600,000 PV/day
The “Oh crap!” factor
•This model works
•But how do we scale out?
The first year
•The online population of mixi
grew significantly
•600 users to 210,000 users
The second year
•210,000 users to 2 million users
And now?
More than 3.7 million users
15,000 new users/day
Population of Japan: 127 million
Internet users: 86.7 million
Source: CIA World Factbook
70% of users are active
(last login within 72 hours)
Average user spends 3
hours 20 minutes on mixi
per week
Ranked 35th on Alexa
worldwide, and 3rd in
Japan
PV growth in 2 years
[Chart: page views over 2 years; mixi's growth curve shown against Google Japan and Amazon Japan]
User growth in 2 years
[Chart: users from 04/03 to 06/03, growing from 0 to 3.5 million]
Our technology
solutions
The technology behind
•Linux 2.6
•Apache 2.0
•MySQL
•Perl 5.8
•memcached
•Squid
[Diagram: requests hit mod_proxy, which serves image requests directly and passes the rest to mod_perl; mod_perl keeps hot objects in memcached and queries the diary, message, and other DB clusters]
Powered by
MySQL
•More than 100 MySQL servers
•Add more than 10 servers/month
•Non-persistent connection
•Mostly InnoDB
•Rely heavily on DB partitioning
(our own solution)
DB replication
•MySQL server load gets heavy
•Add more slaves
[Diagram: mod_perl sends write queries to the master DB; the master replicates to the slave DBs, which serve the read queries]
DB replication
•Classic problem with DB replication:
• Every write is replayed on every slave
• Example: a master taking 50 writes/s and 100 reads/s
• With 2 slaves, each slave serves 50 reads/s but still does 50 writes/s
• With 4 slaves, each slave serves only 25 reads/s, yet still does 50 writes/s
• Adding slaves scales reads, never writes
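The read/write split above can be sketched as a small router (a Python stand-in for the mod_perl logic; handle names are illustrative):

```python
import random

class ReplicatedDB:
    """Routes writes to the master and reads to a random slave."""
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def handle_for(self, query: str):
        # Writes must go to the master; replication fans them out to slaves.
        if query.lstrip().split()[0].upper() in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
            return self.master
        # Reads can be served by any slave.
        return random.choice(self.slaves)
```

This is why adding slaves helps reads only: every write returned by `handle_for` still has to be replayed on each slave by replication.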
Some statistics
•Diary related tables
• Read 85%
• Write 15%
•Message related tables
• Read 75%
• Write 25%
DB partitioning
•Replication couldn’t keep up
anymore
•Try to split the DB
How to split?
[Diagram: one DB holding message tables, diary tables, and other tables for users A, B, and C]
•Splitting vertically by users, or
•Splitting horizontally by table types
Vertical partition
[Diagram: the single DB becomes DB 1 and DB 2; each holds the full set of message, diary, and other tables for a subset of users A–C]
Vertical partition
•Too many tables to deal with at
one time
•The transition in splitting gets
complex and difficult
Horizontal partition
Also called level 1 partitioning within mixi
[Diagram: message tables and diary tables each move from the old DB into a new DB of their own; other tables stay in the old DB]
$dbh = $db->load_dbh();                   # old DB
$dbh = $db->load_dbh(type => "message");  # new message DB
$dbh = $db->load_dbh(type => "diary");    # new diary DB
Partition map for level 1
•Small and static
•Just put it in configuration file
•For example:
$DB_DIARY   = 'DBI:mysql:host=db1;database=diary';
$DB_MESSAGE = 'DBI:mysql:host=db2;database=message';
...
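Because the level 1 map is small and static, a load_dbh-style lookup reduces to a dictionary over the configuration above (Python stand-in; the default DSN for the old DB is illustrative):

```python
# Static level 1 partition map: table type -> DSN (from the config above).
PARTITION_MAP = {
    "diary":   "DBI:mysql:host=db1;database=diary",
    "message": "DBI:mysql:host=db2;database=message",
}
# Hypothetical DSN for the old DB holding everything not yet split out.
DEFAULT_DSN = "DBI:mysql:host=db0;database=main"

def load_dbh(table_type=None) -> str:
    """Return the DSN for a table type; unknown types stay on the old DB."""
    return PARTITION_MAP.get(table_type, DEFAULT_DSN)
```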
Easy transition
1 mod_perl writes to both the old and the new DB
2 Copies existing rows in the background (SELECT from the old DB, INSERT IGNORE into the new DB)
3 Shifts reads to the new DB
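The dual-write transition described above can be sketched like this (Python, with dicts standing in for the two DBs; names are illustrative):

```python
class DualWriteStore:
    """Write to both DBs during migration; read from the old DB until cutover."""
    def __init__(self):
        self.old_db, self.new_db = {}, {}
        self.reads_shifted = False

    def write(self, key, value):                 # step 1: write to both DBs
        self.old_db[key] = value
        self.new_db[key] = value

    def backfill(self):                          # step 2: copy in background
        for key, value in self.old_db.items():
            self.new_db.setdefault(key, value)   # like INSERT IGNORE

    def read(self, key):                         # step 3: shift reads
        return (self.new_db if self.reads_shifted else self.old_db).get(key)
```

INSERT IGNORE is what makes the background copy safe: rows already written by the dual-write path are left untouched.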
Problems with level 1
•Cannot use JOIN anymore
• Use FEDERATED tables from MySQL 5
• Or run two SELECTs, which is faster than
using FEDERATED tables
• If a table is small, just duplicate it
Next step
•When the new DB gets
overloaded
•We split the DB, yet again
•Get ready for level 2
Partitioning key
•user id, message id
•Choose wisely!
[Diagram: message tables for users A and B, partitioned either by user id or by message id]
Level 2 partition
[Diagram: the level 1 message DB, holding message tables for users A–D, is split into node 1 and node 2, each holding the message tables for a subset of those users]
Partition map for level 2
•Big and dynamic
•Cannot put it all in configuration
file
Partition map for level 2
•Manager based
• Use another DB to do the partition
mapping
•Algorithm based
• Partition map is computed inside the
application
• node_id = member_id % TOTAL_NODE
Manager based
1 mod_perl asks the manager DB for the node_id of user_id=14
2 Manager returns node_id=2
3 mod_perl connects to that node
[Diagram: mod_perl, the manager DB, and message-table nodes 1–3]
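A manager-based lookup amounts to one extra query before connecting (Python sketch; a dict stands in for the manager DB, and its contents are illustrative):

```python
# Manager DB contents: user_id -> node_id (values illustrative).
MANAGER_DB = {14: 2, 15: 1, 16: 3}

def node_for_user(user_id: int) -> str:
    node_id = MANAGER_DB[user_id]   # steps 1-2: ask the manager for node_id
    return f"node{node_id}"         # step 3: connect to that node
```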
Algorithm based
1 mod_perl computes node_id = (user_id % 3) + 1 (number of nodes = 3), e.g. node_id=3
2 mod_perl connects to that node
[Diagram: mod_perl and message-table nodes 1–3]
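The algorithm-based map needs no extra query; following the slide's formula (Python stand-in):

```python
TOTAL_NODES = 3  # number of message nodes

def node_for_user(user_id: int) -> str:
    # node_id = (user_id % TOTAL_NODES) + 1, computed in the application.
    return f"node{user_id % TOTAL_NODES + 1}"
```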
Manager based
•Pros:
• Easy to manage
• Add a new node, move data between
nodes
•Cons:
• Every request costs one extra query for the
partition map
• It needs a round trip to the manager
Algorithm based
•Pros:
• Application servers can compute node id
by themselves
• Bypass the connection to the manager
•Cons:
• Difficult to manage
• Adding new nodes is tricky
Adding nodes is tricky
1 Adds new application logic:
old_node_id = (member_id % 2) + 1 (number of nodes = 2)
new_node_id = (member_id % 4) + 1 (number of nodes = 4)
2 Writes to both DBs if old_node_id and new_node_id differ
3 Copies rows in the background
4 Shifts reads to the new nodes
[Diagram: mod_perl writing to nodes 1–4 during the migration from 2 nodes to 4]
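The double-write rule from step 2 can be sketched as (Python; dicts stand in for the four nodes):

```python
OLD_NODES, NEW_NODES = 2, 4
nodes = {i: {} for i in range(1, NEW_NODES + 1)}  # node_id -> rows

def write(member_id: int, row):
    old_id = member_id % OLD_NODES + 1
    new_id = member_id % NEW_NODES + 1
    nodes[new_id][member_id] = row
    if old_id != new_id:   # step 2: double-write only when the node changes
        nodes[old_id][member_id] = row
```

Members whose old and new node coincide (e.g. member_id=5, node 2 under both formulas) need no double write; the rest are written to both until the background copy and read shift complete.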
Problems with level 2
[Diagram: member tables spread over nodes 1–3, community tables over nodes 1–2]
• Too many connections to different DBs
• Fortunately, on mixi, the majority are small data sets
• Cache them all by using distributed memory caching
• We rarely hit the DB
• Average page load time is about 0.02 sec*
* average load time may vary depending on data sets
Caching
•memcached
• Also used by LiveJournal, Slashdot, etc.
•memcached servers run on the mod_perl
machines
•39 machines x 2 GB memory
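The cache-aside pattern mixi describes looks roughly like this (Python; a dict stands in for memcached, and fetch_from_db is a hypothetical loader):

```python
cache = {}  # stand-in for memcached

def fetch_from_db(key):
    # Hypothetical DB loader; in production this would query the
    # partitioned MySQL nodes.
    return f"row-for-{key}"

def get(key):
    if key not in cache:              # cache miss: hit the DB once...
        cache[key] = fetch_from_db(key)
    return cache[key]                 # ...then serve from memory
```

Because nearly every page is served from this layer, the partitioned DBs behind it are rarely touched.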
Summary of DB partitioning
•Level 1 partition (split by table
types)
•Level 2 partition (split by
partitioning key)
• Manager based
• Algorithm based
Summary of DB partitioning
1 Split by table types (level 1): message, diary, and other tables move out of the old DB into their own DBs
2 Split by partitioning key (level 2): each level 1 DB is split again across nodes by user
[Diagram: old DB → level 1 DBs per table type → level 2 nodes per user]
Image Servers
Statistics
•Total size is more than 8 TB of
storage
•Growth rate is about 23 GB / day
•We use MySQL to store
metadata only
Two types of images
•Frequently accessed images
• Number of image files is relatively small
(about a few million files)
• For example, user profile photos,
community logos
•Rarely accessed images
• About hundred millions of image files
• Diary photos, album photos, etc
Frequently accessed images
•Few hundred GBs of files
•Distributed via FTP and Squid
•Third party Content Delivery
Network
Frequently accessed images
1 mod_perl uploads images to the storage servers (sto1.mixi.jp, sto2.mixi.jp)
2 Squid and the CDN pull images from storage and serve them
Rarely accessed images
•Few TBs of files
•Newer files get accessed more
often
•Cache hit ratio is very bad
•Distribute directly from storage
Uploading rarely accessed images
1 Manager DB assigns an id for an image file (abc.gif)
2 Manager arranges a pair of area_ids (area_id=1,2)
3 mod_perl uploads the image to the matching storage servers (sto1–sto4.mixi.jp)
Viewing rarely accessed images
1 User asks for view_diary.pl
2 mod_perl detects abc.gif in view_diary.pl
3 mod_perl asks the manager DB for the image's area_id
4 Manager returns area_id=1
5 mod_perl creates the image URL
6 mod_perl returns view_diary.pl and the URL for abc.gif
7 User asks the storage server (sto1.mixi.jp) for abc.gif
8 Storage returns abc.gif
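Step 5 above (creating the image URL from the area_id) can be sketched as follows (Python; the area_id-to-host mapping and the URL layout are illustrative, not mixi's actual scheme):

```python
# area_id -> storage host (illustrative mapping).
AREA_HOSTS = {1: "sto1.mixi.jp", 2: "sto2.mixi.jp",
              3: "sto3.mixi.jp", 4: "sto4.mixi.jp"}

def image_url(filename: str, area_id: int) -> str:
    """Build the URL the user fetches directly from storage."""
    return f"http://{AREA_HOSTS[area_id]}/{filename}"
```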
To do
•Try MySQL Cluster
•Try to implement a better
algorithm
• Consistent hashing?
• Linear hashing?
•Level 3 partitioning?
• Split again by timestamp?
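Consistent hashing, one of the candidates above, keeps most keys on their current node when a node is added, avoiding the mass migration of the modulo scheme. A minimal sketch (Python; node names illustrative):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each node owns many points on a hash ring; a key maps to the
    first node point at or after the key's own hash (wrapping around)."""
    def __init__(self, nodes, replicas=100):
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(replicas))
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        i = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

Adding a fourth node moves only roughly a quarter of the keys, instead of the three-quarters that `member_id % TOTAL_NODE` reshuffles.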
Questions?
Thank you
•Further questions to
bkesuma@mixi.co.jp
•We are hiring :)
•Have a nice day!
