Mixi JP Scaling Out With Open Source

mixi.jp
scaling out with open source
Batara Kesuma
mixi, Inc.
bkesuma@mixi.co.jp
Introduction
•Batara Kesuma
•CTO of mixi, Inc.
What is mixi?
•Social networking service
• Diary, community, message, review, photo album, etc.
• Invitation only
•Largest and fastest growing SNS in Japan
[Screenshot: mixi home page. Labeled areas: latest information (friends' new diaries, comments history, communities' topics, friends' new reviews, friends' new albums), friends, my latest diaries and reviews, community listing, user testimonials]

History of mixi
•Development started in December 2003
• Only 1 engineer (me)
• 4 months of coding
•Opened in February 2004
Two months later
•10,000 users
•600,000 PV/day
The “Oh crap!” factor
•This model works
•But how do we scale out?
The first year
•The online population of mixi
grew significantly
•600 users to 210,000 users
The second year
•210,000 users to 2 million users
And now?
More than 3.7 million users
15,000 new users/day
Population of Japan: 127 million
Internet users: 86.7 million
Source: CIA Factbook
70% of users are active
(last login less than 72 hours ago)
Average user spends 3 hours 20 minutes on mixi per week
Ranked 35th on Alexa worldwide, and 3rd in Japan
PV growth in 2 years
[Chart comparing page views of Google Japan, mixi, and Amazon Japan]
Users growth in 2 years
[Chart: number of users, 0 to 3,500,000, with time axis marks 04/03, 05/03, 06/03]
Our technology solutions
The technology behind
•Linux 2.6
•Apache 2.0
•MySQL
•Perl 5.8
•memcached
•Squid
[Diagram: a REQUEST reaches mod_proxy, which passes it on to mod_perl or to the image servers; mod_perl uses memcached for HOT OBJECTS and the diary cluster, message cluster, and other cluster]

Powered by
MySQL
•More than 100 MySQL servers
•Add more than 10 servers/month
•Non-persistent connection
•Mostly InnoDB
•Heavily rely on the use of DB
partitioning (our own solution)
DB replication
•MySQL server load gets heavy
•Add more slaves
[Diagram: mod_perl receives a REQUEST and sends QUERY (WRITE) to one DB, which replicates to the other DBs; QUERY (READ) goes to the replicated DBs]

DB replication
•Classic problem with DB replication
[Diagram: with one MASTER and one slave, each server takes 50 writes/s and 50 reads/s, 100 reads/s in total. With one MASTER and three SLAVES, each server still takes 50 writes/s but only 25 reads/s, again 100 reads/s in total]
Some statistics
•Diary related tables
• Read 85%
• Write 15%
•Message related tables

• Read 75%
• Write 25%
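A rough worked example of why these read/write ratios matter for replication. It is a sketch in Python (not the Perl used at mixi) and assumes, purely for illustration, that a read and a write cost the same and that every write is replayed on every server. Neither assumption comes from the slides, and `replication_speedup` is a hypothetical helper.

```python
def replication_speedup(write_fraction: float, servers: int) -> float:
    """Throughput gain over one server when reads are spread across
    `servers` machines but every write is replayed on all of them.
    Assumes reads and writes cost the same (an illustrative simplification)."""
    read_fraction = 1.0 - write_fraction
    # Per-server work: all of the writes plus an equal share of the reads.
    per_server_load = write_fraction + read_fraction / servers
    return 1.0 / per_server_load


for name, write_fraction in [("diary", 0.15), ("message", 0.25)]:
    for servers in (2, 4, 8):
        gain = replication_speedup(write_fraction, servers)
        print(f"{name:8s} {servers} servers -> {gain:.2f}x")
    # With unlimited slaves the gain approaches 1 / write_fraction.
    print(f"{name:8s} upper bound -> {1 / write_fraction:.2f}x")
```

Under these assumptions the message tables (25% writes) can never gain more than 4x from replication alone, however many slaves are added.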
DB partitioning
•Replication couldn’t keep up
anymore
•Try to split the DB
How to split?
[Diagram: one DB holding the message tables, diary tables, and other tables of user A, user B, and user C]
•Splitting vertically by users, or
•Splitting horizontally by table types
Vertical partition
[Diagram: the DB holding the message tables, diary tables, and other tables of user A, user B, and user C is split by users into DB 1 and DB 2]
Vertical partition
•Too many tables to deal with at
one time
•The transition in splitting gets
complex and difficult
Horizontal partition
[Diagram: the message tables and the diary tables each move from the OLD DB to their own NEW DB; the other tables stay in the OLD DB]
$dbh = $db->load_dbh();
$dbh = $db->load_dbh(type => "message");
$dbh = $db->load_dbh(type => "diary");
•Also called level 1 partitioning within mixi
Partition map for level 1
•Small and static
•Just put it in a configuration file
•For example:
$DB_DIARY = 'DBI:mysql:host=db1;database=diary';
$DB_MESSAGE = 'DBI:mysql:host=db2;database=message';
...
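A minimal sketch of how a small, static level 1 partition map could be looked up by table type. It is written in Python rather than the Perl used at mixi; `PARTITION_MAP`, `dsn_for`, and the "other" entry are hypothetical names and values, and only the two DSN strings for diary and message come from the slide.

```python
# Level 1 partition map: table type -> DSN, kept as static configuration.
# The first two DSNs mirror the slide; the "other" entry is a made-up example.
PARTITION_MAP = {
    "diary": "DBI:mysql:host=db1;database=diary",
    "message": "DBI:mysql:host=db2;database=message",
    "other": "DBI:mysql:host=db0;database=mixi",  # hypothetical old DB
}


def dsn_for(table_type: str = "other") -> str:
    """Return the DSN for a table type, falling back to the old DB."""
    return PARTITION_MAP.get(table_type, PARTITION_MAP["other"])


print(dsn_for("message"))
print(dsn_for())
```

Because the map is this small, changing it is a configuration deploy rather than a data migration.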
Easy transition
1 mod_perl writes to both DBs (OLD DB and NEW DB)
2 Copies in background (SELECT from OLD DB, INSERT IGNORE into NEW DB)
3 Shifts reads from OLD DB to NEW DB
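The three steps of the transition (write to both DBs, copy in the background, shift reads) could be sketched as follows. This is an illustration in Python, not mixi's Perl code: two in-memory SQLite databases stand in for the MySQL servers, `INSERT OR IGNORE` is SQLite's counterpart of MySQL's `INSERT IGNORE`, and the table layout and function names are assumptions.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the old and new MySQL servers.
old_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")
for db in (old_db, new_db):
    db.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")

# Rows that existed before the transition started live only in the old DB.
old_db.executemany("INSERT INTO message VALUES (?, ?)", [(1, "a"), (2, "b")])


def write_message(msg_id: int, body: str) -> None:
    """Step 1: every new write goes to both databases."""
    for db in (old_db, new_db):
        db.execute("INSERT OR REPLACE INTO message VALUES (?, ?)", (msg_id, body))


def copy_in_background() -> None:
    """Step 2: copy old rows, skipping any the dual writes already created."""
    rows = old_db.execute("SELECT id, body FROM message").fetchall()
    new_db.executemany("INSERT OR IGNORE INTO message VALUES (?, ?)", rows)


read_db = old_db                 # reads still go to the old DB
write_message(3, "c")            # step 1
write_message(2, "b-edited")     # an update that lands before the copy
copy_in_background()             # step 2
read_db = new_db                 # step 3: shift reads to the new DB

print(read_db.execute("SELECT id, body FROM message ORDER BY id").fetchall())
```

Ignoring duplicates during the copy is what keeps the background job from overwriting rows that the dual writes have already placed in the new DB.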
Problems with level 1
•Cannot use JOIN anymore
• Use FEDERATED TABLE from MySQL 5
• Or do SELECT twice, which is faster than using FEDERATED tables
• If the table is small, just duplicate it
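A sketch of the "SELECT twice" approach: the JOIN is replaced by one query per database, and the rows are combined in the application. It is written in Python with in-memory SQLite databases standing in for two level 1 partitions; the `diary` and `member` tables and their columns are hypothetical.

```python
import sqlite3

# Separate in-memory databases stand in for two level 1 partitions.
diary_db = sqlite3.connect(":memory:")
member_db = sqlite3.connect(":memory:")
diary_db.execute("CREATE TABLE diary (id INTEGER, member_id INTEGER, title TEXT)")
member_db.execute("CREATE TABLE member (id INTEGER, nickname TEXT)")
diary_db.executemany("INSERT INTO diary VALUES (?, ?, ?)",
                     [(1, 10, "hello"), (2, 11, "lunch"), (3, 10, "rain")])
member_db.executemany("INSERT INTO member VALUES (?, ?)",
                      [(10, "taro"), (11, "hanako")])


def diaries_with_authors():
    """Replace a cross-database JOIN with two SELECTs joined in the app."""
    # First SELECT: rows from the diary partition.
    diaries = diary_db.execute(
        "SELECT id, member_id, title FROM diary ORDER BY id").fetchall()
    member_ids = sorted({member_id for _, member_id, _ in diaries})
    if not member_ids:
        return []
    # Second SELECT: only the members referenced by those rows.
    placeholders = ",".join("?" * len(member_ids))
    names = dict(member_db.execute(
        f"SELECT id, nickname FROM member WHERE id IN ({placeholders})",
        member_ids).fetchall())
    return [(title, names.get(member_id)) for _, member_id, title in diaries]


print(diaries_with_authors())
```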
Next step
•When the new DB gets
overloaded
•We split the DB, yet again
•Get ready for level 2
Partitioning key
•user id, message id
•Choose wisely!
[Diagram: the message tables split either by user id (user A, user B) or by message id]
Level 2 partition
[Diagram: the message tables of user A, user B, user C, and user D in the LEVEL 1 DB are split across NODE 1 and NODE 2 of the NEW DB]
Partition map for level 2
•Big and dynamic
•Cannot put it all in a configuration file
•Manager based
• Use another DB to do the partition mapping
•Algorithm based
• Partition map is computed inside the application
• node_id = member_id % TOTAL_NODE
Manager based
[Diagram: mod_perl, the MANAGER DB, and message tables on NODE 1, NODE 2, and NODE 3]
1 mod_perl asks the MANAGER DB for node_id (user_id=14)
2 MANAGER DB returns node_id (node_id=2)
3 mod_perl connects to the node
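A sketch of the manager-based lookup, in Python rather than mixi's Perl. An in-memory SQLite table stands in for the manager DB; the `partition_map` table, the host names, and both function names are hypothetical. Only the example values `user_id=14` and `node_id=2` come from the slide.

```python
import sqlite3

# An in-memory SQLite database stands in for the manager DB.
manager_db = sqlite3.connect(":memory:")
manager_db.execute(
    "CREATE TABLE partition_map (user_id INTEGER PRIMARY KEY, node_id INTEGER)")
manager_db.execute("INSERT INTO partition_map VALUES (14, 2)")

# Hypothetical host names for the three message nodes.
NODE_HOSTS = {1: "msg-node1", 2: "msg-node2", 3: "msg-node3"}


def node_for_user(user_id: int) -> int:
    """Steps 1 and 2: ask the manager for node_id and get it back."""
    row = manager_db.execute(
        "SELECT node_id FROM partition_map WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no partition entry for user {user_id}")
    return row[0]


def host_for_user(user_id: int) -> str:
    """Step 3: resolve the node the application should connect to."""
    return NODE_HOSTS[node_for_user(user_id)]


print(host_for_user(14))
```

Moving a user to another node is then a data copy plus an update of one row in the map, at the price of one extra query per lookup.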
Algorithm based
[Diagram: mod_perl and message tables on NODE 1, NODE 2, and NODE 3]
1 mod_perl computes node_id: node_id=(user_id%3)+1, number of nodes = 3 (here node_id=3)
2 mod_perl connects to the node
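The algorithm-based computation is small enough to show in full. This sketch is in Python rather than mixi's Perl, and `node_for_user` is a hypothetical name. The formula is the one on the slide, and `user_id=14` is the example value from the manager-based slide.

```python
TOTAL_NODES = 3


def node_for_user(user_id: int, total_nodes: int = TOTAL_NODES) -> int:
    """Compute the 1-based node id from the user id alone."""
    return (user_id % total_nodes) + 1


print(node_for_user(14))
```

No lookup is needed, but the result depends on `TOTAL_NODES`, so changing the number of nodes changes the node of many existing users.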
Manager based
•Pros:
• Easy to manage
• Add a new node, move data between
nodes
•Cons:
• Costs 1 extra query for the partition map
• It needs to send a request to the manager
Algorithm based
•Pros:
• Application servers can compute node id
by themselves
• Bypass the connection to the manager
•Cons:
• Difficult to manage
• Adding new nodes is tricky
Adding nodes is tricky
old_node_id=(member_id%2)+1, number of nodes = 2
new_node_id=(member_id%4)+1, number of nodes = 4
1 Adds new application logic to mod_perl
2 Writes to both DBs if node_id is different
3 Copies in background
4 Shifts reads
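A small sketch of what going from 2 to 4 nodes means for the modulo scheme, using the two formulas from the slide. It is written in Python rather than mixi's Perl, and the function names and the range of member ids are made up for illustration.

```python
def old_node(member_id: int) -> int:
    return (member_id % 2) + 1   # number of nodes = 2


def new_node(member_id: int) -> int:
    return (member_id % 4) + 1   # number of nodes = 4


def nodes_to_write(member_id: int) -> list:
    """During the transition, write to both nodes if node_id is different."""
    old, new = old_node(member_id), new_node(member_id)
    return [old] if old == new else [old, new]


members = range(1, 1001)
moved = [m for m in members if old_node(m) != new_node(m)]
print(f"{len(moved)} of {len(members)} members change node")
print(nodes_to_write(5), nodes_to_write(6))
```

Doubling the node count moves half of the members: those whose `member_id % 4` is 2 or 3 go to the new nodes 3 and 4, while the rest stay where they are.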
Problems with level 2
[Diagram: member tables on NODE 1, NODE 2, and NODE 3; community tables on NODE 1 and NODE 2]
• Too many connections to different DBs
• Fortunately, on mixi, the majority are small data sets
• Cache them all by using distributed memory caching
• We rarely hit the DB
• Average page load time is about 0.02 sec*
* depending on data sets average load time may vary
Caching
•memcached
• Also used in LiveJournal, Slashdot, etc
•Install the memcached server on the mod_perl machines
•39 machines x 2 GB memory
Summary of DB partitioning
•Level 1 partition (split by table
types)
•Level 2 partition (split by
partitioning key)
• Manager based
• Algorithm based
Summary of DB partitioning
[Diagram: 1 Split by table types (LEVEL 1): the diary tables leave the OLD DB, which keeps the other tables. 2 Split by partitioning key (LEVEL 2): the tables of user A, user B, and user C are spread across LEVEL 2 DBs]
Image Servers
Statistics
•Total size is more than 8 TB of
storage
•Growth rate is about 23 GB / day
•We use MySQL to store
metadata only
Two types of images
•Frequently accessed images
• Number of image files is relatively small
(about a few million files)
• For example, user profile photos,
community logos
•Rarely accessed images
• Hundreds of millions of image files
• Diary photos, album photos, etc
Frequently accessed images
•Few hundred GBs of files
•Distribute using FTP and Squid
•Third party Content Delivery Network
Frequently accessed images
[Diagram: mod_perl, Storage, Squid (sto1.mixi.jp), and CDN (sto2.mixi.jp)]
1 mod_perl uploads to storage
2 Squid and CDN pull images from storage
Rarely accessed images
•Few TBs of files
•Newer files get accessed more
often
•Cache hit ratio is very bad
•Distribute directly from storage
Uploading rarely accessed images
[Diagram: mod_perl, the MANAGER DB, and Storage servers sto1.mixi.jp, sto2.mixi.jp, sto3.mixi.jp, sto4.mixi.jp]
1 Assigns an id for an image file (abc.gif)
2 MANAGER DB arranges a pair of area_id (area_id=1,2)
3 mod_perl uploads the image to storage
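A sketch of the upload flow in Python (not mixi's Perl), with dicts standing in for the manager DB and the storage servers. The slides do not say how the pair of area_ids is chosen or whether the file is written to both areas. Handing out fixed pairs in turn and writing to both are assumptions, as are all names in the code.

```python
import itertools

# Dicts stand in for the four storage servers, keyed by area_id.
STORAGE = {1: {}, 2: {}, 3: {}, 4: {}}
# Hypothetical pairing rule: areas are handed out as fixed pairs in turn.
_area_pairs = itertools.cycle([(1, 2), (3, 4)])
_next_image_id = itertools.count(1)
manager = {}    # image_id -> (filename, area pair); stand-in for the manager DB


def upload_image(filename: str, data: bytes) -> int:
    image_id = next(_next_image_id)      # 1: assign an id for the image file
    areas = next(_area_pairs)            # 2: arrange a pair of area_ids
    manager[image_id] = (filename, areas)
    for area_id in areas:                # 3: upload the image to storage
        STORAGE[area_id][filename] = data
    return image_id


image_id = upload_image("abc.gif", b"GIF89a")
print(image_id, manager[image_id])
```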
Viewing rarely accessed images
[Diagram: User, mod_perl, the MANAGER DB, and Storage servers sto1.mixi.jp, sto2.mixi.jp, sto3.mixi.jp, sto4.mixi.jp]
1 User asks for view_diary.pl
2 mod_perl detects abc.gif in view_diary.pl
3 mod_perl asks the MANAGER DB for area_id
4 MANAGER DB returns area_id (abc.gif: area_id=1)
5 mod_perl creates the image URL
6 mod_perl returns view_diary.pl and the URL for abc.gif
7 User asks storage for abc.gif
8 Storage returns abc.gif
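A sketch of steps 3 to 6 of the viewing flow: look up the area_id and turn it into a URL that points straight at a storage server. It is written in Python rather than mixi's Perl. The rule that area_id N maps to `stoN.mixi.jp` and the flat URL path are guesses made for illustration, not something the slides state.

```python
# Stand-in for the manager DB: image file name -> area_id.
AREA_BY_IMAGE = {"abc.gif": 1}


def storage_host(area_id: int) -> str:
    # Assumed naming rule: area 1 -> sto1.mixi.jp, area 2 -> sto2.mixi.jp, ...
    return f"sto{area_id}.mixi.jp"


def image_url(filename: str) -> str:
    area_id = AREA_BY_IMAGE[filename]          # steps 3 and 4: look up area_id
    # Step 5: build the URL; the flat path layout is hypothetical.
    return f"http://{storage_host(area_id)}/{filename}"


def render_diary(image_names: list) -> str:
    """Step 6: return the page with a storage URL for each image."""
    tags = [f'<img src="{image_url(name)}">' for name in image_names]
    return "\n".join(tags)


print(render_diary(["abc.gif"]))
```

Because the browser fetches the image directly from storage, the application servers never carry image traffic for these rarely accessed files.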
To do
•Try MySQL Cluster
•Try to implement a better algorithm
• Consistent hashing?
• Linear hashing?
•Level 3 partitioning?
• Split again by timestamp?
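For readers unfamiliar with the consistent hashing idea listed above, here is a minimal textbook-style sketch in Python. It is not something mixi has implemented according to these slides. The `HashRing` class, the node names, and the choice of MD5 with 100 points per node are all illustrative assumptions.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, points_per_node: int = 100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes for i in range(points_per_node))
        self._keys = [point for point, _ in self._ring]

    def node_for(self, key) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._keys, _hash(str(key))) % len(self._ring)
        return self._ring[index][1]


before = HashRing(["node1", "node2", "node3"])
after = HashRing(["node1", "node2", "node3", "node4"])
members = range(1, 10001)
moved = [m for m in members if before.node_for(m) != after.node_for(m)]
print(f"{len(moved)} of {len(members)} members move")
# Every member that moves lands on the new node; none shuffle between old ones.
print(all(after.node_for(m) == "node4" for m in moved))
```

The appeal compared with the modulo scheme is that adding a node only pulls keys onto the new node, instead of reassigning keys among the existing ones.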
Questions?
Thank you
•Further questions to
bkesuma@mixi.co.jp
•We are hiring :)
•Have a nice day!
