
From distributed caches to in-memory data grids


TechTalk by Max A. Alexejev
malexejev@gmail.com

Memory Hierarchy
Registers: <1 ns
L1 cache: ~4 cycles, ~1 ns
L2 cache: ~10 cycles, ~3 ns
L3 cache: ~42 cycles, ~15 ns
DRAM: >65 ns
Flash / SSD / USB
HDD
Tapes, remote systems, etc.
(Moving down the hierarchy, cost per byte decreases while capacity and storage term increase.)

Software caches
Improve response times by reducing data access latency
Offload persistent storages
Only work for IO-bound applications!

Caches and data location


Caches can be classified by data location: Local, Shared, or Remote.
Multi-node schemes are either Hierarchical (built on a consistency protocol) or Distributed (built on a distribution algorithm).

Ok, so how do we grow beyond one node?
Data replication

Pros and Cons of replication


Pro:
Best read performance (for local replicated caches)
Fault-tolerant cache (both local and remote)
Can be smart: replicate only part of the CRUD cycle
Con:
Poor write performance
Additional network load
Can scale only vertically: limited by single machine size
Master-master replication requires a complex consistency protocol

Ok, so how do we grow beyond one node?
Data distribution

Pros and Cons of data distribution

Pro:
Can scale horizontally beyond single machine size
Read and write performance scales horizontally
Con:
No fault tolerance for cached data
Increased read latency (due to network round-trips and serialization expenses)

What do high-load applications need from a cache?

Low latency + linear horizontal scalability = distributed cache

Cache access patterns: Cache Aside

For reading data:
1. Application asks for some data for a given key.
2. Check the cache.
3. If the data is in the cache, return it to the user.
4. If the data is not in the cache, fetch it from the DB, put it in the cache, and return it to the user.

For writing data:
5. Application writes some new data or updates existing data.
6. Write it to the cache.
7. Write it to the DB.

Overall:
Increases read performance
Offloads DB reads
Introduces race conditions for writes (see the sketch below)
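A minimal cache-aside sketch in Java, assuming a hypothetical Database interface as the persistent store; a real application would use its DAO or repository layer here.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Stand-in for any persistent store (hypothetical interface).
    interface Database {
        String load(String key);
        void store(String key, String value);
    }

    class CacheAside {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final Database db;

        CacheAside(Database db) { this.db = db; }

        String read(String key) {
            String cached = cache.get(key);               // steps 1-3: check the cache first
            if (cached != null) return cached;
            String loaded = db.load(key);                 // step 4: miss -> fetch from the DB,
            if (loaded != null) cache.put(key, loaded);   // put it in the cache
            return loaded;
        }

        void write(String key, String value) {
            cache.put(key, value);                        // steps 5-6: write to the cache
            db.store(key, value);                         // step 7: write to the DB
            // Concurrent writers may interleave these two steps differently,
            // which is exactly the race condition noted above.
        }
    }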

Cache access patterns: Read Through

For reading data:
1. Application asks for some data for a given key.
2. Check the cache.
3. If the data is in the cache, return it to the user.
4. If the data is not in the cache, the cache itself fetches it from the DB, saves the retrieved value and returns it to the user.

Overall:
Reduces read latency
Offloads read load from the underlying storage
May have blocking behavior, thus helping with the dog-pile effect (see the sketch below)
Requires smarter cache nodes
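A read-through sketch in Java: the cache owns a loader function and invokes it on a miss. The loader here is an assumed DB lookup; computeIfAbsent blocks concurrent callers for the same key, which is the blocking behavior that tames the dog-pile effect.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    class ReadThroughCache {
        private final Map<String, String> store = new ConcurrentHashMap<>();
        private final Function<String, String> loader;   // e.g. a DB lookup by key

        ReadThroughCache(Function<String, String> loader) { this.loader = loader; }

        String get(String key) {
            // On a miss, the cache itself invokes the loader and saves the value;
            // concurrent readers of the same missing key wait for a single load.
            return store.computeIfAbsent(key, loader);
        }
    }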

Cache access patterns: Write Through

For writing data:
1. Application writes some new data or updates existing data.
2. Write it to the cache.
3. The cache then synchronously writes it to the DB.

Overall:
Slightly increases write latency
Provides natural invalidation
Removes race conditions on writes (see the sketch below)
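A write-through sketch in Java: every put goes to the cache and is synchronously propagated to the store before the call returns. The dbWriter callback is an assumption standing in for the real DB write.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.BiConsumer;

    class WriteThroughCache {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final BiConsumer<String, String> dbWriter;   // e.g. an UPDATE statement

        WriteThroughCache(BiConsumer<String, String> dbWriter) { this.dbWriter = dbWriter; }

        synchronized void put(String key, String value) {
            cache.put(key, value);        // step 2: update the cache
            dbWriter.accept(key, value);  // step 3: synchronously update the DB
        }

        String get(String key) { return cache.get(key); }
    }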

Cache access patterns: Write Behind

For writing data:
1. Application writes some new data or updates existing data.
2. Write it to the cache.
3. The cache adds the write request to its internal queue.
4. Later, the cache asynchronously flushes the queue to the DB on a periodic basis and/or when the queue size reaches a certain limit.

Overall:
Dramatically reduces write latency at the price of an inconsistency window
Provides write batching
May provide update deduplication (see the sketch below)
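A write-behind sketch in Java: puts return immediately, and a background task flushes dirty entries to the store in batches. Keeping only the latest value per key also illustrates update deduplication. The dbWriter callback and the flush period are assumptions.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.BiConsumer;

    class WriteBehindCache {
        private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
        private final ConcurrentHashMap<String, String> dirty = new ConcurrentHashMap<>();
        private final ScheduledExecutorService flusher = Executors.newSingleThreadScheduledExecutor();

        WriteBehindCache(BiConsumer<String, String> dbWriter, long flushPeriodMs) {
            flusher.scheduleAtFixedRate(() -> dirty.forEach((k, v) -> {
                dirty.remove(k, v);      // drop only if not overwritten in the meantime
                dbWriter.accept(k, v);   // batched, asynchronous DB write
            }), flushPeriodMs, flushPeriodMs, TimeUnit.MILLISECONDS);
        }

        void put(String key, String value) {
            cache.put(key, value);   // fast path: the caller returns immediately
            dirty.put(key, value);   // queued for the next flush (inconsistency window)
        }

        String get(String key) { return cache.get(key); }
    }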

A variety of products on the market

Memcached, Redis, EhCache, Terracotta, Oracle Coherence, GigaSpaces, Hazelcast, Infinispan, Riak, Cassandra, MongoDB

Let's sort them out!

The spectrum runs from KV caches through NoSQL to Data Grids:
KV caches: Memcached, EhCache
NoSQL: Redis, Cassandra, MongoDB
Data Grids: Oracle Coherence, GemFire, GigaSpaces, GridGain, Hazelcast, Infinispan
Some products are really hard to sort, like Terracotta in both DSO and Express modes.

Why don't we have any distributed in-memory RDBMS?

Master-MultiSlave configuration
Is, in fact, an example of replication
Helps with read distribution, but does not help with writes
Does not scale beyond a single master

Horizontal partitioning (sharding)
Helps with reads and writes for datasets with good data affinity
Does not work nicely with join semantics (i.e., there are no distributed joins)

Key-Value caches

Memcached and EHCache are good examples to look at
Keys and values are arbitrary binary (serializable) entities
Basic operations are put(K,V), get(K), replace(K,V), remove(K)
May provide group operations like getAll() and putAll()
Some operations provide atomicity guarantees

A generic interface along these lines is sketched below.
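A sketch of such a generic KV cache interface in Java; the method names mirror the operations listed above and are illustrative rather than any particular vendor's API.

    import java.util.Collection;
    import java.util.Map;

    interface KeyValueCache<K, V> {
        void put(K key, V value);
        V get(K key);
        boolean replace(K key, V value);                    // often with atomicity guarantees
        void remove(K key);
        Map<K, V> getAll(Collection<? extends K> keys);     // optional group operations
        void putAll(Map<? extends K, ? extends V> entries);
    }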

Memcached

Developed for LiveJournal in 2003
Has client libraries in PHP, Java, Ruby, Python and many others
Nodes are independent and don't communicate with each other; key-to-node mapping is done by the client (see the sketch below)
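A sketch of client-side node selection, which is what makes independent memcached nodes work as one cache. Real client libraries typically use consistent hashing; plain modulo hashing is shown here for brevity, and the node addresses are made up.

    import java.util.List;

    class NodeSelector {
        private final List<String> nodes;   // e.g. "10.0.0.1:11211", "10.0.0.2:11211"

        NodeSelector(List<String> nodes) { this.nodes = nodes; }

        // Every client maps the same key to the same node, with no inter-node traffic.
        String nodeFor(String key) {
            return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        }
    }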

EHCache

Initially named "Easy Hibernate Cache"
Java-centric, mature product with open-source and commercial editions
The open-source version provides only replication capabilities; distributed caching requires a commercial license for both EHCache and Terracotta TSA

NoSQL Systems

A whole bunch of different products with both persistent and non-persistent storage options. Let's call them storages and caches, respectively.
Built to provide good horizontal scalability
Try to fill the feature gap between pure KV caches and full-blown RDBMS

Case study: Redis

Example commands:
hset users:goku powerlevel 9000
hget users:goku powerlevel

Written in C, supported by VMware
Client libraries for C, C#, Java, Scala, PHP, Erlang, etc.
Single-threaded async implementation
Has configurable persistence
Works with K-V pairs, where K is a string and V may be either a number, a string or an Object (JSON)
Provides 5 interfaces for: strings, hashes, lists, sets, sorted sets
Supports transactions
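The same hash commands from Java, assuming the widely used Jedis client library (a minimal sketch, not part of the original talk):

    import redis.clients.jedis.Jedis;

    public class RedisDemo {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.hset("users:goku", "powerlevel", "9000");        // HSET
                String level = jedis.hget("users:goku", "powerlevel"); // HGET
                System.out.println(level);                             // prints 9000
            }
        }
    }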

Use cases: Redis

Good for fixed lists, tagging, ratings, counters, analytics and queues (pub-sub messaging)
Has Master-MultiSlave replication support. The master node is currently a SPOF.
Distributed Redis was named Redis Cluster and is currently under development.

Case study: Cassandra

Written in Java, developed at Facebook.
Inspired by Amazon Dynamo replication mechanics, but uses a column-based data model.
Good for logs processing, index storage, voting, jobs storage, etc.
Bad for transactional processing.
Want to know more? Ask Alexey!

In-Memory Data Grids

A new generation of caching products, trying to combine the benefits of replicated and distributed schemes.

IMDG: Evolution

Data Grids: reliable storage and live data balancing among grid nodes.
Computational Grids: reliable job execution, scheduling and load balancing.
Modern IMDGs combine both.

IMDG: Caching concepts

Implements a KV cache interface
Provides indexed search by values
Provides a reliable distributed locks interface
Caching scheme (partitioned or distributed) may be specified per cache or per cache service
Provides events subscription for entries (change notifications)
Configurable fault tolerance for distributed schemes (HA)
Equal data (and read/write load) distribution among grid nodes
Live data redistribution when nodes go up or down: no data loss, no client termination
Supports read-through, write-through and write-behind caching patterns, plus hierarchical caches (near caching)
Supports atomic computations on grid nodes

A minimal usage sketch follows.
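A rough illustration of the KV-plus-locks surface of an IMDG, assuming the classic Hazelcast 3.x API (other grids expose very similar operations); cluster configuration is omitted and defaults are assumed.

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class GridDemo {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();  // joins (or forms) the cluster
            IMap<String, Integer> scores = hz.getMap("scores");       // distributed KV cache

            scores.put("goku", 9000);                 // stored on the partition owner node
            System.out.println(scores.get("goku"));

            scores.lock("goku");                      // cluster-wide lock on a single key
            try {
                scores.put("goku", scores.get("goku") + 1);
            } finally {
                scores.unlock("goku");
            }

            hz.shutdown();
        }
    }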

IMDG: Under the hood

All data is split into a number of sections, called partitions.
A partition, rather than an entry, is the atomic unit of data migration when the grid rebalances. The number of partitions is fixed for the cluster lifetime.
Indexes are distributed among grid nodes.
Clients may or may not be part of the grid cluster.

IMDG under the hood: Requests routing

For get() and put() requests:
1. The cluster member that makes the request calculates the key's hash code.
2. The partition number is calculated from this hash code.
3. The owning node is identified by the partition number.
4. The request is routed to that node, executed, and the results are sent back to the client member that initiated the request (see the sketch below).

For filter queries:
5. The cluster member initiating the request sends it to all storage-enabled nodes in the cluster.
6. The query is executed on every node using distributed indexes, and partial results are sent to the requesting member.
7. The requesting member merges the partial results locally.
8. The final result set is returned from the filter method.
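A sketch of steps 1-3 above: mapping a key to its owning node through a fixed number of partitions. The modulo scheme and the partition-to-node assignment are simplified assumptions; real grids keep an explicit partition table and move whole partitions on rebalance.

    import java.util.List;

    class PartitionRouter {
        private final int partitionCount;   // fixed for the cluster lifetime
        private final List<String> nodes;   // current grid members

        PartitionRouter(int partitionCount, List<String> nodes) {
            this.partitionCount = partitionCount;
            this.nodes = nodes;
        }

        int partitionFor(Object key) {
            return Math.floorMod(key.hashCode(), partitionCount);   // steps 1-2
        }

        String ownerOf(Object key) {
            // Step 3: look up the node that currently owns the partition.
            return nodes.get(partitionFor(key) % nodes.size());
        }
    }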

IMDG: Advanced use-cases


Messaging
Map-Reduce calculations
Cluster-wide singleton
And more


GC tuning for large grid nodes

An easy way to go: rolling restarts of storage-enabled cluster nodes. Cannot be used in every project.
A complex way to go: fine-tune the CMS collector to ensure that it always keeps up cleaning garbage concurrently under normal production workload (see the sketch below).
An expensive way to go: use the off-heap storages provided by some vendors (Oracle, Terracotta), which use the direct memory buffers available to the JVM.
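For illustration, the kind of flags such tuning involves on a HotSpot JVM of that era. The heap sizes, the occupancy threshold and the main class are placeholders to be tuned against the actual workload, not recommendations; the three -XX CMS flags relate to the second option above, while -XX:MaxDirectMemorySize caps the direct buffers used by off-heap storages.

    java -Xms8g -Xmx8g \
         -XX:+UseConcMarkSweepGC \
         -XX:+UseCMSInitiatingOccupancyOnly \
         -XX:CMSInitiatingOccupancyFraction=70 \
         -XX:MaxDirectMemorySize=16g \
         com.example.GridNode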

IMDG: Market players

Oracle Coherence: commercial, free for evaluation use.
GigaSpaces: commercial.
GridGain: commercial.
Hazelcast: open-source.
Infinispan: open-source.

Terracotta

The company behind EHCache, Quartz and the Terracotta Server Array.
Acquired by Software AG.

Terracotta Server Array

All data is split into a number of sections, called stripes.
A stripe consists of 2 or more Terracotta nodes. One of them is the Active node; the others have Passive status.
All data is distributed among stripes and replicated inside each stripe.
Open-source limitation: only one stripe. Such a setup supports HA, but does not distribute cache data, i.e. it is not horizontally scalable.

And thank you for coming!

Max A. Alexejev

Q&A Session
