
Designing Access Methods: The RUM Conjecture

Manos Athanassoulis (1), Michael S. Kester (1), Lukas M. Maas (1), Radu Stoica (2), Stratos Idreos (1), Anastasia Ailamaki (3), Mark Callaghan (4)

(1) Harvard University, (2) IBM Research, (3) EPFL, (4) Facebook Inc.


memory  wall
storage  wall

Memory  Wall

every byte counts


Storage Wall

HDD
  ✓ capacity
  ✓ sequential access
  × random access
  × latency plateaus

SSD (Single Level Cell)
  ✓ random reads
  ✓ low latency
  × capacity
  × endurance
  × read/write asymmetry

SSD (Multi Level Cell)
  ✓ capacity
  × endurance (worse)

HDD (Shingled Magnetic Rec.)
  ✓ capacity
  × read/write asymmetry

every byte counts
memory wall, storage wall
every byte we read counts
every byte we update counts
every byte we allocate counts
in this fluid environment … we build access methods
… and have been doing so for a long time!

Can we stop worrying about building access methods?
why do we keep building?
every access method is optimizing for the tradeoff between reads, updates, and memory

[Figure: triangle with corners Reads, Updates, and Memory; labels inside: Hash Index, Tree Index, Projection]

this balance forms a three-way tug of war


The RUM Conjecture
every access method has a (quantifiable)
• read overhead
• update overhead
• memory overhead
the three of which form a competing triangle

we can optimize for two of the overheads at the expense of the third

[Figure: triangle with corners Read, Update, and Memory; axis labels: min, max]
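One way to make "quantifiable" concrete is to treat each overhead as an amplification ratio: bytes actually touched divided by bytes logically needed. The sketch below assumes that reading; the class `RumCounters` and all of its field names are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


def _ratio(numerator, denominator):
    # Undefined (NaN) until the denominator has been observed.
    return numerator / denominator if denominator else float("nan")


@dataclass
class RumCounters:
    """Hypothetical byte counters for one access method."""
    bytes_read: int = 0        # all bytes touched while answering reads
    bytes_requested: int = 0   # bytes the reads actually asked for
    bytes_written: int = 0     # all bytes physically changed by updates
    bytes_updated: int = 0     # bytes the updates logically changed
    bytes_stored: int = 0      # base data plus any auxiliary structures
    bytes_base: int = 0        # base data only

    @property
    def read_overhead(self):
        return _ratio(self.bytes_read, self.bytes_requested)

    @property
    def update_overhead(self):
        return _ratio(self.bytes_written, self.bytes_updated)

    @property
    def memory_overhead(self):
        return _ratio(self.bytes_stored, self.bytes_base)


# A scan over seven 8-byte values to return one of them, one value
# updated in place, and no auxiliary data.
scan = RumCounters(bytes_read=56, bytes_requested=8,
                   bytes_written=8, bytes_updated=8,
                   bytes_stored=56, bytes_base=56)
print(scan.read_overhead, scan.update_overhead, scan.memory_overhead)
```

Under this reading a ratio of 1.0 is the minimum for each overhead, and the conjecture says that pushing two of the ratios toward 1.0 drives the third one up.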
[Figure, three builds of the same triangle: regions labeled read-optimized, update & memory optimized, and memory-optimized]
what would be an optimal read behavior?
read(X) accesses only the bytes of object X

[Figure: read(X) consults an oracle that points directly at X in the data; R/U/M triangle with U and M marked "?"]

how free can an oracle be?
example: unique, positive integers, each stored in the location equal to its value

1 3 4 5 8
minimum read overhead

insert 2, delete 8, update 4 -> 3
1 2 3 4 5 8
bounded update overhead

insert 17
1 2 3 4 5 8 17
unbounded memory overhead
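The integer example above can be sketched as a direct-address array. This is a minimal illustration under stated assumptions, not the authors' code: the class name `DirectAddressSet` and its methods are hypothetical, and slot `v` simply holds a flag saying whether value `v` is present.

```python
class DirectAddressSet:
    """Unique positive integers, each kept in the slot equal to its value."""

    def __init__(self):
        self.slots = []  # slots[v] is True when v is present

    def insert(self, value):
        # Growing the array up to the largest value is what makes the
        # memory overhead unbounded.
        if value >= len(self.slots):
            self.slots.extend([False] * (value + 1 - len(self.slots)))
        self.slots[value] = True

    def delete(self, value):
        if 0 <= value < len(self.slots):
            self.slots[value] = False

    def update(self, old, new):
        # At most two slots change, so the update overhead is bounded.
        self.delete(old)
        self.insert(new)

    def contains(self, value):
        # One slot is inspected: minimum read overhead.
        return 0 <= value < len(self.slots) and self.slots[value]

    def memory_overhead(self):
        # Slots allocated per live value.
        live = sum(self.slots)
        return len(self.slots) / live if live else float("inf")


s = DirectAddressSet()
for v in (1, 3, 4, 5, 8):
    s.insert(v)
s.insert(2)
s.delete(8)
s.update(4, 3)
s.insert(17)  # a single large value more than doubles the array
```

Every read and update touches a constant number of slots, but the array length follows the largest value ever inserted rather than the number of values stored.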
what would be an optimal update behavior?
update(X) changes the minimal number of bytes
always append, and invalidate on update

[Figure: the data A B X grows to A B X C D as new values are appended; reads always scan]

what about reads? what about more data?
higher read and memory overhead
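A minimal sketch of the append-only idea, assuming that "invalidate" can be done implicitly: the newest entry for a key wins and older entries are simply ignored. The class `AppendOnlyStore` and its method names are hypothetical.

```python
_TOMBSTONE = object()  # marks a deleted key


class AppendOnlyStore:
    """Every update is one append; reads scan from newest to oldest."""

    def __init__(self):
        self.log = []  # list of (key, value) pairs, oldest first

    def put(self, key, value):
        # Insert and update are the same operation: append one entry.
        self.log.append((key, value))

    def delete(self, key):
        self.log.append((key, _TOMBSTONE))

    def get(self, key):
        """Return (value or None, number of entries scanned)."""
        scanned = 0
        for k, v in reversed(self.log):
            scanned += 1
            if k == key:
                return (None if v is _TOMBSTONE else v), scanned
        return None, scanned

    def live_keys(self):
        latest = {}
        for k, v in self.log:
            latest[k] = v
        return sum(1 for v in latest.values() if v is not _TOMBSTONE)


store = AppendOnlyStore()
for key, value in [("A", 1), ("B", 2), ("X", 3), ("C", 4), ("D", 5)]:
    store.put(key, value)
store.put("X", 30)  # update: the old X entry stays in the log
```

The update cost is a single append, but a read of an old key walks the whole log, and the stale X entry keeps occupying space until something compacts it.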
what would be an optimal memory overhead?
no metadata whatsoever would result in the smallest memory footprint

scan and find
scan and in-place updates

[Figure: read(X) scans the data to find X; R/U/M triangle with "?" marks]

do we need to reach the optimal(s)?
No!
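The memory-optimal extreme can be sketched as a dense array with no auxiliary structure at all. The helper names `find`, `update_in_place`, and `footprint_bytes` are hypothetical, and the values are arbitrary.

```python
from array import array


def find(column, value):
    """Scan and find: position of value, or -1 if absent."""
    for position, item in enumerate(column):
        if item == value:
            return position
    return -1


def update_in_place(column, old, new):
    """Scan for old and overwrite it; no extra space is allocated."""
    position = find(column, old)
    if position >= 0:
        column[position] = new
    return position


def footprint_bytes(column):
    # Only the values themselves are stored: no index, no free-space map.
    return len(column) * column.itemsize


column = array("q", [8, 2, 1, 7, 6, 9, 3])  # signed 64-bit integers
```

The footprint stays at exactly one slot per value, and the price is that every read and every update starts with a scan.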


RUM Conjecture: an example

a tight column: 8 2 1 7 6 9 3
• reads have to scan
• no memory overhead
• in-place updates and efficient inserts

a tight sorted column: 1 2 3 6 7 8 9
• very efficient reads (logarithmic search)
• no memory overhead
• updates & inserts: reorganization
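The two columns above can be compared with a short sketch built on Python's standard `bisect` module. The function names `sorted_find` and `sorted_insert` are hypothetical, and counting shifted values is just one simple stand-in for "reorganization".

```python
import bisect


def sorted_find(column, value):
    """Logarithmic search in a sorted column; position or -1."""
    position = bisect.bisect_left(column, value)
    found = position < len(column) and column[position] == value
    return position if found else -1


def sorted_insert(column, value):
    """Insert while keeping order; return how many values had to shift."""
    position = bisect.bisect_left(column, value)
    column.insert(position, value)
    return len(column) - 1 - position


tight = [8, 2, 1, 7, 6, 9, 3]
tight_sorted = sorted(tight)

inspected = tight.index(3) + 1           # linear scan inspects 7 values
position = sorted_find(tight_sorted, 3)  # binary search, position 2
tight.append(4)                          # tight column: nothing moves
shifted = sorted_insert(tight_sorted, 4) # sorted column: 6 7 8 9 shift
```

Neither layout stores metadata; the sorted column buys its fast reads entirely with update cost.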
adding clustering: 2 1 3 7 6 9 8
• efficient reads
• small memory overhead
• updates & inserts: reorganization

… and ghost values: 2 1 3 7 6 9 8
• efficient reads
• small memory overhead (but increased)
• updates: reorganization (but inserts for free)
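Clustering with ghost values can be sketched as a range-partitioned column in which each partition reserves a few empty slots. This is an illustration under assumptions: the class `ClusteredColumn`, the partition bounds, and the capacity-doubling step that stands in for a real reorganization are all hypothetical.

```python
import bisect


class ClusteredColumn:
    """Range partitions; values inside a partition are left unordered."""

    def __init__(self, values, bounds, ghosts=1):
        # bounds[i] is the smallest value that belongs to partition i + 1.
        # The bounds are the small metadata that clustering adds.
        self.bounds = list(bounds)
        self.parts = [[] for _ in range(len(self.bounds) + 1)]
        for v in values:
            self.parts[self._part(v)].append(v)
        # Ghost values: empty slots reserved per partition.
        self.capacity = [len(p) + ghosts for p in self.parts]

    def _part(self, value):
        return bisect.bisect_right(self.bounds, value)

    def find(self, value):
        """Return (found, size of the one partition that is scanned)."""
        part = self.parts[self._part(value)]
        return value in part, len(part)

    def insert(self, value):
        """Return True if the partition was full and had to be reorganized."""
        i = self._part(value)
        reorganized = len(self.parts[i]) >= self.capacity[i]
        if reorganized:
            self.capacity[i] *= 2  # stand-in for a real reorganization
        self.parts[i].append(value)
        return reorganized

    def slots_reserved(self):
        return sum(self.capacity)


col = ClusteredColumn([2, 1, 3, 7, 6, 9, 8], bounds=[4, 8], ghosts=1)
```

With one ghost slot per partition the column reserves 10 slots for 7 values. The first insert into a partition lands in a ghost slot for free, and reorganization is only needed once the slots run out.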
RUM-aware access methods
Can we
… add flexibility to existing access methods?
… have arbitrary RUM balance?
… have dynamic RUM balance?
some active research directions
✓ classify existing access methods [Tutorial - SIGMOD2016]
✓ add more metadata, to optimize for updates/reads
   o Bitmap indexing [SIGMOD2016] and LSM-Trees in the works
✓ pre-partition data: minimize both read and update cost
✓ shape-shifting index: match workload without offline optimizations
daslab.seas.harvard.edu

Thanks!

http://daslab.seas.harvard.edu/rum/
