Beruflich Dokumente
Kultur Dokumente
B -trees
Previous Lectures
Height balanced binary search trees: AVL trees,
red-black trees. Multiway search trees: (2,4) trees.
Advantages: search faster than in an average
binary search tree because the tree is guaranteed to
have the smallest possible depth.
Today: another kind of multi-way search trees (Btrees). Optimises search in external memory.
B -trees
External storage
So far, we analysed data structures
assuming that all data is kept in main
memory.
If the data is kept in external memory,
should take into account disk access time.
It takes much longer to find the right place
on disk compared to main memory.
B -trees
Main principles
Data is stored on disk in chunks (pages,
blocks, allocation units) and the disk drive
reads or writes a minimum of one page at a
time. To optimise an algorithm for
accessing external memory, should
minimise disk accesses
read and write in multiples of page sizes.
B -trees
B-trees
A good example of a data structure for
external memory is a B-tree.
Better than binary search trees if data is
stored in external memory (they are NOT
better with in-memory data!).
Proposed by R. Bayer and E. M. McCreigh
in 1972.
B -trees
B-trees
Each node in a tree should correspond to a
block of data.
Each node stores many data items and has
many successors (stores addresses of
successor blocks).
The tree has fewer levels but search for an
item involves more comparisons at each
level.
For database search, it is more important to
minimise page swaps than comparisons.
B -trees
Example of a B-Tree
25
3 9 15
1 2
4 5
10 13
37 50
16 17
B -trees
28 33
39 48
52 55 67 89
Practical considerations
Since each node correspond to a block/page,
their size is usually a power of 2.
The number of records in a node is
therefore usually even.
The number of children (the order of the
tree) is therefore usually odd.
B -trees
B-Tree of order 5:
25
3 9 15
1 2
4 5
10 13
37 50
16 17
B -trees
28 33
39 48
52 55 67 89
Search in a B-Tree
The search for a record with key x is done
as in all multiway search trees:
Search(node,x): if node is null, return null.
Iterate through records (k1 , r1) , (km ,rm)
of the node.
If x = ki return ri .
If x< ki Search(childi , x)
If km < x, Search(childm+1 , x).
B -trees
3 9 15
1 2
4 5
10 13
37 50
16 17
B -trees
28 33
39 48
52 55 67 89
3 9 15
1 2
4 5
10 13
37 50
16 17
B -trees
28 33
39 48
52 55 67 89
Inserting in a B-Tree
Find the node where the item is to be
inserted by following the search procedure.
If the node is not full, insert the item into
the node in order.
If the node is full, it has to be split.
B -trees
Inserting 12:
25
3 9 15
1 2
4 5
10 13
37 50
16 17
B -trees
28 33
39 48
52 55 67 89
Inserting 12:
25
3 9 15
1 2
4 5
10 12 13
37 50
16 17
B -trees
28 33
39 48
52 55 67 89
Inserting 53:
25
3 9 15
1 2
4 5
10 12 13
37 50
16 17
B -trees
28 33
39 48
52 55 67 89
37 50
28 33
39 48
52 53 55 67 89
B -trees
Splitting a node
Find middle value (of old keys in the node
and the new key).
Keep records with keys smaller than middle
in the old node.
Put records with keys greater than middle in
the new node.
Push middle record up into parent node.
B -trees
Splitting a node
37 50
28 33
39 48
52 53 55 67 89
B -trees
Splitting a node
37 50
55
28 33
39 48
52 53
B -trees
67 89
Splitting a node
37 50 55
28 33
39 48
52 53
B -trees
67 89
Insertion
If this makes the parent full, split it as well
and push its middle item upwards.
Continue doing this until either some space
is found in an ancestor node, or a new root
node is created.
B -trees
Deletion in B-trees
As in (2,4) trees, replace the item to be removed
by its in-order successor (which is bound to be in a
leaf node).
Remove successor from its leaf node.
This may cause underflow (fewer than d/2 records
in that leaf node u).
Depending on how many records the sibling of u
has, this is fixed either by fusion or by transfer.
B -trees
Delete 37:
25
3 9 15
1 2
4 5
10 12 13
37 50 55
16 17
B -trees
28 33
39 48 52 53 67 89
3 9 15
1 2
4 5
10 12 13
37 50 55
16 17
B -trees
28 33
39 48 52 53 67 89
3 9 15
1 2
4 5
10 12 13
39 50 55
16 17
B -trees
28 33
39 48 52 53 67 89
3 9 15
1 2
4 5
10 12 13
39 50 55
16 17
B -trees
28 33
48 52 53 67 89
3 9 15
1 2
4 5
10 12 13
39 50 55
16 17
B -trees
28 33
48 52 53 67 89
3 9 15
50 55
39
1 2
4 5
10 12 13
16 17
B -trees
28 33
48
52 53 67 89
3 9 15
1 2
4 5
10 12 13
50 55
16 17
B -trees
28 33 39 48
52 53 67 89
Delete 16
25
3 9 15
1 2
4 5
10 12 13
50 55
16 17
B -trees
28 33 39 48
52 53 67 89
3 9 15
1 2
4 5
10 12 13
50 55
16 17
B -trees
28 33 39 48
52 53 67 89
3 9 15
50 55
13
1 2
4 5
10 12
17
B -trees
28 33 39 48
52 53 67 89
3 9 13
50 55
15
1 2
4 5
10 12
17
B -trees
28 33 39 48
52 53 67 89
3 9 13
1 2
4 5
10 12
50 55
15 17
B -trees
28 33 39 48
52 53 67 89
Efficiency of B-trees
If a B-tree has order d, then each node (apart from
the root) has at least d/2 children. So the depth of
the tree is at most log d/2 (size)+1. In the worst
case have to make d-1 comparisons in each node
(linear search, but d-1 is a constant factor).
Fewer disk accesses than for a binary tree.
Each node could correspond to one block of data
(plus addresses of successor nodes).
B -trees
Variant of B-trees
Only leaves contain data
Non-leaves contain only keys and block
numbers
Access speed is higher because each block
can hold more block numbers.
Programming is more complicated because
there are two kinds of nodes: leaves and
non-leaves.
B -trees
Indexing
Index: list of key/block pairs.
If small enough, could be kept in the main
memory.
If the index is too large to be loaded in the
main memory, it is often kept in a B-tree.
Multiple index files: if different kinds of
search have to be performed, can keep one
index for every set of keys.
B -trees