
Address lookup algorithms for IPv6

Y.K. Li and D. Pao

Abstract: Previously published address lookup algorithms are mainly tailored for IPv4. The lookup operation is optimised based on the prefix distribution patterns found in IPv4 routing tables. The introduction of IPv6 with increased address length and very different prefix length distribution poses new challenges to the lookup algorithms. Major refinements to the address lookup algorithms are necessary when applied to IPv6. In this paper, we study three well-known approaches for IPv6 address lookup, namely the trie-based approach, the range search approach, and the hash-based approach. We extend the address lookup methods with incremental update capability and evaluate their lookup rate, update rate and memory requirement. Theoretically, the hash-based method has good scalability. However, it is found that the hash-based method has the slowest lookup rate and the worst memory efficiency. The trie-based method has the fastest lookup rate but the slowest update rate. The range search approach has the best update rate and memory efficiency. The update rate of the trie-based method can be improved by combining with the hash-based method.

1 Introduction

One of the major functions of an IP router is to forward the incoming packets towards their final destination. In order to do that, the router uses the destination address in the packet's header as the key to look up its routing table. Each entry in the routing table is an ordered pair ⟨address prefix, next-hop⟩. The incoming packet will be forwarded to the next-hop of the longest prefix that matches its destination address. The address lookup problem, which is fundamental to the design of the next generation routers [1], has attracted a substantial amount of attention in the past few years. A number of sophisticated algorithms have been reported in the literature [1, 2]. In general, the address lookup algorithm is required to meet the following requirements. (i) Fast lookup: The packet processing rate should match the packet arrival rate. For core routers with 10 Gbit/s interfaces, the required packet processing rate is up to 30 MPPS (million packets per second). Such a high packet processing rate can only be achieved by hardware implementations. For lower-end to mid-range routers with 155 Mbit/s to 1 Gbit/s interfaces, software approaches are preferred because of better flexibility and lower cost. (ii) Memory efficient: The number of prefixes contained in the routing table has risen from 10 000 to over 150 000 in the past decade [3]. Load balancing and multi-homing will continue to contribute to the growth of the routing table in the years to come [4]. To cope with the continuous growth of the routing table, the address lookup algorithm should employ space-efficient data structures. (iii) Fast incremental update: Because of changes in network topology and the instability of the BGP protocols, frequent
© The Institution of Engineering and Technology 2006. IEE Proceedings online no. 20050652, doi: 10.1049/ip-com:20050652. Paper first received 12th December 2005 and in revised form 24th February 2006. The authors are with the Electronic Engineering Department, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong. E-mail: d.pao@cityu.edu.hk. IEE Proc.-Commun., Vol. 153, No. 6, December 2006

updates (on the average a few hundred updates per second) to the routing table have been observed [5]. Consequently, the address lookup method should support fast incremental updates to the data structure. There can be performance tradeoffs among the above three requirements. Some methods achieve a very fast lookup rate at the expense of consuming more memory space and requiring substantial pre-computation. The introduction of the next generation IP standard, i.e. IPv6, with 128-bit address length poses an additional challenge to the scalability of the address lookup algorithms. Most of the previously proposed methods are tailored for IPv4. Typically, over 85% of the prefixes in IPv4 routing tables are 16 to 24 bits long. A common approach to speed up the lookup process is to distribute the prefixes into disjoint groups based on the value of the first 16 to 24 bits of the prefix value. The average size of the non-empty groups is only up to a few hundred. The lookup algorithm will then take the first 16 to 24 bits of the destination address to select one of the groups by a simple indexing operation. The matching of the remaining 8 to 16 bits of the destination address with the prefixes in the selected group can be done efficiently. Extensive development and deployment of IPv6 have been taking place in many countries, especially Europe, the United States, Japan and China, in the past couple of years [6]. Because of the rapid growth in the use of mobile computing devices and the US Department of Defense's mandate that all newly acquired networking equipment should be IPv6 enabled, the transition from IPv4 to IPv6 will be accelerated [7]. IPv6-ready network equipment is available in the market. IPv6 prefixes are much longer than their IPv4 counterparts and have a very different distribution. The effectiveness of the indexing approach using the first 16 to 24 bits of the address value is less apparent. Significant refinements to existing lookup algorithms are required if they are applied to IPv6.
In this paper we study the performance of three software approaches for IPv6 address lookup, namely the trie-based approach, the range search approach, and the hash-based approach. The well-known trie-based method and hash-based method are

extended to support incremental updates. A detailed implementation of the multicolumn search method is presented. A new hybrid method that combines the LC-trie with hashing is proposed. The source codes developed in this research are available upon request. Readers may refer to [8, 9] for hardware solutions that are suitable for core routers. In [8] we have presented a pipelined processing approach that can achieve an optimum throughput of one lookup per memory cycle. The level-compressed trie is represented using bit-vectors to reduce memory consumption, and the data structure supports efficient incremental update. In [9] we have presented a two-level routing table organisation that can reduce the TCAM storage requirement by about 40%.

2 Characteristics of IPv4 and IPv6 routing tables

In IPv4, the destination address of an IP packet is 32 bits long. The 32-bit IP address is broken down into the network address part and the host address part. IP routers forward packets based only on the network address until the packets reach the destined network. Typically, an entry in the forwarding table stores the address prefix (e.g. the network address) and the routing information (i.e. the next-hop router and output interface). Three different network sizes are defined in the classful addressing scheme, namely classes A, B, and C. The addresses of class A networks are 8 bits long and start with the prefix 0. The addresses of class B networks are 16 bits long and start with the prefix 10. The addresses of class C networks are 24 bits long and start with the prefix 110. In order to allow a more efficient use of the IP address space and avoid the problem of forwarding table explosion, arbitrary length prefixes are allowed in the classless interdomain routing (CIDR) scheme. The router can aggregate the address prefixes to reduce the size of the routing table. It has been found that over 85% of the prefixes in IPv4 routing tables are 16 to 24 bits long, and there are only a small number of prefixes that are shorter than 16 bits. An IPv6 address has 128 bits. Details of the IPv6 address formats can be found in [10, 11]. According to the Internet Architecture Board (IAB) and Internet Engineering Steering Group (IESG) recommendation [12], a unicast IPv6 address consists of two components, i.e. a 64-bit network/subnetwork ID followed by a 64-bit host ID. In general, an address block with a 48-bit prefix is allocated to a subscriber. Very large subscribers could receive a 47-bit or slightly shorter prefix, or multiple 48-bit prefixes. Some 32-bit and 36-bit prefix blocks had been allocated before the publication of [12]. A 64-bit prefix may be allocated when it is known that one and only one subnet is needed; and a 128-bit prefix is allocated when it is absolutely known that one and only one device is connecting to the network. It is also recommended that mobile devices be allocated 64-bit prefixes. To allow a smooth transition from IPv4 to IPv6, the IPv4-mapped address format and IPv4-compatible address format are defined. An IPv4-compatible address is obtained by attaching the 32-bit IPv4 address to a special 96-bit pattern of all zeros. The IPv4-mapped address starts with 80 bits of zeros and 16 bits of ones, followed by the 32-bit IPv4 address. Since the prefixes in a routing table correspond to network addresses, it is expected that the majority of IPv6 route prefixes will have 48 to 64 bits [13].

3 Address lookup algorithms

3.1 Trie-based approach

The longest matching prefix problem can be modelled as a search problem on a binary-trie. Figure 1 depicts an example of the trie representation of a small routing table. To find the longest matching prefix, we traverse the binary-trie along the path defined by the packet's destination address. If the length of the IP address is W bits, a straightforward implementation of the lookup algorithm will require 2^W storage space and W iterations. More efficient lookup algorithms aim at reducing the space and/or time requirements. One may observe that by pushing the prefixes to the leaf nodes of the binary-trie and storing the nodes on the bottom level in an array, the address lookup can be done in one memory access. This approach is, however, not practical. For IPv4, we need to have an array with 2^32 entries. Moreover, a prefix with L bits will be replicated up to 2^(W−L) times, which practically inhibits incremental update. Gupta et al. proposed a multi-level indexing table organisation [14] that can reduce the memory requirement to an acceptable range of about 9 to 33 MB with certain restrictions on the number and distribution of long prefixes. However, with the current trend of having a much higher number of long prefixes, e.g. the Telstra router [3] has over 25 K prefixes that are longer than 24 bits, the amount of memory required using this method will increase substantially. The Lulea algorithm [15] attempts to minimise the memory requirement such that the entire routing table can be stored in the cache memory of the network processor. First, the binary-trie is converted to a complete prefix tree with unnecessary empty nodes removed. They use bit-vectors to represent a cut in the prefix tree on levels 16, 24, and 32, respectively. Associated with each bit-vector is an

Fig. 1 Example binary-trie for a 4-bit address space (prefixes p1 = 0*, p2 = 10*, p3 = 010*, p4 = 110*, p5 = 0001, p6 = 1000, p7 = 1110)
Prefixes at the bottom level of nodes are obtained by leaf pushing



array of code words that provides the bit counts and base addresses into a pointer table that stores the references of the next level chunk or the longest matching prefix. The search operation will first use the first 16 bits of the destination address to search the level-16 cut. If the search result indicates that there can be matching prefixes longer than 16 bits, then the next 8 bits of the destination address will be extracted and used to search the level-24 cut, and so on. Since the pointer table needs to be packed, insertion/deletion of prefixes will involve a large amount of data movement and updating of the array of code words. Srinivasan and Varghese generalised the concept of the multi-bit trie [16]. Instead of making a decision based on a single bit, multiple bits are considered together to decide which of the branches is to be taken in a step. In the multi-bit trie, only a small set of distinct prefix lengths {l1, l2, …, lk} is allowed. A prefix whose length is larger than l1 but smaller than or equal to l2 will be expanded to have l2 bits. The number of bits used to decide on the branching is called the stride. If the nodes at the same level have the same stride size, it is called a fixed stride multi-bit trie; otherwise, it is called a variable stride multi-bit trie. Fixed stride is easy to implement but variable stride gives a better lookup speed and more efficient space usage. Dynamic programming techniques can be used to determine the optimal set of strides for a given routing table that minimises the space consumption and the worst case searching time [16]. Incremental update to the data structure is possible; however, it is difficult to maintain the optimal structure after dynamic updates. The above three methods are not scalable to IPv6. When the address length is increased from 32 bits to 128 bits, the number of cuts or depth of the trie will be increased accordingly.
The amount of storage required is dependent on the number of prefixes N, the length of the prefixes L, and the stride width. With the increased address length in IPv6, the memory requirement of the above three methods will grow exponentially. In IPv6, prefixes are very sparsely distributed in the 128-bit address space. For example, even if there are one million prefixes, the probability of hitting a non-empty node in a 128-bit binary-trie is less than 3 × 10^−33. The amount of memory required to store the non-empty nodes can be reduced using the path compression technique. Path compression allows us to skip the intermediate nodes along single route paths. The number of levels skipped is represented by the skip value. The LC-trie [17] combines path compression and level compression to speed up the lookup process and reduce the memory requirement. It has much better scalability and is selected for detailed study in this paper. In level compression, the 2^i descendants of a node n located i levels below n are replaced by a single node of degree 2^i. The children nodes are packed in an array, and can be referenced via the starting address stored in the parent node. The degree of a node is represented by the branch factor, i.e. the number of bits used to identify one of the branches. A node in the LC-trie has three fields: the branch factor, the skip value and a pointer. A leaf node will have a branch factor equal to zero. The pointer field of an internal node references another node in the LC-trie, while the pointer field of a leaf node references an entry in the basevector table that stores the prefixes, the next-hop and a pointer that references the prefix table. The prefix table stores linked lists of internal prefixes. The average search time is proportional to the depth of the LC-trie. One may trade memory space for better search efficiency. A node n (except the root) may have a branch factor i if at least f · 2^i of its descendants located i levels below n are present, where f is called the fill factor, 0 < f ≤ 1. The branch factor of the root is fixed at 16. In the subsequent discussion, f is assumed to be equal to 0.5. Figure 2b shows the structure of the LC-trie with f = 0.5 that corresponds to the binary-trie of Fig. 2a. Empty leaves may be created if f is smaller than 1. Prefixes corresponding

Fig. 2 An example binary-trie with prefixes labelled from A to K, structure of the corresponding LC-trie with f = 0.5, and data structures of the LC-trie (basevector table, node array and prefix table)
The next-hop field of the basevector table and the prefix table is not shown
a Example
b Structure
c Data structure

to internal nodes are only stored in the prefix table, and can be accessed by traversing the linked list after arriving at a leaf in the LC-trie; hence the pointer fields of empty leaves in the LC-trie are assigned the reference to a neighbouring prefix instead of the null value. References to a prefix may be replicated and stored in the leaf nodes on several levels of the LC-trie. Note that prefixes E and F are internal prefixes, and they can only be accessed through the prefix table. The data structures described in [17] are optimised to achieve the maximum lookup speed. Each trie-node fits in a 32-bit word and the nodes are packed in an array, and the basevector table is sorted to facilitate the construction of the LC-trie. Ravikumar et al. [18] suggested storing the prefix in the trie-node instead of in the basevector table and prefix table such that the lookup speed can be further improved. In their implementation for IPv4, the 64-bit node contains a 4-bit branch factor, a 7-bit skip value, a 5-bit next-hop value, a 16-bit trie node pointer, and a 32-bit prefix. With a 16-bit node pointer, their implementation only allows a maximum of 64 K nodes in the LC-trie. For IPv6, the node size is increased to 192 bits. When the fill factor is smaller than one, the memory efficiency is low. Incremental update to the trie structure has not been discussed in [18]. The LC-trie is more scalable to IPv6 as the memory requirement is mainly dependent on the number of prefixes N. Because of the highly optimised data structures, the original LC-trie algorithm does not support incremental update. We have shown in [19] that by slightly modifying the data structures to associate prefixes with both leaf and internal nodes and including a memory management module based on the buddy system, incremental update can be done in about 15 ms. Figure 3a shows the modified LC-trie structure proposed in [19].
Note that prefixes E and F belong to hidden nodes (internal nodes that are excluded because of level compression) in the modified LC-trie. They are associated with the parent of their corresponding hidden nodes in the LC-trie. For a densely populated block, the number of internal prefixes associated with a node can be quite large. To further improve the searching speed, we make one refinement to the data structure by performing a one-level leaf pushing as shown in Fig. 3b. Internal prefixes are associated with the nodes of the expanded block rather than the parent of the internal nodes. To reduce the

memory requirement, the lists of nodes in the expanded block are organised as shared lists, i.e. only one physical copy of the prefixes E and F is present. The list of prefixes associated with a trie node is sorted by length in descending order. Each trie node has four fields (a 5-bit branch factor, a 7-bit skip value, a 20-bit node-pointer that references the next level LC-trie node, and a 32-bit pointer that references the first prefix in the linked list) and occupies a 64-bit double word. With a 20-bit node-pointer, a maximum of 512 K prefixes can be stored in the LC-trie if the fill factor f is equal to 0.5. Each node in the linked list of prefixes has a 128-bit field to store the prefix value, an 8-bit length value, an 8-bit next-hop field, and a 32-bit next pointer. The revised search algorithm is as follows. The LC-trie will be traversed according to the value of the input address A. When we arrive at a trie-node n, if the associated linked list is not empty, the input address is compared to the first prefix p in the linked list. If p matches A, then p is the best matching prefix seen so far, and the traversal of the LC-trie will continue. If p does not match A, then the traversal of the LC-trie stops and the algorithm compares A with the remaining prefixes in the linked list of node n to search for a possible better matching prefix. The incremental insertion and deletion operations are conceptually the same as in [19].

3.2 Range search approach

Fig. 3 LC-trie structures
a Structure proposed in [19]
b Structure used in this study



In this approach, a prefix is interpreted as an address range [20]. A prefix p represents the range between the two end-points pL and pH, where pL (pH) is obtained by appending 0s (1s) to p to form a full length address. Each prefix is replaced by its two end-points, and the end-points are sorted in ascending order. For a full-length prefix, no bit padding is required. The prefix will be represented by one single value, a merged point, in the list of end-points. Every end-point in the list corresponds to a unique prefix, and every region between two consecutive end-points corresponds to a unique prefix. The prefix corresponding to each region is pre-computed and is associated with its two end-points. In principle, three prefixes are associated with each end-point, namely the L-BMP, E-BMP and G-BMP. To look up a given address A, we perform a binary search on the sorted list of end-points E to locate the interval i such that E[i] ≤ A < E[i+1]. Suppose the search operation is terminated at an end-point E[j]; the BMP is given by the L-BMP, E-BMP, or G-BMP of E[j] depending on whether the input address A is less than, equal to or greater than E[j], respectively. Figure 4 shows the list of end-points and the associated BMPs corresponding to the routing table of Fig. 2a. The G-BMP (L-BMP) of the lower-end (higher-end) of prefix p is p itself. The L-BMP (G-BMP) of the lower-end (higher-end) of prefix p is set to q, where q is the closest ancestor of p in the binary-trie. If the list of end-points is stored in an array, coincided end-points should be combined into a single entry. For example, entries 8 and 9 should be combined, where the E-BMP of the longer prefix and the L-BMP of the shorter prefix should be selected, i.e. the combined entry will have E-BMP/G-BMP F and L-BMP null. Similarly, entries 15 and 16 should also be combined such that E-BMP/L-BMP is I and G-BMP is E. If the list of end-points is stored in a binary search tree, coincided end-points can be kept separately to simplify the update operations.
One can easily see that the G-BMP of the lower-end is the same as the E-BMP. The L-BMP of the higher-end is the same as the E-BMP. For a merged-point, the L-BMP is equal to the G-BMP. In practice only two BMPs need to be associated with an end-point.

Fig. 4 List of end-points for the set of prefixes in the example of Fig. 2a assuming 8-bit full address length (22 entries, AL, BL, BH, AH, CL, CH, DL, DH, EL, FL, GL, GH, HL, HH, IL, IH, FH, JL, JH, KL, KH, EH, each with its L-BMP, E-BMP and G-BMP)
The padded bits are underlined

Table 1: Data fields of a tree node using range search

Field                        Size (bits)
Key (expanded prefix)        128
Length                       8
Next-hop of BMP1             8
Next-hop of BMP2             8
Status (L/H/M, LTH, RTH)     8
Left pointer                 32
Right pointer                32
3rd (bypass) pointer         32

In [20], the authors proposed to use a 16-bit index table to process the first 16 bits of the address, and to use the range search approach to match the remaining 16 bits of the address against the prefixes in the selected group. After the indexing, the search algorithm only needs to consider the remaining 16 bits of the address, i.e. the size of the search key is only 2 bytes. The access time of the cache memory is much shorter than that of the main memory. The searching speed can be improved by using 6-way search, where a tree node of the multi-way search tree fits in a cache line. It was also argued that, since the number of prefixes in a given group is relatively small, the BMPs of the end-points in a group can be recomputed from scratch when the routing table is updated. Major refinements to the data structures and update operations are required when this approach is applied to IPv6. In IPv6, using a 16-bit index table cannot significantly reduce the complexity of the search operation. In addition, after the indexing step the size of the search key is only reduced to 112 bits. A node of a multiway search tree will occupy more than one cache line. Hence, the use of a multiway search tree does not help to reduce the computation time. In this study, we use a height-balanced inorder-threaded binary search tree (BST) to store the list of end-points. To facilitate incremental updates, coincided end-points are not removed. The ordering of two end-points E1 and E2 with equal value (after bit padding) is defined as follows. Let p1 and p2 be the corresponding prefixes of E1 and E2, and let p1 be shorter than p2. If E1 is the lower-end of p1, then E1 < E2. If E1 is the higher-end of p1, then E1 > E2. The data fields defined in the tree node are listed in Table 1. Since the final outcome of the address lookup is to determine the next-hop value, the next-hop of the BMP is stored in the binary search tree instead of the BMP itself.
The status byte indicates whether the end-point is the lower-end, higher-end, or a merged-point. The LTH (RTH) flag indicates whether the left (right) pointer is a tree pointer or a thread. The bypass pointer points to the node storing the

other end-point of the same prefix, i.e. the bypass pointer of pL points to pH, and vice versa. The search operation proceeds as follows. The BST is traversed starting from the root. If the input address is less than the key value of the current node, then the left subtree will be searched. If the input address is greater than the key value, then the right subtree will be searched. If the input address is equal to the key value, then the search direction depends on the node type. If the node type is L, then the algorithm will continue to search the right subtree for possible longer matching prefixes. If the node type is H, then the algorithm will continue to search the left subtree. If the node type is M, then the best matching prefix has been found and the search operation is terminated. When a new prefix p is inserted, we need to (i) determine the L-BMP/G-BMP of the lower-end point/higher-end point, (ii) insert the lower- and higher-end points of p into the BST, and (iii) update the G- or L-BMP of other prefixes that are covered by p. The L-BMP/G-BMP of the lower-end point/higher-end point of p can be determined in constant time after the point of insertion for pL or pH has been determined. Two ranges specified by prefixes p and q are either disjoint or the smaller range is a subset of the larger range. Suppose the point of insertion for pH has been determined and x is the inorder successor of pH in the BST. There can be two possibilities. If x encloses p, then x is the G-BMP of pH. If x and p are disjoint, then the G-BMP of pH is given by the associated BMP of x. To update the BMP of the prefixes covered by p, we perform an inorder traversal from pL to pH using the thread pointers. When the lower-end point of a prefix q is visited, the corresponding higher-end point is accessed using the bypass pointer and the nodes between qL and qH can be skipped. The algorithm will then continue the inorder traversal from qH until reaching pH.
Now consider removing a prefix p, and let q be the G-BMP of pH. The delete algorithm will perform an inorder traversal from pL to pH, and change the associated BMP of the prefixes enclosed by p to q. After updating the associated BMP of the prefixes covered by p, pL and pH are physically removed from the search tree. In the above method, the key value is 128 bits long. When the program is run on a 32-bit machine, it takes multiple instructions to compare two 128-bit values. Lampson et al. proposed the idea of multicolumn search [20] for IPv6. The 128-bit prefix is decomposed into four 32-bit columns such that individual columns are searched separately. The advantage of this approach is that the size of the search key matches the word length of today's 32-bit microprocessors. However, no implementation details were given in [20]. In this paper, we shall present an implementation of the multicolumn search with incremental update capability. Let us use the set of end-points of Fig. 4 as an example.

Assume the 8-bit address is divided into 4 columns. Suppose we want to look up the input address 10 01 10 00. In the first step, we locate 10 in the first column. When we proceed to the second column, the search is restricted to the guarded range from rows 8 to 16. In the second step, we find 01 in the guarded range and proceed to the third column, where the search will be restricted to rows 10 to 13. Eventually, we find an exact match at row 12 and the best matching prefix is H. It has been suggested in [20] that the multicolumn search can be elegantly represented by binary search trees. Each node in the search tree has a next-column pointer in addition to the left and right pointers. The next-column pointer will be used to reference the subset of search keys in the guarded range in the next column. Key values in the first column are organised in a BST, and there is one BST for each guarded range in the subsequent columns. Suppose we want to look up the address 10 10 00 00. In this case, the value 10 is not found in the corresponding guarded range in column 2. If the search is terminated at node 11 during the traversal of the BST in column 2, then we may follow the next-column pointer to locate the leftmost node of the BSTs in column 3 and column 4 to arrive at node IL. The BMP is then given by the L-BMP of that node, which is F in this example. To improve the performance, the following refinements to the data structures are made. Four types of node are defined: L (lower-end), H (higher-end), M (merged), and I (internal). A prefix is only expanded to the closest column boundary, e.g. a 48-bit prefix will only be expanded to 64 bits. A node in the BST of the first three columns that is not an end-point is called an internal node. Two BMPs are associated with each internal node. The meaning of the two BMPs for the four types of nodes is given in Table 2. Note that the L-BMP of a merged-point is the same as the G-BMP. Duplicated internal nodes within a guarded range can be removed.
However, coincided L-type and H-type end-points will be kept in order to simplify the update operation. Coincided end-points are ranked by their type and the actual length of the prefix as in the case of the basic binary search approach. The length of an internal node is equal to the column number multiplied by the column width, e.g. the length of an internal node in column 2 is equal to 64. The ordering of the four types of equal-value end-points is L < M < I < H. The ordering of two equal-value L-type (H-type) end-points is that the one with a shorter prefix length is smaller (larger) than the one with a longer prefix length. Figure 5 shows the compressed list of end-points, and Fig. 6 depicts the logical structure of the search trees of the multicolumn search method. The BMP of an I-node i is determined by the following rules: 1. The L-BMP of i is p if p is the most specific prefix in the given guarded range such that p encloses i and pL is less than i. Similarly, the G-BMP of i is p if p is the most specific prefix in the given guarded range such that p encloses i and pH is greater than i. For example, the L-BMP of I5 is A and the G-BMP of I5 is null.
Table 2: BMP of the four node types

Node type   BMP1     BMP2
L           E-BMP    L-BMP
H           E-BMP    G-BMP
M           E-BMP    G-BMP
I           L-BMP    G-BMP

2. If i is not enclosed by any pair of end-points in the guarded range of its home column, then we trace back to the previous column to look for an enclosing prefix if one exists. For example, the L-BMP/G-BMP of I7 is F. By compressing the list of end-points, we can save memory space, and by associating the L-BMP and G-BMP with internal nodes, the searching speed can be improved. The node structure of the multicolumn search is essentially the same as that of the basic binary search, except that the size of the key is reduced to 32 bits. The third pointer of an internal

Fig. 5 Compressed list of end-points for multicolumn search with column width equal to 2 bits

Fig. 6 Logical structure of the multicolumn search trees
Nodes are shown in sorted order within the corresponding guarded ranges. Internal nodes are numbered from 1 to 8
IEE Proc.-Commun., Vol. 153, No. 6, December 2006


node is used as the next-column pointer. In the case of an end-point, it is used as the bypass pointer. One major change to the search algorithm is that when a matching internal node is found, the algorithm follows the next-column pointer to look for a possible longer matching prefix. When a matching lower-end or merged point is found, the right subtree of the node (if it exists) is searched for a possible longer matching prefix. When a matching higher-end node is found, the left subtree of the node (if it exists) is searched for a possible longer matching prefix. When no matching node is found in a guarded range, the search is terminated. Let us consider the example of looking up the input address 10 10 00 00 again. From Fig. 6, we can see that the search will terminate at node I7 or IM in column 2 (depending on which of the two nodes is selected as the root of the BST). In either case, the best matching prefix F is given by BMP2 of I7 or IM. The update algorithm is similar to that of the basic binary search scheme, except that several BSTs may need to be adjusted, and the BMPs of the affected internal nodes should be updated.
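As an illustration, a possible C layout for the multicolumn search-tree node, together with the search over a single column's BST, is sketched below. The field names and layout are our own assumptions, not the paper's code:

```c
#include <assert.h>   /* used by the accompanying test assertions */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a multicolumn search-tree node: the key is a
   single 32-bit column of the 128-bit address, and the third pointer
   doubles as the next-column pointer (internal node) or the bypass
   pointer (end-point), as described above. */
enum node_type { NT_L, NT_M, NT_I, NT_H };   /* ordering L < M < I < H */

struct mc_node {
    uint32_t key;                 /* 32-bit column value            */
    uint8_t  type;                /* NT_L, NT_M, NT_I or NT_H       */
    int16_t  bmp1, bmp2;          /* per Table 2; -1 denotes null   */
    struct mc_node *left, *right; /* BST links within the column    */
    struct mc_node *next;         /* next-column or bypass pointer  */
};

/* Search one column's BST.  Returns the matching node, or the last
   node visited when no exact match exists (the point at which the
   search "terminates", as in the node-11 example above). */
struct mc_node *mc_search_column(struct mc_node *root, uint32_t key)
{
    struct mc_node *last = NULL;
    while (root != NULL) {
        last = root;
        if (key == root->key)
            return root;
        root = (key < root->key) ? root->left : root->right;
    }
    return last;
}
```

Returning the last visited node on a miss is what lets the caller consult that node's L-BMP/G-BMP, or follow its next-column pointer, exactly as in the search procedure described above.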

3.3

Hash-based approach

One hash table is defined for each prefix length. Prefixes are stored in different hash tables according to their lengths. The longest matching prefix can be determined using binary search on prefix length [21]. When a match is found in a hash table, the algorithm continues to search the hash tables for longer prefix lengths. However, when there is no match, the direction in which to continue the search cannot be determined. To resolve this problem, Waldvogel et al. proposed to introduce markers in the data structure to guide the search direction. Associated with each marker m is the longest prefix in the routing table that matches m. Theoretically, the hash-based method has good scalability. Assuming binary search on prefix length is used, the time complexity of the search operation is O(log2 W). For IPv6, the longest matching prefix can be determined in seven probes to the hash tables. A drawback of this method is the complexity of the update process. When a prefix p is removed from the routing table, the markers that are associated with p may need to be removed. In addition, if p is the best matching prefix of a marker m, then m should also be updated. Similarly, when a new prefix is added, multiple markers may need to be updated. No details of the update operations were given in [21]. In this study, an incremental update algorithm is proposed. In the update operation, we need to locate all the markers that may be affected by the insertion or deletion of a prefix. When a prefix p is inserted (deleted), markers with shorter length may be added (removed), and markers with longer length may need to update their BMP. One may locate the shorter markers by probing the hash tables with key values taken from the leading bits of p. However, the hash tables do not support the function of locating markers with longer length, simply because these markers can have many possible values. Some auxiliary data structures are required. The update operation is best visualised using a binary trie. The prefix p is the BMP of a marker m if p is the closest ancestor of m. For every prefix p, there should be a marker or a prefix along the path of the binary search on length that leads to the length of p. A marker m can be removed if no prefix with longer length exists in its two subtrees. One approach to implement the update algorithm is to represent each prefix or marker by its pair of end-points, as in the case of the range search approach. The list of end-points is organised as a binary search tree similar to the binary search on range approach. The steps for insertion and deletion are shown below.

Insert a prefix p

Step 1. Insert pL and pH into the BST and update the BMP of markers lying within the range from pL to pH.

Step 2. Insert markers for p if required. When a new marker m is inserted, search for its BMP by performing an inorder traversal on the binary search tree starting from the higher-end of the marker. Let x be the inorder successor of m. There are three possibilities: (i) x is a prefix and x encloses m; then x is the BMP of m. (ii) x does not enclose m, and y is the prefix that encloses x. If y encloses m, then y is the BMP of m; otherwise the BMP of m is null. (iii) x is a marker and y is the BMP of x. If y encloses m, then y is the BMP of m; otherwise the BMP of m is null.

Delete a prefix p

Step 1. Find the longest prefix q that encloses p. Change the BMP of markers covered by p to q.

Step 2. If there are no longer prefixes enclosed by p, i.e. no prefix exists within the range from pL to pH, then remove pL and pH from the BST; otherwise, change the node type from prefix to marker in the BST.

Step 3. For the markers along the search path of p, if mH is the inorder successor of mL in the BST, remove m.

The hash table indices of the prefixes and markers are also stored in the BST. Hence, normal search operations are allowed while the network processor is handling update requests. Once the required modifications to the hash tables are determined, the corresponding entries in the hash tables can be updated directly. In the above update algorithm, a marker is removed only when it does not enclose any prefix. It is possible that some redundant markers are kept in the hash tables. One can overcome this problem by associating a reference counter with each marker. The reference counter represents the number of prefixes that require the presence of the marker. When the reference counter is reduced to zero during prefix deletion, the corresponding marker is removed. However, there is a tradeoff between the two options. In the second option, we need to allocate storage for the reference counter, say 2 bytes. This extra storage space is required in every hash table entry. If the number of redundant markers is not high, then the first option may be more space efficient.

Variants of the binary search on length method have been proposed [21], such as the mutating binary search and rope search. These methods try to optimise the search paths based on the search history. They require additional data structures to record the dynamic search paths. This will, however, further complicate the pre-computation and update operations. Hence, they are not considered in this study.
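The binary search on prefix length described in this section can be sketched in C as follows. The probe callback is our own stand-in for the per-length hash tables (a real implementation hashes the leading bits of the address); the demo table is purely illustrative:

```c
#include <assert.h>   /* used by the accompanying test assertions */

/* Outcome of probing the hash table for one prefix length. */
enum { MISS = 0, MARKER = 1, PREFIX = 2 };

/* probe(len, bmp_out) stands in for "probe the hash table of this
   prefix length with the leading len bits of the address"; a marker
   entry reports the length of its associated BMP via bmp_out. */
typedef int (*probe_fn)(int len, int *bmp_out);

/* Returns the length of the best matching prefix (0 if none). */
int longest_match(probe_fn probe, int w)
{
    int lo = 1, hi = w, best = 0;
    while (lo <= hi) {
        int mid = (lo + hi) / 2, bmp = 0;
        int r = probe(mid, &bmp);
        if (r == MISS) {
            hi = mid - 1;        /* no match: try shorter lengths        */
        } else {
            if (r == PREFIX)
                best = mid;      /* real prefix matched at this length   */
            else if (bmp > best)
                best = bmp;      /* marker: take its precomputed BMP     */
            lo = mid + 1;        /* match or marker: try longer lengths  */
        }
    }
    return best;
}

/* Toy table: a real 32-bit prefix, plus a 64-bit marker whose best
   matching prefix is 48 bits long. */
int demo_probe(int len, int *bmp_out)
{
    if (len == 32) return PREFIX;
    if (len == 64) { *bmp_out = 48; return MARKER; }
    return MISS;
}
```

For W = 128 the loop issues at most seven probes (lengths 64, 96, 80, 72, 68, 66, 65 for the toy table above), matching the seven-probe bound quoted in the text.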

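The reference-counter refinement discussed above for removing redundant markers can be sketched as follows. The entry layout and helper names are illustrative assumptions, not code from the paper:

```c
#include <assert.h>   /* used by the accompanying test assertions */
#include <stdint.h>

/* Each marker entry counts the prefixes that require its presence;
   the 2-byte counter corresponds to the storage overhead mentioned
   in the text. */
struct marker {
    uint64_t key;     /* marker value (leading bits of the address) */
    uint16_t refcnt;  /* number of prefixes depending on the marker */
};

/* Called once for every inserted prefix that needs marker m. */
void marker_ref(struct marker *m) { m->refcnt++; }

/* Called during prefix deletion; returns 1 when the marker has
   become unreferenced and should be removed from its hash table. */
int marker_unref(struct marker *m)
{
    return --m->refcnt == 0;
}
```

The counter trades 2 bytes per hash table entry for the guarantee that no redundant markers linger after deletions, which is exactly the tradeoff weighed above.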
3.4

Hybrid method

In the hybrid method, we try to combine the hashing method with the LC-trie in order to take advantage of both methods. Prefixes in the routing table are divided into four groups: (i) IPv4-mapped and IPv4-compatible prefixes; (ii) IPv6 prefixes with fewer than 32 bits; (iii) IPv6 prefixes with 32 to 63 bits; (iv) IPv6 prefixes with 64 bits and above. A 32-bit LC-trie is used to store all the prefixes derived from IPv4 addresses. The branch factor of the root of this LC-trie is fixed at 16. Another 31-bit LC-trie is used to store the prefixes of group 2. In this case, the branch factor of the root is

governed by the distribution of the prefixes. Prefixes of the third group are further divided into a number of subgroups. Prefixes in each subgroup share a common 32-bit prefix. Each subgroup is represented by a 31-bit LC-trie. Similarly, prefixes of the fourth group are divided into subgroups based on the value of the first 64 bits. Each subgroup is represented by a 64-bit LC-trie. There are two hash tables, one with a 32-bit hash key and the other with a 64-bit hash key. The search operation proceeds as follows. Step 1: Check if the input address is an IPv4-mapped or IPv4-compatible address. If yes, extract the last 32 bits of the input address to search the IPv4 LC-trie. Otherwise, go to Step 2. Step 2: Probe the 64-bit hash table with the first 64 bits of the input address. If there is a hit, search the corresponding 64-bit LC-trie using the remaining 64 bits of the input address. If a matching prefix is found, the search is terminated; otherwise go to Step 3. Step 3: Probe the 32-bit hash table with the first 32 bits of the input address. If there is a hit, search the corresponding LC-trie using bits 33 to 63 of the input address. If a matching prefix is found, the search is terminated; otherwise go to Step 4. Step 4: Search the LC-trie for group 2 prefixes using the first 31 bits of the input address. The advantage of this scheme is that the sizes of the LC-tries are reduced. Hence, the efficiency of the update operations can be improved substantially.
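The four-step search can be summarised as a dispatch routine. The function-pointer table below is our own abstraction of the trie/hash interfaces (a next-hop value of 0 denotes a miss), not code from the paper:

```c
#include <assert.h>   /* used by the accompanying test assertions */
#include <stdint.h>
#include <string.h>

/* Assumed interfaces for the component data structures. */
struct hybrid_fib {
    int (*is_v4_mapped)(const uint8_t a[16]);  /* Step 1 test        */
    int (*v4_trie)(uint32_t low32);            /* 32-bit LC-trie     */
    int (*hash64_trie)(const uint8_t a[16]);   /* 64-bit hash + trie */
    int (*hash32_trie)(const uint8_t a[16]);   /* 32-bit hash + trie */
    int (*group2_trie)(const uint8_t a[16]);   /* <32-bit prefixes   */
};

int hybrid_lookup(const struct hybrid_fib *f, const uint8_t a[16])
{
    int nh;
    if (f->is_v4_mapped(a)) {                  /* Step 1 */
        uint32_t low;
        memcpy(&low, a + 12, sizeof low);      /* last 32 bits */
        return f->v4_trie(low);
    }
    if ((nh = f->hash64_trie(a)) != 0)         /* Step 2 */
        return nh;
    if ((nh = f->hash32_trie(a)) != 0)         /* Step 3 */
        return nh;
    return f->group2_trie(a);                  /* Step 4 */
}

/* Demo stubs for exercising the dispatch order. */
int demo_not_v4(const uint8_t a[16]) { (void)a; return 0; }
int demo_miss(const uint8_t a[16])   { (void)a; return 0; }
int demo_hit(const uint8_t a[16])    { (void)a; return 7; }
```

Because each probe either resolves the lookup or falls through to the next group, an update only ever touches one of the small component LC-tries, which is the source of the improved update efficiency noted above.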

Table 3: Computation complexity of the address lookup algorithms

Algorithm                          Lookup operation   Update operation
LC-trie and variants               O(log2 log2 N)     O(2^k)
Range search/Multicolumn search    O(log2 N)          O(log2 N + Q)
Hash + binary search on length     O(log2 W)          O(W + M)

Table 4: Distribution of the synthetic IPv6 prefixes

Prefix length                          Percentage
Distinct 24-bit to 47-bit prefixes     5-10%
Distinct 48-bit prefixes               35-50%
25-bit to 64-bit sub-prefixes          35-50%
Distinct 64-bit prefixes               15-20%
65-bit to 128-bit sub-prefixes         1-5%

4 Performance evaluation

First we would like to give a qualitative comparison of the algorithms in terms of the computation complexity of the lookup and update operations. For the LC-trie, the average depth of the trie is O(log2 log2 N) and the worst-case depth is O(W/k), where N is the number of prefixes, W is the address length, and k is the average stride width. If the branch factor of a trie node (i.e. the stride width) is changed because of an insertion/deletion operation, the number of data movements in adjusting the data structures is O(2^k). For the range search and multicolumn search methods, when a new prefix p is added (or removed), we also need to update the BMP of prefixes that are covered by p. Prefix q is said to be covered by p if p is the nearest ancestor of q in the logical binary trie. For the hash-based approach with binary search on length, the insertion of a prefix p involves the addition of markers along the search path of p and updates to the BMP of markers that are covered by p. If Q is the number of prefixes covered by p and M is the number of markers covered by p, it is obvious that M > Q. Table 3 summarises the asymptotic complexity of the algorithms. A more accurate comparison can be obtained by measuring the execution time on a common hardware platform. The deployment of IPv6 is still in its infancy. Large-scale routing tables are not available. Performance of the address lookup algorithms can therefore only be measured using synthetic routing tables. We assume there is a fixed number of 38 000 IPv4-mapped and IPv4-compatible route prefixes. These prefixes are taken from the mae-east IPv4 routing table. IPv6 route prefixes are synthesised with the distribution shown in Table 4. The sub-prefixes are used to model load balancing, multi-homing and/or network partitioning. A sub-prefix is generated by first randomly picking a distinct prefix and

then extending its length by padding a random bit pattern of the selected length. The address lookup algorithms are coded in the C language. The programs are executed on a computer with a 2.0 GHz Pentium-4 CPU and 512 MB main memory, running the Linux operating system. For each selected table size, ten instances of the routing table are generated. An address trace with 100 K entries is created by randomly picking route prefixes from the routing table and extending them to full-length addresses. For update traces, 3000 prefixes are randomly picked from the routing table. These prefixes are removed and re-inserted into the routing table. Six address lookup methods are implemented; they are labelled LC (the original LC-trie), LC-I (the modified LC-trie with incremental update), LC-H (the hybrid LC-trie combined with hashing), BS (binary search on end-points), MCS (multicolumn search), and hash (hashing with binary search on length) in Figs. 7-9. LC is obtained by modifying the source code of [17], which was originally designed for IPv4. The shortest prefix in our test data is 24 bits long. Hence, in the implementation of hash, the algorithm terminates when the search length is less than 24. The modulo function is used as the hash function, and the load factor of the hash tables is set to about 0.3 to avoid a high collision probability. Figure 7 plots the average lookup rate of the six methods. In general, LC has the best lookup rate and hash is the slowest method. LC-I and LC-H are about 10% to 20% slower than LC. The reduction in lookup rate is mainly due to the additional processing (comparing the input address with the first prefix in the linked list of the trie node) in each iteration. This is consistent with our previous experiments for IPv4 [19]. The lookup rate of MCS is higher than that of BS, but the improvement is not very substantial.
Since IPv4-mapped and IPv4-compatible addresses share a common 80-bit string of zeros at the front, the size of the first-column search tree is greatly reduced for small to moderately sized routing tables. However, this advantage gradually diminishes as the proportion of IPv6 prefixes increases in the larger routing tables. It is also noticed that the sum of the depths of the search trees over the four columns in MCS can be larger than the depth of a single search tree in BS.
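The depth comparison interacts with the per-node comparison cost: on a 32-bit CPU, comparing the full 128-bit keys used by BS takes up to four word comparisons, whereas MCS compares a single 32-bit word per node. A minimal sketch of the word-wise compare, under our own assumed key representation:

```c
#include <assert.h>   /* used by the accompanying test assertions */
#include <stdint.h>

/* Compare two 128-bit keys held as four 32-bit words in
   most-significant-word-first order; up to four word compares. */
int cmp128(const uint32_t a[4], const uint32_t b[4])
{
    for (int i = 0; i < 4; i++) {
        if (a[i] != b[i])
            return (a[i] < b[i]) ? -1 : 1;
    }
    return 0;
}
```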

Fig. 7 Average packet processing rate of the six algorithms

Fig. 8 Average update time of the five algorithms

Fig. 9 Memory requirements of the six algorithms

Hence, the gain in the key comparison operation in MCS (comparing 32-bit keys instead of 128-bit keys in BS) is offset by the increase in the number of iterations required in a search operation. Although the asymptotic complexity of hash is lower than that of BS/MCS, its execution time is found to be longer than that of the other methods. This can be due to the following two factors: (i) using a 32-bit CPU to process hash keys that are longer than 32 bits imposes an additional time penalty; and (ii) collisions in a hash table also lengthen the processing time. Figure 8 shows the average update time of the five algorithms. MCS has the best update efficiency. It takes about 2 ms to perform one insertion/deletion. For LC-I, the

time to update an IPv4-mapped/IPv4-compatible prefix is about 230 ms, while the time to update an IPv6 prefix is about 55 ms. The difference is because of the relatively higher prefix density found in the IPv4-mapped/IPv4-compatible address range. The branch factor of some of the trie nodes in that region can be quite large. When the branch factor of a trie node needs to be updated, a substantial amount of data movement is required. IPv6 prefixes are more sparsely distributed. It is more likely that an insertion/deletion does not trigger a restructuring of the LC-trie. In addition, restructuring a block with a smaller branch factor can be done more efficiently. Since the number of IPv4-mapped/IPv4-compatible prefixes is fixed, the proportion of IPv6 prefixes increases with the overall size of the routing table. Hence, the average update time improves when the routing table size is increased. In LC-H, the LC-trie is decomposed into a number of smaller LC-tries. An update operation is restricted to one of the smaller LC-tries. In LC-H, the root node of the 32-bit LC-trie that stores the IPv4-mapped/IPv4-compatible prefixes has a fixed branch factor of 16; hence the time to update an IPv4-mapped/IPv4-compatible prefix is reduced to about 33 ms. The time to update an IPv6 prefix is about 14 ms. The relative update performances of BS/MCS and hash are consistent with our qualitative analysis. Since the number of markers to be updated in hash is higher than the corresponding number of prefixes that need to be updated in BS/MCS, the update time of hash is found to be almost ten times higher than that of BS/MCS. The overall memory usage of the six methods is shown in Fig. 9. MCS has very good memory efficiency. Its memory usage is only about 66% to 72% of that of BS. The memory usage of LC-I and LC-H is slightly higher than that of the original LC-trie. In order to support incremental update, dynamic memory allocation using the buddy system is employed. Sufficient free space should be available in the system.
If the overhead for the free space is included, the memory requirement of LC-I and LC-H should be increased by about 50%, which will then be higher than that of MCS. The memory requirement of hash is four to five times that of MCS. Memory consumption of hash depends on the number of markers and the load factor of the hash tables. In this study, it is found that the total number of markers is approximately equal to the number of prefixes in the routing table. Since we maintain the load factor of the hash tables at around 0.3, the overall memory requirement of hash is about four to five times that of the other methods. Theoretically, hash has very good scalability. However, its performance is not as good as that of the other methods. In order to support incremental update, an auxiliary data structure, e.g. a binary search tree storing the markers and prefixes, is required. The auxiliary data structure already occupies more memory than the BS method. The lookup rate of hash is even lower than that of BS. Hence, hash is not a good candidate for IPv6 address lookup. MCS is superior to BS in all three metrics. It is a potential candidate for small routing tables. For large routing tables, LC-H is obviously the best choice.

5 Conclusion

Previously published address lookup algorithms are tailored for IPv4. Their performance for IPv6 address lookup is uncertain. In this research, we have made major extensions to three address lookup approaches, namely the LC-trie approach, the range search approach and the hash-based approach, for application to IPv6 address lookup. A total

of six address lookup methods are implemented and evaluated. All the methods, except the original LC-trie algorithm, are equipped with incremental update capability. Their performance is measured using synthetic IPv6 routing tables. The experimental results reveal that the LC-trie methods have the best lookup rate, whereas the multicolumn search method has the best update performance. The hashing with binary search on length method is expected to have good scalability. However, it is found that its performance is inferior to that of the other two approaches. The update time of the LC-trie method can be reduced significantly by combining it with the hashing method without sacrificing lookup performance. To the best of our knowledge, this is the first in-depth study of the performance of software approaches for IPv6 address lookup. The source code developed in this research is available upon request; we hope that this will encourage future research and experimentation in IPv6 address lookup.

6 Acknowledgments

This research was supported by CityU Strategic grant 7001423.

7 References

1 Chao, H.J.: 'Next generation routers', Proc. IEEE, 2002, 90, (9), pp. 1518-1558
2 Ruiz-Sanchez, M.A., Biersack, E.W., and Dabbous, W.: 'Survey and taxonomy of IP address lookup algorithms', IEEE Netw., 2001, pp. 8-23
3 BGP table statistics, http://bgp.potaroo.net
4 Bu, T., Gao, L., and Towsley, D.: 'On characterizing BGP routing table growth', Comput. Netw., 2004, 45, pp. 45-54
5 Labovitz, C., Malan, G.R., and Jahanian, F.: 'Internet routing instability', IEEE/ACM Trans. Netw., 1998, 6, (5), pp. 515-528
6 IPv6 Task Force Steering Committee: 'IPv6 overall status', 2004, www.ipv6tf-sc.org
7 Gupta, P., Etzel, K., and Bolaria, J.: 'A scalable and cost-optimized search subsystem for IPv4 and IPv6', EE Times NetSeminar, 28 June 2004, http://www.cypress.com
8 Pao, D., Liu, C., Wu, A., Yeung, L., and Chan, K.S.: 'Efficient hardware architecture for fast IP address lookup', IEE Proc. Comput. Digit. Tech., 2003, 150, (1), pp. 43-52
9 Pao, D.: 'TCAM organization for IPv6 address lookup', IEEE ICACT, Feb. 2005, pp. 26-31
10 Hinden, R., and Deering, S.: 'Internet protocol version 6 (IPv6) addressing architecture', Network Working Group RFC 3513, April 2003, ftp://ftp.rfc-editor.org/in-notes/rfc3513.txt
11 Hinden, R., Deering, S., and Nordmark, E.: 'IPv6 global unicast address format', Network Working Group RFC 3587, Aug. 2003, ftp://ftp.rfc-editor.org/in-notes/rfc3587.txt
12 IAB and IESG: 'IAB/IESG recommendations on IPv6 address allocations to sites', Network Working Group RFC 3177, Sept. 2001, ftp://ftp.rfc-editor.org/in-notes/rfc3177.txt
13 Cypress Semiconductors Co., www.cypress.com
14 Gupta, P., Lin, S., and McKeown, N.: 'Routing lookups in hardware at memory access speeds', IEEE INFOCOM, 1998, pp. 1240-1247
15 Degermark, M., Brodnik, A., Carlsson, S., and Pink, S.: 'Small forwarding tables for fast routing lookups', ACM SIGCOMM, 1997
16 Srinivasan, V., and Varghese, G.: 'Fast address lookups using controlled prefix expansion', ACM Trans. Comput. Syst., 1999, 17, (1), pp. 1-40
17 Nilsson, S., and Karlsson, G.: 'IP-address lookup using LC-tries', IEEE J. Sel. Areas Commun., 1999, 17, (6), pp. 1083-1092
18 Ravikumar, V.C., Mahapatra, R., and Liu, J.C.: 'Modified LC-trie based efficient routing lookup', IEEE Int. Symp. on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems, 2002
19 Pao, D., and Li, Y.K.: 'Enabling incremental updates to LC-tries for efficient management of IP forwarding tables', IEEE Commun. Lett., 2003, 7, (5), pp. 245-247
20 Lampson, B., Srinivasan, V., and Varghese, G.: 'IP lookups using multiway and multicolumn search', IEEE/ACM Trans. Netw., 1999, 7, (3), pp. 324-334
21 Waldvogel, M., Varghese, G., Turner, J., and Plattner, B.: 'Scalable high speed IP routing lookups', ACM SIGCOMM, 1997


