
CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications

Sangyeun Cho, J. R. Martin, R. Xu, M. H. Hammoud and R. Melhem


Dept. of Computer Science

University of Pittsburgh

Search ops in applications

Search (or lookup) operations represent an important common function

Network packet processing
For each arriving packet, determine the output port
Given packet information, find a matching classification rule
Each lookup can incur many memory accesses

Speech recognition
Searching (e.g., dictionary lookup) takes up ~24% of CPU cycles

Forthcoming RMS (Recognition, Mining, and Synthesis) apps

ISPASS 2007

Search performance and power

Search performance must match increasing line speeds


For OC-768, up to 104M packets must be processed per second
Network traffic has doubled every year [McKeown03]
Routing tables (~200K prefixes in a core router) are growing [RIS]
IPv6

Power and thermal issues are already a critical limiting factor in network processing device design [McKeown03]
Search in battery-operated devices should be energy-efficient

Conventional search solutions
Software methods (tries, hash tables, ...)
Hardware methods (CAM, TCAM, ...)
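The OC-768 packet rate quoted above can be reproduced with a back-of-the-envelope calculation. This Python sketch assumes the OC-768 line rate of 39.81312 Gb/s and minimum-size 40-byte packets plus 8 bytes of per-packet framing overhead; the framing figure is an assumption, not something stated in the talk.

```python
# Back-of-the-envelope lookup rate for an OC-768 link.
# Assumptions (not from the talk): 39.81312 Gb/s line rate,
# 40-byte minimum packets plus 8 bytes of per-packet framing overhead.
LINE_RATE_BPS = 39_813_120_000
BYTES_PER_PACKET = 40 + 8

packets_per_second = LINE_RATE_BPS / (BYTES_PER_PACKET * 8)
ns_per_lookup = 1e9 / packets_per_second

print(f"{packets_per_second / 1e6:.1f}M packets/s")  # 103.7M packets/s (about 104M)
print(f"{ns_per_lookup:.2f} ns per lookup")           # 9.65 ns per lookup
```

At that rate a new lookup must start roughly every 9.6 ns.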


IP lookup using a trie


Consider an IP address: 01000110

Software approach is flexible, but:
High memory capacity requirement
High memory bandwidth requirement
Not SCALABLE
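To make the memory-access cost of a trie concrete, here is a minimal binary-trie longest-prefix-match sketch in Python. It uses the prefix set from the TCAM example and the 8-bit example address 01000110; the class and function names are illustrative. Each trie level walked corresponds to one dependent memory access in a real software implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # '0' / '1' -> TrieNode
        self.prefix = None   # set when a stored prefix ends at this node

def insert(root, prefix):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.prefix = prefix

def longest_prefix_match(root, addr):
    """Walk the trie bit by bit; return (longest matching prefix, nodes visited)."""
    node, best, visits = root, None, 0
    for bit in addr:
        node = node.children.get(bit)
        if node is None:
            break
        visits += 1              # one dependent memory access per trie level
        if node.prefix is not None:
            best = node.prefix
    return best, visits

PREFIXES = ["110100", "110101", "110111", "01000", "01100", "01101",
            "11011", "0100", "0110", "1101", "10", "0"]
root = TrieNode()
for p in PREFIXES:
    insert(root, p)

print(longest_prefix_match(root, "01000110"))  # ('01000', 5)
```

Five sequential accesses are needed for an 8-bit toy address; real 32-bit (or 128-bit IPv6) addresses make the chain longer.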


IP lookup using TCAM


Consider an IP address: 01000110

TCAM contents (sorted by prefix length, longest first, before storing): 110100*, 110101*, 110111*, 01000*, 01100*, 01101*, 11011*, 0100*, 0110*, 1101*, 10*, 0*
Choose the first among the matched entries

TCAMs offer high bandwidth and constant-time lookup, but are:
Relatively small and expensive
Very high in power consumption
Not SCALABLE
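The TCAM scheme can be modeled in a few lines: store the prefixes as ternary words sorted longest-first, compare the key against every row, and take the first hit. This Python sketch models the parallel compare with a loop, so it shows the semantics, not the hardware cost; the function names are illustrative.

```python
# TCAM rows, pre-sorted longest prefix first (as on the slide).
TCAM_ROWS = ["110100*", "110101*", "110111*", "01000*", "01100*", "01101*",
             "11011*", "0100*", "0110*", "1101*", "10*", "0*"]

def to_ternary(prefix, width):
    """'0100*' -> '0100XXXX' for width 8; 'X' is a don't-care symbol."""
    bits = prefix.rstrip("*")
    return bits + "X" * (width - len(bits))

def tcam_search(rows, addr):
    """Return (first matching row, number of rows that matched)."""
    words = [to_ternary(r, len(addr)) for r in rows]
    # Real TCAM hardware compares every row in parallel; this loop only models it.
    hits = [i for i, w in enumerate(words)
            if all(t == "X" or t == a for t, a in zip(w, addr))]
    return (rows[hits[0]] if hits else None), len(hits)

print(tcam_search(TCAM_ROWS, "01000110"))  # ('01000*', 3)
```

Every row takes part in every search, which is why TCAM power grows with table size.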


CA-RAM: a hybrid approach

Can we do better than the existing conventional schemes?


CAM-like search performance
RAM-like cost and power

CA-RAM combines hashing w/ hardware parallel matching

CA-RAM design goals
High lookup performance
Low power consumption
Smaller chip area per stored datum
Straightforward system-level integration


Talk roadmap

What is CA-RAM?
Prototype design
Case study 1: IP lookup
Case study 2: Trigram lookup for speech recognition


CA-RAM: Content Addressable RAM


[Figure: conventional CAM/TCAM vs. CA-RAM, showing memory cells and match logic]

Separate match logic and memory
Match logic for a single row, not every row
Allows the use of dense RAM technology
Enables highly reconfigurable match logic
Keep keys sorted in each row, not in the entire array


Very simple, yet efficient


Use hashing to store keys in a particular row
To look up, hash the search key and retrieve one row
Perform matching on the entire row in parallel
Achieve full content addressability without paying the overhead of per-row match logic!
[Figure: search key → index generator → selected row (Key_i1, Key_i2, ...) → match processors 1 and 2]
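A minimal Python model of the lookup path described above: hash the key to pick one row, read that row, and compare the search key against every key in it. The class name, the identity hash, and the sizes are hypothetical choices for illustration only.

```python
class ToyCARAM:
    """Minimal CA-RAM model: num_rows rows, each holding up to row_slots (key, data) pairs."""

    def __init__(self, num_rows, row_slots, hash_fn):
        self.rows = [[] for _ in range(num_rows)]
        self.row_slots = row_slots
        self.hash_fn = hash_fn

    def row_index(self, key):
        # Index generator: the hash picks the single row that may hold the key.
        return self.hash_fn(key) % len(self.rows)

    def insert(self, key, data):
        row = self.rows[self.row_index(key)]
        if len(row) >= self.row_slots:
            return False          # bucket overflow: needs an overflow strategy
        row.append((key, data))
        return True

    def lookup(self, key):
        row = self.rows[self.row_index(key)]   # exactly one row read
        # Match processors compare all slots of the row at once; a loop models that.
        for stored_key, data in row:
            if stored_key == key:
                return data
        return None

caram = ToyCARAM(num_rows=4, row_slots=2, hash_fn=lambda k: k)  # identity hash, illustration only
print(caram.insert(1, "port A"), caram.insert(5, "port B"))     # True True
print(caram.insert(9, "port C"))                                 # False: row 1 is full
print(caram.lookup(5), caram.lookup(9))                          # port B None
```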



Pipelined CA-RAM operation


[Figure: search key → index generator → row (Key_j1, Key_j2, Key_j3) → match processors 1, 2, 3 → result]

Step 1: Index generation
Step 2: Memory access
Step 3: Key matching
Step 4: Result forwarding
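Because the four steps use different hardware, a new lookup can enter the pipeline every cycle. The Python sketch below assumes one cycle per stage (an idealization; the talk does not give stage timings) and lists which lookup occupies which stage in each cycle: once the pipeline is full, one lookup completes per cycle.

```python
STAGES = ["index generation", "memory access", "key matching", "result forwarding"]

def pipeline_schedule(num_lookups, num_stages=len(STAGES)):
    """Map each cycle to the (lookup, stage) pairs active in it, one new lookup per cycle."""
    schedule = {}
    for lookup in range(num_lookups):
        for stage in range(num_stages):
            cycle = lookup + stage          # lookup i is in stage s during cycle i + s
            schedule.setdefault(cycle, []).append((lookup, STAGES[stage]))
    return schedule

sched = pipeline_schedule(5)
print(len(sched))   # 8 cycles = 5 lookups + 4 stages - 1
for cycle, work in sorted(sched.items()):
    print(cycle, work)
```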



Dealing w/ bucket overflows

Careful design of hash function

Increase bucket size


Reduce load factor (α); α = # of occupied entries / # of total entries

Use chaining; store overflows in subsequent rows


Multiple accesses per lookup

Use a small overflow CAM, accessed in parallel


Similar to popular victim caching

Use two-level hashing and employ multiple CA-RAM banks
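The overflow-CAM option can be sketched by extending a row model with a small side table that is checked together with the selected row, much like a victim cache. This Python model is illustrative only: the sizes, the identity hash, and raising an error when both the row and the CAM are full are assumptions; the load factor α is computed over row entries as defined above.

```python
class CARAMWithOverflowCAM:
    """CA-RAM model whose bucket overflows spill into a small CAM searched in parallel."""

    def __init__(self, num_rows, row_slots, cam_entries, hash_fn):
        self.rows = [[] for _ in range(num_rows)]
        self.row_slots = row_slots
        self.cam = []                   # overflow CAM: (key, data) pairs
        self.cam_entries = cam_entries
        self.hash_fn = hash_fn

    def insert(self, key, data):
        row = self.rows[self.hash_fn(key) % len(self.rows)]
        if len(row) < self.row_slots:
            row.append((key, data))
        elif len(self.cam) < self.cam_entries:
            self.cam.append((key, data))  # spilled entry
        else:
            raise OverflowError("row and overflow CAM are both full")

    def lookup(self, key):
        # Row read and CAM search happen in parallel, so a hit in either costs one access.
        row = self.rows[self.hash_fn(key) % len(self.rows)]
        for k, d in row + self.cam:
            if k == key:
                return d
        return None

    def load_factor(self):
        """alpha = occupied row entries / total row entries."""
        occupied = sum(len(r) for r in self.rows)
        return occupied / (len(self.rows) * self.row_slots)

    def spilled_fraction(self):
        total = sum(len(r) for r in self.rows) + len(self.cam)
        return len(self.cam) / total if total else 0.0

caram = CARAMWithOverflowCAM(num_rows=4, row_slots=2, cam_entries=4, hash_fn=lambda k: k)
for key in (1, 5, 9, 13):          # all four keys hash to row 1
    caram.insert(key, f"data{key}")
print(caram.lookup(13), caram.load_factor(), caram.spilled_fraction())  # data13 0.25 0.5
```

A poorly distributing hash spills entries even at a low α, which is why the hash function and bucket size matter as much as the overflow mechanism.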

CA-RAM reconfig. opportunities


Reconfigurable match logic allows:

Adapting key size to apps


Same hardware to support multiple apps or standards


Adapting key size

[Figure: the same row interpreted as keys of different sizes (Key_i1 to Key_i3 vs. Key_j1 to Key_j3); reconfigurable match logic selects key bits for matching and outputs match information]

Adapting key size is straightforward
Will help support multiple apps/standards
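Because the match logic only sees one row, changing the key size amounts to reinterpreting the same row bits as fewer, wider keys (or more, narrower ones). The Python sketch below models that with a configurable key_width; the 32-bit row and the function names are hypothetical.

```python
def split_row(row_bits, row_width, key_width):
    """Split a row_width-bit row into row_width // key_width keys, most significant first."""
    mask = (1 << key_width) - 1
    count = row_width // key_width
    return [(row_bits >> (row_width - (i + 1) * key_width)) & mask for i in range(count)]

def match_row(row_bits, row_width, key_width, search_key):
    """One match flag per key slot, as the match logic would report."""
    return [k == search_key for k in split_row(row_bits, row_width, key_width)]

ROW = 0x12345678                                  # a 32-bit row, for illustration
print([hex(k) for k in split_row(ROW, 32, 8)])    # ['0x12', '0x34', '0x56', '0x78']
print([hex(k) for k in split_row(ROW, 32, 16)])   # ['0x1234', '0x5678']
print(match_row(ROW, 32, 16, 0x5678))             # [False, True]
```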


CA-RAM reconfig. opportunities (cont.)

Reconfigurable match logic also allows:

Binary and ternary matching
Some apps require ternary matching, some don't

Supporting binary/ternary matching


[Figure: a row holding Key_i1, Mask_i1, Key_i2, ...; each key bit K_0 to K_N-1 is XORed with the corresponding search key bit, and a 2:1 selection per bit decides whether mask bit M_i is considered, producing MATCH]

Developed a configurable key comparator
Ternary matching requires 2 bits per symbol (key and mask)
Supporting different types of matching in different bit positions is feasible
Reconfigurable match logic: consider mask bits or not
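The comparator behavior can be expressed with integer bit operations: XOR the stored key with the search key, then, in ternary mode only, clear the positions the mask marks as don't-care. The Python function below is a sketch of that logic, not the prototype's circuit; the mask polarity (1 = don't care) is an assumption.

```python
def symbol_match(key, mask, search, ternary):
    """Compare a stored key with the search key, bitwise.

    Assumed convention: a mask bit of 1 marks a don't-care position.
    """
    diff = key ^ search            # XOR: 1 wherever stored and search bits differ
    if ternary:
        diff &= ~mask              # ignore differences in masked positions
    return diff == 0

# Prefix 01000* stored as an 8-bit key plus mask (low 3 bits don't care)
KEY, MASK = 0b01000000, 0b00000111
print(symbol_match(KEY, MASK, 0b01000110, ternary=True))   # True
print(symbol_match(KEY, MASK, 0b01000110, ternary=False))  # False
```

In ternary mode each stored symbol costs two bits (key and mask), so a row holds half as many ternary keys as binary keys of the same width.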


CA-RAM reconfig. opportunities (cont.)

Reconfigurable match logic also allows:

Storing data and keys in a CA-RAM module
Cuts # of memory accesses for a lookup by half

Simult. key matching & data access


[Figure: each row stores key/data pairs (Key_i1, Data_i1, ...); the reconfigurable match logic matches keys and bypasses data, returning the match result & data]

With a TCAM, data access follows the TCAM lookup
CA-RAM supports data embedding
Cuts memory traffic & latency by half
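The halving follows from counting memory accesses per lookup. The Python sketch below compares a TCAM whose match index must then be used to read a separate data memory with a CA-RAM row that holds key/data pairs side by side; the data layout and names are hypothetical, and exact-match keys stand in for prefixes to keep it short.

```python
def tcam_then_sram(tcam_keys, sram_data, key):
    """TCAM returns a matching index; the data needs a second, separate memory read."""
    accesses = 1                              # TCAM search
    if key not in tcam_keys:
        return None, accesses
    index = tcam_keys.index(key)
    accesses += 1                             # SRAM read using the TCAM's index
    return sram_data[index], accesses

def caram_embedded(rows, hash_fn, key):
    """Keys and data share a row, so the single row read already returns the data."""
    row = rows[hash_fn(key) % len(rows)]      # one memory access
    for stored_key, data in row:
        if stored_key == key:
            return data, 1
    return None, 1

keys, data = [3, 7, 11], ["port A", "port B", "port C"]
rows = [[], [], [], [(3, "port A"), (7, "port B"), (11, "port C")]]  # all hash to row 3
print(tcam_then_sram(keys, data, 7))          # ('port B', 2)
print(caram_embedded(rows, lambda k: k, 7))   # ('port B', 1)
```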


CA-RAM reconfig. opportunities (cont.)

Reconfigurable match logic also allows:

Providing range checking capabilities
Beneficial for rule-based packet filtering

Supporting range checking


[Figure: each row stores key/range pairs (Key_i1, Range_i1, ...); the reconfigurable match logic matches the key and checks the range, producing match information]

In a TCAM, range checking causes trouble: entries must be expanded
CA-RAM can support range checking efficiently
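A TCAM can only store prefixes, so an arbitrary range [lo, hi] has to be split into several prefix entries (the standard range-to-prefix expansion), while match logic that stores the range bounds can simply test lo ≤ value ≤ hi. Both Python functions below are illustrative sketches; the (key, lo, hi) entry layout is an assumption.

```python
def range_to_prefixes(lo, hi, width):
    """Split [lo, hi] into the minimal list of prefixes a TCAM would need to store."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that starts at lo (aligned) and fits in [lo, hi].
        size = (lo & -lo) if lo else (1 << width)
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1            # number of trailing '*' symbols
        fixed = width - wild
        head = format(lo >> wild, f"0{fixed}b") if fixed else ""
        prefixes.append(head + "*" * wild)
        lo += size
    return prefixes

def caram_range_match(entry, search_key, value):
    """CA-RAM entry = (key, lo, hi): match the key and check the range directly."""
    key, lo, hi = entry
    return key == search_key and lo <= value <= hi

print(range_to_prefixes(1, 6, 3))                  # ['001', '01*', '10*', '110']
print(caram_range_match(("tcp", 1, 6), "tcp", 4))  # True
```

One 3-bit range already costs four TCAM entries here; in the worst case a w-bit range expands to 2w − 2 prefixes.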


CA-RAM-based memory subsystem

[Figure: requests enter a request queue and an input controller, are served by an array of CA-RAM slices (with configuration input), and results leave through an output controller and a result queue]


Prototype implementation

We implemented a prototype CA-RAM slice design (with a degree of reconfigurability) and evaluated its power and area advantages over state-of-the-art TCAMs. We used a standard-cell (0.16µm) based ASIC design flow.

Step                     # cells   Area, µm²   Delay, ns
Expand search key          3,804      66,228      (0.89)
Calculate match vector     5,252      10,591        0.95
Decode match vector          899       1,970        1.91
Extract result             6,037      21,775        1.99
Total                     15,992     100,564        4.85

The total delay excludes the parenthesized 0.89 ns of the search-key expansion step.


Area and power: CA-RAM vs. TCAM


[Figure, left: per-cell area (µm²) at 130nm CMOS; right: power (W) of a 4.5Mb array at 143MHz. Both compare a 16T SRAM-based TCAM, an 8T DRAM-based TCAM, a 6T DRAM-based TCAM, and a DRAM-based ternary CA-RAM]

CA-RAM area advantage: 4.5x~11x
CA-RAM power advantage: 4x~14x


Performance: CA-RAM vs. (T)CAM


Case study 1: IP lookup

Problem description

Given
A set of prefixes (each prefix is associated with an output port number)
An IP address
Find a prefix that matches the input IP address and return the output port number associated with it
In the presence of multiple matching prefixes, choose the longest

Procedure
Find a good hash function to distribute prefixes
Determine the CA-RAM organization


Data set and hashing method


IP core router table with 186,760 entries
Bit selection scheme [Zane et al. 03]
98% of prefixes are at least 16 bits long
Select hash bits from the first 16 bits (low-order bits)
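Bit selection simply uses a few fixed address bits as the row index. Because every selected bit must be defined in every stored prefix, drawing them from the first 16 bits works for the prefixes that are at least 16 bits long. The positions below (bits 5 to 15, giving an 11-bit index for 2,048 rows) are a hypothetical choice for illustration, not the selection used in the study.

```python
import ipaddress

def bit_select_index(addr, positions):
    """Concatenate the IPv4 address bits at `positions` (0 = most significant bit)."""
    index = 0
    for p in positions:
        index = (index << 1) | ((addr >> (31 - p)) & 1)
    return index

# Hypothetical choice: bits 5..15 (the low-order end of the first 16 bits) -> 2,048 rows
POSITIONS = list(range(5, 16))
addr = int(ipaddress.IPv4Address("192.168.1.1"))
print(bit_select_index(addr, POSITIONS))   # 168
```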


Shaping CA-RAM
Consider multiple design points:
Design A (α = 0.47), Design B (α = 0.40), Design C (α = 0.36)
Design D (α = 0.36), Design E (α = 0.24), Design F (α = 0.36)
[Figure: design organizations, including 2,048 rows (32 entries) and 4,096 rows (64 entries)]


Performance
[Figure, top: spilled entries (%) for Designs A to F; bottom: average memory access latency (AMAL) under skewed and uniform traffic for Designs A to F]

With a properly chosen α, CA-RAM achieves near-constant AMAL

Area and power


[Figure: area and power of CA-RAM (Design B) relative to TCAM]

CA-RAM is advantageous over TCAM in both area and power


Case study 2: Trigram lookup in speech recognition

Problem, data set, and hashing

Problem
Look up a trigram in the trigram database

Data set
A subset of the Sphinx trigram database
We picked entries having 13~16 characters
Still 5,385,231 entries, or 86MB

Hashing
DJB, an efficient string hash function (used in Sphinx)
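For reference, the widely used djb2 form of Bernstein's string hash is shown below in Python, together with a hypothetical row-index helper. Whether Sphinx uses exactly this variant (djb2 also exists in an XOR form) and the 32-bit truncation are assumptions here.

```python
def djb2(s):
    """Classic djb2 string hash: h = h * 33 + c, starting from 5381 (kept to 32 bits)."""
    h = 5381
    for byte in s.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

def row_index(trigram, num_rows):
    """Hypothetical helper: pick the CA-RAM row for a trigram string."""
    return djb2(trigram) % num_rows

print(djb2(""), djb2("a"))   # 5381 177670
```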


Result


Data distribution


Area comparison
[Figure: relative area, CAM vs. CA-RAM]


CA-RAM conclusions

Compared w/ software methods


Fewer memory accesses; higher lookup performance

Compared w/ CAM or TCAM


Higher density, matching that of DRAM: large lookup tables
Competitive performance
Low power: a critical advantage for cost-effective system design
Reconfigurable
Can accommodate apps having different key/record sizes, binary vs. ternary searching requirements, range checking, ...
Can adopt new standards much more easily, e.g., IPv6

Two case studies show the efficacy of the CA-RAM approach


3~5x improvement in area and power, compared with CAM/TCAM


CA-RAM: A High-Performance Memory Substrate for Search-Intensive Applications


Questions?
