
2142 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016

EMDBAM: A Low-Power Dual Bit Associative Memory With Match Error and Mask Control
Sandeep Mishra, Member, IEEE, and Anup Dandapat, Senior Member, IEEE

Abstract— A ternary content addressable memory (TCAM) speeds up the search process in the memory by searching through prestored contents rather than addresses. The additional don’t care (X) state makes the TCAM suitable for many network applications, but the large number of cells required for storage consumes high power and takes a large design area. This paper presents a novel architecture of TCAM, which prestores 2 bits of data in an up–down manner and provides multiple masking operations through a single control multimasking circuit. The proposed dual bit associative memory with match error and mask control (EMDBAM) consumes low power and selects the valid value on the matchline through a match error controller. The proposed design has been implemented using a standard 45-nm CMOS technology, and the extracted layout has been simulated using SPECTRE with the supply voltage at 1 V. The proposed EMDBAM can reduce the cell area by 39% compared with a basic TCAM design with a reduction of 9.6% in the energy-delay product.

Index Terms— Content addressable memory (CAM), dual bit associative memory (DBAM), energy-efficient memory designs, low-power design, ternary CAM (TCAM).

Manuscript received July 4, 2015; revised October 13, 2015; accepted November 17, 2015. Date of publication December 9, 2015; date of current version May 20, 2016. This work was supported by the Ministry of Human Resource Development, Government of India.
The authors are with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong 793003, India (e-mail: ssandeep.mmishra@nitm.ac.in; anup.dandapat@nitm.ac.in).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2015.2503005

I. INTRODUCTION

DATA retrieval is done by addressing the desired memory location in most memory devices such as RAM. A content addressable memory (CAM), on the other hand, compares the search bit stream simultaneously with all prestored data. As this search process requires only a single clock cycle, CAM is used for ultraspeed memory access. This makes CAM suitable for many network applications, such as asynchronous transfer mode switching, network intrusion detection systems, and fast lookup of network routing, and for image processing applications, such as image compression, Hough transformation, and pattern recognition [1]–[5].

Binary CAM (BCAM), the simplest CAM, uses only search words containing 0 or 1. A ternary CAM (TCAM), on the other hand, uses don’t care (X), sometimes called a wild bit, along with 0 and 1. TCAM is a time-efficient search device, which clearly outclasses all valid search algorithms in the longest prefix matching and packet classifications. Though the searching is parallel, a comparison with all prestored data makes BCAM and TCAM more power hungry. Both architectures take a significantly large design area, as one storage cell has been designed for storing each bit of data along with a matching circuit and an address encoder at the final stage [6]–[8]. Thus, efficient low-power techniques must be employed in designing a CAM.

An external Bloom filter (BF) has been used to avoid pseudohit and miss events without modifying the CAM architecture [9]. However, this technique suffers from frequency mismatch between CAM and BF. In [10], two metal rails VDDML and VDD have been introduced to power up the data storage and mask storage cell. A feedback loop automatically turns the matchline (ML) current OFF. This technique results in reduced average power consumption as the output has been obtained after the power-gated transistor turned OFF, but the leakage power consumption is significant. In [11], pipelined MLs have been used to activate a small segment of the ML. Zhang et al. [12] have recycled the current of a voltage detector to charge up the ML for reducing energy. In [13], an overlapped search method has been presented for BCAM, bettering the pipelined structure for low energy dissipation.

Lin et al. [14] have reduced the number of comparisons by storing the input data separately in data and parameter memory. An initial ones-count parameter extraction technique has filtered out the unmatched data to reduce the number of comparisons in the second stage. An SRAM-based TCAM design has been presented in [15], where the search operation goes through SRAMs rather than the conventional TCAMs. This increases bit density but consumes substantial power through leaky SRAMs. A two-phase prediction and correction search sensing technique has been proposed for power reduction at high operating frequency [16]. The designs presented in [10]–[16] have suffered from high leakage power consumption through extra cells. The leakage power consumption increases exponentially, particularly below 0.1-μm technology [17], and is an important factor in low-power designs [18]. This issue has not been resolved in any of these architectures. Both CAM and mask storage cells lead to leakage due to the use of cross-coupled inverters. A reduction in the number of CAM storage cells has been achieved in the proposed architecture with a unique masking approach to reduce the additional leakage primarily due to the local masking requirement.

A. CAM Designs Based on Segmented Architecture

Some designs based on segmented architecture have been proposed in [19]–[28]. In [20] and [21], low-power designs
1063-8210 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
MISHRA AND DANDAPAT: LOW-POWER DBAM WITH MATCH ERROR AND MASK CONTROL 2143

have been proposed based on the reduction of power consumption due to high capacitive searchlines (SLs) and MLs. In the first technique, data bits have been separated and stored in NAND and NOR blocks [20]. The second design uses a Pai-Sigma ML, where NAND and NOR cells have been realized by Pai and Sigma segments, respectively [21]. To avoid short-circuit current, an interfacing logic has been used between the Pai and Sigma segments.

Chang et al. [22] have segmented TCAM cells based on the mask bit values. The mask bits with only 1 value have been separated from those having only 0 values. Except the boundary cells, all other cells in different segments have been self-gated. Ruan et al. [23] have partitioned the input bit stream into several groups; among these, the output has been derived with the use of a block XOR approach. A significant reduction in power consumption can be achieved, but it completely depends on the matching probability. The precomputation circuitry also takes a large design area. In [24], a logic-in-memory structure has been implemented to reduce the memory access time. A magnetic tunnel junction/CMOS hybrid structure has been integrated with it for reducing leakage power consumption.

The TCAM word MLs have been separated into four segments in [25]. The first segment has been precharged, and the rest have been charge-shared. Here, a mismatch in one segment does not drain the ML charge in the other segments. By using the dynamic power source technique, a mask data value has been used to destroy the prestored data [26]. In [27], scalability of TCAM has been improved with the exclusion of the priority encoder. Using these architectures, leakage power consumption can be reduced significantly. In [29], single-bit CAM cells have been arranged in the up–down approach (one with a stored value of 1 and the other with 0). As the search bit contains only 0s or 1s, either the upper or the lower cell provides a match condition. A priority detector at the final stage ensures correct match output when a perfect match does not occur in both CAM cells. However, this does not give the functionality of a TCAM, which is an essential requirement in many network applications.

A state-of-the-art TCAM architecture has been presented in this paper that features the dual bit architecture of CAM, single control multimasking, and match error control. The dual bit associative memory (DBAM) with match error and mask control (EMDBAM) suits network and image processing applications with extremely low-power requirements. Architectures with a basic TCAM structure require W × B CAM storage cells for W words of B-bit length [30]. However, the proposed EMDBAM requires only 2 × B CAM storage cells. The MLs of the upper and lower CAMs need to be encoded, which takes W extra blocks, so the overall EMDBAM requires 2 × B + W cells compared with W × B cells for a basic TCAM [30]–[32].

The rest of this paper is organized as follows. Section II describes the DBAM with a single control multimasking circuit (SCMMC), which provides the way of avoiding an individual cell masking task. In Section III, we introduce the match error controller (MEC) that ensures valid values on the MLs. Section IV presents the ML selection, modified charge-shared ML sense amplifier, and address encoder system. Section V presents the measurement results. Finally, the conclusion is drawn in Section VI.

Fig. 1. (a) Simplified functional block of the conventional TCAM. (b) High-level architecture of the proposed EMDBAM.

II. DUAL BIT ASSOCIATIVE MEMORY WITH SINGLE CONTROL MULTIMASKING CIRCUIT

All the nonsegmented architectures [10]–[16] have suffered from high leakage power consumption. The segmented architectures [19]–[28] have resolved this issue, but the storage cell count remains the same. The conventional fully parallel TCAM presented in Fig. 1(a) consists of a data storage cell, a mask storage cell, and an evaluation logic that increases the physical size and interconnection wires [data wordline (WL) and mask WL].

The main motivation behind the dual bit structure is to compare input search data (tag) with compressed storage data. For this purpose, instead of implementing all W word storage cells, only two words of search data bit length B have been designed with a common mask storage for each bit. The DBAM has been designed using two CAMs

(upper CAM and lower CAM) placed in an up–down approach, as shown in Fig. 1(b). The upper and lower CAMs have been prestored with alternate logic values (0 or 1) through (DL1 or DL̄1) and (DL2 or DL̄2). SL and SL̄ lines, separate from the datalines, have been provided to both CAMs. A mask storage cell has been placed for each DBAM, and its mask value (M or M̄) has been used for comparison with the match outputs of both upper and lower CAMs (FL1 and FL2). A common WL has been applied to both CAM cells as well as the mask storage cell.

Fig. 2. (a) Conventional NAND-type TCAM. (b) DBAM comprising 10T upper and lower CAMs with mask storage. (c) Comparison circuit. (d) SCMMC.

TABLE I
SCMMC FUNCTION TABLE (C1 AND C0: MASK CONTROL BITS; M AND M̄: MASK VALUES; XL1 AND XL2: SCMMC OUTPUTS)
The search values provided to the DBAM are either 0 or 1 at no mask condition, so the search value matches one of the stored CAM values. The match function is similar to a conventional NAND-type TCAM shown in Fig. 2(a). For local masking, the value of a mask storage cell is set to 1, which provides a wild match to all search values. In case of global masking, SL and SL̄ values are made equal. The SCMMC has been presented to avoid the large requirement of mask storage cells. The SCMMC requires only one mask storage cell for the whole array of CAM blocks, as shown in Fig. 2(b). The SCMMC outputs (XL1 and XL2) and CAM MLs (FL1 and FL2) have been provided as inputs to the comparison circuit, as shown in Fig. 2(c). The masking of both upper and lower CAMs has been controlled by an SCMMC, as shown in Fig. 2(d). One or both CAM cells can be selected for masking by using mask control bits (C1 and C0). The mask values (M and M̄) of the mask storage cell have been provided as inputs to the SCMMC of each slot. An 8T SRAM cell has been used for the mask storage to avoid the issue of data storage destruction during the read operation. Decoupled write and read word signals (WL and read wordline) have been used to separate the data retention element from an output element.

The function of the SCMMC is described in Table I. Mask control bits (C1 and C0) with value 1 result in a wild match (local masking) in a DBAM cell. In no mask condition, C1 and C0 have value 0. Upper and lower CAM cells can be masked separately by using dissimilar logic values of mask control bits. The SCMMC provides an added advantage of using a single mask storage for comparison of all CAM cell outputs with separate mask values. Each DBAM slot comprises one SCMMC, which takes a mask value of the common mask storage cell. The outputs XL1 and XL2 have been used in upper and lower comparison circuits, respectively. The NOR-based comparator produces unprocessed ML (UML1 and UML2) outputs, and those have been given to the MEC. The MEC passes valid values on MLs through a mismatch blocking circuit (MBC) and priority selector (PS) based on the error checking circuit (ECR) output (ER). An MBC prevents the searching operation in the subsequent blocks if a mismatch occurs in both CAM cells [dual mismatch (DMM)]. The PS passes preset values to the MLs based on comparison circuit outputs.

III. MATCH ERROR CONTROLLER

The search values match either the upper or the lower CAM cell values in most search conditions, but there are the possibilities of DMM, dual match (DM), and reverse match (RM) with the storage values. In these conditions, valid values must be passed to the MLs. For this purpose, we introduce the MEC, as shown in Fig. 3(a), which comprises ECR, MBC, and PS. The ECR gives a match error when there is a change in the preset values. Table II describes the ECR function, where the output (ER) is determined based on the controlling parameters. The controlling parameters here are the search values (SL and SL̄), mask control bits (C1 and C0), and unprocessed ML outputs of upper and lower comparison circuits (UML1 and UML2).
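The slot behaviour described in Section II can be summarised in a small behavioural model. The Python sketch below is not the transistor-level circuit and does not reproduce Table I. It assumes the upper CAM holds 1 and the lower CAM holds 0, represents a match as True, and assumes that C0 masks the upper cell and C1 masks the lower cell (chosen to agree with the statement that EI1 = 0 when C0 = 1 and EI2 = 0 when C1 = 1). The function and parameter names are hypothetical.

```python
def dbam_slot_match(search_bit: int, c1: int = 0, c0: int = 0,
                    upper_stored: int = 1, lower_stored: int = 0):
    """Return (upper_match, lower_match) for one DBAM slot.

    A set mask control bit forces a wild match on its cell.
    Assumption: C0 controls the upper cell and C1 the lower cell.
    """
    upper_match = bool(c0) or search_bit == upper_stored
    lower_match = bool(c1) or search_bit == lower_stored
    return upper_match, lower_match


if __name__ == "__main__":
    # No masking: exactly one of the two cells matches a binary search bit.
    print(dbam_slot_match(1))              # upper cell matches
    print(dbam_slot_match(0))              # lower cell matches
    # Both control bits set: local masking, wild match in both cells.
    print(dbam_slot_match(0, c1=1, c0=1))
```

With no masking, a binary search bit always matches exactly one of the two cells, which is the property the up–down arrangement relies on.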

Fig. 3. (a) MEC includes three fundamental blocks: error checking circuit (ECR), MBC, and PS. U1 and U2 are unprocessed MLs of the previous slot. (b) Two stage MLS with modified charge-shared ML sense amplifier.

TABLE II
ERROR CHECKING CIRCUIT (ECR) FUNCTION TABLE (C1 AND C0: MASK CONTROL BITS; SL AND SL̄: SEARCHLINE; UML1 AND UML2: UNPROCESSED MLs; ER: OUTPUT OF ERROR CHECKING CIRCUIT)

Fig. 4. Address encoder of the proposed EMDBAM (ML11–ML14 and ML21–ML24 are the MLs of upper and lower CAMs, respectively).

Both instantaneous errors EI1 and EI2 pass logic 0 when a local masking state occurs in the corresponding CAM cell (EI1 = 0 when C0 = 1 and EI2 = 0 when C1 = 1). Therefore, a DM at this condition is not considered to be an error. For the other values of mask control bits, EI1 and EI2 depend on the unprocessed MLs. The ECR gives an error output 1 for DMM, DM, and RM conditions. Unprocessed ML outputs of the previous slot (U1 and U2) have been fed to the MBC of the present slot, which controls the DMM error. If a mismatch occurs in both CAM cells, then the MBC disables further MLs.
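The error checking and mismatch blocking behaviour can be sketched at a purely behavioural level. The Python sketch below does not reproduce Table II or the circuit of Fig. 3(a). It assumes that the upper cell is expected to match a search bit of 1 and the lower cell a search bit of 0, that an instantaneous error is raised when an unmasked cell deviates from that expectation, and that ER is the OR of EI1 and EI2. These definitions, and all function names, are assumptions made for illustration.

```python
def error_flag(search_bit, upper_match, lower_match, c1=0, c0=0):
    """Return ER (0 or 1) for one slot under the assumptions stated above."""
    expected_upper = search_bit == 1  # upper cell is assumed to store 1
    expected_lower = search_bit == 0  # lower cell is assumed to store 0
    # A locally masked cell contributes no instantaneous error.
    ei1 = 0 if c0 else int(bool(upper_match) != expected_upper)
    ei2 = 0 if c1 else int(bool(lower_match) != expected_lower)
    return ei1 | ei2  # assumption: ER is the OR of EI1 and EI2


def block_after_dual_mismatch(slots):
    """Force a mismatch on every slot that follows a dual mismatch (DMM).

    `slots` is a list of (upper_match, lower_match) pairs in search order.
    """
    blocked = False
    out = []
    for upper, lower in slots:
        out.append((False, False) if blocked else (upper, lower))
        if not blocked and not upper and not lower:
            blocked = True  # DMM seen: later slots are disabled
    return out
```

Under these assumptions, a dual match with a search bit of 1 raises ER, but the same dual match is not flagged once the lower cell is locally masked (C1 = 1), in line with the statement that a DM under local masking is not treated as an error.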
When a mismatch occurs in the (n − 1)th cell, the MLs of the nth cell and subsequent cells carry a mismatch signal VDD through transistors MU and ML; otherwise, UML1 and UML2 have been transmitted to the outputs of the MBC (MML1 and MML2). The MBC has not been provided in the first slot as the searching operation starts from this slot.

The PS sets the best value in the ML by using the value of UML and SL. The outputs of the PS (PML1 and PML2) have been provided to the ML selector (MLS) along with the outputs of the MBC (MML1 and MML2). The operation of both MBC and PS blocks is controlled by the decision-making input (DMI) and ECR output (ER), as shown in Fig. 3(b). The MBC and PS operate when the ECR shows an error at the output. A 0 value on DMI switches on the MBC and a 1 turns ON the PS. Both MBC and PS have been power gated to VDD and GND based on DMI and ER values for avoiding unnecessary leakage when there is no error in the output of the ECR. The MBC has been power gated through transistors BM1, BM2, BM3, and BB4 and the PS through transistors PM1, PM2, PM3, and PM4, as shown in Fig. 3(a).

IV. MATCHLINE SELECTION, SENSING, AND ADDRESS ENCODER SYSTEM

The MLS and the ML sense amplifier are shown in Fig. 3(b). In designs that use a basic TCAM structure, the

ML is precharged to VDD at first. When a match occurs in all CAM cells in a row, the ML is discharged to 0. However, in the proposed design, the MLs are not precharged but rather controlled by the WL. We have used a modified charge-shared ML sensing scheme presented in [8] next to the MLS. Instead of enabling the ML sources through IBIAS, we have discharged net N1 to GND during the write as well as the precharge phase. The precharge signal (PRE) passes VDD to net N2, which discharges ML1. During the search phase, if a match occurs, the ML charges to VDD through M3 and the inverter. In case of a mismatch, the ML remains discharged through the loop formed by M2 and the inverter.

A two stage selector, as shown in Fig. 3(b), has been used for selecting the valid value (UML or PML or MML) to the ML. The first stage has been controlled by DMI and the second by ER. The ML outputs of all DBAM slots (ML11–ML24) have been given to the address encoder system, as shown in Fig. 4. All the addresslines (A0–A3) have been pulled down to 0 initially, and the match condition charges the corresponding addressline to VDD. Thus, the address encoder can address S addresses using 2 × log2 S MLs.

V. RESULTS AND ANALYSIS

The proposed EMDBAM has been designed using the standard generic process design kit (GPDK) 45-nm CMOS process, and the extracted layout (a 4-bit SL) similar to the layout presented in Fig. 5 (a 2-bit SL) has been simulated using SPECTRE. Power consumption and energy dissipation reduction are the primary focus of our proposed architecture. Most of the referred architectures accord a moderate reduction in power consumption to provide a better energy-delay tradeoff. Therefore, we have chosen a low leakage two-side self-gating (TSSG)-TCAM [22] and a basic TCAM [30] for testing the efficacy of our proposed architecture.

The detailed comparison has been done for varying temperature from −20 °C to 100 °C and for supply voltage scaling from 1.2 to 0.6 V. For a fair comparison, these architectures have been scaled to a 45-nm CMOS process and analyzed in the same environment. The motive behind the use of a 45-nm technology is to test the designs under high leakage power. The nMOS and pMOS cells with standard threshold voltage (0.36 and −0.4 V) have been used. A performance comparison summary of TCAM size, energy consumption, and ML delay with relevant recently proposed architectures has been presented at the end of this section.

Fig. 5. Extracted layout of the proposed EMDBAM for a 2-bit SL (α: considering acceptable ML swing; β: considering full ML swing; A0–A3: address output).

TABLE III
POWER, ENERGY DISSIPATION, AND ML DELAY ANALYSIS OF THE PROPOSED EMDBAM FOR VARIOUS VALUES OF MASK CONTROL BITS (C11–C04: MASK CONTROL BITS)

Fig. 6. Power consumption analysis of various TCAM architectures for varying temperature from −20 °C to 100 °C and VDD of 1 V. (a) Static power analysis. (b) Average power analysis. (c) Peak power analysis.

TABLE IV
ENERGY DISSIPATION, ML DELAY, AND EDP COMPARISON OF VARIOUS TCAM ARCHITECTURES FOR VARYING TEMPERATURE FROM −20 °C TO 100 °C AND VDD OF 1 V
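The address encoder relation stated in Section IV (S addresses from 2 × log2 S MLs) can be checked with a short calculation. The Python sketch below only evaluates that relation and its inverse; the function names are hypothetical, and the restriction to powers of two is an assumption made so that log2 S is an integer.

```python
def matchlines_needed(num_addresses: int) -> int:
    """MLs needed for S addresses according to the stated relation 2 * log2(S)."""
    if num_addresses < 2 or num_addresses & (num_addresses - 1):
        raise ValueError("num_addresses is assumed to be a power of two >= 2")
    # For a power of two, bit_length() - 1 equals log2 exactly.
    return 2 * (num_addresses.bit_length() - 1)


def addresses_covered(num_matchlines: int) -> int:
    """Inverse relation: S = 2 ** (MLs / 2) for a positive even number of MLs."""
    if num_matchlines <= 0 or num_matchlines % 2:
        raise ValueError("num_matchlines is assumed to be a positive even number")
    return 2 ** (num_matchlines // 2)


if __name__ == "__main__":
    print(matchlines_needed(16))  # MLs for 16 addresses
    print(addresses_covered(8))   # addresses reachable from 8 MLs
```

For the eight MLs named in the text (ML11–ML14 and ML21–ML24), the relation gives S = 16, which is consistent with the four addresslines A0–A3.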

A. Power Consumption Analysis

The proposed architecture consumes less power compared with the referred TCAM designs [22], [30], as shown in Fig. 6. The static power analysis is shown in Fig. 6(a). Primary blocks (MBC and PS) in the MEC have been power gated to VDD and GND to reduce the static power, as discussed in Section III. Due to the dual bit structure, a reduction of 70.8% in the static power consumption has been achieved in the proposed design from a basic TCAM [30].

The average power consumption analysis is shown in Fig. 6(b), where the proposed architecture clearly outclasses all other designs. Many design decisions are determined by the peak power, which is 72% less on average compared with the referred designs and is stable, as shown in Fig. 6(c).

Fig. 7. Power consumption analysis of the proposed SCMMC for various sequences of mask control bits, as shown in Table III. (a) Static power analysis. (b) Average power analysis.
In Table III, different values of mask control bits (C11–C04) have been set for testing the effectiveness of the SCMMC circuit. A small variation of 2.52% in static power consumption has been found for different values of mask control bits, which ensures the stability of the proposed architecture, as shown in Fig. 7(a). At a higher level of local masking (a large number of TCAM slots have been masked as in sequences 2, 5, and 7), the average power consumption reduces at a higher rate, as shown in Fig. 7(b).

B. Energy Dissipation and Delay Analysis

Table IV presents the energy dissipation, ML delay, and energy-delay product (EDP) of different TCAM architectures. The energy dissipation of the proposed architecture has been

found to be the least among all compared TCAMs, as shown in Fig. 8(a) and (b). The energy/bit/search in EMDBAM is 23% less compared with a basic TCAM [30] and 21% less compared with a TSSG-TCAM [22]. Due to the presence of several feedbacks in the MEC and wake-up time arising from the power-gating transistors (BM1–BB4 and PM1–PM4), EMDBAM exhibits an additional delay, but the function of MEC gives several advantages over the basic TCAM in many network and compression applications.

Fig. 8. (a) Energy dissipation analysis of various TCAM architectures for varying temperature from −20 °C to 100 °C and VDD of 1 V. (b) Energy, delay, and EDP analysis of TCAM architectures at 27 °C and VDD of 1 V. (c) EDP comparison with and without using SCMMC.

TABLE V
LOW-VOLTAGE ANALYSIS OF THE PROPOSED EMDBAM AT VARIOUS PROCESS CORNERS (TT: TYPICAL CORNER; FS: FAST nMOS AND SLOW pMOS; SF: SLOW nMOS AND FAST pMOS; SS: SLOW CORNER; FF: FAST CORNER)

TABLE VI
PERFORMANCE COMPARISON OF THE PROPOSED EMDBAM FOR VARIOUS MACROCAPACITY

EDP is an important design metric particularly in low-power designs. As presented in Table IV and Fig. 8(b), the proposed architecture provides the best EDP among compared architectures with a reduction of 23.8% from the basic TCAM [30]. Fig. 8(c) presents the energy dissipation comparison of EMDBAM with and without the use of the SCMMC.
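Since the EDP is simply the product of energy and delay, the metric and the percentage reductions quoted in this section can be reproduced with a few lines of arithmetic. The Python sketch below uses the energy (0.84 fJ/bit/search) and delay (0.75 ns) that the paper reports at 1 V in its conclusion; whether the tabulated EDP values are computed from exactly these per-bit quantities is not stated here, so the result is illustrative only. The function names and the baseline values in the usage example are hypothetical.

```python
def energy_delay_product(energy_fj: float, delay_ps: float) -> float:
    """EDP in fJ x ps, given energy in fJ and delay in ps."""
    return energy_fj * delay_ps


def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # 0.84 fJ/bit/search and 0.75 ns (750 ps), as reported at 1 V.
    print(energy_delay_product(0.84, 750.0))
    # Hypothetical baseline and proposed EDP values, for illustration only.
    print(percent_reduction(1000.0, 762.0))
```

Expressing the delay in picoseconds keeps the result in the fJ × ps unit used for the EDP values quoted in the process variation analysis.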
Mask storage cells have been provided to both upper and lower CAM cells of each slot for the design without the SCMMC. The average energy/bit/search has been reduced by 51% with the introduction of the SCMMC.

C. Low-Voltage Analysis

The low-voltage operation of the proposed architecture is presented in Fig. 9. The EMDBAM is adaptable to a supply voltage scaling of 0.6 V without any performance degradation. To demonstrate it, the proposed architecture has been compared with architectures in [22] and [30] for a supply voltage scaling from 1.2 to 0.6 V at 27 °C.

The EDP analysis of compared designs at 0.7 V is shown in Fig. 9(a). A normalized EDP analysis for a supply voltage scaling from 1.2 to 0.6 V is presented in Fig. 9(b). At higher supply voltages, TSSG-TCAM [22] performs better, but as the supply voltage is scaled down, the proposed design provides a better energy-delay metric compared with the other designs. A normalized analysis of various power consumptions, ML delay, and energy dissipation of the proposed EMDBAM is presented in Fig. 9(c). The design performs

better at a supply voltage of 1 and 1.1 V providing the best EDP, while the longer delay results in an average increment of 75% at 0.6 V.

D. Process Variation Analysis

The proposed architecture has been tested at various process corners for three different supply voltages (1, 0.6, and 0.5 V), as presented in Table V. The design functions for a supply voltage of 0.5 V at SF and FF corners with an EDP of 2533.9 and 1229.8 fJ × ps, respectively. The supply voltage of 0.6 V is set to be the lower bound as the average EDP increment is 74% in case of 0.5 V. The ML voltage variation for varying temperature from −20 °C to 100 °C and VDD of 1 V at various process corners is shown in Fig. 10. The graph shows the N2 voltage variation for a match case. An ML delay variation of only 282 ps has been found in the given range.

Fig. 9. Low-voltage analysis of TCAM architectures at 27 °C. (a) EDP comparison at 0.7 V. (b) EDP analysis for supply voltage scaling from 1.2 to 0.6 V. (c) Normalized low-voltage analysis of the proposed EMDBAM.

TABLE VII
COMPARISON SUMMARY OF VARIOUS TCAM ARCHITECTURES

Fig. 10. ML delay analysis for process–temperature variation.
E. Performance Comparison Summary

The performance of our proposed design has been compared for various TCAM macros, as presented in Table VI. A moderate increment in the power consumption has been found as the bit capacity increases, but the energy dissipation metric remains almost unchanged for all TCAM sizes. A small average increment of 10.8% in the delay has been recorded as the bit size increases, which provides a fair normalized EDP and ensures the cascading capability for forming a large TCAM array.

The performance comparison summary of the energy-delay metric and the design complexity is presented in Table VII. The proposed design performs better than almost all referred architectures. Designs presented in [10] and [16] dissipate less energy compared with the proposed design, but the

use of two different power rails with a gated-power transistor in [10] and high-k metal gate in [16] increase the design complexity.

A summary of various power consumptions, energy dissipation, and ML delay analysis with compared architectures is presented in Table VIII. The proposed design consumes the least power among the compared designs with an average reduction of 7.37% in the EDP.

TABLE VIII
PERFORMANCE COMPARISON SUMMARY OF VARIOUS TCAM ARCHITECTURES FOR A 4-bit SL AT 27 °C AND SUPPLY VOLTAGE OF 1 V

TABLE IX
PREDICTIVE MODELS OF GENERIC PROCESS DESIGN KIT (GPDK)

VI. CONCLUSION

A novel architecture of TCAM has been presented that consumes the least power among the compared architectures. The dual bit structure, which occupies a small design area, is suitable for a high density design with no performance degradation. The proposed EMDBAM can be used in low-power applications where more control over the match error is required. The SCMMC reduces the individual cell masking task by using mask control bits. Match errors have been controlled by the MEC to pass valid values on the MLs. The proposed design dissipates 0.84 fJ/bit/search with a 0.75-ns delay at 1 V. Results conclude that the proposed EMDBAM is adaptable to a supply voltage scaling of 0.6 V while providing a reduction of 9.68% in the EDP from the basic TCAM architecture.

APPENDIX

The predictive models of GPDK have been shown in Table IX.

REFERENCES

[1] Z. Cai, Z. Wang, K. Zheng, and J. Cao, “A distributed TCAM coprocessor architecture for integrated longest prefix matching, policy filtering, and content filtering,” IEEE Trans. Comput., vol. 62, no. 3, pp. 417–427, Mar. 2013.
[2] K. Zheng, C. Hu, H. Lu, and B. Liu, “A TCAM-based distributed parallel IP lookup scheme and performance analysis,” IEEE/ACM Trans. Netw., vol. 14, no. 4, pp. 863–875, Aug. 2006.
[3] I. Hayashi et al., “A 250-MHz 18-Mb full ternary CAM with low-voltage matchline sensing scheme in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2671–2680, Nov. 2013.
[4] C.-C. Wang, C.-H. Hsu, C.-C. Huang, and J.-H. Wu, “A self-disabled sensing technique for content-addressable memories,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 1, pp. 31–35, Jan. 2010.
[5] H.-Y. Li, C.-C. Chen, J.-S. Wang, and C. Yeh, “An AND-type match-line scheme for high-performance energy-efficient content addressable memories,” IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1108–1119, May 2006.
[6] D. Pao, P. Zhou, B. Liu, and X. Zhang, “Enhanced prefix inclusion coding filter-encoding algorithm for packet classification with ternary content addressable memory,” IET Comput. Digit. Techn., vol. 1, no. 5, pp. 572–580, Sep. 2007.
[7] Y.-D. Kim, H.-S. Ahn, S. Kim, and D.-K. Jeong, “A high-speed range-matching TCAM for storage-efficient packet classification,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 6, pp. 1221–1230, Jun. 2009.
[8] N. Mohan, W. Fung, D. Wright, and M. Sachdev, “A low-power ternary CAM with positive-feedback match-line sense amplifiers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 3, pp. 566–573, Mar. 2009.
[9] S. Pontarelli and M. Ottavi, “Error detection and correction in content addressable memories by using Bloom filters,” IEEE Trans. Comput., vol. 62, no. 6, pp. 1111–1126, Jun. 2013.
[10] A.-T. Do, S. Chen, Z.-H. Kong, and K. S. Yeo, “A high speed low power CAM with a parity bit and power-gated ML sensing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 151–156, Jan. 2013.
[11] K. Pagiamtzis and A. Sheikholeslami, “A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme,” IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.
[12] J.-W. Zhang, Y.-Z. Ye, and B.-D. Liu, “A current-recycling technique for shadow-match-line sensing in content-addressable memories,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 6, pp. 677–682, Jun. 2008.
[13] N. Onizawa, S. Matsunaga, V. C. Gaudet, W. J. Gross, and T. Hanyu, “High-throughput low-energy self-timed CAM based on reordered overlapped search mechanism,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 3, pp. 865–876, Mar. 2014.
[14] C.-S. Lin, J.-C. Chang, and B.-D. Liu, “A low-power precomputation-based fully parallel content-addressable memory,” IEEE J. Solid-State Circuits, vol. 38, no. 4, pp. 654–662, Apr. 2003.
[15] Z. Ullah, K. Ilgon, and S. Baeg, “Hybrid partitioned SRAM-based ternary content addressable memory,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 12, pp. 2969–2979, Dec. 2012.
[16] I. Arsovski, T. Hebig, D. Dobson, and R. Wistort, “A 32 nm 0.58-fJ/bit/search 1-GHz ternary content addressable memory compiler using silicon-aware early-predict late-correct sensing with embedded deep-trench capacitor noise mitigation,” IEEE J. Solid-State Circuits, vol. 48, no. 4, pp. 932–939, Apr. 2013.
[17] A. Wiltgen, Jr., K. A. Escobar, A. I. Reis, and R. P. Ribas, “Power consumption analysis in static CMOS gates,” in Proc. 26th Symp. Integr. Circuits Syst. Design (SBCCI), Sep. 2013, pp. 1–6.
[18] N. S. Kim et al., “Leakage current: Moore’s law meets static power,” Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.
[19] A. T. Do, C. Yin, K. Velayudhan, Z. C. Lee, K. S. Yeo, and T. T.-H. Kim, “0.77 fJ/bit/search content addressable memory using small match line swing and automated background checking scheme for variation tolerance,” IEEE J. Solid-State Circuits, vol. 49, no. 7, pp. 1487–1498, Jul. 2014.
[20] B.-D. Yang, Y.-K. Lee, S.-W. Sung, J.-J. Min, J.-M. Oh, and H.-J. Kang, “A low power content addressable memory using low swing search lines,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 12, pp. 2849–2858, Dec. 2011.
[21] S.-H. Yang, Y.-J. Huang, and J.-F. Li, “A low-power ternary content addressable memory with Pai-Sigma matchlines,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1909–1913, Oct. 2012.
[22] Y.-J. Chang, K.-L. Tsai, and H.-J. Tsai, “Low leakage TCAM for IP lookup using two-side self-gating,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, pp. 1478–1486, Jun. 2013.
[23] S.-J. Ruan, C.-Y. Wu, and J.-Y. Hsieh, “Low power design of precomputation-based content-addressable memory,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 3, pp. 331–335, Mar. 2008.
[24] H. Jarollahi et al., “A nonvolatile associative memory-based context-driven search engine using 90 nm CMOS/MTJ-hybrid logic-in-memory architecture,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 4, no. 4, pp. 460–474, Dec. 2014.

[25] S. Baeg, “Low-power ternary content-addressable memory design using a segmented match line,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1485–1494, Jul. 2008.
[26] Y.-J. Chang, “Using the dynamic power source technique to reduce TCAM leakage power,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 11, pp. 888–892, Nov. 2010.
[27] M. J. Akhbarizadeh, M. Nourani, and C. D. Cantrell, “Prefix segregation scheme for a TCAM-based IP forwarding engine,” IEEE Micro, vol. 25, no. 4, pp. 48–63, Jul./Aug. 2005.
[28] M. Chae, J.-W. Lee, and S. H. Hong, “Decoupled 4T dynamic CAM suitable for high density storage,” Electron. Lett., vol. 47, no. 7, pp. 434–436, Mar. 2011.
[29] D. Kayal, A. Dandapat, and C. K. Sarkar, “Design of a high performance memory using a novel architecture of double bit CAM and SRAM,” Int. J. Electron., vol. 99, no. 12, pp. 1691–1702, Jun. 2012.
[30] K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (CAM) circuits and architectures: A tutorial and survey,” IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[31] B. S. Nataraj, S. Khanna, and V. Srinivasan, “Ternary content addressable memory cell,” U.S. Patent 6 154 384, Nov. 28, 2000.
[32] S. Hanzawa, T. Sakata, K. Kajigaya, R. Takemura, and T. Kawahara, “A large-scale and low-power CAM architecture featuring a one-hot-spot block code for IP-address lookup in a network router,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 853–861, Apr. 2005.

Sandeep Mishra (M’14) received the B.Tech. and M.Tech. degrees in electronics and communication engineering from the Biju Patnaik University of Technology, Rourkela, India, in 2011 and 2013, respectively. He is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong, India.
His current research interests include low-power memory design, high-speed sense amplifier, and intelligent transportation system.

Anup Dandapat (M’10–SM’15) received the Ph.D. degree in digital VLSI design from Jadavpur University, Kolkata, India, in 2008.
He is currently an Associate Professor with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong, India. He has authored over 50 national and international journal papers. His current research interests include low-power VLSI design, low-power memory design, and low-power digital design.
