IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016
Abstract: A ternary content addressable memory (TCAM) speeds up the search process in the memory by searching through prestored contents rather than addresses. The additional don’t care (X) state makes the TCAM suitable for many network applications, but the large number of cells required for storage consumes high power and takes a large design area. This paper presents a novel architecture of TCAM, which prestores 2 bits of data in an up–down manner and provides multiple masking operations through a single control multimasking circuit. The proposed dual bit associative memory with match error and mask control (EMDBAM) consumes low power and selects the valid value on the matchline through a match error controller. The proposed design has been implemented using a standard 45-nm CMOS technology, and the extracted layout has been simulated using SPECTRE with the supply voltage at 1 V. The proposed EMDBAM can reduce the cell area by 39% compared with a basic TCAM design with a reduction of 9.6% in the energy-delay product.
Index Terms: Content addressable memory (CAM), dual bit associative memory (DBAM), energy-efficient memory designs, low-power design, ternary CAM (TCAM).
I. INTRODUCTION
Both architectures take a significantly large design area, as one storage cell has been designed for storing each bit of data along with a matching circuit and an address encoder at the final stage [6]–[8]. Thus, efficient low-power techniques must be employed in designing a CAM.
An external Bloom filter (BF) has been used to avoid pseudohit and miss events without modifying the CAM architecture [9]. However, this technique suffers from frequency mismatch between the CAM and the BF. In [10], two metal rails, VDDML and VDD, have been introduced to power up the data storage and mask storage cells. A feedback loop automatically turns the matchline (ML) current OFF. This technique results in reduced average power consumption, as the output is obtained after the power-gated transistor has turned OFF, but the leakage power consumption is significant. In [11], pipelined MLs have been used to activate a small segment of the ML. Zhang et al. [12] have recycled the current of a voltage detector to charge up the ML for reducing energy. In [13], an overlapped search method has been presented for BCAM bettering the pipelined structure for low energy
Fig. 1. (a) Simplified functional block of the conventional TCAM. (b) High-level architecture of the proposed EMDBAM.
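As a behavioral illustration of the ternary search that Fig. 1(a) summarizes, the following Python sketch compares a search key against stored words in which X marks a don't care bit. It is only a software model, not the circuit of this paper: the string encoding of stored words, the function names `ternary_match` and `tcam_search`, and the lowest-address priority rule are assumptions made for this example.

```python
from typing import List, Optional


def ternary_match(stored: str, key: str) -> bool:
    """Return True if `key` matches `stored`; an 'X' in `stored` matches either bit."""
    if len(stored) != len(key):
        raise ValueError("stored word and search key must have the same bit length")
    return all(s == "X" or s == k for s, k in zip(stored, key))


def tcam_search(table: List[str], key: str) -> Optional[int]:
    """Compare `key` with every stored word and return the lowest matching address.

    Returning the lowest address is an assumed priority rule for this sketch.
    """
    for address, word in enumerate(table):
        if ternary_match(word, key):
            return address
    return None  # no stored word matched


# Hypothetical 4-bit table: address 1 masks its two low bits, address 2 its middle bits.
table = ["1010", "10XX", "0XX1"]
print(tcam_search(table, "1011"))  # 1
print(tcam_search(table, "0001"))  # 2
print(tcam_search(table, "1111"))  # None
```

In hardware all words are compared in parallel on their matchlines; the loop here only stands in for that parallel comparison followed by address encoding.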
have been proposed based on the reduction of power consumption due to high capacitive searchlines (SLs) and MLs. In the first technique, data bits have been separated and stored in NAND and NOR blocks [20]. The second design uses a pie-sigma ML, where NAND and NOR cells have been realized by pie and sigma segments, respectively [21]. To avoid short-circuit current, an interfacing logic has been used between the pie and sigma segments.
Chang et al. [22] have segmented TCAM cells based on the mask bit values. The mask bits with only 1 values have been separated from those having only 0 values. Except for the boundary cells, all other cells in different segments have been self-gated. Ruan et al. [23] have partitioned the input bit stream into several groups; among these, the output has been derived with the use of a block XOR approach. A significant reduction in power consumption can be achieved, but it completely depends on the matching probability. The precomputation circuitry also takes a large design area. In [24], a logic-in-memory structure has been implemented to reduce the memory access time. A magnetic tunnel junction/CMOS hybrid structure has been integrated with it for reducing leakage power consumption.
The TCAM word MLs have been separated into four segments in [25]. The first segment has been precharged, and the rest have been charge-shared. Here, a mismatch in one segment does not drain the ML charge in the other segments. By using the dynamic power source technique, a mask data value has been used to destroy the prestored data [26]. In [27], the scalability of TCAM has been improved with the exclusion of the priority encoder. Using these architectures, leakage power consumption can be reduced significantly. In [29], single-bit CAM cells have been arranged in the up–down approach (one with a stored value of 1 and the other with 0). As a search bit contains only 0s or 1s, either the upper or the lower cell provides a match condition. A priority detector at the final stage ensures correct match output when a perfect match does not occur in both CAM cells. However, this does not give the functionality of a TCAM, which is an essential requirement in many network applications.
A state-of-the-art TCAM architecture has been presented in this paper that features the dual bit architecture of CAM, single control multimasking, and match error control. The dual bit associative memory (DBAM) with match error and mask control (EMDBAM) suits all the network and image processing applications with extremely low-power requirements. Architectures with a basic TCAM structure require W × B CAM storage cells for W words of B-bit length [30]. However, the proposed EMDBAM requires only 2 × B CAM storage cells. The MLs of the upper and lower CAMs need to be encoded, which takes W extra blocks, so the overall EMDBAM requires 2 × B + W cells compared with W × B cells for a basic TCAM [30]–[32].
The rest of this paper is organized as follows. Section II describes the DBAM with a single control multimasking circuit (SCMMC), which provides a way of avoiding the individual cell masking task. In Section III, we introduce the match error controller (MEC) that ensures valid values on the MLs. Section IV presents the ML selection, modified charge-shared ML sense amplifier, and address encoder system. Section V presents the measurement results. Finally, the conclusion is drawn in Section VI.
II. DUAL BIT ASSOCIATIVE MEMORY WITH SINGLE CONTROL MULTIMASKING CIRCUIT
All the nonsegmented architectures [10]–[16] have suffered from high leakage power consumption. The segmented architectures [19]–[28] have resolved this issue, but the storage cell count remains the same. The conventional fully parallel TCAM presented in Fig. 1(a) consists of a data storage cell, a mask storage cell, and an evaluation logic that increases the physical size and interconnection wires [data wordline (WL) and mask WL].
The main motivation behind the dual bit structure is to compare input search data (tag) with compressed storage data. For this purpose, instead of implementing all W word storage cells, only two words of search data bit length B have been designed with a common mask storage for each bit. The DBAM has been designed using two CAMs
2144 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 24, NO. 6, JUNE 2016
Fig. 2. (a) Conventional NAND-type TCAM. (b) DBAM comprising 10T upper and lower CAMs with mask storage. (c) Comparison circuit. (d) SCMMC.
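The storage-cell counts quoted in the Introduction (W × B cells for a basic TCAM versus 2 × B + W for the EMDBAM) can be compared numerically with the short Python sketch below. The function names and the example sizes are hypothetical, chosen only for illustration; the result is a cell count, not an estimate of layout area.

```python
def basic_tcam_cells(words: int, bits: int) -> int:
    """Cell count W x B quoted in the text for a basic TCAM."""
    if words < 1 or bits < 1:
        raise ValueError("words and bits must be positive")
    return words * bits


def emdbam_cells(words: int, bits: int) -> int:
    """Cell count 2 x B + W quoted in the text for the EMDBAM
    (two B-bit CAM words plus W encoding blocks)."""
    if words < 1 or bits < 1:
        raise ValueError("words and bits must be positive")
    return 2 * bits + words


# Hypothetical array sizes (W words, B bits), not taken from the paper.
for words, bits in [(2, 2), (16, 8), (256, 32)]:
    print(words, bits, basic_tcam_cells(words, bits), emdbam_cells(words, bits))
```

For the smallest size in the loop the 2 × B + W count is actually the larger of the two, so under these formulas the advantage appears only as W and B grow.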
Fig. 3. (a) MEC includes three fundamental blocks: error checking circuit (ECR), MBC, and PS. U1 and U2 are unprocessed MLs of the previous slot.
(b) Two stage MLS with modified charge-shared ML sense amplifier.
TABLE II
ERROR CHECKING CIRCUIT (ECR) FUNCTION TABLE (C1 AND C0: MASK CONTROL BITS; SL AND SL: SEARCHLINE; UML1 AND UML2: UNPROCESSED MLs; ER: OUTPUT OF ERROR CHECKING CIRCUIT)
Fig. 5. Extracted layout of the proposed EMDBAM for a 2-bit SL (α: considering acceptable ML swing; β: considering full ML swing; A0–A3: address output).
TABLE III
POWER, ENERGY DISSIPATION, AND ML DELAY ANALYSIS OF THE PROPOSED EMDBAM FOR VARIOUS VALUES OF MASK CONTROL BITS (C11–C04: MASK CONTROL BITS)
Fig. 6. Power consumption analysis of various TCAM architectures for varying temperature from −20 °C to 100 °C and VDD of 1 V. (a) Static power
analysis. (b) Average power analysis. (c) Peak power analysis.
TABLE IV
ENERGY DISSIPATION, ML DELAY, AND EDP COMPARISON OF VARIOUS TCAM ARCHITECTURES FOR VARYING TEMPERATURE FROM −20 °C TO 100 °C AND VDD OF 1 V
Fig. 8. (a) Energy dissipation analysis of various TCAM architectures for varying temperature from −20 °C to 100 °C and VDD of 1 V. (b) Energy, delay,
and EDP analysis of TCAM architectures at 27 °C and VDD of 1 V. (c) EDP comparison with and without using SCMMC.
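The energy-delay product (EDP) used in these comparisons can be illustrated with a small Python sketch. It assumes that the EDP is simply the product of the energy per bit per search and the ML delay, which is the usual definition but is not spelled out in the text available here. The 0.84 fJ/bit/search and 0.75 ns values are the ones reported in the Conclusion; the function names and the baseline numbers in the last example are hypothetical.

```python
def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """EDP in joule-seconds, assuming EDP = energy x delay."""
    return energy_j * delay_s


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage reduction of `proposed` with respect to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Values reported in the Conclusion for the proposed design at 1 V.
energy = 0.84e-15  # 0.84 fJ per bit per search
delay = 0.75e-9    # 0.75 ns ML delay
edp = energy_delay_product(energy, delay)
print(f"{edp:.3e} J*s")  # 6.300e-25 J*s

# Hypothetical baseline, only to show how a percentage reduction is obtained.
print(relative_reduction(100.0, 90.0))  # 10.0
```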
TABLE V
LOW-VOLTAGE ANALYSIS OF THE PROPOSED EMDBAM AT VARIOUS PROCESS CORNERS (TT: TYPICAL CORNER; FS: FAST nMOS AND SLOW pMOS; SF: SLOW nMOS AND FAST pMOS; SS: SLOW CORNER; FF: FAST CORNER)
Fig. 9. Low-voltage analysis of TCAM architectures at 27 °C. (a) EDP comparison at 0.7 V. (b) EDP analysis for supply voltage scaling from 1.2 to 0.6 V.
(c) Normalized low-voltage analysis of the proposed EMDBAM.
TABLE VII
COMPARISON SUMMARY OF VARIOUS TCAM ARCHITECTURES
TABLE VIII
PERFORMANCE COMPARISON SUMMARY OF VARIOUS TCAM ARCHITECTURES FOR A 4-bit SL AT 27 °C AND SUPPLY VOLTAGE OF 1 V
TABLE IX
PREDICTIVE MODELS OF GENERIC PROCESS DESIGN KIT (GPDK)
use of two different power rails with a gated-power transistor in [10] and high-k metal gate in [16] increase the design complexity.
A summary of various power consumptions, energy dissipation, and ML delay analysis with compared architectures is presented in Table VIII. The proposed design consumes the least power among the compared designs with an average reduction of 7.37% in the EDP.
VI. CONCLUSION
A novel architecture of TCAM has been presented that consumes the least power among the compared architectures. The dual bit structure, which occupies a small design area, is suitable for a high density design with no performance degradation. The proposed EMDBAM can be used in low-power applications where more control over the match error is required. The SCMMC reduces the individual cell masking task by using mask control bits. Match errors have been controlled by the MEC to pass valid values on the MLs. The proposed design dissipates 0.84 fJ/bit/search with a 0.75-ns delay at 1 V. The results show that the proposed EMDBAM is adaptable to supply voltage scaling down to 0.6 V while providing a reduction of 9.68% in the EDP from the basic TCAM architecture.
APPENDIX
The predictive models of GPDK have been shown in Table IX.
REFERENCES
[1] Z. Cai, Z. Wang, K. Zheng, and J. Cao, “A distributed TCAM coprocessor architecture for integrated longest prefix matching, policy filtering, and content filtering,” IEEE Trans. Comput., vol. 62, no. 3, pp. 417–427, Mar. 2013.
[2] K. Zheng, C. Hu, H. Lu, and B. Liu, “A TCAM-based distributed parallel IP lookup scheme and performance analysis,” IEEE/ACM Trans. Netw., vol. 14, no. 4, pp. 863–875, Aug. 2006.
[3] I. Hayashi et al., “A 250-MHz 18-Mb full ternary CAM with low-voltage matchline sensing scheme in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2671–2680, Nov. 2013.
[4] C.-C. Wang, C.-H. Hsu, C.-C. Huang, and J.-H. Wu, “A self-disabled sensing technique for content-addressable memories,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 1, pp. 31–35, Jan. 2010.
[5] H.-Y. Li, C.-C. Chen, J.-S. Wang, and C. Yeh, “An AND-type match-line scheme for high-performance energy-efficient content addressable memories,” IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1108–1119, May 2006.
[6] D. Pao, P. Zhou, B. Liu, and X. Zhang, “Enhanced prefix inclusion coding filter-encoding algorithm for packet classification with ternary content addressable memory,” IET Comput. Digit. Techn., vol. 1, no. 5, pp. 572–580, Sep. 2007.
[7] Y.-D. Kim, H.-S. Ahn, S. Kim, and D.-K. Jeong, “A high-speed range-matching TCAM for storage-efficient packet classification,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 6, pp. 1221–1230, Jun. 2009.
[8] N. Mohan, W. Fung, D. Wright, and M. Sachdev, “A low-power ternary CAM with positive-feedback match-line sense amplifiers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 3, pp. 566–573, Mar. 2009.
[9] S. Pontarelli and M. Ottavi, “Error detection and correction in content addressable memories by using Bloom filters,” IEEE Trans. Comput., vol. 62, no. 6, pp. 1111–1126, Jun. 2013.
[10] A.-T. Do, S. Chen, Z.-H. Kong, and K. S. Yeo, “A high speed low power CAM with a parity bit and power-gated ML sensing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 151–156, Jan. 2013.
[11] K. Pagiamtzis and A. Sheikholeslami, “A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme,” IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.
[12] J.-W. Zhang, Y.-Z. Ye, and B.-D. Liu, “A current-recycling technique for shadow-match-line sensing in content-addressable memories,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 6, pp. 677–682, Jun. 2008.
[13] N. Onizawa, S. Matsunaga, V. C. Gaudet, W. J. Gross, and T. Hanyu, “High-throughput low-energy self-timed CAM based on reordered overlapped search mechanism,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 3, pp. 865–876, Mar. 2014.
[14] C.-S. Lin, J.-C. Chang, and B.-D. Liu, “A low-power precomputation-based fully parallel content-addressable memory,” IEEE J. Solid-State Circuits, vol. 38, no. 4, pp. 654–662, Apr. 2003.
[15] Z. Ullah, K. Ilgon, and S. Baeg, “Hybrid partitioned SRAM-based ternary content addressable memory,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 12, pp. 2969–2979, Dec. 2012.
[16] I. Arsovski, T. Hebig, D. Dobson, and R. Wistort, “A 32 nm 0.58-fJ/bit/search 1-GHz ternary content addressable memory compiler using silicon-aware early-predict late-correct sensing with embedded deep-trench capacitor noise mitigation,” IEEE J. Solid-State Circuits, vol. 48, no. 4, pp. 932–939, Apr. 2013.
[17] A. Wiltgen, Jr., K. A. Escobar, A. I. Reis, and R. P. Ribas, “Power consumption analysis in static CMOS gates,” in Proc. 26th Symp. Integr. Circuits Syst. Design (SBCCI), Sep. 2013, pp. 1–6.
[18] N. S. Kim et al., “Leakage current: Moore’s law meets static power,” Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.
[19] A. T. Do, C. Yin, K. Velayudhan, Z. C. Lee, K. S. Yeo, and T. T.-H. Kim, “0.77 fJ/bit/search content addressable memory using small match line swing and automated background checking scheme for variation tolerance,” IEEE J. Solid-State Circuits, vol. 49, no. 7, pp. 1487–1498, Jul. 2014.
[20] B.-D. Yang, Y.-K. Lee, S.-W. Sung, J.-J. Min, J.-M. Oh, and H.-J. Kang, “A low power content addressable memory using low swing search lines,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 12, pp. 2849–2858, Dec. 2011.
[21] S.-H. Yang, Y.-J. Huang, and J.-F. Li, “A low-power ternary content addressable memory with Pai-Sigma matchlines,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1909–1913, Oct. 2012.
[22] Y.-J. Chang, K.-L. Tsai, and H.-J. Tsai, “Low leakage TCAM for IP lookup using two-side self-gating,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, pp. 1478–1486, Jun. 2013.
[23] S.-J. Ruan, C.-Y. Wu, and J.-Y. Hsieh, “Low power design of precomputation-based content-addressable memory,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 3, pp. 331–335, Mar. 2008.
[24] H. Jarollahi et al., “A nonvolatile associative memory-based context-driven search engine using 90 nm CMOS/MTJ-hybrid logic-in-memory architecture,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 4, no. 4, pp. 460–474, Dec. 2014.
MISHRA AND DANDAPAT: LOW-POWER DBAM WITH MATCH ERROR AND MASK CONTROL 2151
[25] S. Baeg, “Low-power ternary content-addressable memory design using a segmented match line,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1485–1494, Jul. 2008.
[26] Y.-J. Chang, “Using the dynamic power source technique to reduce TCAM leakage power,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 11, pp. 888–892, Nov. 2010.
[27] M. J. Akhbarizadeh, M. Nourani, and C. D. Cantrell, “Prefix segregation scheme for a TCAM-based IP forwarding engine,” IEEE Micro, vol. 25, no. 4, pp. 48–63, Jul./Aug. 2005.
[28] M. Chae, J.-W. Lee, and S. H. Hong, “Decoupled 4T dynamic CAM suitable for high density storage,” Electron. Lett., vol. 47, no. 7, pp. 434–436, Mar. 2011.
[29] D. Kayal, A. Dandapat, and C. K. Sarkar, “Design of a high performance memory using a novel architecture of double bit CAM and SRAM,” Int. J. Electron., vol. 99, no. 12, pp. 1691–1702, Jun. 2012.
[30] K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (CAM) circuits and architectures: A tutorial and survey,” IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[31] B. S. Nataraj, S. Khanna, and V. Srinivasan, “Ternary content addressable memory cell,” U.S. Patent 6 154 384, Nov. 28, 2000.
[32] S. Hanzawa, T. Sakata, K. Kajigaya, R. Takemura, and T. Kawahara, “A large-scale and low-power CAM architecture featuring a one-hot-spot block code for IP-address lookup in a network router,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 853–861, Apr. 2005.
Sandeep Mishra (M’14) received the B.Tech. and M.Tech. degrees in electronics and communication engineering from the Biju Patnaik University of Technology, Rourkela, India, in 2011 and 2013, respectively. He is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong, India.
His current research interests include low-power memory design, high-speed sense amplifier, and intelligent transportation system.
Anup Dandapat (M’10–SM’15) received the Ph.D. degree in digital VLSI design from Jadavpur University, Kolkata, India, in 2008.
He is currently an Associate Professor with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong, India. He has authored over 50 national and international journal papers. His current research interests include low-power VLSI design, low-power memory design, and low-power digital design.