Low Power Design

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 48, NO. 2, FEBRUARY 2013 573
An Ultra-Low Power Asynchronous-Logic In-Situ Self-Adaptive V_DD System for Wireless Sensor Networks
Tong Lin, Kwen-Siong Chong, Member, IEEE, Joseph S. Chang, and Bah-Hwee Gwee, Senior Member, IEEE
Abstract: We propose a Sub-threshold (Sub-V_t) Self-Adaptive V_DD Scaling (SSAVS) system for a Wireless Sensor Network with the objective of lowest possible power dissipation for the prevailing throughput and circuit conditions, yet high robustness and with minimal overheads. The effort to achieve the lowest possible power operation is by means of adjusting V_DD to the minimum voltage (within 50 mV) for said conditions. High robustness is achieved by adopting the Quasi-Delay-Insensitive (QDI) asynchronous-logic protocols where the circuits therein are self-timed, and by the embodiment of our proposed Pre-Charged-Static-Logic (PCSL) design approach; when compared against competing approaches, the PCSL is most competitive in terms of energy/operation, delay and IC area. By exploiting the already existing request and acknowledge signals of the QDI protocols, the ensuing overhead of the SSAVS is very modest. The filter bank embodied in the SSAVS is shown to be ultra-low power and highly robust. When benchmarked against the competing conventional Dynamic-Voltage-Frequency-Scaling (DVFS) synchronous-logic counterpart, no one system is particularly advantageous when the operating conditions are known. However, when the competing DVFS system is designed for the worst-case condition, the proposed SSAVS system is somewhat more competitive, including uninterrupted operation while its V_DD self-adjusts to the varying conditions.

Index Terms: Adaptive V_DD scaling, asynchronous-logic circuits, quasi-delay-insensitive circuits, sub-threshold operation, ultra-low power operation, wireless sensor networks.

Manuscript received May 02, 2012; revised September 18, 2012; accepted September 24, 2012. Date of publication December 03, 2012; date of current version January 24, 2013. This paper was approved by Associate Editor Stefan Rusu. This work was supported in part by research grants provided by the Defense Advanced Research Projects Agency (DARPA), USA, and by the Ministry of Education, Singapore.
T. Lin is with the Temasek Laboratories, Nanyang Technological University, Singapore 639798. He is also with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: lintong@ntu.edu.sg).
K.-S. Chong is with the Temasek Laboratories, Nanyang Technological University, Singapore 639798 (e-mail: kschong@ntu.edu.sg).
J. S. Chang is with the Nanyang Technological University, School of Electrical and Electronic Engineering, Division of Circuits and Systems, Singapore 639798 (e-mail: ejschang@ntu.edu.sg).
B.-H. Gwee is with the Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore 639798 (e-mail: ebhgwee@ntu.edu.sg).
Digital Object Identifier 10.1109/JSSC.2012.2223971
0018-9200/$31.00 © 2012 IEEE

I. INTRODUCTION

WIRELESS SENSOR NETWORKS (WSNs) are increasingly ubiquitous, in part, due to their ultra-low power and high reliability operation. Fig. 1 depicts the WSN node of interest, comprising five main modules: Sensor Front-End, Signal Processor, Wireless Transceiver, Energy Source, and Power Management. As the WSN is typically designed for a multiple-year operational life-span [1], power is carefully budgeted and, where pertinent, modules are energized only when required, such that the overall average power is typically 10 to 100 µW [2].

In our WSN depicted in Fig. 1, the overall active/passive operation ratio is approximately 20/80. In the passive mode, only the Sensor Front-End module is continuously energized. The Sensor and the Conditioning Circuits therein are powered by a Lithium/Carbon Fluoride battery (2.8 V) via a Low-Dropout (LDO) Regulator. The Simple Processor is powered by a 1.2 V rail via a power-efficient Buck DC-DC Converter. The battery is appropriate largely because of its high energy density per weight and very wide operating temperature range (up to 160 °C), congruent with that required of our WSN [3]. The Simple Processor ascertains if the input is possibly useful, and if it is, the WSN goes into active mode where it signals the Power Management module to energize the Signal Processor module via its supply rail, V_DD. The voltage of V_DD, typically in the sub-threshold voltage (sub-V_t) range, is self-adjusted such that the lowest possible voltage is used, to enable ultra-low power operation. The Signal Processor module buffers (via a FIFO) the output of the Simple Processor and filters the output signal before final computation by the Microcontroller Unit (MCU). When the MCU ascertains that the filtered signal is useful, the Wireless Transceiver is energized and the processed signal is subsequently transmitted wirelessly. With the wireless transmission expected to be 0.01% active and with a 20/80 WSN active/passive operation, 50% of the overall power is attributed to the Signal Processor module, which is of interest in terms of power dissipation.

The approaches taken to minimize power involve all levels of the design space, including algorithmic design and the hardware level. In the former, the filtering in the Signal Processor module embodies the Frequency Response Masking (FRM) technique [4]. This involves the Interpolated Finite Impulse Response (IFIR) Filter and the FRM Filter Bank (FB), and is computationally more efficient than the usual FIR and IIR filter approaches. Ultra-low power design techniques in the latter are extensively reported in literature [5]–[15] and of these, operation in the sub-V_t region is one of the most effective. This is particularly applicable here because the speed of the digital circuits in the Signal Processor is modest: the clocking speed

Fig. 1. Block diagram of the WSN node.
ranges from 1.4 kHz to 1.4 MHz for a sampling rate range from 0.1 kSamples/s (kS/s) to 100 kS/s.

Despite the potential advantages of sub-V_t operation, this region of operation is challenging here for several reasons. First, the WSN is designed to work in a wide range of conditions, including extreme environments, somewhat similar to [14]. Second, Process, Voltage and Temperature (PVT) variations for fine-dimensioned CMOS processes increase dramatically in sub-V_t operation, and the ensuing delay variations are very severe, possibly intractable [9]. Typically, a very large delay safety margin (for synchronous-logic (sync) circuits) would need to be allowed for, for example as in [14]. Third, the input signal to the Signal Processor module is variable. From a robust operation perspective, the circuits would need to be designed to meet the worst-case conditions: the fastest input rate and extreme temperatures.

To design the WSN for ultra-low power operation, we adopt a V_DD self-adjusting approach whilst operating in the sub-V_t region, termed Sub-threshold Self-Adaptive V_DD Scaling (SSAVS), where V_DD is in-situ dynamically self-adjusted. The modus operandi involves dialing up V_DD when the need for computation increases or when the operating conditions are less favorable, and dialing it down when the conditions are the converse. Put simply, the lowest V_DD is used where possible because, in general, the lower the V_DD, the lower the power dissipation due to dynamic and leakage currents.

In this paper, we describe an SSAVS system for the Signal Processor module in a WSN based on a proposed methodology within the Quasi-Delay-Insensitive (QDI) asynchronous-logic (async) approach [6], [12], [14], [16], and with a novel in-situ V_DD self-adjusting means. The proposed design methodology, coined Pre-Charged Static-Logic (PCSL) [17], is essentially a static-logic library cell architecture that exploits the fast reset feature and is appropriate for full-range Dynamic Voltage Scaling (DVS) [18], for V_DD ranging from nominal voltage to deep sub-V_t. The proposed SSAVS system for the WSN is demonstrated by means of application to the FRM FB. The novel V_DD self-adjustment is obtained very simply: by exploiting (and comparing) the existing Request and Acknowledge signals of the QDI protocol signaling, and thereafter adjusting the V_DD accordingly (see Section III later). The ensuing overhead is hence very low.

This paper is organized as follows. Section II reviews adaptive V_DD scaling systems. Section III presents the design of the proposed system. Section IV presents the measurement results of prototype ICs and benchmarking thereof. Finally, conclusions are drawn in Section V.

II. ADAPTIVE V_DD SCALING SYSTEMS

The general modality of adaptive V_DD scaling systems to reduce power is to adaptively adjust V_DD as low as possible (with appropriate timing margin) to meet the throughput requirement for the prevailing operating conditions (including PVT variations). This largely requires the pertinent circuit delay variations to be tracked, observed, or inferred.

A reported delay tracking technique is based on a Look-Up Table [19], [20] comprising tabulated pre-characterized throughput versus V_DD data according to critical path circuit delay(s) under worst-case PVT conditions for the given throughput. To avoid excessive timing margins, Statistical Static Timing Analysis [19] may be employed, mostly to account for local (within-die) variations. Another reported technique [21] attempts to track real-time variations by adding PVT sensors. However, in sub-V_t operation, because of the exponential relationship of sub-V_t delay with PVT, even small errors in these sensor readings could lead to large circuit delay uncertainties, and the overheads associated with the sensors may defeat any advantage. The reported critical path delay matching [22]–[26] involves a ring oscillator matched to the critical path delay to set the clock frequency, and V_DD is subsequently adjusted. For improved matching, the entire logic of the critical path may be replicated at high hardware cost [24]. Although this may be able to mitigate the delay uncertainty issues associated with global PVT variations, it may not comprehensively account
Fig. 2. Overall structure of the proposed SSAVS system with an async QDI FRM Filter Bank (FB); the variable V_DD rail ranges from 150 mV to 400 mV.
for local variations, particularly in sub-V_t operation. Another reported technique employs timing error detection/correction [27]–[30], where V_DD is reduced until the ensuing computation is erroneous. V_DD is thereafter increased and the computation repeated. The applicability of this technique is arguably limited due to the severe/intractable PVT variations in sub-V_t operation, to possibly severe meta-stability issues due to the lack of timing margin, and to the need for re-computations. Another reported technique [31], [32] attempts to ascertain the circuit delay indirectly by measuring the variations in the supply current drawn to infer the duration of the computation, with V_DD subsequently adjusted. This technique is likely to be ambiguous in sub-V_t operation, where the ratio of the current during computation to idle is small.

On the basis of the aforesaid review, it can be argued that these reported tracked, observed and inferred techniques are inadequate in terms of robustness, particularly in sub-V_t operation. Further, the hardware/computation overheads are considerable, including the need to scale V_DD with the scaling of the clock frequency, i.e. Dynamic Voltage Frequency Scaling (DVFS).

We instead propose a definitive means by directly measuring the delay and comparing it against the throughput for the prevailing conditions, and V_DD is thereafter adjusted accordingly. To enable this, we adopt the self-timed async QDI approach (vis-à-vis the conventional sync) where its dual-rail encoding includes the Request signal, which indicates that the input sample is ready, and the Acknowledge signal, which indicates the completion of the computation. By counting the Request signals against the Acknowledge signals within a given period, we ascertain if the delay of the circuit is excessive, or otherwise, with respect to the throughput for the prevailing conditions. V_DD is thereafter adjusted accordingly such that the delay is just slightly less than the delay between input samples, thereby satisfying the throughput. Further, as is inherent in QDI async protocols, the computation is uninterrupted while V_DD is transitioning during its self-adjustment; in reported adaptive V_DD scaling systems, circuit operation typically ceases when V_DD is transitioning [20]. Of specific interest, note that the delay is definitive because the delay is that ascertained for the prevailing operating conditions, and we will show later that the associated hardware to adjust V_DD is very modest.

At this juncture, to the best of our knowledge, ultra-low power QDI circuits with self-adaptive V_DD, operating in the sub-V_t region and in extreme environments (hence requiring extremely high reliability), have yet to be reported or demonstrated. Further, it would be interesting to compare their attributes, including IC area, delay, energy/operation and power dissipation, against their conventional sync DVFS counterpart and under various conditions (see Section IV later).

III. SYSTEM DESIGN

Fig. 2 depicts the proposed SSAVS system within the Power Management module embodying the SSAVS Controller and its associated adjustable V_DD means (a Buck DC-DC Converter), and the PCSL-based 8 × 8-Bit Quad-Channel Async QDI FRM
Fig. 3. An example of the variation of V_DD with time. The logical numbers on the ordinate are the 5-bit voltage codes and their corresponding DC voltages.
TABLE I
OPERATION OF THE SSAVS CONTROLLER
FB within the FRM FB. There are two voltage rails in the overall proposed SSAVS system: a fixed rail and a variable rail whose sub-V_t voltage typically ranges from 150 mV to 400 mV. For ease of illustration, the specific rail is shown in parentheses for the supply rails and for the signals of the various modules. In Fig. 2, the voltage of the input data and Request signals is first adjusted from the fixed rail to the variable rail by the Step-Down Level Converter, and the signals are thereafter buffered by the Async FIFO Buffer (depth of 50) before input to the async FRM FB. The FB outputs (channels 1 to 4) and their associated Acknowledge signal (combined from channels 1 to 4 via the Completion Detection Circuit) are output to the MCU for further processing. The Acknowledge signal is also fed back to the Async FIFO Buffer. The Request and Acknowledge signals are input to the Power Management module, and the Acknowledge signal is stepped up from the variable rail to the fixed rail. The SSAVS Controller within the Power Management module monitors the number of Request and Acknowledge signals in each update period (a 10 Hz clock generated by the Update Clock Generator for a target throughput of 1 kS/s). The controller output is a 5-bit code that sets one of 24 voltage levels (in the Buck DC-DC Converter) ranging from '00000' to '10111' (in 50 mV steps) for V_DD.

Fig. 3 graphically depicts an example of the self-adjustment of V_DD. When the WSN is first initiated, the SSAVS Controller outputs '10111', equivalently V_DD = 1.2 V, and the speed of the FB would far exceed the required computation. In this scenario, the number of FB Acknowledge clocks will be equal to the number of Request clocks in each update period. In the next period, the SSAVS Controller will subsequently decrement the code by one step to '10110' and correspondingly V_DD reduces by 50 mV to 1.15 V. The process continues where the code is continuously decremented, with the voltage of V_DD commensurably reduced. Eventually (see Fig. 3), the code is decremented to '00010', equivalently V_DD = 150 mV. This is the juncture where the speed of the FRM FB is just slightly slower than the data rate for the prevailing conditions; the number of Request clocks hence exceeds the number of Acknowledge clocks in one update period. Although the speed of the FRM FB is slightly too slow, no error occurs because the unconsumed inputs are stored in the Async FIFO Buffer (Fig. 2). In the next period, the SSAVS Controller reacts accordingly by incrementing the code by one step to '00011', and V_DD is correspondingly increased by 50 mV to 200 mV. With V_DD increased, the speed of the FRM FB now slightly exceeds the required computation and the unconsumed inputs stored in the FIFO buffer are in turn computed at a slightly faster rate than the data rate. Consequently, the number of Request clocks is now less than the number of Acknowledge clocks and at the end of this
Fig. 4. (a) Proposed Pre-Charged Static-Logic (PCSL) architecture, and six basic cells embodying the proposed PCSL dual-rail QDI realization approach:
(b) 2-input AND/NAND gate, (c) 2-input OR/NOR gate, (d) 3-input AO/AOI gate, (e) 3-input OA/OAI gate, (f) 2-input XOR/XNOR gate, and (g) 2-input MUX.
TABLE II
ENERGY-PER-OPERATION, DELAY AND IC AREA OF DUAL-RAIL LIBRARY CELLS EMBODYING VARIOUS APPROACHES (130 nm CMOS PROCESS)
period, all unconsumed inputs in the FIFO may have been cleared; if not, the voltage of V_DD remains (or is increased further) in the next time period(s). If cleared, in the next period the number of Request clocks again equals the number of Acknowledge clocks (as in the time periods preceding the decrement to '00010'). This is the same scenario where the FB, as a consequence of the slightly raised V_DD, is capable of computing faster than the data rate. In the period after that, the scenario is as in the period where the code was '00010', and the operation repeats accordingly. Table I summarizes the three operational conditions.

In short, the voltage of V_DD of the FB is in-situ adaptively self-adjusted to be as low as possible (within 50 mV) to meet the throughput for the prevailing operating conditions, and on average, the voltage of V_DD is slightly higher than the actual required minimum. Hence, the FB is ultra-low power and highly power-efficient. Note that the overheads for this self-adjusting V_DD are very modest (a counter) and the circuit operation is uninterrupted whilst V_DD transitions.

In view of the need for sub-V_t operation, it is imperative to adopt circuits based on the static-logic family to mitigate the effects of critical transistor sizing [9]; dynamic- and pass-logic families are inappropriate [18]. Fig. 4(a) depicts the basic architecture of our proposed async cells, coined Pre-Charged Static-Logic (PCSL) [17]. This basic architecture comprises an Inverting Static-Logic Cell, three transistors (for output pre-charging during the reset phase/evaluation during the computation phase), and two inverters (for output buffering). The two outputs are Output True and Output False. In PCSL
Fig. 5. Reported dual-rail AND/NAND circuit designs: (a) Delay-Insensitive-Minterm-Synthesis (DIMS), (b) NULL Convention Logic (NCL) with complex
gates (NCL1), and (c) NCL with fast-reset complex gates (NCL2).
Fig. 6. Block diagram of one channel of the 8 × 8-Bit Quad-Channel Async QDI FRM FB.
cells, when the Request signal is 0, both outputs are 0. On the other hand, when the Request signal is 1 (indicating that an operation is ready) and when the input signals are valid, the operation commences and an ensuing output is obtained. The architecture of the PCSL cell involves an integration of the subcircuit associated with the Request signal and a buffer (to each output) into the standard static-logic library cell (redesigned for dual-rail async), thereby sharing (common) transistors. This reduces the number of transistors, resulting in simultaneously lower power/energy dissipation, faster speed and smaller IC area (see Table II later). On the basis of this architecture, Figs. 4(b)–(g) depict the schematics of six basic PCSL cells (all with a 3-transistor limit in any stack).

To depict the hardware advantage of the proposed PCSL approach, the 2-input AND/NAND gate in Fig. 4(b) can be compared to the same gate realized by three reported static-logic QDI approaches in Figs. 5(a)–(c): (a) Delay-Insensitive-Minterm-Synthesis (DIMS) approach [33], (b) NULL Convention Logic (NCL) with complex gates [34] (denoted NCL1), and (c) NCL with fast-reset complex gates [35] (denoted NCL2). On the basis of simulations (130 nm CMOS), Table II benchmarks the energy/operation, delay and IC area of the aforesaid six basic cells of the various approaches. The competing cells are normalized to the PCSL cells, whose actual values are shown within parentheses. The average attributes are tabulated in the last row.

It is apparent from Table II that the cells embodying the proposed PCSL approach feature the lowest energy/operation, save the simple AND/NAND and OR/NOR gates of NCL1. On average, the energy/operation of cells embodying the reported DIMS, NCL1, and NCL2 approaches is significantly higher: 4.0×, 1.6×, and 1.9× respectively. It is also apparent that the cells embodying the proposed PCSL approach feature the shortest delay (the sum of two components, one for the computation phase and one for the reset phase, averaged over all input combinations), save the simple AND/NAND and OR/NOR gates of NCL1. On average, the reported DIMS, NCL1, and NCL2 cells are significantly slower: 4.1×, 1.8×, and 1.9× respectively. It is also apparent that the cells embodying the proposed PCSL approach require the smallest IC area; the layouts are based on the standard-cell approach where the cell height is fixed at 4 µm and the cell width is in multiples of 0.4 µm. On average, the IC area required for cells embodying the reported DIMS, NCL1, and NCL2 approaches is significantly larger: 4.7×, 2.6×, and 2.7× respectively; from a perspective of dual-rail async and (single-rail) sync circuits, the smaller IC area is worthwhile because the IC area overhead of the former is somewhat mitigated. In short, cells embodying the proposed PCSL approach simultaneously exhibit the lowest energy/operation, shortest delay and smallest IC area.

With the proposed PCSL QDI realization approach, an 8 × 8-Bit Quad-Channel Async QDI FRM FB is designed. A semi-custom design flow is adopted, where the front-end is designed using an assortment of in-house design tools and commercial synthesis tools based on a flow similar to NCL-X [34]. The back-end implementation, on the other hand, is based on commercial EDA tools with our customized library cells (including the proposed PCSL). Each FB channel is independent and Fig. 6
Fig. 7. Die microphotograph (left) and layout (right) of the fabricated test-chips: (a) proposed SSAVS system with async QDI FRM filter bank, and (b) sync
benchmark filter.
depicts the block diagram of one FB channel embodying an FIR filter realizing the FRM algorithm. As the throughput requirement of the intended WSN is somewhat modest, a serial implementation is adopted, where each FB channel comprises an Async Read/Write Controller, an 8 × 8-Bit Coefficient Memory, an 8 × 8-Bit Data Memory, an 8-Bit PCSL Multiplier, and a 20-Bit PCSL Adder. To preserve the QDI protocol and proper async handshaking, Datapath Completion Detection (DCD) and Latch Completion Detection (LCD) circuits are included with Muller C-elements (denoted by a gate symbol with C) [34]. All async dual-rail latches in the datapath are initialized to an empty value except for Latch 3, which is used to hold the accumulated product and is initialized to a valid 0.
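The dual-rail signalling and completion detection mentioned above can be illustrated with a small behavioural sketch. This is not the paper's circuit: the rail polarity (true rail high for logic 1), the names `encode`, `decode`, `CElement` and `completion`, and the use of a single multi-input C-element are assumptions made purely for illustration. It shows an "empty" codeword with both rails low, a valid codeword with exactly one rail high, and a Muller C-element that changes state only when all of its inputs agree.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]      # (true rail, false rail); polarity is an assumption
EMPTY: Rail = (0, 0)        # both rails low: no data present


def encode(bit: int) -> Rail:
    """Encode one logic bit as a valid dual-rail codeword."""
    return (1, 0) if bit else (0, 1)


def decode(rail: Rail) -> Optional[int]:
    """Return the logic value, or None while the codeword is empty."""
    if rail == EMPTY:
        return None
    if rail == (1, 0):
        return 1
    if rail == (0, 1):
        return 0
    raise ValueError("both rails high is not a legal dual-rail codeword")


class CElement:
    """Muller C-element: output follows the inputs only when they all agree."""

    def __init__(self, state: int = 0) -> None:
        self.state = state

    def update(self, *inputs: int) -> int:
        if all(inputs):
            self.state = 1          # every input high: set
        elif not any(inputs):
            self.state = 0          # every input low: reset
        return self.state           # otherwise hold the previous value


def completion(word: List[Rail], c: CElement) -> int:
    """Per-bit validity is the OR of the two rails; the C-element combines them."""
    return c.update(*(t | f for t, f in word))
```

With this behaviour, completion is flagged only once every bit of the word is valid, and it is released only once every bit has returned to empty, which is what makes the handshake insensitive to the arrival order of individual bits.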
Fig. 8. (a) High V_DD variations @ 1 kHz, 150 mV to 300 mV, and (b) error-free response from the proposed async QDI FRM filter bank.

The data and Request clock from the Async FIFO Buffer (Fig. 2) are input to each FB channel. The Async Read/Write Controller in Fig. 6 first initiates a write operation by providing a valid memory address and asserting the write control signal to write the data into the 8 × 8-Bit Data Memory. Upon write completion, the Async Read/Write Controller subsequently initiates the first read operation for the Multiply-Accumulate (MAC) operation from both the 8 × 8-Bit Data Memory and the 8 × 8-Bit Coefficient Memory by providing them with valid memory addresses, and then asserting the read control signal. The input data and its corresponding coefficient are respectively read out to Latch
Fig. 9. Example of the captured waveforms depicting (a) self-adjustment of V_DD for the async QDI FRM filter bank, and (b) self-adjustment of V_DD under sudden temperature drop.
Fig. 10. Variation of the sync filter critical path delay under various PVT conditions: Monte Carlo simulations.
1 and Latch 2, and subsequently multiplied by the 8-Bit PCSL Multiplier. The multiplication product is captured by Latch 4 and sign-extended to 20 bits to accommodate potential overflow. The 20-Bit PCSL Adder is used to add this product to the accumulated product stored in Latch 3. The result of the adder is looped back to Latch 3, thereby updating its value and completing the first MAC operation. The MAC operation repeats until the last tap of the filter. When the channel output (one of the four outputs in Fig. 2) is finally computed, the Async Read/Write Controller of each channel will assert its Acknowledge clock to indicate completion. The overall Acknowledge clock is output to the Async FIFO Buffer, which subsequently resets and de-asserts the Request clock. This in turn resets all FB channels and the system is now ready to process the next input data from the FIFO.
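The serial MAC loop described above can be sketched behaviourally as follows. This is a software illustration under stated assumptions rather than the hardware itself: data and coefficients are taken to be 8-bit two's-complement words (the text only says the product is sign-extended), the tap count is taken from the length of the inputs (8 words per memory in the usage below), and the names `to_signed` and `serial_mac` are hypothetical.

```python
from typing import Sequence


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def serial_mac(samples: Sequence[int], coeffs: Sequence[int],
               acc_bits: int = 20) -> int:
    """One output sample of a serial FIR channel: one multiply-accumulate per tap."""
    if len(samples) != len(coeffs):
        raise ValueError("need one coefficient per stored sample")
    acc = 0                                   # Latch 3 starts at a valid 0
    for x, c in zip(samples, coeffs):         # read Data and Coefficient Memory
        product = to_signed(x, 8) * to_signed(c, 8)   # 8-bit x 8-bit multiply
        acc = to_signed(acc + product, acc_bits)      # 20-bit add, looped back
    return acc
```

Under these assumptions the largest possible magnitude after 8 taps is 8 × 128 × 128 = 131072, which fits comfortably within a signed 20-bit accumulator, consistent with the sign extension to 20 bits being there to accommodate potential overflow.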
Fig. 11. Scenario 1: Benchmarking delay and energy/operation of a sync DVFS filter bank and the async SSAVS filter bank for three temperature corners: (a) extreme cold, (b) 25 °C, and (c) 125 °C. Note: Bold lines are measured while dotted lines are from simulations.
IV. RESULTS AND BENCHMARKING

We will first demonstrate the robustness of the proposed async FB to PVT variations, particularly large V_DD and temperature variations, on the basis of physical measurements on prototype ICs (@130 nm CMOS) embodying the SSAVS system and the FB, and where pertinent, by simulations. Fig. 7(a) depicts the die microphotograph (left) and its layout (right). The async FB embodies 4 channels. All 30 prototype ICs tested were fully functional, and this in some sense corroborates the robustness of the design. The functionality was verified by sampling the input data (generated from a pattern generator) and comparing the ensuing output data (by means of a logic analyzer) with that expected. We will thereafter delineate the efficacy of the SSAVS system embodying the async FB and benchmark it against the competing conventional DVFS system embodying a sync filter. The die microphotograph of the DVFS system embodying one sync FB channel is depicted on the left of Fig. 7(b) and, on the right, the layout; the 4-channel sync FB would occupy a smaller IC area than the async FB. The lowest functional V_DD of the sync filter (probably attributed to the hold time violations of registers therein [36]) is 200 mV, a minimum voltage higher than that of the async FB (130 mV).
Consider first the robustness of the proposed async FB against PVT variations, in this case V_DD varying at 1 kHz between 150 mV and 300 mV as shown in the top trace of Fig. 8. Under this harsh V_DD condition, the async FB operates without error, as verified by the bottom trace in Fig. 8 (and by means of a logic analyzer). It can be appreciated that as V_DD can be varied widely without error and since the FB operation is uninterrupted, the async FB readily lends itself to being self-adjusted using the SSAVS system to the lowest voltage possible that meets the throughput for the prevailing conditions.
Consider now two examples of the SSAVS system that demonstrate its in-situ self-adjusting V_DD. In the first example, the operation of the SSAVS system earlier delineated in Fig. 3 is now physically depicted by the top and bottom traces of Fig. 9(a). Fig. 9(b) depicts the second example where, in addition to V_DD self-adjusting to the throughput rate, it also self-adjusts to the prevailing conditions. In the top trace of Fig. 9(b), the prototype IC is subjected to a sudden temperature drop (by means of freezer spray onto the package thereof) at some juncture, and V_DD self-adjusts by first increasing to between 200 mV and 250 mV, and thereafter to between 250 mV and 300 mV as the cold permeates the IC package. Although not shown here, the converse is obtained when the prototype IC is subjected to heat, e.g. from a hot air gun: V_DD reduces and finally toggles between two lower voltage levels.
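The self-adjustment behaviour shown in Fig. 3 and Fig. 9 can be mimicked with a short simulation sketch. Several things here are assumptions rather than statements from the paper: the code-to-voltage mapping of 50 mV × (code + 1) is inferred from the quoted pairs ('10110' at 1.15 V, '00011' at 200 mV), the "hold" action when Acknowledge exceeds Request is one reading of the three operational conditions, and the capacity model (samples the FB can finish per update period at a given code) is entirely hypothetical.

```python
from typing import Callable, List

MIN_CODE, MAX_CODE = 0, 23      # 24 levels of a 5-bit code
STEP_MV = 50                    # 50 mV per code step


def code_to_mv(code: int) -> int:
    """Assumed mapping: code 0 -> 50 mV ... code 23 -> 1200 mV."""
    if not MIN_CODE <= code <= MAX_CODE:
        raise ValueError("code out of range")
    return STEP_MV * (code + 1)


def next_code(code: int, req_count: int, ack_count: int) -> int:
    """Decide the voltage code for the next update period."""
    if req_count > ack_count:           # FB fell behind the data rate: raise V_DD
        return min(code + 1, MAX_CODE)
    if req_count == ack_count:          # FB kept up: try one step lower
        return max(code - 1, MIN_CODE)
    return code                         # backlog being drained: hold (assumption)


def simulate(periods: int, samples_per_period: int,
             capacity: Callable[[int], int]) -> List[int]:
    """Return V_DD (mV) per update period; capacity(code) is a hypothetical model."""
    code, backlog, trace = MAX_CODE, 0, []
    for _ in range(periods):
        pending = backlog + samples_per_period      # FIFO contents plus new inputs
        done = min(pending, capacity(code))         # samples acknowledged this period
        backlog = pending - done
        trace.append(code_to_mv(code))
        code = next_code(code, samples_per_period, done)
    return trace
```

For example, `simulate(25, 100, lambda c: 40 * c)` models 100 samples per period (1 kS/s with a 10 Hz update clock) and a made-up capacity that is just short of the data rate at code 2 and just above it at code 3. The trace ramps down from 1200 mV in 50 mV steps and then toggles between 150 mV and 200 mV, with a worst-case backlog of 20 samples, within the FIFO depth of 50.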
We will now benchmark the proposed SSAVS system with the async FB against its sync DVFS FB counterpart. In the latter, to accommodate the extreme/intractable delay variations due to PVT (including a wide temperature range extending to 125 °C [18], congruent with the WSN application) while operating in the sub-V_t region, a substantial amount of delay safety margin is needed to obtain operational robustness. To ascertain these margins, we employ statistical delay analysis on the critical path of the sync filter. In view of the intended WSN application and the availability of test equipment (particularly the environmental chamber), four temperature corners (extreme heat at 125 °C, nominal at 25 °C, and two extreme-cold corners) are considered. To ascertain the spread of delay due to process variations, 1000 Monte Carlo simulations on the critical path delay of the sync filter are performed at each said temperature corner. The worst-case delay at 3σ of the given process parameters is chosen, in part, to obtain sufficient (99.7%) coverage. The same simulations are repeated across the intended V_DD in the sub-V_t voltage range. These ascertained delays are depicted in Fig. 10 for nominal process parameters (solid lines) and for that with process variations (dotted lines). Consistent with observations reported elsewhere [37], the delay variations are expectedly higher at lower temperatures, a consequence of steeper sub-threshold slope.

Fig. 12. Scenario 1: Power consumption of the sync and async filter banks (a) @ extreme cold, (b) @25 °C, and (c) @125 °C.

Consider the benchmarking under two general scenarios. In Scenario 1, the sync DVFS system embodies a temperature sensor and, on the basis of the measured temperature and pre-characterization of the sync filter, the clocking frequency is selected accordingly. In Scenario 2, the sync DVFS system is much simpler, where the clocking frequency is fixed (to the worst-case) to accommodate all conditions. For Scenario 1, we will use a 3σ (delay) point along the plot of the pertinent temperature and adjust that point for 10% V_DD variation; the 10% V_DD variation is congruous with the International Technology Roadmap for Semiconductors. For example, for 25 °C, the delay at a given V_DD is that for 3σ at 25 °C with the 10% V_DD variation, and equals 3.9× (of the nominal). For Scenario 2, the delay at a given V_DD is that for the worst-case corner, and equals 183×; in [14], the allowed delay safety margin was somewhat similar.

In both scenarios, the characteristics of prototype ICs (embodying both FBs) were measured at three temperature corners, i.e. 125 °C for extreme heat, 25 °C for nominal, and the lower limit of the environmental chamber for extreme cold, and plotted
Fig. 13. Scenario 2: Benchmarking delay and energy/operation of a sync DVFS filter bank and the async SSAVS filter bank for three temperature corners: (a) extreme cold, (b) 25 °C, and (c) 125 °C. Note: Bold lines are measured while dotted lines are from simulations.
in Figs. 1114. For completeness, the delays @Upper/Lower delay increases for decreasing temperature. Third, with the tem-
and 10% obtained by simulations for the async FB are also perature ascertained by the sensor, the delay variations, hence
plotted. the ensuing delay safety margins of the sync FB, are relatively
Figs. 11(a)(c) depict the delay (for computing one sample, small (vis--vis Scenario 2, see later). Consequently and not un-
equivalent 14 clock cycles) and at the three aforesaid tem- expectedly, the delay of the sync FB for 25 and 125 is
perature corners; as we are only able to measure at (in- largely comparable to its async counterpart at its nominal con-
stead of ), the remarks henceforth for the extreme cold dition. Fourth, the delay of the sync FB is longer @ on
temperature is for operation at . Note that is ascer- average, 4.0 longer than the async FB. This can be attributed
tained at each over the delay of computing one sample. to the longer delay at for compared to that at 125 .
On the basis of the delay plots, we remark the following. First, On the basis of the plots, we remark the following.
in general and as expected, the delay increases with reducing First, in general and as expected, the minimum for both
for both FBs. Second, also in general and for both FBs, the FBs decreases as the temperature decreases. Second, for
minimum reduces for reducing temperature for both FBs.

Specifically, as the temperature drops from 125 °C to the extreme cold
corner, the minimum-energy point for the async and sync FBs respectively
shifts from VDD equal to 400 mV to 250 mV and from 450 mV to 300 mV.
Third, the sync FB, in general, is advantageous at the higher end of VDD
and this advantage diminishes at higher temperature. The async FB is
conversely advantageous at the lower end of VDD. This observation can,
as before, be corroborated with Fig. 10.
As the relationship between energy and power dissipation is not prima
facie evident, we plot in Figs. 12(a)-(c) the power dissipation of the
FBs as a function of throughput for the three temperature corners. We
make the following remarks. First, in general and as expected, the power
dissipation of both FBs decreases with reducing throughput; in
Fig. 12(c), the power dissipation continues to decrease for throughput
< 10 kS/s, albeit at a low rate. Second, the effect of throughput on
power dissipation differs at the three corners. At the extreme cold
corner, the power dissipation is roughly linearly related to the
throughput, where as expected, it increases with higher throughput. At
25 °C, the power dissipation remains roughly linearly related to the
throughput (albeit at a slower rate than that at the extreme cold
corner) for mid to high (> 1 kS/s) throughput, and the relationship is
only slight for low throughput, < 1 kS/s. At 125 °C, the throughput has
only a very slight effect on the power dissipation. Overall, the
influence of throughput on power dissipation diminishes as the
temperature rises. Third, at 125 °C, the async FB dissipates lower power
than the sync FB, while at the extreme cold corner, the converse is
true. At 25 °C, the async FB is advantageous at the low throughput
range, while at the higher throughput range, the converse is true.
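The trend in these remarks can be illustrated with a first-order model in which the average power is the dynamic energy per sample multiplied by the throughput, plus a leakage floor that does not depend on throughput. The following is a minimal sketch under that assumption; the function names and all numeric values are hypothetical placeholders chosen only to show the trend, and are not measurements from the prototype ICs.

```python
def average_power_w(energy_per_sample_j, throughput_sps, leakage_power_w):
    """First-order model: P = E_dynamic * throughput + P_leakage."""
    return energy_per_sample_j * throughput_sps + leakage_power_w


def throughput_sensitivity(energy_per_sample_j, leakage_power_w,
                           low_sps=1e3, high_sps=10e3):
    """Ratio of power at high throughput to power at low throughput."""
    high = average_power_w(energy_per_sample_j, high_sps, leakage_power_w)
    low = average_power_w(energy_per_sample_j, low_sps, leakage_power_w)
    return high / low


# Hypothetical values (assumptions, not measured data).
ENERGY_PER_SAMPLE_J = 50e-12   # dynamic energy per sample
LEAKAGE_COLD_W = 1e-9          # small leakage floor at a cold corner
LEAKAGE_HOT_W = 5e-6           # much larger leakage floor at a hot corner

if __name__ == "__main__":
    for name, leak in (("cold", LEAKAGE_COLD_W), ("hot", LEAKAGE_HOT_W)):
        ratio = throughput_sensitivity(ENERGY_PER_SAMPLE_J, leak)
        print(f"{name}: power ratio for 10x throughput = {ratio:.2f}")
```

With a small leakage floor the power scales almost linearly with throughput, whereas with a large leakage floor a tenfold change in throughput barely moves the total, which is the qualitative behavior described for the cold and hot corners.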
In the overall perspective of power dissipation in Scenario 1, it would
be prudent to be cognizant of the hardware and power dissipation costs
associated with the temperature sensor. These costs apply only to the
sync DVFS system, and practically, these costs would likely defeat any
advantages offered by the sync DVFS system over the async SSAVS system.
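The Scenario 1 selection step (a measured temperature combined with pre-characterization data to choose the clocking frequency) can be pictured as a table lookup. The sketch below is an assumption about how such a lookup might be organized; the table entries, the names, and the policy of falling back to the nearest colder characterized corner are hypothetical and are not taken from the prototype design.

```python
import bisect

# Hypothetical pre-characterization table for one supply voltage:
# (corner temperature in deg C, maximum safe clock frequency in Hz).
# All values are placeholders; entries must be sorted by temperature.
PRECHARACTERIZED = [(0.0, 20e3), (25.0, 80e3), (125.0, 400e3)]


def select_clock_hz(measured_temp_c, table=PRECHARACTERIZED):
    """Return the clock frequency of the nearest characterized corner that
    is not warmer than the measured temperature.

    In sub-threshold operation the delay grows as the temperature falls,
    so the colder corner gives the conservative (safe) frequency.
    """
    temps = [temp for temp, _ in table]
    idx = bisect.bisect_right(temps, measured_temp_c) - 1
    if idx < 0:
        raise ValueError("temperature below the characterized range")
    return table[idx][1]


if __name__ == "__main__":
    for temp in (10.0, 25.0, 100.0, 125.0):
        print(f"{temp:6.1f} C -> {select_clock_hz(temp):.0f} Hz")
```

Even in this simple form, the lookup needs a sensor reading and a stored characterization table, which are the overheads referred to above.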
Consider now Scenario 2 where the aforesaid temperature sensor is
absent. Figs. 13(a)-(c) benchmark the delay and energy for both FBs for
the three temperature corners. The delay of the sync DVFS FB is
preadjusted and fixed to satisfy the worst-case condition, i.e. the
delay with 10% variation at the extreme cold corner for the given
operating voltage. It is hence not unexpected that the delay of the sync
FB is substantially larger than its async counterpart (at nominal
condition) for all three temperature corners. This disparity becomes
most apparent when the conditions are most benign, at 125 °C, when the
FBs can operate at a higher speed. In short, in Scenario 2, the async FB
is advantageous in terms of delay to the sync FB for all conditions.
Consider now the energy of the FBs. At the extreme cold corner, the
energy of the sync FB is lower than that of the async FB for VDD above a
crossover voltage, and the converse is true below it. As the temperature
increases, the energy of the sync FB as expected increases
significantly. Specifically, at 25 °C, the sync FB dissipates higher
energy than its async counterpart for VDD below a crossover voltage.
Further, at 125 °C, the energy of the sync FB is significantly higher
than that of the async FB over the entire sub-Vt range. In short, in
Scenario 2, the async FB is advantageous in terms of energy to the sync
FB at 125 °C, advantageous below a crossover VDD at 25 °C, and at the
extreme cold corner, only below its crossover VDD.
Figs. 14(a)-(c) depict the power dissipation of the FBs as a function of
throughput for the same three temperature corners. At the extreme cold
corner, the sync FB dissipates less power in most of the throughput
range. At 25 °C, the sync FB dissipates power comparable to its async FB
counterpart in the high throughput range, > 10 kS/s, and higher power in
the mid to low throughput range, < 10 kS/s. At 125 °C, the sync FB
dissipates substantially higher power than its async counterpart over
the entire throughput range. In short, compared to the power dissipation
of the sync FB, the async FB is disadvantageous at the extreme cold
corner, comparable in the high throughput range at 25 °C, and
advantageous elsewhere.

Fig. 14. Scenario 2: Power consumption of the sync and async filter
banks (a) at the extreme cold corner, (b) at 25 °C, and (c) at 125 °C.

The aforesaid remarks and observations pertaining to Scenarios 1 and 2
can largely be explained by noting that in sub-Vt,
the delay of the circuits increases with decreasing temperature
(vis-à-vis increasing temperature in supra-Vt), that the delay increases
the most at the extreme cold temperature (vis-à-vis at other
temperatures), that at very low VDD the leakage current is dominant
(over dynamic), that the leakage current is exponentially related to
temperature, and that because the FB is a relatively simple circuit, the
delay of the critical path of the sync FB is only slightly longer than
its non-critical paths (explaining the relatively low delay of the sync
FB, particularly in Scenario 1).
Overall, this benchmarking depicts that in Scenario 1, no specific FB is
particularly advantageous: the sync DVFS FB and async SSAVS FB are
advantageous in different conditions. Nevertheless, the sync FB may be
disadvantageous if the temperature sensor overheads associated with DVFS
for Scenario 1 are considered. In Scenario 2, the async FB is
advantageous in terms of reduced delay with respect to VDD, usually
lower energy with respect to VDD, and in terms of power dissipation,
advantageous in some conditions (while the sync FB is advantageous in
other conditions). Further, in the context of continuous circuit
operation and overheads associated with DVS, the proposed SSAVS is
advantageous over the conventional DVFS in terms of uninterrupted
circuit operation and not requiring external intervention (such as
changing clock rate, pre-characterization, etc.).
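The Scenario 2 delay disparity follows from how the two systems set their timing: the sync clock period is fixed to the worst-case delay across all corners, whereas a self-timed circuit completes as soon as its actual delay has elapsed. The following is a minimal sketch of that comparison; the delay values, the margin and the function names are hypothetical placeholders, not measured data.

```python
# Hypothetical per-sample delays (seconds) at one supply voltage.
ACTUAL_DELAY_S = {"cold": 400e-6, "nominal": 100e-6, "hot": 25e-6}
VARIATION_MARGIN = 0.10  # assumed extra margin applied to the worst case


def sync_delay_s(delays, margin=VARIATION_MARGIN):
    """Fixed period: worst-case corner delay plus margin, used everywhere."""
    return max(delays.values()) * (1.0 + margin)


def async_delay_s(delays, corner):
    """Self-timed: the circuit takes only its actual delay at this corner."""
    return delays[corner]


def sync_penalty(delays, corner):
    """How many times slower the fixed-period design is at a given corner."""
    return sync_delay_s(delays) / async_delay_s(delays, corner)


if __name__ == "__main__":
    for corner in ACTUAL_DELAY_S:
        print(f"{corner}: sync is {sync_penalty(ACTUAL_DELAY_S, corner):.1f}x slower")
```

In this model the penalty is smallest at the corner that sets the worst case and grows as the conditions become more benign, consistent with the disparity being most apparent at 125 °C.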
V. CONCLUSIONS
We have proposed an SSAVS system for a WSN with the objective of lowest
possible power operation for the prevailing throughput and circuit
conditions, with VDD adjusted to within 50 mV of the minimum voltage,
yet high operational robustness with minimal overheads. High robustness
has been achieved by adopting the async QDI protocols, and the
embodiment of our proposed PCSL design approach. Minimal overheads have
been achieved by exploiting already existing signals in the QDI
protocols. The proposed async SSAVS system has been benchmarked against
its conventional sync DVFS system counterpart for two scenarios, and
their merits and disadvantages delineated.
[28] S. Das et al., Razor II: In situ error detection and correction for PVT
and SER tolerance, IEEE JSSC, vol. 44, no. 1, pp. 3248, Jan. 2009.
REFERENCES [29] K. A. Bowman et al., A 45 nm resilient microprocessor core for dy-
[1] G. Chen, S. Hanson, D. Blaauw, and D. Sylvester, Circuit design ad- namic variation tolerance, IEEE JSSC, vol. 46, no. 1, pp. 194208,
vances for wireless sensing applications, in Proc. IEEE, Nov. 2010, Jan. 2011.
vol. 98, no. 11, pp. 18081827. [30] J. Mkip et al., Timing-error detection design considerations in sub-
[2] M. Hempstead, D. Brooks, and G.-Y. Wei, An accelerator-based wire- threshold: An 8-bit microprocessor in 65 nm CMOS, J. Low Power
less sensor network processor in 130 nm CMOS, IEEE JESTCAS, vol. Electron. Appl., vol. 2, no. 2, pp. 180196, 2012.
1, no. 2, pp. 193202, Jun. 2011. [31] O. C. Akgun, J. Rodrigues, and J. Spars, Minimum-energy sub-
[3] T. Reddy and D. Linden, Lindens Handbook of Batteries, 4th ed. : threshold self-timed circuits: Design methodology and a case study,
McGraw-Hill Professional, 2010. in Proc. 16th ASYNC, 2010, pp. 4151.
[4] Y. C. Lim, Frequency response masking approach for the synthesis [32] W.-C. Hsieh and W. Hwang, Adaptive power control technique on
of sharp linear phase digital filters, IEEE Trans. Circuits and Systems, power-gated circuitries, IEEE Trans. VLSI Syst., vol. 19, no. 7, pp.
vol. 33, no. 4, pp. 357364, Apr. 1986. 11671180, Jul. 2011.
[5] J. S. Chang and Y.-C. Tong, A micropower-compatible time-multi- [33] J. Spars, J. Staunstrup, and M. Dantzer-Sorensen, Design of delay in-
plexed SC speech spectrum analyzer design, IEEE JSSC, vol. 28, no. sensitive circuits using multi-ring structures, in Proc. European DAC,
1, pp. 4048, Jan. 1993. 1992, pp. 710.
[6] L. S. Nielsen et al., Low-power operation using self-timed circuits and [34] A. Kondratyev and K. Lwin, Design of asynchronous circuits using
adaptive scaling of the supply voltage, IEEE Trans. VLSI Syst., vol. 2, synchronous CAD tools, IEEE Design Test Comput., vol. 19, no. 4,
no. 4, pp. 391397, Dec. 1994. pp. 107117, 2002.
[7] M. Nakai et al., Dynamic voltage and frequency management for a [35] J. Cortadella et al., Coping with the variability of combinational logic
low power embedded microprocessor, IEEE JSSC, vol. 40, no. 1, pp. delays, in Proc. ICCD, Oct. 2004, pp. 505508.
2835, Jan. 2005. [36] D. Bol, Robust and energy-efficient ultra-low-voltage circuit design
[8] A. Raychowdhury et al., Computing with subthreshold leakage: De- under timing constraints in 65/45 nm CMOS, J. Low Power Electron.
vice/circuit/architecture co-design for ultralow-power subthreshold op- Appl., vol. 1, no. 1, pp. 119, 2011.
eration, IEEE Trans. VLSI Syst., vol. 13, pp. 12131224, Nov. 2005. [37] D. Bol et al., The detrimental impact of negative Celsius temperature
[9] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold De- on ultra-low-voltage CMOS logic, in Proc. ESSCIRC, Sep. 2010, pp.
sign for Ultra Low-Power Systems. : Springer, 2006. 522525.
Tong Lin received the B.Eng. (First Class Honours) degree in electrical
and electronic engineering from Nanyang Technological University (NTU),
Singapore, in 2008 (with a full scholarship from Ministry of Education,
Singapore). He went for an exchange program at University of Miami, USA,
in 2006. He was also a recipient of the Nanyang President's Graduate
Scholarship. He received the Best Student Paper Award at IEEE
Subthreshold Microelectronics Conference in 2012.
He is currently pursuing the Ph.D. degree at NTU, where he is a Research
Associate with Temasek Laboratories. His current research interests
include asynchronous-logic circuit design and ultra-robust ultra-low
power circuit/system design.

Kwen-Siong Chong (S'03-M'09) received the B.Eng., M.Phil. and Ph.D.
degrees in electrical and electronic engineering from Nanyang
Technological University (NTU), Singapore, in 2001, 2002, and 2007
respectively.
He is presently a Senior Research Scientist with Temasek Laboratories @
NTU, Singapore. He was a visiting researcher in Nara Institute of
Science and Technology, Japan, in 2010. He was the co-principal
investigator/collaborator of the Defense Advanced Research Projects
Agency (USA) and Ministry of Education Tier-2 (Singapore) research
projects. His research interests include asynchronous VLSI designs,
low-voltage low power VLSI circuits, audio signal processing and
soft-error tolerant designs.
Dr. Chong was the Secretary of IEEE Circuits and Systems (CAS) Society,
Singapore Chapter, in 2011 and 2012. He has been a member of IEEE CAS
Society VLSI Technical Committee since 2009. He is a member of IEEE.

Bah-Hwee Gwee (S'93-M'97-SM'03) received the B.Eng. degree in Electrical
and Electronic Engineering from University of Aberdeen, U.K., in 1990.
He received the M.Eng. and Ph.D. degrees from Nanyang Technological
University (NTU), Singapore, in 1992 and 1998 respectively.
He was an Assistant Professor in School of EEE, NTU from 1999 to 2005
and has been an Associate Professor since 2005. He has held the
concurrent appointment of Assistant Chair (Students) of School of EEE
since 2010. He was the Principal Investigator (PI) of a number of
research projects including the ASEAN-European Union University Network
Programme, Ministry of Education Tier-1 and Tier-2, Defence Science and
Technology Agency and Temasek Laboratories projects. He was also the
co-PI of DARPA (USA), NTU-Panasonic, and NTU-Linköping research
projects. His total research grants amount to more than US$5M. He has
filed and been granted several USA and Singapore patents in circuit
design. His research interests include sub-threshold/dynamic voltage
scaling asynchronous circuit, GALS NoC and Class-D amplifier designs.
Dr. Gwee was the Chairman of IEEE Singapore Circuits and Systems Chapter
in 2005 and in 2006. He has been a member of IEEE CAS Society DSP, VLSI
and Bio-CAS Technical Committees since 2004. He has served in the
Organizing Committees for IEEE BioCAS-2004, IEEE APCCAS-2006, Technical
Program Chair for ISIC-2007, co-Chair for ISIC-2011 and served in the
steering committee for IEEE APCCAS 2006-2008. He has been an associate
editor for journal of Circuits, Systems and Signal Processing 2007-2012,
an associate editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II:
EXPRESS BRIEFS 2010-2011 and an associate editor for IEEE TRANSACTIONS
ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS since 2012. He is a senior
member of IEEE and was an IEEE Distinguished Lecturer for CAS Society in
2009/2010.

Joseph S. Chang received the B.Eng. degree in Electrical
and Computer Engineering from Monash University,
Australia, and the Ph.D. degree from the Depart-
ment of Otolaryngology, University of Melbourne,
Australia.
He is currently with Nanyang Technological
University (NTU), Singapore, where he was previ-
ously the Associate Dean of Research and Graduate
Studies at the College of Engineering. He is also
an Adjunct at Texas A&M University. Joseph is a
multi-disciplinary engineer and his research interests
encompass emerging technologies and traditional Circuits and System-related
fields, including printed electronics, microfluidics, life sciences, audiology,
psychophysics, acoustics, and biomedical and electronic devices.
He served as Editor of the Open Column, IEEE CIRCUITS AND SYSTEMS
MAGAZINE, Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND
SYSTEMS-I AND -II, Guest Editor for the Proceedings of the IEEE, Guest Editor
of the Circuits and Systems Magazine (Life Sciences Special Issue), and chair
of the Life Sciences Systems and Applications technical Committee of the
IEEE CAS society. He has chaired several international conferences, including
the IEEE-National Institutes of Health (NIH) Life Sciences Systems and Appli-
cations Workshop, the IEEE-NIH CAS Medical and Environmental Workshop,
and the International Symposium on Integrated Circuits and Systems. He
publishes prolifically and has been awarded 10 patents with several pending.
He has also been awarded numerous academic, defense and industrial grants,
exceeding $11M, including from Defense Advanced Research Projects Agency
(USA), E.U. grants, from multinational corporations, etc. He has founded two
startups in the field of electroacoustics, and has designed numerous related
products, adopted for industry and commercially.
