Abstract: We propose a Sub-threshold (Sub-Vt) Self-Adaptive VDD Scaling (SSAVS) system for a Wireless Sensor Network with the objective of lowest possible power dissipation for the prevailing throughput and circuit conditions, yet high robustness and with minimal overheads. The lowest possible power operation is pursued by adjusting VDD to the minimum voltage (within 50 mV) for said conditions. High robustness is achieved by adopting the Quasi-Delay-Insensitive (QDI) asynchronous-logic protocols where the circuits therein are self-timed, and by the embodiment of our proposed Pre-Charged-Static-Logic (PCSL) design approach; when compared against competing approaches, the PCSL is most competitive in terms of energy/operation, delay and IC area. By exploiting the already existing request and acknowledge signals of the QDI protocols, the ensuing overhead of the SSAVS is very modest. The filter bank embodied in the SSAVS is shown to be ultra-low power and highly robust. When benchmarked against the competing conventional Dynamic-Voltage-Frequency-Scaling (DVFS) synchronous-logic counterpart, no one system is particularly advantageous when the operating conditions are known. However, when the competing DVFS system is designed for the worst-case condition, the proposed SSAVS system is somewhat more competitive, including uninterrupted operation while its VDD self-adjusts to the varying conditions.
Index Terms: Adaptive VDD scaling, asynchronous-logic circuits, quasi-delay-insensitive circuits, sub-threshold operation, ultra-low power operation, wireless sensor networks.
I. INTRODUCTION
of interest, comprising five main modules: Sensor Front-End, Signal Processor, Wireless Transceiver, Energy Source, and Power Management. As the WSN is typically designed for a multiple-year operational life-span [1], power is carefully budgeted and, where pertinent, modules are energized only when required, such that the overall average power is typically 10–100 μW [2].
In our WSN depicted in Fig. 1, the overall active/passive operation ratio is approximately 20/80. In the passive mode, only the Sensor Front-End module is continuously energized. The Sensor and the Conditioning Circuits therein are powered directly by the battery supply (2.8 V), a Lithium/Carbon Fluoride battery, via a Low-Dropout (LDO) Regulator. The Simple Processor is powered by a 1.2 V rail via a power-efficient Buck DC-DC Converter. The battery is appropriate largely because of its high energy density per weight and very wide operating temperature range (up to 160 °C), congruent with that required of our WSN [3]. The Simple Processor ascertains if the input is possibly useful, and if it is, the WSN goes into active mode where it signals the Power Management module to energize the Signal Processor module via a variable supply rail. The voltage of this rail, typically in the sub-threshold voltage (sub-Vt) range, is self-adjusted such that the lowest possible voltage is used, to enable ultra-low power operation. The Signal Processor module buffers (via a FIFO) the output of the Simple Processor, filters the output signal before final computation by the Microcontroller Unit
ranges from 1.4 kHz to 1.4 MHz for a sampling rate range from 0.1 kSamples/s (kS/s) to 100 kS/s.
Despite the potential advantages of sub-Vt operation, this region of operation is challenging here for several reasons. First, the WSN is designed to work in a wide range of conditions, including extreme environments, somewhat similar to [14]. Second, Process, Voltage and Temperature (PVT) variations for fine-dimensioned CMOS processes increase dramatically in sub-Vt operation, and the ensuing delay variations are very severe, possibly intractable [9]. Typically, a very large delay safety margin (for synchronous-logic (sync) circuits) would need to be allowed for; see, for example, [14]. Third, the input signal to the Signal Processor module is variable. From a robust operation perspective, the circuits would need to be designed to meet the worst-case conditions: the fastest input rate and extreme temperatures.
To design the WSN for ultra-low power operation, we adopt a self-adjusting VDD approach whilst operating in the sub-Vt region, termed Sub-threshold Self-Adaptive VDD Scaling (SSAVS), where the VDD is in-situ dynamically self-adjusted. The modus operandi involves dialing up VDD when the need for computation increases or when the operating conditions are less favorable, and dialing it down when the conditions are the converse. Put simply, the lowest VDD is used where possible because in general the lower the VDD, the lower is the power dissipation due to dynamic and leakage currents.
In this paper, we describe an SSAVS system for the Signal Processor module in a WSN based on a proposed methodology within the Quasi-Delay-Insensitive (QDI) asynchronous-logic (async) approach [6], [12], [14], [16], and with a novel in-situ self-adjusting VDD means. The proposed design methodology, coined Pre-Charged Static-Logic (PCSL) [17], is essentially a static-logic library cell architecture that exploits the fast reset feature and is appropriate for full-range Dynamic Voltage Scaling (DVS) [18], for VDD ranging from nominal voltage to deep sub-Vt. The proposed SSAVS system for the WSN is demonstrated by means of application to the FRM FB. The novel VDD self-adjustment is obtained very simply, by exploiting (and comparing) the existing Request and Acknowledge signals of the QDI protocol signaling, and thereafter adjusting the VDD accordingly (see Section III later). The ensuing overhead is hence very low.
This paper is organized as follows. Section II reviews adaptive VDD scaling systems. Section III presents the design of the proposed system. Section IV presents the measurement results of prototype ICs and benchmarking thereof. Finally, conclusions are drawn in Section V.
II. ADAPTIVE VDD SCALING SYSTEMS
The general modality of adaptive VDD scaling systems to reduce power is to adaptively adjust VDD as low as possible (with appropriate timing margin) to meet the throughput requirement for the prevailing operating conditions (including PVT variations). This largely requires the pertinent circuit delay variations to be tracked, observed, or inferred.
A reported delay tracking technique is based on a Look-Up Table [19], [20] comprising tabulated pre-characterized throughput versus VDD data according to critical path circuit delay(s) under worst-case PVT conditions for the given throughput. To avoid excessive timing margins, Statistical Static Timing Analysis [19] may be employed, mostly to account for local (within-die) variations. Another reported technique [21] attempts to track real-time variations by adding PVT sensors. However, in sub-Vt operation, because of the exponential relationship of sub-Vt delay with PVT, even small errors in these sensor readings could lead to large circuit delay uncertainties, and the overheads associated with the sensors may defeat any advantage. The reported critical path delay matching [22]–[26] involves a ring oscillator matched to the critical path delay to set the clock frequency, and VDD is subsequently adjusted. For improved matching, the entire logic of the critical path may be replicated at high hardware cost [24]. Although this may be able to mitigate the delay uncertainty issues associated with global PVT variations, it may not comprehensively account
LIN et al.: AN ULTRA-LOW POWER ASYNCHRONOUS-LOGIC IN-SITU SELF-ADAPTIVE SYSTEM FOR WIRELESS SENSOR NETWORKS 575
Fig. 2. Overall structure of the proposed SSAVS system with an async QDI FRM Filter Bank (FB); the variable supply rail ranges from 150 mV to 400 mV.
for local variations, particularly in sub-Vt operation. Another reported technique employs timing error detection/correction [27]–[30], where VDD is reduced until the ensuing computation is erroneous. VDD is thereafter increased and the computation repeated. The applicability of this technique is arguably limited due to the severe/intractable PVT variations in sub-Vt operation, to possibly severe meta-stability issues due to the lack of timing margin, and to the need for re-computations. Another reported technique [31], [32] attempts to ascertain the circuit delay indirectly by measuring the variations in the supply current drawn to infer the duration of the computation, and VDD is subsequently adjusted. This technique is likely to be ambiguous in sub-Vt operation where the ratio of the current during computation to idle is small.
On the basis of the aforesaid review, it can be argued that these reported tracked, observed and inferred techniques are inadequate in terms of robustness, particularly in sub-Vt operation. Further, the hardware/computation overheads are considerable, including the need to scale VDD with the scaling of the clock frequency, i.e. Dynamic Voltage Frequency Scaling (DVFS).
We instead propose a definitive means by directly measuring the delay and comparing it against the throughput for the prevailing conditions, and VDD is thereafter adjusted accordingly. To enable this, we adopt the self-timed async QDI (vis-à-vis the conventional sync) where its dual-rail encoding includes the Request signal, which indicates that the input sample is ready, and the Acknowledge signal, which indicates the completion of the computation. By counting the number of Acknowledge signals against Request signals within a given period, we ascertain if the delay of the circuit is excessive, or otherwise, with respect to the throughput for the prevailing conditions. VDD is thereafter adjusted accordingly such that the delay is just slightly less than the delay between input samples, thereby satisfying the throughput. Further, as is inherent in QDI async protocols, the computation is uninterrupted while VDD is transitioning during its self-adjustment; in reported adaptive VDD scaling systems, circuit operation typically ceases when VDD is transitioning [20]. Of specific interest, note that the delay is definitive because the delay is that ascertained for the prevailing operating conditions, and we will show later that the associated hardware to adjust VDD is very modest.
At this juncture, to the best of our knowledge, ultra-low power QDI circuits with self-adaptive VDD, operating in the sub-Vt region and in extreme environments (hence requiring extremely high reliability), have yet to be reported or demonstrated. Further, it would be interesting to compare their attributes, including IC area, delay, energy/operation and power dissipation, against their conventional sync DVFS counterpart and under various conditions (see Section IV later).
III. SYSTEM DESIGN
Fig. 2 depicts the proposed SSAVS system within the Power Management module embodying the SSAVS Controller and its associated adjustable VDD means (a Buck DC-DC Converter), and the PCSL-based 8 × 8-Bit Quad-Channel Async QDI FRM
576 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 48, NO. 2, FEBRUARY 2013
Fig. 3. An example of the variation of VDD with time. The logical numbers on the ordinate are the 5-bit codes and their corresponding DC voltages.
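The relationship between the 5-bit code and the supply voltage plotted in Fig. 3 can be illustrated with a short sketch. It assumes a linear mapping, VDD = 50 mV × (code + 1), which is inferred here from the values quoted in the text (code 10110 for 1.15 V, code 00011 for 200 mV, 50 mV steps, 24 levels); the mapping and the function names are illustrative assumptions, not taken from the paper.

```python
STEP_MV = 50      # step size quoted in the text
NUM_LEVELS = 24   # number of voltage levels quoted in the text


def code_to_vdd_mv(code: int) -> int:
    """Return VDD in mV for a 5-bit code, assuming VDD = 50 mV * (code + 1)."""
    if not 0 <= code < NUM_LEVELS:
        raise ValueError(f"code must be in 0..{NUM_LEVELS - 1}, got {code}")
    return STEP_MV * (code + 1)


def vdd_mv_to_code(vdd_mv: int) -> int:
    """Inverse of code_to_vdd_mv under the same assumed linear mapping."""
    if vdd_mv % STEP_MV != 0:
        raise ValueError("VDD must be a multiple of the 50 mV step")
    code = vdd_mv // STEP_MV - 1
    if not 0 <= code < NUM_LEVELS:
        raise ValueError("VDD is outside the assumed code range")
    return code


if __name__ == "__main__":
    for code in (0b10111, 0b10110, 0b00011, 0b00010):
        print(f"{code:05b} -> {code_to_vdd_mv(code)} mV")
```

Under this assumed mapping, each single-bit decrement or increment of the code corresponds to one 50 mV step of the supply.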
TABLE I
OPERATION OF THE SSAVS CONTROLLER
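The three operational conditions of the SSAVS Controller described in the text can be sketched as a simple update rule. This is a minimal illustration under stated assumptions: the controller is taken to compare the Request and Acknowledge counts once per update period, to lower the code when the counts are equal, to raise it when Requests exceed Acknowledges, and to hold it while a FIFO backlog is being cleared. The code limits (0 to 23) and all names are hypothetical.

```python
MIN_CODE = 0    # assumed lowest 5-bit code
MAX_CODE = 23   # assumed highest code (24 levels)


def next_code(code: int, n_req: int, n_ack: int) -> int:
    """Return the code for the next update period from the two counts."""
    if n_req > n_ack:
        # Filter bank is too slow for the data rate: raise VDD by one step.
        return min(code + 1, MAX_CODE)
    if n_req == n_ack:
        # Filter bank keeps up with the data rate: try one step lower.
        return max(code - 1, MIN_CODE)
    # n_ack > n_req: backlog in the FIFO is being cleared, so hold VDD.
    return code


def run(code: int, counts):
    """Apply next_code over a sequence of (n_req, n_ack) pairs."""
    trace = [code]
    for n_req, n_ack in counts:
        code = next_code(code, n_req, n_ack)
        trace.append(code)
    return trace


if __name__ == "__main__":
    # 100 samples per period corresponds to 1 kS/s with a 10 Hz update clock.
    periods = [(100, 100), (100, 97), (100, 103), (100, 100)]
    print([f"{c:05b}" for c in run(0b00011, periods)])
```

With these assumptions the code oscillates between two adjacent levels around the minimum workable voltage, which is consistent with the statement that VDD stays within 50 mV of the required minimum.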
FB within the FRM FB. There are two voltage rails in the overall proposed SSAVS system: a fixed rail and a variable rail whose sub-Vt voltage typically ranges from 150 mV to 400 mV. For ease of illustration, the specific rail is shown in parenthesis for the supply rails and for signals of the various modules. In Fig. 2, the voltage of the input data and Request signals is first adjusted from the fixed rail to the variable rail by the Step-Down Level Converter, and these signals are thereafter buffered by the Async FIFO Buffer (depth of 50) before being input to the async FRM FB. The FB outputs (Channels 1–4) and their associated Acknowledge signal (combined from Channels 1–4 via the Completion Detection Circuit) are output to the MCU for further processing. The Acknowledge signal is also fed back to the Async FIFO Buffer. The Request and Acknowledge signals are input to the Power Management module, and the Acknowledge signal is stepped up from the variable rail to the fixed rail. The SSAVS Controller within the Power Management module monitors the number of Request and Acknowledge signals in each update period (a 10 Hz clock generated by the Update Clock Generator for a target throughput of 1 kS/s). The controller output is a 5-bit code that sets one of 24 voltage levels (in the Buck DC-DC Converter), in 50 mV steps, for the variable rail.
Fig. 3 graphically depicts an example of the self-adjustment of VDD. When the WSN is first initiated, the SSAVS Controller outputs 10111, equivalently 1.2 V, and the speed of the FB would far exceed the required computation. In this scenario, the number of FB Acknowledge clocks will be equal to the number of Request clocks in each update period. In the next period, the SSAVS Controller will subsequently decrement the code by 1 bit to 10110 and VDD correspondingly reduces by 50 mV to 1.15 V. The process continues, with the code continuously decremented and the voltage of VDD commensurably reduced. Eventually, at a later period in Fig. 3, the code is decremented to 00010, equivalently 150 mV. This is the juncture where the speed of the FRM FB is just slightly slower than the data rate for the prevailing conditions: the number of Request clocks hence exceeds the number of Acknowledge clocks in one period. Although the speed of the FRM FB is slightly too slow, no error occurs because the unconsumed inputs are stored in the Async FIFO Buffer (Fig. 2). In the next period, the SSAVS Controller reacts accordingly by incrementing the code by 1 bit to 00011, and the corresponding VDD is increased by 50 mV to 200 mV. With VDD increased, the speed of the FRM FB now slightly exceeds the required computation and the unconsumed inputs stored in the FIFO buffer are in turn computed at a slightly faster rate than the data rate. Consequently, the number of Request clocks is now less than the number of Acknowledge clocks and at the end of this
Fig. 4. (a) Proposed Pre-Charged Static-Logic (PCSL) architecture, and six basic cells embodying the proposed PCSL dual-rail QDI realization approach:
(b) 2-input AND/NAND gate, (c) 2-input OR/NOR gate, (d) 3-input AO/AOI gate, (e) 3-input OA/OAI gate, (f) 2-input XOR/XNOR gate, and (g) 2-input MUX.
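The dual-rail behaviour that the text describes for PCSL cells (both outputs at 0 while the cell is reset, and one output rail asserted once an operation is ready and the inputs are valid) can be sketched behaviourally for the 2-input AND/NAND cell of Fig. 4(b). This is a functional model only, not a transistor-level description; the `(true_rail, false_rail)` encoding, the control input name `req`, and the choice to keep the outputs at 0 while any input is still invalid are assumptions made for illustration.

```python
from typing import Tuple

Rail = Tuple[int, int]   # (true_rail, false_rail)
NULL: Rail = (0, 0)      # reset / no-data state


def encode(bit: int) -> Rail:
    """Encode a single bit as a dual-rail pair."""
    return (1, 0) if bit else (0, 1)


def is_valid(x: Rail) -> bool:
    """A dual-rail input carries data when exactly one rail is high."""
    return x in ((1, 0), (0, 1))


def pcsl_and_nand(req: int, a: Rail, b: Rail) -> Rail:
    """Return (output_true, output_false) for a dual-rail AND/NAND cell.

    Output True carries a AND b; Output False carries its complement.
    """
    if req == 0:
        return NULL                  # reset phase: both outputs are 0
    if not (is_valid(a) and is_valid(b)):
        return NULL                  # assumed: wait until inputs are valid
    return encode(a[0] & b[0])       # computation phase


if __name__ == "__main__":
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out = pcsl_and_nand(1, encode(bit_a), encode(bit_b))
            print(bit_a, bit_b, out)
```

Because exactly one output rail rises per operation, a downstream completion detector only needs to OR the two rails to know that the cell has finished.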
TABLE II
ENERGY-PER-OPERATION, DELAY AND IC AREA OF DUAL-RAIL LIBRARY CELLS EMBODYING
VARIOUS APPROACHES @ AND 130 nm CMOS PROCESS
period, all unconsumed inputs in the FIFO may have been cleared; if not, the voltage of VDD remains (or is increased further) in the next time period(s). If cleared, in the next period, the number of Request clocks again equals the number of Acknowledge clocks (as in the time periods preceding the one in which the FB first became too slow). This is the same scenario where the FB, as a consequence of the slightly raised VDD, is capable of computing faster than the data rate. In the following period, the scenario is that of the period in which the FB was slightly too slow, and the operation repeats accordingly. Table I summarizes the three operational conditions.
In short, the voltage of VDD of the FB is in-situ adaptively self-adjusted to be as low as possible (within 50 mV) to meet the throughput for the prevailing operating conditions, and on average, the voltage of VDD is slightly higher than the actual required minimum. Hence, the FB is ultra-low power and highly power-efficient. Note that the overheads for this self-adjusting VDD are very modest (a counter) and the circuit operation is uninterrupted whilst VDD transitions.
In view of the need for sub-Vt operation, it is imperative to adopt circuits based on the static-logic family to mitigate the effects of critical transistor sizing [9]; dynamic- and pass-logic families are inappropriate [18]. Fig. 4(a) depicts the basic architecture of our proposed async cells, coined Pre-Charged Static-Logic (PCSL) [17]. This basic architecture comprises an Inverting Static-Logic Cell, three transistors (for output pre-charging during the reset phase/evaluation during the computation phase), and two inverters (for output buffering). The two outputs are Output True and Output False. In PCSL
Fig. 5. Reported dual-rail AND/NAND circuit designs: (a) Delay-Insensitive-Minterm-Synthesis (DIMS), (b) NULL Convention Logic (NCL) with complex
gates (NCL1), and (c) NCL with fast-reset complex gates (NCL2).
Fig. 6. Block diagram of one channel of the 8 × 8-Bit Quad-Channel Async QDI FRM FB.
cells, when the Request signal is 0, both outputs are 0. On the other hand, when the Request signal is 1 (indicating that an operation is ready) and when the input signals are valid, the operation commences and an ensuing output is obtained. The architecture of the PCSL cell involves an integration of the subcircuit associated with the Request signal and a buffer (to each output) into the standard static-logic library cell (redesigned for dual-rail async), thereby sharing (common) transistors. This reduces the number of transistors, resulting in simultaneously lower power/energy dissipation, faster speed and smaller IC area (see Table II later). On the basis of this architecture, Figs. 4(b)–(g) depict the schematic of six basic PCSL cells (all with a 3-transistor limit in any stack).
To depict the hardware advantage of the proposed PCSL approach, the 2-input AND/NAND gate in Fig. 4(b) can be compared to the same gate realized by three reported static-logic QDI approaches in Figs. 5(a)–(c): (a) Delay-Insensitive-Minterm-Synthesis (DIMS) approach [33], (b) NULL Convention Logic (NCL) with complex gates [34] (denoted NCL1), and (c) NCL with fast-reset complex gates [35] (denoted NCL2). On the basis of simulations (130 nm CMOS), Table II benchmarks energy/operation, delay and IC area of the aforesaid six basic cells of the various approaches. The competing cells are normalized to the PCSL cells, whose actual values are shown within parentheses. The average attributes are tabulated in the last row.
It is apparent from Table II that the cells embodying the proposed PCSL approach feature the lowest energy/operation, save the simple AND/NAND and OR/NOR gates of NCL1. On average, the energy/operation of cells embodying the reported DIMS, NCL1, and NCL2 approaches is significantly higher: 4.0×, 1.6×, and 1.9× respectively. It is also apparent that the cells embodying the proposed PCSL approach feature the shortest delay (the sum of two components, the computation-phase delay and the reset-phase delay, averaged over all input combinations), save the simple AND/NAND and OR/NOR gates of NCL1. On average, the reported DIMS, NCL1, and NCL2 cells are significantly slower: 4.1×, 1.8×, and 1.9× respectively. It is also apparent that the cells embodying the proposed PCSL approach require the smallest IC area; the layouts are based on the standard-cell approach where the cell height is fixed at 4 μm and the cell width is in multiples of 0.4 μm. On average, the IC area required for cells embodying the reported DIMS, NCL1, and NCL2 approaches is significantly larger: 4.7×, 2.6×, and 2.7× respectively; from a perspective of dual-rail async and (single-rail) sync circuits, the smaller IC area is worthwhile because the IC area overhead of the former is somewhat mitigated. In short, cells embodying the proposed PCSL approach simultaneously exhibit the lowest energy/operation, shortest delay and smallest IC area.
With the proposed PCSL QDI realization approach, an 8 × 8-Bit Quad-Channel Async QDI FRM FB is designed. A semi-custom design flow is adopted, where the front-end is designed using an assortment of in-house design tools and commercial synthesis tools based on a flow similar to NCL-X [34]. The back-end implementation, on the other hand, is based on commercial EDA tools with our customized library cells (including proposed PCSL). Each FB channel is independent and Fig. 6
Fig. 7. Die microphotograph (left) and layout (right) of the fabricated test-chips: (a) proposed SSAVS system with async QDI FRM filter bank, and (b) sync
benchmark filter.
Fig. 9. Example of the captured waveforms depicting (a) self-adjustment of and from the async QDI FRM filter bank, and (b) self-adjustment of
and under sudden temperature drop.
Fig. 10. Variation of the sync filter critical path delay under various PVT conditions: Monte Carlo simulations.
1 and Latch 2, and subsequently multiplied by the 8-Bit PCSL Multiplier. The multiplication product is captured by Latch 4 and sign-extended to 20 bits to accommodate potential overflow. The 20-Bit PCSL Adder is used to add this product to the accumulated product stored in Latch 3. The result of the adder is looped back to Latch 3, thereby updating its value and completing the first MAC operation. The MAC operation repeats until the last tap of the filter. When the channel output (one of Channels 1–4 in Fig. 2) is finally computed, the Async Read/Write Controller of each channel will assert its Acknowledge clock to indicate completion. The overall Acknowledge clock is output to the Async FIFO Buffer, which subsequently resets and de-asserts the Request clock. This in turn resets all FB channels and the system is now ready to process the next input data from the FIFO.
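The multiply-accumulate loop described above can be sketched in software. This is a behavioural illustration only, assuming signed 8-bit two's-complement samples and coefficients and a 20-bit two's-complement accumulator that wraps on overflow; the function names and the wrap-around behaviour are assumptions, and the latch names appear only in comments to tie the steps back to the text.

```python
def to_signed(value: int, bits: int) -> int:
    """Wrap an integer into a signed two's-complement range of `bits` bits."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mac_channel(samples, coeffs, acc_bits: int = 20) -> int:
    """Accumulate sample * coefficient over all taps of one filter channel."""
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")
    acc = 0  # accumulated product (Latch 3 in the text)
    for x, h in zip(samples, coeffs):
        if not (-128 <= x <= 127 and -128 <= h <= 127):
            raise ValueError("operands must fit in signed 8 bits")
        product = x * h  # 8 x 8-bit multiply (captured by Latch 4)
        # Python integers are already sign-extended; wrap the sum to 20 bits.
        acc = to_signed(acc + product, acc_bits)
    return acc


if __name__ == "__main__":
    print(mac_channel([1, 2, 3], [4, 5, 6]))
```

A 16-bit product accumulated into 20 bits leaves four guard bits, so at least 16 worst-case products can be summed before the accumulator can overflow.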
Fig. 11. Scenario 1: Benchmarking delay and energy/operation of a sync DVFS filter bank and the async SSAVS filter bank for three temperature corners: (a) the extreme cold corner, (b) 25 °C, and (c) 125 °C. Note: Bold lines are measured while dotted lines are from simulations.
IV. RESULTS AND BENCHMARKING
We will first demonstrate the robustness of the proposed async FB to PVT variations, particularly large VDD and temperature variations, on the basis of physical measurements on prototype ICs (@130 nm CMOS) embodying the SSAVS system and the FB, and where pertinent, by simulations. Fig. 7(a) depicts the die microphotograph (left) and layout (right) of the async FB embodying 4 channels. All 30 prototype ICs tested were fully functional over the tested conditions, and this in some sense corroborates the robustness of the design. The functionality was verified by sampling the input data (generated from a pattern generator) and comparing the ensuing output data (by means of a logic analyzer) with that expected. We will thereafter delineate the efficacy of the SSAVS system embodying the async FB and benchmark it against the competing conventional DVFS system embodying a sync filter. The die microphotograph of the DVFS system embodying one sync FB channel is depicted on the left of Fig. 7(b) and, on the right, the layout; the 4-channel sync FB would occupy a smaller IC area than the async FB. The lowest functional VDD of the sync filter (probably attributed to the hold time violations of registers therein [36]) is 200 mV, a minimum voltage higher than that of the async FB (130 mV).
Fig. 13. Scenario 2: Benchmarking delay and energy/operation of a sync DVFS filter bank and the async SSAVS filter bank for three temperature corners: (a) the extreme cold corner, (b) 25 °C, and (c) 125 °C. Note: Bold lines are measured while dotted lines are from simulations.
in Figs. 11–14. For completeness, the delays @Upper/Lower and 10% obtained by simulations for the async FB are also plotted.
Figs. 11(a)–(c) depict the delay (for computing one sample, equivalent to 14 clock cycles) and energy/operation at the three aforesaid temperature corners; as we are only able to measure at a temperature other than the intended extreme cold corner, the remarks henceforth for the extreme cold temperature are for operation at the measured temperature. Note that the energy/operation is ascertained at each VDD over the delay of computing one sample.
On the basis of the delay plots, we remark the following. First, in general and as expected, the delay increases with reducing VDD for both FBs. Second, also in general and for both FBs, the delay increases for decreasing temperature. Third, with the temperature ascertained by the sensor, the delay variations, hence the ensuing delay safety margins of the sync FB, are relatively small (vis-à-vis Scenario 2, see later). Consequently and not unexpectedly, the delay of the sync FB for 25 °C and 125 °C is largely comparable to its async counterpart at its nominal condition. Fourth, the delay of the sync FB is longer at the extreme cold corner: on average, 4.0× longer than the async FB. This can be attributed to the longer delay at the extreme cold temperature compared to that at 125 °C.
On the basis of the energy/operation plots, we remark the following. First, in general and as expected, the minimum energy/operation for both FBs decreases as the temperature decreases. Second, for
the delay of the circuits increases with decreasing temperature (vis-à-vis increasing temperature in supra-Vt), that the delay increases the most at the extreme cold temperature (vis-à-vis at other temperatures), that at very low VDD the leakage current is dominant (over dynamic), that the leakage current is exponentially related to temperature, and that because the FB is a relatively simple circuit, the delay of the critical path of the sync FB is only slightly longer than its non-critical paths (explaining the relatively low delay of the sync FB, particularly in Scenario 1).
Overall, this benchmarking depicts that in Scenario 1, no specific FB is particularly advantageous: the sync DVFS FB and async SSAVS FB are advantageous in different conditions. Nevertheless, the sync FB may be disadvantageous if the temperature sensor overheads associated with DVFS for Scenario 1 are considered. In Scenario 2, the async FB is advantageous in terms of reduced delay with respect to VDD, usually lower energy/operation with respect to VDD, and in terms of power dissipation, advantageous in some conditions (while the sync FB is advantageous in other conditions). Further, in the context of continuous circuit operation and overheads associated with DVS, the proposed SSAVS is advantageous over the conventional DVFS in terms of uninterrupted circuit operation and not requiring external intervention (such as changing clock rate, pre-characterization, etc.).
V. CONCLUSIONS
We have proposed an SSAVS system for a WSN with the objective of lowest possible power operation for the prevailing throughput and circuit conditions, with VDD adjusted to within 50 mV of the minimum voltage, yet high operational robustness with minimal overheads. High robustness has been achieved by adopting the async QDI protocols, and the embodiment of our proposed PCSL design approach. Minimal overheads have been achieved by exploiting already existing signals in the QDI protocols. The proposed async SSAVS system has been benchmarked against its conventional sync DVFS system counterpart for two scenarios, and their merits and disadvantages delineated.
REFERENCES
[1] G. Chen, S. Hanson, D. Blaauw, and D. Sylvester, "Circuit design advances for wireless sensing applications," in Proc. IEEE, Nov. 2010, vol. 98, no. 11, pp. 1808–1827.
[2] M. Hempstead, D. Brooks, and G.-Y. Wei, "An accelerator-based wireless sensor network processor in 130 nm CMOS," IEEE JESTCAS, vol. 1, no. 2, pp. 193–202, Jun. 2011.
[3] T. Reddy and D. Linden, Linden's Handbook of Batteries, 4th ed. McGraw-Hill Professional, 2010.
[4] Y. C. Lim, "Frequency response masking approach for the synthesis of sharp linear phase digital filters," IEEE Trans. Circuits and Systems, vol. 33, no. 4, pp. 357–364, Apr. 1986.
[5] J. S. Chang and Y.-C. Tong, "A micropower-compatible time-multiplexed SC speech spectrum analyzer design," IEEE JSSC, vol. 28, no. 1, pp. 40–48, Jan. 1993.
[6] L. S. Nielsen et al., "Low-power operation using self-timed circuits and adaptive scaling of the supply voltage," IEEE Trans. VLSI Syst., vol. 2, no. 4, pp. 391–397, Dec. 1994.
[7] M. Nakai et al., "Dynamic voltage and frequency management for a low power embedded microprocessor," IEEE JSSC, vol. 40, no. 1, pp. 28–35, Jan. 2005.
[8] A. Raychowdhury et al., "Computing with subthreshold leakage: Device/circuit/architecture co-design for ultralow-power subthreshold operation," IEEE Trans. VLSI Syst., vol. 13, pp. 1213–1224, Nov. 2005.
[9] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems. Springer, 2006.
[10] K.-S. Chong, B.-H. Gwee, and J. S. Chang, "Energy-efficient synchronous-logic and asynchronous-logic FFT/IFFT processors," IEEE JSSC, vol. 42, no. 9, pp. 2034–2045, Sep. 2007.
[11] S. Hanson et al., "Exploring variability and performance in a sub-200-mV processor," IEEE JSSC, vol. 43, no. 4, pp. 881–891, Apr. 2008.
[12] E. Beigne et al., "An asynchronous power aware and adaptive NoC based circuit," IEEE JSSC, vol. 44, no. 4, pp. 1167–1177, Apr. 2009.
[13] I. J. Chang, S. P. Park, and K. Roy, "Exploring asynchronous design techniques for process-tolerant and energy-efficient subthreshold operation," IEEE JSSC, vol. 45, no. 2, pp. 401–410, Feb. 2010.
[14] R. D. Jorgenson et al., "Ultralow-power operation in subthreshold regimes applying clockless logic," in Proc. IEEE, Feb. 2010, vol. 98, no. 2, pp. 299–314.
[15] K.-S. Chong et al., "Synchronous-logic and globally-asynchronous-locally-synchronous (GALS) acoustic digital signal processors," IEEE JSSC, vol. 47, no. 3, pp. 769–780, Mar. 2012.
[16] A. J. Martin and M. Nyström, "Asynchronous techniques for system-on-chip designs," in Proc. IEEE, Jun. 2006, vol. 96, no. 6, pp. 1104–1115.
[17] J. S. Chang, B.-H. Gwee, and K.-S. Chong, "Asynchronous-Logic for Full Dynamic Voltage Scaling," US Provisional Patent Application No. 61/364,478, Jul. 15, 2010.
[18] J. S. Chang et al., "Digital Asynchronous-Logic: Dynamic Voltage Control," Final Technical Report for DARPA Project, HR0011-09-2-0006, Aug. 2010.
[19] J. Kwong et al., "A 65 nm sub-Vt microcontroller with integrated SRAM and switched-capacitor DC-DC converter," IEEE JSSC, vol. 44, no. 1, pp. 115–126, Jan. 2009.
[20] D. N. Truong et al., "A 167-processor computational platform in 65 nm CMOS," IEEE JSSC, vol. 44, no. 4, pp. 1130–1144, Apr. 2009.
[21] J. Tschanz et al., "Adaptive frequency and biasing techniques for tolerance to dynamic temperature-voltage variations and aging," in Proc. IEEE ISSCC, Feb. 2007, pp. 292–293.
[22] J. Kao, M. Miyazaki, and A. Chandrakasan, "A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture," IEEE JSSC, vol. 37, no. 11, pp. 1545–1554, Nov. 2002.
[23] B. H. Calhoun and A. P. Chandrakasan, "Ultra-dynamic voltage scaling (UDVS) using sub-threshold operation and local voltage dithering," IEEE JSSC, vol. 41, pp. 238–245, Jan. 2006.
[24] M. Elgebaly and M. Sachdev, "Variation-aware adaptive voltage scaling system," IEEE Trans. VLSI Syst., vol. 15, no. 5, pp. 560–571, May 2007.
[25] Y. Ramadass and A. Chandrakasan, "Minimum energy tracking loop with embedded DC-DC converter enabling ultra-low-voltage operation down to 250 mV in 65 nm CMOS," IEEE JSSC, vol. 43, pp. 256–265, Jan. 2008.
[26] D. Bol et al., "A 25 MHz 7 μW/MHz ultra-low-voltage microcontroller SoC in 65 nm LP/GP CMOS for low-carbon wireless sensor nodes," in Proc. IEEE ISSCC, Feb. 2012, pp. 490–492.
[27] S. Das et al., "A self-tuning DVS processor using delay-error detection and correction," IEEE JSSC, vol. 41, no. 4, pp. 792–804, Apr. 2006.
[28] S. Das et al., "Razor II: In situ error detection and correction for PVT and SER tolerance," IEEE JSSC, vol. 44, no. 1, pp. 32–48, Jan. 2009.
[29] K. A. Bowman et al., "A 45 nm resilient microprocessor core for dynamic variation tolerance," IEEE JSSC, vol. 46, no. 1, pp. 194–208, Jan. 2011.
[30] J. Mäkipää et al., "Timing-error detection design considerations in subthreshold: An 8-bit microprocessor in 65 nm CMOS," J. Low Power Electron. Appl., vol. 2, no. 2, pp. 180–196, 2012.
[31] O. C. Akgun, J. Rodrigues, and J. Sparsø, "Minimum-energy sub-threshold self-timed circuits: Design methodology and a case study," in Proc. 16th ASYNC, 2010, pp. 41–51.
[32] W.-C. Hsieh and W. Hwang, "Adaptive power control technique on power-gated circuitries," IEEE Trans. VLSI Syst., vol. 19, no. 7, pp. 1167–1180, Jul. 2011.
[33] J. Sparsø, J. Staunstrup, and M. Dantzer-Sorensen, "Design of delay insensitive circuits using multi-ring structures," in Proc. European DAC, 1992, pp. 7–10.
[34] A. Kondratyev and K. Lwin, "Design of asynchronous circuits using synchronous CAD tools," IEEE Design Test Comput., vol. 19, no. 4, pp. 107–117, 2002.
[35] J. Cortadella et al., "Coping with the variability of combinational logic delays," in Proc. ICCD, Oct. 2004, pp. 505–508.
[36] D. Bol, "Robust and energy-efficient ultra-low-voltage circuit design under timing constraints in 65/45 nm CMOS," J. Low Power Electron. Appl., vol. 1, no. 1, pp. 1–19, 2011.
[37] D. Bol et al., "The detrimental impact of negative Celsius temperature on ultra-low-voltage CMOS logic," in Proc. ESSCIRC, Sep. 2010, pp. 522–525.
Tong Lin received the B.Eng. (First Class Honours) degree in electrical and electronic engineering from Nanyang Technological University (NTU), Singapore, in 2008 (with a full scholarship from Ministry of Education, Singapore). He went for an exchange program at University of Miami, USA, in 2006. He was also a recipient of the Nanyang President's Graduate Scholarship. He received the Best Student Paper Award at IEEE Subthreshold Microelectronics Conference in 2012.
He is currently pursuing the Ph.D degree at NTU, where he is a Research Associate with Temasek Laboratories. His current research interests include asynchronous-logic circuit design and ultra-robust ultra-low power circuit/system design.
Kwen-Siong Chong (S'03–M'09) received the B.Eng., M.Phil. and Ph.D degrees in electrical and electronic engineering from Nanyang Technological University (NTU), Singapore, in 2001, 2002, and 2007 respectively.
He is presently a Senior Research Scientist with Temasek Laboratories @ NTU, Singapore. He was a visiting researcher in Nara Institute of Science and Technology, Japan, in 2010. He was the co-principal investigator/collaborator of the Defense Advanced Research Projects Agency (USA) and Ministry of Education Tier-2 (Singapore) research projects. His research interests include asynchronous VLSI designs, low-voltage low power VLSI circuits, audio signal processing and soft-error tolerant designs.
Dr. Chong was the Secretary of IEEE Circuits and Systems (CAS) Society, Singapore Chapter, in 2011 and 2012. He has been a member of the IEEE CAS Society VLSI Technical Committee since 2009. He is a member of IEEE.
Bah-Hwee Gwee (S'93–M'97–SM'03) received the B.Eng. degree in Electrical and Electronic Engineering from University of Aberdeen, U.K., in 1990. He received the M.Eng. and Ph.D. degrees from Nanyang Technological University (NTU), Singapore, in 1992 and 1998 respectively.
He was an Assistant Professor in School of EEE, NTU from 1999 to 2005 and has been an Associate Professor since 2005. He has held the concurrent appointment of Assistant Chair (Students) of School of EEE since 2010. He was the Principal Investigator (PI) of a number of research projects including the ASEAN-European Union University Network Programme, Ministry of Education Tier-1 and Tier-2, Defence Science and Technology Agency and Temasek Laboratories projects. He was also the co-PI of DARPA (USA), NTU-Panasonic, and NTU-Linköping research projects. His total research grants amount to more than US$5M. He has filed and been granted several USA and Singapore patents in circuit design. His research interests include sub-threshold/dynamic voltage scaling asynchronous circuit, GALS NoC and Class-D amplifier designs.
Dr. Gwee was the Chairman of IEEE Singapore Circuits and Systems Chapter in 2005 and in 2006. He has been a member of the IEEE CAS Society DSP, VLSI and Bio-CAS Technical Committees since 2004. He has served in the Organizing Committees for IEEE BioCAS-2004, IEEE APCCAS-2006, as Technical Program Chair for ISIC-2007, co-Chair for ISIC-2011 and served in the steering committee for IEEE APCCAS 2006–2008. He was an associate editor for the journal Circuits, Systems and Signal Processing 2007–2012, an associate editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS 2010–2011 and has been an associate editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS since 2012. He is a senior member of IEEE and was an IEEE Distinguished Lecturer for CAS Society in 2009/2010.