IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 3, NO. 1, MARCH 2013
Abstract: Microcontrollers play a vital role in embodying intelligence into battery-powered everyday objects to realize the internet of things (IoT). The desirable attributes of such a microcontroller and the like include high energy and area efficiency, and robust error-free operation under dynamic voltage scaling (DVS), workload, and process, voltage, and temperature (PVT) variation effects. In this work, a synchronous-logic 8051 microcontroller core (S8051) and a quasi-delay-insensitive asynchronous-logic 8051 microcontroller core (A8051) are designed and fabricated for full-range DVS from nominal to deep sub-threshold. The performance of the S8051 and A8051 is largely comparable at nominal conditions and over the entire DVS range, but differs when PVT and workload are varied. At nominal VDD, both microcontroller cores feature comparable energy and speed, with the electromagnetic interference of the A8051 lower, and its area larger, than that of the S8051. When DVS is applied, both microcontroller cores feature comparable energy and speed; the S8051 requires simultaneous adjustment of clock frequency with VDD. At wide PVT variations, delay margins are required for the S8051, whereas the A8051 operates at actual speed. When the workload of both microcontrollers is varied, the A8051 features lower energy dissipation per workload due to the exploitation of its asynchronous-logic protocols. For IoT applications that incur wide PVT and workload variations, the A8051 is more suitable due to its self-timed nature, whereas when PVT and workload variations are less severe, the S8051 is more suitable due to its smaller IC area.

Index Terms: Asynchronous logic, dynamic voltage scaling, microcontrollers, ubiquitous sensors.

Manuscript received July 31, 2012; revised October 29, 2012; accepted January 01, 2013. Date of publication February 26, 2013; date of current version March 07, 2013. This paper was recommended by Guest Editor A.-Y. Wu.
K.-L. Chang is with the Institute of Materials Research and Engineering (IMRE), A*STAR, Synthesis and Integration Group, 117602, Singapore (e-mail: changkl@imre.a-star.edu.sg).
J. S. Chang is with Nanyang Technological University, School of Electrical and Electronic Engineering, Division of Circuits and Systems, 639798, Singapore (e-mail: ejschang@ntu.edu.sg).
B.-H. Gwee is with Nanyang Technological University, School of Electrical and Electronic Engineering, 639798, Singapore (e-mail: ebhgwee@ntu.edu.sg).
K.-S. Chong is with Temasek Laboratories, Nanyang Technological University, Integrated Systems Research Laboratory, 639798, Singapore (e-mail: kschong@ntu.edu.sg).
Digital Object Identifier 10.1109/JETCAS.2013.2243031

I. INTRODUCTION

IN THE future, everyday objects will ubiquitously embody digital sensing, communication, and processing capabilities and will be capable of collecting, processing, and relaying information. This paradigm is known as the internet of things (IoT). The realization of IoT requires everyday objects to be able to harvest, store, and use energy. In general, objects are either AC-powered (power consumption not critical), rechargeable (runs for a few days with one recharge), or battery-powered (runs for a few years with standard batteries, or indefinitely on harvested energy). AC-powered or rechargeable objects are currently well connected to the internet via various well-established wired/wireless protocols. However, these objects, in comparison, play only a very small part in the IoT. A large number of everyday battery-powered objects are energy-constrained and thus challenging to connect to the internet efficiently. Nevertheless, these objects are usually low-cost, purpose-specific, and highly suitable for crowd sourcing. Connecting these objects to the internet not only completes the IoT framework but also serves as an enabler for applications in remote health-care, remote monitoring, smart transport, and logistics.

Embodying internet/networking capabilities on everyday battery-powered objects challenges circuit and system designers in all aspects of software and hardware microcontroller design [1]. Designing microcontrollers for use in battery-powered objects is challenging primarily due to the unpredictable quantity and quality of the energy source. The quantity of energy is unpredictable due to the energy-constrained operation modality of such objects, e.g., reliability/capacity of standard batteries, and sparse availability of solar, temperature, piezoelectric, radio-frequency energy, etc. The quality of energy is also unpredictable due to the highly variable nature of such sources of energy. Hence, the desirable attributes of such a microcontroller and the like include high speed, high energy efficiency, small integrated circuit (IC) area, and error-free robust operation over a wide operation space [dynamic voltage scaling (DVS) and workload] and a wide variation space [process, voltage, and temperature (PVT)].

At the system level, there is a choice between two general digital-logic design philosophies: the prevalent conventional (clocked) synchronous-logic (sync) approach [2] and the (clockless) asynchronous-logic (async) approach [3], as well as the hybrid globally async locally sync (GALS) approach [4]. In sync systems, the operations between circuit modules are synchronized according to a global timing reference (the global clock or sub-multiple clocks thereof) distributed via a clock infrastructure. The clock frequency is ascertained such that the delay of every
operation is within the period of said clock, and the critical path or slowest module typically defines the maximum clock frequency. For the sake of error-free robust operation, the maximum clock frequency is limited by the worst-case condition in the entire specified operation and variation space. It is well established that, in order to optimize energy and speed, the clock frequency should be adjusted to match the delay of the critical path [5] as the operating condition changes within the specified operation and variation space, particularly at nominal and best-case conditions thereof. The async approach, on the other hand, employs distributed local timing references instead of a global timing reference. Async systems, particularly quasi-delay-insensitive (QDI) async systems [6], are self-timed: local async circuit modules independently operate at their maximum (actual) speed [7] in the specified operation and variation space, whereas their sync counterparts require clock frequency adjustments to achieve the same.

In many applications, there are instances of the need for both high and modest computation speeds. An established and well-adopted method to increase the energy efficiency (hence reduce power dissipation) of these applications is DVS, where the supply voltage, VDD, is at nominal (high) voltage for the former instances and reduced (near-threshold and sub-threshold voltages) for the latter instances, thereby facilitating a trade-off between energy and speed: high energy and high speed (nominal voltage), low energy and medium speed (near-threshold voltage), or highest energy efficiency and low speed (sub-threshold voltage). Other methods include clock gating, power gating, etc. One possible application is the energy-critical hearing aid [8], [9], where there are instances of high computation (e.g., noise reduction [10] and microphone directivity) and low computation (quiet conditions [11]).

For sync systems, this method is appropriately known as dynamic voltage and frequency scaling (DVFS) [12], [13], where DVFS simultaneously scales the clock frequency (speed) with respect to VDD (the clock frequency is reduced for lower VDD) to accommodate the ensuing increased delay of the critical path(s). Correlating the clock frequency (delays) with VDD typically involves firmware programming by means of post-silicon timing characterization at every possible VDD in the DVFS [14]. Further, delay margins are added to the ascertained clock frequency at every possible VDD to accommodate the worst-case condition in the specified PVT variation space. Delay margins incur both energy and speed overheads, particularly at the nominal and best-case conditions in the specified PVT variation space. It is generally accepted that because variations (due to PVT) increase as VDD reduces (especially at sub-threshold voltages vis-à-vis at nominal voltage) and as the minimum feature size of the fabrication technology scales down [15], delay margins would need to be even more conservative. For example, at sub-threshold voltages for 130-nm complementary metal–oxide–semiconductor (CMOS), a delay margin [14] is required to accommodate variations due solely to process (the P in PVT variations); a larger delay margin is not excessive in view of the other variations (voltage and temperature in PVT variations) [1], [16]. It is worth noting that when the specified operation and variation space is limited, statistical static timing analysis (SSTA) can be employed to significantly reduce the delay margins of sync circuits. Nevertheless, when the entire operation and variation space is specified, SSTA and the conventional approaches are comparable.

In the case of QDI async systems, DVS simply involves varying VDD, and the circuits therein innately operate at their maximum (actual) speed. Put simply, the QDI async system innately accommodates the change in the operation space and variation space to achieve error-free operation.

From a system's viewpoint, it is essential to identify a system (conventional sync or QDI async) for a specific operation and variation space. In [17], a comparison between a sync and an async 8051 microcontroller is reported. However, the microcontrollers are fabricated on different dies (embodying different memories) and are compared at the nominal supply voltage with no variations considered. In [18], a synchronous microcontroller is desynchronized to an asynchronous equivalent; however, the microcontrollers are similarly fabricated on different dies, and comparisons are done at the nominal supply voltage with no variations considered. To the best of our knowledge, there has not hitherto been a direct comparison between the conventional sync system and the QDI async system to delineate and compare their attributes over a wide operation and variation space and, of particular interest, where the range of the DVS is full-range, from nominal voltage to sub-threshold voltage. Put differently, at this juncture, despite the maturity of the conventional sync and QDI async design philosophies and reported designs, the question of which design philosophy is advantageous for full DVS remains somewhat contentious, noting that the availability of EDA tools, test and verification methodologies, and general acceptance are also important considerations in this question.

In this paper, we attempt to compare the speed, energy, and IC area over a wide operation and variation space between a sync and an async system: a sync 8051 (S8051) and an async 8051 (A8051). To enable the comparison to be as equitable as possible, they are both fabricated on the same die using 130-nm CMOS, and are based on the same standard library cells and shared standard library memories; clock-gating is applied on the S8051 where possible. Both the S8051 and A8051 are designed for low to mid speed (50 MHz), and are realized with static-logic gates for robustness for full-range DVS; the reasons for adopting static-logic (vis-à-vis dynamic-logic) are delineated in Section II. There is no deliberate attempt to optimize the various cell designs by means of custom designs. On the basis of simulations and measurements on prototype ICs, the S8051 and A8051 are compared over wide operation and variation spaces.

The energy and speed of the S8051 and A8051 are largely comparable at nominal conditions and over the entire DVS range, but differ when PVT and workload are varied. At nominal VDD, both microcontroller cores feature comparable energy and speed, with the electromagnetic interference of the A8051 lower, and its area larger, than that of the S8051. When DVS is applied, both microcontroller cores feature comparable energy and speed; the S8051 requires simultaneous adjustment of clock frequency with VDD. At wide PVT variations, delay margins are required for the S8051, whereas the A8051 operates at its actual speed. When the workload of both
CHANG et al.: SYNCHRONOUS-LOGIC AND ASYNCHRONOUS-LOGIC 8051 MICROCONTROLLER CORES FOR REALIZING THE INTERNET OF THINGS 25
Fig. 1. Simulated (a) energy and (b) speed of the static-logic async 8051 ALU and of the dynamic-logic async 8051 ALUs implemented with three KCR values (including 0.2 and 0.5), normalized with respect to the static-logic async 8051 ALU at 500 mV. KCR is the ratio of the strength of the keeper to the strength of the critical path.

microcontrollers is varied, the A8051 features lower energy dissipation per workload due to the exploitation of its async-logic protocols.

This paper is organized in the following fashion. A succinct review of async approaches is presented in Section II. The architectures of the S8051 and the proposed A8051 are presented in Section III. Section IV presents the simulation and measurement results on prototype ICs and benchmarking. Conclusions are drawn in Section V.

II. REVIEW OF ASYNCHRONOUS-LOGIC APPROACHES

In general, there are four async design philosophies [19]–[23]: the matched-delay approach (also known as bundled-data; assumes correlation between circuit delays and independently generated delays), the speed-independent approach (assumes arbitrary delays for circuits, and all wire forks are isochronic [24]), the QDI approach (assumes arbitrary delays for circuits, and some wire forks are isochronic), and the delay-insensitive approach (assumes arbitrary delays for circuits and wires). Isochronic wire forks are wire fanouts that have matching signal transitions at all ends of the fanout. The QDI async approach offers the best compromise between circuit complexity and robustness in view of the state-of-the-art fabrication processes and DVS operation [20].

The QDI async approach is generally based on one of two broad logic implementations: dynamic-logic or static-logic. In dynamic-logic circuits, keepers are usually employed to maintain voltages in wires that are occasionally tri-stated. On the other hand, in static-logic circuits, voltages in wires are always maintained by either inputs or combinational feedbacks. Keepers that are appropriately sized (for speed and energy) to operate at nominal voltage may fail to operate at lower voltages. Conversely, keepers that are appropriately sized to operate at sub-threshold voltages suffer from significant speed and energy penalties when operating in the near-threshold and nominal voltage regions. Fig. 1(a) and (b), respectively, depict the energy dissipation and speed of four async 8051 arithmetic logic units (ALUs) at the typical process corner: an async 8051 ALU implemented using static-logic gates, and three async 8051 ALUs implemented using dynamic-logic gates with three values of the ratio of the strength of the keeper to the strength of the critical path (KCR), two of which are 0.2 and 0.5 (i.e., the larger the ratio, the larger the keeper); for ease of comparison, the results are normalized to the static-logic async 8051 ALU at 500 mV. In the near-threshold voltage region, the dynamic-logic async 8051 ALU with KCR sized for above-threshold operation is faster, but dissipates higher energy, than the static-logic async 8051 ALU. At this KCR, the minimum VDD of the dynamic-logic async 8051 ALU is limited. This is because, at lower voltages, KCR needs to be increased for keepers to function robustly. Specifically, for the dynamic-logic async 8051 ALU to operate in the sub-threshold voltage region [e.g., at the minimum energy point (see Section IV-B)], a larger KCR is necessary. With such a KCR, the dynamic-logic async 8051 ALU is slower and dissipates higher energy than the static-logic async 8051 ALU. Further, when this dynamic-logic async 8051 ALU operates in the near-threshold region, it dissipates higher energy than the static-logic async 8051 ALU. For completeness, the dynamic-logic async 8051 ALU with a sufficiently large KCR is operable down to 100 mV, at significant energy and speed overheads. In short, dynamic-logic sized for above-threshold operation is faster, but dissipates higher energy than static-logic and is not robust at sub-threshold voltages. Dynamic-logic sized for robustness at sub-threshold voltages is slower and dissipates higher energy than static-logic over the entire DVS range. The static-logic approach, on the other hand, is ratioless and hence appropriate for full-range DVS [25].
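To make the self-timed, state-holding behaviour discussed in this section more concrete, the following minimal sketch models a dual-rail encoded bit, a completion check, and a two-input threshold gate with hysteresis (a C-element), in which the output is held by feedback whenever the inputs disagree. The rail convention (NULL as (0, 0), data as one asserted rail), the names `encode`, `is_data`, `is_null`, and `TH22`, and the behavioural modelling style are assumptions made for illustration only; they do not describe the cell library or circuits used in this work.

```python
# Behavioural sketch of dual-rail signalling (illustrative assumptions only).

NULL = (0, 0)  # spacer value: neither rail asserted, i.e., no data present


def encode(bit):
    """Dual-rail encode one bit as (rail_true, rail_false)."""
    return (1, 0) if bit else (0, 1)


def is_data(word):
    """Completion detection: True when every bit holds valid data."""
    return all(rails in ((1, 0), (0, 1)) for rails in word)


def is_null(word):
    """True when every bit has returned to the NULL spacer."""
    return all(rails == NULL for rails in word)


class TH22:
    """2-of-2 threshold gate with hysteresis (behaves as a C-element)."""

    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a and b:
            self.out = 1          # both inputs asserted: set
        elif not a and not b:
            self.out = 0          # both inputs de-asserted: reset
        # inputs disagree: hold the previous output (state kept by feedback)
        return self.out


if __name__ == "__main__":
    word = [encode(1), encode(0)]
    print(is_data(word), is_null([NULL, NULL]))
    gate = TH22()
    print([gate.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]])
```

Because the gate only changes state when all of its inputs agree, a downstream stage can tell when a complete data word (or a complete NULL spacer) has arrived without reference to a clock, which is the property that lets such circuits run at their actual speed.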
26 IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 3, NO. 1, MARCH 2013
Fig. 2. Simulated (a) logic-low noise margin (NML) and (b) logic-high noise margin (NMH) of static-logic gates with fan-in of 1–4 for VDD up to 500 mV. NML and NMH are normalized to VDD and expressed as a percentage thereof.

To further qualify the robustness of static-logic for full-range DVS, consider the noise margin of standard-sized static-logic gates with various fan-in; dynamic-logic circuits are not considered here due to their lower robustness at low voltages. Nevertheless, for completeness, the noise margin of dynamic-logic circuits is reported in [25]. The static-logic gates consist of NOT (fan-in of 1), NOR, and NAND gates. Fig. 2(a) and (b), respectively, depict the logic-low (NML) and logic-high (NMH) noise margins [26] at the typical process corner and 25 °C, normalized to VDD (and expressed as a percentage thereof). It can be seen, as expected, that higher fan-in gates have lower noise margins and that, at very low voltages, the noise margins are unacceptable (negative values). For a minimum NML and NMH of 10% [depicted in Fig. 2(a) and (b) with a horizontal dashed line] [27], employing static-logic gates with a suitably limited fan-in features a minimum operating VDD of 180 mV [operable at the minimum energy point (see Section IV-B)]. Further, to operate at lower voltages with the same minimum NML and NMH, gates of lower fan-in should be used; e.g., employing static-logic gates with a still lower fan-in limit features a minimum operating VDD of 115 mV.

III. ARCHITECTURE OF THE S8051 AND A8051

This section first succinctly reviews the general architecture of the S8051 and of the proposed A8051, and serves as a preamble to the detailed design of the latter.

Fig. 3 depicts the architecture of the S8051, the proposed A8051, and the shared embedded 1 kB ROM (read-only memory for program), 128 B RAM (random-access memory for data), and 1 kB XRAM (external random-access memory for data). The S8051 and A8051 also share three groups of inputs and outputs: the control I/O (main control signals), program I/O (programmable and debug signals), and general-purpose I/O (general-purpose input/output signals). They also share two controller blocks: the Prog Port Control (programmable port controller) and the GP Port Control (general-purpose port controller). Prog Port Control allows the initialization of the on-chip ROM by means of an off-chip mnemonic programmer (through the program I/O) during power-up. GP Port Control serves as an interface controller between the I/O of the S8051 and of the A8051, and the general-purpose I/O, which comprises four signal groups (with a port index of 0–3).

Table I tabulates the operation modes of the S8051, the A8051, and their shared memories. The master reset (RSTN), program enable (PROGN), microcontroller core selector, and external interrupt (INTN) inputs apply to both the S8051 and A8051. The active-low input PROGN disables both the S8051 and A8051, and allows the ROM to be programmed via the Prog Port Control block by means of the program I/O. The core selector input activates one core when high and the other when low. The active-low input INTN triggers the interrupt system of the S8051 and A8051.

A. Synchronous Microcontroller Core: S8051

The design of sync microcontrollers, including the 8051, is mature and extensively reported in the literature [28]. The design of the S8051 herein is based on the technology-independent Synopsys microcontroller core macro cell and synthesized for low-mid speed (50 MHz). The S8051 is based on standard library cells, and clock-gating is employed where necessary. The standard library cells are based on static-logic gates; this approach is adopted together with a limit on the fan-in of each gate [25]. The reason for employing static-logic and limiting fan-in has been delineated in Section II. The microcontroller core features four clocks per instruction cycle and 1–5 instruction cycles per instruction (i.e., 4–20 clocks per instruction). This synthesized design, being a practical design, embodies 30% delay margins to accommodate modest PVT variations at nominal voltage operation.
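As a rough illustration of the figures quoted for the S8051 (a 50 MHz clock, four clocks per instruction cycle, 1–5 instruction cycles per instruction, and 30% delay margins), the sketch below works out the resulting bounds on the instruction rate. It assumes, purely for illustration, that a delay margin m stretches the clock period to (1 + m) times the critical-path delay; that definition and the function names `clocks_per_instruction`, `mips`, and `unmargined_clock_hz` are hypothetical and are not taken from the paper.

```python
# Back-of-the-envelope timing arithmetic for a clocked 8051-style core.

CLOCK_HZ = 50e6            # synthesized target clock (from the text)
CLOCKS_PER_CYCLE = 4       # clocks per instruction cycle (from the text)
CYCLES_PER_INSTR = (1, 5)  # fewest/most instruction cycles per instruction
DELAY_MARGIN = 0.30        # 30% delay margin (from the text)


def clocks_per_instruction(cycles):
    """Clock periods consumed by an instruction of `cycles` instruction cycles."""
    return CLOCKS_PER_CYCLE * cycles


def mips(clock_hz, cycles):
    """Instruction rate in MIPS if every instruction takes `cycles` cycles."""
    return clock_hz / clocks_per_instruction(cycles) / 1e6


def unmargined_clock_hz(clock_hz, margin):
    """Clock rate the critical path alone would permit.

    ASSUMPTION: margined period = (1 + margin) * critical-path delay.
    """
    return clock_hz * (1.0 + margin)


if __name__ == "__main__":
    fastest, slowest = CYCLES_PER_INSTR
    print(clocks_per_instruction(fastest), clocks_per_instruction(slowest))  # 4 20
    print(mips(CLOCK_HZ, fastest), mips(CLOCK_HZ, slowest))                  # 12.5 2.5
    print(unmargined_clock_hz(CLOCK_HZ, DELAY_MARGIN) / 1e6)                 # about 65 (MHz)
```

Under this assumed definition, the margin is speed that a clocked core gives up at nominal conditions in exchange for correct operation at the worst-case corner, which is the overhead the later comparisons against the self-timed core quantify.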
Fig. 3. Block diagram of the QDI async 8051 microcontroller (A8051), the sync 8051 microcontroller (S8051), and the shared blocks.

B. Asynchronous Microcontroller Core: A8051

This section first presents the general architecture of the A8051, and thereafter, the general-purpose I/O that exploits the modality of the local handshake protocols therein.

The A8051 is implemented using null convention logic (NCL) gates [29] based on the same standard library cells as the S8051, and the ROM, RAM, and XRAM shared with the S8051 are implemented using standard library memories; as delineated earlier, standard library cells are used to make the comparison with the S8051 as equitable as possible. Using standard library cells is potentially disadvantageous for the A8051 because standard library cells are highly optimized for sync designs. Nevertheless, this work attempts to make an equitable sync and async comparison by means of standard library cells to demonstrate the viability of both approaches, especially when operation and variation spaces are considered.

Fig. 4 depicts the block diagram of the A8051, which is in part specified using Balsa [30], and in part [flow controller (FCont) block and memory controller (MemCont) block] handcrafted; Balsa is an async behavioral synthesis EDA tool based on the syntax-directed translation approach. The syntax-directed translation approach translates every software construct into an equivalent circuit. In the case of Balsa, the circuits are based on standard library cells and, as delineated earlier, the generated designs may be inefficient [31], [32]. The FCont block synchronizes the sync control I/O with the A8051 via the ARSTN (async reset) and AINTN (async interrupt) signals, while the MemCont block synchronizes the sync embedded memories with the A8051 via the AROM (async ROM), ARAM (async RAM), and AXRAM (async XRAM) signals.

Both the S8051 and A8051 are designed as two-stage pipeline systems. The instruction fetch (IF) block forms the first pipeline stage and manages the fetching and grouping
TABLE I
OPERATION MODES OF THE S8051, A8051, AND THEIR SHARED BLOCKS

Fig. 6. General-purpose I/O configuration for (a) S8051, (b) A8051 active mode, and (c) A8051 passive mode. The timing diagram of (d) S8051, (e) A8051 active mode, and (f) A8051 passive mode. The state transition diagram for the general-purpose I/O for (g) S8051, (h) A8051 active mode, and (i) A8051 passive mode.

TABLE II
MEASURED EPI AND MIPS OF THE S8051 AND THE PROPOSED A8051 AT NOMINAL OPERATING CONDITION
for the exploitation of async protocols for stalling operations in a microprocessor [33], [34]. Nevertheless, this work extends the energy efficiency of an async microcontroller beyond the minimum energy point in DVS.

IV. BENCHMARKING THE S8051 AND A8051 CORES

In this section, we will first compare the S8051 and A8051 at the nominal operating condition (nominal VDD of 1.2 V, full workload, and no PVT variations). The figures-of-merit (FOM) for the nominal operating condition include energy per instruction (EPI) and millions of instructions per second (MIPS). EPI delineates the energy dissipated for executing an instruction, and MIPS delineates the instruction execution rate. Second, we will compare the EPI and MIPS of the S8051 and A8051 over a wide operation space (VDD) and variation space (PVT). Third, we compare the energy per computation (EPC) of the S8051 and A8051 over a wide operation space (workload). EPC is employed to delineate energy dissipation at different workloads because it takes into consideration the energy expended due to the execution of redundant instructions, if any. All comparisons in the first and second parts are based on a comparable S8051 and A8051, where the A8051 general-purpose I/O operates in the active mode. The comparison in the third part includes the A8051 with the general-purpose I/O operating in the passive mode to demonstrate the potential advantages of the async protocol.

The S8051, the A8051, and their shared memory blocks are realized using 130-nm CMOS, and the chip microphotograph of one of the 40 prototypes is shown in Fig. 7. Collectively, they occupy 4.12 mm². The area of the A8051 is larger than that of the S8051, and this is largely due to the dual-rail-encoded QDI async implementation (and its being based on standard library cells); this area overhead can be mitigated with custom cells [6].

A. Performance at Nominal Operating Condition

Six benchmark programs are used to evaluate the EPI and MIPS of the S8051 and A8051: arithmetic, logical, data transfer, Boolean variable, branching, and Dhrystone v2.1. The first five benchmark programs each evaluate the performance of one particular instruction type, whereas the last evaluates the average performance. The average measurement results on 40 prototype ICs are tabulated in Table II and, for ease of comparison, the normalized results are shown in parentheses. Based on Dhrystone v2.1, the performance of the S8051 and A8051 is comparable, with the MIPS and EPI of the two cores within 10% of each other.

Fig. 8(a) and (b), respectively, depict the power spectrum (0 Hz–1 GHz) of the S8051 and A8051. The 50-MHz clock of the S8051 causes peaks at its harmonic frequencies, and the highest peak is at 400 MHz. In comparison, and not unexpectedly, the A8051 features a more evenly distributed power spectrum, and its highest peak, at 330 MHz, is lower than that of the S8051. A low and evenly distributed power spectrum is often desirable in many applications, including sensitive ubiquitous RF transceivers.

B. Performance at Wide Operation Space (VDD) and Variation Space (PVT)

Consider now the performance over a wide operation space where VDD for the S8051 and A8051 is varied from deep sub-threshold (at 100 mV) to nominal (at 1.2 V), at full workload; for shared blocks (e.g., memories), VDD is maintained at nominal and, for all blocks, no PVT variations are assumed. Fig. 9(a) and (b), respectively, depict the measured
Fig. 9. Measured (a) EPI and (b) MIPS performance when VDD changes from 100 mV to 1.2 V.
Fig. 10. Simulated (a) EPI and (b) MIPS performance under variation, normalized to the A8051.

EPI and MIPS of the S8051 and A8051. From the S8051 DVFS and A8051 DVS lines, their EPI and MIPS are comparable. However, note that the EPI for the S8051 is only comparable to that of the A8051 when it is operating at the minimum VDD for each clock frequency. The A8051 is operable from 100 mV to 1.2 V (full-range DVS), while the S8051 is operable over the same range with the clock frequency reduced to 57 kHz to operate at the minimum energy point, and to 3 kHz to operate at 100 mV (full-range DVFS); the S8051 is operable down to a lower VDD than otherwise due to the 30% delay margins already added for practicality during synthesis. For completeness, the EPI and MIPS trajectories for 57 kHz and 3 kHz operation of the S8051 are plotted in Fig. 9(a) and (b), respectively. Interestingly, both the S8051 and A8051, which are designed for a minimum VDD of 180 mV (see Section II), are operable at 100 mV; nonetheless, exhaustive benchmarks are required to establish reliable operation.

Consider now the condition where VDD is at the minimum energy point, at full workload, and over a wide variation (PVT) space; for shared blocks (e.g., memories), VDD is maintained at nominal and no PVT variations are assumed. The variation spaces are introduced by means of simulations because it is challenging to introduce precise variations into the prototypes. The S8051 is clocked at 57 kHz for comparable EPI to the A8051. Fig. 10(a) and (b), respectively, depict the simulated EPI and MIPS of the S8051 and A8051 under variation; for ease of comparison, the results are normalized to the A8051. The A8051 innately accommodates the variation space, whereas the S8051 requires a delay margin to achieve the same. The delay margin translates to lower speed and higher energy dissipation compared to the A8051 when conditions are nominal. For
Fig. 11. Simulated (a) EPI and (b) MIPS performance under variation, normalized to the A8051.
Fig. 12. Simulated (a) EPI and (b) MIPS performance over a variation range, normalized to the A8051.

completeness, the EPI and MIPS for the S8051 at a fixed clock frequency are also depicted. At this clock frequency, the S8051 fails to operate below a certain point in the variation range.

Fig. 11(a) and (b), respectively, depict the simulated EPI and MIPS of the S8051 and A8051 under variation; for ease of comparison, the results are normalized to the A8051. As before, the A8051 innately accommodates the variation space, whereas the S8051 requires a delay margin to achieve the same. The delay margin translates to lower speed and higher energy dissipation when conditions are nominal. For completeness, the EPI and MIPS for the S8051 at a fixed clock frequency are also depicted. At this clock frequency, the S8051 is operable only down to a certain point in the variation range.

Fig. 12(a) and (b), respectively, depict the simulated EPI and MIPS of the S8051 and A8051 over a variation range; for ease of comparison, the results are normalized to the A8051. The A8051 innately accommodates the variation space, whereas the S8051 requires a delay margin to achieve the same. The delay margin translates to lower speed and higher energy dissipation when conditions are nominal. For completeness, the S8051 at a fixed clock frequency is operable only down to a certain point in the variation range.

Overall, the S8051 requires a delay margin to operate at the minimum energy point (250 mV) and over the combined variation space. The delay margin accounts for worst-case conditions in the operation (VDD) and variation (PVT) space but, not unexpectedly, results in worst-case speeds and incurs excess energy due to leakage currents at nominal conditions.

C. Performance at Wide Operation Space (Workload)

Consider now the performance over a wide operation space where the required computation speed is varied from 10% to 100%; no PVT variations are assumed. Fig. 13 depicts the measured EPC of the S8051 and A8051. The required computation speed is normalized to the computation speed at 100% workload and expressed as a
[27] M. Alioto, “Understanding DC behavior of subthreshold CMOS logic through closed-form analysis,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 7, pp. 1597–1607, Jul. 2010.
[28] I. S. MacKenzie and R. C.-W. Phan, The 8051 Microcontroller. Upper Saddle River, NJ: Pearson Prentice Hall, 2007.
[29] K. M. Fant and S. A. Brandt, “NULL convention logic: A complete and consistent logic for asynchronous digital circuit synthesis,” in Proc. Int. Conf. Appl. Specific Syst., Archit. Processors, 1996, pp. 261–273.
[30] A. Bardsley, “Implementing Balsa handshake circuits,” Ph.D. dissertation, Dept. Comput. Sci., Univ. Manchester, Manchester, U.K., 2000.
[31] C.-F. Law, B.-H. Gwee, and J. S. Chang, “Asynchronous control network optimization using fast minimum-cycle-time analysis,” IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 27, no. 6, pp. 985–998, Jun. 2008.
[32] S. F. Nielsen, J. Sparsø, and J. Madsen, “Behavioral synthesis of asynchronous circuits using syntax directed translation as backend,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 2, pp. 248–261, Feb. 2009.
[33] C. I. Kelly, V. Ekanayake, and R. Manohar, “SNAP: A sensor-network asynchronous processor,” in Proc. Int. Symp. Asynch. Circuits Syst., 2003, pp. 24–33.
[34] V. Ekanayake, I. C. Kelly, and R. Manohar, “An ultra low-power processor for sensor networks,” in Int. Conf. Archit. Support Program. Languages Operat. Syst., 2004, pp. 27–36.

Kok-Leong Chang (S’07–M’11) received the B.Eng. (first-class honors) and Ph.D. degrees in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2004 and 2011, respectively.
He is a Scientist with the Institute of Materials Research and Engineering, A*STAR, Singapore. His research interests include robust asynchronous-logic

conferences, including the IEEE-National Institutes of Health (NIH) Life Sciences Systems and Applications Workshop, the IEEE-NIH CAS Medical and Environmental Workshop, and the International Symposium on Integrated Circuits and Systems. He has also been awarded numerous academic, defense and industrial grants, exceeding $11M, including from Defense Advanced Research Projects Agency (USA), E.U. grants, from multinational corporations, etc.

Bah-Hwee Gwee (S’93–M’97–SM’03) received the B.Eng. degree in electrical and electronic engineering from University of Aberdeen, Aberdeen, U.K., in 1990, and the M.Eng. and Ph.D. degrees from Nanyang Technological University, Singapore, in 1992 and 1998, respectively.
He was an Assistant Professor in the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU) from 1999 to 2005 and has been an Associate Professor since 2005. He has held the appointment of Assistant Chair (Students) of the School of Electrical and Electronic Engineering since 2010. He was the Principal Investigator (PI) of a number of research projects including the ASEAN-European Union University Network Programme, Ministry of Education Tier-1 and Tier-2, Defence Science and Technology Agency and Temasek Laboratories projects. He was also a co-PI of DARPA (USA), NTU-Panasonic, and NTU-Linköping research projects. His research grants total more than U.S. $5M. He has filed and been granted several U.S. and Singapore patents in circuit design. His research interests include sub-threshold, dynamic voltage scaling asynchronous circuit, GALS NoC and Class-D amplifier designs. He has been an Associate Editor for Journal of Circuits, Systems and Signal Processing (2007–2012).
Dr. Gwee was the Chairman of IEEE Singapore Circuits and Systems Chapter in 2005, 2006 and 2013. He has been a member of IEEE CAS Society
circuits, microprocessor architectures, and printed DSP, VLSI and Bio-CAS Technical Committees since 2004. He has served in
electronics. the Organizing Committees for IEEE BioCAS-2004, IEEE APCCAS-2006,
Technical Program Chair for ISIC-2007, co-Chair for ISIC-2011 and served
in the steering committee for IEEE APCCAS 2006–2008. He has been an
Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II:
Joseph S. Chang received the B.Eng. in electrical EXPRESS BRIEF 2010–2011, and an Associate Editor for IEEE TRANSACTIONS
and computer engineering from Monash University, ON CIRCUITS AND SYSTEMS—PART I: REGULAR PAPERS since 2012. He was
Melbourne, Australia, and the Ph.D. degree from the an IEEE Distinguished Lecturer for CAS Society in 2009/2010.
Department of Otolaryngology, University of Mel-
bourne, Melbourne, Australia.
He is currently with Nanyang Technological
University (NTU), Singapore, where he was previ- Kwen-Siong Chong (S’03–M’09) received the
ously the Associate Dean of Research and Graduate B.Eng., M.Phil., and Ph.D. degrees in electrical and
Studies at the College of Engineering. He is also an electronic engineering from Nanyang Technological
Adjunct at Texas A&M University, College Station, University, Singapore, in 2001, 2002, and 2007,
TX, USA. He is a multi-disciplinary engineer and respectively.
his research interests encompass emerging technologies and traditional crcuits He is presently a Senior Research Scientist with
and system-related fields, including printed electronics, microfluidics, life Temasek Laboratories, Nanyang Technological Uni-
sciences, audiology, psychophysics, acoustics, and biomedical and electronic versity, Singapore. He was a visiting researcher in
devices. He publishes prolifically and has been awarded 10 patents with several Nara Institute of Science and Technology, Japan, in
pending. He has founded two startups in the field of electroacoustics, and has 2010, and in the University of Michigan, USA, in
designed numerous related products, adopted for industry and commercially. 2012. He was the co-principal investigator/collabo-
Dr. Chang served as Editor of the Open Column, IEEE CIRCUITS AND rator of the Defense Advanced Research Projects Agency (USA) and Ministry
SYSTEMS MAGAZINE, Associate Editor of the IEEE TRANSACTIONS ON of Education Tier-2 (Singapore) research projects. His research interests include
CIRCUITS AND SYSTEMS—PART I and the IEEE TRANSACTIONS ON CIRCUITS asynchronous VLSI designs, low-voltage low-power VLSI circuits, audio signal
AND SYSTEMS—PART II, Guest Editor for the PROCEEDINGS OF THE IEEE, processing, and soft-error tolerant designs.
Guest Editor of the CIRCUITS AND SYSTEMS MAGAZINE (life sciences special Dr. Chong was the Secretary of IEEE Circuits and Systems (CAS) Society,
issue), and Chair of the Life Sciences Systems and Applications Technical Singapore Chapter, in 2011 and 2012. He has been the member of IEEE CAS
Committee of the IEEE CAS Society. He has chaired several international Society VLSI Technical Committee since 2009.