
Feature

Baseband Analog Front-End and Digital Back-End for Reconfigurable Multi-Standard Terminals

Andrea Baschirotto, Fabio Campi, Rinaldo Castello, Giovanni Cesura, Roberto Guerrieri, Luciano Lavagno, Andrea Lodi, Piero Malcovati, and Mario Toma

Abstract
Multimedia applications are driving wireless network operators to add high-speed data services such as Edge (E-GPRS), WCDMA (UMTS) and WLAN (IEEE 802.11a,b,g) to the existing network. This creates the need for multi-mode cellular handsets that support a wide range of communication standards, each with a different RF frequency, signal bandwidth, modulation scheme, etc. This in turn generates several design challenges for the analog and digital building blocks of the physical layer. In addition to the above-mentioned protocols, mobile devices often include Bluetooth, GPS, FM-radio and TV services that can work concurrently with data and voice communication. Multi-mode, multi-band, and multi-standard mobile terminals must satisfy all these different requirements. Sharing and/or switching transceiver building blocks in these handsets is mandatory in order to extend battery life and/or to reduce cost. Only adaptive circuits that are able to reconfigure themselves within the handover time can meet the design requirements of a single receiver or transmitter covering all the different standards while ensuring seamless interoperability. This paper presents analog and digital base-band circuits that are able to support GSM (with Edge), WCDMA (UMTS), WLAN and Bluetooth using reconfigurable building blocks. The blocks can trade off power consumption for performance on the fly, depending on the standard to be supported and the required QoS (Quality of Service) level.

8 IEEE CIRCUITS AND SYSTEMS MAGAZINE 1531-6364/06/$20.00©2006 IEEE FIRST QUARTER 2006
I. Introduction
The growing economic and social impact of mobile telecommunication devices, together with the evolution of protocols and interoperability requirements among different standards for voice and data, is currently driving worldwide research towards the implementation of fully-integrated multi-standard transceivers. The most advanced fully integrated solutions in the scientific literature and on the market do not cover the four most important telecommunication standards, namely GSM, WCDMA, Bluetooth, and wireless LANs (WLANs). In order to allow the user to switch seamlessly among different standards, achieving so-called “global roaming,” for both voice and data applications, all these standards have to be supported by an integrated transceiver. GSM and WCDMA (UMTS) are the dominant standards for voice and mixed voice/data mobile services, while WLANs based on the IEEE 802.11a/b/g protocol are the most important standards for high data-rate wireless internet access. Finally, Bluetooth enables the terminal to be wirelessly connected with other devices at low data rates over a short distance. Implementation of an integrated multi-standard transceiver that is competitive with solutions based on separate devices for the different standards must take various points into account. First of all, both silicon area and static power consumption must be minimized, thus requiring the maximum possible hardware sharing among the transceivers for the different standards. In order to reach this goal, one must define which standards can be used at the same time. We assumed that only two standards among the supported ones can operate concurrently at a given time (e.g., WLAN with Bluetooth or voice with Bluetooth or voice with WLAN) and that no handover is supported for Bluetooth. Based on these considerations, we defined the receiver and transmitter architectures shown in Figure 1 and Figure 2, respectively. These architectures reflect the following basic ideas:
■ two parallel receiver (RX) chains based on direct conversion architecture are implemented, one supporting all cellular standards and Bluetooth, and the other supporting all WLAN standards and Bluetooth;
■ two parallel transmitter (TX) chains are implemented, one based on direct modulation for GSM, Bluetooth and possibly WCDMA (UMTS), and the other, based on direct conversion architecture, for all WLAN standards and Bluetooth;
■ the RX and TX chains covering the cellular standards can reconfigure themselves in a short time (less than 200 µs), thus allowing vertical handover between GSM and WCDMA, which do not need to operate concurrently;
■ vertical handover between cellular and WLAN standards, which can operate concurrently, is based on the use of two different transceivers.
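
The chain-sharing rules listed above can be illustrated with a small sketch. It is only a toy model, assuming that each chain serves one standard at a time; the names `CHAINS` and `assign_chains` are hypothetical and do not come from the paper.

```python
from itertools import permutations

# Hypothetical model: which standards each receiver chain can serve,
# following the two-chain split described in the list above.
CHAINS = {
    "cellular": {"GSM", "WCDMA", "Bluetooth"},
    "wlan": {"WLAN", "Bluetooth"},
}


def assign_chains(active_standards):
    """Map each active standard to its own chain, or return None if impossible."""
    active = list(active_standards)
    if len(active) > len(CHAINS):
        return None  # more concurrent standards than available chains
    # Try every ordered choice of distinct chains for the active standards.
    for chains in permutations(CHAINS, len(active)):
        if all(std in CHAINS[chain] for std, chain in zip(active, chains)):
            return dict(zip(active, chains))
    return None


print(assign_chains(["GSM", "WLAN"]))        # voice with WLAN
print(assign_chains(["WLAN", "Bluetooth"]))  # WLAN with Bluetooth
print(assign_chains(["GSM", "WCDMA"]))       # both need the cellular chain
```

Under these assumptions GSM and WCDMA cannot be assigned together, which matches the statement that they share one reconfigurable chain and do not operate concurrently.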

Figure 1. Receiver channels (Phone/Bluetooth RX and WLAN/Bluetooth RX; each chain comprises LNA, BP filter, IQ demodulator, VGA, LP filter, VGA and A/D, followed by the digital processor).

Andrea Baschirotto is with Department of Innovation Engineering, University of Lecce, Italy, andrea.baschirotto@unile.it.
Rinaldo Castello is with Department of Electronics, University of Pavia, Italy, rinaldo.castello@unipv.it
Fabio Campi, Giovanni Cesura, and Mario Toma are with STMicroelectronics, Italy, fabio.campi@st.com, giovanni.cesura@st.com,
mario.toma@st.com
Roberto Guerrieri and Andrea Lodi are with Advanced Research Center on Electronic Systems, University of Bologna, Italy,
rguerrieri@deis.unibo.it, andrea.lodi@deis.unibo.it
Luciano Lavagno is with Department of Electronics, Politecnico di Torino, Italy, luciano.lavagno@polito.it
Piero Malcovati is with Department of Electrical Engineering, University of Pavia, Italy, piero.malcovati@unipv.it



The project “Enabling technologies for reconfigurable wireless terminals,” funded by the Italian National Project FIRB, is a first step toward the above mentioned multi-standard integrated transceiver. In particular, five different chips, which represent a preliminary step towards the final device, are presently under testing: (1) receiver and (2) transmitter for DCS1800, UMTS, and Bluetooth, (3) receiver and (4) transmitter for WLAN at 2.4 GHz and 5 GHz, and Bluetooth, as well as (5) the digital processor for all standards. This paper presents the baseband section (both analog and digital) for all five chips, discussing the most important design aspects and the achieved experimental results. RF circuits are presented in a companion paper.

II. Analog Baseband Section
The challenges in designing the analog baseband section of a reconfigurable transceiver are mainly related to the very different specifications of the different standards. In particular, bandwidth, gain, noise, resolution and linearity requirements are quite different from one standard to another. One “brute force” approach to design could be to select the most stringent requirement for each parameter, thus deriving a set of specifications valid for all standards. This approach, however, is definitely not efficient, especially in terms of power consumption. A more reasonable approach, which has been adopted for the design of the circuits reported in this paper, is to adapt the circuit performance, and hence the power consumption, to the standard considered. This adaptation of an analog device is performed through a digital control which either adjusts the biasing conditions of the active building blocks (e.g., operational amplifiers), or turns entire stages on or off (e.g., in an analog-digital converter or in a programmable-gain amplifier), or reconfigures the interconnections among the blocks of the circuit. The details of architectural choices and circuit design for the receiver and transmitter chains are reported in the next Sections.

A. Receiver Analog Baseband Channel
The input spectrum of the receiver baseband block typically includes adjacent channels and in-band and out-of-band blockers that can dominate (by up to 40–60 dB) the signal to be processed. For this reason analog baseband blocks are required to exhibit not only a target in-band dynamic range, but also excellent linearity for out-of-band signals. This is because non-linear behavior with out-of-band signals would result in intermodulation products that fall in the signal band, corrupting the signal quality. The analog baseband block of a receiver is composed of a series of Voltage Gain Amplifiers (VGA) and Low-Pass Filters (LPF). The VGAs increase the signal amplitude, while the LPF reduces the amount of the out-of-band signal in order to increase the signal dynamics available for the useful signal. This functionality is shown in Figure 3.

Figure 2. Transmitter channels (GSM/(WCDMA)/Bluetooth TX and WLAN/(WCDMA)/Bluetooth TX).

Figure 3. Receiver baseband analog signal processing (cascade of VGA and LPF stages; available swing versus blocker level).



The design of the receiver baseband channel implies a trade-off between LPF selectivity (higher filter selectivity would result in a lower number of stages) and circuit complexity. In the design considered in this paper, we used a structure with two VGAs and one LPF, as shown in Figure 4.

Figure 4. Block diagram of the developed analog baseband signal processing channel (VGA1: −10/29 dB; Filter: 4 dB; VGA2: 0/35 dB; ADC).

For the channel devoted to cellular application, the signal is amplified by 59 dB. This is because the input signal can be very low. The VGA1 (with a gain programmability in the 0 dB–29 dB range) then requires a reduced linear range, while it must have a very low Input Referred Noise (IRN ≈ 5 nV/√Hz). This constraint was satisfied by using an open-loop approach implemented by a resistively-degenerated and resistively-loaded differential stage, whose dc-gain is fixed by a resistive ratio. The open-loop architecture also reduces the power consumption. For small input signals, a large gain is required. This is achieved by minimizing the degeneration resistance, which also reduces the IRN, as required by the low input signal amplitude. On the other hand, for large input signals, a reduced gain is required. This is achieved by maximizing the degeneration resistance, which also increases the linear range, as required by the large input signal amplitude. Several solutions for the filter are proposed in the literature. Active-RC structures exhibit excellent linearity at the cost of high power consumption [1]. On the other hand, gm-C filters feature a reduced linear range but low power consumption [2]. In this design we developed a novel structure that merges the two solutions above and is called “active-gm-RC.” Figure 5 shows the 2nd order low-pass active-gm-RC cell structure in its single-ended form.

Figure 5. The active-gm-RC biquadratic cell.

The operational amplifier (op-amp) has a single-pole transfer function (in the frequency range of interest) that is taken into account in the transfer function synthesis. An Adjusting Circuit controls the op-amp frequency response in order to track the time constant of the passive components (R and C). This transforms the dependence of the filter frequency response on the transistor parameters into a dependence only on the passive component values (R’s and C’s). The active-gm-RC cell exhibits the following features, which make it preferable for the implementation of the baseband filter of portable multi-standard terminals:
■ Low power consumption (a key objective for portable terminals): one op-amp is used to synthesize a 2nd order transfer function, halving the power consumption compared with standard two-op-amp active-RC biquadratic cells. In addition, the op-amp frequency response is used to synthesize the filter frequency response. Thus the op-amp unity-gain-bandwidth is comparable with the filter pole. This reduces its power consumption with respect to other closed-loop structures (active-RC or MOSFET-C), in which an op-amp unity-gain bandwidth f_u > (50–100) · f_LP is used, requiring a large power consumption;
■ High linearity: a very large linear range is achieved due to its closed-loop structure. Moreover, out-of-band signals are first filtered by the very linear R1-C1 low pass filter at the input. This gives a very high out-of-band IP3 (3rd order intercept point), which is particularly interesting in telecom systems where the higher amplitude of out-of-band blockers requires a large out-of-band linearity;
■ Frequency response accuracy: the Adjusting Circuit makes the op-amp frequency response depend on the passive component values (R and C) spread, which is the only spread to be compensated.
The 4th-order UMTS/WLAN reconfigurable filter is realized by the cascade of two active-gm-RC biquadratic cells. The filter can be reconfigured to adjust the bandwidth to the selected standard (2.11 MHz and 11 MHz for UMTS and WLAN, respectively) by a single bit that controls the values of the resistors (this keeps the overall noise constant). In addition, in the UMTS case the power consumption is reduced by controlling the input stage device sizes and their current level. For both standards, the capacitors are grounded in order to be seen by the
common-mode signal as well. Otherwise, a high frequency resonance for the common-mode signals would be present. This implies that the capacitance dominates the overall filter area. However, sharing the capacitors between the two standard configurations minimizes the area occupation. The capacitor values are finally adjusted by the tuning circuit to compensate for technology variations. A key feature of this structure is the limited power consumption due to the use of low-f_u op-amps. In fact, the f_u/f_LP ratio is less than two.
The full filter design has been optimized in order to minimize the power consumption using a specifically developed automatic design toolbox, which for a given set of constraints (noise, linearity, transfer function) and device models, directly defines all the device sizes in order to minimize power. Finally, the amplitude of the input signal of the second VGA is very large (after the previous amplification), so that for this stage linearity is more important than noise performance. For this reason we used a closed loop architecture, with two 17.5 dB gain-stages and a 2.5 dB gain resolution. This block also implements an additional 1st order LPF. Finally, since the offset may be significant at this stage, due to the large amplification of the previous stages, it includes an offset compensation circuitry. Table 1 reports the overall simulated baseband channel features.

Table 1. Summary of the simulated receiver analog baseband channel features.

Parameter                        UMTS              WLAN
Filter order                     5th               5th
Power supply                     2.5 V             2.5 V
Power consumption (total)        51.7 mW           55 mW
  VGA1                           19 mW             -
  LPF                            22.7 mW           45 mW
  VGA2                           10 mW             10 mW
Gain range                       −6 dB ÷ 68 dB     4 dB ÷ 39 dB
In-band IIP3 @ Max gain          5 dBm             1 dBm
Out-of-band IIP3 @ Max gain      30 dBm            26 dBm
IRN @ Min gain                   9.6 µVRMS         51 µVRMS

B. Analog-to-Digital Converter
The architecture of the ADC stems from the observation that while Delta-Sigma is the topology of choice for low speed high resolution conversion, pipelined topologies are well suited to medium-high speed and medium-low resolution. Thus the combination of the two can cover a wide portion of the speed-resolution space [3–6]. Furthermore, both topologies include the same building blocks, such as op-amps, comparators, switches and capacitors. The difference between them is the network that interconnects the blocks. Thus, a converter made of those building blocks and a reconfigurable interconnection network can implement different topologies and work at the different bandwidth and resolution levels needed for the various standards. In addition to supporting reconfiguration among different standards, the ADC can adapt its architecture, performance and power consumption to its environment. A dynamic configuration manager, considering the required Quality of Service (QoS), can exploit the reconfigurability of the ADC in order to save power and extend the battery life-time [7, 8].
Another key aspect of the ADC architecture proposed in this work is the extensive use on the digital side of background self-calibration algorithms [9–21]. Those algorithms are used in the background to overcome the effects associated with the limited precision of analog building blocks and with component mismatches that impair performance. This also turns out to be a very efficient way to reduce the power consumption of the ADC. For instance, the capacitances can be sized to meet thermal noise requirements rather than matching (which is compensated digitally), which often results in a smaller capacitor size and therefore less op-amp current for the same gain-bandwidth product. The complete ADC architecture, including the digital blocks used for background self-calibration, is highlighted in Figure 6. The ADC core is made of six equally sized stages, each one resolving 1.5 bits, followed by a 2-bit quantizer. In order to further reduce the power dissipation, every pair of stages shares one op-amp; this block therefore requires only three op-amps. The maximum achievable resolution is 8 bits, and a 6 bit resolution can be digitally selected. In this case the first two stages are switched off and the input signal is fed directly to the third stage. To increase the overall ADC resolution a 2/3 bit reconfigurable stage has been added before the core ADC block, extending the overall resolution to 10 or 11 bits. In ΔΣ mode, the hardware is reused to implement a second order 2-bit modulator which achieves 74 dB DR over a 200 kHz signal bandwidth.
Component matching and a finite op-amp gain-bandwidth product affect the linearity of the ADC, thus reducing the IP3 of the system. In order to overcome this potential problem, two digital algorithms have been implemented in the chip. The first one, called DAC Noise Cancellation (DNC) [21], estimates and corrects the non-linearity error introduced by the multi-bit (2/3 bit) unit element DAC mismatch of the first conversion stage. The second one, called Gain
Error Correction (GEC) [15, 19], estimates and corrects gain errors in the first, second and third stages. In both techniques the analog error is modulated by a pseudo-random noise sequence and then the digital output is processed in order to extract the modulated information and digitally enhance the ADC performance. It is worth noting that those techniques require few modifications in the analog part of the chip, while more complexity is added in the digital part. As CMOS technology scales, more digital signal processing (DSP) becomes available for the same area at a reduced power consumption; hence those techniques are very attractive when the reduction of the overall power consumption is of major importance, as in the case of portable terminals. Combining those digital techniques with the selective power-down of unused blocks can minimize the power consumption. For example, the single ADC in WLAN-mode can convert the input signal (10 MHz bandwidth) with a 9 bit resolution consuming 6.8 mA; if the resolution is switched to 6 bits the current used is only 3.2 mA. In UMTS-mode, the required resolution is 11 bits and the input signal (2 MHz bandwidth) is converted using 8 mA. When a GSM-EDGE signal has to be converted, the architecture is reconfigured in ΔΣ-mode; in this case only the first two stages are used and the current consumption is 4 mA.
A key aspect for a multi-mode terminal is the ability to switch from mode to mode in a seamless way. This in turn sets a requirement on the time allotted for so-called vertical handover (in some cases this time is specified by the standards). The same is true for a change in the QoS, which should be transparent to the user. This ADC can change mode in less than 100 µs, while a switch from 10 bit to 6 bit resolution takes only 600 ns. Figure 7 shows the measured power spectral density of a full scale 9.6 MHz input sine wave before and after the background correction in WLAN mode. The digital algorithms improve the SNR by more than 10 dB and the SFDR by more than 19 dB. The digital algorithms dramatically reduce the second and the third harmonic to a level of −80 dBc and −77 dBc. Figure 8 shows the measured relationship between the analog power consumption and the SNDR of the converter. This reconfigurable converter efficiently exploits the trade-off between resolution and power consumption highlighted in this plot.

C. Transmitter Analog Baseband Channel
The transmitter analog baseband channel has to transform the digital data stream produced by the digital processor (described in Section III) into an analog signal, which is then delivered to the RF section. The analog baseband channel developed for the reconfigurable terminal described in this paper presents the following challenging key features:
■ it operates at low voltage (i.e., at 1.2 V);
■ it is reconfigurable in terms of bandwidth, resolution and data-rate in order to satisfy the different standards;

Figure 6. Block diagram of the reconfigurable ADC (analog part: S/H, 5/9-level flash front stage, 1.5-b pipeline stages with op-amp sharing, 2-b ADC; digital part: GEC and DNC logic, sinc and FIR filters for the GSM and WLAN/UMTS outputs).
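
The resolution figures quoted for the pipeline core in Section II-B can be cross-checked with a short sketch. It assumes the usual convention that each 1.5-bit stage contributes one net bit after redundancy removal; the function and constant names are hypothetical, and the supply currents are simply the values quoted in the text, not a model of the circuit.

```python
CORE_STAGES = 6    # six equally sized 1.5-bit stages (from the text)
STAGE_BITS = 1     # assumed net bits per 1.5-bit stage after redundancy removal
BACKEND_BITS = 2   # final 2-bit quantizer (from the text)


def nominal_resolution(active_stages=CORE_STAGES, front_bits=0):
    """Nominal pipeline resolution in bits for a given configuration."""
    if not 0 <= active_stages <= CORE_STAGES:
        raise ValueError("active_stages must be between 0 and 6")
    return front_bits + active_stages * STAGE_BITS + BACKEND_BITS


# Supply currents quoted in the text, keyed by (mode, resolution or topology).
SUPPLY_CURRENT_MA = {
    ("WLAN", 9): 6.8,
    ("WLAN", 6): 3.2,
    ("UMTS", 11): 8.0,
    ("GSM-EDGE", "delta-sigma"): 4.0,
}

print(nominal_resolution())                 # full core
print(nominal_resolution(active_stages=4))  # first two stages switched off
print(nominal_resolution(front_bits=2))     # 2-bit front stage added
print(nominal_resolution(front_bits=3))     # 3-bit front stage added
```

Under this assumption the sketch reproduces the 8, 6, 10 and 11 bit figures given for the core and the extended converter; it does not attempt to model the 9 bit WLAN operating point.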



Figure 7. Measured power spectral densities at the ADC output before and after background calibration (power spectral density [dB] versus frequency [MHz], fundamental at −1 dBFS). Before digital calibration: SNR 45.8 dBFS, SFDR 52.9 dBc, THD −51.3 dBc. After digital calibration: SNR 56.7 dBFS, SFDR 71.9 dBc, THD −69.3 dBc.

Figure 8. Measured variation of the analog ADC power consumption as a function of the SNDR (power [mW] versus SNDR [dB]; VDD = 2.5 V, BW = 10 MHz, pipeline mode).

■ its power consumption changes depending on the selected standard, in order to maximize the efficiency.
The overall DAC + filter architecture, implemented with a fully-differential topology, is shown in Figure 9. An 8 bit current-steering DAC drives a resistive load (R_L = 600 Ω) and the resulting output voltage is directly applied to a 4th-order low pass analog filter. A number of design choices, described in the rest of this section, were made in order to minimize the power consumption, while achieving a reconfigurable device.
Regarding the DAC structure, a current-steering approach has been preferred to an R-2R ladder DAC, since it avoids the use of input and output reference voltage buffers, which would increase power consumption. Regarding the coupling between DAC and filter, the use of DAC load resistors R_L (instead of forcing the DAC current directly into the virtual ground of the first filter op-amp) allows us to decouple the DAC output current from the filter op-amp output current, which can be designed to be much smaller than the former. As a consequence the desired output dynamic is achieved by using large resistances in the filter, and by making the filter input impedance much higher than R_L. This power consumption reduction is obtained at the cost of an increased thermal noise which, however, is still negligible with respect to the quantization noise. The DAC structure has been designed to achieve the worst-case linearity and dynamic range target specifications even in the presence of the worst-case technology mismatches and parameter variations [22].
The same analysis suggested that, due to the 8 bit resolution, the area penalty of a fully thermometric implementation is negligible with respect to the linearity improvement. The unit current source area is designed to satisfy the matching requirements. A maximum relative standard deviation σ_rel of 2% results in an Integral Non-Linearity (INL) yield of 99%, with 0.5 LSB as the upper limit [23]. Thus the minimum area (W × L) of each current source is obtained from the Pelgrom model of the mismatch [24]. The choice of the unit current source overdrive (V_ov) is a trade-off between low sensitivity to threshold voltage mismatches (which would require a large V_ov) and the headroom available from the 1.2 V supply (which limits the maximum value of V_ov). As a consequence, we chose the value V_ov ≈ 70 mV, which requires an area of 36 µm² for the unit current source. Finally, the unit current level (I_UNIT) is designed to minimize the glitches introduced by the
charge injection of the switches MS (Figure 9). The glitch amplitude is reduced by setting minimum device sizes for the differential switches and by driving them with minimum swing signals (V_low = 300 mV, V_high = 800 mV). The resulting glitch is about 1 µA. To make this contribution negligible an I_UNIT of 5 µA is chosen, which implies W = 6 µm and L = 6 µm for the unit current cell. The coupling between different unit sources is reduced by using a driver circuit for each of them to provide the desired voltage levels for the switches. The current I_UNIT is generated with the bias circuit shown in Figure 9, which makes I_UNIT = V_REF/R_B. Resistance R_B is matched with the load resistance (R_L) in order to make the R_L/R_B ratio constant. This reduces the dependence of the DAC output voltage amplitude on the technology spread.
The output common mode voltage is fixed to analog ground (VDD/2) through a common-mode feedback circuit (CMFB). The maximum swing on each of the output nodes (controlled by V_REF) is fixed to 350 mVpp around the analog ground. This is a trade-off between providing a significant input signal for the following filter and introducing a negligible signal distortion due to the current source output impedance, which is, anyway, large thanks to the cascoding action of the current switches. This choice sets the value of R_L, considering that the full-scale peak-to-peak differential current is equal to 1.275 mA. Again in designing the 4th-order Bessel low-pass reconfigurable baseband filter, particular care was taken to reduce power consumption. The filter is the cascade of two identical multi-path active-RC biquadratic cells. Active-RC allows us to achieve the required linear range. The use of a single op-amp to synthesize two poles reduces power consumption. In this structure the op-amp bandwidth has to be about 50–100 times broader than the position of the filter poles.
The op-amps that we used are based on a fully-differential Miller-compensated two-stage topology, which allows for rail-to-rail output swing. In fact, the full-scale peak-to-peak differential output voltage of the block is 1.8 V, which means a filter DC gain of 8.2 dB. The op-amp bandwidth is reduced for lower pole frequencies (UMTS) by reducing the bias current. As a result, the power consumption is also reduced. The DAC + filter block can be reconfigured for two standards (WLAN and UMTS) as follows. The DAC sampling frequency (F_S) can be changed in order to achieve the required resolution, exploiting the resulting oversampling ratio (OSR) to increase the signal-to-noise ratio (SNR) to above the 8 bit level. In the case of WLAN, the OSR is 4 (using F_S = 80 MHz), which leads to a resolution of 9 bits (this implies a design margin of 1 bit with respect to the required SNR). Similarly, in the UMTS case, the OSR is 8 (using F_S = 40 MHz) with a 1.5 bit additional resolution. On the other hand, the filter transfer function can be programmed for two bandwidth values (11 MHz and 2.11 MHz) through a selection bit (BS), which digitally controls the value of the resistors, and the in-band noise floor. The smaller band for the UMTS standard (obtained with larger resistance values) can accept the resulting larger noise floor. The choice of programming the cut-off frequency with the resistance values allowed us to reduce the power consumption for low bandwidth. The selection switches were carefully designed and laid out in order to minimize their parasitic effects. The switches are connected to virtual ground nodes (op-amp inputs, with a limited voltage swing on the parasitic capacitances), or to low impedance nodes (op-amp outputs). In addition the capacitor values can be adjusted by ±35% with a 4 bit digital word, in order to control the technology spread or to allow a fine selection of filter bandwidth. Finally, the filter power consumption is optimized to the standard selected by the signal BS, which selects a bias current level for the op-amp, in order to save power when a smaller bandwidth is programmed.
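
The resolution gained from oversampling can be checked with the standard white-quantization-noise relation, in which each doubling of the OSR adds half a bit. The sketch below is a minimal illustration under that assumption; the function names are hypothetical, and the 10 MHz signal bandwidth used to recover OSR = 4 is the WLAN figure quoted in the ADC section.

```python
import math


def oversampling_ratio(fs_hz, signal_bw_hz):
    """OSR defined as the sampling rate over twice the signal bandwidth."""
    return fs_hz / (2 * signal_bw_hz)


def extra_bits(osr):
    """Resolution gain in bits, assuming white quantization noise."""
    return 0.5 * math.log2(osr)


DAC_BITS = 8  # nominal DAC resolution from the text

wlan_osr = oversampling_ratio(80e6, 10e6)  # assumed 10 MHz WLAN signal bandwidth
print(wlan_osr, DAC_BITS + extra_bits(wlan_osr))  # WLAN case
print(extra_bits(8))                              # UMTS case, OSR of 8
```

With OSR = 4 the gain is one bit (9 bits in total), and with OSR = 8 it is 1.5 bits, in line with the values stated above.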

Figure 9. DAC + filter structure (thermometric current-steering DAC with bias circuit and CMFB, followed by the 4th-order programmable low-pass filter).



Figure 10. Measured output power spectra of the transmitter baseband channel for a WLAN 802.11a and a UMTS input signal (power spectral density [dBr] versus frequency [MHz]; the WLAN panel shows the signal spectrum against the transmit spectrum mask).

The DAC + filter block has been fabricated in a standard 0.13 µm CMOS technology with six metal layers and MIM capacitors. Figure 10 shows the measured output power spectra of the circuit for a WLAN 802.11a and a UMTS input signal. Figure 11 shows the micro-photograph of the chip, whose active area is 0.8 mm². In both configurations the transmitter baseband channel respects the transmission masks for the standards. The measured features of the complete circuit are summarized in Table 2.
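
As a quick cross-check of the DAC figures quoted in this section, the sketch below recomputes the full-scale current, the unit-cell area and the filter DC gain. It assumes a thermometric array of 2^8 − 1 unit cells and that the 350 mVpp swing on each output node corresponds to 0.7 Vpp differentially; both are assumptions made for illustration, and the variable names are hypothetical.

```python
import math

N_BITS = 8                # DAC resolution (from the text)
UNIT_CURRENT_A = 5e-6     # I_UNIT (from the text)
UNIT_W_UM = 6.0           # unit cell width in µm (from the text)
UNIT_L_UM = 6.0           # unit cell length in µm (from the text)
NODE_SWING_VPP = 0.350    # swing on each DAC output node (from the text)
OUTPUT_SWING_VPP = 1.8    # differential swing at the filter output (from the text)

unit_cells = 2 ** N_BITS - 1                       # assumed thermometric array size
full_scale_current_a = unit_cells * UNIT_CURRENT_A
unit_area_um2 = UNIT_W_UM * UNIT_L_UM
dac_diff_swing_vpp = 2 * NODE_SWING_VPP            # assumed differential DAC swing
filter_gain_db = 20 * math.log10(OUTPUT_SWING_VPP / dac_diff_swing_vpp)

print(full_scale_current_a * 1e3, "mA")
print(unit_area_um2, "um^2")
print(round(filter_gain_db, 1), "dB")
```

Under these assumptions the numbers agree with the 1.275 mA full-scale current, the 36 µm² unit-cell area and the 8.2 dB filter DC gain stated in the text.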

Figure 11. DAC + filter chip micro-photograph.

III. Digital Baseband Section
Digital signal processing systems aimed at the wireless consumer market must handle a variety of high performance real-time tasks, for which traditional general-purpose processors are often a poor match. Flexibility is required to reduce mask and design costs, high computational power must match the growing complexity of applications, and low power consumption is needed to ensure portability, under severe battery capacity constraints.
A common way to tackle such constraints is by mapping critical computational kernels on custom-designed hardware units inside the processor pipeline [25, 26], providing application specific extensions to the standard instruction set. However, in most cases, extensions are defined at mask level, severely limiting the flexibility and application field of the device. Such application-specific standard processors (ASSPs) also involve high up-front non-recurring design costs, justified only for long lifespan and high volume products. Low volume or frequently
updated products require novel method- Table 2.
ologies to match flexibility with high Summary of the measured transmitter analog baseband channel features.
computational power and low cost. Recon- Parameter Value
figurable processors are an appealing
Technology CMOS 0.13 µm
option [27–33], combining standard pro-
Supply voltage 1.2 V
cessor cores with embedded programma-
Core area 0.8 mm2
ble hardware. Reconfigurable processors
Standard WLAN UMTS
offer high programming flexibility by Fs 100 MHz 50 MHz
means of run time extension of the instruc- Filter bandwidth 11 MHz 2.11 MHz
tion set. Non-critical computations or Differential output swing 1.8 Vpp 1.8 Vpp
control-dominated tasks can be efficiently DR (@ FS) 54 dB 58 dB
mapped on the hardwired portion of the SFDR (@ FS) 54 dB (@ 3 MHz) 61 dB (@ 0.6 MHz)
processor, taking advantage of its soft- THD (@ FS) −51 dB (@ 3 MHz) − 60 dB (@ 0.6 MHz)
ware programmability and shortening the OIP3 29.3 dBm 32.5 dBm
overall development time. According to DAC power consumption 5.4 mW 5.4 mW
Amdahl’s law and the 90–10 rule (90% of Filter power consumption 5.6 mW 3 mW
time is spent executing 10% of the code, Total power consumption 11 mW 8.4 mW
[34]), performance can be enhanced by
up to one order of magnitude by simply focusing imple- design team, always working under severe time pressure,
mentation efforts on identification and improvement of rel- to speed up a larger number of critical kernels. Conse-
atively small critical kernels. Several techniques can be quently, the PiCoGA was designed to be programmed using
used to exploit spatial (i.e., concurrency between a C-like description language, resulting in tighter integra-
resources) and temporal (i.e., pipelining) parallelism at the tion of the hardware and software design flows.
reconfigurable instruction level. The widespread knowl-
edge of the ANSI-C programming language among embed- A. XiRisc Architecture Description
ded systems and wireless algorithm developers suggests XiRisc can be described as a Very Long Instruction Word
using it as the application description language for recon- (VLIW) RISC processor, with a 32-bit datapath (see
figurable processors as well. This introduces the problem Figure 12). The basic XiRisc instruction set includes a
of translating behavioural C into some form of HDL set of DSP-specific instructions such as multiply-and-
description, or directly into hardware (in the case of recon- accumulate, branch-and-decrement, SIMD (Single Instruction
figurable devices, configuration bits). Most of the existing Multiple Data) and saturating arithmetic operations. The
reconfigurable architectures use automated or semi-auto- XiRisc architecture [35] is strictly separated into control
mated C-to-HDL conversion tools to plug into standard syn- logic and data path. The micro-architecture was designed
thesis and Place & Route techniques for configuring the to provide a simple straightforward control model and to
hardware accelerator. Unfortunately, the introduction of offer the programmer full control over the processor
these abstraction layers hides many implementation resources. All Functional Units (FUs) are independent,
choices from the designer, making it difficult to obtain high- concurrent, and fully pipelined.
quality results without a deep understanding of the tools The control architecture is based on the classic RISC
and the underlying architecture. five stage pipeline, with a strict load/store architecture
In this section we present a reconfigurable system that may result in a bottleneck for memory intensive
based on a VLIW processor architecture including a applications. In order to maintain a high data throughput
runtime reconfigurable embedded function unit called to and from the FUs, the processor is structured as a
PiCoGA. The integration of the PiCoGA inside the proces- Very Long Instruction Word machine, fetching and
sor core reduces communication overhead towards other decoding two 32-bit instructions each clock cycle. The
functional units, thus making it easier to use for a variety of instruction pairs are then executed concurrently on the
computation kernels. In what follows, we will use the name set of available FUs, determining two symmetric execu-
“XiRisc” to refer to the processor architecture with the tion flows. Simple, commonly used FUs such as ALU and
tightly integrated reconfigurable unit, as opposed to a stan- Shifter are duplicated in both data channels, while other
dard processor core accelerated by an external device. In FUs such as the multiplier and the branch unit are shared
this context, trading off some potential performance between the two channels. To simplify hazard handling,
speed-up for a higher level of programmability was a key software compilation schedules instruction pairs which
design decision, since it permits a dramatic decrease in avoid simultaneous access to the same shared FU, so
application development time, and hence it may allow a that pairs of instructions never need to be separately

FIRST QUARTER 2006 IEEE CIRCUITS AND SYSTEMS MAGAZINE 17


stalled during computation. All other pipeline hazards are resolved at run-time by a fully bypassed architecture and a hardware stall mechanism. The PiCoGA is handled by the control logic and the compilation tool chain as a shared functional unit. Operands are read from, and results written into, the register file, and specific assembly instructions are used to control both array processing and configuration. The XiRisc instruction set is extended at runtime by mapping new functionalities on the gate-array through the issue of pGA-load and pGA-free instructions. As a result, after a configuration latency of a few hundred cycles, extended instructions called pGA-op’s are available to execute custom computations on the PiCoGA pipelines. After an execution latency that may range from 1 to 24 cycles, pGA-op results are written back into the processor register file.

[Figure 12 block diagram. Blocks shown: instruction memory; main and auxiliary instruction decode logic; register file; data channel 1 and data channel 2, each with an ALU and a shifter; shared functional units F.U. #1 (multiply/MAC), F.U. #2 (data memory handle) and F.U. #3 (...); FPGA control unit and FPGA gate-array with gate-array control and gate-array write-back channel.]

Figure 12. XiRisc architecture.

From an architectural point of view, the main differences between the PiCoGA and the other FUs are:
1) The PiCoGA supports up to 4 source and 2 destination registers for each instruction. In order to avoid bottlenecks on the write-back channels, a special-purpose register file has been designed, featuring four read and four write ports, of which two are reserved for the PiCoGA.
2) The PiCoGA instructions may have unpredictable latency if they directly include data-dependent loops. A special register locking mechanism was designed to maintain program flow consistency in case of data dependency between PiCoGA instructions and other instructions.

The PiCoGA is seen as a customizable unit that dynamically adapts the instruction set to the application workload. Application-specific custom instructions are synthesized using a dataflow-oriented paradigm. They are modeled as ANSI-C procedures which are automatically translated by the compiler into data-flow graphs (DFGs). The DFG extraction step from the ANSI-C function description is performed using a customized version of the Impact compiler [36].

B. System Architecture

The high-bandwidth memory transfer requirements that are typical of DSP algorithms required a careful implementation of a layered memory hierarchy (see Figure 13), supported by an on-chip AMBA bus architecture. A single AHB channel is used to load instructions and data from the system memory. Possible conflicts on the bus interface are resolved by a dedicated arbitration logic within the AHB master. Processor transfers are converted into AHB cycles, supporting wrap burst transfers for cache
line refills and locked accesses. The AMBA system is connected to a set of IO peripherals, on-chip SRAM and an interface for off-chip memories (EMI) featuring up to 1.328 Gbit/s bandwidth. In order to support a parallel load of data and instructions according to its Harvard internal structure, the processor includes direct-mapped instruction and data caches.

[Figure 13 block diagram. Blocks shown: XiRisc core with gate-array, control unit and ROM; instruction cache, data cache and configuration memory; arbiter; AHB master interface and AHB slave interface; AHB bus with AHB arbiter, test interface controller, external memory interface (EMI), EMI control registers and 128 KB on-chip SRAM; bridge to the APB bus with parallel port interface and LCD interface.]

Figure 13. System architecture.

Cache memory sizes are programmable at time of synthesis. The memory management unit supports run-time cacheable space configuration, allowing a flexible distribution of program, data and gate array configuration among different memory spaces. Furthermore, the data cache may alternate at runtime between write-back and write-through policies, depending on the running application.

Memory resources have also been adapted to handle configuration bits of the reconfigurable gate-array, which are described in the processor addressing space (viewed as part of the program code). Configuration bits for different pGA-op operations can thus be placed anywhere in the addressing space but, in order to minimize configuration latency, they are loaded on a specific 64 KB on-chip configuration cache memory directly connected to the array. Programming data can be loaded into this memory via the AHB bus from a larger third-level library of configurations stored in an external SRAM or FLASH. The peripheral APB bus contains configuration and status registers for several system peripherals, a programmable timer, an interface for external LCDs, and a parallel port interface for connection to an external host.

Table 3. Prototype chip characteristics.

Parameter             Value
Technology            0.13 µm CMOS process, 6 metal layers
Power supply          1.2 V
Clock frequency       120 MHz (WC-COM), 166 MHz (TYP)
Power consumption     Static: 35 mW
                      Dynamic (excluding PiCoGA): 1.48 mW/MHz
                      Dynamic (PiCoGA): 100 µW/MHz per active row
SRAM memory size      Main memory: 128 KB
                      Instruction cache: 4 KB
                      Data cache: 4 KB
                      Tag memories: 1 KB (×2)
                      PiCoGA configuration cache: 64 KB
Chip size             6 × 6 mm²
PiCoGA size           11 mm²
I/Os                  151
Transistor count      17.5 M

C. Pipelined Configurable Gate Array

The PiCoGA (Pipelined Configurable Gate Array) [37] is a reconfigurable datapath which has been designed specifically to be integrated inside a processor core. The aim was to provide a device which can reduce both execution time and energy consumption over a wide range of heterogeneous applications and can be easily programmed by a software developer.

Computational efficiency in a reconfigurable fabric is achieved by exploiting parallelism and implementing customized operators with the minimum size required by the operands. Achievable parallelism is related to the capacity of the device, while operand size customization depends on the array granularity. Both are design parameters whose values have been a direct consequence of the architectural choice of integrating the accelerator inside the processor core. In this context there is no communication overhead between the processor FUs and the configurable device, so that they can cooperate under direct control of the processor logic. The computation can be partitioned at a very fine level of granularity between the reconfigurable device and the processor functional units, almost without penalty. As a consequence, a relatively small array with a fast reconfiguration time can achieve high acceleration of computational kernels.

Concerning device granularity, we considered that the processor functional units perform standard 32-bit operations very well, while they are very inefficient when dealing with unusual operations over a few bits. Therefore, in order to efficiently cover the widest range of applications, we designed the configurable array with fine granularity to balance the overall architecture.

The computational model for the PiCoGA has been chosen to be easily integrated inside a processor core. For this reason the gate array provides a hardware platform where application-specific instructions can be easily implemented and added to the native instruction set. This is achieved by means of a special structure which supports direct mapping of pipelined computations. To help software developers, who are used to sequential high-level languages, concurrency does not have to be explicitly described by the user. Instruction-level parallelism is extracted from sequential code by our tools in order to program the dedicated PiCoGA control unit. Once the PiCoGA is configured, the pipeline activity is automatically controlled, handling irregular input data flow, loops and other synchronization issues between pipeline stages. When a pGA-op instruction is decoded, new input data are provided by the register file to the PiCoGA and, depending on the connectivity of the mapped DFG, the control unit activates each PiCoGA row in the right order following a dataflow paradigm, whenever its operands are ready. At the end of the computation a write-back operation is performed.
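The dataflow activation rule described for the PiCoGA rows can be illustrated with a small software model. This is only a sketch and not the PiCoGA control unit: it assumes a hypothetical encoding (`row_deps`) in which each row lists the rows whose results it consumes, and the function name `fire_order` is illustrative. A row fires as soon as all of its producers have completed.

```python
from collections import deque


def fire_order(row_deps):
    """Return the order in which rows fire under a dataflow rule.

    row_deps maps a row id to the set of row ids whose results it needs.
    This encoding is an assumption made for illustration only.
    """
    # Copy the dependency sets so the caller's data is left untouched.
    pending = {row: set(deps) for row, deps in row_deps.items()}
    # Rows with no producers can fire immediately.
    ready = deque(sorted(r for r, deps in pending.items() if not deps))
    order = []
    while ready:
        row = ready.popleft()
        order.append(row)
        # Completing a row may make its consumers ready.
        for other in sorted(pending):
            deps = pending[other]
            if row in deps:
                deps.remove(row)
                if not deps:
                    ready.append(other)
    if len(order) != len(pending):
        raise ValueError("cyclic dependency: some rows never become ready")
    return order


if __name__ == "__main__":
    # Row 1 has no inputs, row 2 needs row 1, row 0 needs row 2,
    # and row 3 needs both row 0 and row 2.
    mapped_dfg = {0: {2}, 1: set(), 2: {1}, 3: {0, 2}}
    print(fire_order(mapped_dfg))
```

The model ignores pipelining of successive pGA-op issues and stalls; it only shows that firing order is derived from operand availability rather than from an explicit schedule written by the user.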



From a structural point of view the PiCoGA is an array of rows, each representing a stage (or part of a stage) of the implemented pipeline. The width of the configurable datapath has been designed to fit the processor architecture, so each row is able to process 32-bit operands. As shown in Figure 14, each row is connected to the others via configurable interconnect channels, and to the processor register file via six 32-bit global buses.

1) Configuration Caching: One of the reasons for tightly integrating an FPGA in a processor core is the opportunity to use it frequently, for many different computational kernels. However, reconfiguration of a traditional FPGA can take hundreds or even thousands of cycles, depending on the re-programmed region size. Although execution can still continue on other processor resources, scheduling will hardly find enough instructions to avoid stalls that could nullify any benefit from the use of dynamically configurable arrays. Furthermore, in some algorithms the exact function to be executed is only known at runtime, so that reconfiguration cannot be done in advance. In such cases it is very difficult to take advantage of a configurable unit.

Three different approaches have been adopted to overcome these limitations. A first-level cache (or more precisely a scratchpad, since it is managed directly by the software) has been implemented, which stores 4 configuration contexts for each logic cell inside the PiCoGA [38, 39]. Context switch takes place in one clock cycle, providing 4 immediately available pGA-op instructions. Furthermore, Partial Run-Time Reconfiguration (PRTR) [40] is supported, allowing reconfiguration of just a portion of the array to implement more functions in the same context layer. While the PiCoGA is computing, reconfiguration of the next instruction can be performed, so that cache misses are rare, even when the number of configurations used is large.

These two mechanisms are useful to accelerate parts of a kernel or different kernels of the same algorithm. However, in the case of a reconfigurable cellular terminal, the communication protocol can even change during the same phone call, requiring the PiCoGA to be completely reconfigured. Therefore the reconfiguration time has been shortened, exploiting a wide configuration bus to the PiCoGA. Reconfigurable Logic Cells (RLC) in a row are written in parallel with 192 dedicated wires, taking a few hundred cycles to complete the reconfiguration of one PiCoGA context layer. A dedicated second-level on-chip cache is needed to feed such a wide bus, while the whole set of available functions can be stored in an off-chip memory.
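To see why keeping several contexts resident matters, the following sketch counts configuration overhead for a sequence of pGA-op uses. It is a rough model under stated assumptions, not the actual PiCoGA behaviour: the load cost of 300 cycles is a stand-in for the "few hundred cycles" mentioned in the text, the eviction order (least recently used) is chosen here only for illustration since the real scratchpad is managed by software, and loads are not overlapped with computation, so the result is a pessimistic bound.

```python
from collections import OrderedDict

SWITCH_CYCLES = 1    # one-cycle context switch, as stated in the text
LOAD_CYCLES = 300    # assumed stand-in for "a few hundred cycles"
NUM_CONTEXTS = 4     # contexts held per logic cell


def config_overhead(op_sequence, contexts=NUM_CONTEXTS):
    """Count cycles spent loading and switching configurations.

    op_sequence is the order in which (hypothetical) pGA-op names are used.
    Least-recently-used eviction is an illustrative assumption.
    """
    resident = OrderedDict()   # contexts currently stored in the array
    current = None             # context that is active right now
    cycles = 0
    for op in op_sequence:
        if op == current:
            continue                          # already active, no cost
        if op in resident:
            resident.move_to_end(op)          # mark as recently used
        else:
            if len(resident) == contexts:
                resident.popitem(last=False)  # evict least recently used
            resident[op] = True
            cycles += LOAD_CYCLES             # fetch configuration bits
        cycles += SWITCH_CYCLES               # activate the context
        current = op
    return cycles


if __name__ == "__main__":
    trace = ["fir", "crc", "fir", "crc"]
    print(config_overhead(trace))              # four resident contexts
    print(config_overhead(trace, contexts=1))  # single-context array
```

With the alternating trace above, the four-context model pays the load cost only twice, while a single-context array reloads on every switch.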

[Figure 14 diagram. Elements shown: 4×32-bit input data and 2×32-bit output data; 12 global lines to/from the register file; configuration bus; PiCoGA control unit; rows of RLCs with horizontal and vertical connection blocks and switch blocks. RLC detail: input logic, two 16×2 LUTs, output logic and registers with enable (EN), and pGA control unit signals.]

Figure 14. PiCoGA structure.



2) Reconfigurable Logic Cell: A Reconfigurable Logic Cell (RLC) is composed of a cluster of two 4-input LUTs, each having 2-bit granularity. An RLC contains four pairs of registers, which are controlled by the configurable control unit or by another RLC. A single RLC can implement purely combinational logic, as well as both single-cycle and two-cycle pipeline stages, in order to support timing optimization and improve the maximum throughput in a pipeline having a complex Data Flow Graph. The RLC includes two internal feedback paths: a synchronous one and an asynchronous one. The first is used for implementing accumulator-like operators, while the second enables an LUT to be fed with the output of the other one. Both internal paths are useful for reducing the amount of routing resources required. Initialization of state inside the array (e.g., the value stored in an accumulator) is performed by dedicated hardware logic managed by the PiCoGA control unit.

Dedicated wires along each row are provided to achieve fast propagation of carry signals. A carry-select architecture has been implemented, where a 2-to-1 multiplexer, driven by the carry-in signal coming from a previous RLC, is used to select the correct carry-out. One of the LUTs (LUT1 in Figure 15) computes the carry-out signals for both carry-in equal to 0 and carry-in equal to 1. Appropriate programming of LUT1 also allows one to use the chain for efficient implementation of a number of useful functions such as comparators, wide-input logic gates (AND, OR, ...), parity bit generators/checkers and sign inversion. Furthermore, the PiCoGA carry chain is enhanced by combining the carry-select architecture with a level-1 lookahead technique, which roughly cuts the critical path in half.

[Figure 15 diagram. Elements shown: 2-bit inputs A, B, C, D and a constant (Const) feeding the input MUX and INIT logic; LUT1 and LUT0 (16×2 each); carry chain with chain-in and chain-out multiplexers, cin and cout; synchronous and asynchronous feedback paths; four flip-flops with enable (EN), clocked by CK; outputs Z[3:2] and Z[1:0].]

Figure 15. RLC architecture.

3) Programmable Interconnections: The PiCoGA routing network has been designed with 2-bit granularity, reducing the area occupation with respect to a single-bit granularity. However, the input connection blocks provide both 2-bit and 1-bit connection granularity in order to maintain routability and efficiency of resources in cases like odd shifting and single-bit control signals. Channels are composed of 15 pairs of tracks with a length of 3 tiles, which has been found to be a good trade-off between propagation delay and routability. Furthermore, four global horizontal lines have also been designed to support fast propagation of multi-fanout control signals.

In standard FPGAs the routing network is responsible for most of the area occupation and delay. This becomes even worse in the case of multi-context arrays, where configuration bits have to be replicated.
For this reason the connect and switch blocks include a decoding stage between configuration memories and programmable switches. For example, the output connect block can connect an output line of the RLC to only a single wire in a routing channel, because of the 1-over-N decoding logic introduced. This causes a small loss of routability, but yields a logarithmic reduction in the number of multi-context SRAM cells needed [41].

D. XiRisc Computation Pattern

The exploitation of run-time reconfigurability requires the designer to explore a complex space in order to identify and optimize critical kernels in the target application [42]. Typical kernels, for a wide spectrum of embedded applications, are the cores of innermost loops, which can usually be described using traditional data-flow graphs. Significant speed-ups can be achieved by overlapping successive loop iterations. In the case of configurable computing, we can increase throughput by pipelining overlapped and/or unrolled loop kernels. Of course, this technique requires accurate management of the custom pipeline.

Starting from a source code written in (slightly extended) ANSI-C [43], a scheduler creates a pipelined DFG and then maps:
■ DFG-node operators to the array of Reconfigurable Logic Cells (RLCs) in the PiCoGA;
■ pipeline management to a row-based dedicated control unit which enables the execution of pipeline stages implemented on the PiCoGA computational resources.

The pipeline is built through careful stage scheduling, starting from the above-mentioned ANSI-C representation of the functionality of each extended instruction. The dedicated programmable control unit manages the pipeline activity by starting new PiCoGA operations or by stalling them when requested resources are not yet available. The control unit has a minimum granularity of one array row, but more than one PiCoGA row can be used to build a wider pipeline stage. In order to maintain a fixed clock frequency, cascaded RLCs have to be mapped to different pipeline stages. When a pipeline stage completes its computation, it produces a “token” which is sent both to predecessor and to successor nodes through dedicated programmable interconnection channels.

E. Digital System Performance

The digital system architecture discussed in this section has been implemented as a prototype chip (Figure 16) [44] fabricated in a 0.13 µm, 1.2 V, 6-metal-layer CMOS technology provided by STMicroelectronics. The die measures 36 mm² and contains 17.5 M transistors. The SoC operates at a nominal frequency of 166 MHz and fits in a 256-pin package with 151 I/Os. Static power consumption is 35 mW, while average dynamic power consumption is 1.48 mW/MHz for the whole system, except for the reconfigurable gate-array, which consumes 100 µW/MHz for each active row.

[Figure 16 micro-photograph. Labeled regions: PLL, instruction cache, data cache and tag memories, configuration memory, standard cells (including RISC core), 128 KB on-chip memory, PiCoGA.]

Figure 16. Digital system micro-photograph.

Table 4. Timing and energy consumption performance for some DSP algorithms.

Algorithm               Rows occupation   Speed-up   Energy saving
MPEG-2                  23                5×         66%
Turbo-decoder           24                8.6×       84%
Reed Solomon encoder    14                80×        94.5%
DES                     5                 13.5×      89%
CRC                     11                4.3×       49%
Motion estimation       24                14.8×      74.8%

In order to prove the effectiveness of our reconfigurable digital processor for new-generation cellular handsets, we first analyzed one of the most challenging wireless communication specifications, namely 3GPP. Of course, we cannot assume that the digital architecture will be used only for telecommunication algorithms, since we
expect that in the future the number of applications provided by network operators will grow very rapidly. We thus also evaluated the performance of the XiRisc architecture on multimedia applications.

Table 4 shows some results for different applications and algorithms, comparing the execution time and the energy consumption of the XiRisc with those of a DSP-like architecture with the same processor core, DSP-specific function units and cache memories, but with a single datapath.

In the telecom application field, we implemented a benchmark based on a turbo-decoder algorithm compliant with the 3GPP mobile communication specifications [45]. We chose a Linear-log MAP algorithm [42, 46], because it offers BER performance up to 0.5 dB better than the more common Max-log-MAP algorithm with an increase in computational cost of about 40%. On a 640-bit block size, the proposed SoC requires 432 cycles/bit/iteration, corresponding to 384 kbps/iteration at a typical clock frequency of 166 MHz. The same algorithm requires 3715 cycles/bit/iteration on a basic XiRisc processor, so that the resulting speed-up is 8.6×. The energy consumption is 1 mJ (84% saving). Further improvement is possible through manual optimization at assembly level. For instance, with optimized scheduling, only 84 cycles/bit/iteration are required for the mathematical operations involved in the Linear-log MAP algorithm, providing a theoretical performance upper bound of 2 Mbps.

Another algorithm that is typically used for both channel coding environments and a variety of different applications is Reed Solomon (RS) error correction coding. The implementation of the RS encoder on a 239-byte message, with 16 redundancy bytes, uses the PiCoGA device very efficiently, achieving an impressive 80× speed-up. This is mainly due to the bit-level nature of the algorithm, which nicely fits the fine granularity of the reconfigurable array. The same considerations apply to the case of the DES encryption algorithm.

As a reference benchmark on multimedia applications, we present the results obtained on MPEG-2 encoding, applied to a standard QCIF stream with a frame resolution of 176 × 144 pixels and half-pel precision. The reconfigurable architecture permits a 5× execution speed-up, thus allowing encoding of up to eight frames per second. The overall energy consumption to process a 12-frame stream is 718 mJ, corresponding to a 66% energy reduction compared to the traditional DSP. Setting a full-pel resolution, the frame rate increases to 12 frames/sec, as required by the H.261 video compression standard.

[Figure 17 bar chart: normalized energy (scale 0.1 to 1.0), split among processor core, memories and AMBA system, and PiCoGA.
MPEG-2 encoder (QCIF 64×64 px, nine frames): DSP-like architecture 268.23 mJ, XiRisc 92.64 mJ.
Turbo-decoder (640 b message): DSP-like architecture 6.75 mJ, XiRisc 1.06 mJ.
Reed-Solomon encoder (239 B message): DSP-like architecture 174.52 µJ, XiRisc 9.77 µJ.
Motion estimation (16×16 px search window): DSP-like architecture 913.36 µJ, XiRisc 230.47 µJ.]

Figure 17. Energy consumption for several DSP algorithms.
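As a cross-check, the energy savings listed in Table 4 can be recomputed from the absolute energies annotated in Figure 17. The sketch below does only that arithmetic; the dictionary layout and names are illustrative, and small differences from the table are expected because the table values are rounded and may refer to slightly different workloads.

```python
# Energies in joules as (DSP-like architecture, XiRisc), from Figure 17.
ENERGY_J = {
    "MPEG-2 encoder": (268.23e-3, 92.64e-3),
    "Turbo-decoder": (6.75e-3, 1.06e-3),
    "Reed-Solomon encoder": (174.52e-6, 9.77e-6),
    "Motion estimation": (913.36e-6, 230.47e-6),
}


def saving_percent(baseline_j, xirisc_j):
    """Energy saved relative to the DSP-like baseline, in percent."""
    return 100.0 * (1.0 - xirisc_j / baseline_j)


if __name__ == "__main__":
    for name, (baseline_j, xirisc_j) in ENERGY_J.items():
        print(f"{name:22s} {saving_percent(baseline_j, xirisc_j):5.1f} %")
```

The recomputed savings land within about one percentage point of the Table 4 entries for the same four algorithms.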



Figure 17 summarizes energy consumption contributions from the main architectural components and shows significant savings in energy consumption for both the processor and the bus, mainly due to the execution time decrease. The use of cache memories further reduces energy consumption of memory accesses by 90%. One notices that both energy and execution time are reduced very significantly over a broad range of applications.

IV. Conclusions

This paper has described the analog and digital baseband channels for a multi-standard reconfigurable terminal supporting GSM (with Edge), WCDMA (UMTS), WLAN and Bluetooth, developed within the “Enabling technologies for reconfigurable wireless terminals” project, funded by the Italian FIRB Project. The RF circuits of such a terminal are described in a companion paper. All the most important design issues relating to the management of different standards in a reconfigurable transceiver are analyzed. Experimental results on five test chips are reported to validate the solutions adopted both at the architecture and at the circuit level.

Acknowledgments

The reported work is the result of the combined effort of many more people than those listed as authors, all participating in the still ongoing Italian National Project FIRB, whose contributions, at many different levels, should be recognized. Giorgio Baccarani, Pietro Erratico and, subsequently, Maurizio Zuffada have coordinated two of the participating groups, sharing the burden of some of the key decisions to be taken. The following staff members of STMicroelectronics, the University of Bologna and the University of Pavia have either led or participated in one of the workpackages or subtasks of the project with their fundamental complementary competence: Alessandro Bosi, Michele Fedeli, Marco Morelli, Andrea Panigada, Claudio Passerone, Guido Torelli and Carla Vacchi. Last but not least, the following young researchers, Ph.D. students and post-docs have performed the bulk of the research, lending it their creativity, enthusiasm and motivation: Walter Audoglio, Massimo Bocchi, Andrea Cappelli, Luca Ciccarelli, Stefano D’Amico, Nicola Ghittori, Alberto La Rosa, Mihai Lazarescu, Silvia Marabelli, Roberto Massolini, Claudio Mucci, Matteo Rossi, Andrea Vigna and Everest Zuffetti. To all of them goes the recognition and appreciation of all the authors and especially of Rinaldo Castello as coordinator of the FIRB project.

References

[1] J. Rogin, I. Kouchev, and Q. Huang, “A 1.5 V 45 mW direct conversion WCDMA receiver IC in 0.13 µm CMOS,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC ’03), pp. 268–493, Feb. 2003.
[2] J. Bouras, S. Bouras, T. Georgantas, N. Haralabidis, G. Kamoulakos, C. Kapnistis, S. Kavadias, Y. Kokolakis, P. Merakos, J. Rudell, S. Plevridis, I. Vassiliou, K. Vavelidis, and A. Yamanaka, “A digitally calibrated 5.15–5.825 GHz transceiver for 802.11a wireless LANs in 0.18 µm CMOS,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC ’03), pp. 352–498, Feb. 2003.
[3] K. Gulati and H.S. Lee, “A low-power reconfigurable analog-to-digital converter,” IEEE Journal of Solid-State Circuits, vol. 36, no. 12, pp. 1900–1911, Dec. 2001.
[4] A. Stojcevski, J. Singh, and A. Zayegh, “CMOS ADC with reconfigurable properties for cellular handset,” in Proceedings of the IEEE International Workshop on Electronic Design, Test and Applications (DELTA ’04), pp. 103–107, Jan. 2004.
[5] B.J. Minnis and P.A. Moore, “A highly digitized multimode receiver architecture for 3G mobiles,” IEEE Transactions on Vehicular Technology, vol. 52, no. 3, pp. 637–653, May 2003.
[6] A. Dezzani and E. Andre, “A 1.2-V dual-mode WCDMA/GPRS ΣΔ modulator,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC ’03), pp. 58–59, Feb. 2003.
[7] S. Haykin, “Cognitive radio: Brain-empowered wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 2, pp. 201–220, Feb. 2005.
[8] N.J. Drew, D. Williams, M. Dillinger, P. Mangold, T. Farnham, and M. Beach, “Reconfigurable mobile communications: Compelling needs and technologies to support reconfigurable terminals,” in Proceedings of the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’00), vol. 1, pp. 484–489, Sep. 2000.
[9] X. Li, A.R. Bugeja, and M. Ismail, “A fast and accurate calibration method for high-speed high-resolution pipeline ADC,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS ’02), vol. 2, pp. 800–803, May 2002.
[10] H.S. Lee, “A 12-bit 600 kS/s digitally self-calibrated pipeline algorithmic ADC,” IEEE Journal of Solid-State Circuits, vol. 29, no. 4, pp. 509–515, Apr. 1994.
[11] E.B. Blecker, T.M. McDonald, O.E. Erdogan, P.J. Hurst, and S.H. Lewis, “Digital background calibration of an algorithmic analog-to-digital converter using a simplified queue,” IEEE Journal of Solid-State Circuits, vol. 38, no. 6, pp. 1812–1820, June 2003.
[12] H.S. Lee and B.S. Song, “Digital-domain calibration of multistep analog-to-digital converter,” IEEE Journal of Solid-State Circuits, vol. 22, no. 4, pp. 1679–1688, Dec. 1992.
[13] X. Wang, P.J. Hurst, and S.H. Lewis, “A 12-bit 20-MS/s pipelined ADC with nested digital background calibration,” in Proceedings of the IEEE Custom Integrated Circuits Conference (CICC ’03), pp. 21–24, Sep. 2003.
[14] B. Murmann and B.E. Boser, “A 12b 75MS/s pipeline ADC using open-loop residue amplification,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC ’03), pp. 328–497, Feb. 2003.
[15] E.J. Siragusa and I. Galton, “Gain error correction technique for pipelined analogue-to-digital converters,” Electronics Letters, vol. 37, no. 7, pp. 617–618, Mar. 2000.
[16] J. Ming and S. Lewis, “An 8-Bit 80-MSample/s pipeline analog-to-digital converter with background calibration,” IEEE Journal of Solid-State Circuits, vol. 36, no. 10, pp. 1489–1497, Oct. 2001.
[17] A. Bosi, A. Panigada, G. Cesura, and R. Castello, “An 80 MHz 4× oversampled cascaded ΔΣ-pipelined ADC with 75 dB DR and 87 dB SFDR,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC ’05), Feb. 2005.
[18] J. McNeil, M. Coln, and B. Larivee, “A Split-ADC architecture for deterministic digital background calibration of 16b 1MS/s ADC,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC ’05), Feb. 2005.
[19] J. Li and U.K. Moon, “Background calibration techniques for multistage pipeline ADCs with digital redundancy,” IEEE Transactions on Circuits and Systems II, vol. 50, no. 9, pp. 531–538, Sep. 2003.
[20] U.K. Moon and B.S. Song, “Background digital calibration techniques for pipelined ADC’s,” IEEE Transactions on Circuits and Systems II, vol. 44, no. 2, pp. 102–109, Feb. 1997.

[21] I. Galton, “Digital cancellation of D/A converter noise in pipelined A/D converters,” IEEE Transactions on Circuits and Systems II, vol. 47, no. 3, pp. 185–196, Mar. 2000.
[22] A. Baschirotto, N. Ghittori, P. Malcovati, and A. Vigna, “Design trade-offs for a 10 Bit, 80 MHz current steering digital-to-analog converter,” in Proceedings of the IEEE Northeast Workshop on Circuits and Systems (NEWCAS ’04), pp. 249–252, June 2004.
[23] Y. Cong and R. L. Geiger, “Formulation of INL and DNL yield estimation in current steering D/A converters,” in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS ’02), vol. 3, pp. 149–152, May 2002.
[24] M.J. Pelgrom, A.C. Duinmaijer, and A.P. Welbers, “Matching properties of MOS transistors,” IEEE Journal of Solid-State Circuits, vol. 24, no. 10, pp. 1433–1439, Oct. 1989.
[25] Tensilica Inc. [Online]. Available: http://www.tensilica.com
[26] H. Zhang, V. Prabhu, V. George, M. Wan, M. Benes, A. Abnous, and J.M. Rabaey, “A 1 V heterogeneous reconfigurable processor IC for baseband wireless applications,” in IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC ’00), pp. 68–69, Feb. 2000.
[27] R. Razdan and M. Smith, “A high-performance microarchitecture with hardware-programmable functional units,” in Proceedings of the Annual International Symposium on Microarchitecture, pp. 172–180, Nov. 1994.
[28] P. Athanas and H. Silverman, “Processor reconfiguration through instruction-set metamorphosis,” IEEE Computer, vol. 26, no. 3, pp. 11–18, Mar. 1995.
[29] R. Wittig and P. Chow, “OneChip: An FPGA processor with reconfigurable logic,” in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 126–135, Mar. 1996.
[30] T. Callahan, J. Hauser, and J. Wawrzynek, “The Garp architecture and C compiler,” IEEE Computer, vol. 33, no. 4, pp. 62–69, Apr. 2000.
[31] Atmel FPSLIC. [Online]. Available: http://www.atmel.com/products/FPSLIC
[32] Motorola MRC6011 Reconfigurable Compute Fabric (RCF) Device. [Online]. Available: http://e-www.motorola.com/files/if/cnb/ASICRCFWP_D.pdf
[33] M. Borgatti, F. Lertora, B. Foret, and L. Calì, “A reconfigurable system featuring dynamically extensible embedded microprocessor, FPGA, and customizable I/O,” IEEE Journal of Solid-State Circuits, vol. 38, no. 3, pp. 521–529, Mar. 2003.
[34] D. Patterson and J. Hennessy, Computer Architecture: A Quantitative Approach, Morgan Kaufmann, 1991.
[35] A. Lodi, M. Toma, F. Campi, A. Cappelli, R. Canegallo, and R. Guerrieri, “A VLIW processor with reconfigurable instruction set for embedded applications,” IEEE Journal of Solid-State Circuits, vol. 38, no. 11, pp. 1876–1886, Nov. 2003.
[36] P. Chang, S. Mahlke, W. Chen, N. Warter, and W. Hwu, “IMPACT: An architectural framework for multiple-instruction-issue processors,” in Proceedings of the 18th Annual Int’l Symposium on Computer Architecture, May 1991, pp. 266–275.
[37] A. Lodi, M. Toma, F. Campi, A. Cappelli, R. Canegallo, and R. Guerrieri, “A pipelined configurable gate array for embedded processors,” in Proceedings of the ACM/SIGDA International Symposium on FPGA, pp. 21–30, Feb. 2003.
[38] A. DeHon, “DPGA-coupled microprocessors: Commodity ICs for the early 21st century,” in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 31–39, Apr. 1994.
[39] S. Trimberger, D. Carberry, A. Johnson, and J. Wong, “Configuration caching techniques for FPGA,” in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 34–49, Apr. 2000.
[40] S. Hauck, K.C. Compton, and Z. Li, “A time multiplexed FPGA,” in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 34–40, Apr. 1997.
[41] A. Lodi, R. Giansante, C. Chiesa, L. Ciccarelli, F. Campi, and M. Toma, “Compact buffered routing architecture,” in Lecture Notes in Computer Science, Field-Programmable Logic and Applications, vol. 3203. Springer-Verlag, pp. 179–188, Sep. 2003.
[42] A. La Rosa, L. Lavagno, and C. Passerone, “Software development for high-performance, reconfigurable, embedded multimedia systems,” IEEE Design and Test of Computers, vol. 22, no. 1, pp. 28–38, Jan. 2005.
[43] C. Mucci, C. Chiesa, A. Lodi, M. Toma, and F. Campi, “A C-based algorithm development flow for a reconfigurable processor architecture,” in Proceedings of the International Symposium on System-on-Chip, pp. 19–21, Nov. 2003.
[44] M. Bocchi, C. De Bartolomeis, C. Mucci, F. Campi, A. Lodi, M. Toma, R. Canegallo, and R. Guerrieri, “A XiRisc-based SoC for embedded DSP applications,” in Proceedings of the IEEE Custom Integrated Circuits Conference (CICC ’04), pp. 595–598, Oct. 2004.
[45] 3G TS.25.212 V5.1.0 Multiplexing and Channel Coding (FDD), Technical Specification Group Radio Access Network, 3rd Generation Partnership Project, 2002.
[46] M.C. Valenti and J. Sun, “The UMTS turbo code and efficient decoder implementation suitable for software-defined radios,” International Journal of Wireless Information Networks, vol. 8, no. 4, pp. 203–216, Apr. 2001.

Andrea Baschirotto was born in Legnago (VR), Italy in 1965. He graduated in electronic engineering (summa cum laude) from the University of Pavia in 1989. In 1994 he received the Ph.D. degree in electronic engineering from the University of Pavia. In 1994 he joined the Department of Electronics, University of Pavia, as a Researcher (Assistant Professor). In 1998 he joined the Department of Innovation Engineering, University of Lecce, Italy, as an Associate Professor. Since 1989 he has collaborated with several companies on the design of mixed-signal ASICs. He participates in several research collaborations, also funded by National and European projects. He is now the Coordinator of a national project for the design of large-dynamic range gas sensors. His main research interests are in the design of mixed analog/digital integrated circuits, in particular for low-power and/or high-speed signal processing. He has authored or co-authored more than 160 papers in international journals and presentations at international conferences, 6 book chapters, and holds 25 industrial patents. In addition, he has co-authored more than 120 papers within research collaboration on high-energy physics experiments. Prof. Baschirotto was Associate Editor of IEEE Trans. Circuits Syst., Part II for the period 2000–2003, and he is now serving IEEE Trans. Circuits Syst., Part I as an associate editor. He has been the Technical Program Committee Chairman for ESSCIRC 2002 and he was the guest editor for the IEEE J. Solid-State Circ. for ESSCIRC 2003. He is a member of the Technical Program Committee of several conferences (ISSCC, ESSCIRC, DATE, etc.). He is an IEEE Senior Member.

Fabio Campi received the M.S. degree in microelectronics and the Ph.D. degree in electronics and computing science from the University of Bologna, Bologna, Italy, in 1999 and 2003, respectively. In 1995 and 1996 he was with the Tampere University of Technology, Tampere, Finland, as a Visiting Student. Since 1999, he has been a consultant for Central Research and Development, ST

Microelectronics, for the application of innovative CMOS design platforms on digital system-on-chip design. He is currently with the Advanced Research Center on Electronic Systems (ARCES), Bologna. His main research interests are VLSI system-on-chip design, embedded microprocessors and development of advanced architectures and algorithms for digital signal processing.

Rinaldo Castello was born in Genova, Italy, in 1953. He graduated in electrical engineering from the University of Genova (summa cum laude) in 1977. He received the M.S. and the Ph.D. degrees from the University of California, Berkeley, in 1981 and 1984, respectively. From 1983 to 1985 he was a Visiting Assistant Professor at the EECS department of the University of California, Berkeley, where he taught courses on analog/digital integrated circuit design and advised several graduate students. In 1987 he joined the Department of Electronics of the University of Pavia as an Associate Professor, where he is now a Full Professor. In addition to his academic activities he has been acting as a Consultant for ST-Microelectronics, Milan, Italy in the area of design of Integrated Circuits for many areas of application like Telecom, Disk Drive, etc. Since 1998 he has been the Scientific Director of a joint research centre between the University of Pavia and ST-Microelectronics, located in Pavia, Italy, and devoted to medium/long-term research in the area of ICs for wireless systems and A/D interfaces. Dr. Castello has been a member of the technical program committee of the European Solid State Circuit Conference (ESSCIRC) since 1987, the International Solid State Circuit Conference (ISSCC) from 1992 to 2004, and was Technical Chairman of ESSCIRC ’91 and General Chairman of ESSCIRC 2002. He was the Guest Editor of the July ’92 special issue of the IEEE J. Solid-State Circ. and the Associate Editor for Europe of the same magazine from ’94 to ’96. Since the year 2000 he has been a Distinguished Lecturer of the Solid State Circuit Society of IEEE. Prof. Castello was named one of the outstanding contributors for the first 50 years of the ISSCC. He was also a co-recipient of the Best Student Paper Award at the 2005 Symposium on VLSI. Prof. Castello is a Fellow of IEEE.

Giovanni Cesura received the Laurea and the Ph.D. degrees in electrical and electronics engineering, both from the University of Pavia, in 1989 and 1993, respectively. From 1993 to 1996 he was with the Semiconductor laboratory of the Max Planck Institute of Terrestrial and Extraterrestrial Physics in Munich, where he developed new pixel detectors with embedded amplification for Nuclear Physics experiments. From 1996 to 1998 he was a consultant to ST-Microelectronics for high resolution Delta-Sigma ADCs and DACs for audio applications. Since 1999 he has been with ST-Microelectronics, where he contributed to starting the Studio di Microelectronica, a University of Pavia-STMicroelectronics joint research center. His main interests include the design of high performance ADCs and DACs with digital calibration techniques and the design of analog base band front ends for communication applications. Presently he is a Member of the Technical Staff with the New IP and Design Support of the Computer and Peripheral Group of ST-Microelectronics, where he is in charge of the medium- to long-term research for UWB and HDD read/write channel applications.

Roberto Guerrieri received the Dr. Eng. and Ph.D. degrees from the University of Bologna, Italy, in 1980 and 1986, respectively. After a period of time spent at the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley as Visiting Researcher and at MIT as visiting scientist, he joined the University of Bologna, where he is currently Full Professor. His research interests are in various aspects of integrated circuit modeling and design, including digital systems and sensors, and applications of microelectronics to biotechnology. His work on VLSI design has been cited by widely read magazines, such as the Nikkei and Electronic Design, and documented in more than 90 scientific papers. Seven of these have been presented in the last six years at various ISSCC conferences. His work on silicon sensors for fingerprint recognition has shown the first published example of a fully integrated, silicon-only sensor, while his activity in the area of reconfigurable computers has produced a silicon-tested processor that can be programmed in C and can reconfigure its instruction set at run time. In 1992 he won the best paper award from the IEEE Transactions on Semiconductor Manufacturing for his research carried out on issues related to the modeling of various IC manufacturing steps. In 2004 he was awarded an ISSCC best paper award for his work in the area of silicon-based lab-on-a-chip.

Luciano Lavagno graduated magna cum laude in electrical engineering from Politecnico di Torino (Italy) in 1983. From 1984 to 1988 he was with CSELT Laboratories (Torino, Italy). In 1988 he joined the Department of Electrical Engineering and Computer Science of the University of California at Berkeley, where he worked on logic synthesis and testing of synchronous and

asynchronous circuits. In 1992 he received the Ph.D. in electrical engineering and computer science from the University of California at Berkeley. Dr. Lavagno is a co-author of two books on asynchronous circuit design, of a book on hardware/software co-design of embedded systems, and has published over 100 journal and conference papers. Between 1993 and 1998 he was an Assistant Professor with Politecnico di Torino, and between 1998 and 2001 he was an Associate Professor with the University of Udine. Between 1993 and 2000 he was the architect of the POLIS project (a cooperation between U.C. Berkeley, Cadence Design Systems, Magneti Marelli and Politecnico di Torino), developing a complete hardware/software co-design environment for control-dominated embedded systems. He is currently an Associate Professor with Politecnico di Torino, Italy and a research scientist with Cadence Berkeley Laboratories. In 1991 he received the Best Paper award at the 28th Design Automation Conference in San Francisco, CA. He has served on the technical committees of several international conferences in his field (e.g., the Design Automation Conference, the International Conference on Computer Aided Design, the International Conference on Computer Design, and Design Automation and Test in Europe) and of various other workshops and symposia. His research interests include the synthesis of asynchronous and low-power circuits, the concurrent design of mixed hardware and software embedded systems, and dynamically reconfigurable processors.

Andrea Lodi received the degree in electrical engineering and the Ph.D. in 1998 and 2002, respectively, at the University of Bologna, Bologna, Italy. Since 1998 he has been working as a consultant for STMicroelectronics in the fields of signal processing algorithms and innovative architectures of systems-on-chips and reconfigurable devices. He is currently with the “Advanced Research Center on Electronic Systems” (ARCES), Bologna, Italy.

Piero Malcovati was born in Milano, Italy in 1968. He received the “Laurea” degree (Summa cum Laude) in electronic engineering from the University of Pavia, Italy in 1991. In 1992 he joined the Physical Electronics Laboratory (PEL) at the Federal Institute of Technology in Zurich (ETH Zurich), Switzerland, as a Ph.D. candidate. He received the Ph.D. degree in electrical engineering from ETH Zurich in 1996. From 1996 to 2001 he was an Assistant Professor at the Department of Electrical Engineering of the University of Pavia. Since 2002 he has been Associate Professor of Electrical Measurements in the same institution. His research activities are focused on microsensor interface circuits and high performance data converters. He has authored or co-authored more than 20 papers in international journals, more than 70 presentations at international conferences (with published proceedings), 5 book chapters, and 5 industrial patents. He was guest editor for the Journal of Analog Integrated Circuits and Signal Processing for the special issue on IEEE ICECS 1999. He served as Special Session Chairman for the IEEE ICECS 2001 Conference and as Secretary of the Technical Program Committee for the ESSCIRC 2002 Conference. He has been and still is a member of the Scientific Committees for several international conferences, including ESSCIRC and DATE. He is a Senior Member of the IEEE and an associate editor of the Journal of Circuits, Systems, and Computers.

Mario Toma received the M.S. degree in electronics and the Ph.D. degree from the University of Bologna, Bologna, Italy, in 1998 and 2002, respectively. Since 1999 he has been a consultant for ST-Microelectronics for the application of innovative CAD CMOS design platforms on digital system-on-chip design. He is currently with CR&D ST-Microelectronics, Agrate Brianza, Italy.
