I. INTRODUCTION
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1998
holding output sample values. The radices are the moduli, and
the scaling operation can be performed on any single modulus
or a combination of moduli (by repeating the single modulus
operations). Scaling by more than one modulus corresponds
to shifting by more than one bit position in the binary number
system.
The fundamental idea is to perform the radix division
by inverse multiplication. To guarantee that the result will
be integral, an initial subtraction is performed to round the
operand to the next smaller multiple of the radix. Letting
$X = (x_1, \ldots, x_n)$ and denoting by $Y = (y_1, \ldots, y_n)$ the result of
scaling $X$ by modulus $m_i$, we have that

$$y_j = \left| \left( x_j - |x_i|_{m_j} \right) \cdot m_i^{-1} \right|_{m_j}, \quad j \neq i. \qquad (1)$$

That is, the scaling operation consists of a subtraction of $x_i$
from $x_j$ and multiplication by $m_i^{-1}$ in each modulus except the
ith. Note that $m_i^{-1}$ does not exist in the ith modulus. Consequently,
the subtraction and the multiplication are performed
independently and concurrently in each modulus except the ith.
This means that the result is defined only in digits other than
the ith. An operation called Base Extension [12] can be used
to restore the lost digit, if it is needed. It consists of scalings by
the remaining moduli (with the lost digit initialized to zero),
followed by a final multiplication by the additive inverse of
the product of these moduli. Base Extension is not needed for
frequency synthesis, and will not be discussed further. Note
that scaling requires some method of converting a residue digit
from one modulus to another. It will be seen in Section III that
in the OHR number system these conversions are simple and
fast.
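The scaling operation described above can be sketched in software as follows. This is a minimal illustration only: the function names `to_rns` and `scale`, the modulus set (3, 5, 7), and the operand 52 are assumptions chosen for the sketch and are not taken from the paper. The sketch assumes pairwise coprime moduli and Python 3.8 or later (for the modular inverse via `pow`).

```python
def to_rns(x, moduli):
    """Residue digits of the integer x for the given moduli."""
    return [x % m for m in moduli]

def scale(digits, moduli, i):
    """Scale an RNS operand by moduli[i].

    In every modulus m_j other than the ith, subtract x_i (converted to
    modulus m_j) and multiply by the inverse of m_i modulo m_j. The ith
    digit of the result is undefined and is returned as None.
    """
    m_i, x_i = moduli[i], digits[i]
    result = []
    for j, (x_j, m_j) in enumerate(zip(digits, moduli)):
        if j == i:
            result.append(None)           # m_i has no inverse modulo itself
            continue
        inv = pow(m_i, -1, m_j)           # multiplicative inverse of m_i mod m_j
        result.append(((x_j - x_i % m_j) * inv) % m_j)
    return result

moduli = (3, 5, 7)                        # hypothetical modulus set
digits = to_rns(52, moduli)               # hypothetical operand
scaled = scale(digits, moduli, 0)         # digits of floor(52 / 3) in moduli 5 and 7
```

The defined digits of `scaled` agree with the residues of the integer quotient, which is the property that makes the initial subtraction necessary.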
As an example of the scaling operation, let
and
Then
With this one-hot representation of the residue digits, addition can be performed by cyclic shifts (rotations). One of the
operands (the data operand) is rotated by an amount equal
to the value of the other (the shift operand). The rotation can be
performed by one of several types of circuits; in our work
we have chosen to use barrel shifters. These circuits compute
all possible rotations in parallel and pass the appropriate one
to the output when required. They can also be used to perform
multiplication, as will be discussed in Section III.
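Addition by rotation can be modeled behaviorally as below. This is a sketch under stated assumptions: a digit of modulus m is taken to be a list of m lines with exactly one asserted, and the helper names (`one_hot`, `decode`, `rotate`, `barrel_add`) are hypothetical. It models the function of a barrel shifter, not its transistor-level structure.

```python
def one_hot(value, m):
    """One-hot residue digit: m lines, the line at index (value mod m) asserted."""
    return [1 if k == value % m else 0 for k in range(m)]

def decode(lines):
    """Index of the asserted line, i.e. the residue value."""
    return lines.index(1)

def rotate(lines, s):
    """Cyclic shift by s positions: the signal on line k moves to line (k + s) mod m."""
    m = len(lines)
    return [lines[(k - s) % m] for k in range(m)]

def barrel_add(data, shift):
    """Form all m rotations of the data operand, then let the single
    asserted line of the shift operand select which one reaches the output."""
    m = len(data)
    candidates = [rotate(data, s) for s in range(m)]
    return [int(any(shift[s] and candidates[s][k] for s in range(m)))
            for k in range(m)]

# Modulus 7 (an assumed example): 5 + 4 is congruent to 2.
total = barrel_add(one_hot(5, 7), one_hot(4, 7))
```

Because the output is again one-hot, the result can feed another rotation stage directly, with no carry propagation between lines.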
Calculation of inverses and indices is simple and fast in
the OHR. The process is merely an appropriate permutation,
in each modulus simultaneously, of the signals that comprise
the residue digit. The permutation requires no hardware and
causes little delay. Modulus conversion, which is the process
of converting a residue digit to its value in another modulus,
is also very efficient in the OHR and consists of using OR
gates to collect digit values which are congruent modulo the
target modulus.
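Both operations can be illustrated with a short behavioral model. The sketch assumes the same list-of-lines view of a one-hot digit; the function names are hypothetical, and the additive inverse is used as the example permutation because the text does not specify which inverse is meant.

```python
def one_hot(value, m):
    """One-hot residue digit with m lines."""
    return [1 if k == value % m else 0 for k in range(m)]

def additive_inverse(lines):
    """Pure rewiring: the signal on line k is routed to line (-k) mod m.
    No gates are involved, only a permutation of the wires."""
    m = len(lines)
    return [lines[(-k) % m] for k in range(m)]

def convert_modulus(lines, target):
    """Each output line t is the OR of all input lines whose index is
    congruent to t modulo the target modulus."""
    return [int(any(lines[k] for k in range(t, len(lines), target)))
            for t in range(target)]

negated = additive_inverse(one_hot(5, 7))     # asserts line 2, since -5 mod 7 = 2
converted = convert_modulus(one_hot(5, 7), 3)  # asserts line 2, since 5 mod 3 = 2
```

In this model the inverse costs nothing beyond wire routing, while modulus conversion costs one OR gate per output line.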
C. Direct Digital Frequency Synthesis (DDFS)
DDFS is a method of sinusoidal signal generation that
yields frequencies of high precision and resolution. It is a
purely digital technique and therefore is more reliable and
precise than analog methods. Additionally, it allows fast
frequency switching that is phase continuous, something that
is costly and complicated with analog techniques. It is widely
used in communications and instrumentation systems where
exceptionally pure and stable signals must be generated.
Most DDFS systems use the sine table lookup method [13],
[14] wherein the output is generated by periodically accessing
a ROM containing contiguous samples of a single period (or
quadrant) of a sine wave. The samples are converted to analog
by a digital-to-analog converter (DAC). The ROM addresses
are computed using a phase accumulator, which generates
successive multiples of an externally supplied frequency-setting word. The value of this word establishes the output
frequency. If it is large, the output frequency is high because
the ROM addresses and sine samples are widely spaced.
Conversely, small frequency-setting words produce low output
frequencies because the ROM samples are closely spaced.
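The table lookup method can be sketched in a few lines. This is a generic binary-integer illustration, not the OHR-based design of this paper; the ROM size, amplitude, and function names are assumptions made for the sketch.

```python
import math

def sine_rom(size, amplitude=127):
    """ROM holding one full period of a sine wave as rounded integer samples."""
    return [round(amplitude * math.sin(2 * math.pi * k / size))
            for k in range(size)]

def ddfs_samples(freq_word, count, rom):
    """Phase accumulator: on every clock the ROM address advances by the
    frequency-setting word, wrapping modulo the ROM size."""
    phase = 0
    samples = []
    for _ in range(count):
        samples.append(rom[phase])
        phase = (phase + freq_word) % len(rom)
    return samples

rom = sine_rom(16)                  # hypothetical 16-entry ROM
fast = ddfs_samples(4, 8, rom)      # large word: every 4th sample, 4 samples per period
slow = ddfs_samples(1, 8, rom)      # small word: consecutive samples, 16 per period
```

With a clock frequency f_clk and a ROM of N samples, a frequency-setting word W yields an output frequency of W * f_clk / N, which is why larger words give higher frequencies.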
The architecture of our OHR-based synthesizer is shown in
Fig. 2. It employs a pipelined architecture and barrel shifter-based RNS processing to achieve low-latency sample generation and fast frequency switching. It is designed for use in
a high-performance frequency-hopped spread spectrum communication system.
All signal processing is performed in an $n$-modulus RNS
in which one of the moduli must be a multiple of 4. This
condition allows the size of the sample ROM to be reduced
Fig. 7. Level restoration methods: (a) output buffered and (b) active pull up.
For the OHR circuits, DP-product was estimated for simultaneous single transitions on both shift and data inputs. The
combined average power on both the rails and output lines was
computed by using MS EXCEL utilities on the SPICE output
data files. The output delay was measured to the half-rail point
on the output waveform. The product of these two quantities
is the DP-product (per transition).
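The calculation just described can be sketched as follows. The paper's figures were obtained with spreadsheet utilities applied to SPICE output files; the Python functions below are a stand-in for that procedure, and their names, the trapezoidal averaging, the linear interpolation, and the sample waveform are all assumptions made for illustration, not data from the paper.

```python
def average_power(times, power):
    """Trapezoidal time-average of sampled instantaneous power (watts)."""
    energy = sum((power[k] + power[k + 1]) / 2 * (times[k + 1] - times[k])
                 for k in range(len(times) - 1))
    return energy / (times[-1] - times[0])

def half_rail_delay(times, volts, vdd, t_input=0.0):
    """Time from the input transition to the first crossing of vdd / 2
    on the output waveform, using linear interpolation between samples."""
    half = vdd / 2
    for k in range(len(times) - 1):
        v0, v1 = volts[k], volts[k + 1]
        if v0 != v1 and (v0 - half) * (v1 - half) <= 0:
            fraction = (half - v0) / (v1 - v0)
            return times[k] + fraction * (times[k + 1] - times[k]) - t_input
    raise ValueError("output never crosses the half-rail point")

def dp_product(times, power, volts, vdd, t_input=0.0):
    """Delay-power product per transition (watts times seconds, i.e. joules)."""
    return average_power(times, power) * half_rail_delay(times, volts, vdd, t_input)

# Hypothetical waveform, not simulation data: constant 2 W, output rising 0 V to 4 V.
dp = dp_product([0.0, 1.0, 2.0], [2.0, 2.0, 2.0], [0.0, 0.0, 4.0], vdd=4.0)
```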
For the binary circuits, the average rail and output powers
were simulated in the same way but for single operand
changes on both inputs. These changes were chosen so that
approximately half of the output and input bits changed. The
delay times were found for worst case operands and were
measured from the time of the input change (simultaneous on
all inputs) to the 50% point on the last output bit to settle.
The DP-product as a function of modulus is plotted in
Fig. 11 for TG OHR circuits. Analytical expressions, found
using an MS EXCEL curve fitting tool, are also given. It can
be seen that the DP-product is reduced below that of binary
adders and multipliers by at least 35% and 90%, respectively,
Fig. 15. (a) PA adder and (b) register element.
TABLE I
CRITICAL PATH DELAY (TRANSISTORS)
Fig. 17. AI unit architecture.
word line is asserted and the bit lines evaluate. Output data are
active low and are converted to active high by the inverting
action of the SA/OBs. Note that each word line drives only
two transistor gates (due to the OHR encoding).
The Encoder unit generates the msb and next-msb from
the even modulus of the Scaler output. It does this with NOR
combinational logic which classifies the input magnitude as
being in the upper half range (msb) or in either the second
or fourth quartile (next-msb). The DAC is a standard high-speed two's-complement type with a sign invert (SI) input
which toggles the sign of the output.
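The classification performed by the Encoder can be modeled functionally as below. This is a sketch only: it assumes the even modulus is a multiple of 4 and is presented as a list of one-hot lines, it uses OR-style reductions rather than the NOR gates of the actual circuit, and the function name `encode_msbs` is hypothetical.

```python
def encode_msbs(lines):
    """Return (msb, next_msb) for a one-hot digit of a modulus divisible by 4.

    msb is set when the asserted line lies in the upper half of the range;
    next_msb is set when it lies in the second or the fourth quartile.
    """
    m = len(lines)
    if m % 4 != 0:
        raise ValueError("modulus must be a multiple of 4")
    q = m // 4                                        # lines per quartile
    msb = int(any(lines[2 * q:]))                     # upper half range
    next_msb = int(any(lines[q:2 * q]) or any(lines[3 * q:]))
    return msb, next_msb

# Modulus 8 (an assumed example), value 5: upper half, third quartile.
digit = [0, 0, 0, 0, 0, 1, 0, 0]
bits = encode_msbs(digit)
```

For a power-of-two modulus these two outputs coincide with the two most significant bits of the binary value of the digit.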
There are two primary limitations of the OHR design. First,
the size of the adders and multipliers grows as the square of the
modulus. For large frequency resolutions (i.e., wide PAs) the
chip area could become large enough to prohibit cost-effective
fabrication. We are presently researching decomposition methods whereby large moduli can be partitioned into smaller ones
which consume much less area.
The second limitation is routing area. Operand widths in
the OHR number system increase linearly with the modulus.
Consequently, chip routing area is large if few metal layers are
TABLE II
POWER ESTIMATES (TRANSISTORS)
which vary greatly. Other data (not shown) indicate that the
reduction in delay is relatively independent of modulus size,
and that the DP-product reduction is due to both delay and
power reduction (at least 60% for each for all modulus sets
Furthermore, changes in
and have
and values of
modest effects on the DP-product.
V. CONCLUSIONS
The OHR number system appears to offer a significant reduction of the DP-product of CMOS arithmetic circuits below
that of binary number system or binary-encoded RNS circuits.
OHR-based arithmetic circuits offer several other advantages,
including regular layout, operand-independent delay, gate-free
operations such as inversion and modulus conversion, and
simplicity. We have presented SPICE simulation results which
indicate that the ripple carry adder (Wallace tree multiplier)
DP-product is reduced by as much as 70% (95%) for smaller
moduli (e.g., 17), and that this improvement is due primarily
to delay, rather than power, reduction.
Use of the OHR is exemplified in the design of an
OHR-based direct digital frequency synthesizer for frequency-hopped spread-spectrum communication systems. Estimates of
its DP-product indicate a reduction by at least 90% below that
of a recently proposed binary-encoded RNS-based synthesizer.
ACKNOWLEDGMENT
The author is grateful to W. Ivancic of the NASA Lewis Research Center, Space Communications/Electronics Division,
for supporting this research. The simulation assistance of C.
Brogdon and D. Andrevska is much appreciated.
REFERENCES
[1] R. Krishnamurthy, I. Lys, and L. Carley, "Static power driven voltage
scaling and delay driven buffer sizing in mixed swing QuadRail for
Sub-1V I/O swings," in Proc. 1996 Int. Symp. Low Power Electronics
Design.
[2] S. Rajgopal, "Challenges in low power microprocessor design," in Proc.
9th Int. Conf. VLSI Design: VLSI Mobile Commun., 1996.
[3] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design,
A Systems Perspective, 2nd ed. Reading, MA: Addison Wesley, 1993,
p. 370.
[4] S. Prasad and K. Roy, "Circuit optimization for transistor reordering
for minimization of power consumption under delay constraints," ACM
Trans. Des. Automat. Electron. Syst., vol. 1, no. 2, Apr. 1996.
[5] C. Nagendra, R. M. Owens, and M. J. Irwin, "Unifying carry-sum and
signed-digit number representations for low power," in Proc. 1995 Int.
Symp. Low Power Design.
[6] K. Roy and S. Prasad, "Syclop: Synthesis of CMOS logic for low power
application," in Proc. 1992 Int. Conf. Computer Design.
William A. Chren, Jr. received the Ph.D. degree from The Ohio State University, Columbus,
in 1987.
He is presently an Associate Professor of electrical engineering at Grand Valley State University,
Grand Rapids, MI. His current research interests
include low area/delay/power CMOS, ASIC, and
FPGA architectures for DSP and telecommunications, alternative number systems (e.g., RNS, Galois fields) for DSP, quantum-effect devices and
their use in A/D and D/A conversion, and high-performance chip architectures for encryption/decryption and testing of ATM
network packet switches. He has taught various undergraduate and graduate
courses at Ohio State, Penn State, and the University of Kentucky. He
currently heads the Laboratory for VLSI Development at GVSU and teaches
digital VLSI, electronics, microcontroller, and communications courses.