Beruflich Dokumente
Kultur Dokumente
<;+>=?02$,02@A%.-CB>%ED+F'?GIHKJ&0LM0
Without loss of generality, let’s assume that it is imple- Figure 4. A two-input C-element implemented
mented as as a majority gate.
egfih l h fihEl
k fih j m h fihEj
a a
b b
x c
x_ x
x_ x_
x x_
x_
Figure 5. Schematics for a 2-input and a 3-input C-elements. The central part is used for feedback
paths.
wt
wf
x_
ri
x_
xt_
xf_
wack_
0 + 0 +
en en
X F X ~F X Ft Ff
z
zt_
z_
zf_
Figure 8. Function block with single output. Figure 9. Schematic for a dual-rail function
Both boolean expressions and m are used block
to make the gate combinational.
m S m f j
m S m f j
A symbolic layout is shown in Figure 8 . For example, if
is the dual-rail equality of y and z , i.e. 1 y O{z Sy Oz ,
with the pulldowns unchanged. This simplification requires
the complement of has to be implemented as m y O m y S that m US Sc be an invariant of the system. In other
m z O m y S m z O m y S m z O m z , since unfortunately
words, the condition for applying the transformation is that
we can only implement parallel or-terms. This leads to 5 the function inputs are not reset when U is set. A symbolic
parallel pullups. The added complexity may be significant. nano-layout for this solution is shown in Figure 9. Certain
But it should be observed that the added pullups and pull- QDI templates already satisfy this condition. It is the case
downs are used only to maintain the value of the output in for the control/data decomposition scheme of the pipeline
the floating state(s). Therefore, their possible complexity, stage used in the CAM. (It relies on an easily satisfiable
which translates into a weak drive ( U current), might be isochronicity assumption.) Other QDI templates can eas-
acceptable since those extra terms are not used to switch ily be transformed to satisfy the invariant. It is the case for
the output but only to maintain the current value of the out- the PCHB (“precharge half-buffer”) of the MiniMIPS de-
put. However, the following simplification significantly sign style. Since the validity (usually called LP<^T ) of the
improves the circuit. It applies whenever it is required to data input is always computed, it suffices to replace the en-
compute the dual-rail version of the function, i.e.: able signal with U2 defined as LP<,T9 7 (i.e. L is the
f j f j output of the C-element with and 2P4.T as inputs).
m U m
f l f l
UOW O
5. Implementing an Adder
If the only floating states are UO# O m for the block,
and UO O m for the block, then we can staticize Next, we describe the implementation of an n-bit adder
the PRS as: stage in nano. The adder is designed in the same style as
the one h.¢¥
used h in the CAM. The basic QDI pipeline stage
£ ¤
I
.¡ P T§¦ can be implemented as in Figure 10 us-
ing the control/data decomposition approach of the CAM.
(See, for instance, [7].) As shown in Figure 11, the con-
trol part can be implemented as a simple half-buffer (which
lo ri can be as simple as a C-element in this case); the datapath
li ctrl ro consists of a dual-rail register per bit of input with the two
nand-gates of the read part replaced by the precharge func-
CT tions computing the carry and sum for every pair of input
bits.
PCF
In this design, the condition for removing staticizers is
x reg x f f(x)
satisfied provided a mild isochronicity assumption is ful-
filled, and therefore the function blocks for sum and carry
can use the derived transformation. The isochronicity as-
Figure 10. Control/data decomposition of a sumption is that the delay for the valid inputs to propagate
pipeline stage from the outputs of the registers to the input of the function
blocks should be less than the delay of the adversary path
consisting of the wacks (in parallel), the completion tree,
and the C-element in the control.
Furthermore, in order to avoid propagating the control
signal U (which is also ¨ª© in this case) to all bits of the
datapath, we can replace U with } S|} in all cells but the
least significant one, where } and } are the pair of bits
implementing the carry-in. The nano-layouts for the carry
and sum functions are shown in Figure 12 and Figure 13.
The production rules describing the pullups and pulldowns
of the sum and carry cells are as follows. For the sum:
f l
} |
O Py zT«SQ} OPyW ¬ z T
f l
} OPy
lo_ ri zTS} OPyW ¬ zT
m } O m f j
li C } S m
m } O m f j
ro } S m
x1 reg1 x1 F1 f(x1)
wack
6. Isochronic Fork
x2 reg2 x2 F2 f(x2) Once all QDI building blocks have been defined, the
next step is to compose them into a working system. We
leave aside the issue of assembling 1 into a layout and con-
Figure 11. A control/data pipeline stage with centrate on the critical issue of isochronic forks: Given
half-buffer control the wide timing variations to be expected in this technol-
ogy and in nano-CMOS, can we satisfy the timing require-
ments of isochronic forks? Although the Async reader
should be familiar with the concept of isochronic fork, there
still exists enough misunderstanding about the nature of
the isochronicity requirement that a discussion is in order.
In the past, a simple-to-explain timing condition has often
0 + hEj l
that h precedes ¯ .h In an implementation of the circuit,
ct gate ± with output
h
is directly connected to gate ±#¯ with
cf output ¯ , i.e., is an input of ±#¯ .
bt Hence, a necessary condition for an asynchronous cir-
bf cuit to be delay-insensitive is that all transitions are ac-
at knowledged.
af
Unfortunately, the class of computations in which all
dt_
transitions are acknowledged is very limited, as we proved
in [8]. Consider the following example in which the specifi-
df_
cation of the computation
h
requires an ordering of transitions
dt_
on variables , ¯ , and defined by the sequence:
df_
hEjZ¢ jZ¢ ¢hEjE¢ j
²³²´² ¯ ²´²³² ²³²´²
Figure 12. Carry-out computation Implementing this sequence of transitions requires intro-
ducing at least hEone
j
control j variable } to distinguish the
hZj
states in j which causes ¯ from the states in which
0 + causes , leading to PRs of the form:
h f j
ct
}O ¯
m & h f
a j
cf } O
bt h h
As h shown in Figure 14, as the output of gate ± is h forked
bf
to , an input of gate ±#¯ with output ¯ , andhEj to 6 , an
at
input of gate ±& with output . Aj transition when }
af holds j is followed by a h transition ¯ , but not by a transi-
j
st_ tion , i.e. transition is acknowledged but transition
h j
sf_ 6 is not, and vice versa when m } holds. Hence, in ei-
st_ ther case, a transition on one output of the fork is not ac-
sf_
knowledged. In order to guarantee that the unacknowledged
transition completes without violating the specified order, a
timing assumption called the isochronicity assumption has
Figure 13. Sum computation to be introduced, and the forks that require that assumption
are called isochronic forks[8]. The branch of the fork with
the non-acknowledged transition is called an isochronic
branch. (Not all forks in a QDI circuit are isochronic.)
been used to implement the isochronic fork; but, in the pres-
µ¶J,0n·ª3HKJ&@)$^"4HE"*'D¹¸33F,Mb,'A"4$
ence of important timing variations, this sufficient timing
condition is difficult to implement as it relies on a two-
sided timing inequality. The designer should switch to a First, it is important to realize that the weakest timing
more complex but easier to satisfy necessary and sufficient assumption associated with an isochronic fork is not a re-
condition. This condition relies on the notion of adversary lation between the delays on the different branches of the
path. It is a one-sided inequality stating that a single tran- fork—which would be a very tight assumption indeed. For
sition should be shorter in time than a sequence of multiple example, in the case of a fork with two branches, if ºKP<» T
transitions. and ºKP<»§6T are the delays of transition » on one branch and
Let us first review how the need for isochronic forks transition 6 on the other branch, the timing assumption is
arises. A computation implements a partial order of tran- NOT that ¼ ºKPX» T+½¾ºKPX»§6ATU¼E¿xÀ . As already mentioned, such
sitions. In the absence of timing assumptions, this partial a timing requirement has been used because of its simplic-
order is
hEj
based on a causalityl relation. For example, tran- hEj
ity. It is sufficient but not necessary, and is very difficult to
sition causes transition ¯ in state ° if and only if fulfill in the presence of large parameter variations, in par-
k l l
makes guard ¯ of ¯ hZtrue j
in ° . Transition ¯ is said to ticular threshold voltages.
acknowledge transition . We do not have to be more hZj
spe- In order to characterize a weaker timing condition for an
cific
l
about the precise ordering in time of transitions and isochronic fork to behave properly, let us examine how a
¯ . The acknowledgment relation is enough to introduce circuit will fail when a transition on an isochronic branch h j
the desired partial order among transitions, and to conclude fails to complete. In our previous example, assume that
li (0−>1) li2(0−>1)
c Gy y ro (0−>?)
li1(0−>1)
x1 x2_(1−>0)
x C
x2 x1_ ri_ (1)
lo_ ri_
c Gz z
Acknowledgments
References