Abstract
A novel methodology and algorithm for the design of
large low-power asynchronous systems are described.
The system is synthesized by a commercial tool as a
synchronous circuit, and subsequently converted into an
asynchronous one. The conversion algorithm consists of
extracting input and output sets, replacing the storage
elements, identifying fork and join sets, and constructing
request and acknowledge networks. A DLAP (Doubly
Latched Asynchronous Pipeline) architecture is
employed. The resulting asynchronous circuit can adapt
its effective operating frequency to the supply voltage,
facilitating flexible and efficient power management. The
algorithm has been validated on several circuits.
1. Introduction
2. DLAP
[Figures: DLAP stage block diagram (blocks LM, LS, CL and a control block with ports Ri, Ai, Ro, Ao) and a signal transition diagram over Ri, Ai, Ro, Ao, G, B, Lm, Ls, Dm, Ds; diagrams not recoverable.]
3.
[Figures: example circuit diagrams (labels FF, CL, CLK, I1 to I4, O1 to O4, and after conversion Lm, Ls, DL, Cntl, AO, RO); diagrams not recoverable.]
At this second step, the algorithm identifies all flip-flops or registers. Each flip-flop is replaced by a DLAP
single-bit register and control logic. Only one control
logic block is required for all the bits of the same register.
Figure 6 shows the result of this step applied to the
example circuit of Figure 5.
The four handshake signals of each controller (RI, AI,
RO, AO) are interconnected to other controllers during
the steps below that construct the Request and
Acknowledge networks. But before they can be
interconnected, Fork and Join sets must be generated.
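The replacement step can be illustrated with a minimal sketch. The data model here is an assumption made for illustration only: the flip-flop list as (register, bit) pairs and the names `Controller`, `DlapBit`, and `replace_flip_flops` are hypothetical and do not come from the paper. The sketch shows only the bookkeeping described above: one DLAP single-bit register per flip-flop, one shared controller per register, and the four handshake ports left unconnected until the request and acknowledge networks are built.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical model of one control logic block, shared by a register."""
    register: str
    # Handshake ports stay unconnected (None) at this step.
    ports: dict = field(
        default_factory=lambda: {"RI": None, "AI": None, "RO": None, "AO": None}
    )


@dataclass
class DlapBit:
    """Hypothetical model of one DLAP single-bit register."""
    name: str
    controller: Controller


def replace_flip_flops(flip_flops):
    """Map (register, bit) flip-flops to DLAP bits with one controller per register."""
    by_register = defaultdict(list)
    for register, bit in flip_flops:
        by_register[register].append(bit)

    controllers = {}
    bits = []
    for register, bit_indices in by_register.items():
        ctrl = Controller(register)  # one controller for all bits of this register
        controllers[register] = ctrl
        for bit in sorted(bit_indices):
            bits.append(DlapBit(f"{register}[{bit}]", ctrl))
    return bits, controllers


# Assumed example: a 2-bit register R1 and a 1-bit register R2.
flip_flops = [("R1", 0), ("R1", 1), ("R2", 0)]
bits, controllers = replace_flip_flops(flip_flops)
```

In this sketch the two bits of `R1` share a single `Controller` object, which mirrors the statement that only one control logic block is required per register.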
[Figure: example circuit with the request network (signals req_i1 to req_i4, req_o1 to req_o4; delays d11, d12, d23, d33); diagram not recoverable.]
Figure 8: Fork(I1)
[Figure: example circuit with request (req_*) and acknowledge (ack_*) signals; diagram not recoverable.]
4. Analysis
[Figures: circuit diagrams with the request network and acknowledge network (labels FF, CLOCK, Cntl, cntl logic, Lm, Ls, DL, CL, req_i, req_o, ack_i, ack_o); diagrams not recoverable.]
5. Conclusions
Ckt  Function                  Gates in   Added control /  Gates added   Total gates   Area
                               sync ckt   delay (note)     in async ckt  in async ckt  growth
N1   Test Circuit                    18   14                        37            55   206%
N2   Test Circuit                    32   10                        56            88   175%
N3   Error Detector                 113   32                       147           260   130%
N4   Traffic Light Controller        64   48                        71           135   111%
N5   Data Path                      780   74                       210           990    27%
N6   FIR Filter                    1148   10, 93                   293          1441    26%

Note: the original table has separate columns for added control logic and added delay units. Only the FIR Filter row retains two values; for the other rows the column of the surviving value is not recoverable.

[Plot: vertical axis 0% to 250%, horizontal axis 0 to 1200; axis titles not recoverable.]