
Architecture exploration in Bluespec

Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology

Guest Lecture 6.973 (lecture 7)

Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.
MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.
March 1, 2006 Downloaded on [DD Month YYYY]. L-1
Chip design has become too risky a business
Ever increasing size and complexity
• Microprocessors: 100M gates ⇒ 1000M gates
• ASICs: 5M to 10M gates ⇒ 50M to 100M gates
Ever increasing costs and design team sizes
• > $10M for a 10M gate ASIC
• > $1M per re-spin in case of an error (does not include the redesign costs, which can be substantial)
18 months to design but only an eight-month selling opportunity in the market
• Fewer new chip-starts every year
• Looking for alternatives, e.g., FPGAs

Typical SOC Architecture
For example: Cell phone
• Hardware/software development needs to be tightly coupled in order to meet performance/power/cost goals
• System validation for functionality and performance is very difficult
• Stable platform for software development
• IP block reuse is essential to mitigate development costs

IP = Intellectual Property
IP re-use sounds great until you start to use it...
Example: Commercially available FIFO IP block
[Figure: FIFO block with ports data_in, data_out, push_req_n, pop_req_n, full, empty, clk, rstn]
No machine verification of such informal constraints is feasible
These constraints are spread over many pages of the documentation...
New semantics for expressing behavior to reduce design complexity
Decentralize complexity: Rule-based specifications (Guarded Atomic Actions)
• Let us think about one rule at a time
Formalize composition: Modules with guarded interfaces
• Automatically manage and ensure the correctness of connectivity, i.e., correct-by-construction methodology
• Retain resilience to changes in design or layout, e.g. compute latency ∆’s
• Promote regularity of layout at macro level

Bluespec
Bluespec promotes composition through guarded interfaces
Self-documenting interfaces; automatic generation of logic to eliminate conflicts in use.
[Figure: theModuleA and theModuleB share theFifo through enqueue arbitration control and dequeue arbitration control]
theModuleA:
    theFifo.enq(value1);
    theFifo.deq();
    value2 = theFifo.first();
theModuleB:
    theFifo.enq(value3);
    theFifo.deq();
    value4 = theFifo.first();
theFifo (FIFO) methods and their guards:
    enq:   enab, rdy = not full
    deq:   enab, rdy = not empty
    first: rdy = not empty
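The guarded-interface idea can be sketched in a few lines of Python. This is only a behavioral illustration, not Bluespec: the class name GuardedFifo, the depth parameter, and the try_fire helper are assumptions made for the sketch. Each method has a guard (the rdy signal in the figure), and a caller may invoke the method only when the guard holds.

```python
from collections import deque


class GuardedFifo:
    """Behavioral model of a FIFO whose methods carry ready guards."""

    def __init__(self, depth=2):
        self.depth = depth              # hypothetical capacity, not from the slides
        self.items = deque()

    # Guards: these play the role of the rdy signals.
    def can_enq(self):
        return len(self.items) < self.depth     # not full

    def can_deq(self):
        return len(self.items) > 0              # not empty

    def can_first(self):
        return len(self.items) > 0              # not empty

    # Methods: calling one while its guard is false is a protocol error.
    def enq(self, value):
        if not self.can_enq():
            raise RuntimeError("enq called while full")
        self.items.append(value)

    def deq(self):
        if not self.can_deq():
            raise RuntimeError("deq called while empty")
        self.items.popleft()

    def first(self):
        if not self.can_first():
            raise RuntimeError("first called while empty")
        return self.items[0]


def try_fire(guard, action):
    """Run action only when its guard holds; report whether it fired."""
    if guard():
        action()
        return True
    return False


fifo = GuardedFifo(depth=2)
# The third enqueue is held back by the guard instead of corrupting the FIFO.
fired = [try_fire(fifo.can_enq, lambda v=v: fifo.enq(v)) for v in (1, 2, 3)]
```

In Bluespec the compiler generates this guard checking as hardware; in the sketch the caller has to go through try_fire by hand.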

In Bluespec SystemVerilog (BSV) …
Power to express complex static structures and constraints
• Checked by the compiler
“Micro-protocols” are managed by the compiler
• The compiler generates the necessary hardware (muxing and control)
• Micro-protocols need less or no verification
Easier to make changes while preserving correctness

⇒ Smaller, simpler, clearer, more correct code


Bluespec: State and Rules organized into modules
[Figure: modules, each exposing an interface]
All state (e.g., Registers, FIFOs, RAMs, ...) is explicit.
Behavior is expressed in terms of atomic actions on the state:
    Rule: condition ⇒ action
Rules can manipulate state in other modules only via their interfaces.
Programming with rules: A simple example
Euclid’s algorithm for computing the Greatest Common Divisor (GCD):
    15    6
     9    6    subtract
     3    6    subtract
     6    3    swap
     3    3    subtract
     0    3    subtract
    answer: 3
GCD in BSV
[Figure: registers x and y feeding the swap and sub rules]
// int is a typedef for Int#(32)
module mkGCD (I_GCD);
    // State
    Reg#(int) x <- mkRegU;
    Reg#(int) y <- mkReg(0);

    // Internal behavior
    rule swap ((x > y) && (y != 0));
        x <= y; y <= x;
    endrule
    rule subtract ((x <= y) && (y != 0));
        y <= y - x;
    endrule

    // External interface
    method Action start(int a, int b) if (y == 0);
        x <= a; y <= b;
    endmethod
    method int result() if (y == 0);
        return x;
    endmethod
endmodule
Assumes x /= 0 and y /= 0
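To see how the two rules and the two guarded methods interact, here is a rule-level model in Python. It is a sketch under assumptions, not what the Bluespec compiler produces: the names GcdModel, step and run_gcd are invented for illustration, x starts at 0 as a stand-in for the undefined value mkRegU leaves, and at most one rule fires per step.

```python
from math import gcd


class GcdModel:
    """Rule-level model of mkGCD: two registers, two rules, two guarded methods."""

    def __init__(self):
        self.x = 0      # stand-in value; mkRegU leaves x undefined in hardware
        self.y = 0

    # Guarded methods: both are ready only when y == 0.
    def can_start(self):
        return self.y == 0

    def start(self, a, b):
        assert self.can_start(), "start is guarded by y == 0"
        self.x, self.y = a, b

    def can_result(self):
        return self.y == 0

    def result(self):
        assert self.can_result(), "result is guarded by y == 0"
        return self.x

    def step(self):
        """Fire at most one enabled rule; return its name, or None when idle."""
        x, y = self.x, self.y           # rules read the state from before the step
        if x > y and y != 0:
            self.x, self.y = y, x       # rule swap
            return "swap"
        if x <= y and y != 0:
            self.y = y - x              # rule subtract
            return "subtract"
        return None


def run_gcd(a, b):
    """Start a computation and step until result becomes ready."""
    g = GcdModel()
    g.start(a, b)
    trace = []
    while not g.can_result():
        trace.append(g.step())
    return g.result(), trace


answer, trace = run_gcd(15, 6)
assert answer == gcd(15, 6)
```

Because the rule predicates are mutually exclusive, the order in which step tests them does not matter here. As the slide notes, both inputs must be nonzero; with a = 0 the subtract rule would never make progress.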
GCD Hardware Module
[Figure: GCD module with method start (two int inputs, enab, rdy) and method result (int output, rdy); the implicit condition y == 0 drives both rdy signals]
interface I_GCD;
    method Action start (int a, int b);
    method int result();
endinterface
The module can easily be made polymorphic: declare the interface as I_GCD#(type t) and replace int by t. In a GCD call t could be Int#(32), UInt#(16), Int#(13), ...
Many different implementations can provide the same interface: module mkGCD (I_GCD)

Bluespec Tool flow
[Figure: tool flow. Bluespec SystemVerilog source feeds the Bluespec Compiler, which emits C (for the cycle-accurate Bluespec C sim) and Verilog 95 RTL (for Verilog sim and for RTL synthesis to gates). Other boxes: Blueview, VCD output, Debussy Visualization. Legend: files, Bluespec tools, 3rd party tools.]
Generated Verilog RTL: GCD
module mkGCD(CLK,RST_N,start_a,start_b,EN_start,RDY_start,
result,RDY_result);
input CLK; input RST_N;
// action method start
input [31 : 0] start_a; input [31 : 0] start_b; input EN_start;
output RDY_start;
// value method result
output [31 : 0] result; output RDY_result;
// register x and y
reg [31 : 0] x;
wire [31 : 0] x$D_IN; wire x$EN;
reg [31 : 0] y;
wire [31 : 0] y$D_IN; wire y$EN;
...
// rule RL_subtract
assign WILL_FIRE_RL_subtract = x_SLE_y___d3 && !y_EQ_0___d10 ;
// rule RL_swap
assign WILL_FIRE_RL_swap = !x_SLE_y___d3 && !y_EQ_0___d10 ;
...
Generated Hardware
[Figure: registers x and y with enables x_en and y_en; the predicates x > y and !(y = 0) produce swap? and subtract?; a subtractor and the swap wiring produce the next state values]
x_en = swap?
y_en = swap? OR subtract?

Generated Hardware Module
[Figure: the same datapath with the methods added: start (inputs x and y, wen = start_en, rdy) and result (output x, rdy)]
x_en = swap? OR start_en
y_en = swap? OR subtract? OR start_en
rdy = (y==0)
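The enable equations can be exercised with a signal-level model of one clock edge. The following Python is a sketch under assumptions: the function name gcd_cycle and its argument names are invented, and start_en is masked with rdy to mimic an environment that asserts the enable only when the method is ready.

```python
def gcd_cycle(x, y, start_en=False, start_a=0, start_b=0):
    """One clock edge of the GCD datapath; returns (next_x, next_y, rdy)."""
    rdy = (y == 0)

    # Predicates computed from the current register values
    swap = (x > y) and (y != 0)
    subtract = (x <= y) and (y != 0)
    start_en = start_en and rdy         # assumption: EN is honored only when RDY

    # Enable signals, as written on the slide
    x_en = swap or start_en
    y_en = swap or subtract or start_en

    # Next-state muxes: start, swap and subtract are mutually exclusive
    if start_en:
        x_d, y_d = start_a, start_b
    elif swap:
        x_d, y_d = y, x
    else:
        x_d, y_d = x, y - x

    next_x = x_d if x_en else x
    next_y = y_d if y_en else y
    return next_x, next_y, rdy


# Load 15 and 6, then clock until the result is ready again.
x, y, _ = gcd_cycle(0, 0, start_en=True, start_a=15, start_b=6)
cycles = 0
while y != 0:
    x, y, _ = gcd_cycle(x, y)
    cycles += 1
```

Since start is ready only when y == 0 and both rules require y != 0, the three sources that write the registers never compete in the same cycle, which is why a simple OR of the enables suffices.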
Design an 802.11a Transmitter
802.11a is an IEEE Standard for wireless communication
Frequency of Operation: 5 GHz band
Modulation: Orthogonal Frequency Division Multiplexing (OFDM)
[Figure: TX MAC -> Transmitter -> Analog TX -> Channel -> Analog RX -> Receiver -> RX MAC]

Transmitter Overview
[Figure: Controller (headers, data) -> Scrambler -> Encoder -> Interleaver -> Mapper -> IFFT -> Cyclic Extend; the IFFT is marked compute intensive]
IFFT transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers
Receiver Overview
[Figure: Synchronizer -> Serial to Parallel -> FFT -> Detector / Deinterleaver -> Viterbi -> Descrambler -> Controller; a "compute intensive" label marks part of the chain]
The FFT, in a half-duplex system, is often shared with the IFFT
IFFT Requirements
802.11a needs to process a symbol in 4 µsec (250 kHz)
• IFFT must output a symbol every 4 µsec
    ◦ i.e. perform an Inverse FFT of 64 complex numbers
• Each module before IFFT must process every 4 µsec
    ◦ 1 frame for 6 Mbps rate
    ◦ 2 frames for 12 Mbps rate
    ◦ 4 frames for 24 Mbps rate
• Even in the worst case (24 Mbps) the clock frequency can be as low as 1 MHz.

But what about the area & power?
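The rate figures on this slide follow from simple arithmetic, sketched below in Python with integer units to avoid rounding. The function names are invented for illustration, and the 1 MHz result rests on the assumption that a module handles one frame per clock cycle (the cycles_per_frame default).

```python
SYMBOL_PERIOD_NS = 4_000            # one OFDM symbol every 4 microseconds

# Symbols per second: 1e9 ns divided by the symbol period
symbol_rate_hz = 1_000_000_000 // SYMBOL_PERIOD_NS

# Frames each pre-IFFT module handles per symbol, keyed by data rate in Mbps
FRAMES_PER_SYMBOL = {6: 1, 12: 2, 24: 4}


def min_clock_hz(rate_mbps, cycles_per_frame=1):
    """Lowest clock that keeps up, assuming a fixed cycle count per frame."""
    return FRAMES_PER_SYMBOL[rate_mbps] * cycles_per_frame * symbol_rate_hz


def bits_per_symbol(rate_mbps):
    """Data bits carried by one symbol: Mbps times microseconds gives bits."""
    return rate_mbps * SYMBOL_PERIOD_NS // 1_000


worst_case_hz = min_clock_hz(24)    # 4 frames per symbol at 250 kHz
```

Under these assumptions a frame works out to 24 data bits at every listed rate, and a module that needs more than one cycle per frame scales the minimum clock by that factor.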

Area-Frequency Tradeoff
We can decrease the area by multiplexing some
circuits and running the system at a higher frequency

Reuse: twice the frequency but half the area

Combinational IFFT
[Figure: inputs in0 ... in63 pass through three columns of 16 Radix 4 nodes (x16); the columns are followed by Permute_1, Permute_2 and Permute_3 respectively, producing out0 ... out63]

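Radix-4 Node
[Figure: inputs k0 ... k3 are multiplied by twid0 ... twid3, then pass through two layers of adders and subtractors, with a multiplication by j on one branch, to produce out0 ... out3]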
Bluespec code: Radix-4 Node
function Tuple4#(Complex, Complex, Complex, Complex)
    radix4(Tuple4#(Complex, Complex, Complex, Complex) twids,
           Complex k0, Complex k1, Complex k2, Complex k3);

    match {.t0, .t1, .t2, .t3} = twids;

    // Multiply each input by its twiddle factor
    Complex m0 = k0 * t0; Complex m1 = k1 * t1;
    Complex m2 = k2 * t2; Complex m3 = k3 * t3;

    // First layer of adds and subtracts
    Complex y0 = m0 + m2; Complex y1 = m0 - m2;
    Complex y2 = m1 + m3; Complex y3 = m1 - m3;

    // y3_j = j * y3
    Complex y3_j = Complex {i: negate(y3.q), q: y3.i};

    // Second layer; z1 and z3 differ only in the sign of y3_j
    Complex z0 = y0 + y2; Complex z1 = y1 - y3_j;
    Complex z2 = y0 - y2; Complex z3 = y1 + y3_j;

    return tuple4(z0, z1, z2, z3);
endfunction
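The dataflow of the radix-4 node can be checked numerically with Python's built-in complex type. This sketch assumes that field i is the real part and q the imaginary part, and that the last output is y1 + y3_j so that the four outputs are distinct. With all twiddles set to 1 the node then matches a direct 4-point DFT that uses the exp(-2*pi*j*n*k/4) kernel; how that sign convention maps onto the inverse transform across the whole design is not shown on the slides.

```python
import cmath


def radix4(twids, k0, k1, k2, k3):
    """Radix-4 node on Python complex numbers, mirroring the slide's dataflow."""
    t0, t1, t2, t3 = twids
    m0, m1, m2, m3 = k0 * t0, k1 * t1, k2 * t2, k3 * t3
    y0, y1 = m0 + m2, m0 - m2
    y2, y3 = m1 + m3, m1 - m3
    y3_j = 1j * y3                  # same effect as Complex {i: -y3.q, q: y3.i}
    return (y0 + y2, y1 - y3_j, y0 - y2, y1 + y3_j)


def dft4(xs):
    """Direct 4-point DFT with the exp(-2*pi*j*n*k/4) kernel, for comparison."""
    return tuple(
        sum(x * cmath.exp(-2j * cmath.pi * n * k / 4) for n, x in enumerate(xs))
        for k in range(4)
    )


samples = (1 + 2j, -3 + 0.5j, 0.25 - 1j, 2 + 2j)
node_out = radix4((1, 1, 1, 1), *samples)
direct_out = dft4(samples)
max_error = max(abs(a - b) for a, b in zip(node_out, direct_out))
```

The node needs 4 complex multiplications for the twiddles but none for the butterfly itself, because multiplying by j is only a swap of the two components and a negation.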
Bluespec code for pure Combinational Circuit
function SVector#(64, Complex) ifft (SVector#(64, Complex) in_data);
    //Declare vectors
    SVector#(64, Complex) stage12_data     = newSVector();
    SVector#(64, Complex) stage12_permuted = newSVector();
    SVector#(64, Complex) stage12_out      = newSVector();
    SVector#(64, Complex) stage23_data     = newSVector();

    //Radix 4 stage 1 (unpermuted)
    for (Integer i = 0; i < 16; i = i + 1)
    begin
        Integer idx = i * 4;
        let twid0 = getTwiddle(0, fromInteger(i));
        match {.y0, .y1, .y2, .y3} = radix4(twid0,
            in_data[idx],     in_data[idx + 1],
            in_data[idx + 2], in_data[idx + 3]);
        stage12_data[idx]     = y0; stage12_data[idx + 1] = y1;
        stage12_data[idx + 2] = y2; stage12_data[idx + 3] = y3;
    end
    //Stage 1 permutation
    for (Integer i = 0; i < 64; i = i + 1)
        stage12_permuted[i] = stage12_data[permute64_1to2[i]];
    //Continued on next slide…

Bluespec code for pure Combinational Circuit continued
    // (* continued from previous *)
    stage12_out = stage12_permuted; //Later implementations will change this
    //Radix 4 stage 2 (unpermuted)
    for (Integer i = 0; i < 16; i = i + 1)
    begin
        Integer idx = i * 4;
        let twid1 = getTwiddle(1, fromInteger(i));
        match {.y0, .y1, .y2, .y3} = radix4(twid1,
            stage12_out[idx],     stage12_out[idx + 1],
            stage12_out[idx + 2], stage12_out[idx + 3]);
        stage23_data[idx]     = y0; stage23_data[idx + 1] = y1;
        stage23_data[idx + 2] = y2; stage23_data[idx + 3] = y3;
    end
    //Stage 2 permutation
    for (Integer i = 0; i < 64; i = i + 1)
        stage23_permuted[i] = stage23_data[permute64_2to3[i]];

    //Repeat for Stage 3

    return stage3out_permuted;
endfunction

Pipelined IFFT
[Figure: the same three columns of 16 Radix 4 nodes (x16) with Permute_1, Permute_2 and Permute_3, from in0 ... in63 to out0 ... out63, now with registers between the stages]
Put a register to hold 64 complex numbers at the output of each stage.
Even more hardware but clock can go faster: less combinational circuitry between two stages
Bluespec code for Pipeline Stage
module mkIFFT_Pipelined() (I_IFFT);
    //Declare vectors
    SVector#(64, Complex) in_data;
    SVector#(64, Complex) stage12_data = newSVector();

    //Declare FIFOs
    FIFO#(SVector#(64, Complex)) in_fifo <- mkFIFO();
    //Declare pipeline registers
    Reg#(SVector#(64, Complex)) stage12_reg <- mkReg(newSVector());
    Reg#(SVector#(64, Complex)) stage23_reg <- mkReg(newSVector());
    //Read input
    in_data = in_fifo.first();
    //Radix 4 stage 1 (unpermuted)
    for (Integer i = 0; i < 16; i = i + 1)
    begin
        Integer idx = i * 4;
        let twid0 = getTwiddle(0, fromInteger(i));
        match {.y0, .y1, .y2, .y3} = radix4(twid0,
            in_data[idx], in_data[idx + 1],
    //Continue as before…

Bluespec code for Pipeline Stage (continued)
    //Read from pipe register for stage 2
    stage12_out = stage12_reg;

    //Radix 4 stage 2 (unpermuted)
    for (Integer i = 0; i < 16; i = i + 1)

    //Read from pipe register for stage 3
    stage23_out = stage23_reg;

    rule writeRegs (True);
        stage12_reg <= stage12_permuted;
        stage23_reg <= stage23_permuted;
        in_fifo.deq(); out_fifo.enq(stage3out_permuted);
    endrule

    method Action inp (SVector#(64, Complex) data);
        in_fifo.enq(data);
    endmethod

endmodule
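The timing behavior of the pipelined version can be illustrated with a small Python model. Everything here is a stand-in: stage is a hypothetical placeholder for "radix-4 stage s followed by its permutation" (the real twiddles and permutation tables are not given on the slides), and None marks an empty pipeline slot, which the slide code does not track.

```python
def stage(s, data):
    """Hypothetical stand-in for radix-4 stage s followed by its permutation."""
    return tuple(2 * v + s for v in reversed(data))


def reference(data):
    """All three stages back to back, as in the combinational version."""
    return stage(2, stage(1, stage(0, data)))


class PipelinedModel:
    """Two pipeline registers between three stages; None marks a bubble."""

    def __init__(self):
        self.stage12_reg = None
        self.stage23_reg = None

    def clock(self, in_data):
        """Advance one cycle; returns the symbol finishing stage 3, or None."""
        # Each assignment reads the register value from before this cycle.
        out = None if self.stage23_reg is None else stage(2, self.stage23_reg)
        self.stage23_reg = (
            None if self.stage12_reg is None else stage(1, self.stage12_reg)
        )
        self.stage12_reg = None if in_data is None else stage(0, in_data)
        return out


pipe = PipelinedModel()
symbols = [(1, 2, 3, 4), (5, 6, 7, 8), (0, 0, 1, 1)]
outputs = [pipe.clock(s) for s in symbols + [None, None]]
```

In this model the first result appears in the third cycle and one further result follows every cycle, so the latency is three clock periods while the throughput is one symbol per clock period.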

Circular pipeline: Reusing the Pipeline Stage
[Figure: in0 ... in63 feed one column of 16 Radix 4 nodes; its output goes through Permute_1, Permute_2 and Permute_3 in parallel, and 64 4-way muxes controlled by a Stage Counter select what is fed back or sent to out0 ... out63]
16 Radix 4s can be shared but not the three permutations. Hence the need for muxes

Bluespec Code for Circular Pipeline
module mkIFFT_Circular (I_IFFT);
    SVector#(64, Complex) in_data        = newSVector();
    SVector#(64, Complex) stage_data     = newSVector();
    SVector#(64, Complex) stage_permuted = newSVector();
    //State elements
    Reg#(SVector#(64, Complex)) data_reg <- mkReg(newSVector());
    Reg#(Bit#(2)) stage_counter <- mkReg(0);
    FIFO#(SVector#(64, Complex)) in_fifo <- mkFIFO();
    //Read input
    in_data = data_reg;
    //Perform a single Radix 4 stage (unpermuted)
    for (Integer i = 0; i < 16; i = i + 1)
    begin
        Integer idx = i * 4;
        let twid = getTwiddle(stage_counter, fromInteger(i));
        match {.y0, .y1, .y2, .y3} = radix4(twid,
            in_data[idx],     in_data[idx + 1],
            in_data[idx + 2], in_data[idx + 3]);
        stage_data[idx]     = y0; stage_data[idx + 1] = y1;
        stage_data[idx + 2] = y2; stage_data[idx + 3] = y3;
    end
    //Continued…
Bluespec Code for Circular Pipeline (continued)
    //Stage permutation
    for (Integer i = 0; i < 64; i = i + 1)
        stage_permuted[i] = case (stage_counter)
            0: return in_wire._read[i];
            1: return stage_data[permute64_1to2[i]];
            2: return stage_data[permute64_2to3[i]];
            3: return stage_data[permute64_3toOut[i]];
        endcase;

    rule writeRegs (True);
        data_reg <= stage_permuted;
        stage_counter <= stage_counter + 1;
    endrule

    method Action inp(SVector#(64, Complex) data) if (stage_counter == 0);
        in_fifo.enq(data);
        stage_counter <= 1;
    endmethod

endmodule
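The reuse of one stage under control of a counter can be modeled in Python as follows. This is a sketch with stand-ins: stage is a hypothetical placeholder for one radix-4 stage plus the permutation selected by the counter, and the class and method names are invented. Following the case statement above, counter value 0 accepts a new input and values 1 to 3 each apply one stage.

```python
def stage(s, data):
    """Hypothetical stand-in for radix-4 stage s followed by its permutation."""
    return tuple(2 * v + s for v in reversed(data))


def reference(data):
    """Three distinct stages in sequence, for comparison."""
    return stage(2, stage(1, stage(0, data)))


class CircularModel:
    """One shared stage, one data register, and a 2-bit stage counter."""

    def __init__(self):
        self.data_reg = None
        self.stage_counter = 0

    def can_inp(self):
        return self.stage_counter == 0          # guard of the inp method

    def inp(self, data):
        assert self.can_inp(), "inp is guarded by stage_counter == 0"
        self.data_reg = tuple(data)
        self.stage_counter = 1

    def clock(self):
        """One compute cycle; returns the result when the last stage completes."""
        if self.stage_counter == 0:
            return None                         # idle, waiting for input
        # The counter picks which twiddles and which permutation are applied.
        self.data_reg = stage(self.stage_counter - 1, self.data_reg)
        self.stage_counter = (self.stage_counter + 1) % 4   # Bit#(2) wraps 3 -> 0
        return self.data_reg if self.stage_counter == 0 else None


circ = CircularModel()
circ.inp((1, 2, 3, 4))
results = [circ.clock() for _ in range(3)]
```

The model accepts no new symbol while one is in flight, which is the price of sharing the 16 radix-4 nodes: the hardware shrinks, and each symbol occupies the stage for three compute cycles.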

Just one Radix-4 node!
[Figure: in0 ... in63 pass through 4 16-way muxes into a single Radix 4 node, then through 4 16-way demuxes; Permute_1, Permute_2 and Permute_3 feed 64 4-way muxes that drive out0 ... out63. An Index Counter (0 to 15) selects the node inputs and a Stage Counter (0 to 2) selects the permutation.]
The two stage registers can be folded into one

Bluespec Code for Extreme reuse
module mkIFFT_SuperCircular (I_IFFT);
    SVector#(64, Complex) in_data    = newSVector();
    SVector#(64, Complex) stage_data = newSVector();
    SVector#(64, Complex) permutedV  = newSVector();
    //State
    Reg#(SVector#(64, Complex)) data_reg <- mkReg(newSVector());
    Reg#(SVector#(64, Complex)) post_reg <- mkReg(newSVector());
    Reg#(Bit#(2)) stage_counter <- mkReg(0);
    Reg#(Bit#(5)) idx_counter <- mkReg(16);
    FIFO#(SVector#(64, Complex)) in_fifo <- mkFIFO();
    //Read input
    in_data = data_reg;
    //Do one-sixteenth of a Radix 4 stage (unpermuted)
    //idx = idx_counter * 4; only the low 4 bits are used so that idx is 6 bits wide
    Bit#(6) idx = {idx_counter[3:0], 2'b00};
    //Use DYNAMIC select and update of the Vector
    let twid = getTwiddle(stage_counter, idx_counter);
    match {.y0, .y1, .y2, .y3} = radix4(twid,
        select(in_data, idx),     select(in_data, idx + 1),
        select(in_data, idx + 2), select(in_data, idx + 3));
    //generates post_reg after writing in the 4 new values
    let stage_data0 = post_reg;
    let stage_data1 = update(stage_data0, idx,     y0);
    let stage_data2 = update(stage_data1, idx + 1, y1);
    let stage_data3 = update(stage_data2, idx + 2, y2);
    stage_data      = update(stage_data3, idx + 3, y3);
    //Continued…
Bluespec Code for Extreme reuse-2
    //Permutation is based on the current stage
    for (Integer i = 0; i < 64; i = i + 1)
        permutedV[i] = case (stage_counter)
            1: return post_reg[permute64_1to2[i]];
            2: return post_reg[permute64_2to3[i]];
            3: return post_reg[permute64_3toOut[i]];
            default: return in_fifo.first()[i];
        endcase;

    rule writeRegs (stage_counter != 0);
        post_reg <= stage_data;
        if (idx_counter == 16)
            data_reg <= permutedV;
        idx_counter <= (idx_counter == 16) ? 0 : idx_counter + 1;
        if (idx_counter == 16)
            stage_counter <= stage_counter + 1;
    endrule
    //Everything else as before…
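A Python model makes the cost of folding down to one node concrete. The node and permute functions below are hypothetical stand-ins (the real radix4 arithmetic, twiddles and permutation tables are replaced by simple integer operations), stages are numbered 0 to 2 for simplicity, and the cycle accounting assumes 16 node cycles plus one permute-and-write-back cycle per stage, as suggested by idx_counter running from 0 to 16.

```python
NODES = 16      # radix-4 node evaluations per stage (64 values, 4 per node)


def node(stage, i, quad):
    """Hypothetical stand-in for radix4 with the twiddles of (stage, i)."""
    a, b, c, d = quad
    return (a + c + stage, b - d + i, a - c, b + d)


def permute(stage, data):
    """Hypothetical stand-in permutation: rotate by stage + 1 positions."""
    r = stage + 1
    return data[r:] + data[:r]


def full_stage(stage, data):
    """All 16 nodes at once, as in the circular pipeline."""
    out = []
    for i in range(NODES):
        out.extend(node(stage, i, data[4 * i:4 * i + 4]))
    return permute(stage, out)


def single_node_ifft(data):
    """One node evaluation per cycle; returns (result, cycle count)."""
    data_reg, cycles = list(data), 0
    for stage in range(3):
        post_reg = list(data_reg)
        for idx_counter in range(NODES):            # 16 node cycles
            idx = 4 * idx_counter
            post_reg[idx:idx + 4] = node(stage, idx_counter, data_reg[idx:idx + 4])
            cycles += 1
        data_reg = permute(stage, post_reg)         # write-back cycle (counter == 16)
        cycles += 1
    return data_reg, cycles


expected = list(range(64))
for s in range(3):
    expected = full_stage(s, expected)
folded, cycle_count = single_node_ifft(range(64))
```

The folded schedule yields the same values as three full stages, at a cost of 17 cycles per stage in this model.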

Synthesis results
Nirav Dave & Mike Pellauer

Design      Area (mm2)   CLK Period   Throughput   Latency (1 symbol)
Comb.       1.03         15 ns        15 ns        15 ns
Pipelined   1.46         7 ns         7 ns         21 ns
Circular    0.83         8 ns         24 ns        24 ns
1 Radix     0.23         8 ns         408 ns       408 ns

TSMC .13 micron; numbers reported are before place and route.

The single radix-4 node design is about ¼ the size of the combinational design but still meets the throughput requirement easily; the clock can be reduced to 15 to 20 MHz
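The table can be cross-checked against the 4 µsec symbol time with a short Python calculation. The function names are invented for the sketch, and it assumes that the number of clock cycles per symbol stays the same when the clock is slowed down.

```python
SYMBOL_PERIOD_NS = 4_000    # 802.11a symbol time, 4 microseconds

# (clock period in ns, time per symbol in ns), copied from the table
DESIGNS = {
    "Comb.": (15, 15),
    "Pipelined": (7, 7),
    "Circular": (8, 24),
    "1 Radix": (8, 408),
}


def cycles_per_symbol(name):
    """Clock cycles one symbol occupies the design."""
    clk_ns, per_symbol_ns = DESIGNS[name]
    return per_symbol_ns // clk_ns


def min_clock_mhz(name):
    """Slowest clock that still finishes one symbol every 4 microseconds."""
    return cycles_per_symbol(name) * 1_000 / SYMBOL_PERIOD_NS


single_node_cycles = cycles_per_symbol("1 Radix")
single_node_mhz = min_clock_mhz("1 Radix")
```

Under that assumption the single-node design needs 51 cycles per symbol and therefore at least 12.75 MHz, which leaves some margin at the 15 to 20 MHz quoted on the slide.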
Synthesis results
Steve Gerding, Elizabeth Basha & Rose Liu

Design             Area (mm2)   CLK Period   Throughput   Latency (1 symbol)
Comb.              29.12        63 ns        63 ns        63 ns
Circular-2stages   5.19         30 ns        90 ns        180 ns
Circular           4.57         33 ns        99 ns        99 ns

TSMC .13 micron; numbers reported are after place and route

Two stage circular pipeline design is not good.
Circular pipeline design can meet the throughput requirement at 750 kHz!

Two circular pipelines
[Figure: a Start signal drives two parallel pipelines, each consisting of InputDataQ -> Data and Twiddle Setup -> 16-Node Stage -> OutputDataQ]
Steve Gerding, Elizabeth Basha & Rose Liu


