Bluespec
Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
Cite as: Vladimir Stojanovic, course materials for 6.973 Communication System Design, Spring 2006.
MIT OpenCourseWare (http://ocw.mit.edu/), Massachusetts Institute of Technology.
March 1, 2006 Downloaded on [DD Month YYYY]. L-1
Chip design has become
too risky a business
Ever increasing size and complexity
Microprocessors: 100M gates ⇒ 1000M gates
ASICs: 5M to 10M gates ⇒ 50M to 100M gates
Ever increasing costs and design team sizes
> $10M for a 10M gate ASIC
> $1M per re-spin in case of an error (does not
include the redesign costs, which can be substantial)
Typical SOC Architecture
For example: Cell phone
Hardware/software
development needs to
be tightly coupled in
order to meet
performance/power/
cost goals
System validation for
functionality and
performance is very
difficult
Stable platform for
software development
IP block reuse is
essential to mitigate
development costs
IP = Intellectual Property
IP re-use sounds great until
you start to use it...
[Figure: an IP block with ports data_in, data_out, clk, and rstn. Overlaid annotation: No machine verification of such informal constraints is feasible]
These constraints are spread over many pages of
the documentation...
New semantics for expressing behavior
to reduce design complexity
Decentralize complexity: Rule-based
specifications (Guarded Atomic Actions)
Let us think about one rule at a time
Formalize composition: Modules with
guarded interfaces
Automatically manage and ensure the
correctness of connectivity, i.e., correct-by-
construction methodology
Retain resilience to changes in design or
layout, e.g. compute latency ∆’s
Promote regularity of layout at macro level
Bluespec
Bluespec promotes composition
through guarded interfaces
Self-documenting interfaces; automatic generation of logic to eliminate conflicts in use.
[Figure: theModuleA and theModuleB share theFifo. The FIFO methods carry guards: enq (enab; rdy = not full), deq (enab; rdy = not empty), first (rdy = not empty). Enqueue arbitration control and dequeue arbitration control sit between the modules and the FIFO.]
theModuleA:
   theFifo.enq(value1);
   theFifo.deq();
   value2 = theFifo.first();
theModuleB:
   theFifo.enq(value3);
   theFifo.deq();
   value4 = theFifo.first();
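The guarded-interface idea can be modeled in software. The following is a minimal Python sketch, not Bluespec: the class name GuardedFifo, the depth parameter, and the try_enq helper are illustrative assumptions. Each method is paired with a ready signal, and a caller may only fire the method when that signal is true.

```python
class GuardedFifo:
    """Toy model of a FIFO whose methods carry 'rdy' guards (names are hypothetical)."""

    def __init__(self, depth=2):
        self.depth = depth
        self.items = []

    # Guards: these play the role of the rdy signals in the figure.
    def rdy_enq(self):
        return len(self.items) < self.depth   # not full

    def rdy_deq(self):
        return len(self.items) > 0            # not empty

    def rdy_first(self):
        return len(self.items) > 0            # not empty

    # Methods: calling one while its guard is false is an error in this model.
    def enq(self, value):
        if not self.rdy_enq():
            raise RuntimeError("enq fired while FIFO is full")
        self.items.append(value)

    def deq(self):
        if not self.rdy_deq():
            raise RuntimeError("deq fired while FIFO is empty")
        self.items.pop(0)

    def first(self):
        if not self.rdy_first():
            raise RuntimeError("first read while FIFO is empty")
        return self.items[0]


def try_enq(fifo, value):
    """Caller-side check of the guard; returns True if the enqueue fired."""
    if fifo.rdy_enq():
        fifo.enq(value)
        return True
    return False
```

In the slide's setting the compiler generates this guard checking and the arbitration between theModuleA and theModuleB; here the caller does it by hand through try_enq.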
In Bluespec SystemVerilog (BSV) …
Power to express complex static
structures and constraints
Checked by the compiler
“Micro-protocols” are managed by the
compiler
The compiler generates the necessary
hardware (muxing and control)
Micro-protocols need less or no verification
Easier to make changes while
preserving correctness
[Figure: GCD module with internal rules swap and sub. Interface methods: start (enab; rdy when y == 0) and result (rdy when y == 0). The y == 0 guards are implicit conditions.]
interface I_GCD;
   method Action start (int a, int b);
   method int result();
endinterface
The module can easily be made polymorphic: declare the interface as I_GCD#(type t) and replace int with t, which could be Int#(32), UInt#(16), Int#(13), ...
Many different implementations can provide the same interface: module mkGCD (I_GCD)
Bluespec Tool flow
[Figure: tool-flow diagram. Bluespec SystemVerilog source feeds the Bluespec Compiler; other labels include Debussy and Visualization. Legend: files, Bluespec tools, 3rd party tools.]
Generated Verilog RTL: GCD
module mkGCD(CLK,RST_N,start_a,start_b,EN_start,RDY_start,
result,RDY_result);
input CLK; input RST_N;
// action method start
input [31 : 0] start_a; input [31 : 0] start_b; input EN_start;
output RDY_start;
// value method result
output [31 : 0] result; output RDY_result;
// register x and y
reg [31 : 0] x;
wire [31 : 0] x$D_IN; wire x$EN;
reg [31 : 0] y;
wire [31 : 0] y$D_IN; wire y$EN;
...
// rule RL_subtract
assign WILL_FIRE_RL_subtract = x_SLE_y___d3 && !y_EQ_0___d10 ;
// rule RL_swap
assign WILL_FIRE_RL_swap = !x_SLE_y___d3 && !y_EQ_0___d10 ;
...
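The two WILL_FIRE predicates above can be exercised with a small behavioral model. This is a Python sketch under stated assumptions: the rule bodies (swap exchanges x and y; subtract replaces y with y - x) are not shown in the generated Verilog and are assumed here, and the names GcdModel and run_gcd are hypothetical. Inputs are assumed to be positive.

```python
class GcdModel:
    """Behavioral model of the GCD module: two registers, two guarded rules."""

    def __init__(self):
        self.x = 0
        self.y = 0

    # Method guards (RDY_start and RDY_result): both wait for y == 0.
    def rdy_start(self):
        return self.y == 0

    def rdy_result(self):
        return self.y == 0

    def start(self, a, b):
        if not self.rdy_start():
            raise RuntimeError("start fired while busy")
        self.x, self.y = a, b

    def result(self):
        if not self.rdy_result():
            raise RuntimeError("result read while busy")
        return self.x

    # Rule predicates, mirroring WILL_FIRE_RL_subtract and WILL_FIRE_RL_swap.
    def will_fire_subtract(self):
        return self.x <= self.y and self.y != 0

    def will_fire_swap(self):
        return not (self.x <= self.y) and self.y != 0

    def cycle(self):
        """One clock cycle: at most one rule fires (the predicates are exclusive)."""
        if self.will_fire_swap():
            self.x, self.y = self.y, self.x      # assumed rule body
        elif self.will_fire_subtract():
            self.y = self.y - self.x             # assumed rule body


def run_gcd(a, b, max_cycles=10_000):
    """Start the model and clock it until result is ready (positive inputs assumed)."""
    g = GcdModel()
    g.start(a, b)
    cycles = 0
    while not g.rdy_result() and cycles < max_cycles:
        g.cycle()
        cycles += 1
    return g.result()
```

Because the two predicates differ only in the sign of the x <= y test, they can never both be true, which is why no arbitration between the rules is needed in this model.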
Generated Hardware
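[Figure: datapath with registers x and y, write enables x_en and y_en, the start method (wen, rdy), the result rdy signal, and the rule predicates swap? and subtract?]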
x_en = swap?
y_en = swap? OR subtract?
Generated Hardware Module
[Figure: the same datapath drawn as a module; start_en signals appear alongside x_en and y_en, together with the rdy signals and the predicates swap? and subtract?]
[Figure: block diagram with labels TX MAC, Transmitter, Analog TX, Channel]
Transmitter Overview
[Figure: block diagram; labels include headers, Interleaver, Mapper, IFFT, Cyclic Extend, Detector / Deinterleaver, Viterbi, Controller, Descrambler. Some blocks are marked compute intensive.]
IFFT Requirements
802.11a needs to process a symbol in 4 µsec (250 kHz)
IFFT must output a symbol every 4 µsec
i.e. perform an Inverse FFT of 64 complex numbers
Each module before IFFT must process, every 4 µsec:
1 frame for 6 Mbps rate
2 frames for 12 Mbps rate
4 frames for 24 Mbps rate
Even in the worst case (24 Mbps) the clock frequency can be as low as 1 MHz.
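The arithmetic behind these figures can be checked directly. This is a minimal Python sketch that assumes a module handles one frame per clock cycle; the names SYMBOL_PERIOD_S, FRAMES_PER_SYMBOL, and min_clock_hz are illustrative.

```python
SYMBOL_PERIOD_S = 4e-6                      # one symbol every 4 microseconds
SYMBOL_RATE_HZ = 1.0 / SYMBOL_PERIOD_S      # about 250 kHz

# Frames each module before the IFFT must handle per symbol, keyed by data rate in Mbps.
FRAMES_PER_SYMBOL = {6: 1, 12: 2, 24: 4}


def min_clock_hz(rate_mbps):
    """Lowest clock that keeps up, assuming one frame is processed per cycle."""
    return FRAMES_PER_SYMBOL[rate_mbps] * SYMBOL_RATE_HZ


if __name__ == "__main__":
    for rate in sorted(FRAMES_PER_SYMBOL):
        print(f"{rate} Mbps: {min_clock_hz(rate) / 1e6:.2f} MHz")
```

Under that assumption, the 24 Mbps case needs four cycles per 4 µsec symbol, which is where the 1 MHz figure comes from.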
Area-Frequency Tradeoff
We can decrease the area by multiplexing some
circuits and running the system at a higher frequency
Reuse
Twice the frequency
but half the area
Combinational IFFT
[Figure: inputs in0 … in63 pass through three stages of 16 Radix 4 nodes each (x16); the stages are followed by Permute_1, Permute_2, and Permute_3 respectively, producing out0 … out63]
Radix-4 Node
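[Figure: inputs k0 … k3 are multiplied by twiddle factors twid0 … twid3, then pass through two layers of add/subtract units, with a multiplication by j on one branch, to produce out0 … out3]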
Bluespec code: Radix-4 Node
function Tuple4#(Complex, Complex, Complex, Complex)
radix4(Tuple4#(Complex, Complex, Complex, Complex) twids,
Complex k0, Complex k1, Complex k2, Complex k3);
endfunction
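The body of radix4 is not shown on the slide. As an illustration only, here is a Python sketch of a standard radix-4 butterfly for an inverse transform: twiddle multiplication first, then two layers of add/subtract with one multiplication by j. The output ordering of the slide's node is not recoverable from the figure, so the ordering below (a 4-point inverse DFT in natural order) is an assumption.

```python
def radix4(twids, k0, k1, k2, k3):
    """Radix-4 butterfly sketch: twiddle multiply, then a 4-point inverse DFT."""
    # Twiddle multiplication on each input.
    a0, a1, a2, a3 = k0 * twids[0], k1 * twids[1], k2 * twids[2], k3 * twids[3]
    # First layer of add/subtract; the odd difference is rotated by j.
    b0 = a0 + a2
    b1 = a0 - a2
    b2 = a1 + a3
    b3 = (a1 - a3) * 1j
    # Second layer of add/subtract.
    return (b0 + b2, b1 + b3, b0 - b2, b1 - b3)


if __name__ == "__main__":
    ones = (1, 1, 1, 1)
    print(radix4(ones, 1 + 0j, 2j, -3 + 0j, 4 + 1j))
```

With all twiddles equal to 1, output m equals the sum over p of k_p times j to the power p·m, which is the unnormalized 4-point inverse DFT.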
Bluespec code for pure
Combinational Circuit
function SVector#(64, Complex) ifft (SVector#(64, Complex) in_data);
   //Declare vectors
   SVector#(64, Complex) stage12_data = newSVector();
   SVector#(64, Complex) stage12_permuted = newSVector();
   SVector#(64, Complex) stage12_out = newSVector();
   SVector#(64, Complex) stage23_data = newSVector();
   …
   //Radix 4 stage 1 (unpermuted)
   for (Integer i = 0; i < 16; i = i + 1)
   begin
      Integer idx = i * 4;
      let twid0 = getTwiddle(0, fromInteger(i));
      match {.y0, .y1, .y2, .y3} = radix4(twid0,
                                          in_data[idx], in_data[idx + 1],
                                          in_data[idx + 2], in_data[idx + 3]);
      stage12_data[idx] = y0; stage12_data[idx + 1] = y1;
      stage12_data[idx + 2] = y2; stage12_data[idx + 3] = y3;
   end
   //Stage 1 permutation
   for (Integer i = 0; i < 64; i = i + 1)
      stage12_permuted[i] = stage12_data[permute64_1to2[i]];
   //Continued on next slide…
Bluespec code for pure
Combinational Circuit continued
   // (* continued from previous *)
   stage12_out = stage12_permuted; //Later implementations will change this
   //Radix 4 stage 2 (unpermuted)
   for (Integer i = 0; i < 16; i = i + 1)
   begin
      Integer idx = i * 4;
      let twid1 = getTwiddle(1, fromInteger(i));
      match {.y0, .y1, .y2, .y3} = radix4(twid1,
                                          stage12_out[idx], stage12_out[idx + 1],
                                          stage12_out[idx + 2], stage12_out[idx + 3]);
      stage23_data[idx] = y0; stage23_data[idx + 1] = y1;
      stage23_data[idx + 2] = y2; stage23_data[idx + 3] = y3;
   end
   //Stage 2 permutation
   for (Integer i = 0; i < 64; i = i + 1)
      stage23_permuted[i] = stage23_data[permute64_2to3[i]];
   …
   //Repeat for Stage 3
   …
   return stage3out_permuted;
endfunction
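The three-stage structure can be checked numerically with a software model. The slides do not show getTwiddle or the permute64 tables, so this Python sketch substitutes a textbook radix-4 decimation-in-time schedule: digit-reversed input, strided indexing in place of the explicit permutations, and twiddles computed on the fly. It illustrates the same shape (3 stages of 16 butterflies) but is not the deck's exact data layout, and the result is unnormalized.

```python
import cmath

N = 64


def radix4(twids, k0, k1, k2, k3):
    """Twiddle multiply followed by a 4-point inverse DFT (assumed node behavior)."""
    a0, a1, a2, a3 = k0 * twids[0], k1 * twids[1], k2 * twids[2], k3 * twids[3]
    b0, b1 = a0 + a2, a0 - a2
    b2, b3 = a1 + a3, (a1 - a3) * 1j
    return (b0 + b2, b1 + b3, b0 - b2, b1 - b3)


def digit_reverse4(n):
    """Reverse the three base-4 digits of a 6-bit index."""
    return ((n & 3) << 4) | (n & 12) | (n >> 4)


def ifft64(in_data):
    """Unnormalized 64-point inverse DFT as 3 stages of 16 radix-4 butterflies."""
    data = [in_data[digit_reverse4(pos)] for pos in range(N)]
    for stage in range(3):
        q = 4 ** stage            # distance between butterfly inputs: 1, 4, 16
        block = 4 * q             # sub-transform size after this stage: 4, 16, 64
        out = [0j] * N
        for base in range(0, N, block):
            for j in range(q):    # N/block * q = 16 butterflies per stage
                tw = tuple(cmath.exp(2j * cmath.pi * j * p / block) for p in range(4))
                idx = [base + j + p * q for p in range(4)]
                ys = radix4(tw, *(data[i] for i in idx))
                for i, y in zip(idx, ys):
                    out[i] = y
        data = out
    return data
```

Because every stage is a pure function of the previous stage's vector, the whole transform is combinational in the same sense as the ifft function above: no state is carried between stages.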
Pipelined IFFT
[Figure: the same three stages of 16 Radix 4 nodes (x16) with Permute_1, Permute_2, and Permute_3 between in0 … in63 and out0 … out63, now separated by pipeline registers]
Bluespec code for Pipeline
Stage
…
//Read from pipe register for stage 2
stage12_out = stage12_reg;
Circular pipeline: Reusing the
Pipeline Stage
[Figure: in0 … in63 feed a single stage of 16 Radix 4 nodes; 64 4-way muxes, driven by a Stage Counter, select among Permute_1, Permute_2, and Permute_3 before out0 … out63]
16 Radix 4s can be shared but not the three permutations. Hence the need for muxes
Bluespec Code for Circular
Pipeline
module mkIFFT_Circular (I_IFFT);
   SVector#(64, Complex) in_data = newSVector();
   SVector#(64, Complex) stage_data = newSVector();
   SVector#(64, Complex) stage_permuted = newSVector();
   //State elements
   Reg#(SVector#(64, Complex)) data_reg <- mkReg(newSVector());
   Reg#(Bit#(2)) stage_counter <- mkReg(0);
   FIFO#(SVector#(64, Complex)) in_fifo <- mkFIFO();
   //Read input
   in_data = data_reg;
   //Perform a single Radix 4 stage (unpermuted)
   for (Integer i = 0; i < 16; i = i + 1)
   begin
      Integer idx = i * 4;
      let twid = getTwiddle(stage_counter, fromInteger(i));
      match {.y0, .y1, .y2, .y3} = radix4(twid,
                                          in_data[idx], in_data[idx + 1],
                                          in_data[idx + 2], in_data[idx + 3]);
      stage_data[idx] = y0; stage_data[idx + 1] = y1;
      stage_data[idx + 2] = y2; stage_data[idx + 3] = y3;
   end
   //Continued…
Bluespec Code for Circular
Pipeline
   //Stage permutation
   for (Integer i = 0; i < 64; i = i + 1)
      stage_permuted[i] = case (stage_counter)
                             0: return in_wire._read[i];
                             1: return stage_data[permute64_1to2[i]];
                             2: return stage_data[permute64_2to3[i]];
                             3: return stage_data[permute64_3toOut[i]];
                          endcase;
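The control pattern in the case statement (counter value 0 loads a new input, values 1 to 3 select one of the three permutations) can be sketched as a small state machine. This Python model is an assumption-laden illustration: the class CircularPipeline, its out_fifo, and the callables shared_stage and permutes are hypothetical stand-ins, and whether the real design overlaps the load with a pass is not shown in the slides.

```python
from collections import deque


class CircularPipeline:
    """One shared stage reused three times under a stage counter (sketch)."""

    def __init__(self, shared_stage, permutes):
        self.shared_stage = shared_stage   # callable(stage_index, data) -> data
        self.permutes = permutes           # one permutation callable per pass
        self.stage_counter = 0             # 0 = load input, 1..3 = passes
        self.data_reg = None
        self.in_fifo = deque()
        self.out_fifo = deque()

    def clock(self):
        """Advance one clock cycle."""
        if self.stage_counter == 0:
            if not self.in_fifo:
                return                     # guard: no input available, do nothing
            self.data_reg = self.in_fifo.popleft()
            self.stage_counter = 1
            return
        s = self.stage_counter - 1
        staged = self.shared_stage(s, self.data_reg)
        self.data_reg = self.permutes[s](staged)   # mux selects this pass's permutation
        if self.stage_counter == len(self.permutes):
            self.out_fifo.append(self.data_reg)
            self.stage_counter = 0
        else:
            self.stage_counter += 1
```

The shared stage is instantiated once; only the permutation changes from pass to pass, which is the role of the 64 4-way muxes in the circular design.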
Just one Radix-4 node!
[Figure: a single Radix 4 node fed by 4 16-way muxes and followed by 4 16-way demuxes, driven by an Index Counter (0 to 15); 64 4-way muxes, driven by a Stage Counter (0 to 2), select among Permute_1, Permute_2, and Permute_3 between in0 … in63 and out0 … out63]
The two stage registers can be folded into one
Bluespec Code for Extreme
reuse
module mkIFFT_SuperCircular (I_IFFT);
   SVector#(64, Complex) in_data = newSVector();
   SVector#(64, Complex) stage_data = newSVector();
   SVector#(64, Complex) permutedV = newSVector();
   //State
   Reg#(SVector#(64, Complex)) data_reg <- mkReg(newSVector());
   Reg#(SVector#(64, Complex)) post_reg <- mkReg(newSVector());
   Reg#(Bit#(2)) stage_counter <- mkReg(0);
   Reg#(Bit#(5)) idx_counter <- mkReg(16);
   FIFO#(SVector#(64, Complex)) in_fifo <- mkFIFO();
   //Read input
   in_data = data_reg;
   //Do one-sixteenth of a Radix 4 stage (unpermuted)
   //idx = idx_counter * 4; only the low 4 bits are used so the result fits in Bit#(6)
   Bit#(6) idx = {idx_counter[3:0], 2'b00};
   //Use DYNAMIC select and update of the Vector
   let twid = getTwiddle(stage_counter, idx_counter);
   match {.y0, .y1, .y2, .y3} = radix4(twid,
                                       select(in_data, idx), select(in_data, idx + 1),
                                       select(in_data, idx + 2), select(in_data, idx + 3));
   //generates post_reg after writing in the 4 new values
   let stage_data0 = post_reg;
   let stage_data1 = update(stage_data0, idx, y0);
   let stage_data2 = update(stage_data1, idx + 1, y1);
   let stage_data3 = update(stage_data2, idx + 2, y2);
   stage_data = update(stage_data3, idx + 3, y3);
   //Continued…
Bluespec Code for Extreme
reuse-2
   //Permutation is based on the current stage
   for (Integer i = 0; i < 64; i = i + 1)
      permutedV[i] = case (stage_counter)
                        1: return post_reg[permute64_1to2[i]];
                        2: return post_reg[permute64_2to3[i]];
                        3: return post_reg[permute64_3toOut[i]];
                        default: return in_fifo.first()[i];
                     endcase;
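The index arithmetic and the cost of using a single Radix-4 node can be sketched in a few lines. This Python model makes several assumptions: update is modeled as a functional copy (mirroring how the code above chains stage_data0 through stage_data3), load and unload cycles are ignored, and the 4 µsec symbol period is taken from the IFFT requirements earlier in the deck. The helper names are hypothetical.

```python
RADIX4_STEPS_PER_STAGE = 16   # index counter runs 0 to 15
STAGES = 3                    # stage counter runs 0 to 2
SYMBOL_PERIOD_S = 4e-6        # from the 802.11a requirement


def idx_from_counter(idx_counter):
    """Model of {idx_counter, 2'b00}: low 4 bits of the counter times 4."""
    return (idx_counter & 0xF) << 2


def select(vec, i):
    """Dynamic read of one element."""
    return vec[i]


def update(vec, i, value):
    """Dynamic write that returns a new vector and leaves the old one intact."""
    new_vec = list(vec)
    new_vec[i] = value
    return new_vec


# Butterfly steps needed per symbol when only one node exists (overheads ignored).
BUTTERFLY_STEPS_PER_SYMBOL = RADIX4_STEPS_PER_STAGE * STAGES

# Lower bound on the clock needed to finish those steps within one symbol period.
MIN_CLOCK_HZ = BUTTERFLY_STEPS_PER_SYMBOL / SYMBOL_PERIOD_S
```

Under these assumptions the single-node design needs at least 48 butterfly steps per symbol, so the area saving is paid for with a clock on the order of 12 MHz or higher.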
Synthesis results
Nirav Dave & Mike Pellauer
Design      Area (mm²)   CLK Period   Throughput (1 symbol)   Latency
Comb.       1.03         15 ns        15 ns                   15 ns
Pipelined   1.46         7 ns         7 ns                    21 ns
Circular    0.83         8 ns         24 ns                   24 ns
TSMC .13 micron; numbers reported are before place and route.
TSMC .13 micron; numbers reported are after place and route
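The relationships among the columns can be checked with a few lines of arithmetic. This Python sketch copies the reported numbers into a dictionary and assumes three stages per transform and the 4 µsec symbol budget from the IFFT requirements; the names RESULTS, symbols_per_second, and meets_budget are illustrative.

```python
# Numbers copied from the synthesis results table.
RESULTS = {
    "Comb.":     {"area_mm2": 1.03, "clk_ns": 15, "throughput_ns": 15, "latency_ns": 15},
    "Pipelined": {"area_mm2": 1.46, "clk_ns": 7,  "throughput_ns": 7,  "latency_ns": 21},
    "Circular":  {"area_mm2": 0.83, "clk_ns": 8,  "throughput_ns": 24, "latency_ns": 24},
}

STAGES = 3                 # three Radix-4 stages per 64-point IFFT
SYMBOL_BUDGET_NS = 4000    # 802.11a: one symbol every 4 microseconds


def symbols_per_second(design):
    """Peak symbol rate implied by the time-per-symbol throughput column."""
    return 1e9 / RESULTS[design]["throughput_ns"]


def meets_budget(design):
    """True if the design produces a symbol within the 4 microsecond budget."""
    return RESULTS[design]["throughput_ns"] <= SYMBOL_BUDGET_NS


if __name__ == "__main__":
    for name, row in RESULTS.items():
        print(name, row["area_mm2"], "mm^2,", f"{symbols_per_second(name):.3g} symbols/s")
```

The pipelined latency is three clock periods (3 × 7 ns), and the circular design takes three passes at 8 ns each, so its throughput and latency are both 24 ns. All three designs are far inside the 4 µsec budget, which is why trading speed for area is attractive here.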
Two circular pipelines
[Figure: two circular pipelines side by side, each with an InputDataQ, a Data and Twiddle Setup block, a 16-Node Stage, and an OutputDataQ; a Start signal is also labeled]