
Reconfigurable Computing

ES ZG554 / MEL ZG 554


Session 2
Pawan Sharma
ps@pilani.bits-pilani.ac.in
BITS Pilani, Pilani Campus
22/01/2020

Last Lecture

• Course Overview
• Module 1: Introduction to Reconfigurable
Computing
– General Purpose Computing [T1. Sec 1]
– Domain and Application specific processors [T1. Sec 2
& 3]

BITS Pilani, Pilani Campus
Today’s Lecture

Module 1: Introduction to Reconfigurable Computing


• Reconfigurable Computing [T1. Sec 4]
• Fields of Application [T1. Sec 5.1 to 5.4]
Module 2: Reconfigurable Computing Hardware
An overview of programmable logic devices
• PROM
• PLA
• PAL
• CPLD

When to use RC?

RC devices enable design of digital circuits without fabricating a device

• Therefore, RC can be used anytime a digital circuit is needed
– Examples: ASIC prototyping, ASIC replacement, replacing/accelerating microprocessors
• But when should RC be used instead of alternative technologies?

Implementation Possibilities

[Figure: implementation options arranged along a "Performance" axis: Microprocessor, RC (FPGA, CPLD, etc.), ASIC]

Why not use an ASIC for everything?


When to use RC?

1. When it provides the cheapest solution


• Depends on:
• NRE cost - non-recurring engineering cost: the cost of designing the
application
• Unit cost - the cost of manufacturing/purchasing a single system
• Volume - # of units
• Total cost = NRE + unit cost * volume
• RC is typically more cost effective for low volume applications
• RC: low NRE, high unit cost
• ASIC: very high NRE, low unit cost
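
The cost relation above can be tried out with a short calculation. This is only a sketch: the NRE and unit-cost figures below are made-up placeholders chosen to show the crossover between RC and ASIC, not data from the lecture.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost = NRE + unit cost * volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_rc, unit_rc, nre_asic, unit_asic):
    """Volume at which the RC and ASIC totals are equal.

    Solves nre_rc + unit_rc * v = nre_asic + unit_asic * v for v.
    Assumes unit_rc > unit_asic and nre_asic > nre_rc.
    """
    return (nre_asic - nre_rc) / (unit_rc - unit_asic)


# Hypothetical figures, for illustration only.
RC = {"nre": 50_000, "unit_cost": 40}
ASIC = {"nre": 2_000_000, "unit_cost": 5}

crossover = break_even_volume(RC["nre"], RC["unit_cost"],
                              ASIC["nre"], ASIC["unit_cost"])
print(f"break-even volume: about {crossover:.0f} units")

for volume in (1_000, 100_000):
    rc = total_cost(**RC, volume=volume)
    asic = total_cost(**ASIC, volume=volume)
    cheaper = "RC" if rc < asic else "ASIC"
    print(f"{volume:>7} units: RC {rc}, ASIC {asic} -> {cheaper} is cheaper")
```

Below the break-even volume the low NRE of RC dominates; above it the low unit cost of the ASIC wins.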



2. When time to market is critical
– Huge effect on total revenue
3. When circuit may have to be modified
– Can’t change ASIC - hardware
– Can change circuit implemented in FPGA
Uses
– When standards change
• Codec changes after devices fabricated
– Allows addition of new features to existing devices
– Fault tolerance/recovery
– “Partial reconfiguration” allows virtual device with arbitrary size - analogous to
virtual memory
Without RC
– Anything that may have to be reconfigured is implemented in software
• Performance loss



Limitations of RC

[Figure: two bar charts of speedup (vertical axis 0 to 15). Left panel: Embedded Applications – Large Speedups. Right panel: Desktop Applications – No Speedup.]

1) Not all applications can be improved


2) Tools need serious improvement!
3) Design strategies are often ad hoc
4) Floating point?
– Requires a lot of area, but performance is becoming
competitive with other devices
• Already superior in terms of energy



Reconfigurable Computing
• The ideal device should combine:
– the flexibility of the Von Neumann computer
– the efficiency of ASICs
• The ideal device should be able to:
– Optimally implement an application at a given time
– Re-adapt to allow the optimal implementation of a new application


Reconfigurable Computing
• Reconfigurable computing (RC) is the study of architectures that can
adapt (after fabrication) to a specific application or application domain
– Involves architecture, tools, CAD, design automation, algorithms,
languages, etc.
• Reconfigurable computing can be defined as the study of computations
involving reconfigurable devices.
• The spatial structure of the device is modified so as to use the
computing approach best suited to speeding up the application
• For a new application, the device structure is modified again to
match that application


Reconfigurable Computing
• Alternatively, RC is a way of implementing circuits without fabricating a device
• Essentially allows circuits to be implemented as “software”. Circuits are no longer
synonymous with hardware
• RC devices are programmable by downloading bits, just like microprocessors
• Difference is that microprocessor bits specify instructions, whereas RC bits
specify circuit structures
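
As a rough illustration of bits specifying a circuit structure rather than instructions, the sketch below treats a logic block as a 2-input lookup table whose four configuration bits are its truth table. Modelling a logic block this way, and the name `make_logic_block`, are assumptions made for the example; real bitfiles also configure routing, memories, and more.

```python
def make_logic_block(config_bits):
    """Return a 2-input function whose truth table is the 4 configuration bits.

    config_bits[i] is the output for inputs (a, b) with i = 2*a + b.
    """
    if len(config_bits) != 4 or set(config_bits) - {"0", "1"}:
        raise ValueError("expected a string of four 0/1 characters")
    table = [int(bit) for bit in config_bits]

    def block(a, b):
        return table[2 * a + b]

    return block


# The same block becomes a different circuit when different bits are loaded.
xor_block = make_logic_block("0110")
and_block = make_logic_block("0001")

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "xor:", xor_block(a, b), "and:", and_block(a, b))
```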
[Figure: side-by-side comparison. Microprocessor binaries (001010010 ...) are bits loaded into program memory and executed by the processor. FPGA binaries (bitfile, 001010010 ...) are bits loaded into logic blocks, switch matrices, memories, etc.; the example shows a small circuit with signals a, b, c, x, y.]


Some Fields of Application
• Rapid prototyping
• Post-fabrication customization
• Multi-modal computing tasks
• Adaptive computing systems
• Fault tolerance
• High-performance parallel computing


Rapid prototyping
• Testing hardware in real conditions before fabrication
• Software simulation
– Relatively inexpensive
– Slow
– Accuracy?
• Hardware emulation
– Hardware testing under real operating conditions
– Fast
– Accurate
– Allows several iterations

[Figures: APTIX System Explorer; ITALTEL FLEXBENCH]


In-System customization
• Time-to-market advantage
o Ship the first version of a product
o Remote upgrading with new product versions
o Remote repairing
• Example: Mars rover vehicle (Mars Pathfinder, landed on Mars 4th July 1997)

[Figure label: Manufacturer]


Multi-modal computation
Systems that handle many different types of inputs. The control unit
handles them in a time-multiplexed manner.
• Mobile phones, etc.
• Built-in digital camera
• Video phone service
• Games
• Internet Navigation system
• Emergency Diagnostics
• Different standard protocols
• Monitoring
• Entertainment

[Figure labels: service request; configuration]


Adaptive computing systems
• Computing systems that are able to adapt their behavior and
structure to changing operating and environmental conditions,
time-varying optimization objectives, and physical constraints
such as changing protocols, new standards, or dynamically
changing operating conditions of technical systems.
• Dynamic adaptation to environment and threats for extended
mission capabilities


Fault Tolerance

• FPGAs are sensitive to SEUs (single event upsets) and SETs (single
event transients). Electromagnetic noise and radiation can affect the
configuration memory of the chip, resulting in a permanent error.
In space applications in particular, cosmic rays can hit silicon
surfaces and create dense electron-hole pairs, which may lead to
transient errors.
• Requires duplication or triplication of resources for combinational
logic and parity checks for on-chip caches
• Triple Modular Redundancy (TMR) with a voter circuit is a common
approach: three identical hardware modules perform their
operations in parallel and their outputs are voted on.
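
The voting step of TMR can be sketched in a few lines. This is a software model only: the `adder` function stands in for the replicated hardware module, and `upset_mask` is a hypothetical way of injecting a bit flip into one copy to mimic an upset.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority: each output bit agrees with at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def adder(x, y):
    """Stand-in for the replicated hardware module (an 8-bit adder)."""
    return (x + y) & 0xFF


def tmr(module, x, y, upset_mask=0):
    """Run three copies of `module` and vote on their outputs.

    `upset_mask` flips bits in the output of copy 0 to mimic a fault.
    """
    outputs = [module(x, y) for _ in range(3)]
    outputs[0] ^= upset_mask
    return majority_vote(*outputs)


print(tmr(adder, 100, 27))                      # fault-free
print(tmr(adder, 100, 27, upset_mask=0b10000))  # one copy corrupted, still masked
```

A fault in a single module is outvoted by the other two; the scheme does not protect against two modules failing on the same bit.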



Conclusion

• Evolving paradigm
• RC requires more computing resources and a larger area*power*time
product than an ASIC
• But it offers faster execution times and a better power/performance ratio
than software running on a microprocessor
• Fault tolerance
• Run time reconfiguration
• Adaptive



Programmable Logic Devices

• Before PLDs:
– Digital circuits available as SSI, MSI devices
– Logic determined at time of manufacture
– Can't be changed later, and requires large-volume fabrication
– Shelves of documentation for all devices
– Did not meet the designer's exact specifications
– Forced to use multiple devices to meet requirements
• After PLDs:
– Device supplied with no logic function programmed in
– Quick design creation
– Allows the designer to program the PLD in whatever way the design requires
– Meets the designer's exact specifications
– Multiple functions can be combined and programmed onto a single chip, so less board space is required
– In-system programmable: no need to remove the device from the board to change the program
– No worry about device obsolescence
– But requires specific tools and an understanding of the hardware architecture before programming


Idea: Memory as Programmable Logic

[Figure: a 4 x 1 memory with address lines A1 = X and A0 = Y and data output D0 = X xor Y. Contents: address 00 holds 0, 01 holds 1, 10 holds 1, 11 holds 0.]

• Address lines as inputs
• Data line as output
• Truth table is the content
• Form minterms using AND gates and then OR the appropriate minterms to form the output
• The circuit requires four 2-input AND gates and one OR gate that can take up to four inputs
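
The equivalence between the memory view and the gate view can be checked with a small sketch. The names `read_memory` and `gate_network` are illustrative only; the memory contents are the XOR truth table from the slide.

```python
MEMORY = [0, 1, 1, 0]  # contents at addresses 00, 01, 10, 11 (X xor Y)


def read_memory(x, y):
    """Memory view: X and Y drive address lines A1 and A0, D0 is the output."""
    return MEMORY[(x << 1) | y]


def gate_network(x, y):
    """Gate view: four 2-input AND gates form the minterms, one OR gate sums them."""
    minterms = [
        (1 - x) & (1 - y),  # m0 = X'Y'
        (1 - x) & y,        # m1 = X'Y
        x & (1 - y),        # m2 = XY'
        x & y,              # m3 = XY
    ]
    # Only minterms whose truth-table entry is 1 are connected to the OR gate.
    return int(any(m for m, stored in zip(minterms, MEMORY) if stored))


for x in (0, 1):
    for y in (0, 1):
        print(x, y, read_memory(x, y), gate_network(x, y))
```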

AND and OR Planes

• Array of AND gates: the AND plane
• Array of OR gates: the OR plane
• In the AND-plane, all eight minterms of the three inputs a, b, and c are generated.
• The OR plane uses only the minterms that are needed for the outputs of the circuit.
• Not all generated minterms need be used. For example, minterm 7 is generated in the AND-plane but not used in the OR plane.
All NOR Implementation
• AND and OR gate technologies use more area and have
more delay than NAND and NOR implementations.
• Although NOR gates are used, the left plane is still
called the AND-plane and the right plane is called the
OR-plane.
• Hardware implementation with large fan-in and
routing becomes difficult.
• Take, for example, a circuit with 16 inputs, which is
quite usual for combinational circuits. Such a circuit has
64K (2^16 = 65,536) minterms.
• In the AND-plane, wires from the circuit inputs must be
routed to over 64,000 NOR gates.
• In the OR-plane, the NOR gates must be large enough
for every minterm of the function (over 64,000
minterms) to reach their inputs.
• Such an implementation is very slow because of long
lines, and takes too much space because of the
requirement of large gates.
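
Why NOR gates can serve in both planes follows from De Morgan's law, and a small truth-table check makes it concrete. The sketch below is an assumption-laden model, not the circuit from the slides: it builds each minterm from one NOR gate fed with complemented literals, and it assumes the output y = m2 + m5 + m6 is obtained from a NOR over the minterm lines followed by an inverter.

```python
from itertools import product


def nor(*inputs):
    """NOR gate: 1 only when every input is 0."""
    return int(not any(inputs))


def minterm(index, a, b, c):
    """Minterm `index` of (a, b, c) built from a single NOR gate.

    A product of literals equals the NOR of the complemented literals,
    so the gate is fed x' where the minterm needs x = 1 and x where it
    needs x = 0.
    """
    wanted = [(index >> 2) & 1, (index >> 1) & 1, index & 1]
    return nor(*[value ^ w for value, w in zip((a, b, c), wanted)])


def output_y(a, b, c):
    """y = m2 + m5 + m6: NOR over the minterm lines, then an inverter."""
    y_bar = nor(minterm(2, a, b, c), minterm(5, a, b, c), minterm(6, a, b, c))
    return nor(y_bar)  # a single-input NOR acts as an inverter


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, output_y(a, b, c))
```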

[Figures: left, Distributed NOR of the AND-plane; right, Distributed NOR Gate of Output y]

• The solution is to distribute gates along array rows and columns.


• In the AND-plane, instead of having a clustered NOR gate for all inputs to reach to, the NOR gate is distributed along the rows of the
array.
• In left figure, the NOR gate that implements minterm 3 is highlighted. Distributed transistor level logic of this NOR gate is shown.
Below this figure, a symbolic representation is shown
• Likewise, in the OR-plane, instead of having large NOR gates for the outputs of the circuit, transistors of output NOR gates are
distributed along the corresponding output columns.
• Figure on the right shows the distributed NOR structure of the y output of circuit. A symbolic representation of this structure is also
shown on the right.
• In each case, connections are made on the inputs of the gate. For the AND-plane, the inputs of the AND gate are a, b, and c forming
minterm 3, and for the OR gate, the inputs of the gate are m2, m5 and m6.
• In this implementation, independent
of our outputs, we have generated all
minterms of the three inputs.
• For any functions other than w,
x, y and z, we would still generate the
same minterms, but use them
differently.
• Hence, the AND-plane with which the
minterms are generated can be wired
independent of the functions realized.
• On the contrary, the OR-plane can
only be known when the output
functions have been determined.
• In other words, we want a fixed AND-
plane and a programmable (or
configurable) OR-plane

• Transistors for the implementation of
minterms in the AND-plane are fixed, but
in the OR-plane there are fusible
transistors on every output column for
every minterm of the AND-plane.
• For realization of a certain function on an
output of this array, transistors
corresponding to the used minterms are
kept, and the rest are blown to eliminate
contribution of the minterm to the output
function.
• For example, for output y, only transistors
on rows m2, m5, and m6 are connected
and the rest are fused off.
• The dots in the AND-plane indicate
permanent connections, and the crosses in
the OR-plane indicate programmable or
configurable connections

Memory View
PROM
• If we consider abc as the address inputs and wxyz as the data
read from the address designated by abc, then the circuit can be
regarded as a memory with an address space of 8 words and
data four bits wide.
• In this case, the fixed AND-plane becomes the memory
decoder, and the programmable OR-plane becomes the
memory array.
• Because this memory can only be read from and not easily
written into, it is referred to as Read Only Memory or ROM. The
basic ROM is a one-time programmable logic array.
• Programmable ROM is a one-time programmable chip that,
once programmed, cannot be erased or altered.
• In a PROM, all minterms in the AND-plane are generated, and
connections of all AND-plane outputs to OR-plane gate inputs
are in place.
• By applying a high voltage, transistors in the OR-plane that
correspond to the minterms that are not needed for a certain
output are burned out.
• A fresh PROM has all transistors in its OR-plane connected.
When programmed, some will be fused out permanently.
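
The programming model described above (fixed decoder, fusible OR-plane, one-time programming) can be sketched as a small class. The class name `PROM` and its methods are hypothetical; only output y = m2 + m5 + m6 comes from the slides, and the other outputs are simply left unprogrammed here.

```python
class PROM:
    """Toy 8 x 4 PROM: fixed decoder (AND-plane), fusible OR-plane."""

    OUTPUTS = "wxyz"

    def __init__(self):
        # A fresh PROM has every OR-plane transistor connected.
        self.connected = {out: set(range(8)) for out in self.OUTPUTS}

    def program(self, output, minterms):
        """Blow the fuses of every minterm not needed for `output`."""
        needed = set(minterms)
        if not needed <= self.connected[output]:
            raise ValueError("a blown fuse cannot be restored")
        self.connected[output] = needed

    def read(self, a, b, c):
        """Address abc selects one minterm; return the data word (w, x, y, z)."""
        address = (a << 2) | (b << 1) | c
        return tuple(int(address in self.connected[out]) for out in self.OUTPUTS)


prom = PROM()
prom.program("y", {2, 5, 6})
for address in range(8):
    a, b, c = (address >> 2) & 1, (address >> 1) & 1, address & 1
    print(f"{address:03b}", prom.read(a, b, c))
```

Calling `program` again with a minterm that was already removed raises an error, which mirrors the one-time nature of the fuses.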

Simple Programmable Logic Devices

• Implement 2 level logic circuits (AND/OR)


• Based on regular array structure
• Read Only Memories (ROMs and PROMs)
• Programmable Logic Array (PLA)
• Programmable Array Logic (PAL)



Classifying Three Basic PLDs

Device                                   AND plane          OR plane
(Programmable) Read-Only Memory (ROM)    Fixed (decoder)    Programmable
Programmable Logic Array (PLA)           Programmable       Programmable
Programmable Array Logic (PAL) devices   Programmable       Fixed

In each device the inputs feed the AND plane and the OR plane drives the outputs. The PAL diagram also shows a flip-flop (F/F) at the output.


Example: Lookup Table
• Design a square lookup table for F(X) = X^2 using ROM

X (decimal)   F(X) = X^2 (decimal)   X (binary)   F(X) = X^2 (binary)
0             0                      000          000000
1             1                      001          000001
2             4                      010          000100
3             9                      011          001001
4             16                     100          010000
5             25                     101          011001
6             36                     110          100100
7             49                     111          110001


Square Lookup Table using ROM

[Figure: a 3-to-8 decoder on inputs X2, X1, X0 drives word lines 0 to 7 of the ROM; the outputs are F5 F4 F3 F2 F1 F0, and the stored words are the binary values of X^2 from 000000 (X = 000) to 110001 (X = 111).]

• F1 is not used: it is 0 for every X
• F0 = X0
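
The ROM contents and the two observations about F1 and F0 can be reproduced with a short sketch. The reduced ROM that stores only F5 to F2 is an inference from those observations rather than something the slides state outright, and the names `ROM`, `REDUCED_ROM`, and `lookup_reduced` are illustrative.

```python
# Word stored at address X is X^2 (3 address bits, 6 data bits).
ROM = [x * x for x in range(8)]

for x, word in enumerate(ROM):
    print(f"{x:03b}  {word:06b}")

# Columns F1 and F0 of the stored data.
f1_column = [(word >> 1) & 1 for word in ROM]
f0_column = [word & 1 for word in ROM]

# Reduced ROM keeps only F5..F2; F1 is tied to 0 and F0 is wired to X0.
REDUCED_ROM = [word >> 2 for word in ROM]


def lookup_reduced(x):
    """Rebuild the 6-bit square from the 4 stored bits plus X0."""
    return (REDUCED_ROM[x] << 2) | (x & 1)
```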



Programmable Logic Array (PLA)

• This is a 3 x 4 x 2 PLA (3 inputs, up to 4 product terms, and
2 outputs), ready to be programmed.
• The left part of the diagram replaces the decoder used in a ROM.
• Connections can be made in the “AND array” to produce four
arbitrary products, instead of 8 minterms as with a ROM.
• Those products can then be summed together in the “OR array.”

[Figure: unprogrammed PLA with inputs A, B, C, a programmable AND plane, a programmable OR plane, and two outputs]
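
A 3 x 4 x 2 PLA can be modelled as two small connection tables. The programming below (product terms P0 to P3 and the functions F1 and F2) is hypothetical, chosen only to show that one product term can be shared between outputs; it is not the example from the slides.

```python
from itertools import product

# AND plane: one dict per product term. 1 = true literal, 0 = complemented
# literal; inputs that are absent are left unconnected.
AND_PLANE = [
    {"A": 1, "B": 1},          # P0 = A B
    {"A": 0, "C": 1},          # P1 = A' C
    {"B": 1, "C": 0},          # P2 = B C'
    {"A": 1, "B": 0, "C": 1},  # P3 = A B' C
]

# OR plane: which product terms are summed for each output.
OR_PLANE = {"F1": [0, 1], "F2": [0, 2, 3]}


def pla(a, b, c):
    """Evaluate the programmed PLA for one input combination."""
    inputs = {"A": a, "B": b, "C": c}
    products = [
        all(inputs[name] == literal for name, literal in term.items())
        for term in AND_PLANE
    ]
    return {
        out: int(any(products[i] for i in terms))
        for out, terms in OR_PLANE.items()
    }


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, pla(a, b, c))
```

Here P0 feeds both outputs, so it is generated once and used twice, which a PAL with fixed OR connections cannot do.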


Regular k-map minimization



PLA minimization



PLA Example



Programmable Array Logic

• The PAL is the opposite of the ROM, having a programmable set of ANDs
combined with fixed ORs.
• Disadvantage
– ROM guaranteed to implement any M functions of N
inputs. PAL may have too few inputs to the OR gates.
• Advantages
– For given internal complexity, a PAL can have larger N and M
– Some PALs have outputs that can be complemented, adding POS
functions
– No multilevel circuit implementations in ROM (without external
connections from output to input).
– PAL has outputs from OR terms as internal inputs to all AND
terms, making implementation of multi-level circuits easier.



4-input, 4-output PAL with fixed, 3-input OR terms

What are the equations for F1 through F4?

F1 = A’B’ + C’
F2 = A’BC’ + AC + AB
F3 = AD + BD + F1 = AD + BD + A’B’ + C’
F4 = AB + CD + F1’ = AB + CD + (A’B’ + C’)’
   = AB + CD + AC + BC

[Figure: PAL fuse map with inputs A, B, C, D, AND-gate input columns 0 to 9, product terms 1 to 12, and outputs F1 to F4; F1 is fed back as an input to the AND array.]
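
The simplification of F4 can be checked exhaustively over the 16 input combinations. This is a small verification sketch; the function names are illustrative.

```python
from itertools import product


def f1(a, b, c, d):
    """F1 = A'B' + C'."""
    return (not a and not b) or (not c)


def f4_with_feedback(a, b, c, d):
    """F4 = AB + CD + F1', using the fed-back F1."""
    return (a and b) or (c and d) or not f1(a, b, c, d)


def f4_expanded(a, b, c, d):
    """F4 = AB + CD + AC + BC, after expanding (A'B' + C')' = (A + B)C."""
    return (a and b) or (c and d) or (a and c) or (b and c)


mismatches = [
    bits
    for bits in product([False, True], repeat=4)
    if bool(f4_with_feedback(*bits)) != bool(f4_expanded(*bits))
]
print("mismatching input rows:", mismatches)
```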



Complex Programmable Logic Devices

• Complex PLDs (CPLDs) typically combine PAL combinational logic
with flip-flops
– e.g. Xilinx CoolRunner-II, Altera MAX series, etc.
• Organized into logic blocks connected by an interconnect
matrix
• Combinational or registered output
• Usually enough logic for simple counters, state machines,
decoders, etc.
• CPLD logic is not enough for complex operations
• FPGAs have much more logic than CPLDs


Complex Programmable Logic Devices

• A CPLD consists of a set of macro cells, input/output blocks
and an interconnection network.
• The connections between the input/output blocks and the
macro cells, and those between macro cells, are made through
the programmable interconnection network.
• A macro cell typically contains several PLAs and flip-flops.
• Despite their relatively large capacity (a few hundred thousand
logic gates) compared to that of PLAs, CPLDs are still too small
for use as reconfigurable computing devices. They are usually
used as glue logic, or to implement small functions.
• Because of their non-volatility, CPLDs are used in many
systems to configure the main reconfigurable device at start-up.
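
How macro cells and the interconnect combine into a small sequential circuit can be sketched as follows. The model is deliberately simplified and hypothetical: each `Macrocell` holds one sum-of-products function and one flip-flop (real macro cells are richer), and the interconnect is just a dictionary that feeds every cell's output back to all cells. The example builds a 2-bit counter.

```python
class Macrocell:
    """Sum-of-products logic followed by an optional D flip-flop."""

    def __init__(self, product_terms, registered=True):
        # Each product term maps a signal name to the value it requires.
        self.product_terms = product_terms
        self.registered = registered
        self.q = 0  # flip-flop state

    def logic(self, signals):
        return int(any(
            all(signals[name] == value for name, value in term.items())
            for term in self.product_terms
        ))

    def output(self, signals):
        """Registered output comes from the flip-flop, otherwise from the logic."""
        return self.q if self.registered else self.logic(signals)

    def clock(self, signals):
        if self.registered:
            self.q = self.logic(signals)


# 2-bit counter: q0 toggles on every clock, q1 toggles when q0 is 1.
cells = {
    "q0": Macrocell([{"q0": 0}]),                               # next q0 = q0'
    "q1": Macrocell([{"q1": 1, "q0": 0}, {"q1": 0, "q0": 1}]),  # next q1 = q1 xor q0
}


def tick(cells):
    """Clock all macro cells together, using the values from before the edge."""
    signals = {name: cell.q for name, cell in cells.items()}
    for cell in cells.values():
        cell.clock(signals)
    return 2 * cells["q1"].q + cells["q0"].q


counts = [tick(cells) for _ in range(5)]
print(counts)
```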



CPLD
