
SEPT 2009 VOL 1.

VLSI JAGRITI
A Monthly Magazine from JBTech INDIA

INSPIRE ASPIRE
“Teachers open the door, but you
must enter by yourself”
“If you are not failing you are
doing nothing”
“If you have made mistakes, there
is always another chance for you.
You may have a fresh start any
moment you choose, for this thing
we call 'failure' is not the falling
down, but the staying down”

Tech Byte
Basics of FPGA Design

Field-programmable gate arrays (FPGAs) arrived in 1984 as an alternative to programmable logic devices (PLDs) and ASICs. As their name implies, FPGAs offer the significant benefit of being readily programmable. Unlike their forebears in the PLD category, FPGAs can (in most cases) be programmed again and again, giving designers multiple opportunities to tweak their circuits. There’s no large non-recurring engineering (NRE) cost associated with FPGAs. In addition, lengthy, nerve-wracking waits for mask-making operations are squashed. Often, with FPGA development, logic design begins to resemble software design due to the many iterations of a given design. Innovative design often happens with FPGAs as an implementation platform.

But there are some downsides to FPGAs as well. The economics of FPGAs force designers to balance their relatively high piece-part pricing compared to ASICs with the absence of high NREs and long development cycles. They’re also available only in fixed sizes, which matters when you’re determined to avoid unused silicon area.

FPGAs offer all of the features needed to implement most complex designs. Clock management is facilitated by on-chip PLL (phase-locked loop) or DLL (delay-locked loop) circuitry. Dedicated memory blocks can be configured as basic single-port RAMs, ROMs, FIFOs, or CAMs. Data processing, as embodied in the devices’ logic fabric, varies widely. The ability to link the FPGA with backplanes, high-speed buses, and memories is afforded by support for various single-ended and differential I/O standards. Also found on today’s FPGAs are system-building resources such as high-speed serial I/Os, arithmetic modules, embedded processors, and large amounts of memory.

Initially seen as a vehicle for rapid prototyping and emulation systems, FPGAs have spread into a host of applications. They were once too simple, and too costly, for anything but small-volume production. Now, with the advent of much larger devices and declining per-part costs, FPGAs are finding their way off the prototyping bench and into production (Table 2).
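The cost trade-off described above, higher piece-part pricing for FPGAs against the absence of NRE charges, can be made concrete with a simple break-even calculation. The sketch below is only an illustration: the function name and all dollar figures are hypothetical and do not come from the article, and it ignores development time, respins, and inventory costs.

```python
import math


def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Return the smallest production volume at which an ASIC becomes
    strictly cheaper in total than an FPGA, or None if it never does.

    ASIC total cost = asic_nre + n * asic_unit_cost
    FPGA total cost = n * fpga_unit_cost   (no NRE charge)
    """
    if fpga_unit_cost <= asic_unit_cost:
        return None  # the FPGA is never more expensive per part
    # The ASIC wins once n > asic_nre / (fpga_unit_cost - asic_unit_cost)
    return math.floor(asic_nre / (fpga_unit_cost - asic_unit_cost)) + 1


# Hypothetical figures, chosen only for illustration
volume = break_even_volume(asic_nre=500_000, asic_unit_cost=5.0, fpga_unit_cost=30.0)
print(volume)  # 20001
```

Below that volume the FPGA is the cheaper route under these assumed numbers; above it, the ASIC's NRE is amortized over enough parts to pay off.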

What are FPGAs?

FPGAs fill a gap between discrete logic and the smaller PLDs on the low end of the complexity scale and costly custom ASICs on the high end. They consist of an array of logic blocks that are configured using software. Programmable I/O blocks surround these logic blocks. Both are connected by programmable interconnects (Fig. 1). The programming technology in an FPGA determines the type of basic logic cell and the interconnect scheme. In turn, the logic cells and interconnection scheme determine the design of the input and output circuits as well as the programming scheme.

Just a few years ago, the largest FPGA was measured in tens of thousands of system gates and operated at 40 MHz. Older FPGAs often cost more than $150 for the most advanced parts at the time. Today, however, FPGAs offer millions of gates of logic capacity, operate at 300 MHz, can cost less than $10, and offer integrated functions like processors and memory (Table 1).

Do’s And Don’ts For FPGA Designers

Do’s

1. Do concentrate on I/O timing, not just the register-to-register internal frequency that the FPGA place-and-route tools report. Frequently, the hardest challenge in a complete FPGA design is the I/O timing. Focus on how your signals enter and leave your FPGA, because that’s where the bottlenecks frequently occur.

2. Do create hierarchy around vendor-specific structures and instantiations. Give yourself the freedom to migrate from one technology to another by ensuring that each instantiation of a vendor-specific element is in a separate hierarchical block. This applies especially to RAMs and clock management blocks.

3. Do use IP timing models during synthesis to give the true picture of your design. By importing EDIF netlists of pre-synthesized blocks, your synthesis tool can fully understand your timing requirements. Be cautious when using vendor cores that you can bring into your synthesis tool if they have no timing model.

4. Do design your hierarchical blocks with registered outputs where possible to avoid having critical paths pass through many levels of hierarchy. FPGAs exhibit step-functions in logic-limited performance. When hierarchy is preserved and the critical path passes across a hierarchical boundary, you may introduce an extra level of logic. When considered along with the associated routing, this can add significant delay to your critical path.

5. Do enable retiming in your synthesis tool. FPGAs tend to be register-rich architectures. When you correctly constrain your design in synthesis, you allow the tool to optimize your design to take advantage of positive slack timing within the design. Sometimes this can be done after initial place and route to improve retiming over wire load estimation.

Don’ts

1. Don’t synthesize unless you’ve fully and correctly constrained your design. This includes correct clock domains, I/O timing requirements, multicycle paths, and false paths. If your synthesis tool doesn’t see exactly what you want, it can’t make decisions to optimize your design accordingly.

2. Don’t try to fix every timing problem in place and route. Place and route offers little room for fixing timing where a properly constrained synthesis tool would.

3. Don’t vainly floor plan at the RTL or block level hoping to improve place-and-route results. Manual area placement can cause more problems than it might initially appear to solve. Unless you are an expert in manual placement and floor planning, this is best left alone.

4. Don’t string clock buffers together, create multiple clock trees from the same clock, or use multiple clocks when a simple enable will do. Clocking schemes in FPGAs can become very complicated now that there are PLLs, DLLs, and large numbers of clock-distribution networks. Poor clocking schemes can lead to extended place-and-route times, failure to meet timing, and even failure to place in some technologies. Simpler schemes are vastly more desirable. Avoid those gated clocks, too!

5. Don’t forget to simulate your design blocks as well as your entire design. Discovering and back-tracking an error from the chip’s pins during on-board testing can be extremely difficult. On-board FPGA testing can miss important design flaws that are much easier to identify during simulation; they can be rectified by modifying the FPGA’s programming.

Comparing FPGA Architectures

FPGAs must be programmed by users to connect the chip’s resources in the appropriate manner to implement the desired functionality. Over the years, various technologies have emerged to suit different requirements. Some FPGAs can only be programmed once. These devices employ antifuse technology. Flash-based devices can be programmed and reprogrammed again after debugging. Still others can be dynamically programmed thanks to SRAM-based technology. Each has its advantages and disadvantages (Table 3).

Most modern FPGAs are based on SRAM configuration cells, which offer the benefit of unlimited re-programmability. When powered up, they can be configured to perform a given task, such as a board or system test, and then reprogrammed to perform their main task. On the flip side, though, SRAM-based FPGAs must be reconfigured each time their host system is powered up, and additional external circuitry is required to do so. Further, because the configuration file used to program the FPGA is stored
in external memory, security issues concerning intellectual property emerge.

Antifuse-based FPGAs aren’t in-system programmable, but rather are programmed offline using a device programmer. Once the chip is configured, it can’t be altered. However, in antifuse technology, device configuration is nonvolatile with no need for external memory. On top of that, it’s virtually impossible to reverse engineer their programming. They often work as replacements for ASICs in small volumes.

In a sense, flash-based FPGAs fulfill the promise of FPGAs in that they can be reprogrammed many times. They’re nonvolatile, retaining their configuration even when powered down. Programming is done either in-system or with a programmer. In some cases, IP security can be achieved using a multibit key that locks the configuration data after programming. But flash-based FPGAs require extra process steps above and beyond standard CMOS technology, leaving them at least a generation behind. Moreover, the many pull-up resistors result in high static power consumption.

FPGAs can also be characterized as having either fine-, medium-, or coarse-grained architectures. Fine-grained architectures boast a large number of relatively simple logic blocks. Each logic block usually contains either a two-input logic function or a 4-to-1 multiplexer and a flip-flop. Blocks can only be used to implement simple functions. But fine-grained architectures lend themselves to execution of functions that benefit from parallelism.

Coarse-grained architectures consist of relatively large logic blocks often containing two or more lookup tables and two or more flip-flops. In most of these architectures, a four-input lookup table (think of it as a 16 x 1 ROM) implements the actual logic.

Table 1: Key Resources Available In The Largest Devices From Major FPGA Vendors

Features | Xilinx Virtex 2 Pro | Altera Stratix | Actel Axcelerator | Lattice ispXPGA
Clock management | DCM, up to 12 | PLL, up to 12 | PLL, up to 8 | SysCLOCK PLL, up to 8
Embedded memory blocks | Block RAM, up to 10 Mbits | TriMatrix memory, up to 10 Mbits | Embedded RAM, up to 338 kbits | SysMEM blocks, up to 414 kbits
Data processing | Configurable logic blocks and 18-bit by 18-bit multipliers; up to 125,000 logic cells and 556 multiplier blocks | Logic elements and embedded multipliers; up to 79,000 LEs and 176 embedded multipliers | Logic modules (C-Cell and R-Cell); up to 10,000 R-Cells and 21,000 C-Cells | Based on programmable functional unit; up to 3844 PFUs
Programmable I/Os | Select I/O | Advanced I/O support | Advanced I/O support | Sys I/O
Special features | Embedded PowerPC 405 cores; Rocket I/O multi-gigabit transceiver | DSP blocks; high-speed differential I/O and interface standards support | PerPin FIFOs for bus applications | SysHSI for high-speed serial interface

The FPGA design flow

After weighing all implementation options, you must consider the design flow. The process of implementing a design on an FPGA can be broken down into several stages, loosely definable as design entry or capture, synthesis, and place and route (Fig. 2). Along the way, the design is simulated at various levels of abstraction as in ASIC design. The availability of sophisticated and coherent tool suites for FPGA design makes them all the more attractive.

At one time, design entry was performed in the form of schematic capture. Most designers have moved over to hardware description languages (HDLs) for design entry. Some will prefer a mixture of the two techniques. Schematic-based design-capture tools gave designers a great deal of control over the physical placement and partitioning of logic on the device. But it’s becoming less likely that designers will take that route. Meanwhile, language-based design entry is faster, but often at the expense of performance or density.

For many designers, the choice of whether to use schematic or HDL-based design entry comes down to their conception of their design. For those who think in software or algorithmic-like terms, HDLs are well suited for highly complex designs,
especially when the designer has a good handle on how the logic must be structured. They can also be very useful for designing smaller functions when you haven’t the time or inclination to work through the actual hardware implementation.

Fig. 1. Functional Blocks
Just about all FPGAs include a regular, programmable, and flexible architecture of logic blocks surrounded by input/output blocks on the perimeter. These functional blocks are linked together by a hierarchy of highly versatile programmable interconnects.
[Figure labels: I/O Blocks, Logic Blocks, Programmable Interconnects]

On the other hand, HDLs represent a level of abstraction that can isolate designers from the details of the hardware implementation. Schematic-based entry gives designers much more visibility into the hardware. It’s a better method for those who are hardware-oriented. The downside of schematic-based entry is that it makes the design more difficult to modify or port to another FPGA.

A third option for design entry, state-machine entry, works well for designers who can see their logic design as a series of states that the system steps through. It shines when designing somewhat simple functions, often in the area of system control, that can be clearly represented in visual formats. Tool support for finite state-machine entry is limited, though. Some designers approach the start of their design from a level of abstraction higher than HDLs, which is algorithmic design using the C/C++ programming languages. A number of EDA vendors have tool flows supporting this design style. Generally, algorithmic design has been thought of as a tool for architectural exploration. But increasingly, as tool flows emerge for C-level synthesis, it’s being accepted as a first step on the road to hardware implementation.

After design entry, the design is simulated at the register-transfer level (RTL). This is the first of several simulation stages, because the design must be simulated at successive levels of abstraction as it moves down the chain toward physical implementation on the FPGA itself. RTL simulation offers the highest performance in terms of speed. As a result, designers can perform many simulation runs in an effort to refine the logic. At this stage, FPGA development isn’t unlike software development. Signals and variables are observed, procedures and functions traced, and breakpoints set. The good news is that it’s a very fast simulation. But because the design hasn’t yet been synthesized to gate level, properties such as timing and resource usage are still unknowns.

The next step following RTL simulation is to convert the RTL representation of the design into a bit-stream file that can be loaded onto the FPGA. The interim step is FPGA synthesis, which translates the VHDL or Verilog code into a device netlist format that can be understood by a bit-stream converter. The synthesis process can be broken down into three steps. First, the HDL code is converted into device netlist format. Then the resulting file is converted into a hexadecimal bit-stream file, or .bit file. This step is necessary to change the list of required devices and interconnects into hexadecimal bits to download to the FPGA. Lastly, the .bit file is downloaded to the physical FPGA. This final step completes the FPGA synthesis procedure by programming the design onto the physical FPGA.

It’s important to fully constrain designs before synthesis (Fig. 3). A constraint file is an input to the synthesis process just as the RTL code itself is. Constraints can be applied globally or to specific portions of the design. The synthesis engine uses these constraints to optimize the netlist. However, it’s equally important to not over-constrain the design, which will generally result in less-than-optimal results from the next step in the implementation process: physical device placement and interconnect routing. Synthesis constraints soon become place-and-route constraints. This traditional flow will work, but it can lead to numerous iterations before achieving timing closure. Some EDA vendors have incorporated more modern physical synthesis techniques, which automate device re-timing by moving lookup tables (LUTs) across registers to balance out timing slack. Physical synthesis also anticipates place and route to leverage delay information.

Fig. 2. The Big Picture
A “big picture” look at an FPGA design flow shows the major steps in the process: design entry, synthesis from RTL to gate level, and physical design. Place and route is done using the FPGA vendors’ proprietary tools that account for the devices’ architectures and logic-block structures.

Table 2: FPGA Usage

Usage | Emulation: 3% | Prototyping: 30% | Preproduction: 30% | Production: 37%
Time-to-market | Fairly high; fast compile times | Fairly high; fast compile times | Fairly high; fast compile times | Fairly high; fast compile times
Performance | Not stringent | Not stringent | Very critical | Very critical
Volume | Very low per application | Low per application | Moderately high per application | High per application

Following synthesis, device implementation begins. After netlist synthesis, the design is automatically converted into the format supported internally by the FPGA vendor’s place-and-route tools. Design rule checking and optimization are performed on the incoming netlist, and the software partitions the design onto the available logic resources. Good partitioning is required to achieve high routing completion and high performance.

Increasingly, FPGA designers are turning to floorplanning after synthesis and design partitioning. FPGA floorplanners work from the netlist hierarchy as defined by the RTL coding. Floorplanning can help if area is tight. When possible, it’s a good idea to place critical logic in separate blocks.

After partitioning and floorplanning, the placement tool tries to place the logic blocks to achieve efficient routing. The tool monitors routing length and track congestion while placing the blocks. It may also track the absolute path delays to meet the user’s timing constraints. Overall, the process mimics PCB place and route.
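To give a feel for what "monitoring routing length" during placement can mean, here is a minimal sketch of one common textbook estimate, half-perimeter wirelength (HPWL). This is an assumption made for illustration: the article does not say which cost function any vendor's placement tool uses, and the block names, nets, and grid coordinates below are hypothetical.

```python
def hpwl(net, positions):
    """Half-perimeter wirelength of one net: the width plus the height of the
    smallest rectangle enclosing every block the net connects."""
    xs = [positions[block][0] for block in net]
    ys = [positions[block][1] for block in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, positions):
    """Sum of the HPWL estimates over all nets in the design."""
    return sum(hpwl(net, positions) for net in nets)


# Hypothetical netlist: each tuple lists the logic blocks one net connects
nets = [("a", "b"), ("b", "c", "d"), ("a", "d")]

# Two candidate placements on an (x, y) grid of logic-block sites
spread_out = {"a": (0, 0), "b": (3, 0), "c": (0, 3), "d": (3, 3)}
compact = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}

print(total_wirelength(nets, spread_out))  # 15
print(total_wirelength(nets, compact))     # 4
```

A placer built on this idea would prefer the compact placement, since its estimated routing length is lower. Real tools also weigh congestion and path delays, which this sketch leaves out.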
Table 3: Advantages/Disadvantages Of Various FPGA Technologies

Feature | SRAM | Antifuse | Flash
Reprogrammable? | Yes (in system) | No | Yes (in system or offline)
Reprogramming speed (including erasure) | Fast | Not applicable | 3X SRAM
Volatile? | Yes | No | No (but can be if required)
External configuration file? | Yes | No | No
Good for prototyping? | Yes | No | Yes
Instant on? | No | Yes | Yes
IP security | Poor | Very good | Very good
Size of configuration cell | Large (six transistors) | Very small | Small (two transistors)
Power consumption | High | Low | Medium
Radiation hardness | No | Yes | No

Functional simulation is performed after synthesis and before physical implementation. This step ensures correct logic functionality. After implementation, there’s a final verification step with full timing information. After placement and routing, the logic and routing delays are back-annotated to the gate-level netlist for this final simulation. At this point, simulation is a much longer process, because timing is also a factor (Fig. 4). Often, designers substitute static timing analysis for timing simulation. Static timing analysis calculates the timing of combinational paths between registers and compares it against the designer’s timing constraints.

Once the design is successfully verified and found to meet timing, the final step is to actually program the FPGA itself. At the completion of placement and routing, a binary programming file is created. It’s used to configure the device. No matter what the device’s underlying technology, the FPGA interconnect fabric has cells that configure it to connect to the inputs and outputs of the logic blocks. In turn, the cells connect those logic blocks to each other. Most programmable-logic technologies, including the PROMs for SRAM-based FPGAs, require some sort of a device programmer. Devices can also be programmed through their configuration ports using a set of dedicated pins.

Modern FPGAs also incorporate a JTAG port that, happily, can be used for more than boundary-scan testing. The JTAG port can be connected to the device’s internal SRAM configuration-cell shift register, which in turn can be instructed to connect to the chip’s JTAG scan chain.

If you’ve gotten this far with your design, chances are you have a finished FPGA. There’s one more step to the process, however, which is to attach the device to a printed-circuit board in a system. The appearance of 10-Gbit/s serial transmitters, or I/Os, on the chip, coupled with packages containing as many as 1500 pins, makes the interface between the FPGA and its intended system board a very sticky issue. All too often, an FPGA is soldered to a pc board and it doesn’t function as expected or, worse, it doesn’t function at all. That can be the result of errors caused by manual placement of all those pins, not to mention the board-level timing issues created by a complex FPGA.

More than ever, designers must strongly consider an integrated flow that takes them from conception of the FPGA through board design. Such flows maintain complete connectivity between the system-level design and the FPGA; they also do so between design iterations. Not only do today’s integrated FPGA-to-board flows create the schematic connectivity needed for verification and layout of the board, but they also document which signal connections are made to which device pins and how these map to the original board-level bus structures.

Integrated flows for FPGAs make sense in general, considering that FPGA vendors will continue to introduce more complex, powerful, and economical devices over time. An integrated third-party flow makes it easier to re-target a design to different technologies from different vendors as conditions warrant.

David Maliniak, Electronic Design Automation Editor

Fig. 3. Go With The Flow
The implementation flow for FPGAs begins with synthesis of the HDL design description into a gate-level netlist. Accounting for user-defined design constraints on area, power, and speed, the tool performs various optimizations before creating the netlist that’s passed on to place-and-route tools.
[Figure labels: Language Input (VHDL/Verilog), Design HDL Files, Constraints, Initial Optimization, Timing Analysis, Timing Optimization, FPGA Implement, Placement, Routing]

Fig. 4. Simulation Stages
FPGA simulation occurs at various stages of the design process: after RTL design, after synthesis, and once again after implementation. The latter is a final gate-level check, accounting for actual logic and interconnect delays, of logic functionality.
[Figure labels: RTL Design, Test Bench, FPGA Gate Library, VHDL RTL IP, Verilog RTL IP, HDL Simulator, Synthesis, FPGA/PLD Place and Route]
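The static timing analysis mentioned in the flow above, which calculates the delay of paths between registers and compares it against the timing constraint, can be sketched in a few lines. Treat this as a simplified illustration under stated assumptions: the netlist, node names, and delay values are hypothetical, the graph is assumed to be acyclic, and real timing tools also account for clock skew, hold checks, and much else.

```python
def longest_delay(node, fanout, delay, cache):
    """Longest accumulated delay from `node` to the end of any path it drives.
    The graph is assumed to be acyclic; results are memoized in `cache`."""
    if node not in cache:
        downstream = [longest_delay(nxt, fanout, delay, cache)
                      for nxt in fanout.get(node, [])]
        cache[node] = delay[node] + max(downstream, default=0.0)
    return cache[node]


def worst_slack(start_points, fanout, delay, clock_period_ns):
    """Slack = clock period minus the longest register-to-register path delay.
    A negative result means the timing constraint is violated."""
    cache = {}
    worst_path = max(longest_delay(s, fanout, delay, cache) for s in start_points)
    return clock_period_ns - worst_path


# Hypothetical netlist: reg_a and reg_b launch data, reg_c captures it
fanout = {
    "reg_a": ["lut1"],
    "reg_b": ["lut1", "lut2"],
    "lut1": ["lut3"],
    "lut2": ["lut3"],
    "lut3": ["reg_c"],
}
# Assumed delays in ns: clock-to-output for the launching registers,
# logic plus routing delay for each LUT, setup time for the capturing register
delay = {"reg_a": 0.5, "reg_b": 0.5, "lut1": 1.0, "lut2": 1.5, "lut3": 1.0, "reg_c": 0.25}

slack = worst_slack(["reg_a", "reg_b"], fanout, delay, clock_period_ns=4.0)
print(slack)  # 0.75
```

Here the critical path runs reg_b, lut2, lut3, reg_c for 3.25 ns, so a 4 ns clock constraint is met with 0.75 ns to spare, while a 3 ns constraint would fail.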
New Chips Don’t Suck (Power)
by Jim Turley, Embedded Technology Journal

Within a week, Intel and Freescale both announced new high-end embedded processors. They’re both packed with multicore processors, DRAM controllers, and PCI Express interfaces. But, for all their similarities, they couldn’t be more different.

In this corner, we have Freescale’s new P1022, the sixth member of the QorIQ family. And in this corner, we have “Jasper Forest,” a mostly new family of chips from Intel. Both are more power-efficient than their predecessors, though, in one case, that’s not saying much. And both are well-supported with software and development tools.

Freescale and PowerPC

If you’re not up on Freescale’s perverse brand-name strategy, QorIQ is the spiritual successor to QUICC, the company’s old communications controllers. Years ago, the QUICC name stood for “quad integrated communications controller” and thus made a modicum of sense. When the QUICC family traded in its 68K processor core for a more modern PowerPC processor core, the name changed to Power QUICC, which still made some sense.

QorIQ (pronounced “core I.Q.”) trades on the company’s hard-won brand equity in the letter Q, but otherwise makes little sense. Nevertheless, Freescale is pushing ahead with numerous QorIQ family members, the P1022 and P1013 being the newest additions.

The P1022 takes a dual-core PowerPC and mates it to a dizzying array of communications- and interface-related peripherals. As nice as it is to have a pair of PowerPCs under harness, the real value of the P1022 is its peripheral mix. A set of three (count ’em) PCI Express buses allows connection to pretty much anything else in a typical system. Disks get their own dual-SATA interfaces, and memory is handled through a DDR2/3 controller. Dual gigabit Ethernet ports handle networking, while dual USB 2.0 controllers handle the slower stuff. Unusually, this chip has an LCD controller, so a nice user interface would be simple to add. The P1013 is identical to the P1022 but has only a single PowerPC processor core.

Even with all its goodies, the P1022 is only a midrange QorIQ chip. The existing P2020 and P4080 chips have more performance, including quad-core centers, but are also more power-hungry. Freescale pitches the P1022 (or any QorIQ chip with P1xxx in its part number) as “balanced” or power-efficient variations. That is, they’ve got high-end performance but with the edge taken off to reduce power consumption a bit.

The power-efficiency comes from a few factors. First, the chip runs at “only” 600 MHz to 1 GHz, whereas other QorIQ devices are rated for 1 GHz and up. Second, the chip is manufactured with 45nm silicon-on-insulator (SOI) process technology, the current state of the art. And finally, the chip’s circuits are separated into two power planes, dividing the silicon into areas that must stay awake and active all the time and those that can go to sleep. By allowing big areas of the chip to essentially switch off, Freescale slashes the passive leakage current in those areas, a big problem for small-geometry semiconductors. Modern chips often leak as much current as they actively dissipate, an unfortunate side effect of small transistor geometry.

Intel and x86

For its part, Intel partially lifted the veil on a series of upcoming embedded x86 processors similar to Freescale’s new QorIQs. Codenamed “Jasper Forest,” the new chips are based on the venerable x86 processor architecture. In this case, they’ll use the “Nehalem” processor core design that appears in some newer Xeon chips.

Like most recent Intel designs, Nehalem emphasizes power efficiency over raw clock speed. Intel likes to point out that Jasper Forest, when it arrives, will save 27 watts over today’s equivalent Xeon-based configuration, largely because of the more efficient processor and the integrated I/O. In an interesting aside, an Intel representative extrapolated billions of dollars of energy savings if all the world’s embedded Xeon processors were replaced with Jasper Forest. Hey, dream big.

That comparison is a bit disingenuous, though, because Xeon is notoriously power hungry. The heat sinks are typically bigger than the processor. Saying that Jasper Forest consumes less power than a Xeon 5500 is like saying it’s a long walk to the moon.

Still, Jasper Forest promises to be in the same league as QorIQ with its multicore heart and integrated peripherals. The decision may come down to whether you prefer the PowerPC or x86 instruction set.

Clash of the Titans

Both chips support DDR3 directly; both have PCI Express controllers (although Intel’s has 16 lanes to Freescale’s six); both have RAID disk controllers; both have expected 10-year life spans. And both come from the biggest names in microprocessors.

Intel isn’t giving away too many details of Jasper Forest just yet, so the chips may come with Gigabit Ethernet, USB 2.0, or LCD controllers like QorIQ; we’ll have to wait and see.

Jasper Forest’s Nehalem-based processor core is available in single-, dual-, and quad-core configurations, whereas Freescale’s QorIQ chips are already available in four- and eight-core versions. So that makes them twice as good, right? Plus, several QorIQ chips are already shipping (although the P1022 itself isn’t due until January), while the first member of the Jasper Forest family isn’t expected until early next year. Advantage: Freescale.

On the “green” front, Freescale has Intel beat, hands down. The dual-core QorIQ P1022 consumes about 3 watts, less than one-tenth of Jasper Forest’s estimated 35-65 watts for a roughly equivalent dual-core configuration. That x86 instruction set exacts a heavy toll in power efficiency, even though both chip families are fabricated in similar 45nm silicon.
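As a quick sanity check on the "less than one-tenth" claim, the quoted wattages can simply be divided out. The sketch below uses only the figures given in the article (about 3 watts against an estimated 35 to 65 watts); the function and variable names are just illustrative.

```python
def power_ratio(low_watts, high_watts):
    """Fraction of the larger power figure that the smaller one represents."""
    return low_watts / high_watts


p1022_watts = 3.0                    # "about 3 watts" for the dual-core P1022
jasper_forest_watts = (35.0, 65.0)   # estimated range quoted for Jasper Forest

ratios = [power_ratio(p1022_watts, watts) for watts in jasper_forest_watts]
print([f"{ratio:.1%}" for ratio in ratios])  # ['8.6%', '4.6%']
```

Both ends of the range come out under 10%, so the comparison holds for the numbers as quoted.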
Designing the Power of tomorrow
Training aims at providing a basic
understanding of integrated circuit design
by working on an industry-standard project
in Front End, Back End, Embedded,
EDA, or DSP.

Winter Training Modules


VHDL + FPGA.
Verilog + FPGA.
SPICE.
DFT.
STA.
Embedded Designing.
FPGA (Spartan, Virtex*).
Shell/Perl.
Tcl/Tk.
C/C++.
CMOS VLSI Design.
Analog VLSI Design.
Digital VLSI Design.
Linux OS.
MATLAB.

Benefits of Training
How to use FPGA (Spartan, Virtex).
From BASICs to ASICs.
From Gates to Microprocessor.
Will be able to understand IC Design.
Interaction with R&D Team.

info@jbtechindia.com www.jbtechindia.com

JBTech INDIA
VLSI Design Solutions & Project Training

JBTech INDIA
Royal Krishna Apra Plaza, D-2, F-09, Alpha-I,
Commercial Belt, Greater Noida (U.P), INDIA
Tel: +91-0120-4213142, 09911676774
Email: info@jbtechindia.com
Website: www.jbtechindia.com
