FPGA Optimization (Optional)

Optimization techniques are important in FPGA programming. There are two main reasons for optimizing FPGA code. One reason is to optimize the code for speed, which is important when using high-frequency loop rates. Another reason is space utilization on the FPGA. Optimizing for size is important when FPGA code is extremely complex and is too large to fit on the FPGA.
Topics
A. Optimization Techniques
B. Benchmarking FPGA VIs
C. Basic Optimization Techniques
D. Architecture Optimizations
E. Advanced Optimizations
© National Instruments Corporation
CompactRIO and LabVIEW Fundamentals Course Manual
Lesson 9 FPGA Optimization (Optional)
A. Optimization Techniques

FPGA VIs are limited primarily in two areas:
• Speed: Relates to the actual execution speed of the FPGA gates. If the code does not execute at the desired loop rate, you must modify the application so that the loop rate can be increased. These modifications can change the way you create and design the FPGA VI.
• Size: Relates to the actual amount of space the application uses on the FPGA. If the application requires more hardware components than are physically available on the FPGA, the compile fails and the application cannot run on the FPGA.

You can use several optimization techniques to turn a non-functioning or poorly functioning application into a workable solution. Table 9-1 lists some of the main optimization techniques and how they affect FPGA speed and size. These techniques work for CompactRIO targets as well as other National Instruments FPGA targets.
Table 9-1. Optimization Techniques
Optimization Technique    FPGA Speed    FPGA Size
Advanced techniques that require detailed knowledge of FPGA architecture
B. Benchmarking FPGA VIs

To determine whether a change has affected the FPGA VI, you need a way to measure the execution speed and size requirements.

Loop Rate Benchmarks
Using tick counts is the best way to calculate loop rates, where one tick equals one clock cycle of the timebase (default 40 MHz). The simplest way to measure the number of ticks between one iteration and the next is to use the Tick Count VI with a shift register.
Place the Tick Count VI in the desired loop and wire its output to a shift register. Then compare the prior iteration's tick count to the current iteration's tick count by subtracting the prior count from the current count. Wire an indicator to the output of the Subtract function to see how much time elapsed between iterations.
Although this technique uses some FPGA space, it is typically used only during the development phase and deleted later. Because an FPGA runs the code in true parallel, adding the additional code has no significant effect on timing.
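The tick arithmetic described above can be sketched on the host side in Python. This is only an illustration, not LabVIEW code: the function names are hypothetical, and the 32-bit counter width is an assumption (the actual width depends on how you configure the Tick Count VI).

```python
TIMEBASE_HZ = 40_000_000  # default FPGA timebase stated in the text (40 MHz)


def ticks_elapsed(previous: int, current: int, counter_bits: int = 32) -> int:
    """Ticks between two counter readings, tolerating a single counter rollover.

    counter_bits is an assumption; match it to the configured counter size.
    """
    return (current - previous) % (1 << counter_bits)


def loop_rate_hz(ticks: int, timebase_hz: int = TIMEBASE_HZ) -> float:
    """Loop rate implied by the tick difference between two iterations."""
    return timebase_hz / ticks


def loop_period_us(ticks: int, timebase_hz: int = TIMEBASE_HZ) -> float:
    """Loop period in microseconds for the same tick difference."""
    return ticks * 1_000_000 / timebase_hz
```

For example, a difference of 40 ticks on the 40 MHz timebase corresponds to a 1 µs loop period, or a 1 MHz loop rate.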
Compile Report Benchmarks
After compiling an FPGA VI, LabVIEW displays a Compile Report with important information about the overall size and speed of the application.

VI Size
The Device Utilization Summary of the Successful Compile Report provides information about the number of SLICEs used. This metric is the most important measure of the size of the program in hardware. The design process should be iterative. During the design process, notice the increase in SLICEs used as the program gets larger.

VI Speed
The Successful Compile Report dialog also contains information about the clock rates of the application.
• Requested Rate: Displays the clock rate at which the compiled FPGA VI runs. The default setting is 40 MHz.
• Theoretical Maximum: Displays the theoretical maximum compile rate for the FPGA VI.

If the Theoretical Maximum is slower than the Requested Rate, the compiler errors out and the compile process stops. This problem is common when you use single-cycle Timed Loops. You must modify the application to the point where the Theoretical Maximum is equal to or greater than the Requested Rate.
C. Basic Optimization Techniques

Basic FPGA optimization techniques are typically easy to implement, require no major changes in the code architecture, and often are FPGA programming best practices. The basic optimizations covered in this section primarily affect FPGA size.

Limit Front Panel Objects
Front panel objects consume a significant amount of space on the FPGA. Not only is space required to store the data itself, but a considerable amount of FPGA logic is required to implement the communication between the front panel object and the host VI.
When you transfer data across the bus between the FPGA and the RT host, it must be broken down into 32-bit packets. This limitation is due to the number of data transfer lines on the bus. Therefore, if you have a front panel element that uses more than 32 bits, that element must be broken down into several 32-bit chunks and passed along the bus piecewise. This breaking down of the data reduces overall transfer speeds.
Use the following guidelines to optimize the FPGA code:
• Limit the use of arrays
These techniques are necessary only for front panel objects used in the top-level FPGA VI; front panel objects in subVIs do not consume space on the FPGA.
Avoid Front Panel Arrays
Arrays and clusters that are greater than 32 bits in size require an extra copy on the FPGA to guarantee all the data is read, and they consume a significant amount of space on the FPGA. Array elements are transferred to and from the FPGA individually, so limiting the use of arrays decreases the logic cells required to implement the data transfer.
Bitpack Boolean Logic
Bitpacking is a technique that combines many small pieces of data into a larger piece of data. This technique works because the communication between the FPGA target and real-time host is in 32-bit words. For example, you can store four 8-bit integers as one 32-bit integer. Instead of four separate buses to transfer the elements of data, only one bus is required.
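The four-into-one example can be sketched in Python to make the bit layout concrete. The function names are hypothetical, and the byte order (first value in the least significant byte) is an assumption; any order works as long as the FPGA and host agree on it.

```python
def pack_u8x4(values):
    """Pack four unsigned 8-bit integers into one 32-bit word.

    Assumed layout: values[0] occupies the least significant byte.
    """
    if len(values) != 4:
        raise ValueError("exactly four values are required")
    word = 0
    for position, value in enumerate(values):
        if not 0 <= value <= 0xFF:
            raise ValueError("each value must fit in 8 bits")
        word |= value << (8 * position)
    return word


def unpack_u8x4(word):
    """Recover the four 8-bit values from a packed 32-bit word."""
    return [(word >> (8 * position)) & 0xFF for position in range(4)]
```

On the host, `unpack_u8x4(pack_u8x4([1, 2, 3, 4]))` returns the original four values, so one 32-bit transfer carries all four elements.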
Figure 9-1. Convert Boolean Controls to a Boolean Array

Global Variables
You can use global variables scoped to the FPGA to optimize code for size. Using global variables in subVIs is especially effective if other VIs need to access the data. If a piece of data is not global, you must use logic to route it to other VIs, and each VI requires memory to store the data. To add a global variable to your project, select Global Variable from the Structures palette on the block diagram. Then link the global variable to the control or indicator on your front panel. You can also replace controls with constants to further reduce FPGA usage.
Use Small Data Types
Always select the smallest data type possible when using controls and constants on the FPGA. For example, the default index data type of the Index Array function is a 32-bit integer. However, if your array will have no more than 256 elements, represent the index as an unsigned 8-bit integer.
When using a Case structure, only very rarely would you need more than 256 cases. If you have fewer than 256 cases in your Case structure, wire an 8-bit integer to the selector terminal instead of the default 32-bit integer value.
Similarly, if you are using the Loop Timer VI, Tick Count VI, or Wait VI, configure them to use the smallest Size of Internal Counter possible.
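As a rough aid for choosing a representation, the following Python sketch returns the narrowest unsigned integer width that can hold a given maximum value. The helper name is hypothetical and the candidate widths (8, 16, 32, 64 bits) are assumed to match the unsigned integer types you have available.

```python
def smallest_unsigned_bits(max_value: int) -> int:
    """Return the narrowest standard unsigned width that can hold max_value."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    for bits in (8, 16, 32, 64):
        if max_value < (1 << bits):
            return bits
    raise ValueError("value does not fit in 64 bits")
```

An array with 256 elements has a highest index of 255, so `smallest_unsigned_bits(255)` returns 8. A 5,000-element table has a highest index of 4,999 and needs 16 bits.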
Eliminate Coercion Dots
When a coercion dot displays, it means the compiler must determine the proper data type, which can lead to inefficient data conversion. Instead of leaving the coercion to chance, specify the desired data type and insert the appropriate coercion function when you see a coercion dot on your block diagram.
Avoid Large Functions
Some LabVIEW functions consume a significant amount of space on the FPGA, such as the Quotient & Remainder, Rotate 1D Array, and Scale By Power of 2 functions.
• Quotient & Remainder: This function consumes significant space on the FPGA. If you need to divide by a power of two, consider using the Scale By Power of 2 function with a negative constant wired to the n input.
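The reason a power-of-two division is cheap is that it reduces to a bit shift. The Python sketch below is an integer model of that idea only; the function name is hypothetical, and the rounding behavior (flooring toward negative infinity) is a property of Python's shift operator, assumed here for illustration rather than taken from the LabVIEW function.

```python
def scale_by_power_of_2(x: int, n: int) -> int:
    """Integer model of multiplying x by 2**n.

    A negative n is a right shift, which divides by 2**(-n) and floors the result.
    """
    return x << n if n >= 0 else x >> -n
```

With n = -2, the model divides by four: `scale_by_power_of_2(100, -2)` equals `100 // 4`.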
Optimize Comparisons
The Comparison functions require significant space on the FPGA. You might be able to save space on the FPGA by using the Number To Boolean Array function and the Compound Arithmetic function configured with the Or operation, as shown in Figure 9-2.

Figure 9-2. Using Boolean Operations to Optimize Comparisons
The code in the While Loop on the left is standard LabVIEW code that compares a number to 16. The code in the While Loop on the right achieves the same result, but uses approximately half the FPGA resources and executes nearly twice as fast.
If you use this technique to optimize your code, be aware that if you want to adjust the comparison value, you must significantly restructure your code to match the appropriate Boolean logic required to determine the desired results.
Note  It is easiest to use this technique to make comparisons to powers of two.
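To see why powers of two are the easy case, consider the following Python sketch. It assumes the comparison is "greater than or equal to 16" on an unsigned 8-bit value; the text does not say which comparison the figure uses, so treat that choice, and the function names, as assumptions. Because 16 is 2^4, the value is at least 16 exactly when any of bits 4 through 7 is set, so a single Or over those bits replaces the comparison.

```python
def number_to_boolean_array(x: int, bits: int = 8):
    """Model of a number-to-Boolean-array conversion, least significant bit first."""
    return [bool((x >> i) & 1) for i in range(bits)]


def at_least_16(x: int) -> bool:
    """True when an unsigned 8-bit x is >= 16, using only an Or of the upper bits."""
    bit_array = number_to_boolean_array(x)
    return any(bit_array[4:])  # Or of bits 4..7
```

Changing the threshold to a value that is not a power of two, such as 20, would require a different combination of bits, which is the restructuring cost the text warns about.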
Using Reentrant and Non-Reentrant SubVIs
You can configure a subVI as a single instance shared among multiple callers, also known as a non-reentrant subVI. You also can configure a subVI as reentrant to allow parallel execution. By default, VIs created under an FPGA target are reentrant. To make a subVI non-reentrant, select Execution from the Category pull-down menu of the VI Properties dialog box and remove the checkmark from the Reentrant execution checkbox.
Some applications use shared resources that are accessed by multiple callers, such as functions or subVIs in an FPGA VI. Possible shared resources include digital output lines, analog lines, memory items, FIFOs, the interrupt line, local and global variables, and non-reentrant subVIs. Callers must wait in a queue to gain exclusive access to the resource. By contrast, an unshared resource is a portion of code dedicated to a specific caller. If you use a non-reentrant subVI in an FPGA VI, only a single copy of the subVI becomes hardware and all callers share the hardware resource. This may decrease the execution speed of the FPGA code.
Table 9-2. Reentrant vs. Non-reentrant VIs
VI Type    FPGA Speed    FPGA Utilization
Non-reentrant
Reentrant (default for FPGA): Can be higher because each instance of the subVI on the block diagram uses space on the FPGA. However, reentrant subVIs do not use FPGA resources for arbitration.
Exercise 9-1 Optimize FPGA VI

Goal
Use basic FPGA optimizations to reduce the application size.

Scenario
You have been given an FPGA VI that is meant to output a tone that is dependent on the temperature input from an NI 9211. Unfortunately, the FPGA VI will not compile in its present state. Use each of the techniques learned so far in the course to optimize the FPGA VI so it can compile and use minimal space on the FPGA.

Design
The FPGA VI generates a sine wave on AO0 of the NI 9263 and changes the frequency of the sine wave by changing the timing between analog output updates. The sine wave on AO0 creates a tone on the speaker of the Sound and Vibration Signal Simulator when connected to AUDIO IN CH0 with the Speaker turned ON.
The temperature is read from the thermocouple on AI0 of the NI 9211.
Calculate the timing by averaging the temperature values and scaling the result with the empirically derived scaling factors to generate a time delay between samples of somewhere between 2000 and 120 ticks of the 40 MHz timebase, which generates an audible frequency between 400 and 6700 Hz.
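The relationship between the tick delay and the tone frequency can be checked with a short Python sketch. It assumes one look-up table sample per analog output update and 50 samples per sine cycle, which follows from the 5,000-element table holding 100 cycles that this exercise configures later. The function name is hypothetical.

```python
TIMEBASE_HZ = 40_000_000          # 40 MHz timebase from the Design section
SAMPLES_PER_CYCLE = 5000 // 100   # assumed: 5,000 table elements holding 100 sine cycles


def tone_frequency_hz(delay_ticks: int) -> float:
    """Tone frequency when one sample is written every delay_ticks ticks."""
    return TIMEBASE_HZ / (delay_ticks * SAMPLES_PER_CYCLE)
```

Under these assumptions a 2000-tick delay gives 400 Hz and a 120-tick delay gives roughly 6,667 Hz, which matches the audible range quoted in the Design section.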
Implementation
1. Find the appropriate scaling factors.
❑ Open the Temperature Monitor Project.
❑ Run the Simple Temperature Monitor RT Host VI.
❑ Do not touch the thermocouple.
❑ Record the approximate value of the Thermocouple Signal indicator:
TC_cold = ________
TC_hot = ________
❑ Use the slope-intercept formula (y = mx + b) to find where the line will cross the Y-axis (b).
m = (120 - 2000) / (TC_hot - TC_cold) = ________
❑ Stop the FPGA VI.
❑ Close the FPGA VI and the Temperature Monitor Project. Do not save any changes.
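The scaling factors from step 1 can be worked out with a small Python sketch. It assumes, following the slope expression in the step, that the cold reading maps to a 2000-tick delay and the hot reading maps to a 120-tick delay. The function name and the example readings are hypothetical.

```python
def delay_scaling(tc_cold: float, tc_hot: float,
                  ticks_cold: float = 2000, ticks_hot: float = 120):
    """Return (m, b) of the line ticks = m * reading + b through both points."""
    m = (ticks_hot - ticks_cold) / (tc_hot - tc_cold)
    b = ticks_cold - m * tc_cold  # solve y = m*x + b at the cold point
    return m, b
```

For illustrative readings of TC_cold = 0 and TC_hot = 940, the sketch gives m = -2.0 and b = 2000.0, and the line evaluates to 120 ticks at the hot reading. Substitute the values you recorded.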
2. Create a new project.
❑ Select LabVIEW FPGA Interface in the Programming Mode dialog box.
Figure 9-3. FPGA Optimization - Original VI
Note  In the interest of time, do not compile this VI.
If you compiled this VI, you would see the following error reports:
• The compile report states that the compilation failed.
Figure 9-4. FPGA Optimization - Original VI Compilation Failure

Figure 9-5. FPGA Optimization - Original VI Compile Server Error
When you review the error, you can see that the VI uses too much memory. You must review the code to see what is taking up too much memory on the FPGA.
Avoid Front Panel Arrays
When using a cRIO-9103 with a 3M gate FPGA, you are limited to less than 200 kB of memory for things such as front panel controls and indicators. The front panel of the FPGA Optimization - Original VI includes a Sine wave Array containing 5,000 I32 elements. This array alone uses 200 kB, so you must replace the Sine wave Array (5000 Elements) front panel control with more efficient equivalent code.
The Sine wave Array contains 5,000 elements; however, when you look closer, you can see that it contains 100 sine waves composed of numbers between -128 and 127.
Because this is a relatively simple waveform, you can easily substitute the Sine wave Array (5000 elements) control with an equivalent look-up table.
1. Replace the Sine wave Array with a Look-Up Table.
❑ Add a Look-Up Table 1D Express VI on the block diagram. The configuration dialog box opens. You configure the Look-Up Table to create 5,000 elements of I8 data with 100 cycles of sine wave data.
Figure 9-6. Configure Look-Up Table 1D Dialog Box
❑ Click Define Table.
Figure 9-7. Define Table Dialog Box
Figure 9-8. Configure Segment Dialog Box
❑ Set the Mode to sine wave and the Number of Cycles to 100.
❑ Make sure End Address is set to 4999.
❑ Click OK.
❑ Click OK in the Define Table dialog box to accept the settings.
❑ Click OK to close the Configure Look-Up Table 1D dialog box.
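To get a feel for the table contents, the following Python sketch generates an equivalent set of values: 5,000 signed 8-bit samples holding 100 sine cycles. It is not how the Express VI computes its table; the function name, the amplitude of 127, and the rounding rule are assumptions chosen to keep every value inside the I8 range.

```python
import math


def sine_lookup_table(length: int = 5000, cycles: int = 100, amplitude: int = 127):
    """Return `length` integer samples containing `cycles` full sine periods."""
    table = []
    for address in range(length):
        # Reduce the phase with integer math so every cycle is bit-identical.
        phase = (cycles * address) % length
        angle = 2 * math.pi * phase / length
        table.append(round(amplitude * math.sin(angle)))
    return table
```

With these defaults each cycle spans 50 addresses, so the table repeats every 50 elements and the last valid address is 4999, matching the End Address setting in the steps above.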
Figure 9-9. FPGA Optimization VI with Array Deleted
Note  In the interest of time, do not compile this VI.
Figure 9-10. FPGA Optimization - Original VI Compile Report After Replacing the Array with a Look-Up Table
Note  Due to the nature of the FPGA compilation process, the compile does not always reach the same conclusion, so your results may differ slightly from the figures shown in the Compile Reports for this exercise.
Eliminate Coercion Dots
Another area that can cause problems is allowing the VI to automatically coerce values instead of specifying what type of data you would like to have. Small red coercion dots appear on the inputs to the Look-Up Table Express VI and the input to the Mod3/AO0 FPGA I/O Node. You should always explicitly define your data types whenever possible.
1. Set data type representation to reduce LabVIEW coercion.
❑ Right-click the terminals where the red coercion dots are located and select Create»Control. This control will not be connected, but you can use it to identify the expected data type for the input terminal.
❑ Right-click the wire between the Look-Up Table and the FPGA I/O Node and select Insert»Numeric Palette»Conversion»To Fixed-Point. The To Fixed-Point function requires that you wire a value to the fixed-point type input to determine the configuration of the resulting fixed-point value.
❑ Click OK.
❑ Save the VI.
2. Wire the block diagram as shown in Figure 9-11.
Figure 9-11. FPGA Optimization VI with Converted Values Block Diagram
Note  In the interest of time, do not compile this VI.
Figure 9-12. FPGA Optimization VI with Converted Values
Avoid Large Functions
One of the major optimizations that you can use is to replace any division where the denominator is a power of 2 with the Scale By Power of 2 function. While this function takes up minimal space on the FPGA when wired with a constant, it takes significant space if wired to a control.
1. Replace the Quotient & Remainder functions with the Scale By Power of 2 function.
Note  Because you are averaging four samples, you find the sum of the four samples and divide by four; however, dividing by four is the same as multiplying by 2^-2.
❑ Delete the Quotient & Remainder function, the associated constant, the To Long Integer function, and the To Fixed-Point function and its associated constant from between the Add VIs and the FPGA Scaling VI.
❑ Add a Scale By Power of 2 function to the block diagram.
❑ Wire the output of the Add VI to the x input.
❑ Create a constant for the n input terminal. Set the constant to a value of -2.
Avoid using Scale By Power of 2 with a control for the n input. If you open the FPGA Scaling VI block diagram, you see that it uses the Scale By Power of 2 function; however, in the top-level application this is connected to the Slope control.
❑ Right-click the Slope control and select Change to Constant.
❑ In step 1 you touched the thermocouple to determine your slope. Enter the value into the constant. For example, if you had a TC_cold = -20,000 and a TC_hot = 20,000, then you would select a Slope = -32,768 to 32,767.
Figure 9-13. Replace Quotient & Remainder Functions with the Scale By Power of 2 Function
Note  In the interest of time, do not compile this VI.
If you compile the VI now, you will see the following Compile Report.
Figure 9-14. Successful Compile Report after Replacing Quotient & Remainder Functions with Scale By Power of 2 Functions
As you can see in Figure 9-14, you eliminated another 106 SLICEs, which is almost 1% of the FPGA size.
By eliminating improper uses of the Quotient & Remainder function and by replacing controls connected to the Scale By Power of 2 function with constants, you significantly reduced the size of the FPGA VI.
Figure 9-15. Replace Quotient & Remainder Function with Increment Values Block Diagram
❑ Delete the Quotient & Remainder function, associated constant, and Coerce to U16 function.
❑ Delete the tunnel on the Sequence structure coming from the iteration counter of the While Loop.
❑ Press <Ctrl-B> to delete broken wires.
❑ Right-click the address input for the Look-Up Table and create a U16 constant. Set the constant to a value of 5000 (number of elements in the Look-Up Table).
Figure 9-16. Replace Quotient & Remainder Function with Increment Values Successful Compile Report
Benchmark the Execution Speed

Figure 9-17. Add Tick Counts to the FPGA Optimization VI Block Diagram

Figure 9-18. Configure Tick Count Dialog Box
❑ Create a copy of the Tick Count VI and place it in the initialization sequence.
❑ Loop Rate (usec).
❑ Save the VI.
Figure 9-19. Successful Compile Report
This code uses more FPGA space than it would if you had not added the tick count functionality; however, because the tick count is used only for testing purposes, you can delete it before you deploy the code.
3. Save the project.

Test
Figure 9-20. Basic FPGA Optimizations Front Panel
Challenge
The FPGA Optimization application made some significant improvements, but could you do even more?
The following list provides examples of possible improvements you could make.
1. Replace the Equal? function in the main-level VI and the In Range and Coerce function in the FPGA Scaling subVI with Boolean logic functions. To do this, you must convert the integer numbers to Boolean arrays and do bitwise logic on the resultant array.
2. Optimize your timebases by using only U8 or U16 numbers instead of U32 numbers.
Figure 9-21. Successful Compile Report with Challenge Optimizations Implemented

End of Exercise 9-1
D. Architecture Optimizations

There are several architecture-related FPGA optimizations. These are more advanced operations because they are application-specific and also require a design phase prior to implementation. There are several important concepts to understand before learning more advanced optimization techniques. The most important concept is the enable chain. The enable chain is additional logic added to the FPGA code to guarantee that dataflow on the FPGA is consistent with the LabVIEW dataflow paradigm. The enable chain is a series of flip-flops, also known as registers, that run in parallel with the actual flow of data on the block diagram. A flip-flop holds a bit of data and outputs the data on clock edges.

Dataflow within the FPGA
LabVIEW executes code in a dataflow manner. Nodes execute when data is present on all inputs. When the node finishes execution, the outputs of the node pass data to the next node downstream. Figure 9-22 shows an example of the FPGA hardware required to implement a Boolean operation.

Figure 9-22. A LabVIEW Not VI and the Corresponding FPGA Logic that Implements the Not VI

LabVIEW code is transformed into FPGA logic in three sections: logic, synchronization, and the enable chain. The logic, shown in the upper third of Figure 9-22, corresponds to the actual LabVIEW operation. In this example, the LabVIEW code is a Not function, corresponding to an inverter in the FPGA hardware. The synchronization register, shown in the middle of Figure 9-22, guarantees that data is output only on rising edges of the clock. The final portion of FPGA code that is generated from the LabVIEW code is the enable chain. The enable chain is an additional register that only outputs on the rising edge of the clock. The enable chain guarantees that the FPGA logic executes in the same order depicted on the block diagram.
Due to the enable chain overhead, each function or VI takes a minimum of one clock cycle. Some functions, such as analog input operations, can take hundreds of clock cycles depending upon the complexity of the operation and hardware limitations.
A VI can run only as fast as the sum of the items in a combinatorial path. One advantage of using an FPGA is that code can run in true parallel to another operation. Because you can create code in parallel, it is often best to design code so that as many parallel operations can take place as possible. This uses the same amount of FPGA space, and can increase the execution speed by reducing the combinatorial path size.
Parallel Operations
Parallel operations are a very powerful concept in current computer architecture. In a standard processor-based configuration, parallel operations are not truly parallel. In processor-based architectures, programs running on the processor are sliced into many fragments and are interleaved with code fragments of other processes. The operating system then decides which processes are the most important and schedules the fragments of code accordingly.
LabVIEW is one of the few programming languages that naturally lends itself to parallel processing because the compiler looks for separate sections of code and creates separate threads as needed. By using the parallel nature of graphical programming and the parallel implementation on the FPGA, you can separate your code into different segments, which can run in parallel and achieve a faster loop rate than when everything is in one process. As you develop your code, start thinking about logical places to create different segments of code.
In LabVIEW FPGA, you can consider separate loops in your top-level VI as separate processes running on their own dedicated processor. Because FPGA hardware allows you to execute code in true parallel, parallel operations on the FPGA are more deterministic and can run at faster loop rates when compared to a processor-based architecture. This is a great benefit for safety-critical applications and control applications.
To create parallel operations in LabVIEW, place multiple, independent While Loops on the block diagram. Figure 9-23 shows parallel loops for acquiring analog data at independent loop rates.
Figure 9-23. Two Parallel Loops with Different Data Sample Rates
You may lose the benefits of parallel operations if you share resources among parallel loops. Memory transfer mechanisms, such as FIFOs, the interrupt line, variables, and non-reentrant VIs affect the ability of the FPGA to execute code in true parallel.
Another advantage of running code in parallel is that it lets some sections of code run faster than other sections. As shown in Figure 9-24, one set of code in a loop can severely limit the speed of another piece of code. In this application the analog output runs 35 times slower than the digital input. This can become particularly critical if the code were arranged such that the digital line corresponded to an emergency stop switch and the recognition of the response had to happen immediately.
Figure 9-24. VI Speed Limited by the Rate of the Slower AO Function
In Figure 9-25 the code is broken into two parallel operations. The top section of code runs at the same analog output-limited rate of approximately 1 MHz. However, the code in the bottom parallel loop can run at a rate of 10 MHz, or 10 times faster. Often when code is running too slowly, it is necessary to separate the code into parallel operations to prevent unrelated processes from interfering with one another.
Figure 9-25. DIO No Longer Limited by the Slower AO Function
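The effect of splitting the loop can be modeled with a few lines of Python. This is a simplified timing model, not FPGA code: it assumes that operations sharing one loop all finish before the next iteration starts, so the slowest operation sets the rate, and it ignores loop overhead. The tick counts (40 for analog output, 4 for digital I/O) are hypothetical values chosen to reproduce the 1 MHz and 10 MHz rates mentioned above.

```python
TIMEBASE_HZ = 40_000_000  # 40 MHz default timebase

# Hypothetical per-iteration costs in ticks.
OPERATION_TICKS = {"analog_output": 40, "digital_io": 4}


def single_loop_rates(operation_ticks):
    """All operations share one loop, so the slowest one limits every rate."""
    slowest = max(operation_ticks.values())
    return {name: TIMEBASE_HZ / slowest for name in operation_ticks}


def parallel_loop_rates(operation_ticks):
    """Each operation has its own loop and runs at its own rate."""
    return {name: TIMEBASE_HZ / ticks for name, ticks in operation_ticks.items()}
```

In this model the digital I/O runs at 1 MHz when it shares a loop with the analog output and at 10 MHz once it has a loop of its own, while the analog output rate is unchanged.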
Pipelining Techniques
Another important technique for improving FPGA performance is pipelining. Pipelining breaks up code within a loop so that operations are performed in different cycles of the same loop. The process of pipelining code begins with identifying combinatorial paths in your code. A combinatorial path is a set of logic between the output of one register and the input of another register. Because data in the registers is updated with every rising edge of the clock, if there are too many operations between two registers, a VI compilation may fail due to a timing error. Figure 9-26 shows an example of a combinatorial path inside a single-cycle Timed Loop.
Figure 9-26. VI with No Pipelining
Pipelining shortens the length between the output and input registers of a While Loop so that your VI meets timing requirements. You can use shift registers to run portions of your combinatorial path in different cycles of your loop. Pipelining is especially important in single-cycle Timed Loops, where the entire path is required to execute in one clock cycle. Figure 9-27 illustrates a pipelined version of Figure 9-26.
Figure 9-27. Use Pipelining to Eliminate Combinatorial Path
Pipelining increases system latency because the input of a function is based on the output of a previous cycle of the loop. However, the latency disappears when the pipe is full. After only a few loop cycles, pipelined code is significantly more efficient than identical code in a normal loop. Figure 9-28 illustrates latency due to pipelining.
Figure 9-28. Increased Latency Due to Pipelining
After Clock Cycle 1, the output of subVI A is valid and the outputs of subVIs B and C are invalid. After Clock Cycle 2, subVIs A and B have valid output and subVI C has invalid output. After Clock Cycle 3 and all subsequent clock cycles, all output is valid.
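The fill behavior of a three-stage pipeline can be simulated in Python. This is a behavioral sketch only: the stage functions standing in for subVIs A, B, and C are hypothetical, and `None` is used to mark an invalid output, whereas real shift registers hold default or initialized values until the pipe fills.

```python
def run_pipeline(inputs, stages):
    """Simulate stages separated by registers; return stage outputs per cycle."""
    registers = [None] * len(stages)  # registers[k] holds stage k's latest output
    history = []
    for value in inputs:
        updated = []
        for index, stage in enumerate(stages):
            # Stage 0 reads the new input; later stages read the value the
            # previous stage registered on the prior clock cycle.
            source = value if index == 0 else registers[index - 1]
            updated.append(None if source is None else stage(source))
        registers = updated
        history.append(list(registers))
    return history


# Hypothetical stand-ins for subVIs A, B, and C.
STAGES = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
```

Running `run_pipeline([1, 2, 3, 4], STAGES)` yields `[2, None, None]` after the first cycle, `[3, 4, None]` after the second, and `[4, 6, 1]` after the third. The final stage's first valid result, 1, corresponds to the first input, so the output lags the input by two cycles.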
Feedback Nodes
Feedback Nodes are identical in functionality to shift registers, and are often preferable from a user standpoint because they look similar to the initial code. Figure 9-29 shows an example of using Feedback Nodes instead of shift registers.
Figure 9-29. VI with Feedback Nodes
Feedback Nodes use a value wired to the initializer terminal as the initial value for the first iteration or execution. The Feedback Node then stores the previous iteration result for each subsequent execution. If you do not wire a value to the initializer terminal, the Feedback Node uses the default value for the data type and continues building on previous results in subsequent executions.
You can use a Feedback Node to implement a pipeline and reduce long combinatorial paths. When you use the Feedback Node inside a Case structure, the Feedback Node updates data only on clock cycles when the owning subdiagram executes.
The Feedback Node is implemented as a register and requires logic resources in proportion to the width of the data type. Using the initializer terminal slightly increases logic resource usage.
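The store-and-return behavior described above can be modeled as a one-iteration delay. The Python class below is a behavioral sketch with hypothetical names, and it uses 0 as the default value for a numeric data type; it does not model the Case structure behavior or the hardware cost.

```python
class FeedbackNode:
    """Behavioral model of a one-iteration delay element."""

    def __init__(self, initial=0):
        # `initial` plays the role of the initializer terminal; 0 stands in
        # for the default value of a numeric data type.
        self._stored = initial

    def step(self, new_value):
        """Return the value stored on the previous iteration, then store new_value."""
        previous = self._stored
        self._stored = new_value
        return previous
```

Feeding the values 5, 6, 7 through an uninitialized node returns 0, 5, 6: the first output is the default value, and each later output is the input from the previous iteration.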
Drawbacks
When you implement a pipeline, the output of the final step lags behind the input by the number of steps in the pipeline, and the output is invalid for each clock cycle until the pipeline fills. The number of steps in a pipeline is called the pipeline depth, and the latency of a pipeline, measured in clock cycles, corresponds to its depth. For a pipeline of depth N, the result is invalid until the Nth loop iteration, and the output of each valid loop iteration lags behind the input by N-1 iterations.
Single-Cycle Timed Loop
The single-cycle Timed Loop is one of the most powerful constructs in FPGA programming. Code inside the single-cycle Timed Loop is more optimized, takes up less space on the FPGA, and executes faster than identical code in a standard While Loop. The single-cycle Timed Loop removes the enable chain from the loop to save space on the FPGA. Because all registers are removed, all operations in a single-cycle Timed Loop can complete in a single clock cycle. Furthermore, eliminating the enable chain overhead reduces the total space used on the FPGA because the flip-flops used for the enable chain are no longer required. The single-cycle Timed Loop is a great tool for safety-critical and control applications where fast loop rates are important.
Figure 9-30 shows identical code in a standard While Loop and single-cycle Timed Loop. The vertical lines indicate the end of a clock cycle. The code in the While Loop requires four clock cycles to execute, in addition to two clock cycles of loop overhead.
Figure 9-30. Single-Cycle Timed Loop and While Loop Comparison
Because the single-cycle Timed Loop executes in exactly one clock cycle, the clock period must be long enough to allow all the operations to complete in a single cycle. The clock frequency can technically be from 2.5 to
Figure 9-31. Single-Cycle Timed Loop Used to Increase the Speed in a Portion of the Code
Combining Optimizations
Combinatorial Paths
A combinatorial path is the path through logic between the output of a register and the input of another register on an FPGA. A register stores data on an FPGA and updates the data on the rising edge of a clock. Long combinatorial paths take more time to execute and limit the maximum clock rate of the clock domain.
Long combinatorial paths are particularly a problem in single-cycle Timed Loops because the logic between the input register and the output register must execute within one period of the clock rate you specify. In the single-cycle Timed Loop, registers within and between components are removed, increasing the length of the combinatorial path between registers. If the code in a combinatorial path does not execute within a single clock cycle, LabVIEW returns a timing violation in the Compilation Failure dialog box.
Note Deeply nested Case structures also can cause LabVIEW to return a timing violation in the Compilation Failure dialog box.
To reduce the length of a combinatorial path, first simplify the logic as much as possible. Once you have reduced the logic to its simplest form, you can further reduce the length of a combinatorial path by dividing the logic into discrete steps and pipelining your design in the single-cycle Timed Loop.
Exercise 9-2 Architectural Optimizations
Goal
Use architectural optimizations on an FPGA VI to reduce the application size and increase application speed.
Scenario
You have been given an FPGA VI that outputs a tone dependent on the temperature input from an NI 9211. Although the code is optimized for speed, the timing in the FPGA VI prevents the expected output. Design and modify the application to create an audible and changing tone while still performing the same actions as originally desired. Use your knowledge of architectural optimizations to optimize both the speed and size of the FPGA VI and create the most efficient code possible.
Design
The FPGA VI generates a sine wave on AO0 of the NI 9263 and changes the frequency of the sine wave by changing the timing between analog output updates. The sine wave on AO0 creates a tone on the speaker on the Sound and Vibration Signal Simulator when connected to AUDIO IN CH0 with the Speaker turned ON.
The temperature is read by reading the thermocouple values from AI0 on the NI 9211.
Calculate the timing by scaling the average temperature values by the empirically derived scaling factors to generate a time delay between samples. The time delay should be between 2,000-120 ticks of the 40 MHz timebase.
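The tick limits in the design translate directly into time between analog output updates. The Python sketch below only does that unit conversion and clamps a requested delay to the stated range; the helper names are hypothetical, and the empirically derived scaling factors are not reproduced because the text does not give them.

```python
TIMEBASE_HZ = 40_000_000        # 40 MHz timebase from the design
MIN_TICKS, MAX_TICKS = 120, 2000


def ticks_to_microseconds(ticks):
    """Convert a delay in timebase ticks to microseconds."""
    return ticks * 1_000_000 / TIMEBASE_HZ


def clamp_ticks(ticks):
    """Keep a requested delay inside the 120 to 2,000 tick design range."""
    return max(MIN_TICKS, min(MAX_TICKS, ticks))


if __name__ == "__main__":
    for ticks in (MAX_TICKS, MIN_TICKS):
        delay_us = ticks_to_microseconds(ticks)
        print(f"{ticks} ticks = {delay_us:.1f} us between updates "
              f"({1e6 / delay_us:,.0f} updates per second)")
```

A delay of 2,000 ticks corresponds to 50 µs between updates and a delay of 120 ticks corresponds to 3 µs, so the output loop has to run several orders of magnitude faster than a loop paced by a 14 S/s module.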
Implementation
1. Open the Optimization project.
❑ Open <Exercises>\CompactRIO Fundamentals\Optimization\Optimization Project.lvproj you created in Exercise 9-1.
❑ Open the FPGA Optimization - Original VI.
In Exercise 9-1, the code was optimized for size, but the speed of the application was too slow because the NI 9211 has a maximum read rate of 14 S/s. This caused the maximum rate of the entire loop to run at a rate of 14 Hz, which was drastically lower than the desired rate.
Pipelining
You could reduce the combinatorial path by pipelining the thermocouple measurement and taking the average of the prior four iterations. This solution should allow you to use the parallel processing nature of the FPGA to execute the loop slightly faster.
1. Pipeline the block diagram as shown in Figure 9-32.
Figure 9-32. FPGA Optimization Single Loop Pipeline Block Diagram
❑ Disconnect the Mod1/TC0 FPGA I/O Node from the first Add VI.
❑ Drag the left side of the associated shift register to expand it to four elements.
❑ Save the VI.
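The idea behind these steps can be sketched in software. The Python model below is an assumption about the behavior of the pipelined diagram, not a transcription of Figure 9-32: each iteration averages the four readings stored by previous iterations while the new reading is being acquired, then shifts the new reading into the four-element history. Integer readings, a zero initial history, and the name four_sample_average are all illustrative choices.

```python
def four_sample_average(readings):
    """Average the four previously stored readings on every iteration."""
    history = [0, 0, 0, 0]   # four-element shift register, assumed to start at zero
    averages = []
    for reading in readings:
        # The average uses only values stored on earlier iterations,
        # so it does not have to wait for the current reading.
        averages.append(sum(history) >> 2)   # divide by four with a right shift
        # Shift the new reading in and drop the oldest one.
        history = [reading] + history[:3]
    return averages


if __name__ == "__main__":
    print(four_sample_average([8, 8, 8, 8, 16]))
```

Because the average no longer depends on the reading taken in the same iteration, the read and the arithmetic can run in parallel, which is what shortens the combinatorial path.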
[Successful Compile Report: Status: Compilation successful. Device Utilization Summary: Number of SLICEs: 2692 out of 14336 (18%).]
Figure 9-33. FPGA Optimization Single Loop Pipeline Successful Compile Report
Compared to the prior application, pipelining increased the number of SLICEs; however, this application can now run slightly faster because of the decreased combinatorial path.
However, if you try to run this application you continue to see the same problem of the NI 9211 running at a rate of only 14 S/s. Although you can run a few clock ticks faster per iteration, that improvement is imperceptible because of the slow acquisition rate.
Figure 9-34. Left: Single Loop without Pipelining; Right: Single Loop with Pipelining
Notice that the second loop in Figure 9-34 is slightly faster.
Use Parallel Loops
It should now be obvious that if you leave the NI 9211 acquisition in the same loop as the NI 9263 operations you will never get the speed you want. To get the speed required for your analog output you must put it in a separate loop from the analog input operations.
When you create separate loops, you must make sure that all the code related to the analog output loop remains in one loop and that the code related to the thermocouple input is placed in a different loop.
1. Create the block diagram shown in Figure 9-35 by separating the application into two loops.
Figure 9-35. FPGA Optimization Separate Loops Block Diagram
❑ Place a new While Loop and wire a False constant to the conditional terminal.
❑ Rename Actual Loop Rate (µsec).
4. Wire the code as in Figure 9-35.
❑ Press <Ctrl-B> to delete any remaining broken wires.
❑ Save the VI.
5. Add inter-loop communication.
In the current VI, there is no way to change the loop speed for the output loop. The new loop rates are calculated in the acquisition loop, so we need a means of sharing the information between the two loops. The best way to share that data for our application will be to use a local variable for the Wait (Ticks) indicator.
❑ Right-click the Wait (Ticks) indicator in the Acquisition Loop and select Create»Local Variable.
❑ Place the local variable next to the input of the Loop Timer function in the Output Loop.
❑ Save the VI.
Note In the interest of time, do not compile this VI.
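The two-loop arrangement with a shared Wait (Ticks) value can be modeled as a tick-by-tick simulation. The Python sketch below is a rough software analogy, not the FPGA implementation: the variable wait_ticks plays the role of the local variable, the acquisition period and the new wait values are made-up numbers, and the initial wait of 2,000 ticks is an assumption.

```python
def simulate_two_loops(total_ticks, acquisition_period_ticks, new_waits):
    """Return the ticks at which the output loop updates its analog output."""
    wait_ticks = 2000               # shared value, assumed initial delay
    next_output_tick = 0
    output_updates = []
    pending_waits = iter(new_waits)
    for tick in range(total_ticks):
        # Acquisition loop: slow, writes a new delay when a reading completes.
        if tick > 0 and tick % acquisition_period_ticks == 0:
            wait_ticks = next(pending_waits, wait_ticks)
        # Output loop: fast, reads the shared delay each time it fires.
        if tick == next_output_tick:
            output_updates.append(tick)
            next_output_tick = tick + wait_ticks
    return output_updates


if __name__ == "__main__":
    # Hypothetical numbers: one slow reading at tick 5000 changes the delay to 500.
    print(simulate_two_loops(10_000, 5_000, [500]))
```

The output loop keeps its own pace between acquisitions and picks up the new delay the next time it reads the shared value, so the slow acquisition no longer limits the output rate.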
[Successful Compile Report: Status: Compilation successful. Device Utilization Summary: Number of Slice Flip Flops: 2,671 out of 28,672 (9%); Number of 4 input LUTs: 4,018 out of 28,672 (14%).]
Figure 9-36. FPGA Optimization Separate Loops Successful Compile Report
This application uses 138 more SLICEs than the prior version, but most of that is related to the additional Tick Count VIs along with the additional code required for having multiple loops.
By separating the code into two different loops you get very different behavior from the application.
Figure 9-37. FPGA Optimization Separate Loops Front Panel
Because the Output Loop is no longer limited by the speed of the NI 9211 acquisition, this is the first application you have created that runs the Output Loop quickly enough to meet the initial design requirements of producing a tone on the speaker of the Sound and Vibration Signal Simulator.
But can you do better?
Single-Cycle Timed Loop
One technique that you could use to increase the execution speed of the application is to place some of the analysis functions (Averaging and Scaling) in a single-cycle Timed Loop so that the code executes as quickly as possible.
In this application, because the analysis is in parallel with the NI 9211 acquisition, using a single-cycle Timed Loop is a bit more than is really necessary, but it can decrease the size of your application because the single-cycle Timed Loop reduces the FPGA size by eliminating the enable chain overhead.
1. Place analysis functions in a single-cycle Timed Loop.
❑ Modify the block diagram to resemble Figure 9-38.
Figure 9-38. FPGA Optimization Single-Cycle Timed Loop Block Diagram
❑ Add a single-cycle Timed Loop around the averaging and scaling functionality.
Tip Press the <Ctrl> key and drag the While Loop to make room for the Timed Loop before placing it on the block diagram to avoid unusual stretching of the block diagram.
❑ Wire a True constant to the condition terminal of the single-cycle Timed Loop.
Note In the interest of time, do not compile this VI.
If you compile this VI you will see the following report.
[Compilation Failure dialog box: refers to the LabVIEW Help for more information about resolving compilation errors; click the Help button to display the LabVIEW Help.]
Figure 9-39. FPGA Optimization Single-Cycle Timed Loop Compilation Failure Dialog Box
The error occurred because the combinatorial path was too long for the single-cycle Timed Loop to execute within one 40 MHz clock cycle.
Is there anything you can do to allow the single-cycle Timed Loop to be fast enough to compile appropriately?
Combining Optimizations
1. Pipeline the code in the single-cycle Timed Loop.
One of the primary reasons that code will not compile within a single-cycle Timed Loop is because the combinatorial path is too long to complete in one clock cycle. The best way to get around this is to use pipelining to allow for faster cycle times.
Figure 9-40. FPGA Optimization Single-Cycle Timed Loop Pipelined Block Diagram
❑ Add a Feedback Node between the output of the Scale By Power of 2 function and the Average Temperature input of the FPGA Scaling VI.
❑ Right-click the wire and select Insert»All Palettes»Structures»Feedback Node.
❑ Save the VI.
2. Compile the VI.
❑ Click the Run button to compile the FPGA Optimization VI.
Note Do not make any changes to the VI after you have begun the compile process.
Figure 9-41. FPGA Optimization Single-Cycle Timed Loop Pipelined Successful Compile Report
❑ Save the Project.
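The Feedback Node inserted between the Scale By Power of 2 function and the FPGA Scaling VI splits the analysis into two pipeline stages. The Python sketch below models that split under several assumptions: the sum of four readings is divided by four with a right shift (Scale By Power of 2 with n = -2), the Feedback Node starts at zero, and the scaling stage is a made-up linear function, because the empirically derived scaling factors are not given in the text.

```python
def run_pipelined_analysis(sums, gain=3, offset=120):
    """Return the wait value produced on each iteration of the pipelined loop.

    gain and offset are hypothetical placeholders for the scaling factors.
    """
    register = 0   # Feedback Node contents, assumed initial value
    waits = []
    for total in sums:
        # Stage 2: scale the average stored by the previous iteration.
        waits.append(register * gain + offset)
        # Stage 1: divide the current sum by four and store it for the next iteration.
        register = total >> 2
    return waits


if __name__ == "__main__":
    print(run_pipelined_analysis([40, 80, 120]))
```

Each stage now does only part of the work per clock cycle, so the longest path is shorter, and the scaled result trails the averaged input by one iteration.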
Testing
1. Verify that AO0 is connected to AUDIO IN CH0 and that the Speaker switch is set to ON.
4. Stop the VI by clicking the Abort button.
Challenge
As with any optimization, you can always find additional ways to optimize the application. Use some of the following techniques to create an even more highly optimized application.
1. Eliminate the Tick Count VIs and any functions used to derive the Loop Rates.
2. Use the optimizations suggested in the Challenges section of Exercise 9-1 and apply them to this application.
By implementing these optimizations you can reduce the code even further to use only about 2431 SLICEs, which is 382 SLICEs (~3%) less than even the most optimized format created in the main part of the exercise, while still retaining the same timing characteristics as the desired model.
[Compile report excerpt: Device Utilization Summary (BUFGMUXs, external IOBs, LOCed IOBs, MULT18X18s, RAMB16s, SLICEs out of 14336) and Clock Rates (base clock, theoretical maximum).]
End of Exercise 9-2
E. Advanced Optimizations
Advanced optimization techniques are available for experienced users who are very familiar with FPGA programming. Major errors in the FPGA can result if advanced optimizations are done incorrectly.
Optimizing Arbitration
LabVIEW uses arbitration to manage shared resources on the FPGA. This ensures that only one caller accesses a resource at any given time. Removing arbitration saves significant space on the FPGA and can allow some FPGA I/O functions and FIFO operations to execute in one clock cycle. Refer to Lesson 7, Windows PC Host, for more information about arbitration.
Self-Review: Quiz
1. Which of the following are FPGA optimization techniques?
a. Eliminate arrays on the front panel.
b. Decrease the block diagram size.
c. Pipeline large combinatorial paths.
d. Use the Scale By Power of 2 function with a control on the n input.
e. Replace all loops with single-cycle Timed Loops.
2. How does the single-cycle Timed Loop create a smaller FPGA footprint and execute within one clock tick?
a. By using other VI logic functions when they are not in use.
b. By eliminating the enable chain overhead.
c. By passing the data to the RT controller to process.
d. By skipping some functions and having incomplete functionality.
Self-Review: Quiz Answers
1. Which of the following are FPGA optimization techniques?