
ARM Implementation

• Datapath
• Control unit (FSM)

2-phase non-overlapping clock scheme


Most ARMs do not operate on edge-sensitive registers. Instead, the design is based around 2-phase non-overlapping clocks which are generated internally from a single clock signal. Data movement is controlled by passing the data alternately through latches which are open during phase 1 and latches which are open during phase 2.
[Figure: phase 1 and phase 2 clock waveforms; the two phases together make up 1 clock cycle.]
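The latch-based data movement can be sketched in a few lines of Python. This is a behavioural illustration only: the `Latch` class and `run_cycle` helper are hypothetical names introduced here, and the model ignores all electrical detail.

```python
class Latch:
    """Level-sensitive (transparent) latch: follows its input while open."""

    def __init__(self):
        self.q = 0

    def step(self, d, is_open):
        # While open the latch is transparent; while closed it holds its value.
        if is_open:
            self.q = d
        return self.q


def run_cycle(phi1_latch, phi2_latch, data_in):
    """One clock cycle = phase 1 followed by phase 2; the phases never overlap."""
    # Phase 1: only the phase-1 latch is open.
    phi1_latch.step(data_in, is_open=True)
    phi2_latch.step(phi1_latch.q, is_open=False)
    # Phase 2: only the phase-2 latch is open.
    phi1_latch.step(data_in, is_open=False)
    phi2_latch.step(phi1_latch.q, is_open=True)
    return phi2_latch.q


l1, l2 = Latch(), Latch()
outputs = [run_cycle(l1, l2, x) for x in [5, 7, 9]]
```

Because the two latches are never open at the same time, a value can advance by only one latch per phase, which is what makes the scheme safe without edge-triggered registers.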

ARM datapath timing


Register read
The register read buses are dynamic and are precharged during phase 2. During phase 1, the selected registers discharge the read buses, which become valid early in phase 1.

Shift operation
The second operand passes through the barrel shifter.

ALU operation
The ALU has input latches which are open in phase 1, allowing the operands to begin combining in the ALU as soon as they are valid, but they close at the end of phase 1 so that the phase 2 precharge does not get through to the ALU. The ALU processes the operands during phase 2, producing the valid output towards the end of the phase. The result is latched in the destination register at the end of phase 2.

ARM datapath timing (contd)


[Figure: datapath timing across phase 1 and phase 2. Labels: ALU operands latched; register read time; read bus valid; shift time; shift out valid; ALU time; ALU out; register write time; precharge invalidates buses.]

Minimum datapath delay = register read time + shifter delay + ALU delay + register write set-up time + phase 2 to phase 1 non-overlap time
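As a worked example of the sum above, the sketch below adds up the five terms. The function name and every delay value are hypothetical, chosen only to make the arithmetic easy to follow; they are not measurements of any ARM core.

```python
def min_datapath_cycle(reg_read, shifter, alu, reg_write_setup, non_overlap):
    """Sum of the five terms that bound the minimum datapath cycle time."""
    return reg_read + shifter + alu + reg_write_setup + non_overlap


# Hypothetical delays in nanoseconds, for illustration only.
delays_ns = {
    "reg_read": 3.0,
    "shifter": 4.0,
    "alu": 10.0,
    "reg_write_setup": 2.0,
    "non_overlap": 1.0,
}

cycle_ns = min_datapath_cycle(**delays_ns)
max_clock_mhz = 1000.0 / cycle_ns  # 1000 ns per microsecond
```

With these assumed numbers the ALU is the largest single term, so a faster adder would shorten the cycle more than any other change.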

The original ARM1 ripple-carry adder

• Carry logic: use a CMOS AOI (And-Or-Invert) gate
• Even bits use the circuit shown below
• Odd bits use the dual circuit with inverted inputs and outputs and the AND and OR gates swapped around
• Worst-case path: 32 gates long
[Figure: ARM1 ripple-carry adder circuit for one bit. Labels: A, B, Cin, Cout, sum.]
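A bit-level sketch of ripple-carry addition shows why the worst-case path grows with the word width. It models only the logical behaviour (a plain full adder per bit); it does not reproduce the alternating AOI gate polarity described above, and the function names are assumptions made for this example.

```python
def full_adder(a, b, cin):
    """One bit position: returns (sum bit, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a | b))
    return s, cout


def ripple_carry_add(a, b, cin=0, width=32):
    """Add two width-bit values; the carry ripples from bit 0 upwards."""
    result = 0
    carry = cin
    for i in range(width):
        # Each bit must wait for the carry produced by the bit below it.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

Adding 1 to 0xFFFFFFFF is the kind of input that exercises the full chain, since the carry has to pass through all 32 positions.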

ARM2 4-bit carry look-ahead scheme


• Carry Generate (G), Carry Propagate (P)
• Cout[3] = Cin[0].P + G
• Use AOI and alternate AND/OR gates
• Worst case: 8 gates long
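The group carry equation can be checked with a short sketch. How G and P are formed inside the 4-bit block is not spelled out in the slide, so the per-bit definitions below (generate = a AND b, propagate = a OR b) are an assumption, and the function names are hypothetical.

```python
def group_generate_propagate(a, b):
    """Return (G, P) for a 4-bit group, given 4-bit operands a and b."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # bit i generates a carry
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # bit i propagates a carry
    # The group generates if some bit generates and all higher bits propagate.
    G = (g[3]
         | (p[3] & g[2])
         | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))
    # The group propagates only if every bit propagates.
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def group_carry_out(a, b, cin0):
    """Cout[3] = Cin[0].P + G"""
    G, P = group_generate_propagate(a, b)
    return G | (P & cin0)
```

Since G and P depend only on the operands, the carry-in needs to pass through just one AND-OR stage per 4-bit group instead of four full-adder stages.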
[Figure: 4-bit adder logic block. Inputs A[3:0], B[3:0] and Cin[0]; outputs sum[3:0], G, P and Cout[3].]

The ARM2 ALU logic for one result bit


ALU functions
• data operations (add, sub, ...)
• address computations for memory accesses
• branch target computations
• bit-wise logical operations
• ...

[Figure: ARM2 ALU logic for one result bit. Labels: fs: 5, 0, 1, 2, 3; NA bus; NB bus; carry logic; G; P; ALU bus.]

ARM2 ALU function codes

The ARM6 carry-select adder scheme

Compute sums of various fields of the word for carry-in of zero and carry-in of one. The final result is selected by using the correct carry-in value to control a multiplexer.

[Figure: carry-select adder. Inputs a,b[3:0] with carry-in c, up to a,b[31:28]; each field has "+" and "+1" adders producing s and s+1; multiplexers select sum[3:0], sum[7:4], sum[15:8] and sum[31:16].]

Worst case: O(log2(word width)) gates long

Note: be careful! Fan-out on some of these gates is high, so direct comparison with previous schemes is not applicable.
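The selection idea can be sketched as follows. The field boundaries are taken from the sum labels in the figure ([3:0], [7:4], [15:8], [31:16]); everything else, including the names `FIELDS` and `carry_select_add`, is an assumption for illustration. In hardware both candidate sums of every field are computed in parallel; the loop below only mimics the selection.

```python
# (low bit, width) of each field: [3:0], [7:4], [15:8], [31:16]
FIELDS = [(0, 4), (4, 4), (8, 8), (16, 16)]


def carry_select_add(a, b, cin=0):
    """32-bit add where each field precomputes both possible sums."""
    result = 0
    carry = cin
    for low, width in FIELDS:
        mask = (1 << width) - 1
        fa = (a >> low) & mask
        fb = (b >> low) & mask
        s0 = fa + fb        # candidate sum for carry-in of zero
        s1 = fa + fb + 1    # candidate sum for carry-in of one
        chosen = s1 if carry else s0  # multiplexer controlled by the real carry
        result |= (chosen & mask) << low
        carry = chosen >> width       # carry out of this field
    return result, carry
```

The doubling field widths are what give a path length that grows with the logarithm of the word width rather than linearly.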

The ARM6 ALU organization


Not easy to merge the arithmetic and logic functions => a separate logic unit runs in parallel with the adder, and a multiplexer selects the output
[Figure: ARM6 ALU organization. Labels: A operand latch; B operand latch; invert A; invert B; XOR gates; function; logic functions; adder; C in; C; V; logic/arithmetic; result mux; zero detect; result; N; Z.]

ARM9 carry arbitration encoding


Carry arbitration adder

A   B   C         u   v
0   0   0         0   0
0   1   unknown   1   0
1   0   unknown   1   0
1   1   1         1   1
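Reading the table row by row, u matches A OR B and v matches A AND B. The sketch below encodes that observation; it is an inference from the four rows shown, not a statement about the actual ARM9 circuit, and the function names are hypothetical.

```python
def encode(a, b):
    """Map one bit pair (A, B) to the (u, v) carry arbitration code."""
    u = a | b
    v = a & b
    return u, v


def carry_from_code(u, v):
    """Decode (u, v): 0 or 1 when the carry is known, None when unknown."""
    if (u, v) == (0, 0):
        return 0      # neither input set: carry out is certainly 0
    if (u, v) == (1, 1):
        return 1      # both inputs set: carry out is certainly 1
    return None       # exactly one input set: depends on the carry in


table = {(a, b): encode(a, b) for a in (0, 1) for b in (0, 1)}
```

The "unknown" rows are exactly the ones where the carry out equals the carry in, which is why they need arbitration from lower bit positions.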


The cross-bar switch barrel shifter


• Shifter delay is critical since it contributes directly to the datapath cycle time
• Cross-bar switch matrix (32 x 32)
• Principle for a 4 x 4 matrix
[Figure: 4 x 4 cross-bar switch matrix. Inputs in[3] to in[0], outputs out[0] to out[3]; diagonals labelled right 3, right 2, right 1, no shift, left 1, left 2, left 3.]

The cross-bar switch barrel shifter (contd)


• Precharged logic is used => each switch is a single NMOS transistor
• Precharging sets all outputs to logic 0, so those which are not connected to any input during switching remain at 0, giving the zero filling required by the shift semantics
• For rotate right, the right shift diagonal is enabled + the complementary shift left diagonal (e.g., "right 1" + "left 3")
• Arithmetic shift right: use sign-extension => separate logic is used to decode the shift amount and discharge those outputs appropriately
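The diagonal-enable principle can be modelled behaviourally. In this sketch a "switch" is just a list assignment and "precharge" is an initial list of zeros; the helper names (`crossbar`, `lsl`, `lsr`, `ror`, `asr`) are assumptions for illustration, and the sign-extension step stands in for the separate decode logic mentioned above.

```python
def to_bits(x, n):
    """Integer -> list of n bits, index 0 is the least significant bit."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


def crossbar(bits, right=None, left=None):
    """right=k enables the diagonal i - j == k; left=k enables j - i == k."""
    n = len(bits)
    out = [0] * n  # precharge: unconnected outputs stay at logic 0
    for j in range(n):
        for i in range(n):
            on_right = right is not None and i - j == right
            on_left = left is not None and j - i == left
            if on_right or on_left:
                out[j] = bits[i]  # closed switch passes in[i] to out[j]
    return out


def lsl(bits, k):
    return crossbar(bits, left=k)


def lsr(bits, k):
    return crossbar(bits, right=k)


def ror(bits, k):
    # Right-k diagonal plus the complementary left-(n - k) diagonal.
    return crossbar(bits, right=k, left=len(bits) - k)


def asr(bits, k):
    n = len(bits)
    out = crossbar(bits, right=k)
    for j in range(n - k, n):  # sign-extend into the vacated top k outputs
        out[j] = bits[n - 1]
    return out
```

For the 4 x 4 case, `ror(bits, 1)` enables "right 1" and "left 3" together, matching the example in the text.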


Multiplier design


All ARM processors apart from the first prototype have included hardware support for integer multiplication. Two styles of multiplier have been used:
• Older ARM cores include low-cost multiplication hardware that supports only the 32-bit result multiply and multiply-accumulate instructions.
• Recent ARM cores have high-performance multiplication hardware and support the 64-bit result multiply and multiply-accumulate instructions.

The low-cost support uses the main datapath iteratively, employing the barrel shifter and ALU to generate a 2-bit product in each clock cycle. Early-termination logic stops the iterations when there are no more ones in the multiply register. The multiplier employs a modified Booth's algorithm to produce the 2-bit product. This multiplication uses the existing shifter and ALU; the additional hardware it requires is limited to a dedicated two-bits-per-cycle shift register for the multiplier and a few gates for the Booth's algorithm control logic.
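The iteration can be sketched as below. This is a simplified radix-4 recoding in the spirit of modified Booth's algorithm, where each cycle needs one shifted add or subtract; the exact control table used by ARM is not given in the text, so the digit handling, the names, and the cycle count returned here are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def multiply_2bits_per_cycle(rm, rs):
    """Return (low 32 bits of rm * rs, number of iterations used)."""
    acc = 0
    carry = 0
    cycles = 0
    rs &= MASK32
    # Early termination: stop once no ones (and no pending carry) remain.
    while cycles < 16 and (rs or carry):
        digit = (rs & 3) + carry      # two multiplier bits plus carry: 0..4
        shift = 2 * cycles
        if digit == 1:
            acc += rm << shift        # + 1 x rm
            carry = 0
        elif digit == 2:
            acc += rm << (shift + 1)  # + 2 x rm: one extra bit of shift
            carry = 0
        elif digit == 3:
            acc -= rm << shift        # 3 = 4 - 1: subtract, carry into next digit
            carry = 1
        elif digit == 4:
            carry = 1                 # 4 = 4 + 0: nothing to add this cycle
        else:
            carry = 0
        rs >>= 2                      # two-bits-per-cycle shift register
        cycles += 1
    return acc & MASK32, cycles
```

For example, a multiplier value of 2 finishes in a single iteration, while a multiplier with ones in its top bits needs all 16.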

Carry-propagate (a) and carry-save (b) adder structures


[Figure: (a) carry-propagate adder and (b) carry-save adder, each drawn as four full-adder stages with inputs A, B and Cin and outputs S and Cout.]
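The difference between the two structures can be illustrated with integers standing in for bit vectors. In this sketch (function names are hypothetical) a carry-save stage reduces three inputs to a partial sum and a partial carry without any carry rippling between bit positions; only the final addition propagates carries.

```python
def carry_save_add(x, y, z):
    """Reduce three operands to (partial sum, partial carry), bit-parallel."""
    partial_sum = x ^ y ^ z
    # Majority function per bit, moved up one position as the carry.
    partial_carry = ((x & y) | (y & z) | (x & z)) << 1
    return partial_sum, partial_carry


def sum_with_carry_save(values):
    """Accumulate many operands, deferring carry propagation to the end."""
    ps, pc = 0, 0
    for v in values:
        ps, pc = carry_save_add(ps, pc, v)
    return ps + pc  # the single carry-propagate addition
```

This is why a multiplier can add many partial products quickly: each carry-save stage has the delay of one full adder regardless of word width.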

ARM high-speed multiplier organization

[Figure: high-speed multiplier organization. Labels: initialization for MLA; registers; Rs >> 8 bits/cycle; Rm; rotate sum and carry 8 bits/cycle; carry-save adders; partial sum; partial carry; ALU (add partials).]

ARM6 register cell circuit

[Figure: register cell circuit. Labels: write; ALU bus; A bus; B bus; read A; read B.]

ARM register bank floorplan

[Figure: register bank floorplan. Labels: A bus read decoders; B bus read decoders; write decoders; register cells; PC; Vdd; Vss; ALU bus; A bus; B bus; PC bus; INC bus.]

The ARM coprocessor interface

Coprocessor architecture
• Support for up to 16 coprocessors
• Each coprocessor can have up to 16 registers
• Coprocessor instructions:
  o Internal operations on coprocessor registers
  o Load/store registers from/to memory
  o Move data to/from an ARM register

The ARM coprocessor interface (contd)

• ARM7TDMI interface
  o cpi (from ARM to all coprocessors): ARM identifies a coprocessor instruction and wishes to execute it
  o cpa (from coprocessors to ARM): coprocessor absent; signals that there is no coprocessor present that is able to execute the current instruction
  o cpb (from coprocessors to ARM): coprocessor busy; it cannot execute the instruction yet

The ARM coprocessor interface (contd)


• ARM decides not to execute it (e.g. condition not satisfied): it does not assert cpi, and the instruction will be discarded
• ARM decides to execute it (asserts cpi) but the coprocessor is absent (cpa active): ARM takes the undefined instruction trap
• ARM decides to execute it (asserts cpi) and the coprocessor is present (cpa inactive) but busy (cpb active): ARM will busy-wait until cpb is inactive, stalling the instruction stream
• ARM decides to execute it (asserts cpi), and the coprocessor accepts it (cpa and cpb inactive): both sides commit to complete the instruction
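The four cases above amount to a small decision table, sketched here. The function name and the returned strings are hypothetical, and the booleans mean "active" in the sense used above rather than any particular electrical level.

```python
def coprocessor_handshake(arm_executes, cpa, cpb):
    """Return the outcome for one coprocessor instruction.

    arm_executes: ARM wants to execute the instruction (cpi asserted)
    cpa: coprocessor absent is active
    cpb: coprocessor busy is active
    """
    if not arm_executes:
        return "discard"                      # cpi never asserted
    if cpa:
        return "undefined instruction trap"   # nobody claims the instruction
    if cpb:
        return "busy-wait"                    # stall until cpb goes inactive
    return "commit"                           # both sides complete it


outcomes = [
    coprocessor_handshake(False, True, True),
    coprocessor_handshake(True, True, True),
    coprocessor_handshake(True, False, True),
    coprocessor_handshake(True, False, False),
]
```

Note that cpa is checked before cpb: an absent coprocessor cannot meaningfully report that it is busy.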
