
Control Unit: Hardwired vs. Microprogrammed Approach

Dr Shankar Balachandran
Indian Institute of Technology Madras
shankar@cse.iitm.ernet.in
14 October 2006
Two Major Blocks in a CPU
 Datapath
 Adders, multipliers, dividers
 Shifters, Registers
 Anything that changes or stores data
 Control Unit
 Controls the data
 How is data stored?
 Where is it stored?
 When should data be available?
Control Unit
 Correct sequencing of control signals
 Much like the human brain controlling the various
parts of the body
 Sequencing and timing are the key
 Any aberration will result in wrong operation
A Simplified Control Unit
[Figure: the Control Unit drives one signal to each of four units]
Fetch → Fetch Unit
Decode → Decode Unit
Execute → Execution Unit
Write Back → Write Back Unit
A Possible Implementation
[Figure: CLK drives a mod-4 (2-bit) counter, whose count feeds a 2-to-4 decoder that produces the four signals]
Timing Diagram
[Figure: waveforms for CLK and the four signals Fetch, Decode, Execute and Write Back]
Let’s Sample The Signals

1 0 0 0

0 1 0 0

0 0 1 0

0 0 0 1
Another Way to Generate Signals

1000
0100
0010
0001
Hardwired vs Microprogrammed
 Hardwired
 Use gates to generate signals
 Squeeze out the juice for performance
 Different logic styles possible

 Microprogrammed
 Store the control signals in a memory, in sequence
 Just read from the memory every clock cycle
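The two styles can be compared on the four-phase example from the earlier slides. The following is only a minimal sketch: the function names, the 2-bit count input and the ROM layout are assumptions made for illustration, not part of the slides.

```python
def hardwired_phase(count):
    """Hardwired style: gates decode a 2-bit count (q1 q0) into four one-hot signals."""
    q1, q0 = (count >> 1) & 1, count & 1
    fetch = (1 - q1) & (1 - q0)
    decode = (1 - q1) & q0
    execute = q1 & (1 - q0)
    write_back = q1 & q0
    return (fetch, decode, execute, write_back)


# Microprogrammed style: the same signal patterns are simply stored, one per address.
CONTROL_ROM = [
    (1, 0, 0, 0),  # Fetch
    (0, 1, 0, 0),  # Decode
    (0, 0, 1, 0),  # Execute
    (0, 0, 0, 1),  # Write Back
]


def microprogrammed_phase(address):
    """Read the stored control signals for this clock cycle."""
    return CONTROL_ROM[address]


def run(step, cycles):
    """Clock either implementation for a number of cycles (count wraps after 4)."""
    return [step(clk % 4) for clk in range(cycles)]


if __name__ == "__main__":
    for signals in run(hardwired_phase, 4):
        print(signals)
```

Both functions produce the same sequence; the difference is whether the pattern lives in logic or in memory.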
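A Model Computer
(Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)
[Figure: block diagram of the model computer. A shared bus connects the PC (controls IP, LP, EP), the Accumulator (LA, EA), the ALU (A, S, EU), Register B (LB), the MAR (LM), the RAM (R, W), the MDR (LD, ED) and the IR (LI, EI). Data paths are 12 bits wide, address paths 8 bits, and 4 bits go from the IR to the control unit.]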
More Details
 L = Load
 E = Copy to bus
 A, S = Add and Subtract
 Sign bit to control unit
 IP = Increment PC
[Figure: the block diagram of the model computer, repeated next to this legend]
Mnemonic                 Opcode  Action                      Register Transfers        Active Controls
LDA (Load Accumulator)   1       A ← (Mem)                   1. MAR ← IR               EI, LM
                                                             2. MDR ← M(MAR)           R
                                                             3. A ← MDR                ED, LA
STA (Store Accumulator)  2       (Mem) ← A                   1. MAR ← IR               EI, LM
                                                             2. MDR ← A                EA, LD
                                                             3. M(MAR) ← MDR           W
ADD                      3       A ← A+B                     1. A ← ALU(Add)           A, EU, LA
SUB                      4       A ← A-B                     1. A ← ALU(Sub)           S, EU, LA
MBA                      5       B ← A                       1. B ← A                  EA, LB
JMP                      6       PC ← Mem                    1. PC ← IR                EI, LP
JN                       7       PC ← Mem if the negative    1. PC ← IR if NF is set   NF: EI, LP
                                 flag is set
HLT                      8-15    Stop Clock
"Fetch"                          IR ← Next Instruction       1. MAR ← PC               EP, LM
                                                             2. MDR ← M(MAR)           R
                                                             3. IR ← MDR               ED, LI, IP
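To see how the active controls in the table produce the register transfers, the datapath can be modelled in a few lines. This is a sketch under stated assumptions, not Eckert's implementation: the class name `Datapath`, the 256-word memory, the instruction layout (4-bit opcode in the top bits, 8-bit address in the low bits) and the rule that EI places only the address field on the bus are all illustrative choices.

```python
class Datapath:
    """Registers of the model computer; one call to clock() is one clock cycle."""

    def __init__(self, memory):
        self.mem = list(memory) + [0] * (256 - len(memory))
        self.pc = self.a = self.b = self.mar = self.mdr = self.ir = 0

    def clock(self, *signals):
        s = set(signals)
        # ALU output: A selects add, S selects subtract (12-bit wrap-around).
        alu = 0
        if "A" in s:
            alu = (self.a + self.b) & 0xFFF
        elif "S" in s:
            alu = (self.a - self.b) & 0xFFF
        # E* signals copy a register to the bus; at most one may be active.
        sources = {"EP": self.pc, "EA": self.a, "ED": self.mdr,
                   "EI": self.ir & 0xFF, "EU": alu}
        driving = [name for name in sources if name in s]
        if len(driving) > 1:
            raise ValueError("bus conflict: " + ", ".join(driving))
        bus = sources[driving[0]] if driving else 0
        # Memory access uses the address already held in MAR.
        if "R" in s:
            self.mdr = self.mem[self.mar]
        if "W" in s:
            self.mem[self.mar] = self.mdr
        # L* signals load a register from the bus.
        if "LP" in s:
            self.pc = bus & 0xFF
        if "LM" in s:
            self.mar = bus & 0xFF
        if "LD" in s:
            self.mdr = bus
        if "LI" in s:
            self.ir = bus
        if "LA" in s:
            self.a = bus
        if "LB" in s:
            self.b = bus
        # IP increments the PC.
        if "IP" in s:
            self.pc = (self.pc + 1) & 0xFF


def fetch_and_lda(dp):
    """Apply the "Fetch" controls, then the LDA controls, one row per clock."""
    for step in (("EP", "LM"), ("R",), ("ED", "LI", "IP"),   # Fetch
                 ("EI", "LM"), ("R",), ("ED", "LA")):        # LDA
        dp.clock(*step)


if __name__ == "__main__":
    dp = Datapath([0x105, 0, 0, 0, 0, 42])   # word 0 holds "LDA 05", word 5 holds 42
    fetch_and_lda(dp)
    print(dp.a, dp.pc)
```

The control unit's whole job is to produce these signal sets in the right order; the rest of the deck is about how to generate them.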
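Hardwired Unit
[Figure: the opcode field of the IR feeds a decoder with one output line each for LDA, STA, ADD, SUB, MBA, JMP, JN and Halt. A ring counter driven by CLK produces the timing signals. The decoder outputs, the timing signals and NF feed the control matrix, which produces the control signals.]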
Table with Sequencing
       IP    LP    EP    LM    R     W     LD    ED    LI    EI    LA    EA    A     S     EU    LB
Fetch  T2          T0    T0    T1                T2    T2
LDA                      T3    T4                T5          T3    T5
STA                      T3          T5    T4                T3          T4
MBA                                                                      T3                      T3
ADD                                                                T3          T3          T3
SUB                                                                T3                T3    T3
JMP          T3                                              T3
JN           T3*NF                                           T3*NF

IP = T2
LP = T3*JMP + T3*JN*NF
EP = T0
LM = T0 + T3*LDA + T3*STA
R = T1 + T4*LDA
W = T5*STA
LD = T4*STA
ED = T2 + T5*LDA
LI = T2
A = T3*ADD
S = T3*SUB
…..
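The equations can be checked by writing the control matrix as plain boolean expressions. The sketch below assumes a six-step ring counter (T0 to T5) and one-hot decoder outputs; the function name and argument names are illustrative. The equations for EI, LA, EA, EU and LB are not written out on the slide, so they are read off the rows of the table and should be treated as a reconstruction.

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()
OPCODES = ("LDA", "STA", "ADD", "SUB", "MBA", "JMP", "JN")


def control_matrix(t, op, nf):
    """Return the control signals active at ring-counter step t (0..5)."""
    T = [int(t == i) for i in range(6)]               # one-hot ring counter outputs
    d = {name: int(op == name) for name in OPCODES}   # one-hot decoder outputs
    LDA, STA, ADD, SUB, MBA, JMP, JN = (d[n] for n in OPCODES)
    nf = int(bool(nf))
    value = {
        # Equations written out on the slide.
        "IP": T[2],
        "LP": T[3] & JMP | T[3] & JN & nf,
        "EP": T[0],
        "LM": T[0] | T[3] & LDA | T[3] & STA,
        "R": T[1] | T[4] & LDA,
        "W": T[5] & STA,
        "LD": T[4] & STA,
        "ED": T[2] | T[5] & LDA,
        "LI": T[2],
        "A": T[3] & ADD,
        "S": T[3] & SUB,
        # Read off the table rows (elided on the slide).
        "EI": T[3] & LDA | T[3] & STA | T[3] & JMP | T[3] & JN & nf,
        "LA": T[5] & LDA | T[3] & ADD | T[3] & SUB,
        "EA": T[4] & STA | T[3] & MBA,
        "EU": T[3] & ADD | T[3] & SUB,
        "LB": T[3] & MBA,
    }
    return [name for name in SIGNALS if value[name]]


if __name__ == "__main__":
    for t in range(6):
        print("T%d" % t, control_matrix(t, "LDA", nf=0))
```

Each expression corresponds to a small AND-OR network, which is why a PLA is a natural fit for the control matrix.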
Control Matrix
 Implement using discrete gates
 Usually done using PLAs
 Large control matrices are implemented
hierarchically
 For speed
 A well-known process; design flows
are widespread
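An Alternate Implementation
[Figure: microprogrammed control unit. The 4-bit opcode from the IR feeds the starting address generator. A multiplexer, steered by MAP, CD and NF, loads the uPC (clocked by CLK) with the starting address, the jump address, or uPC + 1. The uPC addresses the 32 x 24 control ROM (control store). Its output is held in the microinstruction register, which supplies the control signals, HLT, the jump address, and the MAP and CD bits.]

Map  CD  Meaning
1    *   Starting address from IR
0    0   Unconditional branch within microprogram
0    1   NF=0 => Increment
         NF=1 => Conditional branch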
Control Store
Instruction  Op-Code  uInstr. Address  Control Signals   CD  MAP  HLT  Addr. of Next
Fetch        0        00               0011000000000000  0   0    0    01
                      01               0000100000000000  0   0    0    02
                      02               1000000110000000  0   1    0    XX
LDA          1        03               0001000001000000  0   0    0    04
                      04               0000100000000000  0   0    0    05
                      05               0000000100100000  0   0    0    00
STA          2        06               0001000001000000  0   0    0    07
                      07               0000001000010000  0   0    0    08
                      08               0000010000000000  0   0    0    00
ADD          3        09               0000000000101010  0   0    0    00
SUB          4        0A               0000000000100110  0   0    0    00
MBA          5        0B               0000000000010001  0   0    0    00
JMP          6        0C               0100000001000000  0   0    0    00
JN           7        0D               0000000000000000  1   0    0    0F
                      0E               0000000000000000  0   0    0    00
                      0F               0100000001000000  0   0    0    00
Expansion    8-E      10-1E
HLT          F        1F               0000000000000000  0   0    1    XX
(Each row is one control word.)
Example 1 – MBA followed by ADD
[Figure: the control store table shown above, with the control-signal bits labelled, left to right, IP LP EP LM R W LD ED LI EI LA EA A S EU LB. After the fetch microinstruction at 02 (MAP = 1), the next address is 0B for MBA and 09 for ADD.]
Sequence for MBA, ADD
MBA (MOV B,A)   1. MAR ← PC         0011000000000000
                2. MDR ← M(MAR)     0000100000000000
                3. IR ← MDR         1000000110000000
                   B ← A            0000000000010001
ADD             1. MAR ← PC         0011000000000000
                2. MDR ← M(MAR)     0000100000000000
                3. IR ← MDR         1000000110000000
                   A ← ALU(Add)     0000000000101010
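Reading the control words bit by bit is error-prone, so a small decoder helps. The sketch below assumes the bit order used for the control store (IP LP EP LM R W LD ED LI EI LA EA A S EU LB, left to right); the function name `decode` is illustrative.

```python
SIGNALS = "IP LP EP LM R W LD ED LI EI LA EA A S EU LB".split()


def decode(word):
    """Translate a 16-character control word into the names of the signals set to 1."""
    if len(word) != len(SIGNALS) or set(word) - {"0", "1"}:
        raise ValueError("expected a 16-bit string of 0s and 1s")
    return [name for name, bit in zip(SIGNALS, word) if bit == "1"]


# The MBA, ADD sequence from the slide: (register transfer, control word).
SEQUENCE = [
    ("MAR <- PC", "0011000000000000"),
    ("MDR <- M(MAR)", "0000100000000000"),
    ("IR <- MDR", "1000000110000000"),
    ("B <- A", "0000000000010001"),
    ("MAR <- PC", "0011000000000000"),
    ("MDR <- M(MAR)", "0000100000000000"),
    ("IR <- MDR", "1000000110000000"),
    ("A <- ALU(Add)", "0000000000101010"),
]

if __name__ == "__main__":
    for transfer, word in SEQUENCE:
        print(transfer.ljust(16), word, ", ".join(decode(word)))
```

The decoded names match the "Active Controls" column of the instruction table, which is a quick consistency check on the control store contents.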
Example 2 – JN with Negative Flag (NF) Set
[Figure: the control store table shown above, with the same bit labels. After the fetch microinstruction at 02 (MAP = 1), the next address is 0D for JN.]

If the negative flag is set, jump to a new location by skipping to the uInstruction at 0F.
Example 3 – JN with Negative Flag (NF) Not Set
[Figure: the control store table shown above, with the same bit labels. After the fetch microinstruction at 02 (MAP = 1), the next address is 0D for JN.]

If the negative flag is not set, the uPC is incremented from 0D to 0E, which asserts no control signals and returns to the fetch at 00.
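The three examples can be replayed with a small model of the next-address logic. This is a sketch, not the original hardware: only the CD, MAP, HLT and next-address fields of the control store are kept (the control signals are left out for brevity), `None` stands for XX, and the names `CONTROL_STORE`, `MAP_ROM` and `trace` are assumptions.

```python
# uInstruction address: (CD, MAP, HLT, address of next); None stands for XX.
CONTROL_STORE = {
    0x00: (0, 0, 0, 0x01), 0x01: (0, 0, 0, 0x02), 0x02: (0, 1, 0, None),   # Fetch
    0x03: (0, 0, 0, 0x04), 0x04: (0, 0, 0, 0x05), 0x05: (0, 0, 0, 0x00),   # LDA
    0x06: (0, 0, 0, 0x07), 0x07: (0, 0, 0, 0x08), 0x08: (0, 0, 0, 0x00),   # STA
    0x09: (0, 0, 0, 0x00),                                                 # ADD
    0x0A: (0, 0, 0, 0x00),                                                 # SUB
    0x0B: (0, 0, 0, 0x00),                                                 # MBA
    0x0C: (0, 0, 0, 0x00),                                                 # JMP
    0x0D: (1, 0, 0, 0x0F), 0x0E: (0, 0, 0, 0x00), 0x0F: (0, 0, 0, 0x00),   # JN
    0x1F: (0, 0, 1, None),                                                 # HLT
}

# Starting address generator: opcode -> first uInstruction of that instruction.
MAP_ROM = {1: 0x03, 2: 0x06, 3: 0x09, 4: 0x0A, 5: 0x0B, 6: 0x0C, 7: 0x0D, 0xF: 0x1F}


def trace(opcode, nf=0):
    """Addresses visited for one instruction, from the fetch at 00 until the
    uPC returns to 00 or the clock is halted."""
    upc, visited = 0x00, []
    while True:
        visited.append(upc)
        cd, map_bit, hlt, nxt = CONTROL_STORE[upc]
        if hlt:
            break
        if map_bit:               # MAP = 1: starting address comes from the IR
            upc = MAP_ROM[opcode]
        elif cd and not nf:       # CD = 1, NF = 0: increment
            upc += 1
        else:                     # CD = 0, or CD = 1 with NF = 1: branch
            upc = nxt
        if upc == 0x00:
            break
    return visited


if __name__ == "__main__":
    for label, path in (("MBA", trace(5)), ("ADD", trace(3)),
                        ("JN, NF=1", trace(7, nf=1)), ("JN, NF=0", trace(7, nf=0))):
        print(label, [format(a, "02X") for a in path])
```

Note how JN is the only instruction that needs the CD path; every other row just follows its next-address field.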
Let’s Review the
Microprogramming Model
 Store the microprogram in control store
 Fetch the instruction
 Get the set of control signals from the
control word
 Move the microinstruction address
 Lather, Rinse, Repeat
What is Microcode?
 Michael Slater's "Microprocessor Based Design" (pg. 42):

"Microcode tells the processor every detailed step
required to execute each machine language instruction.
Microcode is thus at an even more detailed level than
machine language, and in fact defines the machine
language. In a standard microprocessor, the microcode
is stored in a ROM or a programmable logic array (PLA)
that is part of the microprocessor chip and cannot be
modified by the user."
Thought Experiment

 Why is the design a little clumsy?


 What can we do about it?
Reason for Clumsiness
 JN – Conditional Flag check
 Without any condition check, the whole
process is very smooth
 Solution – Avoid all conditional checks
Real Life
 A little American Football Story
 Theory vs. Practice
 In theory, there is no difference between
theory and practice
 In practice, theory and practice are two
different things altogether
 Live with condition checks
 Keep designs as clean as possible
A General Approach
[Figure: the IR, the external inputs and the condition codes feed the starting and branch address generator, which loads the uPC. The uPC addresses the control store, whose output is the control word.]
Format of Microinstructions
 Pick yours
 Your choice is as good as your neighbor’s
 What we did :
 One bit position per control signal
 Order of the bits?
 Doesn’t matter
 Can result in long microinstructions
 Not the number of microinstructions, but the width
A Note About Density
 Observe that only a few bits are set to 1
 Poor usage of bit space
 This scheme is called Horizontal
Microprogram
 Alternate Version : Encode the bits
 Vertical Microprogram
Vertical Microprogram
 Encode the bits by grouping similar
elements together
 General Idea :
 Group similar resources together
 There can be only one source or destination
register
 Some operations are mutually exclusive
 Read vs Write of memory
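As an illustration of the idea, the 16 control signals of the model computer can be packed into a handful of encoded fields. The grouping below is an assumption chosen for this sketch (one bus source, one bus destination, one memory operation, one ALU operation, plus IP); it is not given on the slides, but every control word in the control store shown earlier fits it.

```python
# Each field lists mutually exclusive signals; code 0 (None) means "none active".
SRC = [None, "EP", "ED", "EI", "EA", "EU"]          # who drives the bus
DST = [None, "LP", "LM", "LD", "LI", "LA", "LB"]    # who loads from the bus
MEM = [None, "R", "W"]                              # read vs write of memory
ALU = [None, "A", "S"]                              # add vs subtract
INC = [None, "IP"]                                  # increment PC
FIELDS = [SRC, DST, MEM, ALU, INC]

# Bits needed per field, and in total, for the vertical format.
FIELD_BITS = [(len(group) - 1).bit_length() for group in FIELDS]
VERTICAL_WIDTH = sum(FIELD_BITS)
HORIZONTAL_WIDTH = sum(len(group) - 1 for group in FIELDS)


def encode(signals):
    """Pack a set of active signals into one small code per field."""
    remaining = set(signals)
    codes = []
    for group in FIELDS:
        chosen = [name for name in group if name in remaining]
        if len(chosen) > 1:
            raise ValueError("not encodable, signals share a field: %s" % chosen)
        codes.append(group.index(chosen[0]) if chosen else 0)
        remaining -= set(chosen)
    if remaining:
        raise ValueError("unknown signals: %s" % sorted(remaining))
    return tuple(codes)


def decode(codes):
    """What the per-field decoders would do in hardware."""
    return {group[code] for group, code in zip(FIELDS, codes) if code}


if __name__ == "__main__":
    print(HORIZONTAL_WIDTH, "bits horizontal vs", VERTICAL_WIDTH, "bits vertical")
    print(encode({"IP", "ED", "LI"}))
```

The saving in width is paid for with the decoders and with the loss of any combination that activates two signals from the same field.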
Design Issues
 Encoding reduces the bit-space
 But requires decoders
 Cost of decoder vs bit-space
 Usually decoder cost is very low
Another Idea
 Group concurrently active signals
 Every meaningful combination gets a code
 Complex decoder to interpret every code
Vertical vs Horizontal
 Horizontal
 Faster
 More area
 More common currently
 Cheap transistors
 Vertical
 Slower
 More microinstructions
Microsequencing
 Other ways to save on hardware
 Every instruction had its own
microprogram sequence
 Also, instructions have several addressing
modes
 Only the first few microinstructions differ
 Can we share microcode?
A Powerful Technique in Sharing
 Bit-ORing
 Example
 Two instructions share some microcode
 Eventually, must branch
 The default branch (one instruction’s) is X0
 The other branch is stored at X1
 Change the least significant bit(s?) to get a new address
 Compare that with :
 Having two conditional branches
 Store two fields, one for each branch
 Both very unclean
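A minimal sketch of bit-ORing, assuming a single select bit and hypothetical addresses (0x14 and 0x15 are made up for the example): the shared microcode stores the address X0, and the hardware ORs a bit derived from the instruction into the least significant position.

```python
def bit_or_next(stored_address, select_bit):
    """Form the next microinstruction address by ORing a select bit into bit 0.

    stored_address is the default target (X0); its least significant bit
    must be 0 so that the OR can turn it into X1.
    """
    if stored_address & 1:
        raise ValueError("stored address must end in 0")
    return stored_address | (select_bit & 1)


if __name__ == "__main__":
    shared_exit = 0x14                             # hypothetical X0
    print(format(bit_or_next(shared_exit, 0), "02X"))   # first instruction continues at X0
    print(format(bit_or_next(shared_exit, 1), "02X"))   # second instruction continues at X1
```

Only one address field is stored and no conditional branch microinstruction is spent, which is the point of the comparison on the slide.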
Thought Experiment :
 What if we provided an explicit branch microinstruction instead
of storing a next-address field in our microprogram?
 Typical instruction set will need a lot of
branches
 Lot of time will be wasted on branching
A Pat on Our Back
 We provided an explicit field for the next address
 Branch location is now data
 It is already saved

 Caution :
 Microinstruction can get very wide
 Solution :
 There is no free lunch.
Can we pipeline microfetch?
 A neat idea :
 Why wait till the current micro-op is over?
 Branch field gives next operation
 Get the next op
 Caveat :
 External inputs and status flags may change the order
 What about interrupts?
 They are going to follow you everywhere
 Should have a mechanism that can invalidate microcode
prefetch
 Similar to pipeline flush for instructions
 Commonly used
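The prefetch-and-invalidate idea can be sketched with a toy control store. Everything here is hypothetical: the four-entry store, the rule that the prefetcher always guesses the branch field, and the single status flag. The only point is that a wrong guess has to be thrown away.

```python
# Toy control store: address -> (CD, branch field). CD = 1 marks a conditional
# microinstruction that branches only when the flag is set, else falls through.
STORE = {0: (0, 1), 1: (1, 3), 2: (0, 0), 3: (0, 0)}


def run(flag, steps):
    """Execute `steps` microinstructions with prefetch; return (trace, flushes)."""
    upc, trace, flushes = 0, [], 0
    for _ in range(steps):
        trace.append(upc)
        cd, branch = STORE[upc]
        prefetched = branch                      # fetched while upc is still executing
        actual = branch if (cd == 0 or flag) else upc + 1
        if prefetched != actual:
            flushes += 1                         # invalidate the prefetched micro-op
        upc = actual
    return trace, flushes


if __name__ == "__main__":
    print(run(flag=1, steps=4))
    print(run(flag=0, steps=4))
```

With the flag set the guess is always right; with it clear, the conditional microinstruction at address 1 costs one flush, much like a mispredicted branch in an instruction pipeline.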
Historical Perspectives
 Hardwired Logic
 Popular before 60’s
 Only way people did it
 Popular now
 Speed Benefits
 Microprogram
 Popular in 70’s
 Memory was slower than CPU
 No on-chip cache
 Best way is to store the microcode
 Now – depends on whom you ask
 Shades of gray :
 Extremes of spectrum are harder to find nowadays
Tools for Design
 Hardwired
 Any state machine optimizer
 Assigning states, minimizing transitions, races,
hazards, …
 Microcoding
 Small ones can be in binary
 Large ones – Use microassembler
 Very useful debug tool
 Can use microassembler simultaneously with actual
hardware development
Hardwired vs Microcoding
 Hardwired units are faster and smaller
 Emulation is easy with microcoding
 Hardwired design is complex if large
 Bugs in hardwired design cannot be fixed
in field
 Hardwired control is not suited for loops
 Looping with microcode can be made as fast
Hardwired vs Microcode vs RISC
 RISC
 Simpler instruction set
 Hardwired Implementation

 RISC instructions are like microcodes


 Instructions come from I-Cache instead of Control
Store
 Difference :
 Contents are not fixed
 Advantage : Only load what you want on the I-Cache
 Keeps size smaller as compared to Control Stores
Microprogram vs Software
 Imagine Floating Point Division
 Solution 1 : Write in software
 Long process
 Error prone
 Many fetches repeatedly from memory for the given
sequence of operations
 Solution 2 : Microcode
 Long process too, but done by designers, not programmers
 Relatively error free – more thorough design
 Requires many cycles but fetched and used locally
Emulation
 A very common use of microcoding
 IBM System/360
 32-bit architecture
 16 general-purpose registers
 Secret :
 Most implementations were 8-bit
 Keep cost low
 Heavy microcoding
 Programmers oblivious
 In 1992, International Meta Systems (IMS) announced
the 3250
 Designed to emulate the x86, 68K, and 6502 architectures
 Uses customizable microcode, among other techniques
 Went bust, never released
Another Interesting Note
 Writable Control Store
 What if you, a programmer, can write your
own control store?
 Not a mad scientist thought
 Implemented in
 VAX 8800
 PDP-11/60
 IBM System/370
Current Trends
 Microcode Update
 Linux Utility - microcode_ctl
 Companion to IA32 microcode driver
 It decodes and sends new microcode to the kernel
driver to be uploaded to Intel IA32 processors
 Update is volatile – lost on reboots

 Microcode updates are also typically rolled into BIOS
updates
 Ready even before an OS is loaded
Intel Said…..
The Pentium(R) Pro processor and Pentium(R) II processor may
contain design defects or errors known as errata that may cause the
product to deviate from published specifications. Many times, the
effects of the errata can be avoided by implementing hardware or
software work-arounds, which are documented in the Pentium Pro
Processor Specification Update and the Pentium II Processor
Specification Update. Pentium Pro and Pentium II processors include a
feature called "reprogrammable microcode", which allows certain types
of errata to be worked around via microcode updates. The microcode
updates reside in the system BIOS and are loaded into the processor
by the system BIOS during the Power-On Self Test, or POST.
Current Trends
 Hyperthreading in P4
 A second logical CPU
 Complete state of the system in both CPUs

 Microcoding in P4
 Two pointers control flow independently
 Both processors share the ROM entries
 Access is alternated between the CPUs
Thank You
