Advanced Microprocessor Architecture

EC6013 - ADVANCED MICROPROCESSOR AND MICROCONTROLLER
Objectives
• Study the fundamentals of microprocessor architecture.
• Learn the advanced features in microprocessors and microcontrollers.
• Study the architecture of various microcontrollers.
Syllabus
• UNIT I - HIGH PERFORMANCE CISC ARCHITECTURE – PENTIUM (9)
• CPU Architecture – Bus Operations – Pipelining – Branch prediction –
Floating point unit – Operating Modes – Paging – Multitasking – Exception
and Interrupts – Instruction set – Addressing modes – Programming the
Pentium processor.
• UNIT II - HIGH PERFORMANCE RISC ARCHITECTURE – ARM (9)
• Acorn RISC Machine – Architectural Inheritance – Core &
Architectures – Registers – Pipeline – Interrupts – ARM organization –
ARM processor family – Co-processors – ARM instruction set – Thumb
Instruction set – Instruction cycle timings – The ARM Programmer's
model – ARM Development tools – ARM Assembly Language
Programming – C programming – Optimizing ARM Assembly Code –
Optimized Primitives.
• UNIT III - ARM APPLICATION DEVELOPMENT (9)
• Introduction to DSP on ARM – FIR filter – IIR filter – Discrete Fourier
transform – Exception handling – Interrupts – Interrupt handling schemes –
Firmware and bootloader – Embedded Operating systems – Integrated Development
Environment – STDIO Libraries – Peripheral Interface – Application of
ARM Processor – Caches – Memory protection Units – Memory Management
units – Future ARM Technologies.
• UNIT IV - MOTOROLA 68HC11 MICROCONTROLLERS (9)
• Instruction set – addressing modes – operating modes – Interrupt system – RTC –
Serial Communication Interface – A/D Converter – PWM and UART.
• UNIT V - PIC MICROCONTROLLER (9)
• CPU Architecture – Instruction set – interrupts – Timers – I2C Interfacing –
UART – A/D Converter – PWM and introduction to C-Compilers.
TOTAL: 45 PERIODS
Text Books
• [1] Andrew N. Sloss, Dominic Symes and Chris Wright, “ARM System Developer's Guide:
Designing and Optimizing System Software”, First edition, Morgan Kaufmann Publishers, 2004.
References
• 1. Steve Furber, “ARM System-on-Chip Architecture”, Addison Wesley, 2000.
• 2. Daniel Tabak, “Advanced Microprocessors”, McGraw Hill Inc., 1995.
• 3. James L. Antonakos, “The Pentium Microprocessor”, Pearson Education, 1997.
• 4. Gene H. Miller, “Micro Computer Engineering”, Pearson Education, 2003.
• 5. John B. Peatman, “Design with PIC Microcontroller”, Prentice Hall, 1997.
• 6. James L. Antonakos, “An Introduction to the Intel Family of Microprocessors”, Pearson Education, 1999.
UNIT I - HIGH PERFORMANCE CISC ARCHITECTURE – PENTIUM (9)
Objective
• Study the architecture of the Pentium processor.
• Programming the Pentium processor.
• Overview
– Introduction to Pentium
– Pentium Architecture
– Addressing Modes
– Instruction Set
– Assembly Language Programming
– Bus Operations
– Pipelining
– Branch Prediction
– Exception and Interrupts
– Floating point unit
– Operating Modes
– Paging and Multitasking
INTRODUCTION
MICROPROCESSOR
• A microprocessor is a computer processor which incorporates the
functions of a computer's central processing unit (CPU) on a
single integrated circuit (IC), or at most a few integrated circuits.
• A microprocessor might include only an arithmetic logic unit (ALU) and
a control logic section. The ALU performs arithmetic operations such as
addition and subtraction, and logical operations such as AND or OR.
MICROCONTROLLER
• A microcontroller (or MCU, short for microcontroller unit) is a
small computer (SoC) on a single integrated circuit containing a
processor core, memory, and programmable input/output peripherals.
• Microcontrollers are used in automatically controlled products and
devices.
BLOCK DIAGRAM
DIFFERENCE BETWEEN μP & μC
1. A microprocessor contains only a CPU. In contrast, a microcontroller
contains several other components apart from the CPU, including
RAM, ROM and other peripherals such as ports, clock, timer, UART
(Universal Asynchronous Receiver Transmitter), ADC (analog-to-digital
converter), DAC (digital-to-analog converter), drivers for LCD, etc.
2. A microprocessor can be considered as just the processor, while a
microcontroller can be seen as a small computer embedded on a
single IC (e.g. 8051).
To summarize, the difference between the two can be stated as:
“A microprocessor is present inside a microcontroller”.
This is valid to some extent because:
Microcontroller = Microprocessor + a few extra components

What, then, does ADVANCED MICROPROCESSOR & MICROCONTROLLER mean?
Processors with added features such as
• High memory capacity
• More I/O pins
• High performance
• More external interfacing options, etc.
Examples of microprocessors:
Intel Pentium, Pentium-I, Pentium-II, …, i3, i5, i7, 8085 & 8086
Examples of microcontrollers:
8051, PIC, ARM, Arduino, …
MICROPROCESSOR DEVELOPMENT CYCLE
INTEL MICROPROCESSOR DEVELOPMENT CYCLE
MICROCONTROLLER DEVELOPMENT CYCLE
Pentium is a brand used for a series of x86-compatible microprocessors produced by Intel since 1993.
In its current form, Pentium processors are considered entry-level products that Intel rates as "two stars", meaning that they are
above the low-end Atom and Celeron series but below the faster Core i3, i5 and i7.
• Pentium-branded processors
– P5 microarchitecture based: Pentium
– P6 microarchitecture based: Pentium Pro, Pentium II, Pentium III
– NetBurst microarchitecture based: Pentium 4, Pentium D
– Pentium M microarchitecture based: Pentium M, Pentium Dual-Core
– Core microarchitecture based: Pentium Dual-Core, Pentium (2009)
CPU ARCHITECTURE
The two integer pipelines, the U pipeline and the V pipeline, are responsible
for executing the 80x86 instructions.
The floating point unit is included on the chip to execute mathematical
functions.
The Pentium communicates with the outside world via a 32-bit address bus
and a 64-bit data bus.
An 8KB instruction cache provides quick access to frequently
used instructions. When an instruction is not found in the cache, it is read
from external memory over the data bus and copied into the instruction cache for
future reference.
Branch target buffer and prefetch buffers: these work together with the instruction
cache to fetch instructions as fast as possible.
The prefetch buffers hold a copy of the next 32 bytes of prefetched
instruction code.
Branch prediction: a technique to maintain a steady flow of instructions
into the pipeline.
To support branch prediction, the branch target buffer keeps a record of the
addresses that branch instructions in different parts of the program transfer
control to (the branch addresses).
Example:
CALL XYZ    The branch target buffer stores the address of the memory location XYZ.
A separate 8KB data cache stores a copy of the most frequently
accessed memory data.
The Pentium: A CISC Architecture
What is CISC?
• CISC stands for Complex Instruction Set Computer
• CISC takes its name from the very large number of instructions
(typically hundreds) and addressing modes.
History: CISC
• The first PC microprocessors developed were CISC chips, because all
the instructions the processor could execute were built into the chip.
• Memory was expensive in the early days of PCs, and CISC chips saved
memory because their programming could be fed directly into the
processor.
• CISC chips were improved mainly by adding more instructions to
the processor design. This also meant that programming
changed with new CISC designs. CISC designs grew complex and
somewhat bulky
Examples of CISC Processors
• VAX
• PDP-11
• Motorola 68000 family
• Intel x86/Pentium CPUs
Advantages of CISC
• CISC instructions have varying lengths, which reduces wasted space in memory.
• A process has been developed to manage power by adjusting clock speed
and voltage.
• Uses fewer instructions than RISC to perform similar tasks.
Disadvantages of CISC
• CISC chips are relatively slow (compared to RISC chips) per instruction.
• CISC chips require many more transistors than comparable RISC
designs.
• CISC architectures are harder to pipeline.
• Expensive to produce.
RISC vs CISC
• RISC puts a greater burden on the software. Software becomes
more complex, and software developers need to write more
lines of code to perform similar tasks.
• In exchange, the RISC architecture takes the burden away from the
hardware, resulting in an increase in performance (mainly speed).
OPERATING MODES
Real mode and Protected mode
Real mode: The advanced microprocessors, including the Pentium,
simply operate like an 8086 with its associated 1MB of memory. Real mode is
automatically selected upon power-up, so the Pentium boots into the DOS
operating system in real mode.
Protected mode: The full 4 GB of memory is available to the processor,
as are special privileged instructions and architectural features, including
multitasking, virtual memory addressing, memory management and
control over the internal data and instruction caches. Writing programs in
protected mode requires special knowledge.

Software model of Pentium
Pentium Microprocessor: Registers
• Registers
– Registers are in the CPU and are referred to by specific
names
– Data registers
  • Hold data for an operation to be performed
  • There are 4 data registers (EAX, EBX, ECX, EDX)
  • All are 32 bits wide.
  • The lower 16 bits of these registers are called AX, BX, CX, DX.
  • Each 16-bit register may be split into two halves of 8 bits each.
– Address registers
  • Hold the address of an instruction or data element
  • Segment registers (CS, DS, ES, SS, FS, GS)
  • Pointer registers (ESP, EBP, EIP)
  • Index registers (ESI, EDI)
– Status register
  • Keeps the current status of the processor
  • The status register is called the FLAG register
Data Registers: EAX, EBX, ECX, EDX
• Instructions execute faster if the data is in a
register. (The E prefix stands for Extended.)
• Data registers are general purpose registers, but
they also perform special functions
• AX, BX, CX, DX are the 16-bit data registers.
• The low and high bytes of the 16-bit data registers can be
accessed separately
– AH, BH, CH, DH are the high bytes
– AL, BL, CL, DL are the low bytes

• AX
– Accumulator Register
– Used in Arithmetic, Logic and Data Transfer instructions
– Used in Multiplication and Division operations
– Used in I/O operations
• BX
– Base Register
– Also serves as an address register
– Used in array operations
– Used in Table Lookup operations (XLAT)
• CX
– Count register
– Used as a Loop Counter
– Used in shift and rotate operations
• DX
– Data register
– Used in Multiplication and Division
– Also used in I/O operations
Pointer and Index Registers
• Contain the offset addresses of memory locations
• Can also be used in arithmetic and other operations
• SP: Stack Pointer
– Used with SS to access the stack segment
• BP: Base Pointer
– Primarily used to access data on the stack
– Can be used to access data in other segments
• SI: Source Index register
– Is required for some string operations
– SI is associated with DS in string operations
• DI: Destination Index register
– Is also required for some string operations
– DI is associated with ES in string operations
• The SI and DI registers may also be used to access data stored in arrays

Segment Registers - CS, DS, SS and ES
• CS: Code Segment. Used during instruction fetches.
• DS: Data Segment. Used when reading or writing data.
• SS: Stack Segment. Used during stack operations such as
subroutine calls and returns.
• ES: Extra Segment. Used for anything the programmer
wishes.
• GS and FS: Used for anything the programmer wishes.

• Are address registers
• Store the memory addresses of instructions and data
• Memory Organization
– A 20-bit address line addresses 1 MB of memory
– Each byte in memory has a 20-bit address
– Addresses are expressed as 5 hex digits from 00000 to
FFFFF
– Problem: 20-bit addresses are TOO BIG to fit in 16-bit
registers!
– Solution: Memory Segment
  • A segment number is a 16-bit number
  • Segment numbers range from 0000 to FFFF
  • A segment is a block of 64K (65,536, i.e. 2^16) consecutive memory bytes
  • Within a segment, a particular memory location is specified with
  a 16-bit offset
Segmented memory addressing:
Absolute address = (16-bit segment value shifted left by four bits)
+ 16-bit offset
[Figure: 1 MB memory space, showing the starting address of each 64 KB segment from 00000 to F0000. Locations are written as SegAddr:Offset; segment 5000 runs from 5000:0000 to 5000:FFFF, with 5000:0250 marked inside it.]

Physical Memory Address Generation
• The BIU has a dedicated adder for determining
physical memory addresses.
[Figure: the 16-bit segment register value, extended with four zero bits (0 0 0 0) on the right, and the 16-bit offset value (effective address) feed an adder that produces the 20-bit physical address.]

• Logical address is specified as Segment:Offset
• Physical address is obtained by shifting the segment
address 4 bits to the left and adding the offset address
• Thus the physical address of the logical address A4FB:4872
is
  A4FB0 → 1010 0100 1111 1011 0000
+  4872 →      0100 1000 0111 0010
  A9822 → 1010 1001 1000 0010 0010
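
The shift-and-add rule can be checked with a few lines of Python. This is only an illustrative sketch: the function name `physical_address` is made up for this example, and wrapping the result to 20 bits is an assumption that models an 8086-style 1 MB address space.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment shifted left 4 bits, plus the offset.

    Assumption for this sketch: the result wraps to 20 bits (1 MB space).
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


# Logical address A4FB:4872 from the slide
print(f"{physical_address(0xA4FB, 0x4872):05X}")  # prints A9822
```

Shifting left by 4 bits is the same as multiplying by 10H, which is why later slides write the rule as PhyAddr = 10H * segment + offset.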

Advantages of using Segment Registers
1. They allow the memory capacity to be 1MB even though the
addresses associated with the instructions are only 16 bits wide.
2. They permit a program and/or its data to be put into
different areas of memory each time the program is
executed.

[Figure: flag register layout showing the 6 status flags (carry, parity, auxiliary, zero, sign, overflow), the 3 control flags (trap, interrupt enable, direction), and the fields indicating the priority level of the current task and whether the current task is nested.]
Flags
• Flags:
– 32-bit flag register.
– Used only in protected mode.
• Status or conditional flags:
– These are set according to the results of the arithmetic or
logic operations.
– Need not be altered by the user.
• Control flags:
– Used to control some operations of the MPU.
– These flags are to be set by the user, in order to achieve
some specific purposes.
Status or Conditional or Condition Code Flags
• CF (carry): Contains the carry out of the leftmost bit
following arithmetic; also contains the last bit shifted out by a
shift or rotate operation.
• PF (parity): Indicates whether the low byte of the result
contains an even number of 1 bits (1 = even).
• AF (auxiliary carry): Contains the carry out of bit 3 into
bit 4, for specialized arithmetic (BCD).
• ZF (zero): Indicates when the result of arithmetic or
a comparison is zero (1 = yes).
• SF (sign): Contains the resulting sign of an
arithmetic operation (1 = negative).
• OF (overflow): Indicates overflow of the leftmost bit
during arithmetic.
Control flags:
• DF (direction): Indicates left or right for moving or
comparing string data.
• IF (interrupt): Indicates whether external
interrupts are being processed or ignored.
• TF (trap): Permits operation of the processor in
single step mode.

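Example
• Assume that the previous instruction performed one of the
following additions:

  0010 0011 0100 0101
+ 0011 0010 0001 1001
= 0101 0101 0101 1110    SF=0 ZF=0 AF=0 PF=0 CF=0 OF=0

  0101 0100 0011 1001
+ 0100 0101 0110 1010
= 1001 1001 1010 0011    SF=1 ZF=0 AF=1 PF=1 CF=0 OF=1

In the second addition two positive numbers give a result with the sign bit set, so the signed result has overflowed and OF is set.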
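
The flag definitions can be turned into a short Python sketch that derives the six status flags for a 16-bit addition. The function name `add16_flags` and the dictionary layout are assumptions made for this illustration; the rules themselves follow the flag descriptions above (PF looks at the low byte only, AF is the carry out of bit 3).

```python
def add16_flags(a: int, b: int) -> dict:
    """Add two 16-bit values and derive the six status flags (sketch)."""
    total = a + b
    result = total & 0xFFFF
    ones_in_low_byte = bin(result & 0xFF).count("1")
    return {
        "result": result,
        "CF": int(total > 0xFFFF),                  # carry out of bit 15
        "PF": int(ones_in_low_byte % 2 == 0),       # 1 = even parity of low byte
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),     # carry out of bit 3
        "ZF": int(result == 0),
        "SF": (result >> 15) & 1,                   # copy of the sign bit
        # Overflow: both operands have the same sign, result has the other
        "OF": int(((a ^ result) & (b ^ result) & 0x8000) != 0),
    }


# Operand pairs taken from the example additions
for a, b in [(0x2345, 0x3219), (0x5439, 0x456A)]:
    print(f"{a:04X} + {b:04X} ->", add16_flags(a, b))
```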

Addressing Modes
• The various methods used to access instruction operands are called
addressing modes.
• General instruction format:
OPCODE  Operand  Operand
• Operands may be contained in
– Registers
– Memory
– I/O ports
• Three basic modes of addressing are
– Immediate
– Register
– Memory
Example:
If CS=24F6h & IP=634Ah, show:
1- The logical address
2- The offset address
3- The physical address
4- The lower range of the segment
5- The upper range of the segment
Solution:
1- The logical address is the CS:IP content, which is 24F6:634A
2- The offset address is the content of the IP register, which is 634A
3- The physical address: 24F60h + 634Ah = 2B2AAh
4- The lower range of the segment: 24F6:0000, i.e. 24F60h
5- The upper range of the segment: 24F6:FFFF, i.e. 24F60h + FFFFh = 34F5Fh
Addressing Modes (continued...)
• Addressing modes are classified according to the flow of
instruction execution
A. Sequential flow instructions
– Arithmetic
– Logical
– Data transfer
– Processor control
B. Control transfer instructions
– INT
– CALL
– RET
– JUMP
A. Sequential flow instructions
1. Implied addressing mode
2. Immediate addressing mode
3. Direct addressing mode
4. Register addressing mode
5. Register Indirect addressing mode
6. Indexed addressing mode
7. Register Relative addressing mode
8. Based Indexed addressing mode
9. Relative Based Indexed addressing mode
B. Control transfer instructions
1. Intersegment Direct addressing mode
2. Intersegment Indirect addressing mode
3. Intrasegment Direct addressing mode
4. Intrasegment Indirect addressing mode
Sequential Flow Instructions
1. Implied Addressing – The data value/data
address is implicitly associated with the
instruction.
◦ AAA
◦ AAS
◦ AAM
◦ AAD
◦ DAA
◦ DAS
◦ XLAT
2. Immediate Addressing – Data / operand is part
of the instruction
• MOV AX, 25BFH ; AX ← 25BFH (destination AX, source is 16-bit data)
• MOV AL, 8EH ; AL ← 8EH (destination AL, source is 8-bit data)
3. Direct Addressing – Data is pointed to by the 16-bit
offset value specified in the instruction
• MOV AX, [5000H]
Effective Addr = 5000H
PhyAddr = 10H*DS + 5000H
4. Register Addressing – Data is in the register
specified in the instruction
• MOV BX, AX
No PhyAddr, since the data is in a register
• 16-bit operand registers - AX, BX, CX, DX, SI, DI, SP, BP
• 8-bit operand registers - AL, AH, BL, BH, CL, CH, DL, DH
5. Register Indirect Addressing – Data is pointed to by
the offset value in the register specified in the
instruction
MOV AX, [BX]
Default segment – DS or ES
Offset – BX or SI or DI
PhyAddr = 10H * (DS or ES) + (BX or SI or DI)
If DS=5000H and BX=10FFH,
then EffectiveAddr = 10FFH
and PhyAddr = 10H*5000H + 10FFH = 510FFH
6. Indexed Addressing
Data is pointed to by the offset in the index register specified
in the instruction
DS is the default segment register for SI and DI
MOV AX, [SI]    Data is available at the logical address DS:SI
Effective Addr = [SI]
PhyAddr = 10H * DS + (SI or DI)
7. Register Relative Addressing
Data is pointed to by the sum of the 8-bit or 16-bit displacement
specified in the instruction plus the
offset specified in one of the registers BX, BP, SI, DI
Default segment registers – DS, ES
MOV AX, 50H[BX]
EffectiveAddr = 50H + [BX]
PhyAddr = 10H * (DS or ES) + EffectiveAddr, where the register may be BX, BP, SI or DI
8. Based Indexed Addressing
Data is pointed to by the content of the base register specified in the
instruction plus the
content of the index register specified in the instruction
MOV AX, [BX][SI]
EffectiveAddr = (BX or BP) + (SI or DI)
PhyAddr = 10H * (DS or ES) + (BX or BP) + (SI or DI)
9. Relative Based Indexed Addressing
Data is pointed to by the sum of the 8-bit or 16-bit displacement
specified in the instruction plus the
offset specified in the base registers (BX, BP) plus the
offset specified in the index registers (SI, DI)
EffectiveAddr = (8-bit or 16-bit displacement) + (BX or BP) + (SI or DI)
PhyAddr = 10H * (DS or ES) + (8-bit or 16-bit displacement) + (BX or BP) + (SI or DI)
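
The memory addressing modes differ only in which terms contribute to the effective address. The Python sketch below makes that explicit; the helper names (`effective_address`, `physical_address`), the register dictionary and the register values other than DS=5000H and BX=10FFH are assumptions chosen for illustration, and DS is used as the default segment as in the slides.

```python
BASE_REGS = ("BX", "BP")
INDEX_REGS = ("SI", "DI")


def effective_address(regs, base=None, index=None, disp=0):
    """Sum of optional displacement, base register and index register (16-bit)."""
    if base is not None and base not in BASE_REGS:
        raise ValueError("base must be BX or BP")
    if index is not None and index not in INDEX_REGS:
        raise ValueError("index must be SI or DI")
    ea = disp
    if base is not None:
        ea += regs[base]
    if index is not None:
        ea += regs[index]
    return ea & 0xFFFF


def physical_address(regs, segment="DS", base=None, index=None, disp=0):
    """PhyAddr = 10H * segment + effective address, kept to 20 bits."""
    ea = effective_address(regs, base, index, disp)
    return ((regs[segment] << 4) + ea) & 0xFFFFF


# Hypothetical register contents (DS and BX match the slide example)
REGS = {"DS": 0x5000, "ES": 0x6000, "BX": 0x10FF, "BP": 0x0200,
        "SI": 0x0020, "DI": 0x0030}

print(f"{physical_address(REGS, base='BX'):05X}")                         # register indirect
print(f"{physical_address(REGS, index='SI'):05X}")                        # indexed
print(f"{physical_address(REGS, base='BX', disp=0x50):05X}")              # register relative
print(f"{physical_address(REGS, base='BX', index='SI'):05X}")             # based indexed
print(f"{physical_address(REGS, base='BX', index='SI', disp=0x50):05X}")  # relative based indexed
```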
BUS OPERATION
• The Pentium processor performs a number of different operations over
its address and data buses:
• Data transfer, interrupt acknowledgement, inquire cycles for examining
the internal code and data caches, and I/O operations.
Decoding a bus cycle:
The Pentium bus logic indicates the type of bus cycle currently in progress
through its cycle definition signals.
The signals are M/IO, D/C, W/R, CACHE, KEN.
• Special bus cycles require additional decoding and use the byte
enable outputs for selection.
Bus cycle states:
• There are six possible states the Pentium bus may be in, depending on
what type of cycle is being processed.
• The states are Ti, T1, T2, T12, T2P, TD.
• Ti: This is the bus idle state. In this state, no bus cycles are being run.
The processor may or may not be driving the address and status pins.
• T1: This is the first clock of a bus cycle. Valid address and status are
driven out.
• T2: This is the second and subsequent clock of the first outstanding
bus cycle. In state T2, data is driven out (if the cycle is a write), or data
is expected (if the cycle is a read).
• T12: This state indicates there are two outstanding bus cycles, and
that the processor is starting the second bus cycle at the same time
that data is being transferred for the first. In T12, the processor drives
the address and status for the second cycle.
• T2P: This state indicates there are two outstanding bus cycles, and
that both are in their second and subsequent clocks. In T2P, data is
being transferred for the first cycle.
• TD: This state indicates there is one outstanding bus cycle whose
address and status have already been driven sometime in the past (in state
T12), and that a dead clock is being inserted before its data is transferred.
Processor bus control state machine:
0: No bus cycle requested
1: New bus cycle started. ADS is taken low.
2: Second clock cycle of current bus cycle.
3: Stay in T2 until BRDY is active or a new bus cycle is
requested
4: Go back to T1 if a new request is pending.
5: Bus cycle complete; go back to idle state.
6: Begin second bus cycle
7: Current cycle is finished and no dead clock is needed.
8: A dead clock is needed after the current cycle is finished.
9: Go to T2P to transfer data
10: Wait in T2P until data is transferred.
11: Current cycle is finished and no dead clock is needed.
12: A dead clock is needed after the current cycle is finished.
13: Begin a pipelined bus cycle if NA is active
14: No new bus cycle is pending
SINGLE TRANSFER CYCLE:
This cycle transfers up to 8 bytes of non-cacheable data between the processor
and memory.
• The cycle begins during clock cycle T1, when ADS goes low. CACHE is taken
high to indicate to external circuitry that the data is not going to, or coming
from, the internal cache.
• If BRDY goes low during the T2 clock cycle, the data is transferred and the
operation completes during clock cycle Ti.
• If BRDY is not low during T2, additional T2 clock cycles are generated; these
extra clock cycles are called WAIT CYCLES.
BURST CYCLE:
• Supports burst reads and writes of 32 bytes.
• The cache uses burst cycles for line loads and write-backs.
• During a burst operation, a new eight-byte chunk can be transferred every clock
cycle, so a 32-byte burst takes four transfers over the 64-bit data bus.
LOCKED OPERATION:
Many operating system processes depend on what is called atomic access to data
stored in memory.
An atomic operation cannot be broken down into smaller sub-operations (it is an uninterruptible operation).
The data accessed during the atomic operation often comes in the form of a
semaphore.
Example:
XCHG instruction
BOFF:
• The BOFF input provides a way for other processors in a multiprocessor
system to instantly take over the Pentium buses.
• BOFF low puts the buses into the high-impedance state, even in the middle of
a bus cycle, and allows the other processor to use the bus.
• BOFF high allows the Pentium to use the bus again; a bus cycle that was
interrupted by BOFF is restarted.
BUS HOLD:
• The HOLD input provides a second way for a different bus master to take
control of the Pentium's buses.
• Unlike BOFF, HOLD lets the current bus cycle complete first.
INTERRUPT ACKNOWLEDGE:
• The processor runs two interrupt acknowledge cycles in response to an INTR
request. Both cycles are locked.
• To maintain hardware compatibility with earlier 80x86 machines, the data is
ignored by the processor during the first interrupt acknowledge cycle and accepted during
the second.
SHUTDOWN:
If the Pentium detects an internal parity error, a shutdown cycle is run. Execution is
suspended while in shutdown, until the processor receives an NMI, INIT or RESET request.
HALT:
Similar to shutdown, except that the INTR signal may also be used to resume
execution.
PIPELINED CYCLE:
The Pentium can begin a second bus cycle before the current one is completed. It does so
through pipelined read and write logic, in response to a request on the NA
input.
INQUIRE CYCLE:
• Inquire cycles maintain cache coherency in a multiprocessor system. The Pentium
processor is able to watch the system bus in a multiprocessor system. This is
called BUS SNOOPING.
• If the Pentium detects a memory read/write operation being performed
by another CPU, it runs an internal inquire cycle to determine whether the
address on the bus is stored in one of its internal caches. If so, the cache
may need to be updated.
PIPELINING
Integer Pipeline
• The pipelines are called “u” and “v” pipes.
• The u-pipe can execute any instruction, while the v-pipe can execute
“simple” instructions as defined in the “Instruction Pairing Rules”.
• When instructions are paired, the instruction issued to the v-pipe is
always the next sequential instruction after the one issued to the
u-pipe.
Integer Instruction Pairing Rules
• To issue two instructions simultaneously they must
satisfy the following conditions:
• Both instructions in the pair must be “simple”.
• There must be no read-after-write (RAW) or write-after-write
(WAW) register dependencies between them.
RAW:
i1. R2 ← R1 + R3
i2. R4 ← R2 + R3
WAW:
i1. R2 ← R4 + R7
i2. R2 ← R1 + R3
• The following integer instructions are considered simple
and may be paired:
1. mov reg, reg/mem/imm
2. mov mem, reg/imm
3. alu reg, reg/mem/imm
4. alu mem, reg/imm
5. inc reg/mem
6. dec reg/mem
7. push reg/mem
8. pop reg
9. lea reg,mem
10. jmp/call/jcc near
11. nop
12. test reg, reg/mem
13. test acc, imm
Instruction Issue Algorithm
• Decode the two consecutive instructions I1 and I2
• If the following are all true
– I1 and I2 are simple instructions
– I1 is not a jump instruction
– Destination of I1 is not a source of I2
– Destination of I1 is not a destination of I2
• Then issue I1 to u pipeline and I2 to v pipeline
• Else issue I1 to u pipeline
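
The issue algorithm above can be sketched in Python. This is a simplified illustration, not a model of the real decoder: the `Instr` record, its field names and the sample program are assumptions, and each instruction is assumed to be pre-annotated with whether it is "simple" and which registers it reads and writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    simple: bool
    is_jump: bool = False
    dests: frozenset = frozenset()   # registers written
    srcs: frozenset = frozenset()    # registers read


def can_pair(i1: Instr, i2: Instr) -> bool:
    """Apply the four conditions of the issue algorithm."""
    return (i1.simple and i2.simple
            and not i1.is_jump
            and not (i1.dests & i2.srcs)     # no RAW dependency
            and not (i1.dests & i2.dests))   # no WAW dependency


def issue(program):
    """Return one (u_pipe, v_pipe) tuple per issue slot; v_pipe may be None."""
    slots, k = [], 0
    while k < len(program):
        i1 = program[k]
        if k + 1 < len(program) and can_pair(i1, program[k + 1]):
            slots.append((i1.text, program[k + 1].text))
            k += 2
        else:
            slots.append((i1.text, None))
            k += 1
    return slots


program = [
    Instr("mov eax, ebx", True, dests=frozenset({"eax"}), srcs=frozenset({"ebx"})),
    Instr("add ecx, eax", True, dests=frozenset({"ecx"}), srcs=frozenset({"ecx", "eax"})),
    Instr("mov edx, 5", True, dests=frozenset({"edx"})),
]
for u, v in issue(program):
    print(f"u: {u:<14} v: {v}")
```

In this sample the first two instructions cannot pair because `add ecx, eax` reads the register that `mov eax, ebx` writes (a RAW dependency), so the `mov` issues alone and the `add` pairs with the following instruction instead.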
PIPELINE STAGES:
• Prefetch. During Prefetch, the next instruction to be executed is copied
from cache memory to the CPU.
• Instruction Decode, Part 1
• Instruction Decode, Part 2
• Execution.
• Write Back. Registers and memory locations are updated.
Integer Pipeline
• The integer pipeline stages are as follows:
1. Prefetch (PF):
– Instructions are prefetched from the on-chip instruction
cache or memory.
2. Decode1 (D1):
– Two parallel decoders attempt to decode and issue the
next two sequential instructions.
– It determines whether the current pair of instructions can execute
together.
3. Decode2 (D2):
– Decodes the control word
– Addresses of memory-resident operands are calculated
4. Execute (EX):
– The instruction is executed in the ALU
– The data cache is accessed at this stage
– Instructions that need both an ALU operation and a data cache access require more than one
clock.
5. Writeback (WB):
– The CPU stores the result and updates the flags
[Figure: pipeline timing diagram across clock cycles C1 to C9]
Pipeline Stalls:
• When paired instructions reach the EX stage, it is possible that one or the other will stall and require
additional cycles to execute. A pipeline stall lowers performance, since no work is done during the stall.
• Instructions stall for various reasons, most notably when their operands are not available in the data
cache.
• If the instruction in the U pipeline stalls, then the V pipeline does the same.
• If the V pipeline stalls, the instruction in the U pipeline may continue executing. Both instructions
must proceed to the WB stage before another pair may enter the EX stage.
Branch Prediction Logic
Flushing of pipeline problem
• Performance gain through pipelining can be reduced
by the presence of program transfer instructions
(such as JMP, CALL, RET and conditional jumps).
• They change the sequence, making all the instructions
that entered the pipeline after the program transfer
instruction invalid.
• Suppose instruction I3 is a conditional jump to I50 at
some other address (the target address); then the
instructions that entered after I3 are invalid and a new
sequence beginning with I50 needs to be loaded in.
• This causes bubbles in the pipeline, where no work is
done as the pipeline stages are reloaded.
• To avoid this problem, the Pentium uses a scheme
called Dynamic Branch Prediction.
• In this scheme, a prediction is made concerning the
branch instruction currently in the pipeline.
• The prediction will be either taken or not taken.
• If the prediction turns out to be true, the pipeline will
not be flushed and no clock cycles will be lost.
• If the prediction turns out to be false, the pipeline is flushed and
started over with the correct instruction.
• This results in a 3 cycle penalty if the branch is executed in the
u-pipeline and a 4 cycle penalty in the v-pipeline.
Dynamic Branch Prediction
Mechanism
• It is implemented using a 4-way set associative cache with 256
entries. This is referred to as the Branch Target Buffer (BTB).
• The directory entry for each line contains the following
information:
– Valid bit: indicates whether or not the entry is in use
– History bits: track how often the branch has been taken
– Source memory address that the branch instruction was fetched from
(address of I3)
• If its directory entry is valid, the target address of the branch is stored in the
corresponding data entry in the BTB.
• The first time that a branch instruction enters either pipeline, the BTB
uses its source memory address to perform a lookup in the cache.
• Since the instruction has not been seen before, this results in a BTB
miss.
• It means the prediction logic has no history on the
instruction.
• It then predicts that the branch will not be taken, and
program flow is not altered.
• Even unconditional jumps will be predicted as not
taken the first time that they are seen by the BTB.
• When the instruction reaches the execution stage, the branch
will be either taken or not taken.
• If taken, the next instruction to be executed should be the one
fetched from the branch target address.
• If not taken, the next instruction is the one at the next sequential memory
address.
• When the branch is taken for the first time, the execution unit
provides feedback to the branch prediction logic.
• The branch target address is sent back and recorded in the BTB.
• A directory entry is made containing the source memory
address, with the history bits set to strongly taken.
[Figure: state diagram of the four history states: Strongly Taken, Weakly Taken, Weakly Not Taken, Strongly Not Taken]
History  Resulting           Prediction        If branch is                  If branch is
Bits     Description         Made              taken                         not taken
11       Strongly Taken      Branch Taken      Remains Strongly Taken        Downgrades to Weakly Taken
10       Weakly Taken        Branch Taken      Upgrades to Strongly Taken    Downgrades to Weakly Not Taken
01       Weakly Not Taken    Branch Not Taken  Upgrades to Weakly Taken      Downgrades to Strongly Not Taken
00       Strongly Not Taken  Branch Not Taken  Upgrades to Weakly Not Taken  Remains Strongly Not Taken
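
The history-bit table behaves like a 2-bit saturating counter, which can be sketched in Python as follows. The function names and the single-branch `simulate` helper are assumptions for illustration; the sketch follows the slides in predicting "not taken" on a BTB miss and creating the entry as Strongly Taken (11) the first time the branch is taken.

```python
STATE_NAMES = {
    0b11: "Strongly Taken",
    0b10: "Weakly Taken",
    0b01: "Weakly Not Taken",
    0b00: "Strongly Not Taken",
}


def predict(history: int) -> bool:
    """Histories 11 and 10 predict taken; 01 and 00 predict not taken."""
    return history >= 0b10


def update(history: int, taken: bool) -> int:
    """Move one step up if taken, one step down if not, saturating at 11 and 00."""
    return min(history + 1, 0b11) if taken else max(history - 1, 0b00)


def simulate(outcomes):
    """Count mispredictions for one branch; history None models a BTB miss."""
    history = None
    mispredictions = 0
    for taken in outcomes:
        predicted = history is not None and predict(history)
        if predicted != taken:
            mispredictions += 1
        if history is None:
            if taken:
                history = 0b11      # entry created as Strongly Taken
        else:
            history = update(history, taken)
    return mispredictions, history


# A loop branch taken four times, then falling through once
misses, final = simulate([True, True, True, True, False])
print(misses, STATE_NAMES[final])
```

For this loop-like pattern only the first encounter (BTB miss) and the final fall-through are mispredicted, and the entry ends in the Weakly Taken state.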
FLOATING POINT UNIT (FPU)
Floating-Point Pipeline
• The floating point pipeline has 8 stages as follows:
1. Prefetch (PF):
– Instructions are prefetched from the on-chip instruction
cache
2. Instruction Decode (D1):
– Two parallel decoders attempt to decode and issue the
next two sequential instructions
– It decodes the instruction to generate a control word
3. Address Generate (D2):
– Decodes the control word
– Addresses of memory-resident operands are calculated
4. Memory and Register Read (Execution Stage) (EX):
– Register read, memory read or memory write is performed
as required by the instruction to access an operand.
5. Floating Point Execution Stage 1 (X1):
– Information from a register or memory is written into an FP
register.
– Data is converted to floating point format before being
loaded into the floating point unit
6. Floating Point Execution Stage 2 (X2):
– The floating point operation is performed within the floating point
unit.
7. Write FP Result (WF):
– Floating point results are rounded and the result is
written to the target floating point register.
8. Error Reporting (ER):
– If an error is detected, an error reporting stage is entered
where the error is reported and the FPU status word is
updated
Instruction Issue for Floating Point Unit
• The rules for how floating-point (FP) instructions get issued
on the Pentium processor are:
1. FP instructions do not get paired with integer instructions.
2. When a pair of FP instructions is issued to the FPU, only
the FXCH instruction can be the second instruction of the
pair.
The first instruction of the pair must be one of a set F, where F =
[FLD, FADD, FSUB, FMUL, FDIV, FCOM, FUCOM, FTST, FABS,
FCHS].
3. FP instructions other than FXCH and instructions
belonging to set F always get issued singly to the FPU.
4. FP instructions that are not directly followed by an FXCH
instruction are issued singly to the FPU.
[Figure: FPU Register File. Floating-point registers ST(0) to ST(7), each 80 bits wide, with read ports 1 and 2 and write ports 1 and 2, connected to the EX, X1 and WF stages, with bypass paths Bypass1 and Bypass2.]
PAGING
Paging
• The Pentium supports translation of virtual (linear) addresses into physical addresses
through the use of special tables that map portions of the virtual address into actual
physical memory locations.
• Physical memory is divided into fixed-size page frames of 4KB each.
• Paging is controlled by three flags in the processor’s control registers:
• PG (paging) flag, bit 31 of CR0. Setting PG = 1 enables paging (required for
multitasking in virtual 8086 mode).
• PSE (page size extensions) flag, bit 4 of CR4. When set, the page size is 2MB or 4MB.
• PAE (physical address extension) flag, bit 5 of CR4.
• The Pentium has no mode bit to disable segmentation.
Paging
• Page directory—An array of 32-bit page-directory entries contained in a 4-KByte page. Up to 1024 page-
directory entries can be held in a page directory.
• Page table—An array of 32-bit page-table entries contained in a 4-KByte page. Up to 1024 page-table
entries can be held in a page table. (Page tables are not used for 2-MByte or 4-MByte pages. These page
sizes are mapped directly from one or more page directories.)
• Page—A 4-KByte, 2-MByte, or 4-MByte flat address space.
Paging
32-bit virtual (linear) addresses generated by a running task select entries in the
system's page directory and page table, which translate the upper 20 bits of the virtual
address into the actual physical address where a page frame is located.
The lower 12 bits of the virtual address are not translated and point to one of 4,096
byte locations within a page frame.
• How is a 32-bit virtual address translated into a physical address?
• The upper 10 bits of the virtual address select one of 1,024 entries in the page directory.
• The base address of the page directory is stored in the page directory base register (PDBR).
• Each entry in the page directory is 4 bytes wide and contains the base address of a page table.
• The next 10 bits from the virtual address select one of 1,024 entries in the page table pointed to by
the page directory entry.
• This entry is also 4 bytes wide and contains the base address of the actual physical memory page
frame.
• This address is combined with the lower 12 bits of the virtual address to access the desired location
in memory.
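The translation steps above can be sketched in Python. This is a toy model under stated assumptions: the page directory and page tables are plain dictionaries rather than 4-byte entries in memory, and the names `split_linear`, `translate`, and the table id `"pt_a"` are hypothetical.

```python
def split_linear(addr):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    dir_index = (addr >> 22) & 0x3FF    # upper 10 bits select the directory entry
    table_index = (addr >> 12) & 0x3FF  # next 10 bits select the page-table entry
    offset = addr & 0xFFF               # lower 12 bits are not translated
    return dir_index, table_index, offset


def translate(addr, page_directory, page_tables):
    """Walk toy two-level tables and return the physical address.

    page_directory maps a directory index to a page-table id (assumed model);
    page_tables maps that id to {table index: page frame base address}.
    """
    dir_index, table_index, offset = split_linear(addr)
    table_id = page_directory[dir_index]
    frame_base = page_tables[table_id][table_index]
    return frame_base | offset


# Example: linear address 00403025H uses directory entry 1, table entry 3.
page_directory = {1: "pt_a"}
page_tables = {"pt_a": {3: 0x0009A000}}
physical = translate(0x00403025, page_directory, page_tables)  # 0009A025H
```

The offset 025H passes through unchanged, which is why only the upper 20 bits of the address need a table lookup.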
Paging
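[Figure: two-level translation of a linear address into directory index, table index, and displacement (offset).]
PDE and PTE format: bits 31–12 hold the page table (or page frame) address; bits 11–0 hold control and status flags.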

Paging
Translation lookaside buffers(TLBs)
• To improve performance, the internal instruction and data caches
of the Pentium contain small, special caches called TLBs that
automatically translate the upper 20 bits of the virtual address into
the upper 20 bits of the physical address.
• A translation that hits in a TLB therefore takes only one clock cycle.
• The TLBs contain only the addresses of the most recently used pages.
• If the required translation is not available in a TLB, the processor
accesses the page directory and page table in RAM and stores the result in the TLB.
• Prior to doing this it may be necessary to invalidate the contents of the
TLBs.
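The behaviour described above (hit, miss with table walk, invalidation) can be illustrated with a toy model. This Python sketch is not the Pentium's actual TLB organisation: the class name `TinyTLB`, the capacity, the least-recently-used replacement policy, and the `walk` callback are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TinyTLB:
    """Toy TLB: caches virtual page number -> physical frame number."""

    def __init__(self, capacity, walk):
        self.capacity = capacity   # number of entries (assumed, must be >= 1)
        self.walk = walk           # slow path: page number -> frame number
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, addr):
        vpn, offset = addr >> 12, addr & 0xFFF
        frame = self.entries.get(vpn)
        if frame is None:
            # Miss: walk the page tables, then cache the translation.
            self.misses += 1
            frame = self.walk(vpn)
            self.entries[vpn] = frame
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        else:
            # Hit: no table walk needed; mark the entry as most recently used.
            self.hits += 1
            self.entries.move_to_end(vpn)
        return (frame << 12) | offset

    def invalidate(self):
        """Discard all cached translations, e.g. after the page tables change."""
        self.entries.clear()
```

Two accesses to the same page cost one table walk and one hit; after `invalidate()` the next access to that page misses again.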
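PDE (bit 31 down to bit 0):
Page table address (31-12) | Avail. (11-9) | 0 0 (8-7) | 0 (6) | A | PCD | PWT | U | W | P
PTE (bit 31 down to bit 0):
Page frame address (31-12) | Avail. (11-9) | 0 0 (8-7) | D (6) | A | PCD | PWT | U | W | P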
• D (dirty): Set if a write has been performed to the page pointed to by the
PTE.
• A (accessed): Set if a read or write was performed to the page selected
by the PTE and PDE.
• PCD (page cache disable): Determines whether the page being
accessed may be cached.
• PWT (page write-through): Selects write-through operation between
cache and memory for the page.
• U (user): Selects user or supervisor access to the page and is used in memory protection checks.
• W (writable): Determines whether the page may be written to and is
also used in protection checks.
• P (present): Indicates whether the page is actually stored in memory. If it is not,
an access raises a page fault (interrupt 14) so that the page can be brought in, and the TLBs are then updated.
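As an illustration of the entry format, the following Python sketch unpacks a 32-bit entry into its frame address and flags. The bit positions used (P = bit 0 up to D = bit 6) follow the standard IA-32 layout for 4KB pages; the names `FLAG_BITS` and `decode_entry` and the sample value are assumptions for this example.

```python
# Bit positions of the flags in a 32-bit PDE/PTE (bit 0 is least significant).
FLAG_BITS = {"P": 0, "W": 1, "U": 2, "PWT": 3, "PCD": 4, "A": 5, "D": 6}


def decode_entry(entry):
    """Return (frame_address, flags) for a 32-bit page-table entry."""
    frame_address = entry & 0xFFFFF000  # bits 31-12: page frame address
    flags = {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return frame_address, flags


# Sample entry 0009A067H: present, writable, user, accessed and dirty.
frame, flags = decode_entry(0x0009A067)
```

Here `frame` is 0009A000H and the flags P, W, U, A and D are set, while PWT and PCD are clear.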
Paging
Summary….
• Page translation allows the physical memory used by a system to be much
smaller than the linear addressing space.
• For instance, the Pentium’s 4GB linear addressing space may be mapped to a
physical memory of only 512MB.
• The pages used by a program do not need to be stored consecutively.
• A program’s code and data may be spread out all over physical memory, and
even moved around (with help from the hard disk) while the program is
executing!
• This helps to explain why the linear addresses are also called virtual addresses,
since they have no relation to the actual physical memory address used, except
for the lower 12 bits.
MULTITASKING
Multitasking VS Multithreading
• Tasks are like jobs, so multitasking means doing multiple jobs at the
same time.
• Threads run within a process or task, so multithreading means many
subtasks being done within a main task.
• For example, using Microsoft Word and PowerPoint together is multitasking, while
typing with the grammar and spell check active means you are running
two threads within Microsoft Word.
MULTITASKING
• Ability to support execution of multiple programs (tasks)
simultaneously.
• Actually only one program is running at any point in time, but the ability to
switch from task to task at very high speed gives the impression of
simultaneous execution.
The processor defines four data structures for handling task-related
activities:
• Task state segment (TSS)
• TSS descriptor
• Task register
• Task gate descriptor
• Each task executes for a period of time called a TIME SLICE.
• A TASK SWITCH is used to switch from one task to another.
Rapidly switching from task to task gives the impression that all tasks
are running at the same time.
1. Task State Segment:
During a task switch, the contents of all processor registers, as well
as other task information, are saved for the task being suspended, and new
information is loaded for the next task.
This information is not saved on the stack, but in a special
memory structure called the TASK STATE SEGMENT (TSS).
It contains storage areas for all of the Pentium's registers, segment
selectors and stack pointers.
• When a task is created, the task’s
LDT selector, PDBR, privilege-level stack pointers,
T-bit and I/O permission bit map are
filled in.
• During a task switch, these items
are not altered. Only the register
portion from EIP to GS is modified
during task switching.
2. TSS descriptor
Defines the various characteristics of the task state segment. The TSS is located through this descriptor.
• B (busy) – the task is currently running or waiting to run.
• P (present) – whether the segment is in memory (a task may be suspended if a page fault
occurs).
• G (granularity) – determines how the limit field is interpreted.
Clear: segment size from 1 byte to 1MB (in units of 1 byte).
Set: segment size from 4KB to 4GB (in chunks of 4KB).
• AVL – set if the segment is available for use.
• DPL – indicates the privilege level of the segment and is used in protection
checks.
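The effect of the G bit on the limit field can be worked through numerically. A minimal Python sketch, assuming the usual 20-bit limit field of a segment descriptor; the function name `segment_size` is hypothetical.

```python
def segment_size(limit, g_bit):
    """Size in bytes of a segment from its 20-bit limit field and G bit."""
    if not 0 <= limit <= 0xFFFFF:
        raise ValueError("limit must fit in 20 bits")
    if g_bit:
        # G set: the limit counts 4KB units, so sizes run from 4KB to 4GB.
        return (limit + 1) * 4096
    # G clear: the limit counts bytes, so sizes run from 1 byte to 1MB.
    return limit + 1
```

With the maximum limit FFFFFH, `segment_size(0xFFFFF, 0)` gives 1MB and `segment_size(0xFFFFF, 1)` gives 4GB, matching the two ranges listed for the G bit.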
3. Task Register (TR)
1. The task register holds the 16-bit segment selector, 32-bit base address, 32-bit
segment limit, and descriptor attributes for the TSS of the current task.
2. The TSS actually in use is accessed through TR (using the STR and LTR instructions).
TSS descriptors may only be placed in the
GDT (global descriptor table). When
multiple TSS descriptors are stored in the GDT, the one currently
in use is accessed through TR.
• The task register may be loaded with a new TSS selector with the
LTR (Load Task Register) instruction. LTR requires a 16-bit register or
memory operand and may only be executed in protected mode.
4. Task Gate Descriptor
1. A task switch may result in a privilege violation if the new task has a
lower priority than the currently executing task. A task gate provides a way
to facilitate task switching.
2. A task gate descriptor provides an indirect, protected reference to a
task. A task gate descriptor can be placed in the GDT, an LDT or the IDT.
3. It allows a single busy bit to be used for a task (contained in the TSS
descriptor).
4. In this way it safeguards the processor while facilitating
multitasking, using the DPL and the busy bit.

TASK SWITCH
• The following steps take place during task switch
 The new TSS descriptor or task gate must have sufficient privilege
to allow a task switch.
 The new TSS descriptor must have its present bit set.
 The state of current task is saved.
 The task register is loaded with the selector of the new TSS
descriptor
 The state of the new task is loaded from its TSS and execution is
resumed.
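The sequence above can be modelled as a toy simulation. This Python sketch is heavily simplified and every name in it (`Task`, `ToyCPU`, `switch_to`, the selector values) is an assumption: the privilege check is reduced to comparing the current privilege level with the DPL, and a dictionary stands in for the register area of the TSS.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    selector: int         # TSS selector (hypothetical value)
    dpl: int              # privilege level of the TSS descriptor
    present: bool = True  # P bit of the TSS descriptor
    saved_state: dict = field(default_factory=dict)  # stands in for the TSS


class ToyCPU:
    def __init__(self, first_task, registers):
        self.current = first_task
        self.task_register = first_task.selector
        self.registers = dict(registers)

    def switch_to(self, new_task, cpl):
        # 1. Privilege check (simplified to CPL <= DPL).
        if cpl > new_task.dpl:
            raise PermissionError("insufficient privilege for task switch")
        # 2. The new TSS descriptor must have its present bit set.
        if not new_task.present:
            raise LookupError("TSS not present")
        # 3. Save the state of the current task.
        self.current.saved_state = dict(self.registers)
        # 4. Load the task register with the selector of the new TSS.
        self.task_register = new_task.selector
        # 5. Load the new task's state; execution resumes from it.
        self.registers = dict(new_task.saved_state)
        self.current = new_task
```

Switching from task A to task B leaves A's registers stored in A's saved state, so a later switch back to A resumes it where it stopped.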
TASK ADDRESSING SPACE
• If paging is not enabled, the linear addresses generated by a task are
the same as the physical addresses sent to the memory system.
• When paging is enabled, each task can have its own
separate, protected addressing space through the PDBR (Page
Directory Base Register) value stored in its TSS.
INTERRUPTS AND EXCEPTIONS
• Interrupts typically occur at random times during the execution of a program,
in response to signals from hardware. They are used to handle events
external to the processor, such as requests to service peripheral devices.
• Software can also generate interrupts by executing the INT n instruction.
• Exceptions occur when the processor detects an error condition while
executing an instruction, such as division by zero.
• The processor detects a variety of error conditions including protection
violations, page faults, and internal machine faults.
• When an interrupt is received or an exception is detected, the
currently running procedure or task is automatically suspended while
the processor executes an interrupt or exception handler.
• When execution of the handler is complete, the processor resumes
execution of the interrupted procedure or task. The resumption of the
interrupted procedure or task happens without loss of program
continuity.
INTERRUPTS
• Nonmaskable interrupts (NMIs). These interrupts are received on
the processor’s NMI input pin. The processor does not provide a
mechanism to prevent nonmaskable interrupts.
• Maskable interrupts. These interrupts are received either at the
processor's INTR (interrupt) pin from an external, system-based
interrupt controller (8259A) or as a serial message on the LINT[1:0]
pins from a system-based I/O APIC. The processor does not act on
maskable interrupts unless the IF (interrupt-enable) flag in the EFLAGS
register is set.
• Software-generated interrupts. These are generated by INT n
instruction. The processor does not provide a mechanism for masking
interrupts generated in this manner.
EXCEPTIONS
• Processor-detected exceptions. These are generated when the
processor detects program and machine errors. They are further
classified as faults, traps, and aborts.
• Software-generated exceptions. The INTO, INT3, BOUND, and INT n
instructions generate exceptions. (The INT n instruction generates an
exception when an exception vector number is given as its operand.)
• The processor associates an identification number, called a vector,
with each interrupt and exception.
• The NMI interrupt and the exceptions are assigned vectors in the
range 0 through 31. Not all of these vectors are currently used by the
processor. Unassigned vectors in this range are reserved for possible
future uses.
• The vectors in the range 32 to 255 are provided for maskable
interrupts, generated either by asserting the INTR pin or by sending
interrupt messages over the APIC (Advanced Programmable
Interrupt Controller) bus.
• External interrupt controllers (such as Intel's 8259A Programmable
Interrupt Controller) deliver one of these vectors to the processor on
the system bus during its interrupt-acknowledge cycle.
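The vector ranges described above can be summarised in a small lookup. This Python sketch is illustrative only; the function name `classify_vector` and the returned labels are assumptions, and it ignores the fact that software can raise any vector with INT n.

```python
def classify_vector(vector):
    """Classify an interrupt/exception vector by the ranges given above."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0..255")
    if vector == 2:
        return "NMI"                    # vector 2 is the NMI interrupt
    if vector <= 31:
        return "exception or reserved"  # 0..31: exceptions, some unassigned
    return "maskable interrupt"         # 32..255: INTR pin or APIC messages
```

For instance, vector 14 falls in the exception range, while vector 32 is the first one available to maskable interrupts.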
INTERRUPT DESCRIPTOR TABLE (IDT)
• Real mode uses a 1KB Interrupt Vector Table (IVT) beginning at address
00000H. Each 4-byte entry in the IVT holds the segment:offset address of an interrupt handler.
• Protected mode relies on an Interrupt Descriptor Table (IDT) to support
interrupts and exceptions.
• The IDT comprises 8-byte gate descriptors for task, trap or interrupt gates. The
IDT has a maximum size of 256 descriptors. The size of the IDT is controlled by
a 16-bit limit value stored in the Interrupt Descriptor Table Register (IDTR).
• The IDTR is a 48-bit register that contains the 32-bit base address of the IDT and
the 16-bit size limit.
• The IDT can be placed anywhere in physical memory.
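The entry sizes above fix where each vector's entry lives. A minimal Python sketch, assuming the 16-bit limit is the offset of the last valid byte of the IDT; the function names and the sample base address are hypothetical.

```python
def ivt_entry_address(vector):
    """Real mode: 4-byte entries starting at address 00000H."""
    return vector * 4


def idt_entry_address(vector, idt_base, idt_limit):
    """Protected mode: 8-byte gate descriptors starting at the IDT base.

    idt_limit is the 16-bit limit value, taken here to be the offset of
    the last valid byte in the table (an assumption of this sketch).
    """
    offset = vector * 8
    if offset + 7 > idt_limit:
        raise IndexError("vector lies beyond the IDT limit")
    return idt_base + offset
```

In real mode, vector 21H maps to address 00084H. A full 256-entry IDT occupies 2KB, so its limit would be 7FFH; with a base of 10000H, the page fault descriptor (vector 14) sits at 10070H.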
IDT DESCRIPTORS
The IDT may contain any of three kinds of gate descriptors:
• Task gate descriptor
• Interrupt gate descriptor
• Trap gate descriptor
• The P-bit in each descriptor stands for present, and indicates whether
the segment is present in memory.
• The DPL field specifies the descriptor privilege level.
• When fewer interrupts/exceptions are required, the limit field of the
IDTR is used to specify the addressable limit within the IDT. The
Pentium will enter shutdown mode if the limit is exceeded.
Interrupt 0—Divide Error Exception
• Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the
result cannot be represented in the number of bits specified for the
destination operand.
Interrupt 1—Debug Exception
• Indicates that one or more of several debug-exception conditions has been
detected. Whether the exception is a fault or a trap depends on the
condition.
• The exception handler can distinguish between traps and
faults by examining the contents of the DR6 register and other debug
registers.
Interrupt 2—NMI Interrupt
• The non-maskable interrupt (NMI) is generated externally by asserting the
processor’s NMI pin. This interrupt causes the NMI interrupt handler to be
called.
Interrupt 3—Breakpoint Exception
• Indicates that a breakpoint instruction (INT3) was executed, causing a
breakpoint trap to be generated. Typically, a debugger sets a breakpoint by
replacing the first opcode byte of an instruction with the opcode for the
INT3 instruction.
• The breakpoint handler is responsible for restoring the original byte of the
modified instruction.
Interrupt 4—Overflow Exception
Indicates that an overflow trap occurred when an INTO instruction was
executed. If the OF flag is set, an overflow trap is generated.
Interrupt 5—BOUND Range Exceeded Exception
Indicates that a BOUND-range-exceeded fault occurred when a BOUND
instruction was executed. It detects the array subscript out of range.
Interrupt 6—Invalid Opcode Exception
Attempted to execute an invalid or reserved opcode.
Interrupt 7—Device Not Available Exception
On earlier 80x86 machines, this exception was used to indicate that there
was no external floating point coprocessor interfaced to the CPU.
Interrupt 8—Double Fault Exception
• Indicates that the processor detected a second exception while calling an
exception handler for a prior exception.
Interrupt 9—CoProcessor Segment Overrun
• On the 80386 this signalled a page or segment violation detected while
transferring a math coprocessor operand. The Pentium does not generate it.
Interrupt 10—Invalid TSS Exception
Indicates that a task switch was attempted that referenced an invalid TSS.
Interrupt 11—Segment Not Present
Indicates that the present flag of a segment or gate descriptor is clear. It
indicates segment is not present in memory.
Interrupt 12—Stack Fault Exception
• A limit violation is detected during an operation that refers to the SS
register. Operations that can cause a limit violation include stack-oriented
instructions
Interrupt 13—General Protection Exception
• Indicates that the processor detected one of a class of protection
violations called “general protection violations.”
Examples of such violations:
• Exceeding the segment limit when accessing the CS, DS, ES, FS, or GS
segments.
• Writing to a code segment or a read-only data segment.
• Reading from an execute-only code segment.
Interrupt 14—Page Fault Exception
It occurs when the processor attempts to access a page that is not in memory.
Interrupt 16—Floating-Point Error Exception
Indicates that the FPU has detected a floating-point-error exception.
Interrupt 17—Alignment Check Exception
• Indicates that the processor detected an unaligned memory operand
when alignment checking was enabled.
Interrupt 18—Machine Check Exception
Indicates that the processor detected an internal machine error.
