AND MICROCONTROLLER
Objectives
Study the fundamentals of microprocessor architecture.
Syllabus
UNIT I - HIGH PERFORMANCE CISC ARCHITECTURE – PENTIUM (9)
CPU Architecture – Bus Operations – Pipelining – Branch prediction –
Floating point unit – Operating Modes – Paging – Multitasking – Exception
and Interrupts – Instruction set – Addressing modes – Programming the
Pentium processor.
UNIT III - ARM APPLICATION DEVELOPMENT (9)
Introduction to DSP on ARM – FIR filter – IIR filter – Discrete Fourier
transform – Exception handling – Interrupts – Interrupt handling schemes –
Firmware and bootloader – Embedded Operating systems – Integrated
Development Environment – STDIO Libraries – Peripheral Interface –
Application of ARM Processor – Caches – Memory Protection Units –
Memory Management Units – Future ARM Technologies.
Text Books
[1] Andrew N. Sloss, Dominic Symes and Chris Wright, "ARM
System Developer's Guide: Designing and Optimizing
System Software", First edition, Morgan Kaufmann
Publishers, 2004.
References
1. Steve Furber, "ARM System-on-Chip Architecture", Addison Wesley,
2000.
2. Daniel Tabak, "Advanced Microprocessors", McGraw Hill Inc., 1995.
UNIT I -HIGH PERFORMANCE CISC ARCHITECTURE – PENTIUM 9
Objective
Study the Architecture of Pentium processor
Overview
Introduction to Pentium
Pentium Architecture
Addressing Modes
Instruction Set
Assembly Language Programming
Bus Operations
Pipelining
Branch Prediction
Exception and Interrupts
Floating point unit
Operating Modes
Paging and Multitasking
INTRODUCTION
MICROPROCESSOR
[Figure: memory map showing the starting address of each segment, from 00000 up to F0000 in steps of 10000H. In the segment starting at 50000, the SegAddr:Offset locations 5000:0000, 5000:0250 and 5000:FFFF are marked.]
8086 Architecture (continued…)
Physical Memory Address Generation
The BIU has a dedicated adder for determining physical memory addresses:
PhyAddr = 10H * Segment + Offset
Flags:
- 32-bit flag register.
- Used only in Protected mode.
Control flags:
- Used to control some operations of the MPU.
- These flags are set by the user to achieve specific purposes.
Addressing Modes
The various methods used to access instruction operands are called
addressing modes.
Example:
If CS=24F6h and IP=634Ah, show:
1- The logical address
2- The offset address
3- The physical address
4- The lower range of the segment
5- The upper range of the segment
Solution:
1- The logical address is the CS:IP content, which is 24F6:634A
2- The offset address is the content of the IP register, which is 634A
3- The physical address: 10H * 24F6H + 634AH = 24F60H + 634AH = 2B2AAH
4- The lower range of the segment: 24F60H + 0000H = 24F60H
5- The upper range of the segment: 24F60H + FFFFH = 34F5FH
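The arithmetic in this example can be checked with a short sketch. The function name physical_address and the variable names are illustrative, not taken from the slides; the 20-bit mask reflects the 8086's 20-bit address bus.

```python
def physical_address(segment: int, offset: int) -> int:
    """Multiply the 16-bit segment by 10H (shift left 4 bits) and add the offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # keep the result to 20 bits

CS, IP = 0x24F6, 0x634A

logical = f"{CS:04X}:{IP:04X}"          # 1- logical address
offset = IP                             # 2- offset address
physical = physical_address(CS, IP)     # 3- physical address
lower = physical_address(CS, 0x0000)    # 4- first byte of the segment
upper = physical_address(CS, 0xFFFF)    # 5- last byte of the segment

print(logical, f"{offset:04X}", f"{physical:05X}", f"{lower:05X}", f"{upper:05X}")
# prints: 24F6:634A 634A 2B2AA 24F60 34F5F
```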
8 BIT Operand Registers - AL, AH, BL, BH, CL, CH, DL, DH
PhyAddr = 10H * {DS | ES} + {BX | SI | DI}
If DS=5000H; BX=10FF;
Then EffectiveAddr = 10FF
and PhyAddr = 10H*5000H + 10FFH = 510FFH
EffectiveAddr = 50H + [BX]
PhyAddr = 10H * {DS | ES} + {BX | BP | SI | DI}
PhyAddr = 10H * {DS | ES} + {BX | BP} + {SI | DI}
EffectiveAddr = {8-bit | 16-bit displacement} + {BX | BP} + {SI | DI}
PhyAddr = 10H * {DS | ES} + {8-bit | 16-bit displacement} + {BX | BP} + {SI | DI}
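As a rough illustration of these formulas, the sketch below computes an effective address from an optional displacement, base register and index register, then forms the physical address. The function names and the value chosen for SI are assumptions made for the example; DS=5000H, BX=10FFH and the 50H displacement come from the slides.

```python
def effective_address(disp: int = 0, base: int = 0, index: int = 0) -> int:
    """EffectiveAddr = displacement + base (BX/BP) + index (SI/DI), wrapped to 16 bits."""
    return (disp + base + index) & 0xFFFF

def physical_address(segment: int, ea: int) -> int:
    """PhyAddr = 10H * segment + EffectiveAddr, wrapped to 20 bits."""
    return ((segment << 4) + ea) & 0xFFFFF

DS, BX = 0x5000, 0x10FF
SI = 0x0020  # assumed value, for illustration only

# Register indirect: the effective address is just BX.
ea_indirect = effective_address(base=BX)
# Based indexed with an 8-bit displacement: 50H + BX + SI.
ea_full = effective_address(disp=0x50, base=BX, index=SI)

print(f"{physical_address(DS, ea_indirect):05X}")  # prints: 510FF
print(f"{physical_address(DS, ea_full):05X}")      # prints: 5116F
```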
When paired instructions reach the EX stage, one or the other may stall and require
additional cycles to execute. A pipeline stall lowers performance, since no work is done during the stall.
Instructions stall for various reasons, most notably when their operands are not available in the data
cache.
If the instruction in the U pipeline stalls, the instruction in the V pipeline stalls as well.
If the instruction in the V pipeline stalls, the instruction in the U pipeline may continue executing. Both instructions
must progress to the WB stage before another pair may enter the EX stage.
Branch Prediction Logic
Flushing of pipeline problem
• The performance gain from pipelining can be reduced
by program transfer instructions
(such as JMP, CALL, RET and conditional jumps).
• They change the instruction sequence, making all the instructions
that entered the pipeline after the program transfer
instruction invalid.
• Suppose instruction I3 is a conditional jump to I50 at
some other address (the target address). The
instructions that entered the pipeline after I3 are then invalid, and a new
sequence beginning with I50 needs to be loaded.
• This causes bubbles in the pipeline, where no work is
done while the pipeline stages are reloaded.
• To avoid this problem, the Pentium uses a scheme
called Dynamic Branch Prediction.
• In this scheme, a prediction is made about the
branch instruction currently in the pipeline.
• The prediction is either taken or not taken.
• If the prediction turns out to be true, the pipeline
is not flushed and no clock cycles are lost.
• If the prediction turns out to be false, the pipeline is flushed and
restarted with the correct instruction.
• This results in a 3-cycle penalty if the branch is executed in the U
pipeline and a 4-cycle penalty if it is executed in the V pipeline.
Dynamic Branch Prediction Mechanism
• It is implemented using a 4-way set-associative cache with 256
entries, referred to as the Branch Target Buffer (BTB).
• The directory entry for each line contains the following
information:
  – Valid bit: indicates whether or not the entry is in use
  – History bits: track how often the branch has been taken
  – Source memory address that the branch instruction was fetched from
    (address of I3)
• If the directory entry is valid, the target address of the branch is stored in the
corresponding data entry in the BTB.
• The first time that a branch instruction enters either pipeline, the
BTB uses its source memory address to perform a lookup in the
cache.
• Since the instruction has not been seen before, this results in a BTB
miss.
• A BTB miss means the prediction logic has no history on the
instruction.
• It then predicts that the branch will not be taken, and
program flow is not altered.
• Even unconditional jumps are predicted as not
taken the first time that they are seen by the BTB.
• When the instruction reaches the execution stage, the branch
is either taken or not taken.
• If taken, the next instruction to be executed should be the one
fetched from the branch target address.
• If not taken, the next instruction is the one at the next sequential
memory address.
• When the branch is taken for the first time, the execution unit
provides feedback to the branch prediction logic.
• The branch target address is sent back and recorded in the BTB.
• A directory entry is made containing the source memory
address, with the history bits set to strongly taken.
[Figure: state diagram of the four history states: Strongly Taken, Weakly Taken, Weakly Not Taken, Strongly Not Taken]
History  Resulting           Prediction        If branch                     If branch
Bits     Description         Made              is taken                      is not taken
11       Strongly Taken      Branch Taken      Remains Strongly Taken        Downgrades to Weakly Taken
10       Weakly Taken        Branch Taken      Upgrades to Strongly Taken    Downgrades to Weakly Not Taken
01       Weakly Not Taken    Branch Not Taken  Upgrades to Weakly Taken      Downgrades to Strongly Not Taken
00       Strongly Not Taken  Branch Not Taken  Upgrades to Weakly Not Taken  Remains Strongly Not Taken
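The history-bit transitions and the BTB-miss behaviour described in this section can be modelled with a small simulation. This is a simplified sketch, not the Pentium's actual hardware organisation: the class name and methods are illustrative, and a plain dictionary stands in for the 4-way set-associative, 256-entry cache.

```python
STRONGLY_NOT_TAKEN, WEAKLY_NOT_TAKEN, WEAKLY_TAKEN, STRONGLY_TAKEN = 0b00, 0b01, 0b10, 0b11

class BranchTargetBuffer:
    """Toy BTB: maps a branch's source address to [history bits, target address]."""

    def __init__(self):
        self.entries = {}  # simplification: unbounded dict instead of a 256-entry cache

    def predict(self, source):
        """Return True if the branch is predicted taken."""
        entry = self.entries.get(source)
        if entry is None:
            return False  # BTB miss: no history, predict not taken
        return entry[0] >= WEAKLY_TAKEN  # history 10 or 11 predicts taken

    def update(self, source, taken, target):
        """Feed the actual outcome back from the execution stage."""
        entry = self.entries.get(source)
        if entry is None:
            if taken:  # first time taken: new entry starts as strongly taken
                self.entries[source] = [STRONGLY_TAKEN, target]
            return
        if taken:
            entry[0] = min(entry[0] + 1, STRONGLY_TAKEN)      # upgrade or remain at 11
            entry[1] = target
        else:
            entry[0] = max(entry[0] - 1, STRONGLY_NOT_TAKEN)  # downgrade or remain at 00

btb = BranchTargetBuffer()
source, target = 0x1000, 0x2000  # hypothetical addresses of I3 and I50
outcomes = [True, True, False, False, False, True]

predictions = []
for taken in outcomes:
    predictions.append(btb.predict(source))
    btb.update(source, taken, target)

print(predictions)  # prints: [False, True, True, True, False, False]
print(format(btb.entries[source][0], "02b"))  # prints: 01
```

In this run the history moves 11, 11, 10, 01, 00, 01, so four of the six predictions are wrong: the initial miss, two while the history is still on the taken side, and the final taken branch.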
FLOATING POINT UNIT(FPU)
Floating-Point Pipeline
• The floating-point pipeline has 8 stages, as follows:
1. Prefetch (PF):
– Instructions are prefetched from the on-chip instruction
cache.
2. Instruction Decode (D1):
– Two parallel decoders attempt to decode and issue the
next two sequential instructions.
– It decodes the instruction to generate a control word.
3. Address Generate (D2):
– Decodes the control word.
– Addresses of memory-resident operands are calculated.
4. Memory and Register Read (Execution Stage) (EX):
– Register read, memory read or memory write is performed
as required by the instruction to access an operand.
5. Floating Point Execution Stage 1 (X1):
– Information from a register or memory is written into an FP
register.
– Data is converted to floating-point format before being
loaded into the floating-point unit.
6. Floating Point Execution Stage 2 (X2):
– The floating-point operation is performed within the floating-point
unit.
7. Write FP Result (WF):
– Floating-point results are rounded and the result is
written to the target floating-point register.
8. Error Reporting (ER):
– If an error is detected, an error reporting stage is entered
where the error is reported and the FPU status word is
updated.
Instruction Issue for Floating Point Unit
• The rules for how floating-point (FP) instructions are issued
on the Pentium processor are:
1. FP instructions are not paired with integer
instructions.
2. When a pair of FP instructions is issued to the FPU, only
the FXCH instruction can be the second instruction of
the pair. The first instruction of the pair must belong to the set
F = {FLD, FADD, FSUB, FMUL, FDIV, FCOM, FUCOM, FTST, FABS,
FCHS}.
3. FP instructions other than FXCH and instructions
belonging to set F are always issued singly to the FPU.
4. FP instructions that are not directly followed by an
FXCH instruction are issued singly to the FPU.
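The issue rules above can be sketched as a function that groups a stream of FP mnemonics into issue slots. This is only a model of the rules as stated, assuming the stream contains FP instructions only (rule 1 is therefore not exercised); the function name issue_groups is hypothetical.

```python
# Set F from rule 2: instructions allowed as the first of an FP pair.
PAIRABLE_FIRST = {"FLD", "FADD", "FSUB", "FMUL", "FDIV",
                  "FCOM", "FUCOM", "FTST", "FABS", "FCHS"}

def issue_groups(instructions):
    """Group FP mnemonics into tuples issued together (pairs or singles)."""
    groups = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        # A pair is issued only when a set-F instruction is directly followed by FXCH.
        if current in PAIRABLE_FIRST and following == "FXCH":
            groups.append((current, following))
            i += 2
        else:
            groups.append((current,))  # rules 3 and 4: issued singly
            i += 1
    return groups

stream = ["FADD", "FXCH", "FSQRT", "FXCH", "FMUL", "FLD"]
print(issue_groups(stream))
# prints: [('FADD', 'FXCH'), ('FSQRT',), ('FXCH',), ('FMUL',), ('FLD',)]
```

FSQRT is not in set F, so the FXCH that follows it is issued on its own, and FMUL is issued singly because the next instruction is not FXCH.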
[Figure: FPU Register File. Floating-point registers ST(0) to ST(7), 80 bits wide, with read port 1, write port 1, connections labeled EX and X1, and bypass paths Bypass1 and Bypass2.]
PAGING
• The Pentium supports translation of virtual (linear) addresses into physical
addresses through the use of special tables that map portions of the virtual
address into actual physical memory locations.
• PSE (page size extensions) flag, bit 4 of CR4: when set, 2-MByte or 4-MByte pages are enabled.
• PAE (physical address extension) flag, bit 5 of CR4.
• Page directory: an array of 32-bit page-directory entries contained in a 4-KByte page. Up to 1024
page-directory entries can be held in a page directory.
• Page table: an array of 32-bit page-table entries contained in a 4-KByte page. Up to 1024 page-table
entries can be held in a page table. (Page tables are not used for 2-MByte or 4-MByte pages. These page
sizes are mapped directly from one or more page directories.)
• Page: a 4-KByte, 2-MByte, or 4-MByte flat address space.
32-bit virtual (linear) addresses generated by a running task select entries in the
system's page directory and page table, which translate the upper 20 bits of the virtual
address into the actual physical address where a page frame is located.
The lower 12 bits of the virtual address are not translated and point to one of 4,096
byte locations within a page frame.
• The base address of the page directory is stored in the page directory base register (PDBR).
• The upper 10 bits of the virtual address select one of 1,024 entries in the page directory.
• Each entry in the page directory is 4 bytes wide and contains the base address of a page table.
• The next 10 bits of the virtual address select one of 1,024 entries in the page table pointed to by
the page directory entry.
• This entry is also 4 bytes wide and contains the base address of the actual physical memory page
frame.
• This address is combined with the lower 12 bits of the virtual address to access the desired location
in memory.
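The two-level lookup for 4-KByte pages can be sketched as follows. This is a simplified model: dictionaries stand in for the page directory and page tables in memory, the control and status flags in each entry are ignored, and all addresses in the example are made up for illustration.

```python
def split_linear(addr: int):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    directory_index = (addr >> 22) & 0x3FF  # upper 10 bits
    table_index = (addr >> 12) & 0x3FF      # next 10 bits
    offset = addr & 0xFFF                   # lower 12 bits, not translated
    return directory_index, table_index, offset

def translate(addr: int, page_directory: dict, page_tables: dict) -> int:
    """Walk the page directory, then the page table, and append the offset."""
    directory_index, table_index, offset = split_linear(addr)
    table_base = page_directory[directory_index]        # PDE: base address of a page table
    frame_base = page_tables[table_base][table_index]   # PTE: base address of the page frame
    return frame_base | offset

# Hypothetical tables: directory entry 1 points to a page table at 0x00020000,
# whose entry 3 maps to the page frame at 0x0ABCD000.
page_directory = {1: 0x00020000}
page_tables = {0x00020000: {3: 0x0ABCD000}}

linear = 0x00403025
print(split_linear(linear))                                      # prints: (1, 3, 37)
print(f"{translate(linear, page_directory, page_tables):08X}")  # prints: 0ABCD025
```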
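[Figure: address translation through the page directory and page table; the lower bits of the linear address form the displacement (offset).]
PDE and PTE format: bits 31-12 (PT address), bits 11-0 (control and status flags).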
Translation lookaside buffers(TLBs)
• To improve performance, the internal instruction and data caches
of the Pentium contain small, special caches called TLBs that
automatically translate the upper 20 bits of the virtual address into the
upper 20 bits of the physical address.
• The translation therefore requires only one clock cycle.
• TLBs contain only the addresses of the most recently used pages.
• If the required translation is not available in the TLB, the processor
accesses the page directory and page table in RAM and stores the translation in the TLB.
• Prior to doing this, it may be necessary to invalidate the contents of the
TLBs.
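A TLB can be pictured as a small cache keyed by the upper 20 bits of the virtual address. The sketch below is a toy model only: the capacity, the least-recently-used replacement policy and the page_walk stand-in are assumptions for illustration, not details taken from the slides or from the Pentium's actual TLB design.

```python
from collections import OrderedDict

class TLB:
    """Toy TLB: caches virtual page number -> physical frame number."""

    def __init__(self, capacity, walk):
        self.capacity = capacity      # assumed size, for illustration
        self.walk = walk              # callable doing the slow page-table lookup
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def translate(self, addr):
        vpn, offset = addr >> 12, addr & 0xFFF
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)         # mark as most recently used
        else:
            self.misses += 1
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
            self.entries[vpn] = self.walk(vpn)    # fetch the translation from the tables
        return (self.entries[vpn] << 12) | offset

    def invalidate(self):
        """Discard all cached translations."""
        self.entries.clear()

def page_walk(vpn):
    """Hypothetical stand-in for the page directory / page table lookup."""
    return vpn + 0x100

tlb = TLB(capacity=2, walk=page_walk)
results = [tlb.translate(a) for a in (0x1234, 0x1FFF, 0x2000, 0x3000, 0x1000)]
print([f"{r:06X}" for r in results])
# prints: ['101234', '101FFF', '102000', '103000', '101000']
print(tlb.hits, tlb.misses)  # prints: 1 4
```

The second access hits because it falls in the same page as the first; the last access misses because page 1 was evicted when page 3 was loaded.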
PDE:
PTE:
Page frame address (bits 12-31) | Avail. | 0 0 0 | A | PCD | PWT | U | W | P
• D (Dirty): set if a write has been performed to the page pointed to
by the PTE.
• A (Accessed): set if a read or write was performed to the page
selected by the PTE and PDE.
• PCD (Page cache disable): determines whether the memory being
accessed may be cached.
• PWT (Page write-through): enables write-through operations
between cache and memory.
• U (User): used when performing memory protection checks.
• W (Writable): determines whether the page may be written to and
is also used in protection checks.
• P (Present): indicates whether the page is actually stored in memory. If
the page is not present, the processor raises a page fault so that the
operating system can bring the page in and update the entry.
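The flag bits listed above can be extracted from a 32-bit entry with shifts and masks. The sketch below assumes the standard IA-32 bit positions for 4-KByte pages (P in bit 0 up to D in bit 6); the function name and the sample entry value are illustrative.

```python
# Assumed bit positions (standard IA-32 layout for 4-KByte pages).
FLAG_BITS = {"P": 0, "W": 1, "U": 2, "PWT": 3, "PCD": 4, "A": 5, "D": 6}

def decode_entry(entry: int) -> dict:
    """Split a 32-bit PDE/PTE into its frame address and flag bits."""
    fields = {name: bool((entry >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["frame_address"] = entry & 0xFFFFF000  # bits 31-12
    return fields

# Hypothetical PTE: frame at 0x0ABCD000, present, writable, user, accessed, dirty.
decoded = decode_entry(0x0ABCD067)
print(f"{decoded['frame_address']:08X}")                           # prints: 0ABCD000
print([name for name in FLAG_BITS if decoded[name]])               # prints: ['P', 'W', 'U', 'A', 'D']
```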
Paging
Summary….