2020-2021
Dr. Ahmad F. Mahmood
Lecture No. 2
MIPS (Microprocessor without Interlocked Pipelined Stages)
MIPS is a reduced instruction set computer (RISC) instruction set architecture
(ISA) developed by MIPS Computer Systems, USA.
The MIPS® architecture is promoted as a highly efficient RISC architecture that
delivers high performance and low power consumption for a given silicon area.
There are multiple versions of the MIPS ISA, including MIPS I, II, III, IV, and V.
The early MIPS architectures were 32-bit only; 64-bit versions were developed
later.
The MIPS architecture has several optional extensions:
1. MIPS-3D adds 13 new instructions for improving the performance of 3D
graphics applications. The instructions improved performance by reducing
the number of instructions required to implement four common 3D graphics
operations: vertex transformation, clipping, transformation and lighting.
2. MDMX was developed to accelerate multimedia applications. It
contains a set of new instructions and registers designed specifically for
multimedia applications.
3. MIPS16e, which adds compression to the instruction stream so that
programs take up less room.
4. MIPS MT, which adds multithreading capability.
• The MIPS32 Architecture is based on the MIPS II ISA, adding selected
instructions from MIPS III, MIPS IV, and MIPS V to improve the efficiency
of generated code and of data movement.
• The MIPS64 Architecture is based on the MIPS V ISA and is backward
compatible with the MIPS32 Architecture.
• MIPS processors are widely used in embedded applications, including digital
cameras, digital TVs, the Sony PlayStation 2, and network routers.
• MIPS is positioned as a CPU for SoCs ranging from high-performance
mobile application processors to low-power connected sensor processors.
• In this lecture, we give details on the MIPS32 architecture.
MIPS Architecture:
o Architecture modules that provide a range of flexible, scalable and
powerful features include:
o Multi-threading – can make a single processor core appear and
function like multiple separate cores for improved performance
and efficiency.
o SIMD (Single Instruction Multiple Data) – improves performance
by allowing efficient parallel processing of vector operations.
o Virtualization – provides enhanced security features and support
for multiple operating systems.
o DSP Technology – supporting the growing number of consumer
products which require an increasing amount of signal and media
processing horsepower.
• A simplified MIPS CPU has the following structure:
• The memory address register (MAR) and memory data register (MDR) are
the interface to the instruction and data memories.
• The ALU and register file are the core of the data path.
• The program control unit (PCU) fetches instructions and data, and
handles branches and jumps.
• The instruction decode unit (IDU) is the control unit for the processor.
The MIPS architecture was designed with separate instruction and data
caches, so it can fetch an instruction and read or write a memory
variable simultaneously.
“Instruction memory”: the part of memory that stores the program (machine
code); it is read-only.
“Data memory”: the part of memory that stores the data manipulated by the
program; it is read/write.
Note: There are two types of memory organization:
• Von Neumann architecture – single, shared memory
• Harvard architecture – physically separate data and instruction
memories
A MIPS processor consists of an integer processing unit (the CPU) and
a collection of coprocessors that perform auxiliary tasks or operate on
other types of data such as floating-point numbers.
Integer arithmetic and logical operations are executed directly by the
CPU. Floating point operations are executed by Coprocessor 1.
Coprocessor 0 is used to manage exceptions and interrupts.
• Normal user-level code does not have access to Coprocessor 0, but interrupt- and
exception-aware code has to use it.
• Coprocessor 0 has several registers that control exceptions and interrupts.
• The Cause register is a mostly read-only register whose value is set by
the system when an interrupt or exception occurs.
• It specifies what kind of interrupt or exception just happened.
• When an exception or interrupt occurs, a code is stored in the Cause register as a
5-bit value (bits 2-6); see the figure below.
To modify a value in a Coprocessor 0 register, move the
register’s value to a general-purpose register with “mfc0” (Move From
Coprocessor 0), modify the value there, and move the changed value back with
“mtc0” (Move To Coprocessor 0).
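The 5-bit exception code described above can be pulled out of a Cause value with a shift and a mask. The following is a minimal Python sketch, not part of the lecture: the function name and the sample Cause value are hypothetical, and only the bit positions (bits 2-6) come from the text.

```python
EXC_CODE_SHIFT = 2    # the code starts at bit 2 of the Cause register
EXC_CODE_MASK = 0x1F  # five bits wide (bits 2-6)


def exception_code(cause: int) -> int:
    """Return the 5-bit exception code stored in bits 2-6 of a Cause value."""
    return (cause >> EXC_CODE_SHIFT) & EXC_CODE_MASK


# Hypothetical Cause value, as it might look after being copied to a
# general-purpose register with mfc0.
sample_cause = 0x00000030
print(exception_code(sample_cause))
```

On real hardware the same shift-and-mask would be done in assembly on the register that mfc0 filled in.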
MIPS Memory Organization
• In MIPS, each instruction is exactly 32 bits long; therefore the PC
increments by 4 after each instruction.
• For MIPS, a memory word is 32 bits or 4 bytes.
• 2^32 bytes with byte addresses from 0 to 2^32-1
• 2^30 words with byte addresses 0, 4, 8, ... 2^32-4
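The relationship between byte addresses, word addresses, and the PC increment can be checked with a few lines of Python. This is an illustrative sketch only; the helper names are assumptions and do not come from the lecture.

```python
WORD_BYTES = 4            # a MIPS word is 32 bits = 4 bytes
ADDRESS_SPACE = 2 ** 32   # byte addresses run from 0 to 2^32 - 1


def is_word_aligned(addr: int) -> bool:
    """A word address must be a multiple of 4."""
    return addr % WORD_BYTES == 0


def word_index(addr: int) -> int:
    """Map an aligned byte address (0, 4, 8, ...) to its word number (0, 1, 2, ...)."""
    if not 0 <= addr < ADDRESS_SPACE:
        raise ValueError("address outside the 32-bit address space")
    if not is_word_aligned(addr):
        raise ValueError("address is not word aligned")
    return addr // WORD_BYTES


def next_pc(pc: int) -> int:
    """Each instruction is one word, so the PC advances by 4 bytes."""
    return (pc + WORD_BYTES) % ADDRESS_SPACE


print(word_index(2 ** 32 - 4) == 2 ** 30 - 1)  # the last word is word 2^30 - 1
```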
MIPS systems typically divide memory into three parts, called
segments.
These segments are:
1. The text segment, which stores instructions, is placed at the
bottom of the user address space at 0040 0000hex.
2. The data segment is placed above the text segment and starts at
1000 0000hex.
• The data segment is divided into two parts: the lower part holds static data (with
size known at compile time) and the upper part, which can grow upward, holds
dynamic data structures.
3. The stack segment is placed at the end of the user address space at 7FFF FFFFhex.
It grows downward toward lower memory addresses. This placement
of segments allows unused memory to be shared by the data and stack
segments. The stack contains the return addresses for function calls and the
register values that are to be saved and restored. It may also contain local
variables.
• The stack segment varies in size during the execution of a program, as
functions are called and returned from. It starts at the top of memory and
grows down.
Aligned words
MIPS Registers:
• Registers are storage locations inside the CPU.
• Register access time is typically less than 1 clock cycle.
• In the MIPS CPU architecture, there are 32 general-purpose registers.
• When a floating-point coprocessor is present, there are an
additional 32 floating-point registers.
• In MIPS Assembly Language (MAL) instructions, registers are identified
by a $ symbol followed by a register number in the range 0-31.
• For example, add $4, $5, $6 adds the contents of registers
$5 and $6, with the result in register $4.
1-Register addressing:
• Operands are in a register.
• Ex: add $3, $4, $5 ; rd = $3 (destination), rs = $4 (source 1), rt = $5 (source 2)
2-Immediate Addressing
• The operand is embedded inside the encoded instruction.
MIPS immediate addressing means that one operand is a constant
within the instruction itself. The advantage of using it is that there is
no need to have extra memory access to fetch the operand.
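3-Register indirect (base) addressing:
• When a register contains the address of an operand, the operand is specified by
enclosing the register name in parentheses.
• Ex: lw $8, 0($9) ; loads the word stored at the address held in $9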
4-PC-relative addressing:
• The value in the immediate field is interpreted as an offset relative to the
address of the next instruction.
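As a rough illustration of PC-relative addressing, the sketch below computes a branch target in Python. It assumes the usual MIPS32 convention, which the lecture text does not spell out: the 16-bit immediate is sign-extended, counted in words (so shifted left by 2), and added to the address of the next instruction (PC + 4). The function names are hypothetical.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, imm16: int) -> int:
    """Target = address of the next instruction + (signed word offset * 4)."""
    next_instruction = pc + 4
    return (next_instruction + (sign_extend_16(imm16) << 2)) & 0xFFFFFFFF


# A branch at 0x00400000 with offset 3 skips three instructions past the next one.
print(hex(branch_target(0x00400000, 3)))
```

An offset of -1 (0xFFFF) brings the target back to the branch instruction itself, which is a quick way to sanity-check the sign extension.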
jalr = jump and link register
• JALR transfers control to the address held in a specified register and
stores the return address in the register file.
• JALR allows the programmer to specify
the destination register for the return address.
The MIPS instruction set architecture:
• Each instruction in the instruction set describes one particular CPU
operation.
• Each instruction is represented both in assembly language, by a
mnemonic, and in machine language (binary), by a 32-bit word
subdivided into several fields.
• MIPS is a 32-bit architecture, with 32-bit instructions, a 32-bit
data word, and 32-bit addresses.
• It has 32 addressable internal registers, requiring a 5-bit register
address.
• Register 0 always has the constant value 0.
• Addresses are for individual bytes (8 bits) but instructions must
have addresses which are a multiple of 4.
• This is usually stated as “instructions must be word aligned in
memory.”
There are three basic instruction types with the following formats:
The R-type instructions are 3 operand arithmetic and logic instructions, where the
operands are contained in the registers indicated by rs, rt, and rd.
op: opcode specifying the arithmetic/logic operation to be performed;
$rd: destination register in which the result is to be stored;
$rs: source register containing the 1st operand;
$rt: source register containing the 2nd operand.
For all R-type instructions, the opcode field is 000000.
The funct field selects the particular type of operation for R-type operations.
The shamt field (5 bits) gives the amount by which the source operand rt is
shifted (0 to 31).
The operation can be add, sub, mult, div, and, or, etc., and $rd, $rs, $rt can be any of the 32
registers. These instructions perform the following:
R[rd] ← R[rs] op R[rt]
Following are examples of R-type instructions:
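As one worked instance, the Python sketch below packs the fields of an R-type instruction into a 32-bit word. It assumes the standard MIPS32 field layout (op in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, shamt in 10-6, funct in 5-0) and the funct value 0x20 for add; neither is given explicitly in the text above, and the function names are hypothetical.

```python
ADD_FUNCT = 0x20  # funct code for add (assumed from the standard MIPS32 encoding)


def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack R-type fields into a 32-bit word; the opcode field is always 000000."""
    fields = {"rs": (rs, 5), "rt": (rt, 5), "rd": (rd, 5),
              "shamt": (shamt, 5), "funct": (funct, 6)}
    for name, (value, bits) in fields.items():
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r_type(word: int) -> dict:
    """Split a 32-bit word back into its R-type fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $3, $4, $5  ->  rd = 3, rs = 4, rt = 5
word = encode_r_type(rs=4, rt=5, rd=3, shamt=0, funct=ADD_FUNCT)
print(f"{word:032b}")
```

Each register field is 5 bits wide because 32 registers need a 5-bit register address.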
Recall that the MIPS processor addresses data at the byte level,
but instructions are addressed at the word level.
Moreover, all instructions must be aligned on a word boundary (an
address that is an integer multiple of 4 bytes).
Therefore, the next instruction is 4 bytes after the current
instruction.
The Jump Instruction:
The Jump Instruction Format is:
I-type instructions:
• Load word (lw) and store word (sw) are the only instructions that access
memory directly.
• The MIPS is said to be a load/store architecture.
• This is often considered to be an essential feature of a reduced
instruction set architecture (RISC).
Example: 1) Multiplication in MIPS:
Two Multiply instructions
mult $s1,$s2 Signed multiplication
multu $s1,$s2 Unsigned multiplication
32-bit multiplication produces a 64-bit Product
Separate pair of 32-bit registers
HI = high-order 32-bit
LO = low-order 32-bit
Result of multiplication is always in HI & LO
Moving data from HI/LO to MIPS registers
mfhi Rd (move from HI to Rd)
mflo Rd (move from LO to Rd)
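The way a 64-bit product is split across HI and LO can be modelled in a few lines of Python. This is only a sketch of the arithmetic, with hypothetical function names; register values are represented as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mult(rs: int, rt: int) -> tuple:
    """Signed multiply: return (HI, LO) halves of the 64-bit product."""
    product = to_signed32(rs) * to_signed32(rt)
    return (product >> 32) & MASK32, product & MASK32


def multu(rs: int, rt: int) -> tuple:
    """Unsigned multiply: return (HI, LO) halves of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return (product >> 32) & MASK32, product & MASK32


# The same bit patterns give different HI values for signed and unsigned multiply.
print([hex(x) for x in mult(0xFFFFFFFF, 2)])
print([hex(x) for x in multu(0xFFFFFFFF, 2)])
```

In the signed case 0xFFFFFFFF is treated as -1, which is why mult and multu are separate instructions.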
2) Division in MIPS
Two Divide instructions
div $s1,$s2 Signed division
divu $s1,$s2 Unsigned division
Division produces quotient and remainder
Separate pair of 32-bit registers
HI = 32-bit remainder
LO = 32-bit quotient
If divisor is 0 then result is unpredictable
Moving data to HI/LO from MIPS registers
mthi Rs (move to HI from Rs)
mtlo Rs (move to LO from Rs)
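The division results can be modelled in the same spirit. The sketch below is illustrative only: it assumes that signed division truncates toward zero, so the remainder takes the sign of the dividend (the C convention), and it raises an error for a zero divisor where real hardware would simply leave HI and LO unpredictable. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def div(rs: int, rt: int) -> tuple:
    """Signed divide: return (HI, LO) = (remainder, quotient) as 32-bit patterns."""
    dividend, divisor = to_signed32(rs), to_signed32(rt)
    if divisor == 0:
        raise ZeroDivisionError("result is unpredictable on MIPS")
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient  # assumed: truncate toward zero
    remainder = dividend - quotient * divisor
    return remainder & MASK32, quotient & MASK32


def divu(rs: int, rt: int) -> tuple:
    """Unsigned divide: return (HI, LO) = (remainder, quotient)."""
    dividend, divisor = rs & MASK32, rt & MASK32
    if divisor == 0:
        raise ZeroDivisionError("result is unpredictable on MIPS")
    return dividend % divisor, dividend // divisor


print(div(7, 2))  # HI holds the remainder, LO holds the quotient
```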
The MIPS Five-Stage Pipeline
• The MIPS architecture is designed for pipelining, as shown in the figure
below.
• The execution of every MIPS instruction is divided into five
phases, called pipestages, with each pipestage taking a fixed
amount of time.
• The fixed amount of time is usually one processor clock cycle.
• Some actions take only half a clock cycle, however, so the MIPS
five-stage pipeline actually occupies only four clock cycles.
• All instructions are strictly defined so that they can follow the same
sequence of pipestages, even where the instruction does nothing
at some stage.
• The net result is that, as long as accesses keep hitting the cache, the
CPU starts an instruction every clock cycle.
Look at the figure above and consider what happens in each
pipestage.
1. IF (instruction fetch): gets the next instruction from the instruction
cache (I-cache).
2. RD (read registers): fetches the contents of the CPU registers whose
numbers are in the two possible source register fields of the instruction.
3. ALU (arithmetic/logic unit): performs an arithmetic or logical
operation in one clock cycle (floating-point math and integer
multiply/divide can’t be done in one clock cycle and are done
differently).
4. MEM (memory): the stage where the instruction can read/write memory
variables in the data cache (D-cache). On average, about ¾ of
instructions do nothing in this stage, but allocating the stage for each
instruction ensures that you never get two instructions wanting the
data cache at the same time.
5. WB (write back): stores the value obtained from an operation back
to the register file.
The figure below shows the rough division of responsibilities.
The buffers between stages are not shown.
The pipeline diagram in the figure below shows the execution of a
series of load instructions (lw: load word from data memory).
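A pipeline diagram of this kind can be generated with a short Python sketch. It assumes that each of the five stages takes exactly one clock cycle and that a new instruction starts every cycle, ignoring the half-clock detail mentioned earlier; the lw operands are hypothetical and are used only as labels.

```python
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def stage_at(instr_index: int, cycle: int):
    """Stage occupied by instruction instr_index in a given cycle (both 0-based)."""
    position = cycle - instr_index  # each instruction starts one cycle later
    return STAGES[position] if 0 <= position < len(STAGES) else None


def pipeline_diagram(instructions: list) -> str:
    """One text row per instruction, one 4-character column per clock cycle."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for index, text in enumerate(instructions):
        cells = [(stage_at(index, cycle) or "").ljust(4) for cycle in range(total_cycles)]
        rows.append(f"{text:<16}" + "".join(cells).rstrip())
    return "\n".join(rows)


# Hypothetical sequence of loads; the register numbers are only illustrative.
print(pipeline_diagram(["lw $8, 0($4)", "lw $9, 4($4)", "lw $10, 8($4)"]))
```

Each row is shifted one column to the right of the row above it, which is the staircase shape typical of such diagrams.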
• On the MIPS architecture, jump and branch instructions have a "delay slot".
• This means that the instruction immediately after the jump or branch instruction is
executed before the jump or branch takes effect.
Speedup
• The steady state throughput is determined by the time
t needed by one stage.
• The length of the pipeline determines the pipeline
filling time.
• If there are k stages, and each stage takes t time units,
then the time needed to execute N instructions is
k·t + (N − 1)·t
• Exercise: estimate the speedup when N = 5000 and k = 5.
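The speedup can be estimated numerically with the sketch below. It assumes that the machine without pipelining needs k·t per instruction, that is N·k·t in total; this baseline is not stated in the text, and the function names are hypothetical.

```python
def pipelined_time(n: int, k: int, t: float = 1.0) -> float:
    """k*t to fill the pipeline, then one instruction completes every t."""
    return k * t + (n - 1) * t


def unpipelined_time(n: int, k: int, t: float = 1.0) -> float:
    """Assumed baseline: every instruction takes all k stages back to back."""
    return n * k * t


def speedup(n: int, k: int) -> float:
    return unpipelined_time(n, k) / pipelined_time(n, k)


print(f"{speedup(5000, 5):.3f}")  # 25000 / 5004
```

For large N the ratio N·k / (k + N − 1) approaches k, so a five-stage pipeline gives at most a fivefold speedup under these assumptions.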
Hazards in a pipeline
Hazards are conflicts in the execution of a pipeline.
One example is the need for the same resource (like the same adder) in
two concurrent actions.
This is called a structural hazard.
To avoid it, we have to replicate resources. Here is an example:
Notice that the second instruction tries to read $s1 before the first instruction
completes the load. This is known as a data hazard.
One solution is to insert bubbles (that is, to delay certain operations in the
pipeline), as shown:
• Another solution requires some modification of the datapath,
which raises the hardware cost.
• Hazards slow down instruction execution.
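The cost of inserting bubbles can be illustrated with a small Python sketch. It is a simplification under stated assumptions: only a load followed immediately by an instruction that reads the loaded register is treated as a hazard, the number of bubbles per hazard is a parameter because it depends on the datapath, and the two-instruction program is hypothetical apart from the register $s1 named in the text.

```python
BUBBLE = ("bubble", None, ())


def insert_bubbles(program: list, bubbles_per_hazard: int = 1) -> list:
    """Insert bubbles before any instruction that reads the register
    written by the load (lw) directly in front of it."""
    scheduled = []
    previous = None
    for instruction in program:
        _, _, sources = instruction
        if previous is not None and previous[0] == "lw" and previous[1] in sources:
            scheduled.extend([BUBBLE] * bubbles_per_hazard)
        scheduled.append(instruction)
        previous = instruction
    return scheduled


def total_cycles(scheduled: list, k: int = 5) -> int:
    """k + (N - 1) cycles for N pipeline slots, taking t as one clock cycle."""
    return k + len(scheduled) - 1


# Each entry is (operation, destination register, source registers).
program = [
    ("lw", "$s1", ("$s0",)),          # hypothetical: load into $s1
    ("add", "$s2", ("$s1", "$s3")),   # hypothetical: reads $s1 right away
]
print(total_cycles(program), total_cycles(insert_bubbles(program)))
```

Every bubble occupies a pipeline slot without doing useful work, so the cycle count rises by one per bubble.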