4/25/2012
Shaftab Ahmed
An instruction set architecture (ISA) describes the structure of a computer as seen by the programmer, i.e. the CPU in terms of its registers, addressability, and its arithmetic, control, and store operations. An assembly or machine language programmer must understand the ISA of the target processor in order to program it.
[Figure: the instruction set as the interface between software and hardware]
4/25/2012 ACA Spring 2012 Bahria Shaftab Ahmed 3
ISA Metrics
Orthogonality: All operand modes are available with any data type or instruction type.
Completeness: Support for a wide range of operations and target applications.
Regularity: No overloading for the meanings of instruction fields.
Streamlined: Resource needs easily determined.
Operand storage: registers, memory, stack, accumulator
Number of explicit operands: 0, 1, 2, or 3
Operand location: register, immediate, indirect, ...
Operand type and size: byte, int, float, double, string, vector, ...
Operations: add, sub, mul, move, compare, ...
General Purpose Register Machines
Complex Instruction Sets (VAX, Intel 8086, 1977-80)
Load/Store Architecture (CDC 6600, Cray 1, 1963-76)
RISC (MIPS, SPARC, 88000, IBM RS6000, ... 1987+)
Classifying ISAs
Accumulator (before 1960):
  1 address   add A   acc ← acc + mem[A]
Stack:
  tos ← tos + next
Memory-Memory:
  mem[A] ← mem[A] + mem[B]
  mem[A] ← mem[B] + mem[C]
Register-Memory:
  R1 ← R1 + mem[A]
  R1 ← mem[A]
Register-Register (Load/Store):
  R1 ← R2 + R3
  R1 ← mem[R2]
  mem[R1] ← R2
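To make the difference between these ISA classes concrete, here is a minimal Python sketch that computes mem[C] = mem[A] + mem[B] in the style of each class. The instruction names in the comments (load, push, store, and so on) and the instruction counts are illustrative assumptions, not taken from any real machine.

```python
# Toy models of four ISA classes, each computing mem["C"] = mem["A"] + mem["B"].
# Each function returns the number of (hypothetical) instructions it models.

def run_accumulator(mem):
    acc = mem["A"]            # load A   : acc <- mem[A]
    acc = acc + mem["B"]      # add B    : acc <- acc + mem[B]
    mem["C"] = acc            # store C  : mem[C] <- acc
    return 3

def run_stack(mem):
    stack = []
    stack.append(mem["A"])    # push A
    stack.append(mem["B"])    # push B
    tos = stack.pop()         # add      : tos <- tos + next
    nxt = stack.pop()
    stack.append(tos + nxt)
    mem["C"] = stack.pop()    # pop C
    return 4

def run_memory_memory(mem):
    mem["C"] = mem["A"] + mem["B"]   # add C, A, B : mem[C] <- mem[A] + mem[B]
    return 1

def run_load_store(mem):
    regs = {}
    regs["R1"] = mem["A"]                  # load R1, A
    regs["R2"] = mem["B"]                  # load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]   # add R3, R1, R2
    mem["C"] = regs["R3"]                  # store R3, C
    return 4
```

The memory-memory version needs the fewest instructions but each one carries three memory addresses, whereas the load/store version uses more instructions that are each simple to encode and decode.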
Direct
Register Indirect: transfers a data byte/word to or from a location whose address is held in a register, e.g. [BX]. BYTE PTR, WORD PTR, or DWORD PTR specifies the size of the data.
Base + Index: MOV AX, [BX+SI]
Indirect Relative: MOV AX, [BX+4]
Base Relative plus Index: MOV AX, [BX+SI+4]
Scaled Index (32-bit addressing only): MOV AX, [EAX+4*EBX]
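All of these modes reduce to one calculation: offset = base + index * scale + displacement. The following is a minimal Python sketch of that calculation. The function name `effective_address`, its parameters, and the register values are hypothetical, and the sketch computes only the offset (segment translation is ignored).

```python
def effective_address(base=0, index=0, scale=1, disp=0, width=16):
    """Return base + index*scale + disp, wrapped to `width` bits (offset only)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & ((1 << width) - 1)

# Hypothetical register contents used for illustration.
regs = {"BX": 0x1000, "SI": 0x0020, "EAX": 0x2000, "EBX": 3}

register_indirect = effective_address(base=regs["BX"])                    # [BX]
base_index = effective_address(base=regs["BX"], index=regs["SI"])        # [BX+SI]
relative = effective_address(base=regs["BX"], disp=4)                    # [BX+4]
base_rel_index = effective_address(base=regs["BX"], index=regs["SI"], disp=4)  # [BX+SI+4]
scaled_index = effective_address(base=regs["EAX"], index=regs["EBX"],
                                 scale=4, width=32)                      # [EAX+4*EBX]
```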
Instruction Encoding
Variable Size
  Instruction length varies based on opcode and address specifiers
  For example, VAX instructions vary between 1 and 53 bytes, while x86 instructions vary between 1 and 17 bytes
  Good code density, but difficult to decode and pipeline
Fixed Size
  Only a single size for all instructions
  For example, DLX, MIPS, PowerPC, and SPARC all have 32-bit instructions
  Not as good code density, but easier to decode and pipeline
Hybrid Size
  Multiple format lengths specified by the opcode
  For example, IBM 360/370
  Compromise between code density and ease of decoding
DLX Architecture
Derived from many different instruction set architectures (MIPS, Sun, IBM, Intel, HP, AMD, etc.)
32-bit fixed-length instructions
3 instruction formats
Load/store architecture
Simple branch conditions (no condition codes)
DLX registers
32 32-bit general-purpose registers (R0 = 0)
32 32-bit (or 16 64-bit) floating-point registers
Special-purpose registers (e.g., FP Status and PC)
Use general-purpose registers with a load-store architecture
Support commonly used addressing modes
  displacement, immediate, and register deferred
Support simple instructions that occur frequently
  load, store, add, subtract, move, and, shift, compare equal, branch, jump, call, and return
Support commonly required data sizes
  8-bit (byte), 16-bit (half word), and 32-bit (word) integers
  32-bit (float) and 64-bit (double) floating point
Use fixed-length instructions that are easy to decode
Provide plenty of general-purpose registers and separate floating-point registers
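(a) Register-Register (R-type)
Bits:    31-26   25-21   20-16   15-11   10-0
Fields:  Op      rs1     rs2     rd      function
(ALL reg. operations, read/write special registers and moves)
(b) Register-Immediate (I-type)
Bits:    31-26   25-21   20-16   15-0
Fields:  Op      rs1     rd      immediate
(c) Jump (J-type)
Bits:    31-26   25-0
Fields:  Op      offset added to PC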
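The three DLX formats can be sketched as bit-packing routines. This is a minimal Python illustration that assumes the standard DLX field widths (6-bit opcode, 5-bit register fields, 11-bit function field, 16-bit immediate, 26-bit jump offset). The function names and the opcode values used in the usage lines are hypothetical placeholders, not the real DLX opcode assignments.

```python
def encode_r(op, rs1, rs2, rd, func):
    """R-type: Op[31:26] rs1[25:21] rs2[20:16] rd[15:11] function[10:0]."""
    return ((op & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | ((rs2 & 0x1F) << 16) \
        | ((rd & 0x1F) << 11) | (func & 0x7FF)

def encode_i(op, rs1, rd, imm):
    """I-type: Op[31:26] rs1[25:21] rd[20:16] immediate[15:0]."""
    return ((op & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | ((rd & 0x1F) << 16) \
        | (imm & 0xFFFF)

def encode_j(op, offset):
    """J-type: Op[31:26] offset[25:0] (offset is added to the PC)."""
    return ((op & 0x3F) << 26) | (offset & 0x3FFFFFF)

def decode_i(word):
    """Split an I-type word into (op, rs1, rd, sign-extended immediate)."""
    op = (word >> 26) & 0x3F
    rs1 = (word >> 21) & 0x1F
    rd = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return op, rs1, rd, imm

# Usage with placeholder opcode values (assumptions for illustration only).
i_word = encode_i(0x08, rs1=1, rd=2, imm=-4)
r_word = encode_r(0x00, rs1=1, rs2=2, rd=3, func=0x20)
j_word = encode_j(0x02, offset=-8)
```

Because every field sits at a fixed bit position, decoding needs only shifts and masks, which is the property that makes fixed-length formats easy to decode and pipeline.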
x86 instructions typically have two operands, where one operand is both a source and a destination. Possible combinations include:

Source/destination type    Second source type
Register                   Register
Register                   Immediate
Register                   Memory
Memory                     Register
Memory                     Immediate

No memory-memory or immediate-immediate combinations
Immediates can be 8, 16, or 32 bits
80x86 Instructions
Data movement (move, push, pop)
Control flow (branches, jumps, calls, returns)
String instructions (move and compare)
FP data movement (load, load const., store)
Arithmetic instructions (add, subtract, multiply, divide, square root, absolute value)
Comparisons (result to flags)
Transcendental functions (sin, cos, log, etc.)
Register to register: a two-byte instruction. The first byte contains the opcode followed by the direction (D) and width (W) bits; the second byte contains the MOD, REG, and R/M fields. The MOD field is 11.
NOTE: Bit D1 of the first byte is the direction bit: 0 means the REG field of byte 2 is the source, 1 means the REG field of byte 2 is the destination.
Bit D0 of the first byte is the width bit, which specifies whether the data is 8-bit (0) or 16-bit (1). With MOD = 11, the R/M field specifies one of 8 registers. The MOD field is 11 for register, 00 for memory without displacement, 01 for memory with an 8-bit displacement, and 10 for memory with a 16-bit displacement.
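As a worked illustration of the D, W, MOD, REG, and R/M fields, here is a minimal Python sketch that assembles the two-byte register-to-register MOV. It assumes the 8086 opcode 100010 for "MOV register/memory to/from register" and the standard 8086 register numbering. The function name `encode_mov_reg_reg` is hypothetical.

```python
# Standard 8086 register numbers for the REG and R/M fields.
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}

def encode_mov_reg_reg(dst, src, d=0):
    """Encode MOV dst, src for two registers of the same width."""
    if dst in REG16 and src in REG16:
        table, w = REG16, 1          # W = 1: 16-bit data
    elif dst in REG8 and src in REG8:
        table, w = REG8, 0           # W = 0: 8-bit data
    else:
        raise ValueError("operands must be two registers of the same width")
    byte1 = (0b100010 << 2) | (d << 1) | w
    if d == 0:                       # D = 0: REG field holds the source
        reg, rm = table[src], table[dst]
    else:                            # D = 1: REG field holds the destination
        reg, rm = table[dst], table[src]
    byte2 = (0b11 << 6) | (reg << 3) | rm   # MOD = 11: register operand
    return bytes([byte1, byte2])

mov_ax_bx = encode_mov_reg_reg("AX", "BX", d=0)       # 89 D8
mov_ax_bx_alt = encode_mov_reg_reg("AX", "BX", d=1)   # 8B C3
mov_cl_dh = encode_mov_reg_reg("CL", "DH", d=0)       # 88 F1
```

Both `89 D8` and `8B C3` move BX into AX; the direction bit simply swaps which register sits in the REG field and which in the R/M field.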
4. Register to/from Memory with Displacement: one or two additional bytes specify the displacement.
5. Immediate Operand to Register: the upper 7 bits of the first byte and bits 3-5 of the second byte specify the opcode. The last two bytes specify the data.
6. Immediate Operand to Memory with 16-bit Displacement: the first two bytes specify the opcode, MOD, and R/M fields as before, followed by two bytes of displacement and two bytes of data.
Several companies have extended their computers' instruction sets to support graphics and multimedia applications:
  Intel's MMX Technology
  Intel's Internet Streaming SIMD Extensions
  AMD's 3DNow! Technology
  Sun's Visual Instruction Set
  Motorola's and IBM's AltiVec Technology
Target applications include:
  Computer-aided design
  Internet applications
  Computer visualization
  Video games
  Speech recognition
MMX Instructions
MMX Technology adds 57 new instructions to the x86 architecture (Reference article on PII MMX) Some of these instructions include
PADD(b, w, d)        Packed addition
PSUB(b, w, d)        Packed subtraction
PCMPEQ(b, w, d)      Packed compare equal
PMULLw               Packed word multiply low
PMULHw               Packed word multiply high
PMADDwd              Packed word multiply-add
PSRL(w, d, q)        Packed shift right logical
PACKSS(wb, dw)       Pack data
PUNPCK(bw, wd, dq)   Unpack data
PAND, POR, PXOR      Packed logical operations
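MMX Technology supports operations on the following 64-bit integer data types:
Packed byte (eight 8-bit elements)
Packed word (four 16-bit elements)
Packed doubleword (two 32-bit elements)
Quadword (one 64-bit element)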
SIMD Operations
MMX Technology allows a Single Instruction to work on Multiple pieces of Data (SIMD). Example:
  A3      A2      A1      A0
+ B3      B2      B1      B0
= A3+B3   A2+B2   A1+B1   A0+B0
4 parallel adds are performed on 16-bit elements. Most MMX instructions only require a single cycle.
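The four-lane add can be modeled in Python by treating a 64-bit integer as four independent 16-bit lanes. This is a sketch of the wrap-around behavior only; the helper names `pack_words`, `unpack_words`, and `paddw` are hypothetical, and plain Python integers stand in for the 64-bit MMX register.

```python
LANES = 4
MASK16 = 0xFFFF

def pack_words(lanes):
    """Pack four 16-bit values into one 64-bit integer (lanes[0] is lowest)."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & MASK16) << (16 * i)
    return value

def unpack_words(value):
    """Split a 64-bit integer into four 16-bit lanes (lowest lane first)."""
    return [(value >> (16 * i)) & MASK16 for i in range(LANES)]

def paddw(a, b):
    """Add lane by lane with wrap-around; carries never cross a lane boundary."""
    sums = [(x + y) & MASK16 for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)

a = pack_words([1, 2, 3, 0xFFFF])
b = pack_words([10, 20, 30, 1])
result = unpack_words(paddw(a, b))
```

The top lane overflows and wraps to 0 without disturbing its neighbors, which is what distinguishes a packed add from an ordinary 64-bit add.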
Saturating Arithmetic
Both wrap-around and saturating ADD instructions are supported. With saturating arithmetic, a result that overflows is clamped to the largest representable value, and a result that underflows is clamped to the smallest.
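The difference between the two modes can be seen on a single 16-bit element. The sketch below uses hypothetical function names and models one lane only; in MMX the corresponding word instructions are PADDW (wrap-around), PADDUSW (unsigned saturation), and PADDSW (signed saturation).

```python
def add_wrap_u16(a, b):
    """Wrap-around add: keep only the low 16 bits of the sum."""
    return (a + b) & 0xFFFF

def add_sat_u16(a, b):
    """Unsigned saturating add: clamp the sum to the range 0..0xFFFF."""
    return min(a + b, 0xFFFF)

def add_sat_s16(a, b):
    """Signed saturating add: clamp the sum to the range -32768..32767."""
    return max(-32768, min(a + b, 32767))

wrapped = add_wrap_u16(0xFFFF, 1)          # overflow wraps to 0
saturated = add_sat_u16(0xFFFF, 1)         # overflow stays at 0xFFFF
signed_high = add_sat_s16(32767, 1)        # clamps at the largest value
signed_low = add_sat_s16(-32768, -1)       # clamps at the smallest value
```

Saturation is usually what pixel and audio code wants: a bright pixel that overflows should stay white rather than wrap around to black.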
Pack and unpack instructions provide conversion between standard data types and packed data types
Multiply-Add Operations
Common uses:
  Vector dot products
  Matrix multiplication
  Fast Fourier Transforms (FFTs)
  Filter implementations
A dot product of two 8-element vectors can be performed using 9 MMX instructions.
Partial sums: a0*c0 + ... + a3*c3 and a4*c4 + ... + a7*c7
Result: a0*c0 + ... + a7*c7
With MMX, the 9 instructions are:
  2 loads for one of the vectors (the other vector is loaded by PMADD)
  2 PMADDs
  2 PADDs
  2 shifts (if required to fix precision)
  1 store
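The data flow of the 8-element dot product can be sketched in Python by emulating the multiply-add step on lists of signed 16-bit values. This is an illustrative model, not the exact instruction sequence: the helpers `wrap32`, `pmaddwd`, `paddd`, and `dot8` are hypothetical names, and the final scalar add stands in for the shift-and-add that combines the two 32-bit halves.

```python
def wrap32(value):
    """Wrap an integer to a signed 32-bit result."""
    return ((value + 2**31) % 2**32) - 2**31

def pmaddwd(a, c):
    """Multiply four signed 16-bit pairs and add adjacent products.

    Returns two 32-bit sums: [a0*c0 + a1*c1, a2*c2 + a3*c3].
    """
    return [wrap32(a[0] * c[0] + a[1] * c[1]),
            wrap32(a[2] * c[2] + a[3] * c[3])]

def paddd(x, y):
    """Add two pairs of 32-bit values lane by lane with wrap-around."""
    return [wrap32(x[0] + y[0]), wrap32(x[1] + y[1])]

def dot8(a, c):
    """Dot product of two 8-element vectors of signed 16-bit values."""
    low = pmaddwd(a[0:4], c[0:4])     # covers a0*c0 + ... + a3*c3
    high = pmaddwd(a[4:8], c[4:8])    # covers a4*c4 + ... + a7*c7
    partial = paddd(low, high)        # two partial sums, one per 32-bit lane
    return wrap32(partial[0] + partial[1])   # combine the lanes

result = dot8([1, 2, 3, 4, 5, 6, 7, 8], [8, 7, 6, 5, 4, 3, 2, 1])
```

Each multiply-add step performs four multiplications and two additions at once, which is why so few instructions are needed for eight products.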
MMX technology extends the Intel x86 architecture to improve the performance of multimedia and graphics applications. Most MMX instructions execute in one clock cycle, so the performance improvement can be more dramatic than the simple ratio of instruction counts suggests. It provides a speedup of 1.5 to 2.0 for certain applications and increases the chip area by only about 5%.
MMX instructions are hand-coded in assembly or implemented as libraries to achieve high performance. MMX data types use the x86 floating-point registers, which makes it hard to perform MMX and floating-point instructions at the same time.
Intel's Internet Streaming SIMD Extensions:
  Help improve the performance of video and 3D applications
  Add 70 new instructions beyond MMX Technology
  Add new 128-bit registers
  Provide the ability to perform parallel floating-point operations
  Provide data prefetch instructions
  Make certain applications 1.5 to 2.0 times faster