INTERNAL STRUCTURE AND FUNCTIONING OF THE CPU

1. Internal Structure of the CPU
2. Register Organization
3. The Instruction Cycle
4. Instruction Pipelining
5. Pipeline Hazards
6. Structural Hazards
7. Data Hazards
8. Control Hazards

Datorarkitektur I F 6/7, Petru Eles, IDA, LiTH

Internal Structure of the CPU

[Figure: the CPU contains the ALU, the Control Unit, and the registers (including IR and PC), connected by the internal CPU bus. Control lines, data lines, and address lines connect the CPU to the system bus.]

Register Organization

The set of registers within the CPU represents the top level of the memory hierarchy inside the computer system.
- User-visible registers: can be accessed by assembly language programmers.
- Control and status registers: used by the Control Unit to control the operation of the CPU; not directly accessible by the programmer.

User-Visible Registers

Some architectures provide a set of registers that can be used without restriction as operands for any opcode and as address registers; these are called general-purpose registers.

Often the architecture separates:
- data registers: can hold only data. Some architectures restrict the use of data registers further: for example, there can be disjoint sets of registers for integer and for floating-point computation.
- address registers: used only for address representation and computation: base registers, index registers, stack pointer, etc. In some architectures, individual address registers are specialized for one of these functions.

Some Trade-offs

- A large number of general-purpose registers requires a large number of bits for encoding register operands; specialization of registers reduces this need.
- Too few registers create problems for the programmer and lead to increased memory traffic.

The number of general-purpose or data registers is often between 8 and 32. RISC processors often have a very large number of registers (~100).

Control and Status Registers

- Program Counter (PC): holds the address of the instruction to be fetched.
- Instruction Register (IR): holds the last instruction fetched.
- Memory Address Register (MAR): holds the address of a memory location that is to be read or written.
- Memory Buffer Register (MBR): holds the data to be written to memory or the data most recently read.
- Program Status Word (PSW): condition code flags plus other bits defining the status of the CPU (interrupts enabled/disabled, supervisor mode, etc.).

Some Examples of Register Organizations

Z8000:
- 16 general-purpose registers; no restrictions in use.

Intel 80X86, Pentium:
- 4 data registers
- 4 index and address registers
- 4 base (segment) registers
Some of the address registers can also be used for general purposes.

PowerPC:
- 2 groups of general-purpose registers, each of 32 registers; one group is for integer (fixed-point) computation, the other for floating-point computation.

The Instruction Cycle

- Fetch instruction (FI)
- Decode instruction (DI)
- Fetch operand:
  - Calculate operand address (CO)
  - Fetch operand (FO)
- Execute instruction:
  - Execute instruction (EI)
  - Write back operand (WO)

Instruction Pipelining

Instruction execution is complex and involves several operations that are executed in succession (see The Instruction Cycle). This implies a large amount of hardware, but only one part of this hardware works at a given moment.

Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. This is achieved not with additional hardware but by letting different parts of the hardware work for different instructions at the same time.

The pipeline organization of a CPU is similar to an assembly line: the work to be done in an instruction is broken into smaller steps, each of which takes a fraction of the time needed to complete the entire instruction. Each of these steps is a pipe stage (or pipe segment).

Pipe stages are connected to form a pipe:

Stage 1 -> Stage 2 -> ... -> Stage n

The time required to move an instruction from one stage to the next is a machine cycle (often one clock cycle). The execution of one instruction takes several machine cycles as it passes through the pipeline.

Acceleration by Pipelining

Two-stage pipeline:
- FI: fetch instruction
- EI: execute instruction

We assume that each instruction takes execution time T_ex, so each of the two stages takes T_ex/2.

Execution time for the 7 instructions, with pipelining: (T_ex/2) * 8 = 4 * T_ex (compared with 7 * T_ex without pipelining).

Clock cycle    1   2   3   4   5   6   7   8
Instr. i       FI  EI
Instr. i+1         FI  EI
Instr. i+2             FI  EI
Instr. i+3                 FI  EI
Instr. i+4                     FI  EI
Instr. i+5                         FI  EI
Instr. i+6                             FI  EI

Acceleration by Pipelining (cont'd)

Six-stage pipeline (see also The Instruction Cycle):
- FI: fetch instruction
- DI: decode instruction
- CO: calculate operand address
- FO: fetch operand
- EI: execute instruction
- WO: write operand

Execution time for the 7 instructions, with pipelining: (T_ex/6) * 12 = 2 * T_ex

Clock cycle    1   2   3   4   5   6   7   8   9   10  11  12
Instr. i       FI  DI  CO  FO  EI  WO
Instr. i+1         FI  DI  CO  FO  EI  WO
Instr. i+2             FI  DI  CO  FO  EI  WO
Instr. i+3                 FI  DI  CO  FO  EI  WO
Instr. i+4                     FI  DI  CO  FO  EI  WO
Instr. i+5                         FI  DI  CO  FO  EI  WO
Instr. i+6                             FI  DI  CO  FO  EI  WO

After a certain time (N-1 cycles) all N stages of the pipeline are working: the pipeline is filled. From then on, theoretically, the pipeline provides maximal parallelism (N instructions are active simultaneously).

Acceleration by Pipelining (cont'd)

It might seem that a greater number of stages always provides better performance. However:
- a greater number of stages increases the overhead of moving information between stages and of synchronizing the stages.
- the complexity of the CPU grows with the number of stages.
- it is difficult to keep a large pipeline at maximum rate because of pipeline hazards.

80486 and Pentium:
- five-stage pipeline for integer instructions
- eight-stage pipeline for FP instructions

PowerPC:
- four-stage pipeline for integer instructions
- six-stage pipeline for FP instructions
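The timing arithmetic of the two-stage and six-stage examples can be checked with a short sketch. It assumes an ideal pipeline with no hazards, in which every stage takes T_ex divided by the number of stages; the function names are illustrative and do not come from the lecture.

```python
from fractions import Fraction


def pipelined_cycles(n_stages: int, n_instructions: int) -> int:
    """Clock cycles needed to run n_instructions through an ideal pipeline."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; every later instruction
    # finishes one cycle after its predecessor.
    return n_stages + n_instructions - 1


def pipelined_time(n_stages: int, n_instructions: int) -> Fraction:
    """Execution time in units of T_ex, assuming one cycle lasts T_ex / n_stages."""
    return Fraction(pipelined_cycles(n_stages, n_instructions), n_stages)


def speedup(n_stages: int, n_instructions: int) -> Fraction:
    """Ratio of unpipelined time (n_instructions * T_ex) to pipelined time."""
    if n_instructions <= 0:
        raise ValueError("speedup is only defined for at least one instruction")
    return Fraction(n_instructions) / pipelined_time(n_stages, n_instructions)


if __name__ == "__main__":
    for stages in (2, 6):
        print(
            f"{stages} stages, 7 instructions:",
            f"{pipelined_cycles(stages, 7)} cycles,",
            f"{pipelined_time(stages, 7)} * T_ex,",
            f"speedup {speedup(stages, 7)}",
        )
```

For 7 instructions this gives 8 cycles (4 * T_ex) with two stages and 12 cycles (2 * T_ex) with six stages, matching the figures in the slides. The sketch ignores the per-stage overhead and the hazards, which is why real pipelines do not reach these numbers.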
Pipeline Hazards

Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. The instruction is said to be stalled. When an instruction is stalled, all instructions that entered the pipeline after it are also stalled. Instructions that entered before the stalled one can continue. No new instructions are fetched during the stall.

Types of hazards:
1. Structural hazards
2. Data hazards
3. Control hazards

Structural Hazards

Structural hazards occur when a certain resource (memory, functional unit) is requested by more than one instruction at the same time.

Instruction ADD R4,X fetches operand X from memory in its FO stage. The memory does not accept another access during that cycle, so the instruction fetch that would take place in the same cycle is delayed.

Clock cycle    1     2     3     4     5     6     7     8     9     10    11    12
ADD R4,X       FI    DI    CO    FO    EI    WO
Instr. i+1           FI    DI    CO    FO    EI    WO
Instr. i+2                 FI    DI    CO    FO    EI    WO
Instr. i+3                       stall FI    DI    CO    FO    EI    WO
Instr. i+4                                   FI    DI    CO    FO    EI    WO

Penalty: 1 cycle

Certain resources are duplicated in order to avoid structural hazards. Functional units (ALU, FP unit) can themselves be pipelined in order to support several instructions at a time. A classical way to avoid hazards at memory access is to provide separate data and instruction caches.

Data Hazards

We have two instructions, I1 and I2. In a pipeline, the execution of I2 can start before I1 has terminated. If, in a certain stage of the pipeline, I2 needs the result produced by I1 but this result has not yet been generated, we have a data hazard.

I1: MUL R2,R3    R2 ← R2 * R3
I2: ADD R1,R2    R1 ← R1 + R2

Clock cycle    1     2     3     4     5     6     7     8     9     10    11    12
MUL R2,R3      FI    DI    CO    FO    EI    WO
ADD R1,R2            FI    DI    CO    stall stall FO    EI    WO

Instr. i+2 follows the ADD through the pipeline without further stalls.

Before executing its FO stage, the ADD instruction is stalled until the MUL instruction has written the result into R2.

Penalty: 2 cycles

Data Hazards (cont'd)

Some of the penalty produced by data hazards can be avoided using a technique called forwarding (bypassing).

[Figure: two multiplexers (MUX) select each ALU input either from register or memory or from a bypass path that feeds the ALU output back; the ALU output also goes to register or memory.]

The ALU result is always fed back to the ALU input. If the hardware detects that the value needed for the current operation is the one produced by the previous operation (but not yet written back), it selects the forwarded result as the ALU input instead of the value read from a register or memory.

Clock cycle    1     2     3     4     5     6     7     8     9     10    11    12
MUL R2,R3      FI    DI    CO    FO    EI    WO
ADD R1,R2            FI    DI    CO    stall FO    EI    WO

After the EI stage of the MUL instruction the result is available by forwarding. The penalty is reduced to one cycle.

Control Hazards

Control hazards are produced by branch instructions.

Unconditional branch

        - - - - - - -
        BR TARGET
        - - - - - - -
TARGET  - - - - - - -

Clock cycle    1     2     3     4     5     6     7     8     9     10    11    12
BR TARGET      FI    DI    CO    FO    EI    WO
target               FI    stall stall FI    DI    CO    FO    EI    WO
target+1                                     FI    DI    CO    FO    EI    WO

The instruction following the branch is fetched in cycle 2; until the DI stage of the branch is finished, it is not known that a branch is being executed. The fetched instruction is later discarded.

After the FO stage of the branch instruction, the address of the target is known and the target can be fetched.

Penalty: 3 cycles

Control Hazards (cont'd)

Conditional branch

        ADD R1,R2        R1 ← R1 + R2
        BEZ TARGET       branch if zero
        instruction i+1
        - - - - - - -
TARGET  - - - - - - -

Branch is taken:

Clock cycle    1     2     3     4     5     6     7     8     9     10    11    12
ADD R1,R2      FI    DI    CO    FO    EI    WO
BEZ TARGET           FI    DI    CO    FO    EI    WO
target                     FI    stall stall FI    DI    CO    FO    EI    WO

At the end of cycle 5, both the condition (set by ADD) and the target address are known.

Penalty: 3 cycles

Branch is not taken:

Clock cycle    1     2     3     4     5     6     7     8     9     10    11    12
ADD R1,R2      FI    DI    CO    FO    EI    WO
BEZ TARGET           FI    DI    CO    FO    EI    WO
instr. i+1                 FI    stall stall DI    CO    FO    EI    WO

At the end of cycle 5, the condition is known and instr. i+1 can go on.

Penalty: 2 cycles

Control Hazards (cont'd)

With a conditional branch we have a penalty even if the branch is not taken. This is because we have to wait until the branch condition is available.

Branch instructions represent a major problem in ensuring an optimal flow through the pipeline. Several approaches have been taken to reduce branch penalties (see the slides of the following lecture).

Summary

- The main components of the CPU are the Control Unit, the ALU, and the register set. They are interconnected through the internal CPU bus. Interconnection with external modules is through the system bus.
- Control signals issued by the Control Unit coordinate the functionality and data flow inside the CPU and between the CPU and external modules.
- The register set is the top level of the memory hierarchy. Only some of the registers are user visible. User-visible registers can be general-purpose or specialized.
- Instructions are executed by the CPU as a sequence of steps. Instruction execution can be substantially accelerated by instruction pipelining.
- A pipeline is organized as a succession of N stages. At a given moment, up to N instructions can be active inside the pipeline.
- Keeping a pipeline at its maximal rate is prevented by pipeline hazards. Structural hazards are due to resource conflicts. Data hazards are produced by data dependencies between instructions. Control hazards are produced as a consequence of branch instructions.
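The effect of hazard penalties on completion time can be illustrated with a small sketch. It is a simplified model, not the lecture's own method: it assumes an in-order pipeline in which a stall suffered by one instruction delays every later instruction by the same number of cycles. The name completion_cycles and its parameters are hypothetical.

```python
def completion_cycles(n_stages, stalls):
    """Return the clock cycle in which each instruction leaves the pipeline.

    stalls[i] is the number of stall cycles (the penalty) suffered by
    instruction i. Assumption: in-order pipeline, so every stall also
    delays all later instructions.
    """
    finished = []
    total_stall = 0
    for index, stall in enumerate(stalls):
        total_stall += stall
        # Without stalls, instruction `index` would finish in cycle
        # n_stages + index; accumulated stalls push it later.
        finished.append(n_stages + index + total_stall)
    return finished


if __name__ == "__main__":
    # Six-stage pipeline (FI, DI, CO, FO, EI, WO) as in the lecture.
    print("data hazard, no forwarding:", completion_cycles(6, [0, 2]))
    print("data hazard, forwarding:   ", completion_cycles(6, [0, 1]))
    print("branch taken:              ", completion_cycles(6, [0, 0, 3]))
    print("branch not taken:          ", completion_cycles(6, [0, 0, 2]))
```

With the penalties from the data hazard example (2 cycles without forwarding, 1 cycle with forwarding), the ADD instruction completes in cycle 9 and cycle 8 respectively. With the conditional branch penalties, the instruction after BEZ completes in cycle 11 when the branch is taken and in cycle 10 when it is not.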