
Effect of pipeline depth in CPU efficiency
Carlos David Bula R.

Computer Architecture
Electrical and Computer Engineering Department
University of Puerto Rico, Mayagüez Campus
April 2008

Outline
Introduction
Background
Introductory real-life example
Pipeline depth vs. performance
Introducing the power consumption factor
Conclusions

Introduction
In recent years the use of microprocessor-based mobile devices has become widespread.
These devices need both processing capability and battery life.

More powerful servers and workstations consuming less power are desirable.

Background
In the past, microprocessor design was driven only by performance.
Nanometer fabrication technologies have taken VLSI designs to higher levels; today they are at 45 nm.

582M transistors in the current version at 65 nm; 820M transistors in the next version at 45 nm.

AMD Phenom X4 CPU: 462M transistors.

[Figure: AMD quad-core chip layout]

Background
When designing an energy-efficient CPU, the pipeline structure must be decided in the early stages of the design.
Slicing the CPU into more pipeline stages may lead to some performance gains.
But we have to be careful: some tradeoffs are involved.

A real-life example: NetBurst architecture, a.k.a. Pentium 4


The P4 CPUs and derivatives were designed to reach high clock speeds.
20 pipeline stages initially, 31 in later revisions.
Strategy: marketing-driven design, with performance scaling by means of clock speed scaling.
Poor power efficiency, especially in later models (Prescott).
Not suited for mobile devices; this space was ceded to the Pentium M and the Athlon 64 (A64).
Poor IPC rate.

The Core 2: A well designed CPU


It has only 14 pipeline stages.
Core 2 is a 4-wide issue CPU.

Designed from scratch to be efficient.

Executes many more instructions per cycle than the P4.
Consumes much less power.

Pipeline depth Vs. Performance

Adding more pipeline stages allows higher clock speeds: each stage has less logic depth and therefore introduces less delay.
Tradeoff: an excessive number of stages reduces IPC.
Branch misprediction is especially costly for long-pipeline CPUs, since more in-flight work must be discarded.
Inter-stage latches introduce some delay overhead.

Fmax = 1/Tdmax, where Tdmax is the delay of the stage with the greatest delay.

[Figure: a 5-stage pipeline (S1 to S5) with more delay per stage (Tda), compared with a 10-stage pipeline (S1 to S10) with less delay per stage (Tdb).]
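The relation Fmax = 1/Tdmax can be sketched numerically. This is a minimal sketch, assuming the logic splits evenly across stages and every latch adds the same fixed overhead; the function name and the delay values (in ns) are hypothetical and do not come from the slides.

```python
def max_clock_ghz(total_logic_delay_ns: float, latch_overhead_ns: float, stages: int) -> float:
    """Fmax = 1 / Tdmax for an evenly balanced pipeline (idealized assumption)."""
    if stages < 1:
        raise ValueError("stages must be >= 1")
    # All stages are equal here, so Tdmax is one stage's logic share plus the latch overhead.
    td_max_ns = total_logic_delay_ns / stages + latch_overhead_ns
    return 1.0 / td_max_ns  # 1/ns is GHz


if __name__ == "__main__":
    # Hypothetical numbers: 10 ns of logic in total, 0.1 ns per inter-stage latch.
    for stages in (5, 10, 20, 40):
        print(stages, "stages:", round(max_clock_ghz(10.0, 0.1, stages), 3), "GHz")
```

Under these assumptions, doubling the stage count never quite doubles the clock, because the latch overhead does not shrink with the logic.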

Pipeline depth Vs. Performance

Studies by Hartstein and Puzak show the effect of pipeline depth on performance.
They found that there is an optimal number of pipeline stages which maximizes performance.
No power considerations were made. [Harstein02]
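Why an optimum exists can be illustrated with a toy model. This is a simplified sketch, not the model published in [Harstein02]: it assumes the cycle time is the latch overhead plus an even share of the logic delay, and that each instruction pays a stall penalty that grows linearly with depth. All names and parameter values are hypothetical, with delays in arbitrary gate-delay units.

```python
import math


def time_per_instruction(depth, logic_delay=140.0, latch_overhead=2.5, hazard_rate=0.1):
    """Toy model: cycle time shrinks with depth, cycles per instruction grow with it."""
    cycle_time = latch_overhead + logic_delay / depth
    cycles_per_instruction = 1.0 + hazard_rate * depth
    return cycle_time * cycles_per_instruction


def best_depth(max_depth=60, **params):
    """Integer depth in 1..max_depth that minimizes time per instruction."""
    return min(range(1, max_depth + 1), key=lambda d: time_per_instruction(d, **params))


def closed_form_depth(logic_delay=140.0, latch_overhead=2.5, hazard_rate=0.1):
    """Setting the derivative of the toy model to zero gives sqrt(t_p / (h * t_o))."""
    return math.sqrt(logic_delay / (hazard_rate * latch_overhead))


if __name__ == "__main__":
    print("best integer depth:", best_depth())
    print("closed-form optimum:", round(closed_form_depth(), 2))
```

In this toy model a higher hazard rate or a larger latch overhead pushes the optimum toward shallower pipelines.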

Introducing the power consumption factor

The heat produced by a microprocessor is directly related to its power consumption, which in turn is proportional to the working clock frequency.
Pipelining is employed to achieve higher clock frequencies and lower supply voltages.
Higher working clock frequencies are needed to compensate for the IPC loss.
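The dependence of power on clock frequency and supply voltage can be made concrete with the standard CMOS dynamic power approximation, P = a * C * V^2 * f. The slides do not state this formula; it is brought in here as an assumption, and the function name and component values below are hypothetical.

```python
def dynamic_power_watts(switched_capacitance_f, supply_voltage_v, clock_hz, activity=1.0):
    """Dynamic power approximation P = a * C * V^2 * f (leakage is ignored)."""
    return activity * switched_capacitance_f * supply_voltage_v ** 2 * clock_hz


if __name__ == "__main__":
    base = dynamic_power_watts(1e-9, 1.2, 2e9)       # hypothetical 1 nF at 1.2 V and 2 GHz
    faster = dynamic_power_watts(1e-9, 1.2, 4e9)     # same chip at twice the clock
    lower_v = dynamic_power_watts(1e-9, 1.08, 2e9)   # same clock, 10% lower voltage
    print(base, faster / base, lower_v / base)
```

Power scales linearly with frequency but quadratically with voltage, which is why a lower supply voltage is such an effective lever.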

Optimal power/performance pipeline depth
There is a number of pipeline stages which produces maximum performance.
It was found that the optimum power/performance depth is 7 stages. [Harstein03]
A purely performance-driven design may lead to the selection of an overly deep pipeline, operating in an inefficient way. [Zubyan04]
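The shift toward shallower pipelines once power is counted can be reproduced with a toy model. This is only a sketch under stated assumptions, not the analysis in [Harstein03]: performance follows a simple cycle-time-times-CPI model, power is a fixed logic term plus a per-stage latch term, both scaled by clock frequency, and the efficiency metric is performance cubed per watt. Every name and number below is hypothetical.

```python
def cycle_time(depth, logic_delay=140.0, latch_overhead=2.5):
    """Cycle time in arbitrary gate-delay units for an evenly split pipeline."""
    return latch_overhead + logic_delay / depth


def performance(depth, hazard_rate=0.1):
    """Instructions per unit time: inverse of cycle time times cycles per instruction."""
    return 1.0 / (cycle_time(depth) * (1.0 + hazard_rate * depth))


def power(depth, logic_power=1.0, latch_power_per_stage=0.05):
    """Toy power: (logic + latches) switched once per cycle, so it scales with frequency."""
    return (logic_power + latch_power_per_stage * depth) / cycle_time(depth)


def best_depth(score, max_depth=60):
    """Integer depth in 1..max_depth that maximizes the given score function."""
    return max(range(1, max_depth + 1), key=score)


def performance_optimum():
    return best_depth(performance)


def efficiency_optimum():
    # Assumed metric: performance^3 / power.
    return best_depth(lambda d: performance(d) ** 3 / power(d))


if __name__ == "__main__":
    print("performance-only optimum:", performance_optimum())
    print("power/performance optimum:", efficiency_optimum())
```

With these assumptions the power-aware optimum is always shallower than the performance-only optimum, since extra latches and a higher clock both add power at every additional stage.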

Conclusions
There is a trend to design and fabricate ever more energy-efficient processors, triggered by the widespread use of mobile computers and the growing need for energy savings.
Maximizing CPU performance consists of finding a balance between clock frequency and IPC.
Pipelining serves as a method for reducing energy consumption. However, the reduction

References
[Harstein02] Hartstein, A. and Puzak, T., The Optimum Pipeline Depth for a Microprocessor, Proceedings of the 29th Annual International Symposium on Computer Architecture (Anchorage, USA, May 2002).
[Harstein03] Hartstein, A. and Puzak, T., Optimum Power/Performance Pipeline Depth