Superscalar Processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. It therefore allows for more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate. A superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. Each execution unit is not a separate processor (or a core if the processor is a multi-core processor), but an execution resource within a single CPU, such as an arithmetic logic unit.

In Flynn's taxonomy, a single-core superscalar processor is classified as an SISD processor (Single Instruction stream, Single Data stream), though many superscalar processors support short vector operations and so could be classified as SIMD (Single Instruction stream, Multiple Data streams). A multi-core superscalar processor is classified as an MIMD processor (Multiple Instruction streams, Multiple Data streams).

While a superscalar CPU is typically also pipelined, pipelining and superscalar execution are considered different performance enhancement techniques. The former executes multiple instructions in the same execution unit in parallel by dividing the execution unit into different phases, whereas the latter executes multiple instructions in parallel by using multiple execution units.

The superscalar technique is traditionally associated with several identifying characteristics (within a given CPU):

Instructions are issued from a sequential instruction stream.
The CPU dynamically checks for data dependencies between instructions at run time (versus software checking at compile time).
The CPU can execute multiple instructions per clock cycle.

Figure: Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed. (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back, i = Instruction number, t = Clock cycle [i.e., time])

1 History

Seymour Cray's CDC 6600 from 1966 is often mentioned as the first superscalar design. The Motorola MC88100 (1988), the Intel i960CA (1989) and the AMD 29000-series 29050 (1990) microprocessors were the first commercial single-chip superscalar microprocessors. RISC microprocessors like these were the first to have superscalar execution, because RISC architectures free transistors and die area which could be used to include multiple execution units (this was why RISC designs were faster than CISC designs through the 1980s and into the 1990s).

Figure: Processor board of a CRAY T3e supercomputer with four superscalar Alpha 21164 processors.

Except for CPUs used in low-power applications, embedded systems, and battery-powered devices, essentially all general-purpose CPUs developed since about 1998 are superscalar.

The P5 Pentium was the first superscalar x86 processor; the Nx586, P6 Pentium Pro and AMD K5 were among the first designs to decode x86 instructions asynchronously into dynamic microcode-like micro-op sequences prior to actual execution on a superscalar microarchitecture. This opened the way for dynamic scheduling of buffered partial instructions and enabled more parallelism to be extracted than the more rigid methods used in the simpler P5 Pentium; it also simplified speculative execution and allowed higher clock frequencies compared to designs such as the advanced Cyrix 6x86.
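Returning to the pipeline figure near the start of this article, its timing can be approximated with a short calculation. This is a minimal sketch under idealized assumptions (no stalls, no data dependencies, no branches, and one group of instructions entering the five stages IF, ID, EX, MEM and WB every cycle); the function names ideal_cycles and timeline are hypothetical. Under those assumptions, N independent instructions on a w-wide pipeline finish in 5 + ceil(N / w) - 1 cycles.

```python
from __future__ import annotations

import math

# Stage names taken from the pipeline figure caption.
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def ideal_cycles(n_instructions: int, issue_width: int) -> int:
    """Cycles needed to finish n independent instructions.

    Idealized model: no stalls, hazards or branches. Each cycle a new group
    of `issue_width` instructions enters IF, and the last group still needs
    len(STAGES) cycles to drain through WB.
    """
    if n_instructions <= 0:
        return 0
    groups = math.ceil(n_instructions / issue_width)
    return len(STAGES) + groups - 1


def timeline(n_instructions: int, issue_width: int) -> list[list[str]]:
    """One row per instruction; '.' marks cycles before fetch or after write-back."""
    total = ideal_cycles(n_instructions, issue_width)
    rows = []
    for i in range(n_instructions):
        start = i // issue_width  # instructions are fetched in groups of issue_width
        row = ["."] * total
        for offset, stage in enumerate(STAGES):
            row[start + offset] = stage
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(timeline(6, issue_width=2)):
        print(f"i={i}: " + " ".join(f"{cell:>3}" for cell in row))
    # Scalar vs. two-wide issue for a long run of independent instructions.
    print(ideal_cycles(1000, 1), ideal_cycles(1000, 2))  # 1004 504
```

For 1,000 independent instructions the model gives 1004 cycles at width 1 and 504 at width 2, so throughput approaches the two-instructions-per-cycle maximum stated in the caption; the dependency and branch limits discussed later in this article are what keep real code away from that bound.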
2 Scalar to superscalar

The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor is a mixture of the two. Each instruction processes one data item, but there are multiple execution units within each CPU, so multiple instructions can be processing separate data items concurrently.

Superscalar CPU design emphasizes improving the accuracy of the instruction dispatcher and allowing it to keep the multiple execution units in use at all times. This has become increasingly important as the number of units has increased. While early superscalar CPUs would have two ALUs and a single FPU, a modern design such as the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system will be no better than that of a simpler, cheaper design.

A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. But merely processing multiple instructions concurrently does not make an architecture superscalar, since pipelined, multiprocessor or multi-core architectures also achieve that, but with different methods.

In a superscalar CPU the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching each to one of the several execution units contained inside a single CPU. Therefore, a superscalar processor can be envisioned as having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread.

3 Limitations

Available performance improvement from superscalar techniques is limited by three key areas:

1. The degree of intrinsic parallelism in the instruction stream (instructions requiring the same computational resources from the CPU).
2. The complexity and time cost of dependency checking logic and register renaming circuitry.
3. The branch instruction processing.

Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are not dependent on each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction impacts either resources or results of the other. The instructions a = b + c; d = e + f can be run in parallel because none of the results depend on other calculations. However, the instructions a = b + c; b = e + f might not be runnable in parallel, depending on the order in which the instructions complete while they move through the units.

When the number of simultaneously issued instructions increases, the cost of dependency checking increases extremely rapidly. This is exacerbated by the need to check dependencies at run time and at the CPU's clock rate. This cost includes additional logic gates required to implement the checks, and time delays through those gates. Research shows the gate cost in some cases may be $n^k$ gates, and the delay cost $k^2 \log n$, where $n$ is the number of instructions in the processor's instruction set, and $k$ is the number of simultaneously dispatched instructions.

Even though the instruction stream may contain no inter-instruction dependencies, a superscalar CPU must nonetheless check for that possibility, since there is no assurance otherwise and failure to detect a dependency would produce incorrect results.

No matter how advanced the semiconductor process or how fast the switching speed, this places a practical limit on how many instructions can be simultaneously dispatched. While process advances will allow ever greater numbers of execution units (e.g., ALUs), the burden of checking instruction dependencies grows rapidly, as does the complexity of register renaming circuitry to mitigate some dependencies. Collectively the power consumption, complexity and gate delay costs limit the achievable superscalar speedup to roughly eight simultaneously dispatched instructions.

However, even given infinitely fast dependency checking logic on an otherwise conventional superscalar CPU, if the instruction stream itself has many dependencies, this would also limit the possible speedup. Thus the degree of intrinsic parallelism in the code stream forms a second limitation.

4 Alternatives

Collectively, these limits drive investigation into alternative architectural changes such as very long instruction word (VLIW), explicitly parallel instruction computing (EPIC), simultaneous multithreading (SMT), and multi-core computing.

With VLIW, the burdensome task of dependency checking by hardware logic at run time is removed and delegated to the compiler. Explicitly parallel instruction computing (EPIC) is like VLIW, with extra cache prefetching instructions.

Simultaneous multithreading, often abbreviated as SMT, is a technique for improving the overall efficiency of superscalar processors. SMT permits multiple independent threads of execution to better utilize the resources provided by modern processor architectures.
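Returning to the dependency examples in the Limitations section (a = b + c; d = e + f versus a = b + c; b = e + f), the kind of check a superscalar dispatcher performs can be sketched in a few lines. This is a simplified model under stated assumptions, not the logic of any real CPU: the Instr record, the hazards, rename and issue_groups functions, and the p0, p1, ... physical register names are all hypothetical. Each instruction has one destination and a tuple of source names; a pair is classified as a read-after-write (RAW), write-after-read (WAR) or write-after-write (WAW) hazard; and a toy renamer gives every write a fresh name, which removes WAR and WAW hazards while leaving true (RAW) dependencies in place.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Instr:
    """Hypothetical three-operand instruction: dest = op(srcs)."""
    dest: str
    srcs: tuple[str, ...]


def hazards(first: Instr, second: Instr) -> set[str]:
    """Dependencies that stop `second` from executing alongside `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # second reads what first writes (true dependency)
    if second.dest in first.srcs:
        found.add("WAR")  # second overwrites an input first still needs
    if second.dest == first.dest:
        found.add("WAW")  # both write the same name; final value depends on order
    return found


def rename(program: list[Instr]) -> list[Instr]:
    """Toy register renaming: give every write a fresh physical name (p0, p1, ...).

    Sources are rewritten to the latest physical name of each architectural
    name; names not yet written keep their original (initial-value) names.
    Assumes no architectural name collides with the p<N> names.
    """
    fresh = count()
    latest: dict[str, str] = {}
    renamed = []
    for ins in program:
        srcs = tuple(latest.get(s, s) for s in ins.srcs)  # read before writing
        latest[ins.dest] = f"p{next(fresh)}"
        renamed.append(Instr(latest[ins.dest], srcs))
    return renamed


def issue_groups(program: list[Instr], width: int) -> list[list[Instr]]:
    """In-order issue: close the current group when it is full or a hazard appears.

    Each new instruction is compared with every earlier member of the current
    group, so a group that fills to `width` without hazards costs
    width * (width - 1) / 2 pairwise checks.
    """
    groups: list[list[Instr]] = []
    current: list[Instr] = []
    for ins in program:
        if len(current) == width or any(hazards(prev, ins) for prev in current):
            groups.append(current)
            current = []
        current.append(ins)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    independent = [Instr("a", ("b", "c")), Instr("d", ("e", "f"))]  # a = b + c; d = e + f
    war = [Instr("a", ("b", "c")), Instr("b", ("e", "f"))]          # a = b + c; b = e + f
    print(hazards(*independent))                    # set()
    print(hazards(*war))                            # {'WAR'}
    print(hazards(*rename(war)))                    # set()
    print(len(issue_groups(war, width=2)))          # 2
    print(len(issue_groups(rename(war), width=2)))  # 1
```

With these definitions, a = b + c; b = e + f needs two issue groups on a two-wide in-order dispatcher but only one after renaming, which is the kind of mitigation the Limitations section attributes to register renaming circuitry. The pairwise comparisons in this toy checker grow quadratically with width; that only illustrates why checking gets expensive and is a different model from the gate-cost estimate quoted in the Limitations section.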
Superscalar processors differ from multi-core processors in that the several execution units are not entire processors. A single processor is composed of finer-grained execution units such as the ALU, integer multiplier, integer shifter, FPU, etc. There may be multiple versions of each execution unit to enable execution of many instructions in parallel. This differs from a multi-core processor that concurrently processes instructions from multiple threads, one thread per processing unit (called core). It also differs from a pipelined processor, where the multiple instructions can concurrently be in various stages of execution, assembly-line fashion.

The various alternative techniques are not mutually exclusive; they can be (and frequently are) combined in a single processor. Thus a multicore CPU is possible where each core is an independent processor containing multiple parallel pipelines, each pipeline being superscalar. Some processors also include vector capability.

5 See also

Out-of-order execution
Super-threading
Simultaneous multithreading (SMT)
Speculative execution / Eager execution
Software lockout, a multiprocessor issue similar to logic dependencies on superscalars
Shelving buffer

6 References

Mike Johnson, Superscalar Microprocessor Design, Prentice-Hall, 1991, ISBN 0-13-875634-1
Sorin Cotofana, Stamatis Vassiliadis, "On the Design Complexity of the Issue Logic of Superscalar Machines", EUROMICRO 1998: 10277-10284
Steven McGeady, "The i960CA SuperScalar Implementation of the 80960 Architecture", IEEE 1990, pp. 232-240
Steven McGeady, et al., "Performance Enhancements in the Superscalar i960MM Embedded Microprocessor", ACM Proceedings of the 1991 Conference on Computer Architecture (Compcon), 1991, pp. 4-7

7 External links

Eager Execution / Dual Path / Multiple Path, by Mark Smotherman
8 Text and image sources, contributors, and licenses

8.1 Text
Superscalar processor Source: https://en.wikipedia.org/wiki/Superscalar_processor?oldid=735564047 Contributors: Damian Yerrick, The Anome, Roadrunner, SimonP, Rade Kutil, Maury Markowitz, 7265, Zoicon5, AnthonyQBachler, Hadal, SpellBott, DavidCary, Dratman, VampWillow, Phe, Discospinster, Smyth, Chub~enwiki, Dyl, ZeroOne, Uli, CanisRufus, Drhex, Sleske, Liao, EmmetCaulfield, Bookandcoffee, Kenyon, Ruud Koot, Hdante, Qwertyus, Kbdank71, FlaBot, Ffaarr, LAk loho, Gurch, BMF81, YurikBot, Borgx, Długosz, Blitterbug, Cal guy, Rwwww, Eskimbot, QTCaptain, Frap, JonHarder, Joema, Valenciano, Daniel Santos, Fuzzbox, 16@r, Hgrobe, Mike Fikes, JLD, Rud Almeida, Thijs!bot, Kubanczyk, JAnDbot, .anacondabot, Destynova, TechnoFaye, R'n'B, Anomen, Su-steve, Ksinkar, Nxavar, Retiono Virginian, Kvangend, AlleborgoBot, PanagosTheOther, Gerakibot, Hello.sanjay, Oxymoron83, Pmcgrane, WakingLili, Rilak, Houyi, Addbot, Cst17, Lightbot, Ptbotgourou, Nallimbot, Kcubeice, AnomieBOT, Techdoode, Flewis, Materialscientist, ArthurBot, JimVC3, Jeffrey Mall, Omnipaedista, Miyagawa, Zxb, Arndbergmann, Maggyero, Atyatya, EmausBot, WikitanvirBot, Dewritech, Klbrain, Cogiati, ClueBot NG, Helpful Pixie Bot, Wbm1058, IronOak, Brian Tomasik, Paul A. Clayton, GlennHK, Numbermaniac, Hamoudafg, Wandering Logic, Latvia Man, ThunderGodMod, L9G45AT0, Samhita Vasu and Anonymous: 86
8.2 Images
File:Commons-logo.svg Source: https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: PD Contributors: ? Original artist: ?
File:Folder_Hexagonal_Icon.svg Source: https://upload.wikimedia.org/wikipedia/en/4/48/Folder_Hexagonal_Icon.svg License: Cc-by-sa-3.0 Contributors: ? Original artist: ?
File:Processor_board_cray-2_hg.jpg Source: https://upload.wikimedia.org/wikipedia/commons/c/c2/Processor_board_cray-2_hg.jpg License: CC BY-SA 2.5 Contributors: Own work Original artist: Hannes Grobe & Chresten Wbber, Alfred Wegener Institute for Polar and Marine Research, Bremerhaven, Germany
File:Superscalarpipeline.svg Source: https://upload.wikimedia.org/wikipedia/commons/4/46/Superscalarpipeline.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Amit6, original version (File:Superscalarpipeline.png) by User:Poil
8.3 Content license

Creative Commons Attribution-Share Alike 3.0
