Computer Organization
and Architecture
8th Edition
Chapter 6
External Memory
Types of External Memory
• Magnetic Disk
—RAID
—Removable
• Optical
—CD-ROM
—CD-Recordable (CD-R)
—CD-R/W
—DVD
• Magnetic Tape
Magnetic Disk
• Disk substrate coated with magnetizable
material (iron oxide…rust)
• Substrate used to be aluminium
• Now glass
—Improved surface uniformity
– Increases reliability
—Reduction in surface defects
– Reduced read/write errors
—Lower flight heights (See later)
—Better stiffness
—Better shock/damage resistance
Read and Write Mechanisms
• Recording & retrieval via conductive coil called a head
• May be single read/write head or separate ones
• During read/write, head is stationary, platter rotates
• Write
— Current through coil produces magnetic field
— Pulses sent to head
— Magnetic pattern recorded on surface below
• Read (traditional)
— Magnetic field moving relative to coil produces current
— Coil is the same for read and write
• Read (contemporary)
— Separate read head, close to write head
— Partially shielded magnetoresistive (MR) sensor
— Electrical resistance depends on direction of magnetic field
— High frequency operation
– Higher storage density and speed
Inductive Write MR Read
Data Organization and Formatting
• Concentric rings or tracks
—Gaps between tracks
—Reduce gap to increase capacity
—Same number of bits per track (variable
packing density)
—Constant angular velocity
• Tracks divided into sectors
• Minimum block size is one sector
• May have more than one sector per block
Disk Data Layout
Disk Velocity
• Bit near centre of rotating disk passes fixed point
slower than bit on outside of disk
• Increase spacing between bits in different tracks
• Rotate disk at constant angular velocity (CAV)
— Gives pie shaped sectors and concentric tracks
— Individual tracks and sectors addressable
— Move head to given track and wait for given sector
— Waste of space on outer tracks
– Lower data density
• Can use zones to increase capacity
— Each zone has fixed bits per track
— More complex circuitry
Disk Layout Methods Diagram
Finding Sectors
• Must be able to identify start of track and
sector
• Format disk
—Additional information not available to user
—Marks tracks and sectors
Winchester Disk Format
Seagate ST506
Characteristics
• Fixed (rare) or movable head
• Removable or fixed
• Single or double (usually) sided
• Single or multiple platter
• Head mechanism
—Contact (Floppy)
—Fixed gap
—Flying (Winchester)
Fixed/Movable Head Disk
• Fixed head
—One read write head per track
—Heads mounted on fixed rigid arm
• Movable head
—One read write head per side
—Mounted on a movable arm
Removable or Not
• Removable disk
—Can be removed from drive and replaced with
another disk
—Provides unlimited storage capacity
—Easy data transfer between systems
• Nonremovable disk
—Permanently mounted in the drive
Multiple Platter
• One head per side
• Heads are joined and aligned
• Aligned tracks on each platter form
cylinders
• Data is striped by cylinder
—reduces head movement
—Increases speed (transfer rate)
Multiple Platters
Tracks and Cylinders
Floppy Disk
• 8”, 5.25”, 3.5”
• Small capacity
—Up to 1.44Mbyte (2.88M never popular)
• Slow
• Universal
• Cheap
• Obsolete?
Winchester Hard Disk (1)
• Developed by IBM ("Winchester" was the project code name)
• Sealed unit
• One or more platters (disks)
• Heads fly on boundary layer of air as disk
spins
• Very small head to disk gap
• Getting more robust
Winchester Hard Disk (2)
• Universal
• Cheap
• Fastest external storage
• Getting larger all the time
—250 Gigabyte now easily available
Speed
• Seek time
—Moving head to correct track
• (Rotational) latency
—Waiting for data to rotate under head
• Access time = Seek + Latency
• Transfer rate
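The access-time relation above can be tried out with a small sketch. It assumes the usual convention that average rotational latency is half a revolution; the function names and the sample drive figures are illustrative, not taken from the slides.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average latency is the time for half a revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def access_time_ms(seek_ms: float, rpm: float) -> float:
    """Access time = seek time + (average) rotational latency."""
    return seek_ms + average_rotational_latency_ms(rpm)


if __name__ == "__main__":
    # Hypothetical drive: 4 ms average seek, 7200 rpm.
    print(f"access time: {access_time_ms(4.0, 7200):.2f} ms")
```

Transfer time is left out here because the slide lists transfer rate separately and gives no formula for it.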
Timing of Disk I/O Transfer
RAID
• Redundant Array of Independent Disks
• Redundant Array of Inexpensive Disks
• 6 levels in common use
• Not a hierarchy
• Set of physical disks viewed as single
logical drive by O/S
• Data distributed across physical drives
• Can use redundant capacity to store
parity information
RAID 0
• No redundancy
• Data striped across all disks
• Round Robin striping
• Increase speed
—Multiple data requests probably not on same
disk
—Disks seek in parallel
—A set of data is likely to be striped across
multiple disks
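Round robin striping can be sketched as a simple mapping from a logical strip number to a disk and a position on that disk. The function name and the four-disk example are assumptions for illustration only.

```python
def locate_strip(logical_strip: int, num_disks: int) -> tuple[int, int]:
    """Map a logical strip number to (disk index, strip index on that disk)."""
    disk = logical_strip % num_disks   # round robin over the disks
    row = logical_strip // num_disks   # position of the strip on that disk
    return disk, row


if __name__ == "__main__":
    # Consecutive logical strips land on different disks, so they can be
    # read or written in parallel.
    for strip in range(8):
        print(strip, locate_strip(strip, num_disks=4))
```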
RAID 1
• Mirrored Disks
• Data is striped across disks
• 2 copies of each stripe on separate disks
• Read from either
• Write to both
• Recovery is simple
—Swap faulty disk & re-mirror
—No down time
• Expensive
RAID 2
• Disks are synchronized
• Very small stripes
—Often single byte/word
• Error correction calculated across
corresponding bits on disks
• Multiple parity disks store Hamming code
error correction in corresponding positions
• Lots of redundancy
—Expensive
—Not used
RAID 3
• Similar to RAID 2
• Only one redundant disk, no matter how
large the array
• Simple parity bit for each set of
corresponding bits
• Data on failed drive can be reconstructed
from surviving data and parity info
• Very high transfer rates
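Reconstruction from parity can be illustrated with bytewise XOR: the parity block is the XOR of the data blocks, so any one missing block is the XOR of everything that survives. This is a minimal sketch with made-up block contents, not a model of a real controller.

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks (one block per disk)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(surviving + [parity])


if __name__ == "__main__":
    data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]   # three data disks
    parity = xor_parity(data)                        # the redundant disk
    rebuilt = reconstruct([data[0], data[2]], parity)
    print(rebuilt == data[1])
```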
RAID 4
• Each disk operates independently
• Good for high I/O request rate
• Large stripes
• Bit by bit parity calculated across stripes
on each disk
• Parity stored on parity disk
RAID 5
• Like RAID 4
• Parity striped across all disks
• Round robin allocation for parity stripe
• Avoids RAID 4 bottleneck at parity disk
• Commonly used in network servers
Internet Resources
• Optical Storage Technology Association
—Good source of information about optical
storage technology and vendors
—Extensive list of relevant links
• DLTtape
—Good collection of technical information and
links to vendors
• Search on RAID
William Stallings
Chapter 5
Internal Memory
Semiconductor Memory Types
Memory Type | Category | Erasure | Write Mechanism | Volatility
Random-access memory (RAM) | Read-write memory | Electrically, byte-level | Electrically | Volatile
Read-only memory (ROM) | Read-only memory | Not possible | Masks | Nonvolatile
Programmable ROM (PROM) | Read-only memory | Not possible | Electrically | Nonvolatile
Erasable PROM (EPROM) | Read-mostly memory | UV light, chip-level | Electrically | Nonvolatile
Chapter 4
Cache Memory
Characteristics
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location
• CPU
• Internal
• External
Capacity
• Word size
—The natural unit of organisation
• Number of words
—or Bytes
Unit of Transfer
• Internal
—Usually governed by data bus width
• External
—Usually a block which is much larger than a
word
• Addressable unit
—Smallest location which can be uniquely
addressed
—Word internally
—Cluster on Microsoft disks
Access Methods (1)
• Sequential
—Start at the beginning and read through in
order
—Access time depends on location of data and
previous location
—e.g. tape
• Direct
—Individual blocks have unique address
—Access is by jumping to vicinity plus
sequential search
—Access time depends on location and previous
location
—e.g. disk
Access Methods (2)
• Random
—Individual addresses identify locations exactly
—Access time is independent of location or
previous access
—e.g. RAM
• Associative
—Data is located by a comparison with contents
of a portion of the store
—Access time is independent of location or
previous access
—e.g. cache
Memory Hierarchy
• Registers
—In CPU
• Internal or Main memory
—May include one or more levels of cache
—“RAM”
• External memory
—Backing store
Memory Hierarchy - Diagram
Performance
• Access time
—Time between presenting the address and
getting the valid data
• Memory Cycle time
—Time may be required for the memory to
“recover” before next access
—Cycle time is access + recovery
• Transfer Rate
—Rate at which data can be moved
Physical Types
• Semiconductor
—RAM
• Magnetic
—Disk & Tape
• Optical
—CD & DVD
• Others
—Bubble
—Hologram
Physical Characteristics
• Decay
• Volatility
• Erasable
• Power consumption
Organisation
• Physical arrangement of bits into words
• Not always obvious
• e.g. interleaved
The Bottom Line
• How much?
—Capacity
• How fast?
—Time is money
• How expensive?
Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape
So you want fast?
• It is possible to build a computer which
uses only static RAM (see later)
• This would be very fast
• This would need no cache
—How can you cache cache?
• This would cost a very large amount
Locality of Reference
• During the course of the execution of a
program, memory references tend to
cluster
• e.g. loops
Cache
• Small amount of fast memory
• Sits between normal main memory and
CPU
• May be located on CPU chip or module
Cache and Main Memory
Cache/Main Memory Structure
Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from
main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which
block of main memory is in each cache
slot
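The read sequence above can be sketched as a toy cache that keeps whole blocks keyed by tag. It is deliberately simplified: the class name is hypothetical, the cache has unlimited capacity, and there is no replacement policy or write handling.

```python
class SimpleCache:
    """Toy cache: maps a block number (used as the tag) to a copy of the block."""

    def __init__(self, memory: list[int], block_size: int = 4):
        self.memory = memory
        self.block_size = block_size
        self.lines: dict[int, list[int]] = {}   # tag -> cached block
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        tag, offset = divmod(address, self.block_size)
        if tag in self.lines:
            self.hits += 1                       # present: serve from cache
        else:
            self.misses += 1                     # absent: fetch whole block
            start = tag * self.block_size
            self.lines[tag] = self.memory[start:start + self.block_size]
        return self.lines[tag][offset]           # always deliver from cache


if __name__ == "__main__":
    cache = SimpleCache(memory=list(range(100, 116)))
    print(cache.read(5), cache.read(6), cache.hits, cache.misses)
```

The second read hits because address 6 lies in the block that the first read brought in.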
Cache Read Operation - Flowchart
Cache Design
• Addressing
• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches
Cache Addressing
• Where does cache sit?
— Between processor and virtual memory management
unit
— Between MMU and main memory
• Logical cache (virtual cache) stores data using
virtual addresses
— Processor accesses cache directly, not through physical
cache
— Cache access faster, before MMU address translation
— Virtual addresses use same address space for different
applications
– Must flush cache on each context switch
• Physical cache stores data using main memory
physical addresses
Size does matter
• Cost
—More cache is expensive
• Speed
—More cache is faster (up to a point)
—Checking cache for data takes time
Typical Cache Organization
Comparison of Cache Sizes
Processor | Type | Year of Introduction | L1 cache | L2 cache | L3 cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 KB | — | —
PDP-11/70 | Minicomputer | 1975 | 1 KB | — | —
VAX 11/780 | Minicomputer | 1978 | 16 KB | — | —
IBM 3033 | Mainframe | 1978 | 64 KB | — | —
IBM 3090 | Mainframe | 1985 | 128 to 256 KB | — | —
Intel 80486 | PC | 1989 | 8 KB | — | —
Pentium | PC | 1993 | 8 KB/8 KB | 256 to 512 KB | —
PowerPC 601 | PC | 1993 | 32 KB | — | —
PowerPC 620 | PC | 1996 | 32 KB/32 KB | — | —
PowerPC G4 | PC/server | 1999 | 32 KB/32 KB | 256 KB to 1 MB | 2 MB
IBM S/390 G4 | Mainframe | 1997 | 32 KB | 256 KB | 2 MB
IBM S/390 G6 | Mainframe | 1999 | 256 KB | 8 MB | —
Pentium 4 | PC/server | 2000 | 8 KB/8 KB | 256 KB | —
IBM SP | High-end server/supercomputer | 2000 | 64 KB/32 KB | 8 MB | —
CRAY MTA | Supercomputer | 2000 | 8 KB | 2 MB | —
Itanium | PC/server | 2001 | 16 KB/16 KB | 96 KB | 4 MB
SGI Origin 2001 | High-end server | 2001 | 32 KB/32 KB | 4 MB | —
Itanium 2 | PC/server | 2002 | 32 KB | 256 KB | 6 MB
IBM POWER5 | High-end server | 2003 | 64 KB | 1.9 MB | 36 MB
CRAY XD-1 | Supercomputer | 2004 | 64 KB/64 KB | 1 MB | —
Mapping Function
• Cache of 64kByte
• Cache block of 4 bytes
—i.e. cache is 16k (2^14) lines of 4 bytes
• 16MBytes main memory
• 24 bit address
—(2^24=16M)
Direct Mapping
• Each block of main memory maps to only
one cache line
—i.e. if a block is in cache, it must be in one
specific place
• Address is in two parts
• Least Significant w bits identify unique
word
• Most Significant s bits specify one
memory block
• The MSBs are split into a cache line field r
and a tag of s-r (most significant)
Direct Mapping
Address Structure
• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
— 8 bit tag (=22-14)
— 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
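The 8/14/2 split of the 24 bit address can be checked with a few shifts and masks. The constant and function names below are illustrative only.

```python
WORD_BITS = 2    # selects a byte within the 4 byte block
LINE_BITS = 14   # selects one of the 2^14 cache lines
TAG_BITS = 8     # remaining most significant bits


def split_direct(address: int) -> tuple[int, int, int]:
    """Split a 24 bit address into (tag, line, word) for direct mapping."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


if __name__ == "__main__":
    tag, line, word = split_direct(0x16339C)
    print(f"tag={tag:02X} line={line:04X} word={word}")
```

A lookup then indexes the cache with `line` and compares the stored tag with `tag`.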
Direct Mapping from Cache to Main Memory
Direct Mapping
Cache Line Table
Cache line | Main memory blocks assigned
0 | 0, m, 2m, …, 2^s – m
1 | 1, m+1, 2m+1, …, 2^s – m + 1
…
m–1 | m–1, 2m–1, 3m–1, …, 2^s – 1
Direct Mapping Cache Organization
Direct
Mapping
Example
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w)
words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w)/2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s – r) bits
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
—If a program accesses 2 blocks that map to
the same line repeatedly, cache misses are
very high
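The conflict case can be demonstrated by counting misses for an access pattern. This sketch assumes the 64 kByte, 4 byte block cache used in these slides (14 line bits, 2 word bits) and tracks only the tag stored in each line; the function name is hypothetical.

```python
WORD_BITS = 2
LINE_BITS = 14


def count_misses(addresses: list[int]) -> int:
    """Count misses in a direct mapped cache that starts out empty."""
    stored_tag: dict[int, int] = {}   # line number -> tag currently held
    misses = 0
    for address in addresses:
        line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
        tag = address >> (WORD_BITS + LINE_BITS)
        if stored_tag.get(line) != tag:
            misses += 1               # wrong block (or nothing) in this line
            stored_tag[line] = tag    # the new block evicts the old one
    return misses


if __name__ == "__main__":
    conflicting = [0x000000, 0x010000] * 4   # same line, different tags
    friendly = [0x000000, 0x000004] * 4      # different lines
    print(count_misses(conflicting), count_misses(friendly))
```

The two conflicting addresses evict each other on every access, while the friendly pair misses only on first use.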
Victim Cache
• Lower miss penalty
• Remember what was discarded
—Already fetched
—Use again with little penalty
• Fully associative
• 4 to 16 cache lines
• Between direct mapped L1 cache and next
memory level
Associative Mapping
• A main memory block can load into any
line of cache
• Memory address is interpreted as tag and
word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
Associative Mapping from
Cache to Main Memory
Fully Associative Cache Organization
Associative
Mapping
Example
Associative Mapping
Address Structure
Tag 22 bit | Word 2 bit
• 22 bit tag stored with each 32 bit block of data
• Compare tag field with tag entry in cache to
check for hit
• Least significant 2 bits of address identify which
byte is required from the 32 bit data block
• e.g.
— Address | Tag | Data | Cache line
— FFFFFC | FFFFFC | 24682468 | 3FFF
Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w)
words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w)/2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given
set
—e.g. Block B can be in any line of set i
• e.g. 2 lines per set
—2 way associative mapping
—A given block can be in one of 2 lines in only
one set
Set Associative Mapping
Example
• 13 bit set number
• Block number in main memory is modulo
2^13
• 000000, 008000, 010000, 018000 … map
to same set
Mapping From Main Memory to Cache:
v Associative
Mapping From Main Memory to Cache:
k-way Associative
K-Way Set Associative Cache
Organization
Set Associative Mapping
Address Structure
Tag 9 bit | Set 13 bit | Word 2 bit
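The 9/13/2 split can be checked in the same way as the direct mapped case. With 2 word bits and 13 set bits, addresses that differ by a multiple of 2^15 (hex 8000) fall in the same set. The function name is illustrative.

```python
WORD_BITS = 2
SET_BITS = 13


def split_set_assoc(address: int) -> tuple[int, int, int]:
    """Split a 24 bit address into (tag, set, word) for the 2 way example."""
    word = address & ((1 << WORD_BITS) - 1)
    set_index = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


if __name__ == "__main__":
    for address in (0x000000, 0x008000, 0x010000, 0x018000):
        tag, set_index, word = split_set_assoc(address)
        print(f"{address:06X}: tag={tag:03X} set={set_index:04X} word={word}")
```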
• Problem: Contention occurs when both the Instruction Prefetcher and
the Execution Unit simultaneously require access to the
cache. In that case, the Prefetcher is stalled while the
Execution Unit’s data access takes place.
• Solution: Create separate data and instruction caches.
• Processor on which feature first appears: Pentium
Core | Cache Type | Cache Size (kB) | Cache Line Size (words) | Associativity | Location | Write Buffer Size (words)
Chapter 3
Top Level View of Computer
Function and Interconnection
Program Concept
• Hardwired systems are inflexible
• General purpose hardware can do
different tasks, given correct control
signals
• Instead of re-wiring, supply a new set of
control signals
What is a program?
• A sequence of steps
• For each step, an arithmetic or logical
operation is done
• For each operation, a different set of
control signals is needed
Function of Control Unit
• For each operation a unique code is
provided
—e.g. ADD, MOVE
• A hardware segment accepts the code and
issues the control signals
• We have a computer!
Components
• The Control Unit and the Arithmetic and
Logic Unit constitute the Central
Processing Unit
• Data and instructions need to get into the
system and results out
—Input/output
• Temporary storage of code and results is
needed
—Main memory
Computer Components:
Top Level View
Instruction Cycle
• Two steps:
—Fetch
—Execute
Fetch Cycle
• Program Counter (PC) holds address of
next instruction to fetch
• Processor fetches instruction from
memory location pointed to by PC
• Increment PC
—Unless told otherwise
• Instruction loaded into Instruction
Register (IR)
• Processor interprets instruction and
performs required actions
Execute Cycle
• Processor-memory
—data transfer between CPU and main memory
• Processor I/O
—Data transfer between CPU and I/O module
• Data processing
—Some arithmetic or logical operation on data
• Control
—Alteration of sequence of operations
—e.g. jump
• Combination of above
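The fetch and execute steps can be sketched as a loop over a toy machine. Everything specific here is an assumption made for illustration: 16 bit instructions with a 4 bit opcode and 12 bit address, and three opcodes (1 = load accumulator, 2 = store accumulator, 5 = add to accumulator).

```python
def run(memory: dict[int, int], pc: int, steps: int) -> int:
    """Run `steps` instruction cycles and return the accumulator."""
    ac = 0
    for _ in range(steps):
        ir = memory[pc]                  # fetch: instruction at PC goes to IR
        pc += 1                          # increment PC
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == 0x1:                # execute: load AC from memory
            ac = memory[address]
        elif opcode == 0x2:              # execute: store AC to memory
            memory[address] = ac
        elif opcode == 0x5:              # execute: add memory word to AC
            ac = (ac + memory[address]) & 0xFFFF
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return ac


if __name__ == "__main__":
    memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
              0x940: 3, 0x941: 2}
    print(run(memory, pc=0x300, steps=3), memory[0x941])
```

The three instructions load the word at 940, add the word at 941, and store the sum back to 941.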
Example of Program Execution
Instruction Cycle State Diagram
Interrupts
• Mechanism by which other modules (e.g.
I/O) may interrupt normal sequence of
processing
• Program
—e.g. overflow, division by zero
• Timer
—Generated by internal processor timer
—Used in pre-emptive multi-tasking
• I/O
—from I/O controller
• Hardware failure
—e.g. memory parity error
Program Flow Control
Interrupt Cycle
• Added to instruction cycle
• Processor checks for interrupt
—Indicated by an interrupt signal
• If no interrupt, fetch next instruction
• If interrupt pending:
—Suspend execution of current program
—Save context
—Set PC to start address of interrupt handler
routine
—Process interrupt
—Restore context and continue interrupted
program
Transfer of Control via Interrupts
Instruction Cycle with Interrupts
Program Timing
Short I/O Wait
Program Timing
Long I/O Wait
Instruction Cycle (with Interrupts) -
State Diagram
Multiple Interrupts
• Disable interrupts
—Processor will ignore further interrupts whilst
processing one interrupt
—Interrupts remain pending and are checked
after first interrupt has been processed
—Interrupts handled in sequence as they occur
• Define priorities
—Low priority interrupts can be interrupted by
higher priority interrupts
—When higher priority interrupt has been
processed, processor returns to previous
interrupt
Multiple Interrupts - Sequential
Multiple Interrupts – Nested
Time Sequence of Multiple Interrupts
Connecting
• All the units must be connected
• Different type of connection for different
type of unit
—Memory
—Input/Output
—CPU
Computer Modules
Memory Connection
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
—Read
—Write
—Timing
Input/Output Connection(1)
• Similar to memory from computer’s
viewpoint
• Output
—Receive data from computer
—Send data to peripheral
• Input
—Receive data from peripheral
—Send data to computer
Input/Output Connection(2)
• Receive control signals from computer
• Send control signals to peripherals
—e.g. spin disk
• Receive addresses from computer
—e.g. port number to identify peripheral
• Send interrupt signals (control)
CPU Connection
• Reads instruction and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (& acts on) interrupts
Buses
• There are a number of possible
interconnection systems
• Single and multiple BUS structures are
most common
• e.g. Control/Address/Data bus (PC)
• e.g. Unibus (DEC-PDP)
What is a Bus?
• A communication pathway connecting two
or more devices
• Usually broadcast
• Often grouped
—A number of channels in one bus
—e.g. 32 bit data bus is 32 separate single bit
channels
• Power lines may not be shown
Data Bus
• Carries data
—Remember that there is no difference between
“data” and “instruction” at this level
• Width is a key determinant of
performance
—8, 16, 32, 64 bit
Address bus
• Identify the source or destination of data
• e.g. CPU needs to read an instruction
(data) from a given location in memory
• Bus width determines maximum memory
capacity of system
—e.g. 8080 has 16 bit address bus giving 64k
address space
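The link between address bus width and address space is a power of two, which a one-line function makes concrete. The function name is illustrative.

```python
def address_space(width_bits: int) -> int:
    """Number of distinct addresses an address bus of this width can express."""
    return 1 << width_bits


if __name__ == "__main__":
    for width in (16, 24, 32):
        print(width, address_space(width))
```

A 16 bit bus gives 65,536 addresses, which is the 64k figure quoted for the 8080.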
Control Bus
• Control and timing information
—Memory read/write signal
—Interrupt request
—Clock signals
Bus Interconnection Scheme
Big and Yellow?
• What do buses look like?
—Parallel lines on circuit boards
—Ribbon cables
—Strip connectors on mother boards
– e.g. PCI
—Sets of wires
Physical Realization of Bus Architecture
Single Bus Problems
• Lots of devices on one bus leads to:
—Propagation delays
– Long data paths mean that co-ordination of bus use
can adversely affect performance
—Bottleneck if aggregate data transfer demand approaches bus capacity
• Most systems use multiple buses to
overcome these problems
Traditional (ISA)
(with cache)
High Performance Bus
Bus Types
• Dedicated
—Separate data & address lines
• Multiplexed
—Shared lines
—Address valid or data valid control line
—Advantage - fewer lines
—Disadvantages
– More complex control
– Reduced ultimate performance
Bus Arbitration
• More than one module controlling the bus
• e.g. CPU and DMA controller
• Only one module may control bus at one
time
• Arbitration may be centralised or
distributed
Centralised or Distributed Arbitration
• Centralised
—Single hardware device controlling bus access
– Bus Controller
– Arbiter
—May be part of CPU or separate
• Distributed
—Each module may claim the bus
—Control logic on all modules
Timing
• Co-ordination of events on bus
• Synchronous
—Events determined by clock signals
—Control Bus includes clock line
—A single 1-0 transition on the clock line is a bus cycle
—All devices can read clock line
—Usually sync on leading edge
—Usually a single cycle for an event
Synchronous Timing Diagram
Asynchronous Timing – Read Diagram
Asynchronous Timing – Write Diagram
PCI Bus
• Peripheral Component Interconnect
• Intel released to public domain
• 32 or 64 bit
• 50 lines
PCI Bus Lines (required)
• Systems lines
—Including clock and reset
• Address & Data
—32 time mux lines for address/data
—Interpret & validate lines
• Interface Control
• Arbitration
—Not shared
—Direct connection to PCI bus arbiter
• Error lines
PCI Bus Lines (Optional)
• Interrupt lines
—Not shared
• Cache support
• 64-bit Bus Extension
—Additional 32 lines
—Time multiplexed
—2 lines to enable devices to agree to use 64-
bit transfer
• JTAG/Boundary Scan
—For testing procedures
PCI Commands
• Transaction between initiator (master)
and target
• Master claims bus
• Determine type of transaction
—e.g. I/O read/write
• Address phase
• One or more data phases
PCI Read Timing Diagram
PCI Bus Arbiter
PCI Bus Arbitration
Foreground Reading
• Stallings, chapter 3 (all of it)
• www.pcguide.com/ref/mbsys/buses/
Chapter 2
Computer Evolution and
Performance
ENIAC - background
• Electronic Numerical Integrator And
Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
—Too late for war effort
• Used until 1955
ENIAC - details
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 1,500 square feet
• 140 kW power consumption
• 5,000 additions per second
von Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from
memory and executing
• Input and output equipment operated by
control unit
• Princeton Institute for Advanced Studies
—IAS
• Completed 1952
Structure of von Neumann machine
IAS - details
• 1000 x 40 bit words
—Binary number
—2 x 20 bit instructions
• Set of registers (storage in CPU)
—Memory Buffer Register
—Memory Address Register
—Instruction Register
—Instruction Buffer Register
—Program Counter
—Accumulator
—Multiplier Quotient
Structure of IAS –
detail
Commercial Computers
• 1947 - Eckert-Mauchly Computer
Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
—Faster
—More memory
IBM
• Punched-card processing equipment
• 1953 - the 701
—IBM’s first stored program computer
—Scientific calculations
• 1955 - the 702
—Business applications
• Led to 700/7000 series
Transistors
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell Labs
• William Shockley et al.
Transistor Based Computers
• Second generation machines
• NCR & RCA produced small transistor
machines
• IBM 7000
• DEC - 1957
—Produced PDP-1
Microelectronics
• Literally - “small electronics”
• A computer is made up of gates, memory
cells and interconnections
• These can be manufactured on a
semiconductor
• e.g. silicon wafer
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
—Up to 100 devices on a chip
• Medium scale integration - to 1971
—100-3,000 devices on a chip
• Large scale integration - 1971-1977
—3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
—100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 -
—Over 100,000,000 devices on a chip
Moore’s Law
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Number of transistors on a chip will double every
year
• Since 1970’s development has slowed a little
— Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical
paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increases reliability
Growth in CPU Transistor Count
IBM 360 series
• 1964
• Replaced (& not compatible with) 7000
series
• First planned “family” of computers
—Similar or identical instruction sets
—Similar or identical O/S
—Increasing speed
—Increasing number of I/O ports (i.e. more
terminals)
—Increased memory size
—Increased cost
• Multiplexed switch structure
DEC PDP-8
• 1964
• First minicomputer (after miniskirt!)
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000
—$100k+ for IBM 360
• Embedded applications & OEM
• BUS STRUCTURE
DEC - PDP-8 Bus Structure
Semiconductor Memory
• 1970
• Fairchild
• Size of a single core
—i.e. 1 bit of magnetic core storage
• Holds 256 bits
• Non-destructive read
• Much faster than core
• Capacity approximately doubles each year
Intel
• 1971 - 4004
—First microprocessor
—All CPU components on a single chip
—4 bit
• Followed in 1972 by 8008
—8 bit
—Both designed for specific applications
• 1974 - 8080
—Intel’s first general purpose microprocessor
Speeding it up
• Pipelining
• On board cache
• On board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution
Performance Balance
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor
speed
Logic and Memory Performance Gap
Solutions
• Increase number of bits retrieved at one
time
—Make DRAM “wider” rather than “deeper”
• Change DRAM interface
—Cache
• Reduce frequency of memory access
—More complex cache and cache on chip
• Increase interconnection bandwidth
—High speed buses
—Hierarchy of buses
I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• Problem moving data
• Solutions:
—Caching
—Buffering
—Higher-speed interconnection buses
—More elaborate bus structures
—Multiple-processor configurations
Typical I/O Device Data Rates
Key is Balance
• Processor components
• Main memory
• I/O devices
• Interconnection structures
Improvements in Chip Organization and
Architecture
• Increase hardware speed of processor
—Fundamentally due to shrinking logic gate size
– More gates, packed more tightly, increasing clock
rate
– Propagation time for signals reduced
• Increase size and speed of caches
—Dedicating part of processor chip
– Cache access times drop significantly
• Change processor organization and
architecture
—Increase effective speed of execution
—Parallelism
Problems with Clock Speed and Logic
Density
• Power
— Power density increases with density of logic and clock
speed
— Dissipating heat
• RC delay
— Speed at which electrons flow limited by resistance and
capacitance of metal wires connecting them
— Delay increases as RC product increases
— Wire interconnects thinner, increasing resistance
— Wires closer together, increasing capacitance
• Memory latency
— Memory speeds lag processor speeds
• Solution:
— More emphasis on organizational and architectural
approaches
Intel Microprocessor Performance
Increased Cache Capacity
• Typically two or three levels of cache
between processor and main memory
• Chip density increased
—More cache memory on chip
– Faster cache access
• Pentium chip devoted about 10% of chip
area to cache
• Pentium 4 devotes about 50%
More Complex Execution Logic
• Enable parallel execution of instructions
• Pipeline works like assembly line
—Different stages of execution of different
instructions at same time along pipeline
• Superscalar allows multiple pipelines
within single processor
—Instructions that do not depend on one
another can be executed in parallel
Diminishing Returns
• Internal organization of processors
complex
—Can get a great deal of parallelism
—Further significant increases likely to be
relatively modest
• Benefits from cache are reaching limit
• Increasing clock rate runs into power
dissipation problem
—Some fundamental physical limits are being
reached
New Approach – Multiple Cores
• Multiple processors on single chip
— Large shared cache
• Within a processor, increase in performance
proportional to square root of increase in
complexity
• If software can use multiple processors, doubling
number of processors almost doubles
performance
• So, use two simpler processors on the chip rather
than one more complex processor
• With two processors, larger caches are justified
— Power consumption of memory logic less than
processing logic
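The argument for two simpler processors can be put into numbers. This is a rough sketch of the square root rule of thumb stated above; the `scaling` parameter is a hypothetical knob for how well the software uses the extra processor.

```python
import math


def single_core_gain(complexity_factor: float) -> float:
    """Rule of thumb: performance grows with the square root of complexity."""
    return math.sqrt(complexity_factor)


def multi_core_gain(cores: int, scaling: float = 1.0) -> float:
    """Gain from `cores` simple processors; scaling < 1 models 'almost doubles'."""
    return 1.0 + (cores - 1) * scaling


if __name__ == "__main__":
    print(f"one core, 2x complexity: {single_core_gain(2.0):.2f}x")
    print(f"two simple cores:        {multi_core_gain(2, scaling=0.9):.2f}x")
```

Spending the same extra logic on a second core beats spending it on a more complex core, provided the software can use both.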
x86 Evolution (1)
• 8080
— first general purpose microprocessor
— 8 bit data path
— Used in first personal computer – Altair
• 8086 – 5MHz – 29,000 transistors
— much more powerful
— 16 bit
— instruction cache, prefetch few instructions
— 8088 (8 bit external bus) used in first IBM PC
• 80286
— 16 Mbyte memory addressable
— up from 1 Mbyte
• 80386
— 32 bit
— Support for multitasking
• 80486
— sophisticated powerful cache and instruction pipelining
— built in maths co-processor
x86 Evolution (2)
• Pentium
— Superscalar
— Multiple instructions executed in parallel
• Pentium Pro
— Increased superscalar organization
— Aggressive register renaming
— branch prediction
— data flow analysis
— speculative execution
• Pentium II
— MMX technology
— graphics, video & audio processing
• Pentium III
— Additional floating point instructions for 3D graphics
x86 Evolution (3)
• Pentium 4
— Note Arabic rather than Roman numerals
— Further floating point and multimedia enhancements
• Core
— First x86 with dual core
• Core 2
— 64 bit architecture
• Core 2 Quad – 3GHz – 820 million transistors
— Four processors on chip
• Conclusions
— If f is small, parallel processors have little effect
— As N -> ∞, speedup is bounded by 1/(1 – f)
– Diminishing returns for using more processors
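These conclusions follow from Amdahl's law (see AMDA67 in the references). The formula is not reproduced on this slide, so the sketch below assumes the standard form: speedup = 1 / ((1 – f) + f/N), where f is the fraction of the work that can be parallelised and N is the number of processors.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n processors when a fraction f of the work is parallelisable."""
    return 1.0 / ((1.0 - f) + f / n)


def speedup_limit(f: float) -> float:
    """Upper bound on speedup as the number of processors grows without limit."""
    return 1.0 / (1.0 - f)


if __name__ == "__main__":
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
    print("limit", round(speedup_limit(0.9), 2))
```

With f = 0.9 the speedup creeps towards 10 however many processors are added, which is the diminishing return the slide describes.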
Internet Resources
• http://www.intel.com/
—Search for the Intel Museum
• http://www.ibm.com
• http://www.dec.com
• Charles Babbage Institute
• PowerPC
• Intel Developer Home
References
• AMDA67 Amdahl, G. “Validity of the
Single-Processor Approach to Achieving
Large-Scale Computing Capability”,
Proceedings of the AFIPS Conference,
1967.
Chapter 1
Introduction
Architecture & Organization 1
• Architecture is those attributes visible to
the programmer
—Instruction set, number of bits used for data
representation, I/O mechanisms, addressing
techniques.
—e.g. Is there a multiply instruction?
• Organization is how features are
implemented
—Control signals, interfaces, memory
technology.
—e.g. Is there a hardware multiply unit or is it
done by repeated addition?
Architecture & Organization 2
• All Intel x86 family share the same basic
architecture
• The IBM System/370 family share the
same basic architecture
[Figure: top-level structure. The Computer contains the Central Processing Unit, Main Memory, Input/Output and the Systems Interconnection; Peripherals and Communication lines sit outside it.]
Structure - The CPU
[Figure: the Computer (I/O, Memory, System Bus, CPU), with the CPU expanded into Registers, Arithmetic and Logic Unit, Control Unit and Internal CPU Interconnection.]
Structure - The Control Unit
[Figure: the CPU (ALU, Registers, Internal Bus, Control Unit), with the Control Unit expanded into Sequencing Logic, Control Unit Registers and Decoders, and Control Memory.]
Outline of the Book (1)
• Computer Evolution and Performance
• Computer Interconnection Structures
• Internal Memory
• External Memory
• Input/Output
• Operating Systems Support
• Computer Arithmetic
• Instruction Sets
Outline of the Book (2)
• CPU Structure and Function
• Reduced Instruction Set Computers
• Superscalar Processors
• Control Unit Operation
• Microprogrammed Control
• Multiprocessors and Vector Processing
• Digital Logic (Appendix)
Internet Resources
- Web site for book
• http://WilliamStallings.com/COA/COA7e.html
— links to sites of interest
— links to sites for courses that use the book
— errata list for book
— information on other books by W. Stallings
• http://WilliamStallings.com/StudentSupport.html
— Math
— How-to
— Research resources
— Misc
Internet Resources
- Web sites to look for
• WWW Computer Architecture Home Page
• CPU Info Center
• Processor Emporium
• ACM Special Interest Group on Computer
Architecture
• IEEE Technical Committee on Computer
Architecture
• Intel Technology Journal
• Manufacturer’s sites
—Intel, IBM, etc.
Internet Resources
- Usenet News Groups
• comp.arch
• comp.arch.arithmetic
• comp.arch.storage
• comp.parallel