Computer Book 1

William Stallings
Computer Organization
and Architecture
8th Edition
Chapter 6
External Memory
Types of External Memory
• Magnetic Disk
—RAID
—Removable
• Optical
—CD-ROM
—CD-Recordable (CD-R)
—CD-R/W
—DVD
• Magnetic Tape
Magnetic Disk
• Disk substrate coated with magnetizable
material (iron oxide…rust)
• Substrate used to be aluminium
• Now glass
—Improved surface uniformity
– Increases reliability
—Reduction in surface defects
– Reduced read/write errors
—Lower flight heights (See later)
—Better stiffness
—Better shock/damage resistance
Read and Write Mechanisms
• Recording & retrieval via conductive coil called a head
• May be single read/write head or separate ones
• During read/write, head is stationary, platter rotates
• Write
— Current through coil produces magnetic field
— Pulses sent to head
— Magnetic pattern recorded on surface below
• Read (traditional)
— Magnetic field moving relative to coil produces current
— Coil is the same for read and write
• Read (contemporary)
— Separate read head, close to write head
— Partially shielded magnetoresistive (MR) sensor
— Electrical resistance depends on direction of magnetic field
— High frequency operation
– Higher storage density and speed
Inductive Write MR Read
Data Organization and Formatting
• Concentric rings or tracks
—Gaps between tracks
—Reduce gap to increase capacity
—Same number of bits per track (variable
packing density)
—Constant angular velocity
• Tracks divided into sectors
• Minimum block size is one sector
• May have more than one sector per block
Disk Data Layout
Disk Velocity
• Bit near centre of rotating disk passes fixed point
slower than bit on outside of disk
• Increase spacing between bits in different tracks
• Rotate disk at constant angular velocity (CAV)
— Gives pie shaped sectors and concentric tracks
— Individual tracks and sectors addressable
— Move head to given track and wait for given sector
— Waste of space on outer tracks
– Lower data density
• Can use zones to increase capacity
— Each zone has fixed bits per track
— More complex circuitry
Disk Layout Methods Diagram
Finding Sectors
• Must be able to identify start of track and
sector
• Format disk
—Additional information not available to user
—Marks tracks and sectors
Winchester Disk Format
Seagate ST506
Characteristics
• Fixed (rare) or movable head
• Removable or fixed
• Single or double (usually) sided
• Single or multiple platter
• Head mechanism
—Contact (Floppy)
—Fixed gap
—Flying (Winchester)
Fixed/Movable Head Disk
• Fixed head
—One read/write head per track
—Heads mounted on fixed rigid arm
• Movable head
—One read/write head per side
—Mounted on a movable arm
Removable or Not
• Removable disk
—Can be removed from drive and replaced with
another disk
—Provides unlimited storage capacity
—Easy data transfer between systems
• Nonremovable disk
—Permanently mounted in the drive
Multiple Platter
• One head per side
• Heads are joined and aligned
• Aligned tracks on each platter form
cylinders
• Data is striped by cylinder
—reduces head movement
—Increases speed (transfer rate)
Multiple Platters
Tracks and Cylinders
Floppy Disk
• 8”, 5.25”, 3.5”
• Small capacity
—Up to 1.44Mbyte (2.88M never popular)
• Slow
• Universal
• Cheap
• Obsolete?
Winchester Hard Disk (1)
• Developed by IBM ("Winchester" was the project's code name)
• Sealed unit
• One or more platters (disks)
• Heads fly on boundary layer of air as disk
spins
• Very small head to disk gap
• Getting more robust
Winchester Hard Disk (2)
• Universal
• Cheap
• Fastest external storage
• Getting larger all the time
—250 Gigabyte now easily available
Speed
• Seek time
—Moving head to correct track
• (Rotational) latency
—Waiting for data to rotate under head
• Access time = Seek + Latency
• Transfer rate
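A minimal sketch of how these terms combine into the time to read one block. It assumes the usual approximations that average rotational latency is half a revolution and that the transfer takes the fraction of a revolution the data occupies; the drive figures are hypothetical.

```python
def disk_read_time_ms(avg_seek_ms: float, rpm: float,
                      bytes_to_read: int, bytes_per_track: int) -> float:
    """Average time to read a block: access time (seek + latency) + transfer.

    Assumes average rotational latency is half a revolution and that the
    transfer time is the fraction of a revolution the data occupies.
    """
    ms_per_rev = 60_000.0 / rpm                  # one full revolution in ms
    latency_ms = ms_per_rev / 2                  # on average, wait half a turn
    access_ms = avg_seek_ms + latency_ms         # access time = seek + latency
    transfer_ms = ms_per_rev * bytes_to_read / bytes_per_track
    return access_ms + transfer_ms


# Hypothetical drive: 4 ms average seek, 15,000 rpm, 500 sectors of 512 bytes per track.
print(f"{disk_read_time_ms(4.0, 15_000, 512, 500 * 512):.3f} ms")  # 6.008 ms
```

For a single sector the seek and latency dominate; the transfer itself is a tiny fraction of the total.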
Timing of Disk I/O Transfer
RAID
• Redundant Array of Independent Disks
• Redundant Array of Inexpensive Disks
• 7 levels (RAID 0 to RAID 6) in common use
• Not a hierarchy
• Set of physical disks viewed as single
logical drive by O/S
• Data distributed across physical drives
• Can use redundant capacity to store
parity information
RAID 0
• No redundancy
• Data striped across all disks
• Round Robin striping
• Increase speed
—Multiple data requests probably not on same
disk
—Disks seek in parallel
—A set of data is likely to be striped across
multiple disks
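Round-robin striping can be sketched as a simple mapping, assuming strips are numbered from 0 and the array has n disks: logical strip i goes to disk i mod n, at strip position i // n on that disk.

```python
def raid0_location(strip: int, num_disks: int) -> tuple[int, int]:
    """Map a logical strip number to (disk index, strip index on that disk)."""
    return strip % num_disks, strip // num_disks


# Four disks: consecutive strips land on consecutive disks, so a large
# request is spread over all drives and they can seek in parallel.
print([raid0_location(s, 4) for s in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```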
RAID 1
• Mirrored Disks
• Data is striped across disks
• 2 copies of each stripe on separate disks
• Read from either
• Write to both
• Recovery is simple
—Swap faulty disk & re-mirror
—No down time
• Expensive
RAID 2
• Disks are synchronized
• Very small stripes
—Often single byte/word
• Error correction calculated across
corresponding bits on disks
• Multiple parity disks store Hamming code
error correction in corresponding positions
• Lots of redundancy
—Expensive
—Not used
RAID 3
• Similar to RAID 2
• Only one redundant disk, no matter how
large the array
• Simple parity bit for each set of
corresponding bits
• Data on failed drive can be reconstructed
from surviving data and parity info
• Very high transfer rates
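Simple parity is the XOR of corresponding bits, so a single lost strip is the XOR of the surviving strips and the parity strip. A minimal sketch, treating each drive's strip as a Python bytes object (an assumption for illustration):

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity(strips: list[bytes]) -> bytes:
    """Parity strip: XOR of the corresponding bits on every data disk."""
    return reduce(xor_bytes, strips)


def rebuild(surviving: list[bytes], parity_strip: bytes) -> bytes:
    """Recover the one missing strip from the surviving data plus parity."""
    return reduce(xor_bytes, surviving, parity_strip)


data = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]   # three data disks
p = parity(data)
recovered = rebuild([data[0], data[2]], p)        # pretend disk 1 failed
print(recovered == data[1])                       # True
```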
RAID 4
• Each disk operates independently
• Good for high I/O request rate
• Large stripes
• Bit by bit parity calculated across stripes
on each disk
• Parity stored on parity disk
RAID 5
• Like RAID 4
• Parity striped across all disks
• Round robin allocation for parity stripe
• Avoids RAID 4 bottleneck at parity disk
• Commonly used in network servers
• N.B. DOES NOT MEAN 5 DISKS!!!!!
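Round-robin parity placement can be sketched as below. It assumes one simple rotation, in which the parity strip for stripe k sits on disk (n - 1 - k) mod n; real controllers use several different layouts, so treat this as illustrative only.

```python
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """Disk holding the parity strip for a given stripe (one simple rotation)."""
    return (num_disks - 1 - stripe) % num_disks


# With 4 disks the parity moves 3, 2, 1, 0, 3, ... so no single disk
# receives every parity update (the RAID 4 bottleneck).
print([raid5_parity_disk(k, 4) for k in range(6)])  # [3, 2, 1, 0, 3, 2]
```

The example deliberately uses 4 disks: the level number says nothing about the disk count.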

RAID 6
• Two parity calculations
• Stored in separate blocks on different
disks
• User requirement of N disks needs N+2
• High data availability
—Three disks need to fail for data loss
—Significant write penalty
RAID 0, 1, 2
RAID 3 & 4
RAID 5 & 6
Data Mapping For RAID 0
Optical Storage CD-ROM
• Originally for audio
• 650Mbytes giving over 70 minutes audio
• Polycarbonate coated with highly
reflective coat, usually aluminium
• Data stored as pits
• Read by reflecting laser
• Constant packing density
• Constant linear velocity
CD Operation
CD-ROM Drive Speeds
• Audio is single speed
—Constant linear velocity
—1.2 m/s
—Track (spiral) is 5.27km long
—Gives 4391 seconds = 73.2 minutes
• Other speeds are quoted as multiples
• e.g. 24x
• Quoted figure is maximum drive can
achieve
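The single-speed figures follow from playing time = track length / linear velocity. A quick check; the Nx line assumes the drive actually sustains its quoted maximum, which is not guaranteed.

```python
TRACK_LENGTH_M = 5_270.0        # 5.27 km spiral track
SINGLE_SPEED_M_PER_S = 1.2      # audio (1x) constant linear velocity

seconds = TRACK_LENGTH_M / SINGLE_SPEED_M_PER_S
print(f"{int(seconds)} s = {seconds / 60:.1f} min")   # 4391 s = 73.2 min

# An "Nx" drive reads at up to N times the linear velocity.
for n in (1, 24):
    print(f"{n}x: whole track in {seconds / n / 60:.1f} min")
```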
CD-ROM Format
• Mode 0=blank data field
• Mode 1=2048 byte data+error correction
• Mode 2=2336 byte data
Random Access on CD-ROM
• Difficult
• Move head to rough position
• Set correct speed
• Read address
• Adjust to required location
• (Yawn!)
CD-ROM for & against
• Large capacity (?)
• Easy to mass produce
• Removable
• Robust
• Expensive for small runs
• Slow
• Read only
Other Optical Storage
• CD-Recordable (CD-R)
—WORM
—Now affordable
—Compatible with CD-ROM drives
• CD-RW
—Erasable
—Getting cheaper
—Mostly CD-ROM drive compatible
—Phase change
– Material has two different reflectivities in different
phase states
DVD - what’s in a name?
• Digital Video Disk
—Used to indicate a player for movies
– Only plays video disks
• Digital Versatile Disk
—Used to indicate a computer drive
– Will read computer disks and play video disks
• Dogs Veritable Dinner
• Officially - nothing!!!
DVD - technology
• Multi-layer
• Very high capacity (4.7G per layer)
• Full length movie on single disk
—Using MPEG compression
• Finally standardized (honest!)
• Movies carry regional coding
• Players only play correct region films
• Can be “fixed”
DVD – Writable
• Loads of trouble with standards
• First generation DVD drives may not read
first generation DVD-W disks
• First generation DVD drives may not read
CD-RW disks
• Wait for it to settle down before buying!
CD and DVD
High Definition Optical Disks
• Designed for high definition videos
• Much higher capacity than DVD
—Shorter wavelength laser
– Blue-violet range
—Smaller pits
• HD-DVD
—15GB single side single layer
• Blu-ray
—Data layer closer to laser
– Tighter focus, less distortion, smaller pits
—25GB on single layer
—Available read only (BD-ROM), recordable once (BD-R) and re-recordable (BD-RE)
Optical Memory Characteristics
Magnetic Tape
• Serial access
• Slow
• Very cheap
• Backup and archive
• Linear Tape-Open (LTO) Tape Drives
—Developed late 1990s
—Open-standard alternative to proprietary tape systems
Linear Tape-Open (LTO) Tape Drives
                                  LTO-1    LTO-2    LTO-3    LTO-4     LTO-5    LTO-6
Release date                      2000     2003     2005     2007      TBA      TBA
Compressed capacity               200 GB   400 GB   800 GB   1600 GB   3.2 TB   6.4 TB
Compressed transfer rate (MB/s)   40       80       160      240       360      540
Linear density (bits/mm)          4880     7398     9638     13300
Tape tracks                       384      512      704      896
Tape length                       609 m    609 m    680 m    820 m
Tape width (cm)                   1.27     1.27     1.27     1.27
Write elements                    8        8        16       16
Internet Resources
• Optical Storage Technology Association
—Good source of information about optical
storage technology and vendors
—Extensive list of relevant links
• DLTtape
—Good collection of technical information and
links to vendors
• Search on RAID
Chapter 5
Internal Memory
Semiconductor Memory Types
Memory Type                           Category             Erasure                     Write Mechanism   Volatility
Random-access memory (RAM)            Read-write memory    Electrically, byte-level    Electrically      Volatile
Read-only memory (ROM)                Read-only memory     Not possible                Masks             Nonvolatile
Programmable ROM (PROM)               Read-only memory     Not possible                Electrically      Nonvolatile
Erasable PROM (EPROM)                 Read-mostly memory   UV light, chip-level        Electrically      Nonvolatile
Electrically Erasable PROM (EEPROM)   Read-mostly memory   Electrically, byte-level    Electrically      Nonvolatile
Flash memory                          Read-mostly memory   Electrically, block-level   Electrically      Nonvolatile
Semiconductor Memory
• RAM
—Misnamed as all semiconductor memory is
random access
—Read/Write
—Volatile
—Temporary storage
—Static or dynamic
Memory Cell Operation
Dynamic RAM
• Bits stored as charge in capacitors
• Charges leak
• Need refreshing even when powered
• Simpler construction
• Smaller per bit
• Less expensive
• Need refresh circuits
• Slower
• Main memory
• Essentially analogue
—Level of charge determines value
Dynamic RAM Structure
DRAM Operation
• Address line active when bit read or written
— Transistor switch closed (current flows)
• Write
— Voltage to bit line
– High for 1 low for 0
— Then signal address line
– Transfers charge to capacitor
• Read
— Address line selected
– transistor turns on
— Charge from capacitor fed via bit line to sense amplifier
– Compares with reference value to determine 0 or 1
— Capacitor charge must be restored
Static RAM
• Bits stored as on/off switches
• No charges to leak
• No refreshing needed when powered
• More complex construction
• Larger per bit
• More expensive
• Does not need refresh circuits
• Faster
• Cache
• Digital
—Uses flip-flops
Static RAM Structure
Static RAM Operation
• Transistor arrangement gives stable logic
state
• State 1
—C1 high, C2 low
—T1 T4 off, T2 T3 on
• State 0
—C2 high, C1 low
—T2 T3 off, T1 T4 on
• Address line transistors T5 T6 act as switches
• Write – apply value to B & complement to B̄
• Read – value is on line B
SRAM v DRAM
• Both volatile
—Power needed to preserve data
• Dynamic cell
—Simpler to build, smaller
—More dense
—Less expensive
—Needs refresh
—Larger memory units
• Static
—Faster
—Cache
Read Only Memory (ROM)
• Permanent storage
—Nonvolatile
• Microprogramming (see later)
• Library subroutines
• Systems programs (BIOS)
• Function tables
Types of ROM
• Written during manufacture
—Very expensive for small runs
• Programmable (once)
—PROM
—Needs special equipment to program
• Read “mostly”
—Erasable Programmable (EPROM)
– Erased by UV
—Electrically Erasable (EEPROM)
– Takes much longer to write than read
—Flash memory
– Erased electrically, a block (or the whole chip) at a time
Organisation in detail
• A 16Mbit chip can be organised as 1M of 16 bit words
• A bit-per-chip system uses 16 chips of 1Mbit each, with bit 1 of every word in chip 1, bit 2 in chip 2, and so on
• A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array
—Reduces number of address pins
– Multiplex row address and column address
– 11 pins to address (2^11 = 2048)
– Adding one more pin doubles the range of both row and column addresses, so x4
capacity
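The pin arithmetic can be checked with a short sketch, assuming a square array whose row and column addresses are multiplexed over the same pins, so the pin count is log2 of one dimension:

```python
import math


def multiplexed_address_pins(rows: int, cols: int) -> int:
    """Address pins needed when row and column addresses share the same pins."""
    return max(math.ceil(math.log2(rows)), math.ceil(math.log2(cols)))


def cells(pins: int) -> int:
    """Cells addressable with `pins` multiplexed pins (square array)."""
    return (2 ** pins) * (2 ** pins)


print(multiplexed_address_pins(2048, 2048))   # 11
print(cells(12) // cells(11))                 # 4: one more pin, 4x capacity
```

With 4 bits per cell, 11 pins give 2^22 cells x 4 bits = 16Mbit.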
Refreshing
• Refresh circuit included on chip
• Disable chip
• Count through rows
• Read & Write back
• Takes time
• Slows down apparent performance
Typical 16 Mb DRAM (4M x 4)
Packaging
256kByte Module Organisation
1MByte Module Organisation
Interleaved Memory
• Collection of DRAM chips
• Grouped into memory banks
• Banks independently service read or write
requests
• k banks can service k requests
simultaneously
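How many requests actually proceed in parallel depends on how addresses spread over the banks. A sketch assuming low-order interleaving (bank = address mod k, an assumption the slide does not state) and one request per bank per memory cycle:

```python
from collections import Counter


def cycles_needed(addresses: list[int], k: int) -> int:
    """Memory cycles to serve a batch of requests with k interleaved banks.

    Assumes bank = address mod k and that each bank serves one request
    per cycle, so the busiest bank sets the pace.
    """
    per_bank = Counter(a % k for a in addresses)
    return max(per_bank.values(), default=0)


print(cycles_needed([0, 1, 2, 3], k=4))    # 1: one request per bank
print(cycles_needed([0, 4, 8, 12], k=4))   # 4: every request hits bank 0
```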
Error Correction
• Hard Failure
—Permanent defect
• Soft Error
—Random, non-destructive
—No permanent damage to memory
• Detected and corrected using Hamming error correcting
code
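A minimal single-error-correcting Hamming sketch for 8-bit data words. It assumes the common layout of 12-bit codewords with check bits in positions 1, 2, 4 and 8 and data bits in the remaining positions (the bit ordering is an assumption); the syndrome is the position of a flipped bit, or 0 if none.

```python
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two positions
CHECK_POSITIONS = [1, 2, 4, 8]


def encode(byte: int) -> list[int]:
    """Return a 12-bit codeword as a list indexed 1..12 (index 0 unused)."""
    word = [0] * 13
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (byte >> i) & 1            # data bit i goes to its slot
    for c in CHECK_POSITIONS:
        # Check bit c covers every data position whose index has bit c set.
        word[c] = sum(word[p] for p in DATA_POSITIONS if p & c) % 2
    return word


def syndrome(word: list[int]) -> int:
    """XOR of the positions holding a 1: 0 if no error, else the bad bit's position."""
    s = 0
    for pos in range(1, 13):
        if word[pos]:
            s ^= pos
    return s


def decode(word: list[int]) -> int:
    """Correct a single-bit error (if any) and return the 8-bit data."""
    word = word[:]
    s = syndrome(word)
    if 0 < s <= 12:                            # larger values imply a multi-bit error
        word[s] ^= 1                           # flip the faulty bit back
    return sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS))


cw = encode(0b00111001)
cw[6] ^= 1                                     # soft error in position 6
print(syndrome(cw), bin(decode(cw)))           # 6 0b111001
```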
Error Correcting Code Function
Advanced DRAM Organization
• Basic DRAM same since first RAM chips
• Enhanced DRAM
—Contains small SRAM as well
—SRAM holds last line read (c.f. Cache!)
• Cache DRAM
—Larger SRAM component
—Use as cache or serial buffer
Synchronous DRAM (SDRAM)
• Access is synchronized with an external clock
• Address is presented to RAM
• RAM finds data (CPU waits in conventional
DRAM)
• Since SDRAM moves data in time with system
clock, CPU knows when data will be ready
• CPU does not have to wait, it can do something
else
• Burst mode allows SDRAM to set up stream of
data and fire it out in block
• DDR-SDRAM sends data twice per clock cycle
(leading & trailing edge)
SDRAM
SDRAM Read Timing
RAMBUS
• Adopted by Intel for Pentium & Itanium
• Main competitor to SDRAM
• Vertical package – all pins on one side
• Data exchange over 28 wires < cm long
• Bus addresses up to 320 RDRAM chips at
1.6Gbps
• Asynchronous block protocol
—480ns access time
—Then 1.6 Gbps
RAMBUS Diagram
DDR SDRAM
• SDRAM can only send data once per clock
• Double-data-rate SDRAM can send data
twice per clock cycle
—Rising edge and falling edge
DDR SDRAM Read Timing
Simplified DRAM Read Timing
Cache DRAM
• Mitsubishi
• Integrates small SRAM cache (16 kb) onto
generic DRAM chip
• Used as true cache
—64-bit lines
—Effective for ordinary random access
• To support serial access of block of data
—E.g. refresh bit-mapped screen
– CDRAM can prefetch data from DRAM into SRAM
buffer
– Subsequent accesses solely to SRAM
Reading
• The RAM Guide
• RDRAM
Chapter 4
Cache Memory
Characteristics
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location
• CPU
• Internal
• External
Capacity
• Word size
—The natural unit of organisation
• Number of words
—or Bytes
Unit of Transfer
• Internal
—Usually governed by data bus width
• External
—Usually a block which is much larger than a
word
• Addressable unit
—Smallest location which can be uniquely
addressed
—Word internally
—Cluster on M$ disks
Access Methods (1)
• Sequential
—Start at the beginning and read through in
order
—Access time depends on location of data and
previous location
—e.g. tape
• Direct
—Individual blocks have unique address
—Access is by jumping to vicinity plus
sequential search
—Access time depends on location and previous
location
—e.g. disk
Access Methods (2)
• Random
—Individual addresses identify locations exactly
—Access time is independent of location or
previous access
—e.g. RAM
• Associative
—Data is located by a comparison with contents
of a portion of the store
—Access time is independent of location or
previous access
—e.g. cache
Memory Hierarchy
• Registers
—In CPU
• Internal or Main memory
—May include one or more levels of cache
—“RAM”
• External memory
—Backing store
Memory Hierarchy - Diagram
Performance
• Access time
—Time between presenting the address and
getting the valid data
• Memory Cycle time
—Time may be required for the memory to
“recover” before next access
—Cycle time is access + recovery
• Transfer Rate
—Rate at which data can be moved
Physical Types
• Semiconductor
—RAM
• Magnetic
—Disk & Tape
• Optical
—CD & DVD
• Others
—Bubble
—Hologram
Physical Characteristics
• Decay
• Volatility
• Erasable
• Power consumption
Organisation
• Physical arrangement of bits into words
• Not always obvious
• e.g. interleaved
The Bottom Line
• How much?
—Capacity
• How fast?
—Time is money
• How expensive?
Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape
So you want fast?
• It is possible to build a computer which
uses only static RAM (see later)
• This would be very fast
• This would need no cache
—How can you cache cache?
• This would cost a very large amount
Locality of Reference
• During the course of the execution of a
program, memory references tend to
cluster
• e.g. loops
Cache
• Small amount of fast memory
• Sits between normal main memory and
CPU
• May be located on CPU chip or module
Cache and Main Memory
Cache/Main Memory Structure
Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from
main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which
block of main memory is in each cache
slot
Cache Read Operation - Flowchart
Cache Design
• Addressing
• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches
Cache Addressing
• Where does cache sit?
— Between processor and virtual memory management
unit
— Between MMU and main memory
• Logical cache (virtual cache) stores data using
virtual addresses
— Processor accesses cache directly, without going through the MMU
— Cache access faster, before MMU address translation
— Virtual addresses use same address space for different
applications
– Must flush cache on each context switch
• Physical cache stores data using main memory
physical addresses
Size does matter
• Cost
—More cache is expensive
• Speed
—More cache is faster (up to a point)
—Checking cache for data takes time
Typical Cache Organization
Comparison of Cache Sizes
Processor         Type                            Year of Introduction   L1 cache        L2 cache         L3 cache
IBM 360/85        Mainframe                       1968                   16 to 32 KB     -                -
PDP-11/70         Minicomputer                    1975                   1 KB            -                -
VAX 11/780        Minicomputer                    1978                   16 KB           -                -
IBM 3033          Mainframe                       1978                   64 KB           -                -
IBM 3090          Mainframe                       1985                   128 to 256 KB   -                -
Intel 80486       PC                              1989                   8 KB            -                -
Pentium           PC                              1993                   8 KB/8 KB       256 to 512 KB    -
PowerPC 601       PC                              1993                   32 KB           -                -
PowerPC 620       PC                              1996                   32 KB/32 KB     -                -
PowerPC G4        PC/server                       1999                   32 KB/32 KB     256 KB to 1 MB   2 MB
IBM S/390 G4      Mainframe                       1997                   32 KB           256 KB           2 MB
IBM S/390 G6      Mainframe                       1999                   256 KB          8 MB             -
Pentium 4         PC/server                       2000                   8 KB/8 KB       256 KB           -
IBM SP            High-end server/supercomputer   2000                   64 KB/32 KB     8 MB             -
CRAY MTA          Supercomputer                   2000                   8 KB            2 MB             -
Itanium           PC/server                       2001                   16 KB/16 KB     96 KB            4 MB
SGI Origin 2001   High-end server                 2001                   32 KB/32 KB     4 MB             -
Itanium 2         PC/server                       2002                   32 KB           256 KB           6 MB
IBM POWER5        High-end server                 2003                   64 KB           1.9 MB           36 MB
CRAY XD-1         Supercomputer                   2004                   64 KB/64 KB     1 MB             -
Mapping Function
• Cache of 64kByte
• Cache block of 4 bytes
—i.e. cache is 16k (2^14) lines of 4 bytes
• 16MBytes main memory
• 24 bit address
—(2^24 = 16M)
Direct Mapping
• Each block of main memory maps to only
one cache line
—i.e. if a block is in cache, it must be in one
specific place
• Address is in two parts
• Least Significant w bits identify unique
word
• Most Significant s bits specify one
memory block
• The MSBs are split into a cache line field r
and a tag of s-r (most significant)
Direct Mapping
Address Structure
Tag s-r    Line or Slot r    Word w
8          14                2
• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
— 8 bit tag (=22-14)
— 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag
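The field split above can be sketched with shifts and masks for the 24-bit address, 64 KB cache and 4-byte lines of this example (8-bit tag, 14-bit line, 2-bit word):

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2      # 24-bit address, 64 KB cache, 4-byte lines


def split_direct(address: int) -> tuple[int, int, int]:
    """Return (tag, line, word) for a direct-mapped cache."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")  # tag=16 line=0CE7 word=0
```

A hit means the tag stored in line 0CE7 equals 16; any other block mapping to that line has a different tag.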
Direct Mapping from Cache to Main Memory
Direct Mapping
Cache Line Table
Cache line    Main Memory blocks held
0             0, m, 2m, 3m…2^s-m
1             1, m+1, 2m+1…2^s-m+1
…
m-1           m-1, 2m-1, 3m-1…2^s-1
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w)/2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s – r) bits
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
—If a program accesses 2 blocks that map to
the same line repeatedly, cache misses are
very high
Victim Cache
• Lower miss penalty
• Remember what was discarded
—Already fetched
—Use again with little penalty
• Fully associative
• 4 to 16 cache lines
• Between direct mapped L1 cache and next
memory level
Associative Mapping
• A main memory block can load into any
line of cache
• Memory address is interpreted as tag and
word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
Associative Mapping from Cache to Main Memory
Fully Associative Cache Organization
Associative Mapping Example
Associative Mapping
Address Structure
Tag 22 bit    Word 2 bit
• 22 bit tag stored with each 32 bit block of data
• Compare tag field with tag entry in cache to
check for hit
• Least significant 2 bits of address identify which
byte is required from the 32 bit (4 byte) data block
• e.g.
—Address    Tag       Data        Cache line
—FFFFFC     3FFFFF    24682468    3FFF
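Because a block may sit in any line, a lookup must compare the 22-bit tag with every line's tag. Hardware does all the comparisons in parallel; the loop below does them one at a time. A sketch with a made-up two-line cache holding the block for address FFFFFC (tag 3FFFFF, data 24682468):

```python
from typing import Optional

WORD_BITS = 2                                    # 4-byte blocks, so a 22-bit tag


def lookup(cache: list[tuple[int, bytes]], address: int) -> Optional[int]:
    """Return the requested byte on a hit, or None on a miss."""
    tag = address >> WORD_BITS
    word = address & ((1 << WORD_BITS) - 1)
    for line_tag, block in cache:                # examine every line's tag
        if line_tag == tag:
            return block[word]
    return None


cache = [(0x3FFFFF, bytes.fromhex("24682468")), (0x058CE7, bytes(4))]
print(hex(lookup(cache, 0xFFFFFC)))              # 0x24 (hit, byte 0)
print(lookup(cache, 0x000004))                   # None (miss)
```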
Associative Mapping Summary
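• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w)/2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits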
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given
set
—e.g. Block B can be in any line of set i
• e.g. 2 lines per set
—2 way associative mapping
—A given block can be in one of 2 lines in only
one set
Example
• 13 bit set number
• Block number in main memory is modulo
2^13
• 000000, 008000, 010000, 018000 … map
to same set
Mapping From Main Memory to Cache:
v Associative
Mapping From Main Memory to Cache:
k-way Associative
K-Way Set Associative Cache Organization
Address Structure
Tag 9 bit    Set 13 bit    Word 2 bit
• Use set field to determine cache set to
look in
• Compare tag field to see if we have a hit
• e.g
—Address     Tag    Data        Set number
—1FF 7FFC    1FF    12345678    1FFF
—001 7FFC    001    11223344    1FFF
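A sketch of the 9/13/2-bit split. It assumes the "1FF 7FFC" notation means the 9-bit tag followed by the remaining 15 address bits, so the two example addresses are FFFFFC and 00FFFC in full 24-bit form.

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2        # 24-bit address


def split_set_assoc(address: int) -> tuple[int, int, int]:
    """Return (tag, set, word) for the 9/13/2-bit set-associative layout."""
    word = address & ((1 << WORD_BITS) - 1)
    set_no = (address >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_no, word


# Same set (1FFF), different tags: in a 2-way cache both blocks can be
# resident at once, which a direct-mapped cache would not allow.
for addr in (0xFFFFFC, 0x00FFFC):
    tag, set_no, word = split_set_assoc(addr)
    print(f"{addr:06X}: tag={tag:03X} set={set_no:04X} word={word}")
```

Addresses that differ by a multiple of 2^15 (8000 hex) share a set; for example 008000 maps to set 0 with tag 001.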
Two Way Set Associative Mapping Example
Set Associative Mapping Summary
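• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^s
• Number of lines in set = k
• Number of sets = v = 2^d
• Number of lines in cache = kv = k * 2^d
• Size of tag = (s – d) bits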
Direct and Set Associative Cache
Performance Differences
• Significant up to at least 64kB for 2-way
• Difference between 2-way and 4-way at
4kB much less than 4kB to 8kB
• Cache complexity increases with
associativity
• Not justified against increasing cache to
8kB or 16kB
• Above 32kB gives no improvement
• (simulation results)
Figure 4.16
Varying Associativity over Cache Size
Replacement Algorithms (1)
Direct mapping
• No choice
• Each block only maps to one line
• Replace that line
Replacement Algorithms (2)
Associative & Set Associative
• Hardware implemented algorithm (speed)
• Least Recently used (LRU)
• e.g. in 2 way set associative
—Which of the 2 blocks is LRU?
• First in first out (FIFO)
—replace block that has been in cache longest
• Least frequently used
—replace block which has had fewest hits
• Random
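A minimal software model of LRU within one set of a k-way cache, assuming the set is kept as an ordered list of tags from least to most recently used. Hardware uses a few status bits instead (for 2-way, one USE bit per line is enough).

```python
from collections import OrderedDict


class LRUSet:
    """One set of a k-way set-associative cache with LRU replacement."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.lines = OrderedDict()             # tag -> None, least recently used first

    def access(self, tag: int) -> bool:
        """Reference a block by tag; return True on a hit, False on a miss."""
        if tag in self.lines:
            self.lines.move_to_end(tag)        # now most recently used
            return True
        if len(self.lines) == self.k:
            self.lines.popitem(last=False)     # evict least recently used
        self.lines[tag] = None
        return False


s = LRUSet(k=2)
print([s.access(t) for t in (1, 2, 1, 3, 2, 1)])
# [False, False, True, False, False, False]
```

Dropping the `move_to_end` call turns this into FIFO: lines are then evicted in the order they were loaded, regardless of hits.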
Write Policy
• Must not overwrite a cache block unless
main memory is up to date
• Multiple CPUs may have individual caches
• I/O may address main memory directly
Write through
• All writes go to main memory as well as
cache
• Multiple CPUs can monitor main memory
traffic to keep local (to CPU) cache up to
date
• Lots of traffic
• Slows down writes
• Remember bogus write through caches!

Write back
• Updates initially made in cache only
• Update bit for cache slot is set when
update occurs
• If block is to be replaced, write to main
memory only if update bit is set
• Other caches get out of sync
• I/O must access main memory through
cache
• N.B. 15% of memory references are
writes
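To make the traffic difference concrete, a sketch that counts main-memory writes under each policy. The model is hypothetical and deliberately tiny: a direct-mapped cache, every access a write (so every resident line has its update bit set), and remaining dirty lines flushed at the end.

```python
def memory_writes(writes: list[int], num_lines: int, write_back: bool) -> int:
    """Count main-memory writes for a stream of writes to block numbers."""
    if not write_back:
        return len(writes)                     # write through: every write goes to memory
    lines = {}                                 # line index -> block held (all dirty here)
    count = 0
    for block in writes:
        line = block % num_lines
        if line in lines and lines[line] != block:
            count += 1                         # replacing a dirty block: write it back
        lines[line] = block                    # update bit set for this line
    return count + len(lines)                  # flush remaining dirty lines


stream = [0, 0, 0, 1, 1, 0, 4, 0]              # block numbers being written
print(memory_writes(stream, 4, write_back=False))  # 8
print(memory_writes(stream, 4, write_back=True))   # 4
```

Repeated writes to the same block cost one memory write under write back; the conflict between blocks 0 and 4 shows where write back still pays.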
Line Size
• Retrieve not only desired word but a number of
adjacent words as well
• Increased block size will increase hit ratio at first
— the principle of locality
• Hit ratio will decrease as block becomes even
bigger
— Probability of using newly fetched information becomes
less than probability of reusing the information that was replaced
• Larger blocks
— Reduce number of blocks that fit in cache
— Data overwritten shortly after being fetched
— Each additional word is less local so less likely to be
needed
• No definitive optimum value has been found
• 8 to 64 bytes seems reasonable
• For HPC systems, 64- and 128-byte most
common
Multilevel Caches
• High logic density enables caches on chip
—Faster than bus access
—Frees bus for other transfers
• Common to use both on and off chip
cache
—L1 on chip, L2 off chip in static RAM
—L2 access much faster than DRAM or ROM
—L2 often uses separate data path
—L2 may now be on chip
—Resulting in L3 cache
– Bus access or now on chip…
Hit Ratio (L1 & L2)
For 8 kbytes and 16 kbyte L1
Unified v Split Caches
• One cache for data and instructions or
two, one for data and one for instructions
• Advantages of unified cache
—Higher hit rate
– Balances load of instruction and data fetch
– Only one cache to design & implement
• Advantages of split cache
—Eliminates cache contention between
instruction fetch/decode unit and execution
unit
– Important in pipelining
Pentium 4 Cache
• 80386 – no on chip cache
• 80486 – 8k using 16 byte lines and four way set
associative organization
• Pentium (all versions) – two on chip L1 caches
— Data & instructions
• Pentium III – L3 cache added off chip
• Pentium 4
— L1 caches
– 8k bytes
– 64 byte lines
– four way set associative
— L2 cache
– Feeding both L1 caches
– 256k
– 128 byte lines
– 8 way set associative
— L3 cache on chip
Intel Cache Evolution
Problem, solution, and processor on which the feature first appears:
• External memory slower than the system bus
—Add external cache using faster memory technology (386)
• Increased processor speed results in external bus becoming a bottleneck for cache access
—Move external cache on-chip, operating at the same speed as the processor (486)
• Internal cache is rather small, due to limited space on chip
—Add external L2 cache using faster technology than main memory (486)
• Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place
—Create separate data and instruction caches (Pentium)
• Increased processor speed results in external bus becoming a bottleneck for L2 cache access
—Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache (Pentium Pro)
—Move L2 cache on to the processor chip (Pentium II)
• Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small
—Add external L3 cache (Pentium III)
—Move L3 cache on-chip (Pentium 4)
Pentium 4 Block Diagram
Pentium 4 Core Processor
• Fetch/Decode Unit
— Fetches instructions from L2 cache
— Decode into micro-ops
— Store micro-ops in L1 cache
• Out of order execution logic
— Schedules micro-ops
— Based on data dependence and resources
— May speculatively execute
• Execution units
— Execute micro-ops
— Data from L1 cache
— Results in registers
• Memory subsystem
— L2 cache and systems bus
Pentium 4 Design Reasoning
• Decodes instructions into RISC like micro-ops before L1
cache
• Micro-ops fixed length
— Superscalar pipelining and scheduling
• Pentium instructions long & complex
• Performance improved by separating decoding from
scheduling & pipelining
— (More later – ch14)
• Data cache is write back
— Can be configured to write through
• L1 cache controlled by 2 bits in register
— CD = cache disable
— NW = not write through
— 2 instructions to invalidate (flush) cache and write back then
invalidate
• L2 and L3 8-way set-associative
— Line size 128 bytes
ARM Cache Features
Core              Cache Type   Cache Size (kB)   Cache Line Size (words)   Associativity   Location   Write Buffer Size (words)
ARM720T           Unified      8                 4                         4-way           Logical    8
ARM920T           Split        16/16 D/I         8                         64-way          Logical    16
ARM926EJ-S        Split        4-128/4-128 D/I   8                         4-way           Logical    16
ARM1022E          Split        16/16 D/I         8                         64-way          Logical    16
ARM1026EJ-S       Split        4-128/4-128 D/I   8                         4-way           Logical    8
Intel StrongARM   Split        16/16 D/I         4                         32-way          Logical    32
Intel Xscale      Split        32/32 D/I         8                         32-way          Logical    32
ARM1136-JF-S      Split        4-64/4-64 D/I     8                         4-way           Physical   32
ARM Cache Organization
• Small FIFO write buffer
—Enhances memory write performance
—Between cache and main memory
—Small c.f. cache
—Data put in write buffer at processor clock
speed
—Processor continues execution
—External write in parallel until empty
—If buffer full, processor stalls
—Data in write buffer not available until written
– So keep buffer small
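The stall behaviour can be sketched with a hypothetical model: the processor issues writes at given cycles (as if it never stalled), the buffer retires one write to main memory every fixed number of cycles, and a write that finds the buffer full stalls the processor until the oldest entry has drained. The cycle counts are made up for illustration.

```python
from collections import deque


def write_stalls(write_cycles: list[int], depth: int, mem_write_cycles: int) -> int:
    """Total processor stall cycles caused by a FIFO write buffer of `depth` entries."""
    finish = deque()                 # cycle at which each buffered write completes
    stalled = 0
    for t in write_cycles:
        now = t + stalled            # earlier stalls delay everything after them
        while finish and finish[0] <= now:
            finish.popleft()         # these writes have reached main memory
        if len(finish) == depth:     # buffer full: processor stalls
            wait = finish.popleft() - now
            stalled += wait
            now += wait
        start = max(now, finish[-1]) if finish else now
        finish.append(start + mem_write_cycles)  # writes drain one at a time
    return stalled


print(write_stalls([0, 1, 2, 3], depth=4, mem_write_cycles=10))  # 0
print(write_stalls([0, 1, 2, 3], depth=2, mem_write_cycles=10))  # 17
```

A burst that fits in the buffer costs the processor nothing; once the burst exceeds the buffer, execution proceeds at main-memory write speed.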
ARM Cache and Write Buffer Organization
Internet Sources
• Manufacturer sites
—Intel
—ARM
• Search on cache
Chapter 3
Top Level View of Computer
Function and Interconnection
Program Concept
• Hardwired systems are inflexible
• General purpose hardware can do
different tasks, given correct control
signals
• Instead of re-wiring, supply a new set of
control signals
What is a program?
• A sequence of steps
• For each step, an arithmetic or logical
operation is done
• For each operation, a different set of
control signals is needed
Function of Control Unit
• For each operation a unique code is
provided
—e.g. ADD, MOVE
• A hardware segment accepts the code and
issues the control signals
• We have a computer!
Components
• The Control Unit and the Arithmetic and
Logic Unit constitute the Central
Processing Unit
• Data and instructions need to get into the
system and results out
—Input/output
• Temporary storage of code and results is
needed
—Main memory
Computer Components:
Top Level View
Instruction Cycle
• Two steps:
—Fetch
—Execute
Fetch Cycle
• Program Counter (PC) holds address of
next instruction to fetch
• Processor fetches instruction from
memory location pointed to by PC
• Increment PC
—Unless told otherwise
• Instruction loaded into Instruction
Register (IR)
• Processor interprets instruction and
performs required actions
Execute Cycle
• Processor-memory
—data transfer between CPU and main memory
• Processor I/O
—Data transfer between CPU and I/O module
• Data processing
—Some arithmetic or logical operation on data
• Control
—Alteration of sequence of operations
—e.g. jump
• Combination of above
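The fetch and execute steps can be sketched as a loop for a hypothetical accumulator machine: 16-bit instructions with a 4-bit opcode and a 12-bit address. The opcode values (including HALT) are assumptions for illustration.

```python
LOAD, STORE, ADD, HALT = 0x1, 0x2, 0x5, 0x0     # hypothetical 4-bit opcodes


def run(memory: dict[int, int], pc: int) -> dict[int, int]:
    """Fetch-execute loop for a toy accumulator machine (16-bit words)."""
    ac = 0
    while True:
        ir = memory[pc]                         # fetch: instruction -> IR
        pc += 1                                 # increment PC (no jumps here)
        opcode, address = ir >> 12, ir & 0xFFF  # decode
        if opcode == LOAD:                      # processor-memory transfer
            ac = memory[address]
        elif opcode == ADD:                     # data processing
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == STORE:                   # processor-memory transfer
            memory[address] = ac
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode:X}")


# Program at 300: AC <- [940]; AC <- AC + [941]; [941] <- AC; halt (addresses in hex)
mem = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941, 0x303: 0x0000,
       0x940: 0x0003, 0x941: 0x0002}
print(hex(run(mem, 0x300)[0x941]))              # 0x5
```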
Example of Program Execution
Instruction Cycle State Diagram
Interrupts
• Mechanism by which other modules (e.g.
I/O) may interrupt normal sequence of
processing
• Program
—e.g. overflow, division by zero
• Timer
—Generated by internal processor timer
—Used in pre-emptive multi-tasking
• I/O
—from I/O controller
• Hardware failure
—e.g. memory parity error
Program Flow Control
Interrupt Cycle
• Added to instruction cycle
• Processor checks for interrupt
—Indicated by an interrupt signal
• If no interrupt, fetch next instruction
• If interrupt pending:
—Suspend execution of current program
—Save context
—Set PC to start address of interrupt handler
routine
—Process interrupt
—Restore context and continue interrupted
program
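A toy trace of the interrupt cycle, assuming a hypothetical model in which `requests` maps an instruction index to the handlers that become pending while it executes. After each execute the processor checks for pending interrupts and services them one after another before fetching the next instruction.

```python
from collections import deque


def run_with_interrupts(program: list[str],
                        requests: dict[int, list[str]]) -> list[str]:
    """Trace an instruction cycle with an interrupt check after each execute."""
    trace = []
    pending = deque()
    for step, instruction in enumerate(program):   # PC steps through the program
        trace.append(instruction)                  # fetch + execute
        pending.extend(requests.get(step, []))     # devices signal interrupts
        while pending:                             # interrupt cycle
            handler = pending.popleft()            # save context, PC <- handler
            trace.append(f"[{handler}]")           # process interrupt
        # context restored: the loop fetches the next instruction
    return trace


print(run_with_interrupts(["i0", "i1", "i2"], {0: ["timer"], 1: ["disk", "printer"]}))
# ['i0', '[timer]', 'i1', '[disk]', '[printer]', 'i2']
```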
Transfer of Control via Interrupts
Instruction Cycle with Interrupts
Program Timing
Short I/O Wait
Program Timing
Long I/O Wait
Instruction Cycle (with Interrupts) -
State Diagram
Multiple Interrupts
• Disable interrupts
—Processor will ignore further interrupts whilst
processing one interrupt
—Interrupts remain pending and are checked
after first interrupt has been processed
—Interrupts handled in sequence as they occur
• Define priorities
—Low priority interrupts can be interrupted by
higher priority interrupts
—When higher priority interrupt has been
processed, processor returns to previous
interrupt
Multiple Interrupts - Sequential
Multiple Interrupts – Nested
Time Sequence of Multiple Interrupts
Connecting
• All the units must be connected
• Different type of connection for different
type of unit
—Memory
—Input/Output
—CPU
Computer Modules
Memory Connection
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
—Read
—Write
—Timing
Input/Output Connection(1)
• Similar to memory from computer’s
viewpoint
• Output
—Receive data from computer
—Send data to peripheral
• Input
—Receive data from peripheral
—Send data to computer
Input/Output Connection(2)
• Receive control signals from computer
• Send control signals to peripherals
—e.g. spin disk
• Receive addresses from computer
—e.g. port number to identify peripheral
• Send interrupt signals (control)
CPU Connection
• Reads instructions and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (& acts on) interrupts
Buses
• There are a number of possible
interconnection systems
• Single and multiple BUS structures are
most common
• e.g. Control/Address/Data bus (PC)
• e.g. Unibus (DEC-PDP)
What is a Bus?
• A communication pathway connecting two
or more devices
• Usually broadcast
• Often grouped
—A number of channels in one bus
—e.g. 32 bit data bus is 32 separate single bit
channels
• Power lines may not be shown
Data Bus
• Carries data
—Remember that there is no difference between
“data” and “instruction” at this level
• Width is a key determinant of
performance
—8, 16, 32, 64 bit
Address bus
• Identify the source or destination of data
• e.g. CPU needs to read an instruction
(data) from a given location in memory
• Bus width determines maximum memory
capacity of system
—e.g. 8080 has 16 bit address bus giving 64k
address space
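The relationship between address bus width and maximum memory can be checked with a few lines of code. This sketch assumes byte-addressable memory (one byte per address), which holds for the 8080 example; other word sizes are handled by the hypothetical `bytes_per_location` parameter.

```python
def addressable_bytes(address_lines: int, bytes_per_location: int = 1) -> int:
    """Maximum memory reachable with the given number of address lines.

    Each extra line doubles the number of distinct addresses (2 ** lines).
    bytes_per_location=1 assumes byte-addressable memory.
    """
    if address_lines < 0 or bytes_per_location < 1:
        raise ValueError("invalid bus parameters")
    return (2 ** address_lines) * bytes_per_location


for lines in (16, 20, 24, 32):
    size = addressable_bytes(lines)
    print(f"{lines:2d} address lines -> {size:,} bytes ({size // 1024:,} KiB)")
```

With 16 lines this gives 65,536 bytes, the 64K address space quoted for the 8080.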
Control Bus
• Control and timing information
—Memory read/write signal
—Interrupt request
—Clock signals
Bus Interconnection Scheme
Big and Yellow?
• What do buses look like?
—Parallel lines on circuit boards
—Ribbon cables
—Strip connectors on mother boards
– e.g. PCI
—Sets of wires
Physical Realization of Bus Architecture
Single Bus Problems
• Lots of devices on one bus leads to:
—Propagation delays
– Long data paths mean that co-ordination of bus use
can adversely affect performance
—Bottleneck if aggregate data transfer demand
approaches bus capacity
• Most systems use multiple buses to
overcome these problems
Traditional (ISA)
(with cache)
High Performance Bus
Bus Types
• Dedicated
—Separate data & address lines
• Multiplexed
—Shared lines
—Address valid or data valid control line
—Advantage - fewer lines
—Disadvantages
– More complex control
– Reduced ultimate performance (address and data
cannot use the shared lines at the same time)
Bus Arbitration
• More than one module may control the bus
• e.g. CPU and DMA controller
• Only one module may control bus at one
time
• Arbitration may be centralised or
distributed
Centralised or Distributed Arbitration
• Centralised
—Single hardware device controlling bus access
– Bus Controller
– Arbiter
—May be part of CPU or separate
• Distributed
—Each module may claim the bus
—Control logic on all modules
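A centralised arbiter can be sketched as a single function that grants the bus to exactly one requester. The fixed-priority policy and the module names below are illustrative assumptions; real arbiters may use other policies such as round robin.

```python
def grant_bus(requests, priority_order):
    """Centralised arbiter: grant the bus to the highest-priority requester.

    requests       -- set of module names currently asserting a bus request
    priority_order -- module names ordered from highest to lowest priority
    Returns the granted module, or None if nobody is requesting.
    """
    for module in priority_order:
        if module in requests:
            return module   # only one module may control the bus at a time
    return None


print(grant_bus({"CPU", "DMA"}, ["DMA", "CPU", "Disk"]))  # DMA
```

A fixed order is simple but can starve low-priority modules under heavy load, which is one reason rotating schemes are also used.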
Timing
• Co-ordination of events on bus
• Synchronous
—Events determined by clock signals
—Control Bus includes clock line
—A single 1-0 clock period is a bus cycle
—All devices can read clock line
—Usually sync on leading edge
—Usually a single cycle for an event
Synchronous Timing Diagram
Asynchronous Timing – Read Diagram
Asynchronous Timing – Write Diagram
PCI Bus
• Peripheral Component Interconnect
• Intel released to public domain
• 32 or 64 bit
• 50 lines
PCI Bus Lines (required)
• Systems lines
—Including clock and reset
• Address & Data
—32 time-multiplexed lines for address/data
—Interpret & validate lines
• Interface Control
• Arbitration
—Not shared
—Direct connection to PCI bus arbiter
• Error lines
PCI Bus Lines (Optional)
• Interrupt lines
—Not shared
• Cache support
• 64-bit Bus Extension
—Additional 32 lines
—Time multiplexed
—2 lines to enable devices to agree to use 64-bit transfer
• JTAG/Boundary Scan
—For testing procedures
PCI Commands
• Transaction between initiator (master)
and target
• Master claims bus
• Determine type of transaction
—e.g. I/O read/write
• Address phase
• One or more data phases
PCI Read Timing Diagram
PCI Bus Arbiter
PCI Bus Arbitration
Foreground Reading
• Stallings, chapter 3 (all of it)
• www.pcguide.com/ref/mbsys/buses/
• In fact, read the whole site!

• www.pcguide.com/
William Stallings
Computer Organization
and Architecture
8th Edition
Chapter 2
Computer Evolution and
Performance
ENIAC - background
• Electronic Numerical Integrator And
Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
—Too late for war effort
• Used until 1955
ENIAC - details
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second
von Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from
memory and executing
• Input and output equipment operated by
control unit
• Institute for Advanced Study, Princeton
—IAS
• Completed 1952
Structure of von Neumann machine
IAS - details
• 1,000 × 40-bit words
—Binary number
—2 × 20-bit instructions
• Set of registers (storage in CPU)
—Memory Buffer Register
—Memory Address Register
—Instruction Register
—Instruction Buffer Register
—Program Counter
—Accumulator
—Multiplier Quotient
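To make the "2 × 20-bit instructions per 40-bit word" layout concrete, here is a decoding sketch. It assumes the usual IAS instruction format of an 8-bit opcode followed by a 12-bit address, with the left instruction in the high 20 bits; the example word is made up.

```python
def decode_ias_word(word: int):
    """Split a 40-bit IAS word into its left and right instructions.

    Assumes the left instruction sits in the high 20 bits and the right one
    in the low 20 bits, each made of an 8-bit opcode plus a 12-bit address.
    Returns ((left_opcode, left_address), (right_opcode, right_address)).
    """
    if not 0 <= word < 1 << 40:
        raise ValueError("IAS words are 40 bits wide")

    def split(instruction: int):
        return instruction >> 12, instruction & 0xFFF   # opcode, address

    left, right = word >> 20, word & 0xFFFFF
    return split(left), split(right)


word = (0x01 << 32) | (0x123 << 20) | (0x21 << 12) | 0x456
print(decode_ias_word(word))  # ((1, 291), (33, 1110))
```

A 12-bit address field can name 4,096 locations, comfortably covering the 1,000-word memory.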
Structure of IAS –
detail
Commercial Computers
• 1947 - Eckert-Mauchly Computer
Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
—Faster
—More memory
IBM
• Punched-card processing equipment
• 1953 - the 701
—IBM’s first stored program computer
—Scientific calculations
• 1955 - the 702
—Business applications
• Led to 700/7000 series
Transistors
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell Labs
• William Shockley et al.
Transistor Based Computers
• Second generation machines
• NCR & RCA produced small transistor
machines
• IBM 7000
• DEC - 1957
—Produced PDP-1
Microelectronics
• Literally - “small electronics”
• A computer is made up of gates, memory
cells and interconnections
• These can be manufactured on a
semiconductor
• e.g. silicon wafer
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
—Up to 100 devices on a chip
• Medium scale integration - to 1971
—100-3,000 devices on a chip
• Large scale integration - 1971-1977
—3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
—100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 -
—Over 100,000,000 devices on a chip
Moore’s Law
• Increased density of components on chip
• Gordon Moore – co-founder of Intel
• Number of transistors on a chip will double every
year
• Since the 1970s development has slowed a little
— Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical
paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increase reliability
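The doubling rule above is exponential growth with a fixed doubling period. This sketch projects a transistor count under that assumption; the 1,000-transistor starting chip is hypothetical.

```python
def projected_transistors(start_count: float, years: float,
                          doubling_months: float = 18) -> float:
    """Transistor count after `years`, assuming a fixed doubling period."""
    return start_count * 2 ** (years * 12 / doubling_months)


# Hypothetical chip with 1,000 transistors today
print(projected_transistors(1000, 3))      # 18-month doubling: 4000.0
print(projected_transistors(1000, 3, 12))  # original yearly doubling: 8000.0
```

Over just three years the difference between the two doubling periods is already a factor of two.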
Growth in CPU Transistor Count
IBM 360 series
• 1964
• Replaced (& not compatible with) 7000
series
• First planned “family” of computers
—Similar or identical instruction sets
—Similar or identical O/S
—Increasing speed
—Increasing number of I/O ports (i.e. more
terminals)
—Increased memory size
—Increased cost
• Multiplexed switch structure
DEC PDP-8
• 1964
• First minicomputer (after miniskirt!)
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000
—$100k+ for IBM 360
• Embedded applications & OEM
• BUS STRUCTURE
DEC - PDP-8 Bus Structure
Semiconductor Memory
• 1970
• Fairchild
• Size of a single core
—i.e. 1 bit of magnetic core storage
• Holds 256 bits
• Non-destructive read
• Much faster than core
• Capacity approximately doubles each year
Intel
• 1971 - 4004
—First microprocessor
—All CPU components on a single chip
—4 bit
• Followed in 1972 by 8008
—8 bit
—Both designed for specific applications
• 1974 - 8080
—Intel’s first general purpose microprocessor
Speeding it up
• Pipelining
• On board cache
• On board L1 & L2 cache
• Branch prediction
• Data flow analysis
• Speculative execution
Performance Balance
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor
speed
Logic and Memory Performance Gap
Solutions
• Increase number of bits retrieved at one
time
—Make DRAM “wider” rather than “deeper”
• Change DRAM interface
—Cache
• Reduce frequency of memory access
—More complex cache and cache on chip
• Increase interconnection bandwidth
—High speed buses
—Hierarchy of buses
I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• Problem moving data
• Solutions:
—Caching
—Buffering
—Higher-speed interconnection buses
—More elaborate bus structures
—Multiple-processor configurations
Typical I/O Device Data Rates
Key is Balance
• Processor components
• Main memory
• I/O devices
• Interconnection structures
Improvements in Chip Organization and
Architecture
• Increase hardware speed of processor
—Fundamentally due to shrinking logic gate size
– More gates, packed more tightly, increasing clock
rate
– Propagation time for signals reduced
• Increase size and speed of caches
—Dedicating part of processor chip
– Cache access times drop significantly
• Change processor organization and
architecture
—Increase effective speed of execution
—Parallelism
Problems with Clock Speed and Logic
Density
• Power
— Power density increases with density of logic and clock
speed
— Dissipating heat
• RC delay
— Speed at which electrons flow limited by resistance and
capacitance of metal wires connecting transistors
— Delay increases as RC product increases
— Wire interconnects thinner, increasing resistance
— Wires closer together, increasing capacitance
• Memory latency
— Memory speeds lag processor speeds
• Solution:
— More emphasis on organizational and architectural
approaches
Intel Microprocessor Performance
Increased Cache Capacity
• Typically two or three levels of cache
between processor and main memory
• Chip density increased
—More cache memory on chip
– Faster cache access
• Pentium chip devoted about 10% of chip
area to cache
• Pentium 4 devotes about 50%
More Complex Execution Logic
• Enable parallel execution of instructions
• Pipeline works like assembly line
—Different stages of execution of different
instructions at same time along pipeline
• Superscalar allows multiple pipelines
within single processor
—Instructions that do not depend on one
another can be executed in parallel
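The assembly-line analogy can be quantified for an ideal pipeline. This sketch assumes k stages of one cycle each and no stalls from branches or data dependencies, so real pipelines do worse.

```python
def pipelined_cycles(n_instructions: int, k_stages: int) -> int:
    """Cycles to finish n instructions on an ideal k-stage pipeline.

    The first instruction takes k cycles to fill the pipeline; after that
    one instruction completes every cycle (no stalls assumed).
    """
    if n_instructions < 1 or k_stages < 1:
        raise ValueError("need at least one instruction and one stage")
    return k_stages + n_instructions - 1


def speedup(n_instructions: int, k_stages: int) -> float:
    """Speedup over a non-pipelined unit that takes k cycles per instruction."""
    return n_instructions * k_stages / pipelined_cycles(n_instructions, k_stages)


print(pipelined_cycles(9, 6))     # 14 cycles instead of 9 * 6 = 54
print(round(speedup(9, 6), 2))    # 3.86
```

As the number of instructions grows, the speedup approaches the number of stages k.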
Diminishing Returns
• Internal organization of processors
complex
—Can get a great deal of parallelism
—Further significant increases likely to be
relatively modest
• Benefits from cache are reaching limit
• Increasing clock rate runs into power
dissipation problem
—Some fundamental physical limits are being
reached
New Approach – Multiple Cores
• Multiple processors on single chip
— Large shared cache
• Within a processor, increase in performance
proportional to square root of increase in
complexity
• If software can use multiple processors, doubling
number of processors almost doubles
performance
• So, use two simpler processors on the chip rather
than one more complex processor
• With two processors, larger caches are justified
— Power consumption of memory logic less than
processing logic
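The square-root argument for multiple cores can be worked through numerically. This sketch takes the "performance proportional to square root of complexity" rule at face value and assumes software that keeps both cores fully busy; a single core of complexity 1 is the baseline.

```python
import math


def core_performance(complexity: float) -> float:
    """Relative single-core performance, assuming it grows as sqrt(complexity)."""
    return math.sqrt(complexity)


def compare_designs(budget: float = 2.0):
    """Spend a complexity budget on one big core or two half-size cores.

    The two-core figure assumes perfectly parallel software.
    """
    one_big_core = core_performance(budget)
    two_small_cores = 2 * core_performance(budget / 2)
    return one_big_core, two_small_cores


big, small = compare_designs(2.0)
print(f"one complex core: {big:.2f}x, two simple cores: {small:.2f}x")
# one complex core: 1.41x, two simple cores: 2.00x
```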
x86 Evolution (1)
• 8080
— first general purpose microprocessor
— 8 bit data path
— Used in first personal computer – Altair
• 8086 – 5MHz – 29,000 transistors
— much more powerful
— 16 bit
— instruction cache (queue) that prefetches a few instructions
— 8088 (8 bit external bus) used in first IBM PC
• 80286
— 16 Mbyte memory addressable
— up from 1 Mbyte
• 80386
— 32 bit
— Support for multitasking
• 80486
— sophisticated powerful cache and instruction pipelining
— built in maths co-processor
x86 Evolution (2)
• Pentium
— Superscalar
— Multiple instructions executed in parallel
• Pentium Pro
— Increased superscalar organization
— Aggressive register renaming
— branch prediction
— data flow analysis
— speculative execution
• Pentium II
— MMX technology
— graphics, video & audio processing
• Pentium III
— Additional floating point instructions for 3D graphics
x86 Evolution (3)
• Pentium 4
— Note Arabic rather than Roman numerals
— Further floating point and multimedia enhancements
• Core
— First x86 with dual core
• Core 2
— 64 bit architecture
• Core 2 Quad – 3GHz – 820 million transistors
— Four processors on chip
• x86 architecture dominant outside embedded systems

• Organization and technology changed dramatically
• Instruction set architecture evolved with backwards compatibility
• ~1 instruction per month added
• 500 instructions available
• See Intel web pages for detailed information on processors
Embedded Systems
ARM
• ARM evolved from RISC design
• Used mainly in embedded systems
—Used within product
—Not general purpose computer
—Dedicated function
—E.g. Anti-lock brakes in car
Embedded Systems Requirements
• Different sizes
—Different constraints, optimization, reuse
• Different requirements
—Safety, reliability, real-time, flexibility,
legislation
—Lifespan
—Environmental conditions
—Static v dynamic loads
—Slow to fast speeds
—Computation v I/O intensive
—Discrete event v continuous dynamics
Possible Organization of an Embedded System
ARM Evolution
• Designed by ARM Inc., Cambridge,
England
• Licensed to manufacturers
• High speed, small die, low power
consumption
• PDAs, hand held games, phones
—E.g. iPod, iPhone
• Acorn produced ARM1 & ARM2 in 1985
and ARM3 in 1989
• Acorn, VLSI and Apple Computer founded
ARM Ltd.
ARM Systems Categories
• Embedded real time
• Application platform
—Linux, Palm OS, Symbian OS, Windows mobile
• Secure applications
Performance Assessment
Clock Speed
• Key parameters
— Performance, cost, size, security, reliability, power
consumption
• System clock speed
— In Hz or multiples thereof
— Clock rate, clock cycle, clock tick, cycle time
• Signals in CPU take time to settle down to 1 or 0
• Signals may change at different speeds
• Operations need to be synchronised
• Instruction execution in discrete steps
— Fetch, decode, load and store, arithmetic or logical
— Usually require multiple clock cycles per instruction
• Pipelining gives simultaneous execution of
instructions
• So, clock speed is not the whole story
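Why clock speed is not the whole story can be shown with the standard processor-time relation T = Ic × CPI × τ, where Ic is the instruction count, CPI the average cycles per instruction and τ = 1/f the cycle time. This notation is assumed here rather than taken from the slide, and the two processors are hypothetical.

```python
def execution_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Processor time in seconds: T = Ic * CPI * tau, with tau = 1 / clock rate."""
    return instruction_count * cpi / clock_hz


# Two hypothetical processors running the same 1 billion instructions
fast_clock = execution_time(1e9, cpi=2.0, clock_hz=3e9)
slow_clock = execution_time(1e9, cpi=1.0, clock_hz=2e9)
print(f"3 GHz, CPI 2.0: {fast_clock:.3f} s")  # 0.667 s
print(f"2 GHz, CPI 1.0: {slow_clock:.3f} s")  # 0.500 s
```

The processor with the slower clock finishes first because it needs fewer cycles per instruction.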
System Clock
Instruction Execution Rate
• Millions of instructions per second (MIPS)
• Millions of floating-point operations per
second (MFLOPS)
• Heavily dependent on instruction set,
compiler design, processor
implementation, cache & memory
hierarchy
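The MIPS rate follows from the clock rate and the average CPI of the instruction mix: MIPS = f / (CPI × 10^6). This sketch uses a hypothetical mix; the fractions and cycle counts are illustrative values, not measurements.

```python
def average_cpi(mix):
    """Weighted CPI for an instruction mix given as (fraction, cycles) pairs."""
    if abs(sum(fraction for fraction, _ in mix) - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


def mips_rate(clock_hz: float, cpi: float) -> float:
    """MIPS = f / (CPI * 10**6)."""
    return clock_hz / (cpi * 1e6)


# Hypothetical mix: 60% 1-cycle, 18% 2-cycle, 12% 4-cycle, 10% 8-cycle instructions
cpi = average_cpi([(0.60, 1), (0.18, 2), (0.12, 4), (0.10, 8)])
print(f"CPI = {cpi:.2f}, MIPS at 400 MHz = {mips_rate(400e6, cpi):.0f}")
# CPI = 2.24, MIPS at 400 MHz = 179
```

Because the count depends on what one "instruction" does, MIPS figures are only comparable between processors with the same instruction set.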
Benchmarks
• Programs designed to test performance
• Written in high level language
— Portable
• Represents style of task
— Systems, numerical, commercial
• Easily measured
• Widely distributed
• E.g. Standard Performance Evaluation Corporation
(SPEC)
— CPU2006 for computation bound
– 17 floating point programs in C, C++, Fortran
– 12 integer programs in C, C++
– 3 million lines of code
— Speed and rate metrics
– Single task and throughput
SPEC Speed Metric
• Single task
• Base runtime defined for each benchmark using
reference machine
• Results are reported as ratio of reference time to
system run time
$$r_i = \frac{Tref_i}{Tsut_i}$$
— $Tref_i$: execution time for benchmark i on reference
machine
— $Tsut_i$: execution time of benchmark i on test system
• Overall performance calculated by averaging
ratios for all 12 integer benchmarks
— Use geometric mean
$$r_G = \left( \prod_{i=1}^{12} r_i \right)^{1/12}$$
– Appropriate for normalized numbers such as ratios
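The speed metric is a ratio per benchmark followed by a geometric mean. This sketch computes it for three hypothetical benchmarks; the run times are invented, and a real SPEC report follows the suite's own run rules.

```python
import math


def spec_speed(ref_times, test_times):
    """SPEC-style speed metric: geometric mean of r_i = Tref_i / Tsut_i."""
    if len(ref_times) != len(test_times) or not ref_times:
        raise ValueError("need one test time per reference time")
    ratios = [ref / sut for ref, sut in zip(ref_times, test_times)]
    # Geometric mean computed via logs to avoid overflow with many ratios
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical reference and measured run times (seconds) for three benchmarks
print(round(spec_speed([1000, 2000, 4000], [500, 500, 500]), 3))  # 4.0
```

The ratios here are 2, 4 and 8; their geometric mean is 4, whereas the arithmetic mean would be pulled up to about 4.67 by the largest ratio.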
SPEC Rate Metric
• Measures throughput or rate of a machine carrying out a
number of tasks
• Multiple copies of benchmarks run simultaneously
— Typically, same as number of processors
• Ratio is calculated as follows:
$$r_i = \frac{N \times Tref_i}{Tsut_i}$$
— $Tref_i$: reference execution time for benchmark i
— $N$: number of copies run simultaneously
— $Tsut_i$: elapsed time from start of execution of program on all N
processors until completion of all copies of program
— Again, a geometric mean is calculated
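The rate metric scales each ratio by the number of copies run at once. This sketch assumes a hypothetical 4-processor machine running 4 copies of two benchmarks; all times are invented.

```python
import math


def spec_rate(ref_times, elapsed_times, copies: int):
    """SPEC-style rate metric: geometric mean of r_i = N * Tref_i / Tsut_i.

    elapsed_times[i] is the wall-clock time for all N copies of benchmark i.
    """
    if copies < 1 or len(ref_times) != len(elapsed_times) or not ref_times:
        raise ValueError("invalid inputs")
    ratios = [copies * ref / sut for ref, sut in zip(ref_times, elapsed_times)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical 4-processor run of two benchmarks
print(round(spec_rate([100, 400], [100, 200], copies=4), 2))  # 5.66
```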
Amdahl’s Law
• Gene Amdahl [AMDA67]
• Potential speed up of program using
multiple processors
• Concluded that:
—Code needs to be parallelizable
—Speedup is bounded, giving diminishing returns
for more processors
• Task dependent
—Servers gain by maintaining multiple
connections on multiple processors
—Databases can be split into parallel tasks
Amdahl’s Law Formula
• For program running on single processor
— Fraction f of code infinitely parallelizable with no
scheduling overhead
— Fraction (1 - f) of code inherently serial
— T is total execution time for program on single processor
— N is number of processors that fully exploit parallel
portions of code
$$\text{Speedup} = \frac{T}{(1-f)T + \frac{fT}{N}} = \frac{1}{(1-f) + \frac{f}{N}}$$
• Conclusions
— f small: parallel processors have little effect
— N → ∞: speedup bounded by 1/(1 - f)
– Diminishing returns for using more processors
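The speedup formula is easy to evaluate directly. This sketch assumes, as the law does, perfect parallelism with no scheduling overhead; f = 0.9 is an arbitrary example value.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup = 1 / ((1 - f) + f / N) for parallel fraction f on N processors."""
    if not 0.0 <= f <= 1.0 or n < 1:
        raise ValueError("f must be in [0, 1] and N >= 1")
    return 1.0 / ((1.0 - f) + f / n)


for n in (1, 2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
# The bound 1 / (1 - 0.9) = 10 is approached but never reached.
```

Going from 64 to 1,024 processors, a sixteenfold increase, adds only a little over one unit of speedup here, which is the diminishing return the slide describes.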
Internet Resources
• http://www.intel.com/
—Search for the Intel Museum
• http://www.ibm.com
• http://www.dec.com
• Charles Babbage Institute
• PowerPC
• Intel Developer Home
References
• AMDA67 Amdahl, G. “Validity of the
Single-Processor Approach to Achieving
Large-Scale Computing Capability”,
Proceedings of the AFIPS Conference,
1967.
William Stallings
Computer Organization
and Architecture
8th Edition
Chapter 1
Introduction
Architecture & Organization 1
• Architecture is those attributes visible to
the programmer
—Instruction set, number of bits used for data
representation, I/O mechanisms, addressing
techniques.
—e.g. Is there a multiply instruction?
• Organization is how features are
implemented
—Control signals, interfaces, memory
technology.
—e.g. Is there a hardware multiply unit or is it
done by repeated addition?
Architecture & Organization 2
• All Intel x86 family share the same basic
architecture
• The IBM System/370 family share the
same basic architecture
• This gives code compatibility
—At least backwards
• Organization differs between different
versions
Structure & Function
• Structure is the way in which components
relate to each other
• Function is the operation of individual
components as part of the structure
Function
• All computer functions are:
—Data processing
—Data storage
—Data movement
—Control
Functional View
Operation (a) Data movement
Operation (b) Storage
Operation (c) Processing from/to storage
Operation (d) Processing from storage to I/O
Structure - Top Level
[Figure: the computer (Central Processing Unit, Main Memory, Input/Output and Systems Interconnection) connected to peripherals and communication lines]
Structure - The CPU
[Figure: the CPU (Registers, Arithmetic and Logic Unit, Control Unit and Internal CPU Interconnection) inside the computer, alongside Memory and I/O on the System Bus]
Structure - The Control Unit
[Figure: the Control Unit (Sequencing Logic, Control Unit Registers and Decoders, Control Memory) inside the CPU, alongside the ALU and Registers on the Internal Bus]
Outline of the Book (1)
• Computer Evolution and Performance
• Computer Interconnection Structures
• Internal Memory
• External Memory
• Input/Output
• Operating Systems Support
• Computer Arithmetic
• Instruction Sets
Outline of the Book (2)
• CPU Structure and Function
• Reduced Instruction Set Computers
• Superscalar Processors
• Control Unit Operation
• Microprogrammed Control
• Multiprocessors and Vector Processing
• Digital Logic (Appendix)
Internet Resources
- Web site for book
• http://WilliamStallings.com/COA/COA7e.html
— links to sites of interest
— links to sites for courses that use the book
— errata list for book
— information on other books by W. Stallings
• http://WilliamStallings.com/StudentSupport.html
— Math
— How-to
— Research resources
— Misc
Internet Resources
- Web sites to look for
• WWW Computer Architecture Home Page
• CPU Info Center
• Processor Emporium
• ACM Special Interest Group on Computer
Architecture
• IEEE Technical Committee on Computer
Architecture
• Intel Technology Journal
• Manufacturer’s sites
—Intel, IBM, etc.
Internet Resources
- Usenet News Groups
• comp.arch
• comp.arch.arithmetic
• comp.arch.storage
• comp.parallel