
IA-32 Processor Architecture

Chapter Overview

• General Concepts
• IA-32 Processor Architecture
• IA-32 Memory Management
• Components of an IA-32 Microcomputer
• Input-Output System
Javed Ahmed Shahani

General Concepts

• Basic microcomputer design
• Instruction execution cycle
• Reading from memory
• How programs run

Basic Microcomputer Design

• clock synchronizes CPU operations
• control unit (CU) coordinates sequence of execution steps
• ALU performs arithmetic and bitwise processing

[Figure: block diagram. Labels: Central Processor Unit (CPU) containing registers, ALU, CU, and clock; Memory Storage Unit; I/O Device #1; I/O Device #2; data bus; control bus; address bus.]

Clock

• synchronizes all CPU and bus operations
• machine (clock) cycle measures time of a single operation
• clock is used to trigger events

[Figure: clock signal alternating between 0 and 1, with one cycle marked.]

Instruction Execution Cycle

• Fetch
• Decode
• Fetch operands
• Execute
• Store output

[Figure: instruction execution cycle. Labels: PC; program (I-1, I-2, I-3, I-4); memory (op1, op2); fetch; read; registers; instruction register (I-1); decode; ALU; execute; flags; write; (output).]

Multi-Stage Pipeline

• Pipelining makes it possible for processor to execute instructions in parallel
• Instruction execution divided into discrete stages

Example of a non-pipelined processor. Many wasted cycles.

Cycle   S1    S2    S3    S4    S5    S6
  1     I-1
  2           I-1
  3                 I-1
  4                       I-1
  5                             I-1
  6                                   I-1
  7     I-2
  8           I-2
  9                 I-2
 10                       I-2
 11                             I-2
 12                                   I-2

Pipelined Execution

• More efficient use of cycles, greater throughput of instructions:

Cycle   S1    S2    S3    S4    S5    S6
  1     I-1
  2     I-2   I-1
  3           I-2   I-1
  4                 I-2   I-1
  5                       I-2   I-1
  6                             I-2   I-1
  7                                   I-2

For k stages and n instructions, the number of required cycles is:

k + (n – 1)

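
The two tables can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up here, and the non-pipelined count k * n is an assumption read off the 12-cycle table (6 stages, 2 instructions) rather than a formula stated on the slide.

```python
def cycles_non_pipelined(k: int, n: int) -> int:
    """Assumed model: each instruction passes through all k stages
    before the next instruction may start."""
    return k * n


def cycles_pipelined(k: int, n: int) -> int:
    """Slide formula: the first instruction needs k cycles, and each
    later instruction completes one cycle after the previous one."""
    return k + (n - 1)


if __name__ == "__main__":
    k, n = 6, 2  # six stages, two instructions, as in the tables
    print(cycles_non_pipelined(k, n))  # 12
    print(cycles_pipelined(k, n))      # 7
```

With many instructions the pipelined count approaches one instruction per cycle, which is the throughput gain the slide refers to.
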
Wasted Cycles (pipelined)

• When one of the stages requires two or more clock cycles, clock cycles are again wasted.

In the table below, stage S4 (exe) takes two cycles.

Cycle   S1    S2    S3    S4    S5    S6
  1     I-1
  2     I-2   I-1
  3     I-3   I-2   I-1
  4           I-3   I-2   I-1
  5                 I-3   I-1
  6                       I-2   I-1
  7                       I-2         I-1
  8                       I-3   I-2
  9                       I-3         I-2
 10                             I-3
 11                                   I-3

For k stages and n instructions, the number of required cycles is:

k + (2n – 1)

Superscalar

A superscalar processor has multiple execution pipelines. In the following, note that Stage S4 has left and right pipelines (u and v).

Cycle   S1    S2    S3    S4u   S4v   S5    S6
  1     I-1
  2     I-2   I-1
  3     I-3   I-2   I-1
  4     I-4   I-3   I-2   I-1
  5           I-4   I-3   I-1   I-2
  6                 I-4   I-3   I-2   I-1
  7                       I-3   I-4   I-2   I-1
  8                             I-4   I-3   I-2
  9                                   I-4   I-3
 10                                         I-4

For k stages and n instructions, the number of required cycles is:

k + n
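
As a quick check of the two cycle-count formulas on these slides, the sketch below evaluates them for the table sizes shown (6 stages with 3 instructions, and 6 stages with 4 instructions). The function names are illustrative, and both formulas are assumed to apply only to the configuration on the slides: a single two-cycle stage, served by either one or two pipelines.

```python
def cycles_with_two_cycle_stage(k: int, n: int) -> int:
    """Single pipeline in which one stage needs two clock cycles
    (slide formula: k + (2n - 1))."""
    return k + (2 * n - 1)


def cycles_superscalar(k: int, n: int) -> int:
    """Same two-cycle stage, but duplicated as u and v pipelines
    (slide formula: k + n)."""
    return k + n


if __name__ == "__main__":
    print(cycles_with_two_cycle_stage(6, 3))  # 11
    print(cycles_superscalar(6, 4))           # 10
```
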

Reading from Memory

• Multiple machine cycles are required when reading from memory, because it responds much more slowly than the CPU. The steps are:
– address placed on address bus
– Read Line (RD) set low
– CPU waits one cycle for memory to respond
– Read Line (RD) goes to 1, indicating that the data is on the data bus

[Figure: timing diagram over Cycle 1 to Cycle 4 showing the CLK, ADDR (Address), RD, and DATA (Data) signals.]

Cache Memory

• High-speed expensive static RAM both inside and outside the CPU.
– Level-1 cache: inside the CPU
– Level-2 cache: outside the CPU
• Cache hit: when data to be read is already in cache memory
• Cache miss: when data to be read is not in cache memory.

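
The hit/miss distinction can be illustrated with a toy model. The sketch below is an assumption-heavy simplification, not how a hardware cache is organized: the cache is an unbounded Python dictionary keyed by address, there is no eviction and no cache lines, and the names `read`, `cache`, `memory`, and `stats` are invented for the example.

```python
def read(address: int, cache: dict, memory: dict, stats: dict) -> int:
    """Return the value at `address`, recording a cache hit or miss."""
    if address in cache:
        stats["hits"] += 1                 # data already in cache memory
    else:
        stats["misses"] += 1               # data must come from main memory
        cache[address] = memory[address]   # keep a copy for later reads
    return cache[address]


def run_reads(addresses: list) -> dict:
    """Replay a sequence of reads against an initially empty cache."""
    memory = {a: a * 10 for a in range(16)}  # hypothetical memory contents
    cache = {}
    stats = {"hits": 0, "misses": 0}
    for a in addresses:
        read(a, cache, memory, stats)
    return stats


if __name__ == "__main__":
    print(run_reads([0, 1, 0, 2, 1]))  # {'hits': 2, 'misses': 3}
```
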
How a Program Runs

[Figure: flow diagram. Boxes: User, Operating system, Current directory, System path, Directory entry, Program. Arrow labels: sends program name to, searches for program in, gets starting cluster from, loads and starts, returns to.]

Multitasking

• OS can run multiple programs at the same time.
• Multiple threads of execution within the same program.
• Scheduler utility assigns a given amount of CPU time to each running program.
• Rapid switching of tasks
– gives illusion that all programs are running at once
– the processor must support task switching.

IA-32 Processor Architecture

• Modes of operation
• Basic execution environment
• Floating-point unit
• Intel Microprocessor history

Modes of Operation

• Protected mode
– native mode (Windows, Linux)
• Real-address mode
– native MS-DOS
• System management mode
– power management, system security, diagnostics
• Virtual-8086 mode
– hybrid of Protected mode
– each program has its own 8086 computer

Basic Execution Environment

• Addressable memory
• General-purpose registers
• Index and base registers
• Specialized register uses
• Status flags
• Floating-point, MMX, XMM registers

Addressable Memory

• Protected mode
– 4 GB
– 32-bit address
• Real-address and Virtual-8086 modes
– 1 MB space
– 20-bit address
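
The sizes above follow directly from the address widths: an n-bit address can name 2^n distinct bytes. A minimal sketch, with an illustrative function name:

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


if __name__ == "__main__":
    # 32-bit address (Protected mode): 4 GB
    print(addressable_bytes(32) // 2**30, "GB")  # 4 GB
    # 20-bit address (Real-address and Virtual-8086 modes): 1 MB
    print(addressable_bytes(20) // 2**20, "MB")  # 1 MB
```
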

General-Purpose Registers

Named storage locations inside the CPU, optimized for speed.

[Figure: register set.
32-bit General-Purpose Registers: EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI
16-bit Segment Registers: CS, SS, DS, ES, FS, GS
Also shown: EFLAGS, EIP]

Accessing Parts of Registers

• Use 8-bit name, 16-bit name, or 32-bit name
• Applies to EAX, EBX, ECX, and EDX

[Figure: EAX (32 bits) contains AX (16 bits) as its lower half; AX is made up of AH and AL (8 bits + 8 bits).]

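
The relationship between EAX, AX, AH, and AL can be modeled with masks and shifts. This is a sketch in Python rather than assembly; the function name and the sample value are made up for illustration.

```python
def split_eax(eax: int) -> tuple:
    """Return (AX, AH, AL) for a 32-bit EAX value."""
    ax = eax & 0xFFFF   # AX: lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF  # AH: upper 8 bits of AX
    al = ax & 0xFF      # AL: lower 8 bits of AX
    return ax, ah, al


if __name__ == "__main__":
    ax, ah, al = split_eax(0x12345678)
    print(f"AX={ax:04X} AH={ah:02X} AL={al:02X}")  # AX=5678 AH=56 AL=78
```

The same layout applies to EBX, ECX, and EDX (BX/BH/BL, CX/CH/CL, DX/DH/DL).
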
Index and Base Registers

• Some registers have only a 16-bit name for their lower half: ESI (SI), EDI (DI), EBP (BP), ESP (SP)

Some Specialized Register Uses (1 of 2)

• General-Purpose
– EAX – accumulator
– ECX – loop counter
– ESP – stack pointer
– ESI, EDI – index registers
– EBP – extended frame pointer (stack)
• Segment
– CS – code segment
– DS – data segment
– SS – stack segment
– ES, FS, GS – additional segments

Some Specialized Register Uses (2 of 2)

• EIP – instruction pointer
• EFLAGS
– status and control flags
– each flag is a single binary bit

Status Flags

• Carry
– unsigned arithmetic out of range
• Overflow
– signed arithmetic out of range
• Sign
– result is negative
• Zero
– result is zero
• Auxiliary Carry
– carry from bit 3 to bit 4
• Parity
– sum of 1 bits is an even number

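
To make the flag definitions concrete, the sketch below computes each status flag for an 8-bit addition. It is a Python model, not processor code: the function name and the dictionary keys are illustrative, and only the 8-bit ADD case is covered.

```python
def add8_flags(a: int, b: int) -> tuple:
    """Add two 8-bit values; return (result, flags) as an 8-bit ADD would set them."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        # Carry: unsigned result does not fit in 8 bits
        "carry": total > 0xFF,
        # Overflow: operands share a sign bit, result has the other sign
        "overflow": bool(~(a ^ b) & (a ^ result) & 0x80),
        # Sign: bit 7 of the result is set
        "sign": bool(result & 0x80),
        # Zero: result is zero
        "zero": result == 0,
        # Auxiliary Carry: carry out of bit 3 into bit 4
        "aux_carry": (a & 0x0F) + (b & 0x0F) > 0x0F,
        # Parity: the result has an even number of 1 bits
        "parity": bin(result).count("1") % 2 == 0,
    }
    return result, flags


if __name__ == "__main__":
    print(add8_flags(0x7F, 0x01))  # signed overflow: 127 + 1
    print(add8_flags(0xFF, 0x01))  # unsigned carry: 255 + 1
```

Adding 7Fh and 01h sets Overflow and Sign but not Carry, while adding FFh and 01h sets Carry and Zero but not Overflow, which is the unsigned versus signed distinction the slide draws.
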
Floating-Point, MMX, XMM Registers

• Eight 80-bit floating-point data registers
– ST(0), ST(1), . . . , ST(7)
– arranged in a stack
– used for all floating-point arithmetic
• Eight 64-bit MMX registers
• Eight 128-bit XMM registers for single-instruction multiple-data (SIMD) operations

[Figure: FPU registers.
80-bit Data Registers: ST(0), ST(1), ST(2), ST(3), ST(4), ST(5), ST(6), ST(7)
48-bit Pointer Registers: FPU Instruction Pointer, FPU Data Pointer
16-bit Control Registers: Tag Register, Control Register, Status Register
Opcode Register]

Intel Microprocessor History

• Intel 8086, 80286
• IA-32 processor family
• P6 processor family
• CISC and RISC

Early Intel Microprocessors

• Intel 8080
– 64K addressable RAM
– 8-bit registers
– CP/M operating system
– S-100 BUS architecture
– 8-inch floppy disks!
• Intel 8086/8088
– IBM-PC used 8088
– 1 MB addressable RAM
– 16-bit registers
– 16-bit data bus (8-bit for 8088)
– separate floating-point unit (8087)

The IBM-AT

• Intel 80286
– 16 MB addressable RAM
– Protected memory
– several times faster than 8086
– introduced IDE bus architecture
– 80287 floating point unit

Intel IA-32 Family

• Intel386
– 4 GB addressable RAM, 32-bit registers, paging (virtual memory)
• Intel486
– instruction pipelining
• Pentium
– superscalar, 32-bit address bus, 64-bit internal data path

Intel P6 Family

• Pentium Pro
– advanced optimization techniques in microcode
• Pentium II
– MMX (multimedia) instruction set
• Pentium III
– SIMD (streaming extensions) instructions
• Pentium 4
– NetBurst micro-architecture, tuned for multimedia

CISC and RISC

• CISC – complex instruction set
– large instruction set
– high-level operations
– requires microcode interpreter
– examples: Intel 80x86 family
• RISC – reduced instruction set
– simple, atomic instructions
– small instruction set
– directly executed by hardware
– examples:
• ARM (Advanced RISC Machines)
• DEC Alpha (now Compaq)

IA-32 Memory Management

• Real-address mode
• Calculating linear addresses
• Protected mode
• Multi-segment model
• Paging

Real-Address mode

• 1 MB RAM maximum addressable
• Application programs can access any area of memory
• Single tasking
• Supported by MS-DOS operating system

Segmented Memory

Segmented memory addressing: absolute (linear) address is a combination of a 16-bit segment value added to a 16-bit offset

[Figure: memory map from 00000 to F0000 in steps of 10000h. One segment spans 8000:0000 to 8000:FFFF; the address 8000:0250 is shown with segment (seg) 8000 and offset (ofs) 0250.]

Calculating Linear Addresses

• Given a segment address, multiply it by 16 (add a hexadecimal zero), and add it to the offset
• Example: convert 08F1:0100 to a linear address

Adjusted Segment value:  0 8 F 1 0
Add the offset:            0 1 0 0
Linear address:          0 9 0 1 0

Your turn . . .

What linear address corresponds to the segment/offset address 028F:0030?

028F0 + 0030 = 02920

Always use hexadecimal notation for addresses.

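
The segment-offset calculation is short enough to check in code. The sketch below is illustrative: the function names are invented here, and the "SSSS:OOOO" text format is assumed to be four hexadecimal digits on each side of the colon.

```python
def linear_address(segment: int, offset: int) -> int:
    """Multiply the segment by 16 (append a hexadecimal zero) and add the offset."""
    return segment * 16 + offset


def parse_seg_ofs(text: str) -> tuple:
    """Parse a string such as '08F1:0100' into (segment, offset) integers."""
    seg, ofs = text.split(":")
    return int(seg, 16), int(ofs, 16)


if __name__ == "__main__":
    for pair in ["08F1:0100", "028F:0030", "28F0:0030", "28F3:0000", "28B0:0430"]:
        print(pair, "->", f"{linear_address(*parse_seg_ofs(pair)):05X}")
```

The last three pairs all map to 28F30h, which shows that a linear address does not have a unique segment-offset form.
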
Your turn . . .

What segment addresses correspond to the linear address 28F30h?

Many different segment-offset addresses can produce the linear address 28F30h. For example:
28F0:0030, 28F3:0000, 28B0:0430, . . .

Protected Mode (1 of 2)

• 4 GB addressable RAM
– (00000000 to FFFFFFFFh)
• Each program assigned a memory partition which is protected from other programs
• Designed for multitasking
• Supported by Linux & MS-Windows

Protected mode (2 of 2)

• Segment descriptor tables
• Program structure
– code, data, and stack areas
– CS, DS, SS segment descriptors
– global descriptor table (GDT)
• MASM Programs use the Microsoft flat memory model

Flat Segment Model

• Single global descriptor table (GDT).
• All segments mapped to entire 32-bit address space

[Figure: address space from 00000000 to FFFFFFFF (4GB). Physical RAM occupies 00000000 to 00040000; the remainder is not used. Segment descriptor, in the Global Descriptor Table: base address 00000000, limit 00040, access ----.]

Multi-Segment Model

• Each program has a local descriptor table (LDT)
– holds descriptor for each segment used by the program

[Figure: Local Descriptor Table pointing into RAM.
base      limit  access
00026000  0010
00008000  000A
00003000  0002
RAM locations marked: 26000, 8000, 3000]

Paging

• Supported directly by the CPU
• Divides each segment into 4096-byte blocks called pages
• Sum of all programs can be larger than physical memory
• Part of running program is in memory, part is on disk
• Virtual memory manager (VMM) – OS utility that manages the loading and unloading of pages
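
With 4096-byte pages, an address splits into a page number and an offset within that page. The sketch below shows only that arithmetic under a simplifying assumption: it uses a single flat page number and does not model the IA-32 page directory and page tables. The names are illustrative.

```python
PAGE_SIZE = 4096  # bytes per page, as stated on the slide


def split_address(address: int) -> tuple:
    """Return (page number, offset within page) for a linear address."""
    return address // PAGE_SIZE, address % PAGE_SIZE


def pages_needed(size_in_bytes: int) -> int:
    """Number of whole pages required to hold a block of the given size."""
    return (size_in_bytes + PAGE_SIZE - 1) // PAGE_SIZE


if __name__ == "__main__":
    page, offset = split_address(0x26010)
    print(f"page {page:X}h, offset {offset:X}h")  # page 26h, offset 10h
    print(pages_needed(10000))                    # 3
```
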

Components of an IA-32 Microcomputer

• Motherboard
• Video output
• Memory
• Input-output ports

Motherboard

• CPU socket
• External cache memory slots
• Main memory slots
• BIOS chips
• Sound synthesizer chip (optional)
• Video controller chip (optional)
• IDE, parallel, serial, USB, video, keyboard, joystick, network, and mouse connectors
• PCI bus connectors (expansion cards)

Intel D850MD Motherboard

[Figure: annotated photograph of the board. Labels: mouse, keyboard, parallel, serial, and USB connectors; Video; Audio chip; PCI slots; memory controller hub; Pentium 4 socket; AGP slot; dynamic RAM; Firmware hub; I/O Controller; Speaker; Battery; Power connector; IDE drive connectors; Diskette connector.]

Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification

Video Output

• Video controller
– on motherboard, or on expansion card
– AGP (accelerated graphics port technology)*
• Video memory (VRAM)
• Video CRT Display
– uses raster scanning
– horizontal retrace
– vertical retrace
• Direct digital LCD monitors
– no raster scanning required

* This link may change over time.

Sample Video Controller (ATI Corp.)

• 128-bit 3D graphics performance powered by RAGE™ 128 PRO
• 3D graphics performance
• Intelligent TV-Tuner with Digital VCR
• TV-ON-DEMAND™
• Interactive Program Guide
• Still image and MPEG-2 motion video capture
• Video editing
• Hardware DVD video playback
• Video output to TV or VCR

Memory

• ROM
– read-only memory
• EPROM
– erasable programmable read-only memory
• Dynamic RAM (DRAM)
– inexpensive; must be refreshed constantly
• Static RAM (SRAM)
– expensive; used for cache memory; no refresh required
• Video RAM (VRAM)
– dual ported; optimized for constant video refresh
• CMOS RAM
– complementary metal-oxide semiconductor
– system setup information
• See: Intel platform memory (Intel technology brief: link address may change)

Input-Output Ports

• USB (universal serial bus)
– intelligent high-speed connection to devices
– up to 12 megabits/second
– USB hub connects multiple devices
– enumeration: computer queries devices
– supports hot connections
• Parallel
– short cable, high speed
– common for printers
– bidirectional, parallel data transfer
– Intel 8255 controller chip

Input-Output Ports (cont)

• Serial
– RS-232 serial port
– one bit at a time
– uses long cables and modems
– 16550 UART (universal asynchronous receiver transmitter)
– programmable in assembly language

Levels of Input-Output

• Level 3: Call a library function (C++, Java)
– easy to do; abstracted from hardware; details hidden
– slowest performance
• Level 2: Call an operating system function
– specific to one OS; device-independent
– medium performance
• Level 1: Call a BIOS (basic input-output system) function
– may produce different results on different systems
– knowledge of hardware required
– usually good performance
• Level 0: Communicate directly with the hardware
– May not be allowed by some operating systems

Displaying a String of Characters

When a HLL program displays a string of characters, the following steps take place:

[Figure: call chain from top to bottom. Application Program (Level 3), OS Function (Level 2), BIOS Function (Level 1), Hardware (Level 0).]

ASM Programming levels

ASM programs can perform input-output at each of the following levels:

[Figure: ASM Program connected to each of OS Function (Level 2), BIOS Function (Level 1), and Hardware (Level 0).]
