Computer Architecture

Lecture: Einführung in die Technische Informatik
Sequential Computers: the von Neumann Architecture
The basic organizational concept of most computers currently in use and on the market is the roughly 50-year-old concept of von Neumann, Burks, and Goldstine:
Burks, A.W., Goldstine, H.H., von Neumann, J., Preliminary Discussion of the Logical Design of an
Electronic Computing Instrument, in: Taub, A.H. (ed.): Collected Works of John von Neumann, vol.
5, Macmillan, New York, 1963, pp. 34-79.
The main reason for the longevity of the concept is its unique combination of simplicity and flexibility.
The following block diagram illustrates the basic concept. For simplicity it is assumed that all data transfers take place over one shared interconnect (bus).
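[Block diagram: the CPU, consisting of the instruction processor and the data processor, and the I/O processor are attached to the interconnect (bus). The instruction processor exchanges instructions with the bus, the data processor and the I/O processor exchange data; memory delivers instructions and data over the bus.]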
At the level of computer structure, the instruction processor and the data processor are generally combined into one unit, called the central processing unit (CPU).
The von Neumann architecture can be described as the computer organization with minimal hardware effort. It consists of the following units:
- CPU: interprets the instructions of the program and executes them.
- Memory: holds the machine program and the associated data.
- I/O processor: connects the machine to the outside world, using the peripheral input/output devices.
- Data paths: carry information between the components; they comprise data paths proper as well as address and control paths for controlling the system.
This is the simplest form of the physical structure of the von Neumann architecture.
Definition: Computer architecture [Giloi 93]
A computer architecture is determined by an operating principle for the hardware and by the structure in which it is built from the individual hardware resources.
Definition: Operating principle
The operating principle defines the functional behaviour of the architecture by specifying an information structure and a control structure.
Definition: (Hardware) structure
The structure of a computer architecture is given by the kind and number of the hardware resources and by the communication facilities connecting them.
- Control structure: the control structure of a computer architecture is determined by specifying the algorithms for interpreting and transforming the information components of the machine.
- Information structure: the information structure of a computer architecture is determined by the types of the information components in the machine, the representation of these components, and the operations applicable to them. The information structure can be specified as a set of 'abstract' data types.
- Hardware resources: the hardware resources of a computer architecture are processors, memories, interconnects, and peripherals.
- Communication facilities: communication facilities are the interconnects, e.g. buses, channels, and interconnection networks (VNs), together with the protocols that define the rules of communication between the hardware resources.
Besides the structure, the operating principle is the most important determinant of the functional behaviour of an architecture. To state the operating principle of the von Neumann architecture, we have to look at its information structure and its control structure.
In principle, the smallest identifiable unit of information in a computer is the Boolean vector of length n (n ≥ 1); frequently 8 bits are grouped into the smallest addressable unit, the byte.
In the von Neumann machine these Boolean vectors can represent the following types of information:
- instructions (commands to the hardware)
- data (numbers or characters)
- addresses of memory cells
The von Neumann machine cannot tell from the content of a memory cell what it represents. A machine word is interpreted according to the state the machine is in at the time of interpretation. The information fetched from memory is interpreted alternately as an instruction or as data; when a program starts, the machine begins with instruction interpretation.
Instruction execution cycle
1. Program start: the program counter (PC) addresses the first instruction; fetch it from memory.
2. Load the instruction into the instruction register.
3. Carry out any address modifications and, where applicable, evaluate further fields of the instruction.
4. Fetch operands from memory if required.
5. Translate the operation code into control signals.
6. Execute the operation; increment the program counter by 1 (PC+1) or, for a jump, load the jump address (PC := BRA).
7. End of program? If no, fetch the next instruction from memory and continue at step 2; if yes, stop.
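The cycle above can be sketched as a small interpreter. This is only an illustration under stated assumptions: the opcodes, the word format (opcode in the high byte, address in the low byte) and the names `encode` and `run` are invented here and do not correspond to any real machine. Instructions and data share one memory, and a word is treated as an instruction only because the program counter points at it.

```python
# Toy accumulator machine; opcodes and word format are invented for illustration.
HALT, LOAD, ADD, STORE, JMP = 0, 1, 2, 3, 4

def encode(opcode, address=0):
    """Pack an instruction into one machine word: high byte opcode, low byte address."""
    return (opcode << 8) | address

def run(memory, pc=0, max_steps=1000):
    """Execute the program held in `memory` and return the final memory contents."""
    memory = list(memory)              # instructions and data share one memory
    acc = 0                            # the implicit register (accumulator)
    for _ in range(max_steps):
        ir = memory[pc]                # fetch the word the PC points to
        opcode, address = ir >> 8, ir & 0xFF   # decode it as an instruction
        pc += 1                        # default: next instruction in sequence
        if opcode == HALT:
            break
        elif opcode == LOAD:
            acc = memory[address]      # fetch an operand from memory
        elif opcode == ADD:
            acc += memory[address]
        elif opcode == STORE:
            memory[address] = acc      # exactly one memory cell changes
        elif opcode == JMP:
            pc = address               # a jump overwrites the PC
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return memory

program = [
    encode(LOAD, 5),    # 0: acc <- mem[5]
    encode(ADD, 6),     # 1: acc <- acc + mem[6]
    encode(STORE, 7),   # 2: mem[7] <- acc
    encode(HALT),       # 3: stop
    0,                  # 4: unused
    20, 22, 0,          # 5..7: data
]
result = run(program)
print(result[7])        # prints 42
```

Every instruction costs at least one memory access for the fetch plus one for the operand, which is the traffic between memory and CPU that limits this organization.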
The 'abstract von Neumann machine' (the I/O processor is not considered) consists of the instruction and data processor and the memory. All storage cells (main memory and registers) together form the state register of the machine. The abstract von Neumann machine is thus nothing other than a finite state machine (FSM) whose states are given by the current memory content, where each state transition can change the content of exactly one storage cell:
• sequential instruction processing, one data item at a time
This principle of maximum flexibility leads to very heavy traffic of machine words between memory and CPU, which reduces performance. It is called the von Neumann bottleneck.
Operating Principle - Pipeline
The operating principle is the functional behaviour of the architecture, which is based on the underlying information and control structure.
Processing several data elements with a single instruction can be achieved in two ways:
- pipeline principle: vector computer
- parallel processing units (processing elements, PEs): array computer ('array of processing elements')
Pipeline Principle
Example: automobile production
Components of a 'very' simple car:
- body
- paint
- chassis
- engine
- wheels
What options are there for putting the parts together (assembly)?
- assembly line (PIPELINE)
- workgroup
What has to be watched? Dependencies!
Assembly line
[Figure: sheet metal -> body assembly -> paint (paint shop) -> chassis (chassis installation) -> engine (engine installation) -> wheels (wheel mounting)]
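Production of different models:
- 3 colours: R (red), G (green), B (blue)
- 2 engines: N (normal), I (injection)
- 2 bodies: L (Limousine, saloon), K (Kombi, estate)
Order 41: colour (F) = R, engine (M) = N, body (K) = K
Pipeline of the production process
[Figure: stages K, L, F, M, R. The stages K, F, M and R take 10 min each; the painting stage L (drawn as La and Lb) takes 20 min. Optimization of the painting stage: two stages L1 and L2 of 10 min each.]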
Stage-time diagram of the pipeline
(orders 41, 42, 43; the last column gives busy : idle stages)
time   K    L1   L2   F    M    R    busy:idle
1      41                            1:5
2      42   41                       2:4
3      43   42   41                  3:3
4           43   42   41             3:3
5                43   42   41        3:3
6                     43   42   41   3:3
7                          43   42   2:4
8                               43   1:5
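The stage-time diagram follows from one rule: order j enters the first stage at time step j and advances one stage per step. The sketch below generates the diagram from that rule; the function name `stage_time_diagram` and the row representation are assumptions made for this illustration.

```python
def stage_time_diagram(jobs, stages):
    """Return one row per time step; row[s] is the job in stage s, or None if idle."""
    k, n = len(stages), len(jobs)
    rows = []
    for t in range(k + n - 1):         # the last job leaves after k + (n - 1) steps
        row = []
        for s in range(k):
            j = t - s                  # job j entered stage 0 at time step j
            row.append(jobs[j] if 0 <= j < n else None)
        rows.append(row)
    return rows

stages = ["K", "L1", "L2", "F", "M", "R"]
rows = stage_time_diagram([41, 42, 43], stages)
for t, row in enumerate(rows, start=1):
    busy = sum(job is not None for job in row)
    cells = " ".join("  ." if job is None else f"{job:>3}" for job in row)
    print(f"{t}: {cells}   {busy}:{len(stages) - busy}")
```

With only three orders the six-stage line is never fully occupied; the busy:idle ratio peaks at 3:3, which is why a continuous stream of orders matters for pipeline utilization.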
Pipeline Registers
A register is a hardware structure that can store one or more bits. Registers are (normally) built from D flip-flops (see PI2).
[Figure: a D flip-flop (D-FF) with data input D, output Q and clock input clk; a register with 32-bit input and 32-bit output.]
Important characteristics of a register:
Clock-to-output time (tco):
Time between the clock edge and the moment at which a change of the input becomes visible at the output.
Setup time (tsu):
Time before the clock edge during which the input value must already be stable (must no longer change; for the reason see digital circuit design, keyword: metastable states).
Hold time (th):
Time after the clock edge during which the input value must not yet change (for the same reason as for tsu).
[Timing diagram: one clock period is the cycle time tcyc. The data should be stable at the input from tsu before the clock edge until th after it; the output is valid with the new data tco after the clock edge.]
Pipelining
The performance gain achieved by pipelining is accomplished by partitioning an operation F into multiple suboperations f1 to fk and overlapping the execution of the suboperations of multiple operations F1 to Fn in successive stages of the pipeline [Ram77].
Assumptions for Pipelining
1. The operation F can be partitioned.
2. All suboperations fi require approximately the same amount of time.
3. There are several operations F the execution of which can be overlapped.
Technology requirement for Pipelining
4. The execution time of the suboperations is long compared with the register delay time.
Linear Pipeline with k Stages
[Figure: the instruction and operands enter stage f1 and pass through the stages f2, f3, ..., fk, which together form F; the result(s) leave the last stage.]
[Ram77] Ramamoorthy, C.V., Pipeline Architecture, in: Computing Surveys, Vol. 9, No. 1, 1977, pp. 61-102.
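Pipelined Operation Time
Measured in clock cycles of length $t_{cyc}$, n operations on a pipeline with k stages take
$t_p(n, k) = k + (n - 1)$
where k cycles are the time to fill the pipeline and (n - 1) cycles the time to process the remaining n - 1 operations. For this example: $t_p(10, 5) = 5 + (10 - 1) = 14$.
[Figure: time phases of length $t_{cyc}$ plotted against the pipeline stages 1 to 5, showing the start-up (fill), processing, and drain phases.]
Throughput
$TP(n, k) = \frac{\text{number of operations}}{\text{time unit}} = \frac{n}{t_p(n, k)\, t_{cyc}} = \frac{E(n, k)}{t_{cyc}}$ [operations/sec]
Gain
$S(n, k) = \frac{\text{scalar execution time}}{\text{pipelined execution time}} = \frac{n\,k}{k + (n - 1)}$, with $\lim_{n \to \infty} S(n, k) = k$
Efficiency (related terms: initiation rate, latency)
$E(n, k) = \frac{1}{k}\, S(n, k) = \frac{n\,k}{k\,(k + (n - 1))} = \frac{n}{k + (n - 1)}$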
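The formulas can be checked numerically. The sketch below is a direct transcription under the same idealized assumptions (every stage takes one cycle, no bubbles); the function names are chosen for this illustration, and exact fractions are used so that the limit behaviour is visible without rounding noise.

```python
from fractions import Fraction

def pipelined_time(n, k):
    """Cycles needed for n operations on a k-stage pipeline: fill plus n - 1."""
    return k + (n - 1)

def speedup(n, k):
    """Gain S(n, k): scalar time n * k divided by pipelined time."""
    return Fraction(n * k, pipelined_time(n, k))

def efficiency(n, k):
    """Efficiency E(n, k) = S(n, k) / k = n / (k + n - 1)."""
    return speedup(n, k) / k

def throughput(n, k, t_cyc):
    """Operations per time unit: E(n, k) / t_cyc."""
    return efficiency(n, k) / t_cyc

print(pipelined_time(10, 5))          # 14 cycles, as in the example
print(float(speedup(10, 5)))          # about 3.57
print(float(efficiency(10, 5)))       # about 0.71
print(float(speedup(1_000_000, 5)))   # approaches k = 5 for large n
```

For short operation streams the fill time dominates: with n = 10 and k = 5 the pipeline reaches only about 71 % of its ideal throughput.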
Pipeline Interrupts
data dependencies
control-flow dependencies
resource dependencies
1. The operation F can be partitioned
[Figure: an operation F with execution time tf is split into two suboperations f1 and f2, each taking tf / 2.]
2. All suboperations fi require approximately the same amount of time
[Figure: three suboperations with f1 << f2 and f3 << f2. Version 1 splits f2 into the suboperations f2a, f2b and f2c, each taking t2 / 3. Version 2 provides the unit for f2 several times in parallel between f1 and f3.]
3. There are several operations F the execution of which can be overlapped
If there is a discontinuous stream of operations F owing to a conflict, bubbles are inserted
into the pipeline. This reduces the performance gain significantly.
A typical example of this is the control dependency of the instruction pipeline of a processor.
Here, each conditional branch instruction may disrupt the instruction stream and cause (k-1)
bubbles (no-operations) in the pipeline, if the control flow is directed to the nonpredicted
path.
4. The execution time of the suboperations is long compared with the register delay time
Item 4 is a technological requirement for the utilization of a pipeline. Assuming a partitioning of the operation F into three suboperations f1, f2, f3, and no pipelining, the operation F can be executed in the time:
$t(F) = t_{f1} + t_{f2} + t_{f3}$
Introduction of registers
[Figure: three stages (stage 1 to stage 3) between input Di and output Do, each consisting of a suboperation followed by a register driven by Clock; stage i contributes the delays tfi, tsu and tco.]
With registers, every stage takes one clock cycle, so for k = 3 stages:
$t(F) = 3\,(\max_i(t_{fi}) + t_{co} + t_{su}) = 3 \max_i(t_{fi}) + 3\,(t_{co} + t_{su})$
The factor 3 is the number k of stages, the bracketed term is the cycle time, and $t_{co} + t_{su}$ is the register delay time:
$t_{cyc} = \max_i(t_{fi}) + t_{co} + t_{su}$, $f_{cyc} = 1 / t_{cyc}$
The registers are introduced behind each function of the suboperation and this creates the pipeline stages. Placing the register at the output (not at the input!) makes suboperation stages compatible with the definition of state machines, which are used to control the pipeline.
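A small numerical sketch shows why the register delay matters. The delay values below are hypothetical (integers in nanoseconds) and the function names are chosen for this illustration only.

```python
def unpipelined_time(stage_delays):
    """t(F) without registers: the sum of the suboperation delays."""
    return sum(stage_delays)

def cycle_time(stage_delays, t_co, t_su):
    """t_cyc: slowest suboperation plus the register delay t_co + t_su."""
    return max(stage_delays) + t_co + t_su

def pipelined_latency(stage_delays, t_co, t_su):
    """t(F) with one register per stage: k cycles of length t_cyc."""
    return len(stage_delays) * cycle_time(stage_delays, t_co, t_su)

delays = [4, 5, 3]                  # hypothetical t_f1, t_f2, t_f3 in ns
t_co, t_su = 1, 1                   # hypothetical register parameters in ns

print(unpipelined_time(delays))               # 12 ns for one operation F
print(cycle_time(delays, t_co, t_su))         # 7 ns per pipeline cycle
print(pipelined_latency(delays, t_co, t_su))  # 21 ns latency for one F
print(1000 / cycle_time(delays, t_co, t_su))  # clock frequency in MHz
```

A single operation becomes slower (21 ns instead of 12 ns), but a new operation can start every 7 ns. If the register delay were comparable to the suboperation delays, most of each cycle would be spent in the registers, which is the point of requirement 4.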
Machine Language
The most important interface between hardware and software is the machine language. It is defined by a set of machine instructions.
A machine instruction (also just: instruction) corresponds to an elementary operation of the computer. All statements of higher-level programming languages must be mapped onto machine instructions (-> compiler).
A machine instruction consists of a bit pattern that can be interpreted directly by the processor. One part of the bit pattern describes the desired operation (operation code), another part the operand address(es), i.e. the address(es) of the storage elements that are to be combined by the operation.
Machine instructions come in two basic types:
• fixed-length machine instructions
• variable-length machine instructions
With fixed-length instructions the operand part is fixed as well, so only those addressing modes are possible whose operand addresses fit into the instruction format.
[Instruction format, e.g. 32 bits: OPC (operation code) | OPA (operand addresses)]
Variable-length instructions use different instruction lengths depending on the addressing mode; the words following the first instruction word contain the additional operand addresses.
[Instruction format: first word, e.g. 16 bits: OPC | EA (effective address); followed by n x 16-bit address extension fields (ADR ext. fields)]
Machine Types
If the instruction format provides no operand field for addressing, the architecture is called a
- zero-address machine or stack machine
The instructions always refer implicitly to special registers, in this case to the 'top of stack'.
If the instruction format provides one operand field for addressing, a memory operand can be referenced in addition to the implicit register. Monadic operations are applied to the implicit register, the accumulator. Dyadic operations combine the content of the accumulator with the content of the memory cell referenced by the operand address.
This architecture is called a
- one-address machine or accumulator machine
The von Neumann architecture is such a one-address machine.
[Instruction format: first word, e.g. 16 bits: OPC | EA (effective address); followed by n x 16-bit address extension fields (ADR ext. fields)]
If the instruction format provides two operand fields for addressing, dyadic operations can directly combine two memory cells referenced by the instruction.
This architecture is called a
- two-address machine
[Instruction format: OPC | EA1 (destination) | EA2 (source); followed by the address extension fields (ADR ext. fields) of the operand fields]
Examples
Instruction: ADD D0,D1
  Adds the two registers D0 and D1. The result is stored in D0, which is thereby overwritten.
  Rn <- Rn + Rx
Instruction: ADD D0,(A0)
  Adds register D0 and the content of the memory cell that A0 points to. The result is stored in D0, which is thereby overwritten.
  Rn <- Rn + Mem(A0)
Instruction: ADD (A1),(A0)
  Adds the content of the memory cell (A1) and the content of the memory cell that A0 points to. The result is stored in the memory cell (A1), which is thereby overwritten.
  Mem(A1) <- Mem(A1) + Mem(A0)
The three-address machine can reference three operands per instruction, but it is mostly realized only as a register-register machine with a fixed instruction format (-> RISC).
Classification of Instructions
• Transfer instructions: they move data from one place to another. Store instructions write register contents to main memory; load instructions bring data read from memory into a register. Register-to-register transfers are also called loads, memory-to-memory transfers stores. Transfers with immediate operands serve to initialize register or memory contents.
• Arithmetic instructions: they manipulate operands according to the number and data types supported by the processor (unsigned numbers, integers, floating-point numbers, bytes; addition, subtraction, multiplication, division, inversion, etc.).
• Compare instructions: two operands are compared with respect to an ordering relation and/or the equality of their bit patterns. The result is recorded as flags in the status word of the processor (flags '<, >, =').
• Jump instructions: the unconditional jump always overwrites the program counter with a given operand, the conditional jump only if the flags in the status word of the processor match a mask given as an immediate operand. Also important is the subroutine jump, which branches to the given program address but first saves the old program counter value, i.e. the return address (in a given register or on the stack). There may also be special loop instructions, e.g. "decrement and jump if greater than 0".
• Bit-pattern instructions: they manipulate operands by a bit-pattern operation (bitwise AND/OR/XOR/NOT, shifts).
• Bit instructions: transfer instructions between a flag bit of the status word and bit positions in registers or memory (test bit, write bit), or from an immediate operand to a bit position (set bit, clear bit).
• I/O instructions: transfer instructions between the I/O controller and registers or memory.
• Special instructions: they affect the operating mode of the processor by loading special registers (e.g. the base address register), serve interrupt handling, and are as a rule available only to privileged programs (e.g. the operating system).
Addressing Techniques
• immediate: the instruction contains not the memory address of the operand's location but the operand itself. Immediate operands allow constants to be brought into a computation.
• short immediate: the immediate operand in the instruction has a particularly short bit length (e.g. 4 or 8 bits); this is well suited to the frequent constants 0 and 1.
• register direct: the operand is in the specified register.
• memory direct: the operand is in the memory cell specified by address. The address is contained directly in the address part of the instruction word.
• register indirect: the specified register contains the memory address of the operand.
• memory indirect: the memory cell specified by address contains the memory address of the operand.
• register indirect with displacement: a constant contained in the instruction is added to the operand address.
• register indirect with index: the content of an additionally specified index register is added to the operand address.
• base register relative: the content of a special register (the base register), which is not named explicitly in the instruction, is added to the operand address.
• program counter relative: the current content of the program counter is added to the operand address.
• register indirect with postincrement, (An)+: after the operand access, the content of the register specified in the instruction is incremented by 1.
• register indirect with predecrement, -(An): before the operand access, the content of the register specified in the instruction is decremented by 1; the content is then evaluated as the address of the operand.
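The postincrement and predecrement modes map directly onto a stack that grows towards lower addresses. The sketch below models this on a toy word-addressed memory, so the register changes by 1 per access as in the list above; the class `Machine` and its fields are assumptions made for this illustration.

```python
class Machine:
    """Toy word-addressed memory with one address register used as stack pointer."""

    def __init__(self, size=16):
        self.mem = [0] * size
        self.a7 = size                 # stack pointer starts just past the last cell

    def push(self, value):
        """Store with predecrement, -(A7): decrement first, then access."""
        self.a7 -= 1
        self.mem[self.a7] = value

    def pop(self):
        """Load with postincrement, (A7)+: access first, then increment."""
        value = self.mem[self.a7]
        self.a7 += 1
        return value

m = Machine()
for v in (1, 2, 3):
    m.push(v)
popped = [m.pop(), m.pop(), m.pop()]
print(popped)                          # [3, 2, 1]: last in, first out
```

Because push decrements before the access and pop increments after it, the register always points at the most recently stored element.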
The postincrement and predecrement addressing modes are well suited to implementing the stack operations "push" and "pop" (-> Stack).
Addressing Techniques of the MC 68000
The addressing modes of a CPU determine the way in which a processor can reference an operand held in one of its registers or in memory. For each operand, the addressing mode specifies how the processor is to locate or calculate the actual address of the operand. This actual address is called the effective address EA.
The EA field has 6 bits: a 3-bit EA mode and a 3-bit EA register number, so that 8 data registers and 8 address registers can be addressed.
No.  EA mode  EA reg.   Addressing mode                              Mnemonic
1    000      reg. no.  Data register direct                         Dn
2    001      reg. no.  Address register direct                      An
3    010      reg. no.  Address register indirect (ARI)              (An)
4    011      reg. no.  ARI with postincrement                       (An)+
5    100      reg. no.  ARI with predecrement                        -(An)
6    101      reg. no.  ARI with displacement                        d16(An)
7    110      reg. no.  ARI with displacement and index              d8(An,Rx)
8    111      000       Absolute short                               $XXXX (16-bit)
9    111      001       Absolute long                                $XXXXXXXX (32-bit)
10   111      010       PC relative with displacement                d16(PC)
11   111      011       PC relative with displacement and index      d8(PC,Rx)
12   111      100       Immediate, status register                   #<data>, SR, CCR
13   111      101       (not used)
Overview of the addressing modes
The large number of addressing modes is typical of the generation of the first microprocessors, which were designed more or less after the model of the PDP11. Their complexity and the long instruction execution times that partly resulted from it led to the development of the -> load/store architectures.
Addressing mode "register direct"
The operand address is a register (data register, address register, status register).
Example:
Instruction MOVE.x
Instruction format (16 bits):
  bits 15-12: 00xx  (instruction and data length)
  bits 11-6:  EA2   (operand address, destination)
  bits 5-0:   EA1   (operand address, source)
Operand size:
  .B  8 bit
  .W  16 bit
  .L  32 bit
MOVE D0,D3 is equivalent to:
MOVE.W D0,D3
Schematic representation:
                    before execution   after execution
D0                  12345678           12345678
D3                  87654321           87655678
PC                  00008004           00008006
Program memory: address 8004 holds the instruction word 36 00, address 8006 the next instruction.
The data memory is not accessed. Because the operand size is .W, only the lower 16 bits of D3 are replaced.
Data register direct
  Generation:                 EA = Dn
  Assembler syntax:           Dn
  Mode:                       000
  Register:                   n
  Number of extension words:  0
  The data register Dn (bits 31-0) contains the operand.

Address register direct
  Generation:                 EA = An
  Assembler syntax:           An
  Mode:                       001
  Register:                   n
  The address register An (bits 31-0) contains the operand.

Address register indirect
  Generation:                 EA = (An)
  Assembler syntax:           (An)
  Mode:                       010
  Register:                   n
  The address register An (bits 31-0) contains the memory address; the operand is located at this memory address.

Address register indirect with postincrement
  Generation:                 EA = (An); An = An + Size
  Assembler syntax:           (An)+
  Mode:                       011
  Register:                   n
  The address register An (bits 31-0) contains the memory address; the operand is located at this memory address. The operand length (1, 2, or 4) is then added to An.
Register Model
Programming model with the registers accessible in user mode. In supervisor mode there are additional registers, which can only be manipulated with privileged instructions.
- Eight data registers D0 to D7, 32 bits each (the figure marks the boundaries at bits 31, 16, 15, 8, 7 and 0)
- Eight address registers A0 to A7, 32 bits each (boundaries marked at bits 31, 16, 15 and 0), one of which is a stack pointer: A7 is the USER STACK POINTER, A7' the SUPERVISOR STACK POINTER
- PC: program counter (boundaries marked at bits 31, 24, 23 and 0)
- SR: status register, 16 bits, consisting of the SYSTEM BYTE (bits 15-8, supervisor mode) and the USER BYTE (bits 7-0, CCR, user mode)
Status Register of the MC68000
Status register (SR): system byte (bits 15-8) and user byte (bits 7-0, CCR)
Bit    Name      Meaning
15     T         Trace mode
13     S         Supervisor bit
10-8   I2 I1 I0  Interrupt mask
4      X         Extend
3      N         Negative
2      Z         Zero
1      V         Overflow
0      C         Carry
The bits X, N, Z, V and C are the condition codes.
System and user byte of the status register
The CPU32 manual, which contains all instructions and addressing modes of the MC68000 processor, can be found on the web at the following URL:
http://mufasa.informatik.uni-mannheim.de/lsra/tools/bsvc/bsvc.html
Byte Ordering
There are two different conventions for ordering the bytes within a longer data type such as word, long word, etc. Big Endian byte order puts the most significant byte of the data type (the big end) at the address xxxx00. Little Endian puts the least significant byte of the data type (the little end) at the address xxxx00. The data type 'byte' has the same arrangement in both byte orders ('byte sexes').
Base address: 800

Bytes
Address         800   801   802   803   804   805   806   807
Big Endian      B0    B1    B2    B3    B4    B5    B6    B7
Little Endian   B0    B1    B2    B3    B4    B5    B6    B7
B0 = Byte 0, etc.

Words (16 bit)
Address         800   801   802   803   804   805   806   807
Big Endian      WH0   WL0   WH1   WL1   WH2   WL2   WH3   WL3
Little Endian   WL0   WH0   WL1   WH1   WL2   WH2   WL3   WH3
WH0 = Word 0, High Byte
WL0 = Word 0, Low Byte
etc.

Longwords (32 bit)
Address         800   801   802   803   804   805   806   807
Big Endian      LHH0  LHL0  LLH0  LLL0  LHH1  LHL1  LLH1  LLL1
Little Endian   LLL0  LLH0  LHL0  LHH0  LLL1  LLH1  LHL1  LHH1
LHH0 = Longword 0, High Word, High Byte
LHL0 = Longword 0, High Word, Low Byte
LLH0 = Longword 0, Low Word, High Byte
LLL0 = Longword 0, Low Word, Low Byte
etc.
When operating within one machine, the byte order is normally unnoticeable. Byte order becomes a problem when data is exchanged among machines with different orderings.
A special protocol is required to type-tag all data types larger than bytes. These tags allow the byte order to be rearranged for the machine. This can be performed at the sending or the receiving side (-> XDR Protocol).
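The two byte orders can be made visible with Python's built-in `int.to_bytes` and `int.from_bytes`. The helper names `byte_layout` and `swap_byte_order` and the example value are chosen for this sketch; the swap shown here is the rearrangement a sender or receiver would have to perform, not the XDR protocol itself.

```python
def byte_layout(value, size, byteorder):
    """Return the bytes of `value` in the order they are stored at increasing addresses."""
    return list(value.to_bytes(size, byteorder))

def swap_byte_order(value, size):
    """Reinterpret a value written in one byte order as if read in the other."""
    return int.from_bytes(value.to_bytes(size, "big"), "little")

longword = 0x11223344   # LHH0 = 0x11, LHL0 = 0x22, LLH0 = 0x33, LLL0 = 0x44

print([hex(b) for b in byte_layout(longword, 4, "big")])     # high byte at the lowest address
print([hex(b) for b in byte_layout(longword, 4, "little")])  # low byte at the lowest address
print(hex(swap_byte_order(longword, 4)))                     # what the other machine would read
```

A single byte has the same layout in both conventions, which is why the swap only matters for data types larger than a byte.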
Memory Alignment
The placement of data objects at memory addresses is called alignment. Data objects in memory are generally addressed in terms of the smallest unit, usually the byte. Larger data types could therefore start at any byte address. If the data width of the memory is, say, 32 bits, a long word operand that starts at an odd byte address (binary ...01) can require several memory accesses, each of which fetches only part of the operand. Architectures that support such accesses place no restrictions on the arrangement of data objects in memory (unrestricted data alignment). The drawback is that the number of main memory accesses required increases.
Longwords (32 bit), address of longword = 801
             800   801   802   803
1st fetch:         LHH0  LHL0  LLH0
             804   805   806   807
2nd fetch:   LLL0
LHH0 = Longword 0, High Word, High Byte
LHL0 = Longword 0, High Word, Low Byte
LLH0 = Longword 0, Low Word, High Byte
LLL0 = Longword 0, Low Word, Low Byte
etc.
To avoid this drawback, restrictions on the arrangement of data objects in memory must be introduced. This alignment is very important for performance and for simplifying the hardware of the memory port. It rules out a placement of the longword as in the example above; data objects placed in this way are called misaligned.
An attempt to access a longer data object at a misaligned address (which the hardware cannot do!) raises an exception, the so-called 'misaligned trap' (-> Trap).
The 68xxx processor family has different alignment requirements depending on the implementation, but at least the instructions must always start on 16-bit boundaries. The 68000 simulator expects 16-bit alignment for data as well.
Most architectures use the word width of their registers as the minimum alignment.
The assembler directive for enforcing an alignment is:
.align 2
Here 2 is the number of bytes for the memory alignment.
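The effect of alignment on the number of memory accesses can be computed directly. The sketch assumes a memory port that delivers one naturally aligned block of `bus_bytes` per access; the function names are chosen for this illustration.

```python
def is_aligned(address, alignment):
    """True if `address` is a multiple of `alignment` (in bytes)."""
    return address % alignment == 0

def align_up(address, alignment):
    """Smallest aligned address that is not below `address`."""
    return (address + alignment - 1) // alignment * alignment

def fetches_needed(address, size, bus_bytes=4):
    """Number of aligned bus-wide blocks touched by an object of `size` bytes."""
    first_block = address // bus_bytes
    last_block = (address + size - 1) // bus_bytes
    return last_block - first_block + 1

print(fetches_needed(0x800, 4))   # 1: aligned longword
print(fetches_needed(0x801, 4))   # 2: the longword of the example above
print(is_aligned(0x801, 2))       # False
print(hex(align_up(0x801, 2)))    # 0x802, the next 16-bit boundary
```

Rounding an address up as in `align_up` is the kind of padding an alignment directive causes the assembler to insert.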
Instruction list and effects on the CCR
(* = set according to the result, - = not affected, 0 = cleared, 1 = set, U = undefined; s = operand size, d = direction)
Mnemonic  Operation                     Assembler syntax         X N Z V C
ABCD      Add decimal with extend       ABCD Dy,Dx               * U * U *
                                        ABCD -(Ay),-(Ax)
ADD       Add binary                    ADD.s <ea>,Dn            * * * * *
                                        ADD.s Dn,<ea>
ADDA      Add address                   ADDA.s <ea>,An           - - - - -
ADDI      Add immediate                 ADDI.s #<data>,<ea>      * * * * *
ADDQ      Add quick                     ADDQ.s #<data>,<ea>      * * * * *
ADDX      Add with extend               ADDX.s Dy,Dx             * * * * *
                                        ADDX.s -(Ay),-(Ax)
AND       Logical AND                   AND.s <ea>,Dn            - * * 0 0
                                        AND.s Dn,<ea>
ANDI      AND immediate                 ANDI.s #<data>,<ea>      - * * 0 0
ASL, ASR  Arithmetic shift left/right   ASd.s Dx,Dy              * * * * *
                                        ASd.s #<data>,Dy
                                        ASd.s <ea>
Bcc       Branch conditionally          Bcc <label>              - - - - -
BCHG      Test a bit and change it      BCHG Dn,<ea>             - - * - -
                                        BCHG #<data>,<ea>
BCLR      Test a bit and clear it       BCLR Dn,<ea>             - - * - -
                                        BCLR #<data>,<ea>
BRA       Branch unconditionally        BRA <label>              - - - - -
BSET      Test a bit and set it         BSET Dn,<ea>             - - * - -
                                        BSET #<data>,<ea>
Instruction list and effects on the CCR (continued)
Mnemonic  Operation                     Assembler syntax         X N Z V C
BSR       Branch to subroutine          BSR <label>              - - - - -
BTST      Test a bit                    BTST Dn,<ea>             - - * - -
                                        BTST #<data>,<ea>
CHK       Check register against        CHK <ea>,Dn              - * U U U
          bounds
CLR       Clear operand                 CLR.s <ea>               - 0 1 0 0
CMP       Compare                       CMP.s <ea>,Dn            - * * * *
CMPA      Compare address               CMPA.s <ea>,An           - * * * *
CMPI      Compare immediate             CMPI.s #<data>,<ea>      - * * * *
CMPM      Compare memory                CMPM.s (Ay)+,(Ax)+       - * * * *
DBcc      Test condition, decrement     DBcc Dn,<label>          - - - - -
          and branch
DIVS      Signed divide                 DIVS <ea>,Dn             - * * * 0
DIVU      Unsigned divide               DIVU <ea>,Dn             - * * * 0
EOR       Logical exclusive OR          EOR.s Dn,<ea>            - * * 0 0
EORI      Exclusive OR immediate        EORI.s #<data>,<ea>      - * * 0 0
EXG       Exchange registers            EXG Rx,Ry                - - - - -
EXT       Sign extend                   EXT.s Dn                 - * * 0 0
JMP       Jump                          JMP <ea>                 - - - - -
JSR       Jump to subroutine            JSR <ea>                 - - - - -

Mnemonic  Operation                     Assembler syntax         X N Z V C
LEA       Load effective address        LEA <ea>,An              - - - - -
LINK      Link and allocate             LINK An,<displacement>   - - - - -
LSL, LSR  Logical shift left/right      LSd.s Dx,Dy              * * * 0 *
                                        LSd.s #<data>,Dy
                                        LSd.s <ea>
MOVE      Move data                     MOVE.s <ea>,<ea>         - * * 0 0
MOVE      Move to condition codes       MOVE <ea>,CCR            * * * * *
to CCR
MOVE      Move to status register       MOVE <ea>,SR             * * * * *
to SR
MOVE      Move from status register     MOVE SR,<ea>             - - - - -
from SR
MOVE      Move user stack pointer       MOVE USP,An              - - - - -
USP                                     MOVE An,USP
MOVEA     Move address                  MOVEA.s <ea>,An          - - - - -
MOVEM     Move multiple registers       MOVEM.s <register list>,<ea>   - - - - -
(see note)                              MOVEM.s <ea>,<register list>
MOVEP     Move peripheral data          MOVEP Dx,d(Ay)           - - - - -
                                        MOVEP d(Ay),Dx
MOVEQ     Move quick                    MOVEQ #<data>,Dn         - * * 0 0

Mnemonic  Operation                     Assembler syntax         X N Z V C
MULS      Signed multiply               MULS <ea>,Dn             - * * 0 0
MULU      Unsigned multiply             MULU <ea>,Dn             - * * 0 0
NBCD      Negate decimal with extend    NBCD <ea>                * U * U *
NEG       Negate                        NEG.s <ea>               * * * * *
NEGX      Negate with extend            NEGX.s <ea>              * * * * *
NOP       No operation                  NOP                      - - - - -
NOT       Logical complement            NOT.s <ea>               - * * 0 0
OR        Logical OR                    OR.s <ea>,Dn             - * * 0 0
                                        OR.s Dn,<ea>
ORI       Logical OR immediate          ORI.s #<data>,<ea>       - * * 0 0
PEA       Push effective address        PEA <ea>                 - - - - -
RESET     Reset external devices        RESET                    - - - - -
ROL, ROR  Rotate left/right             ROd.s Dx,Dy              - * * 0 *
                                        ROd.s #<data>,Dy
                                        ROd.s <ea>
ROXL,     Rotate with extend            ROXd.s Dx,Dy             * * * 0 *
ROXR      left/right                    ROXd.s #<data>,Dy
                                        ROXd.s <ea>
RTE       Return from exception         RTE                      * * * * *

Mnemonic  Operation                     Assembler syntax         X N Z V C
RTR       Return and restore            RTR                      * * * * *
          condition codes
RTS       Return from subroutine        RTS                      - - - - -
SBCD      Subtract decimal with extend  SBCD Dy,Dx               * U * U *
                                        SBCD -(Ay),-(Ax)
Scc       Set according to condition    Scc <ea>                 - - - - -
STOP      Load status register and      STOP #<data>             * * * * *
          stop
SUB       Subtract binary               SUB.s <ea>,Dn            * * * * *
                                        SUB.s Dn,<ea>
SUBA      Subtract address              SUBA.s <ea>,An           - - - - -
SUBI      Subtract immediate            SUBI.s #<data>,<ea>      * * * * *
SUBQ      Subtract quick                SUBQ.s #<data>,<ea>      * * * * *
SUBX      Subtract with extend          SUBX.s Dy,Dx             * * * * *
                                        SUBX.s -(Ay),-(Ax)
SWAP      Swap register halves          SWAP Dn                  - * * 0 0
TAS       Test and set operand          TAS <ea>                 - * * 0 0
TRAP      Trap                          TRAP #<vector>           - - - - -
TRAPV     Trap on overflow              TRAPV                    - - - - -
TST       Test an operand               TST.s <ea>               - * * 0 0
UNLK      Unlink                        UNLK An                  - - - - -

Conditional Branches
(-> Bcc instruction in the CPU32 User Manual)

The Bcc instructions allow selection of a control path in a program based on conditions.
IF (condition is true)
THEN (branch to new sequence)
ELSE (execute next instruction)
The new sequence of instructions may be at higher or lower memory addresses relative to the branch instruction. The displacement is added to the PC as a signed integer.
[Figure: a loop formed by a pair of instructions. SUB generates the condition code, BNE branches on the condition code. If CC is true (1), the branch is taken to the branch target at PC + d, whose label lies before the branch for a backward branch (loop) and after it for a forward branch. If CC is false (0), execution continues with the branch successor (PC++).]
cc   Condition             Code   Test
CC   carry clear           0100   ¬C
CS   carry set             0101   C
EQ   equal (to zero)       0111   Z
GE   greater or equal      1100   N·V + ¬N·¬V
GT   greater than          1110   N·V·¬Z + ¬N·¬V·¬Z
HI   high                  0010   ¬C·¬Z
LE   less or equal         1111   Z + N·¬V + ¬N·V
LS   low or same           0011   C + Z
LT   less than             1101   N·¬V + ¬N·V
MI   minus                 1011   N
NE   not equal (to zero)   0110   ¬Z
PL   plus                  1010   ¬N
VC   overflow clear        1000   ¬V
VS   overflow set          1001   V
(· = AND, + = OR, ¬ = NOT)
The branch condition table presents the abbreviations of the conditions and their coding.
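The branch conditions are Boolean functions of the flags N, Z, V and C. The sketch below writes them out and adds a helper that derives the flags of a subtraction, so that the signed and unsigned conditions can be tried on concrete values. The names `CONDITIONS`, `branch_taken` and `flags_after_sub` are assumptions of this illustration; N·V + ¬N·¬V is written as `n == v`, and N·¬V + ¬N·V as `n != v`.

```python
# Branch conditions as Boolean functions of the flags N, Z, V, C.
CONDITIONS = {
    "CC": lambda n, z, v, c: not c,
    "CS": lambda n, z, v, c: c,
    "EQ": lambda n, z, v, c: z,
    "GE": lambda n, z, v, c: n == v,
    "GT": lambda n, z, v, c: n == v and not z,
    "HI": lambda n, z, v, c: not c and not z,
    "LE": lambda n, z, v, c: z or n != v,
    "LS": lambda n, z, v, c: c or z,
    "LT": lambda n, z, v, c: n != v,
    "MI": lambda n, z, v, c: n,
    "NE": lambda n, z, v, c: not z,
    "PL": lambda n, z, v, c: not n,
    "VC": lambda n, z, v, c: not v,
    "VS": lambda n, z, v, c: v,
}

def branch_taken(cc, n, z, v, c):
    """Evaluate condition `cc` for the given flag values."""
    return bool(CONDITIONS[cc](bool(n), bool(z), bool(v), bool(c)))

def flags_after_sub(dst, src, bits=16):
    """Flags (N, Z, V, C) of dst - src for `bits`-wide two's-complement operands."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dst, src = dst & mask, src & mask
    result = (dst - src) & mask
    n = bool(result & sign)
    z = result == 0
    # Overflow: operands differ in sign and the result's sign differs from dst's.
    v = bool((dst ^ src) & (dst ^ result) & sign)
    c = src > dst                      # borrow out of the most significant bit
    return n, z, v, c

flags = flags_after_sub(3, 5)          # compare 3 with 5
print(branch_taken("LT", *flags))      # True: 3 < 5 (signed)
print(branch_taken("GE", *flags))      # False
print(branch_taken("CS", *flags))      # True: borrow, 3 < 5 (unsigned)
```

The conditions come in complementary pairs (GE/LT, GT/LE, HI/LS), which is visible in the codes of the table: each pair differs only in the lowest bit.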
(-> DBcc instruction in the CPU32 User Manual)

Don't branch on condition!
REPEAT
(body of loop)
UNTIL (condition is true)
The DBcc instruction can cause a loop to be terminated when either the specified condition CC is true or when the count held in Dn reaches -1. Each time the instruction is executed with the condition false, the value in Dn is decremented by 1.
IF (CC == true)
THEN PC++
ELSE {
Dn--
IF (Dn == -1)
THEN PC++
ELSE PC <- PC + d
}
[Figure: control flow of a DBcc loop. A SUB instruction generates the condition code and DBcc tests it. If CC is true, or if CC is false and Dn reaches -1 after the decrement, execution continues with the successor instruction (PC++); if CC is false and Dn != -1, control branches back to the loop label at PC + d.]
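The DBcc rule can be traced with a small Python sketch. It is a simplified model under stated assumptions: the counter is treated as a 16-bit value (so -1 appears as 0xFFFF), the addresses `LOOP` and `AFTER` are made up, and `dbcc_step` and `run_loop` are hypothetical helper names.

```python
def dbcc_step(cc_true: bool, dn: int, successor: int, target: int):
    """One execution of DBcc; returns (new_dn, new_pc)."""
    if cc_true:
        return dn, successor          # condition true: leave the loop
    dn = (dn - 1) & 0xFFFF            # Dn-- on a 16-bit counter
    if dn == 0xFFFF:                  # counter reached -1: leave the loop
        return dn, successor
    return dn, target                 # branch back to the loop label


def run_loop(count: int, condition=lambda iteration: False):
    """Run a REPEAT ... UNTIL loop closed by DBcc.

    Returns (number of body executions, final counter value)."""
    LOOP, AFTER = 0x1000, 0x1010      # hypothetical addresses
    dn, pc, iterations = count & 0xFFFF, LOOP, 0
    while pc == LOOP:
        iterations += 1               # body of loop
        dn, pc = dbcc_step(condition(iterations), dn, AFTER, LOOP)
    return iterations, dn
```

With a condition that never becomes true, a start value of n in Dn yields n + 1 executions of the loop body, which is why loop counters are usually initialized with the iteration count minus one.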
Subroutines
A subroutine is a sequence of instructions that is treated as a separate program module within a larger program. The subroutine can be "called", that is executed, one or more times as the program executes. Generally, the subroutines associated with a program accomplish specific tasks, each of which represents a simpler procedure than that of the entire program. When the subroutine is called during program execution, its instructions are executed, and control is then returned to the instruction that follows the call to the subroutine.
The instructions BSR and JSR cause a transfer of control to the beginning address of a subroutine. In the Branch to Subroutine statement
BSR <label>
the <label> operand causes the assembler to calculate the displacement between the BSR instruction and the instruction identified by <label>.
The instructions RTR and RTS end the subroutine and return to the instruction following the "call".
Instruction             Syntax       Operation                                   Comments
Branch to subroutine    BSR <disp>   1. (SP) <- (SP) - 4; ((SP)) <- (PC)         <disp> is an 8-bit or 16-bit
                                     2. (PC) <- (PC) + <disp>                    signed integer
Jump to subroutine      JSR <EA>     1. (SP) <- (SP) - 4; ((SP)) <- (PC)         <EA> is a control
                                     2. (PC) <- (EA)                             addressing type
Return and restore      RTR          1. (CCR) <- ((SP))[7:0]; (SP) <- (SP) + 2   (CCR) = (SR)[7:0]
condition codes                      2. (PC) <- ((SP)); (SP) <- (SP) + 4
Return from subroutine  RTS          (PC) <- ((SP)); (SP) <- (SP) + 4
[Figure: subroutine call and return. BSR saves PC++ (the address of the instruction after the call) and transfers control to the subroutine; the return instruction restores PC++, so that execution resumes after the BSR.]
Stack
A stack is an important aid for handling subroutines and interrupts. A stack can either be implemented directly in hardware on the processor chip or be mapped into main memory with the help of a stack pointer. The first method is faster, but it requires more chip area than a single register. A further advantage of the stack-pointer method is that the depth of the stack can be enlarged as needed by adding main memory.
A stack works on the LIFO principle (last in, first out). Only two operations are permitted: PUSH and POP. PUSH places a data word on the stack, and POP retrieves it again. During the main-memory access the stack pointer is used as the address (pointer). In addition, the control unit modifies the stack pointer so that accesses follow the LIFO principle.
There are two ways to modify the stack pointer:
1. before the PUSH operation and after the POP operation
2. after the PUSH operation and before the POP operation.
[Figure: two stack layouts in memory (addresses 000000 to fffffe). Left, stack growing into lower memory: the bottom lies at the base address and (SP) points to the top element, with free memory at the lower addresses. Right, stack growing into higher memory: the bottom lies at the base address and (SP) points to the first free location beyond the top element.]
Stack growing into lower memory        Stack growing into higher memory
PUSH (value to stack):                 PUSH (value to stack):
  (SP) <- (SP) - k                       ((SP)) <- Operand
  then ((SP)) <- Operand                 then (SP) <- (SP) + k
POP (value from stack):                POP (value from stack):
  Operand <- ((SP))                      (SP) <- (SP) - k
  then (SP) <- (SP) + k                  then Operand <- ((SP))
Note that memory diagrams may also be drawn the other way round, with the lower and higher addresses exchanged.
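The two stack-pointer conventions can be written out as a minimal Python sketch. The class name `Stack` and the dictionary used as memory are assumptions made for illustration; `k` is the operand size in bytes.

```python
class Stack:
    """Stack in main memory, addressed through a stack pointer (SP)."""

    def __init__(self, base: int, k: int = 2, grows_down: bool = True):
        self.sp = base                 # stack pointer starts at the base address
        self.k = k                     # operand size in bytes
        self.grows_down = grows_down
        self.mem = {}                  # address -> operand (simplified memory)

    def push(self, operand: int) -> None:
        if self.grows_down:
            self.sp -= self.k              # (SP) <- (SP) - k
            self.mem[self.sp] = operand    # then ((SP)) <- Operand
        else:
            self.mem[self.sp] = operand    # ((SP)) <- Operand
            self.sp += self.k              # then (SP) <- (SP) + k

    def pop(self) -> int:
        if self.grows_down:
            operand = self.mem[self.sp]    # Operand <- ((SP))
            self.sp += self.k              # then (SP) <- (SP) + k
        else:
            self.sp -= self.k              # (SP) <- (SP) - k
            operand = self.mem[self.sp]    # then Operand <- ((SP))
        return operand
```

For a stack growing into lower memory, pushing the word 345C with SP = 1000 (hex) leaves SP = 0FFE, the behaviour of MOVE.W D0,-(SP).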
Consider the case in which the stack grows toward lower addresses. The PUSH operation then has the following effect:
[Figure: stack operation PUSH for a stack growing to lower addresses. Before the PUSH, (SP) = 1000 points to the top element 12AE. After MOVE.W D0,-(SP) with (D0) = 345C, (SP) = 0ffe points to the new top element 345C, and 12AE lies one word above it.]
The organization of the stack is processor-specific. Usually it starts at the end of main memory and "grows" toward lower addresses. In this case the program starts at the beginning of main memory, and the data area follows the program. The stack must never grow so large that it overwrites the data area or even the program area (stack overflow). If the stack overflows because of a programming error (too many subroutine calls without the corresponding returns from subroutine), the operating system must abort the program concerned.
[Figure: stack placement in memory. From the lowest address 000000 upward: program (.text), data (.data), free memory, and the stack (.stack), whose base address lies near the top of memory (fffffe). (SP) points to the top of the stack, which grows toward lower addresses.]
Example program: vector addition
        ORG     $0
        LEA     $4000,SP        ; init stack pointer
        BRA     START
VECA    DC.W    1,2,3,4,5,6     ; first vector (6 words)
VECB    DC.W    1,2,3,4,5,6     ; second vector (6 words)
LENGTH  DC.W    6               ; length of one vector
RVEC    DS.W    6               ; reserve memory for result
        ORG     $2000           ; start at location 2000 hex
START   CLR.L   D0              ; clear D0
        MOVE.W  (LENGTH),D0     ; load LENGTH value
        LEA     VECA,A0         ; load base address of first vector
        LEA     VECB,A1         ; load base address of second vector
        LEA     RVEC,A2         ; load base address of result vector
        BSR     SR_ADD          ; branch to SR
        BREAK                   ; end of program
SR_ADD  SUB.W   #1,D0           ; decrement counter
        BMI     EXIT            ; if negative, exit SR
        MOVE.W  (A0)+,D1        ; load first operand to D1
        MOVE.W  (A1)+,D2        ; load second operand to D2
        ADD.W   D1,D2           ; add operands
        MOVE.W  D2,(A2)+        ; store result
        BRA     SR_ADD          ; next loop
EXIT    RTS                     ; return from SR
Argument Passing
The information needed by the subroutine is defined in terms of parameters, which allow the subroutine to handle general cases rather than operate on specific values. Each subroutine call allows different values to be supplied as input parameters and output parameters to be passed back as results.
There are different methods of transferring the parameters, depending on the memory used to supply the values:
• register
• stack
• parameter areas
• in-line
• Register transfer
When only a small number of arguments are to be transferred, they can be passed directly between the main program and the subroutine in the processor registers. Data structures such as arrays are passed by the address that points to the start of the data structure.
Note the distinction between two calling mechanisms:
- call-by-value
- call-by-reference
Call-by-value passes a copy to the subroutine; the copy can be altered, but this has no effect outside the subroutine.
Call-by-reference passes only a pointer to the data or the data structure, thus giving the subroutine complete access to the data structure. The data values may be changed by the subroutine.
Register transfer is restricted to a small number of arguments, which have to fit into the registers of the processor. Its advantage is a very fast call, because no main memory access is required for saving and loading the parameters.
• Stack transfer
A stack can be used to pass arguments by having the calling routine push values or addresses on the stack before the call. Popping the arguments, or addressing them relative to SP, gives the subroutine access to the values or addresses. For programs running in user mode the active stack pointer is the USP.
The following example shows the passing of input parameters to a subroutine and the addressing of the values by the subroutine.
        MOVE.W  VAL1,-(SP)      ; PUSH FIRST VALUE TO STACK
        MOVE.W  VAL2,-(SP)      ; PUSH SECOND VALUE TO STACK
                                ; STACK GROWS TOWARD LOWER MEMORY
        BSR     SUBR
        ...
SUBR    MOVE.W  6(SP),D1        ; COPY VAL1 TO D1 (ABOVE RETURN ADDRESS AND VAL2)
        MOVE.W  4(SP),D2        ; COPY VAL2 TO D2 (ABOVE 4-BYTE RETURN ADDRESS)
        ADD.W   D1,D2
        ...
• Parameter areas
When large numbers of parameters are to be passed, a parameter area in memory can be used. The area contains values and/or addresses that are accessed by the subroutine. The same area could be used by several subroutines requiring different parameters as long as the area is large enough to hold the maximum number of arguments.
• In-line coding
This method defines argument values which are constant and will not change after assembly. These values can be defined by DC directives following the call.
        JSR     SUBR
        DC.W    1               ; DEFINE CONSTANT IN CODE AREA
        ...
SUBR    MOVEA.L (SP),A0         ; GET PC INTO A0
        MOVE.W  (A0)+,D1        ; GET ARGUMENT FROM CODE AREA
        MOVE.L  A0,(SP)         ; RETURN ADDRESS NOW SKIPS THE CONSTANT
Stack Frames
One of the principal issues in the design of subroutines involves the concept of transparency.
Simply stated, when a subroutine finishes executing, it should have no "visible" effect ex-
cept as defined by its linkage to the calling program. For example, a subroutine should not
change the values in any registers, unless a register is used to return a result. In some pro-
grams this is accomplished by pushing the contents of the registers used by the subroutine
on the stack upon entry to the subroutine. The values are restored before returning to the cal-
ling program. The return address is automatically saved and restored by the JSR and RTS
instructions. The use of the system stack to save and restore the return address and the con-
tents of registers used within the subroutine assures that the details of the subroutine opera-
tion are transparent to the calling program. If a subroutine itself makes a subroutine call, the
use of the stack for temporary storage of register contents by each subroutine and for each
return address allows such nesting of subroutine calls without difficulty. This concept of
using the stack to store data temporarily during subroutine execution can be extended by de-
fining a stack frame.
The stack frame is a block of memory in the stack that is used for return addresses, input parameters, output parameters, and local variables. It is the area of the stack accessed by a subroutine during its execution. Local variables are those values used during the subroutine execution that are not transferred back to the calling routine. A loop counter, for example, which changes as the subroutine performs each iteration might be defined as a local variable. With the stack frame technique, each call to the subroutine gets a new set of parameters, local variables, and return address. If the subroutine is called again before an earlier call has completely finished, the values in the earlier stack frame are not destroyed.
Subroutine Usage and Argument Passing
;; Program operation creating a stack frame
;; ________________________________________
        ORG     $1000           ; SET ORIGIN
        LEA     $6000,SP        ; INITIALISATION OF STACK
;; Calling Program
N       EQU     8               ; 8 BYTES OUTPUT AREA
M       EQU     8               ; 8 BYTES LOCAL AREA
        ADD.L   #-N,SP          ; OUTPUT AREA OF STACK
        MOVE.L  ARG,-(SP)       ; INPUT ARGUMENT
        PEA     X               ; INPUT ADDRESS
        JSR     SUBR            ; JUMP SUBROUTINE
        ADD.L   #8,SP           ; SKIP OVER INPUTS ON STACK
        MOVE.L  (SP)+,D1        ; READ OUTPUTS
        MOVE.L  (SP)+,D2
        BREAK                   ;
ARG     DC.L    $01234567       ; ARGUMENT TO PASS
X       DS.B    200             ; TABLE WHOSE ADDRESS IS PASSED
;; Subroutine
SUBR    LINK    A1,#-M          ; SAVE OLD FRAME POINTER, MOVE NEW FP TO A1
        MOVE.L  LOCAL1,-4(A1)   ; SAVE LOCAL VARIABLES ON STACK
        MOVE.L  LOCAL2,-8(A1)
        ADD.L   #1,-4(A1)       ; CHANGE LOCAL VARIABLE
        MOVEA.L 8(A1),A2        ; GET X
*                               ; put in some code here
        MOVE.L  OUTPUT1,16(A1)  ; PUSH AN OUTPUT
        UNLK    A1              ; RESTORE
        RTS                     ; RETURN
LOCAL1  DC.L    $98765432       ; LOCAL VARIABLES
LOCAL2  DC.L    $87654321
OUTPUT1 DC.L    'ABCD'          ; OUTPUT VALUE
        END
[Figure: stack contents during the program above. (a) before the first instruction; (b) after ADD.L: output area reserved; (c) after MOVE.L: ARG pushed; (d) after PEA: address X pushed; (e) after JSR: return PC pushed; (f) after LINK: old FP saved, FP = (A1), local area reserved; displacement addressing within the subroutine reaches the local area with -4(A1) and the output area with 16(A1); (g) after UNLK: local area and old FP removed, SP points to the return PC; (h) after RTS: return PC removed, SP points to X; (i) after ADD.L #8: inputs skipped, SP points to the output area.]
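The frame layout can be traced with a minimal Python model of the stack operations used by the program. This is a sketch under simplifying assumptions: memory holds one long word per dictionary entry, the class `CPU` and the function `trace_frame` are hypothetical names, and the address of X (`0x2000`) and the return address (`0x1010`) are made-up values.

```python
class CPU:
    """Tiny model of SP and one frame pointer register (A1)."""

    def __init__(self, sp: int):
        self.sp = sp
        self.a1 = 0
        self.mem = {}                 # address -> 32-bit long word

    def push(self, value: int) -> None:
        self.sp -= 4
        self.mem[self.sp] = value

    def pop(self) -> int:
        value = self.mem[self.sp]
        self.sp += 4
        return value

    def link(self, displacement: int) -> None:
        self.push(self.a1)            # save old frame pointer
        self.a1 = self.sp             # new frame pointer
        self.sp += displacement       # negative value reserves the local area

    def unlk(self) -> None:
        self.sp = self.a1             # drop the local area
        self.a1 = self.pop()          # restore old frame pointer


def trace_frame():
    cpu = CPU(sp=0x6000)
    cpu.sp -= 8                       # ADD.L #-N,SP: output area
    cpu.push(0x01234567)              # MOVE.L ARG,-(SP)
    cpu.push(0x2000)                  # PEA X (assumed address of X)
    cpu.push(0x1010)                  # JSR: push return PC (assumed value)
    cpu.link(-8)                      # LINK A1,#-M
    frame = {"x": cpu.mem[cpu.a1 + 8], "arg": cpu.mem[cpu.a1 + 12],
             "output_address": cpu.a1 + 16}
    cpu.mem[cpu.a1 + 16] = 0x41424344   # MOVE.L OUTPUT1,16(A1), 'ABCD'
    cpu.unlk()                        # UNLK A1
    return_pc = cpu.pop()             # RTS
    cpu.sp += 8                       # ADD.L #8,SP: skip inputs
    d1 = cpu.pop()                    # MOVE.L (SP)+,D1
    return frame, return_pc, d1, cpu.sp
```

The trace shows why the displacements 8, 12, and 16 relative to the frame pointer reach X, ARG, and the output area: the old frame pointer and the return address occupy the first eight bytes above (A1).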
Processor Operating Modes
[Figure: user mode and supervisor mode. An exception (reset, trap, interrupt, or error) switches the processor from user mode to supervisor mode, which has more privileges; the supervisor program sets the status back to user.]
User mode:
- privileged instructions cannot be executed; an attempt (e.g. MOVE to SR) causes an exception
- stack pointer: USP (A7)
- SR: the CCR is read/write, the system byte (T, S, I0, I1, I2) is read only; no modification of the trace bit, the supervisor bit, or the interrupt mask
Supervisor mode:
- all instructions can be executed
- stack pointer: SSP (A7'); the USP is also accessible (MOVE USP)
- SR: read/write by special operations on SR (MOVE to SR, ANDI to SR, ORI to SR, ...)
Special system instructions: RESET, STOP, RTE
Exception Processing
• Transfer of control from the user program to the supervisor program
- cause of exception
  • trap instructions and unusual conditions during instruction execution:
    software exceptions -> synchronous
  • external events:
    interrupts -> asynchronous
Exception Processing Sequence
An instruction of the user program causes the exception (trap). The processor then switches to the supervisor program:
1. Save SR internally
2. Set S=1, T=0
3. Determine vector no. (cause of TRAP)
4. Push PC++ -> SSP
5. Push SR -> SSP
6. PC <- (Vector base + Vector): jump to handler
The TRAP handler saves the user state, services the exception, and restores the user state. RTE restores SR and PC++ and returns to the user program.
TRAPs
Program TRAPs
Cause                        Vectors      Use                                             Remark
TRAP instruction             $080-$0BC    instruction to call a service of the            normal execution
                                          operating system
Unimplemented instructions   $028-$02C    opcodes 1010 and 1111 (FP emulation)            normal execution
CHK instruction              $018         range check                                     wrong array reference
DIV_ instruction             $014         division by zero                                abort program
TRAPV instruction            $01C         overflow detection                              executed when V=1
Error TRAPs
Privilege violation          $020         tried to execute a privileged instruction
Illegal instruction, address error, bus error
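The vector addresses in the table follow from the vector number: each vector table entry is four bytes long, and the sixteen TRAP #n instructions use vector numbers 32 to 47. A short Python sketch of this arithmetic; the function names are hypothetical, and a vector base of 0 is assumed.

```python
def vector_address(vector_number: int, vector_base: int = 0) -> int:
    """Address of a vector table entry (4 bytes per entry)."""
    return vector_base + 4 * vector_number


def trap_vector_number(n: int) -> int:
    """Vector number used by the instruction TRAP #n (n = 0..15)."""
    if not 0 <= n <= 15:
        raise ValueError("TRAP vector must be in the range 0..15")
    return 32 + n
```

For example, TRAP #0 fetches its handler address from $080 and TRAP #15 from $0BC, which gives the range $080-$0BC listed for the TRAP instruction.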
Interrupts
Interrupts are externally generated requests for exception processing. They allow external devices to interrupt the processor at any time during program execution (asynchronously). The control flow of the processor is switched to the interrupt-handling routine, named interrupt service routine (ISR). There are mechanisms to disable the immediate response to the interrupt signal (masking and/or priority).
Interrupt Types
• non-masked interrupt requests (IRQs)
  highest priority; used for immediate error handling
• masked IRQs
  mask bit of the CPU; disabling of interrupts within critical program segments
• prioritized IRQs
  multiple levels; introduction of priorities in order to weight the devices according to their importance for service; can be implemented in hardware or software.
Interrupt Processing Sequence
An event in a device raises an IRQ while the processor executes the user program. The processor then switches to the supervisor program:
1. Save SR internally
2. Set S=1, T=0
3. Determine vector no. (which device)
4. Push PC++ -> SSP
5. Push SR -> SSP
6. PC <- (Vector base + Vector): jump to ISR
The ISR saves the user state, services the device, and restores the user state. RTE restores SR and PC++ and returns to the user program.
Applications for Interrupts
• Input/output and communication: service for devices
  - program controlled (busy waiting)
  - interrupt controlled: fast reaction
• Timing control
  - timers, watchdogs
  - multiprogramming systems
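Hardware Structure of IRQ Signaling
[Figure: a device with status register (SR), data register (DATA), and control register (CR) signals an event as an IRQ. Four request lines, IRQ0 to IRQ3 (4 levels), pass through a mask register (MaskR) and the interrupt priority logic, which forwards the resulting IRQ to the instruction processor.]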
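The combination of mask register and priority logic can be sketched in Python. The sketch rests on assumptions that the figure does not state: a mask bit of 1 enables the corresponding request line, and a higher line number means a higher priority. The function name `pending_interrupt` is hypothetical.

```python
from typing import Optional


def pending_interrupt(irq_lines: int, mask: int) -> Optional[int]:
    """Select the request to forward to the processor.

    irq_lines: bit i is 1 if IRQi is active (4 lines, IRQ0..IRQ3).
    mask:      bit i is 1 if IRQi is enabled (assumed polarity).
    Returns the number of the highest-priority enabled request, or None.
    """
    enabled = irq_lines & mask & 0xF
    if enabled == 0:
        return None
    return enabled.bit_length() - 1   # index of the highest set bit
```

Masking a line does not clear the request; the device keeps signaling until the mask allows the request through and the ISR has serviced it.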
Memory Management
1. Protection => access rights
One way to organize the memory:
- R/W memory
- ROM
[Figure: the CPU accesses a ROM (instructions/data, read only) and a RAM (read/write).]
For multiprocessing or multi-user systems, this way of organizing the memory is not sufficient.
2. Extension of the limited physical main memory
Common architectures use 32-bit addresses. The virtual address space therefore has a size of 2^32.
[Figure: translation of virtual addresses to physical addresses. The virtual address of a process (pid) is compared with an upper bound; if it exceeds the bound, an access trap is raised, otherwise the base of the segment is added to form the physical address. Two processes (pid = 1 and pid = 2), each with its own .text and .data segments, are placed at different locations in physical memory; the unused gaps between the segments illustrate memory fragmentation.]
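The bound check and base addition of the figure can be expressed as a small Python sketch. The names `AccessTrap` and `translate` are hypothetical, and the segment values in the usage note are invented for illustration; the sketch assumes that `limit` is the size of the segment, so that valid virtual addresses lie in the range 0 to limit - 1.

```python
class AccessTrap(Exception):
    """Raised when a virtual address lies outside the segment."""


def translate(virtual_address: int, base: int, limit: int) -> int:
    """Map a virtual address to a physical address with base and bound."""
    if virtual_address < 0 or virtual_address >= limit:
        raise AccessTrap(f"address {virtual_address:#x} outside segment")
    return base + virtual_address
```

For a segment loaded at the assumed base 0x9600 with a size of 0x400, the virtual address 0x10 maps to the physical address 0x9610, while the virtual address 0x400 raises the access trap. Because each process has its own base, two processes can use the same virtual addresses without interfering.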
Virtual Memory / Paging
The logical and the physical address space are divided into pages of fixed size, usually 4 KByte. Logical pages are mapped onto physical page frames in a page table; the logical address space is generally much larger than the physically available memory. Only some of the pages are actually in main memory; all others are swapped out to secondary storage (disk).
- programs can be larger than main memory
- programs can be loaded at arbitrary logical addresses, independent of the layout of physical memory
- simple management in hardware thanks to the fixed page size
- access rights (read/write, user/supervisor) can be defined for every page and checked on each access
- virtual memory creates the illusion of an inexpensive, large, and sufficiently fast main memory (similar to a cache)
For every entry the page table records whether the page is present in main memory (P bit / present). Pages that have been swapped out must be loaded into main memory when they are referenced; if necessary, another page is evicted. Evicted pages that have been modified (M bit / modify) must be written back to secondary storage. In addition, a further bit is introduced that is set on every access to the page (R bit / referenced).
Replacement strategies:
- not recently used (NRU)
  the R and M bits are used to form four classes of pages
  0: not referenced, not modified
  1: not referenced, modified
  2: referenced, not modified
  3: referenced, modified
  an arbitrary page from the lowest non-empty class is removed
- FIFO
  the oldest page is removed (possibly the most frequently used one)
- Second chance / clock
  like FIFO, but if the oldest entry has been used, its R bit is cleared first and the next page is examined; only when all pages have been tested without success is the oldest entry actually removed
- least recently used (LRU)
  the page that has not been used for the longest time is removed; this requires an aging mechanism
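Two of the replacement strategies, NRU and second chance, can be sketched in a few lines of Python. The data structure `Page` and the function names are assumptions made for this illustration; the FIFO order is modeled by a `deque` whose leftmost element is the oldest page.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    referenced: bool = False   # R bit
    modified: bool = False     # M bit


def nru_class(page: Page) -> int:
    """Class 0..3 formed from the R and M bits."""
    return 2 * int(page.referenced) + int(page.modified)


def nru_victim(pages):
    """Pick a page from the lowest non-empty class (here: the first one found)."""
    return min(pages, key=nru_class)


def second_chance_victim(queue: deque) -> Page:
    """Remove and return the victim; referenced pages get a second chance."""
    while True:
        page = queue.popleft()            # oldest page
        if page.referenced:
            page.referenced = False       # clear R bit
            queue.append(page)            # treat it like a newly loaded page
        else:
            return page
```

If every page in the queue has its R bit set, one full pass clears all R bits and the originally oldest page is evicted, so the loop always terminates for a non-empty queue.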
With modern 32-bit processors and a page size of, for example, 4 KByte, the page table becomes very large, e.g. 4 MByte with 32-bit page table entries. Since usually not all entries of a table are actually used, a multi-level translation is introduced. For example, the uppermost address bits reference a table, while the middle bits select the entry in that table.
- the individual tables become smaller, and only a few tables are needed at the second level
- the second-level tables can themselves be swapped out
Because of their size, page tables can only be kept in main memory. To speed up the address translation, especially with multi-level tables, a cache is used. This translation lookaside buffer (TLB) contains the most recent address translations. It is usually fully associative and holds, for example, 64 entries. More recently, a set-associative L2 cache has additionally been placed in front of it.
Page tables are cacheable in principle, but because their entries are used relatively rarely (compared with normal data), they are quickly evicted from the general cache.
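The page table size and a two-level address split can be worked out with a short Python sketch. The 10/10/12 bit split is an assumption chosen for illustration (it is one common choice for 32-bit addresses with 4 KByte pages), and the function names are hypothetical.

```python
def page_table_bytes(address_bits: int = 32, page_bits: int = 12,
                     entry_bytes: int = 4) -> int:
    """Size of a single-level page table with one entry per logical page."""
    return (1 << (address_bits - page_bits)) * entry_bytes


def split_two_level(virtual_address: int, top_bits: int = 10,
                    mid_bits: int = 10, offset_bits: int = 12):
    """Split an address into (table index, entry index, page offset)."""
    offset = virtual_address & ((1 << offset_bits) - 1)
    mid = (virtual_address >> offset_bits) & ((1 << mid_bits) - 1)
    top = (virtual_address >> (offset_bits + mid_bits)) & ((1 << top_bits) - 1)
    return top, mid, offset
```

With 32-bit addresses and 4 KByte pages there are 2^20 logical pages, so 32-bit entries give the 4 MByte quoted in the text. With the assumed split, each second-level table has 2^10 entries of 4 bytes and therefore fits exactly into one 4 KByte page.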
Memory Hierarchy
Definition: A memory hierarchy is the result of an optimization process with respect to technological and economic constraints. The implementation of the memory hierarchy consists of multiple levels, which differ in size and speed. It is used for storing the working set of the ‘process in execution’ as near as possible to the execution unit.
The memory hierarchy consists of the following levels:
- registers (on chip)
- primary caches (on chip)
- local memory (on chip)
- secondary caches (off chip)
- main memory (off chip)
Toward the top of this list the levels become faster and more expensive; toward the bottom they become denser and larger.
The mechanisms for the data movement between levels may be explicit (for registers, by means of load instructions) or implicit (for caches, by means of memory addressing).
Level             Location                                   Access time
Register files    CPU chip                                   2-5 ns
1st-level cache   CPU chip                                   5-10 ns
2nd-level cache   beyond the external processor interface    10-30 ns
Main memory       beyond the external processor interface    60-200 ns
Disk storage                                                 2-10 ms
Tapes
Registers are the fastest storage elements within a processor. Hence, they are used to keep values locally on-chip, to supply operands to the execution unit(s), and to store the results for further processing. The read-write cycle time of registers must be equal to the cycle time of the execution unit and the rest of the processor in order to allow pipelining of these units. Data is moved from and into the registers by explicit software control.
Registers can be grouped in a wide variety of ways:
- evaluation stack
- register file
- multiple register windows
- register banks
Example: Overlapping Register Windows (SPARC Processor)
The desire to optimize the frequent function calls and returns led to the structure of overlapping register windows. Saving local variables to the procedure stack and restoring them is avoided if the called procedure can get an empty set of registers at the procedure entry point. Parameters can be passed to the procedure by overlapping the current window with the new one. For global variables, a number of global registers can be reserved and accessed from all procedure levels. This structure optimizes the procedure call mechanism of high-level languages, and an optimizing compiler can allocate many variables and parameters directly in registers.
[Figure: overlapping register windows. Each procedure sees 32 registers: R0-R7 are the 8 global registers (global variables) shared by all procedures, and R8-R31 are the 24 window registers (R8-R15 parameters, R16-R23 local variables, R24-R31 parameters). In the register file, the window of procedure A occupies registers 112-135 (R8A = 112, R16A = 120, R24A = 128, R31A = 135). Adjacent windows overlap in eight registers: R8A-R15A are the same registers as R24B-R31B (overlap between A and B), and R8B-R15B are the same as R24C-R31C (overlap between B and C).]
Due to the fixed size of the instruction word, the register select fields have the same width (5 bits) as in the register file structure. The large register file requires more address bits, and therefore the addressing is done relative to the window pointer.
(more details in -> Rechnerarchitektur 1)
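Addressing relative to the window pointer can be sketched in Python for the layout of the figure, in which the window of procedure A starts at register 112 and each further call level moves the window down by 16 registers. This is a simplified linear model: the function name `physical_register` and the window numbering (0 for A, 1 for B, 2 for C) are assumptions, and the circular wrap-around of a real register file is not modeled.

```python
GLOBALS = 8            # R0..R7 are shared by all windows
WINDOW_STEP = 16       # a new window starts 16 registers below the previous one
FIRST_WINDOW_BASE = 112  # physical register of R8 in window A (from the figure)


def physical_register(r: int, window: int) -> int:
    """Map the 5-bit register number r of a window to a register file index."""
    if not 0 <= r <= 31:
        raise ValueError("register number must fit into 5 bits")
    if r < GLOBALS:
        return r                                  # global registers
    base = FIRST_WINDOW_BASE - WINDOW_STEP * window
    if window < 0 or base < GLOBALS:
        raise ValueError("window outside the modeled register file")
    return base + (r - GLOBALS)
```

Registers R8-R15 of the caller and R24-R31 of the callee map to the same indices, which is how parameters are handed over without copying.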

Caches are the next level of the memory hierarchy. They are small high-speed memories employed to hold small blocks of main memory that are currently in use.
The principle of locality, which justifies the use of cache memories, has two aspects:
- locality in space or spatial locality
- locality in time or temporal locality
Most programs exhibit locality in space: subsequent instructions or data objects are referenced from addresses near the current reference. Programs also exhibit locality in time: objects located in a small region will be referenced again within a short period. Instructions are stored in sequence, and data objects are normally stored in the order of their use. The following figure is an idealized space/time diagram of address references, representing the actual working set w in the time interval Δτ.
[Figure: address space plotted over time, with clusters of references to instructions and data; the working set w(T, T + Δτ) comprises the addresses referenced in the interval from T to T + Δτ.]
Caches are transparent to the software. Usually, no explicit control of cache entries is possible. Cache control allocates data in the cache automatically when a load instruction references the main memory.
Cache memory design aims to make the slow, large main memory appear to the processor as a fast memory by optimizing the following aspects:
- maximizing the hit ratio
- minimizing the access time
- minimizing the miss penalty
- minimizing the overhead for maintaining cache consistency
The performance gain of a cache can be described by the following formula:
\[ G_{cache} = \frac{T_m}{(1-H)\,T_m + H\,T_c} = \frac{1}{(1-H) + H\,\frac{T_c}{T_m}} = \frac{1}{1 - H\left(1 - \frac{T_c}{T_m}\right)} \]
where
T_m = t_acc of main memory
T_c = t_acc of cache memory
H = hit ratio, in the range [0, ..., 1]; (1 - H) is the miss ratio
[Figure: gain G (1 to 5) plotted against the hit ratio H (0 to 1) for the example T_m / T_c = 5.]
The hit ratio of the cache (in %) is the ratio of accesses that find a valid entry (hit) to the total number of accesses. The miss ratio, the share of accesses that fail to find a valid entry (miss), is 100% minus the hit ratio.
The access time to a cache should be significantly shorter than that to the main memory. On-chip caches (L1, primary caches) normally need one clock tick to fetch an entry.
Access to off-chip caches (L2, secondary caches) is dependent on the chip-to-chip delay, the control signal protocol, and the access time of the external memory chips used.
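The gain can be evaluated numerically with a few lines of Python. The function name `cache_gain` is hypothetical; the example values use the ratio of main memory to cache access time of 5 to 1.

```python
def cache_gain(hit_ratio: float, t_main: float, t_cache: float) -> float:
    """Gain = T_m / ((1 - H) * T_m + H * T_c)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must lie in [0, 1]")
    return t_main / ((1.0 - hit_ratio) * t_main + hit_ratio * t_cache)


def gain_table(t_main: float = 5.0, t_cache: float = 1.0):
    """Gain for a few hit ratios, e.g. to reproduce the shape of the curve."""
    return [(h / 10, cache_gain(h / 10, t_main, t_cache)) for h in range(0, 11)]
```

The gain is 1 without any hits and reaches the full ratio of 5 only at a hit ratio of 1. The curve is steep near H = 1: at H = 0.9 the gain is about 3.6, so the last few percent of hit ratio contribute a large share of the speed-up.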
One of the most important features of the cache organization is the mapping principle. Three different strategies can be distinguished, but the range of caches, from directly mapped to set associative to fully associative, can be viewed as a continuum of levels of set associativity.
[Figure: cache mapping for a directly mapped cache (address path only). The address is divided, from the most significant bit down to bit 0, into a tag of m bits, an index of n bits, a word select of x bits, and a byte select of z bits. The main memory is viewed as 2^m pages, each the size of the cache with 2^n blocks. The index selects one of the 2^n cache entries, so the i-th block of every memory page maps to the i-th cache block. The tag stored with that entry is compared with the tag of the address; equality signals a hit.]
Directly Mapped Cache
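The address split and the tag comparison of a directly mapped cache can be modeled in Python. The sketch covers only the address path, as in the figure; the class name `DirectMappedCache` and the field widths used in the usage note are assumptions.

```python
class DirectMappedCache:
    """Address path of a directly mapped cache (no data storage)."""

    def __init__(self, index_bits: int, word_bits: int, byte_bits: int):
        self.n, self.x, self.z = index_bits, word_bits, byte_bits
        self.tags = [None] * (1 << index_bits)   # None marks an invalid entry

    def split(self, address: int):
        """Return (tag, index, word select, byte select)."""
        byte = address & ((1 << self.z) - 1)
        word = (address >> self.z) & ((1 << self.x) - 1)
        index = (address >> (self.z + self.x)) & ((1 << self.n) - 1)
        tag = address >> (self.z + self.x + self.n)
        return tag, index, word, byte

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss the entry is replaced."""
        tag, index, _, _ = self.split(address)
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag               # load the block, evict the old one
        return hit
```

With `DirectMappedCache(index_bits=4, word_bits=2, byte_bits=2)` the blocks are 16 bytes long. Addresses 0x1234 and 0x1238 fall into the same block, so the second access hits; address 0x2234 has the same index but a different tag, so it evicts that block. Such conflict misses are the price of the simple mapping.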
The main memory is the lowest level in the semiconductor memory hierarchy. Normally all data objects must be present in this memory for processing by the CPU. In the case of a demand-paging memory, they are fetched from the disk storage to the memory pages on demand before processing is started. The organization and performance of the main memory are extremely important for the system’s execution speed. Because of their cost-performance ratio, almost all main memory implementations use DRAMs. The simplest form of memory organization is the word-wide memory, matching the bus width of the external processor interface.
High-performance processors need more memory bandwidth than a simple one-word memory can provide. The access and cycle times of highly integrated dynamic RAMs are not keeping up with the clock speed of the CPUs. Therefore, special architectures and organizations must be used to speed up the main memory system.
Layer Model of the Interpretation Levels
Level                                    Translated or interpreted by
HLL   high-level programming language    HLL compiler
IML   intermediate language              compiler back end
ASL   assembly language                  assembler
CML   conventional machine language      CML interpreter
      (boundary between software and hardware)
MPL   microprogram language              microprogram interpreter
NPL   nanoprogram language               nanoprogram interpreter
HW control vector                        functional units of the processor
In today's high-performance processors the levels of microprogramming and nanoprogramming are usually omitted in order to make the interpretation and execution of instructions fast. It follows that the instructions should be simple, so that the HW control vector can be derived from the instruction by simple combinational logic functions. The emulation of complex instructions by a sequence of simple instructions is moved into the back end of the compiler.
