Beruflich Dokumente
Kultur Dokumente
Assemblers
Outline
Basic Assembler Functions
Machine-Dependent Assembler Features
Machine-Independent Assembler Features
Assembler Design Options
Implementation Examples
Basic Assembler
Functions
Assemblers
The fundamental functions that assemblers must
perform:
Translate mnemonic operation codes to their machine
language equivalents.
Assign machine addresses to symbolic labels used by
the programmer.
Assembler Directives
In addition to the mnemonic machine instructions, SIC and
SIC/XE include the following assembler directives:
Directives
START
END
BYTE
WORD
RESB
RESW
Description
Specify name and starting address for the program.
Indicate the end of the source program and (optionally) specify
the first executable instruction in the program.
Generate character or hexadecimal constant, occupying as
many bytes as needed to represent the constant.
Generate one-word integer constant.
Reserve the indicated number of bytes for a data area.
Reserve the indicated number of words for a data area.
Instruction
An assembly instruction is a statement that is executed at run time. It
usually consists of four parts:
Label (optional)
Instruction (required)
Operands (instruction specific)
Comment (optional)
Instruction
E.g.
ADDLP
Label
ADDR
Operands
Comment
The program in the next slide is a routine that reads records from an
input device (device cdoe F1) and copies them to an output device
(code 05). This program will be used throughout the whole chapter.
Subroutine RDREC: read a record into a buffer (buffer size = 4096B).
Subroutine WRREC: write the record from the buffer to the output device.
Copyright All Rights Reserved by Yuan-Hao Chang
Line number
Label
Instruction Operand
Comment
Start of the
program
Read 4096 bytes
in each loop
(A) : (ZERO)
A (EOF)
L (RETADR)
The number of
bytes stored in
the read buffer
BUFFER
X (ZERO)
A (ZERO)
RETADR (L)
JSUB RDREC
L (PC);
PC RDREC
RSUB
PC (L)
The reference to
RETADR is defined
later in the program
Line number
Instruction
Operand
Object
November
26,code
2010
Starting address
SIC
No object code
generated for
addresses 10332038.
This storage is
reserved by the
loader for use by
the program
during
execution.
10
11
Col. 1
H
Col. 2-7 Program name
Col. 8-13 Starting address of object program (hexadecimal)
Col. 14-19Length of object program in bytes (hexadecimal)
Text record: Contain the translated (i.e., machine code) instructions and
data of the program, together with an indication of the addresses where
these are to be loaded.
-
Col. 1
T
Col. 2-7 Starting address for object code in this record (hexadecimal)
Col. 8-9 Length of object code in this record in bytes (hexadecimal)
Col. 10-69Object code, represented in hexadecimal (2 columns per byte of
object code)
End record: Mark the end of the object program and specify the address in
the program to begin.
- Col. 1
- Col. 2-7
E
Address of first executable instruction in object program (hexadecimal)
To avoid confusion, the term column is used to refer to positions within object program records.
Copyright All Rights Reserved by Yuan-Hao Chang
30 bytes
12
13
Pass 2
The second pass assembles instructions and generates object
program.
- 1. Assemble instructions: Translating operation codes and looking up
addresses.
- 2. Generate data values defined by BYTE, WORD, etc.
- 3. Perform processing of assembler directives not done during Pass 1.
- 4. Write the object program and the assembly listing.
Copyright All Rights Reserved by Yuan-Hao Chang
14
15
16
17
18
comment
Handle the
LABEL field of
the current input
line
The instruction length of
SIC machine is 3 bytes.
Copyright All Rights Reserved by Yuan-Hao Chang
19
20
Reserve # of bytes
Declare a BYTE constant
21
Handle the
OPERAND of
an instruction.
There is no symbol
in the OPERAND
field
The current line is to
declare a BYTE or
WORD constant.
22
Machine-Dependent
Assembler Features
24
Line number
Label
Instruction
Operand
ObjectNovember
code 26, 2010
SIC/XE
25
26
Line number
Operand
Object
November
26,code
2010
Relocatable program
27
28
29
0006
CLOOP
+JSUB RDREC
4B101036
opcode = 48
n i x b p e
1 1 0 0 0 1
addr = 01036
30
0000
FIRST
STL
95
0030
RETADR
RESW 1
Line
Loc
Label
Instruction
RETADR
17202D
opcode n i x b p e
disp
14
1 1 0 0 1 0 02D=30-3
disp = 0006 (PC) = 0006 001A = -14 = FEC (2s complement in 12 bits)
15
0006
40
0017
CLOOP
+JSUB RDREC
4B101036
3F2FEC
CLOOP
n i x b p e
1 1 0 0 1 0
Copyright All Rights Reserved by Yuan-Hao Chang
31
0003
LDB
#LENGTH
BASE LENGTH
100
105
0033
0036
LENGTH
BUFFER
RESW 1
RESB 4096
160
104E
FIRST
STCH BUFFER, X
opcode = 10
n i x b p e
1 1 0 1 0 0
disp = 000
(33-33 = 00)
175
1056
EXIT
STX
opcode = 68
n i x b p e
0 1 0 0 1 0
disp = 02D
(33-06 = 2D)
69202D
57C003
opcode = 54
n i x b p e
1 1 1 1 0 0
disp = 003
(36-33 = 03)
LENGTH
Copyright All Rights134000
Reserved by Yuan-Hao Chang
32
0003
LDB
#LENGTH
BASE LENGTH
Suppose we choose PC
relative addressing before
choosing base relative
addressing.
PC
relative
69202D
20
000A
LDA
100
0033
LENGTH
RESW 1
175
1056
EXIT
STX
LENGTH
LENGTH
Base
relative
reg. B = 33
032026
134000
opcode = 10
n i x b p e
1 1 0 1 0 0
disp = 000
(33-33 = 00)
33
Immediate Addressing
The assembly of an instruction that specifies immediate addressing is
simpler because no memory reference is involved.
It is to convert the immediate operand to its internal representation and
insert it into the instruction.
Immediate addressing
PC relative addressing is not combined because 3 is a value.
55
0020
LDA
133
103C
+LDT
Format 4
The operand (4096) is too large to fit in
the 12-bit displacement field.
The value of a symbol is
the address assigned to it.
Therefore, #LENGTH = 33
12
0003
LDB
opcode = 00
n i x b p e
#3
010003
0 1 0 0 0 0
#4096 75101000 disp = 003
opcode = 68
n i x b p e
0 1 0 0 1 0
disp = 02D
(33-06 = 2D)
#LENGTH
opcode = 74
n i x b p e
0 1 0 0 0 1
addr = 01000
69202D
34
Indirect Addressing
70
002A
95
0030
J
RETADR
@RETADR
RESW 1
Indirect
addressing
3E2003
opcode = 3C
n i x b p e
1 0 0 0 1 0
disp = 003
(030-02D=003)
35
Program Relocation
We usually do not know exactly when jobs will be submitted,
exactly how long they will run.
In such a situation, the actual starting address of the program is not
known until load time.
SIC 55
101B
LDA
THREE
00102D
The operand only refers to
address 102D so that the
program cant be relocated to
other addresses.
opcode = 00
x = 0
addr = 102D
The assembler does not know the actual location where the program
will be loaded, but it can identify for the loader these parts of the
object program that need modification.
An object program that contains the information necessary to perform
this kind of modification is called a relocatable program.
Copyright All Rights Reserved by Yuan-Hao Chang
36
1036
No matter where the
program is loaded,
RDREC is always
1036 bytes past the
starting address of the
program.
1036
37
Program Relocation
Some values in the object program depend on the starting address of
the program.
In order to support program relocation, the values depending on the starting
address of the program need to be relocated when program is loaded.
The assembler produces a command for the loader to add the beginning
address of the program to the address field in the +JSUB instruction at
load time.
The command for the loader must be a part of the object program.
38
15
0006
CLOOP
+JSUB
RDREC
4B101036
+JSUB
+JSUB
WRREC 4B10105D
WRREC 4B10105D
Copyright All Rights Reserved by Yuan-Hao Chang
39
40
Machine-Independent
Assembler Features
42
43
235
240
245
255
106E
1070
1073
1076
TIXR
JLT
RSUB
END
*
=X05
Copyright
All
Rights
T
WLOOP
B850
3B2FEF
4F0000
FIRST
05 Chang
Reserved by Yuan-Hao
44
45
Literals
Literals are constants written in the operand of instructions.
This avoids having to define the constant elsewhere in the program
and make up a label for it.
In SIC/XE program, a literal is identified with the prefix =, followed
by a specification of the literal value, using the same notation.
45
001A
ENDFIL
LDA
002D
=CEOF
=CEOF 032010
454F46
literal
215
1062
WLOOP
TD
1076
=X05
=X05
E32011
05
opcode = 00
n i x b p e
1 1 0 0 1 0
disp = 010
(02D-01D=010)
opcode = 0E
n i x b p e
1 1 0 0 1 0
disp = 011
(1076-1065=011)
46
001A
ENDFIL
LDA
=CEOF
002D
=CEOF
032010
454F46
0020
LDA
#3
opcode = 00
n i x b p e
1 1 0 0 1 0
disp=2D-1D
010003
opcode = 00
n i x b p e
0 1 0 0 0 0
disp
= 003
Copyright All Rights Reserved
by Yuan-Hao
Chang
Literal Pool
All of the literal operands used in a program are gathered together into
one or more literal pools.
Normally, literals are placed into a pool at the end of the program.
The assembly listing of a program containing literals usually includes a
listing of this literal pool.
LTORG allows to place literals into a pool at some other location in the
object program.
When the assembler encounters an LTORG statement, it creates a literal
pool that contains all of the literal operands used since the previous LTORG.
This literal pool is placed in the object program at the location where the
LTORG directive was encountered.
E.g., If we had not used the LTORG statement on line 93, the literal
=EOF would be placed in the pool at the end of the program.
This literal pool would begin at address 1073. It is too far away from the
instruction referencing it to allow PC relative addressing.
The problem is the large amount of storage reserved for BUFFER.
Copyright All Rights Reserved by Yuan-Hao Chang
47
Duplicate Literals
A duplicate literal is a literal used in more than one place in
the program.
E.g., the literal =X05
0003
0020
LDB
LDA
=* (* equals to 3)
=* (* equals to 20)
48
49
Pass 2
- 1. The operand address for use in generating object code is obtained by
searching LITTAB for each literal operand encountered.
- 2. The data values specified by the literal in each literal pool are inserted at the
appropriate places in the object program.
- 3. If a literal value represents an address in the program (e.g., location counter
value), the assembler must also generate the appropriate Modification record.
Copyright All Rights Reserved by Yuan-Hao Chang
50
Symbol-Defining Statements
Symbol definition:
Label: The value of labels on instructions or data areas
is the address assigned to the statement on which it
appears.
EQU: The assembler directive EQU allows the
programmer to define symbols and specify their value.
(Similar to #define or MACRO in C language)
ORG: The assembler directive ORG indirectly assign
values to symbols.
51
EQU
Symbol EQU value
This statement defines the given symbol (i.e., enters it into SYMTAB)
and assigns to it the value specified.
The value may be given as a constant or as any expression.
One common use of EQU is to establish symbolic names that can be
used for improved readability in place of numeric value.
+LDT
#MAXLEN
+LDT #4096
MAXLEN
EQU 4096
Another common user of EQU is defining mnemonic names for registers.
A
EQU 0
B
EQU 1
L
EQU 2
Copyright All Rights Reserved by Yuan-Hao Chang
52
ORG
ORG value
When this statement is encountered during assembly of a program, the
assembler resets its location counter (LOCCTR) to the specified value.
- Since values of symbols used as labels are taken from LOCCTR, the ORG
statement will affect the values of all labels defined until the next ORG.
RESB
ORG
RESB
RESW
RESB
ORG
1100
STAB
6
1
2
STAB+1100
6 bytes
VALUE
FLAGS
3 bytes
2 bytes
53
RESW
EQU
1
ALPHA
BETA
EQU
ALPHA
ALPHA RESW 1
Allowed
Not allowed
ORG
RESB
RESB
RESB
ORG
RESB
ALPHA
1
1
1
1
Not allowed
Copyright All Rights Reserved by Yuan-Hao Chang
54
Expressions
Most assemblers allow the use of expressions whenever
such a single operand is permitted.
Each expression must be evaluated by the assembler to produce a
single operand address or value.
Arithmetic expressions usually formed by using the operators +, -, *,
and /.
The most common special term is the current value of the location
counter (often designated by *).
55
Expression (Cont.)
Expressions are classified as either absolute expressions
or relative expressions depending on the type of value
they produce.
An expression contains only absolute terms is an absolute
expression.
Absolute expressions may also contain relative terms:
- Relative terms occur in pairs. Bother of the terms are positive.
- A relative term or expression represents some value that may be written
as (S + r), where
S is the starting address of the program and
r is the value of the term or expression relative to the starting address.
56
Expression (Cont.)
107
1000
MAXLEN
EQU
BUFFEND-BUFFER
Symbol
Type
Value
RETADR
0030
BUFFER
0036
BUFEND
1036
MAXLEN
1000
57
58
59
60
61
62
Block name
(default)
CDATA
CBLKS
Block number
0
1
2
Address
0000
0066
0071
Length
0066
000B
1000
For code generation during Pass 2, the assembler needs to revise the
address for each symbol relative to the start of the object program (not
the start of an individual program block).
The assembler simply adds the location of the symbol, relative to the start of
its block.
Copyright All Rights Reserved by Yuan-Hao Chang
63
Loc/Block
Label
Instr
Operand
Object code
64
Loc/Block
Label
Instr
Operand
Object code
65
Loc/Block
Label
Instr
Operand
Object code
66
opcode = 00
n i x b p e
0 1 0 0 1 0
disp = 069-009
The value of the symbol MAXLEN (line 107) is shown without a block
number because MAXLEN is an absolute symbol (not relative to the
DEFAULT
start of any program block).
20
100
0006
0003
0
1
LENGTH
LDA
LENGTH
RESW 1
032060
SYMTAB shows the value of the operand (the symbol LENGTH) as relative
CDATA
location 0003 within program block 1 (CDATA).
The desired target address for this instruction is 0003 + 0066 = 0069.
The separation of the program into blocks has considerably reduced the
addressing problem.
The large buffer area is moved to the end of the object program, so that
- 1. Extended format instructions are no longer needed. (Lines 15, 35 ,65)
- 2. The base register is no longer necessary. (Lines 13 and 14 are deleted: LDB
and BASE instructions).
- 3. Literal placement and references are simplified. (We include a LTORG in
CDATA block to ensure literals are placed ahead of large data areas.)
67
CDATA = 66+06=6C
CDATA = 66+07=6D
Copyright All Rights Reserved by Yuan-Hao Chang
68
69
70
71
72
73
74
75
Control Sections
Control sections are handled separately by the assembler.
Symbols that are defined in one control section may not be used
directly by another control section. They must be identified as
external reference for the loader to handle.
76
77
78
79
15
0003
CLOOP
80
+JSUB RDREC
4B100000
160
0017
+STCH BUFFER, X
57900000
The operand (RDREC) is named in the EXTREF statement for the
control section, so this is an external reference.
opcode = 48
The assembler has no idea where the control section containing
RDREC will be loaded, so it cant assemble the address for this
instruction.
- Relative addressing is not possible during assembling.
- An extended format instruction must be used to
provide room for the actual address to be inserted.
190
107
0028
1000
MAXLEN
MAXLEN
External
reference.
WORD BUFEND-BUFFER
EQU BUFEND-BUFFER
000000
81
82
83
0003
CLOOP +JSUB
RDREC 4B100000
Byte 4 5 6 7
84
85
0028
MAXLEN WORD
BUFEND-BUFFER 000000
Restrictions
Both terms in each pair must be relative within the same
control section.
If two terms represent relative locations in the same control section,
their difference is an absolute value.
If two terms are in different control sections, their difference
has a value that is unpredictable.
For Example:
BUFEND-BUFFER
RDREC-COPY
86
One-Pass Assemblers
The main problem in trying to assemble a program in one
pass involves forward references.
The assembler does not know what address to insert in the
translated instruction.
It is too severe to require all such areas defined before their uses.
Forward reference to labels on instruction cannot be eliminated
easily due to the forward jump.
88
Load-and-Go Assembler
No program is written out, and no loader is needed.
It is useful in a system that is oriented toward program development and
testing.
E.g., University computing system for students use.
Because programs are re-assembled nearly every time they are run,
efficiency of the assembly process is an important consideration.
The handling of forward references becomes less difficult due to the inmemory process.
An instruction operand is a symbol that has not yet been defined, the
operand address is omitted when it is assembled.
- The address of the operand field of the instruction that refers to the undefined
symbol is added to a forward reference list associated with the symbol table
entry.
- When the definition for a symbol is encountered, the forward reference list for that
symbol is scanned, and the proper address is inserted into any instructions
previously generated.
89
90
91
2013
201C
92
The address of
the operand field
(2013) was
inserted a list.
201F
2030
93
One-Pass Assembler
One-pass assembler usually adopts absolute program.
It produces object programs as output and is usually adopted on systems
where external working-storage devices are not available or slow.
94
WRREC
Lines 10-40
RDREC
To replace the address at 2013
95
Multi-Pass Assemblers
Constraints in two-pass assemblers:
Any symbol used in the right-hand side (i.e., in the expression giving the
value of the new symbol) of assembly directives (e.g., EQU and ORG) be
defined previously in the source program.
E.g., ALPHA EQU
BETA
BETA EQU
DELTA
DELTA RESW 1
- BETA cannot be assigned a value in the first pass because DELTA has not yet
been defined.
96
Expression
HALFSZ depends on MAXLEN
Symbols depend
on MAXLEN.
Copyright All Rights Reserved by Yuan-Hao Chang
97
Two
unsolved
symbols
Expression
98
99
Evaluated
Updated to &1
Address of BUFFER
100
101
Implementation Examples
Selected Examples
We focus on some of the most interesting or
unusual features of each assembler.
We are also particularly interested in areas where
the assembler design differs from the basic
algorithm and data structures.
Selected examples:
Pentium(x86)
SPARC
PowerPC
Copyright All Rights Reserved by Yuan-Hao Chang
103
MASM Assembler
MASM assembler is for Pentium and other x86 systems.
Programmer of an x86 system views memory as a
collection segments.
An MASM assembler language program is written as a collection of
segments.
Each segment is defined as belonging to a particular class
(corresponding to its contents).
- Commonly used classes are CODE, DATA, CONST and STACK.
104
105
AX, DATASEG2
ES, AX
106
TARGET
- If the label TARGET occurs in the program before the JMP instruction, the
assembler can tell whether this is a near jump or a far jump.
- If this is a forward reference to TARGET, the assembler does not know how many
bytes to reserve for the instruction.
107
108
The object program from MASM may be in several different formats to allow
easy and efficient execution of the program in a variety of the operation
environment.
MASM can also produce an instruction timing listing that shows the number
of clock cycles required to execute each machine instruction. (This allows the
programmer to exercise a great deal of control in optimizing timing-critical
sections of code).
Copyright All Rights Reserved by Yuan-Hao Chang
109
Executable instructions
Initialized read/write data
Read-only data
Uninitialized data areas
Copyright All Rights Reserved by Yuan-Hao Chang
110
SPARC Assembler
The assembler maintains a separate location counter for each named
section.
Each time the assembler switches to a different section, it also switch to
the location counter associated with that section.
Sections are similar to the program blocks in SIC.
References between different sections are resolved by the linker, not by
the assembler.
111
112
%L0, 10
LOOP
%L2, %L3, %L4
regardless of whether or
not the conditional
branch is taken.
Copyright All Rights Reserved by Yuan-Hao Chang
113
LOOP: .
.
CMP
BLE
ADD
%L0, 10
LOOP
%L2, %L3, %L4
114
%L0, 10
LOOP
LOOP: .
.
CMP %L0, 10
BLE, A LOOP
Copyright
All Rights Reserved
Yuan-Hao%L4
Chang
ADD
%L2,by%L3,
115
116
AIX Assembler
AIX assembler is for PowerPC and other similar
systems designed by IBM.
The AIX includes support for various models of
PowerPC microprocessor.
The programmer can declare which architecture is
being used with the assembler directive .MACHINE.
When the object program is generated, the assembler
includes a flag that indicates which processors are
capable of running the program.
Copyright All Rights Reserved by Yuan-Hao Chang
117
118
LENGTH, 1
BUFFER, 4
119
120
121
L
2, 8(4)
This form of addressing may be useful when some register is known
to contain the starting address of a table or data record, and the
programmer wishes to refer to a fixed location within that table or
record.
This assembler simple inserts the specified values into the object
code instruction: base register GPR4 and displacement 8 (in this
example)
This base register is no involved in this way.
The register used in this way need not have appeared in a .USING
statement.
Copyright All Rights Reserved by Yuan-Hao Chang
122
123
124
Second pass
Read the source program again, instead of using an
intermediate file.
- It means that any warning messages that were generated during
Pass 1are lost.
125
126