
Chapter 2

Assemblers

November 26, 2010

Outline
Basic Assembler Functions
Machine-Dependent Assembler Features
Machine-Independent Assembler Features
Assembler Design Options
Implementation Examples

Copyright All Rights Reserved by Yuan-Hao Chang

Basic Assembler Functions

Assemblers
The fundamental functions that assemblers must
perform:
Translate mnemonic operation codes to their machine
language equivalents.
Assign machine addresses to symbolic labels used by
the programmer.

The design of assemblers depends on the machine
language because of the existence of different
machine instruction formats and codes.


Assembler Directives
In addition to the mnemonic machine instructions, SIC and
SIC/XE include the following assembler directives:
Directive   Description
START       Specify name and starting address for the program.
END         Indicate the end of the source program and (optionally) specify
            the first executable instruction in the program.
BYTE        Generate character or hexadecimal constant, occupying as
            many bytes as needed to represent the constant.
WORD        Generate one-word integer constant.
RESB        Reserve the indicated number of bytes for a data area.
RESW        Reserve the indicated number of words for a data area.


Instruction
An assembly instruction is a statement that is executed at run time. It
usually consists of four parts:

Label (optional)
Instruction (required)
Operands (instruction specific)
Comment (optional)

E.g.
Label    Instruction    Operands    Comment
ADDLP    ADDR           S, X        . ADD 3 to index value

The program on the next slide is a routine that reads records from an
input device (device code F1) and copies them to an output device
(code 05). This program is used throughout the chapter.
Subroutine RDREC: read a record into a buffer (buffer size = 4096B).
Subroutine WRREC: write the record from the buffer to the output device.

[Figure: SIC source program listing. Columns: Line number, Label,
Instruction, Operand, Comment. Annotations recoverable from the slide:]
- Start of the program; end of the program.
- Main routine: read 4096 bytes in each loop.
- RDREC: read 4096 bytes or reach end-of-record from the INPUT device,
  reading one byte in each loop; the number of bytes stored in the read
  buffer BUFFER is recorded.
- WRREC: write the data in the buffer to the OUTPUT device, writing one
  byte in each loop.
- Register-transfer notes: RETADR <- (L); (A) : (ZERO); A <- (EOF);
  L <- (RETADR); X <- (ZERO); A <- (ZERO).
- JSUB RDREC: L <- (PC); PC <- RDREC.
- RSUB: PC <- (L).

Simple SIC Assembler


The translation of a source program to object code requires
the following functions:
1. Convert mnemonic operation codes to their machine language
equivalents.
- E.g., translate STL to 14 (line 10).

2. Convert symbolic operands to their equivalent machine
addresses. (This must cope with forward references.)
- E.g., translate RETADR to 1033 (line 10).

STL RETADR -> 141033
(RETADR is a forward reference: it is defined later in the program.)

3. Build the machine instruction in the proper format.

4. Convert the data constants specified in the source program into
their internal machine representations.
- E.g., translate EOF to 454F46 (line 80).

EOF BYTE C'EOF' -> 454F46

5. Write the object program and the assembly listing.
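
Function 4 (converting data constants) can be sketched in a few lines of Python. This is only an illustration, not the assembler from the slides: the function names `byte_constant_to_hex` and `word_constant_to_hex` are made up here, and ASCII character codes are assumed.

```python
def byte_constant_to_hex(operand: str) -> str:
    """Convert a BYTE operand such as C'EOF' or X'F1' to object-code hex."""
    if len(operand) < 3 or operand[1] != "'" or operand[-1] != "'":
        raise ValueError(f"malformed constant: {operand}")
    kind, body = operand[0].upper(), operand[2:-1]
    if kind == "C":
        # One byte per character (ASCII assumed).
        return "".join(f"{ord(ch):02X}" for ch in body)
    if kind == "X":
        # Two hexadecimal digits per byte.
        if len(body) % 2 != 0:
            raise ValueError(f"odd number of hex digits: {operand}")
        int(body, 16)  # raises ValueError if body is not hexadecimal
        return body.upper()
    raise ValueError(f"unknown constant type: {operand}")


def word_constant_to_hex(operand: str) -> str:
    """Convert a WORD operand (decimal integer) to a 3-byte hex value."""
    return f"{int(operand) & 0xFFFFFF:06X}"
```

For the constant on line 80, `byte_constant_to_hex("C'EOF'")` gives `454F46`, the object code shown above.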


[Figure: SIC program listing with object code. Columns: Line number,
Machine address, Label, Instruction, Operand, Object code. Annotations:
the starting address is marked; no object code is generated for
addresses 1033-2038, because this storage is reserved by the loader for
use by the program during execution.]


Simple SIC Assembler (Cont.)


The assembler must process assembler directives
(pseudo-instructions).
Assembler directives are not translated into machine instructions
although they may have an effect on the object program.
Example:
- BYTE and WORD direct the assembler to generate constants as part of
the object program.
- RESB and RESW instruct the assembler to reserve memory locations
without generating data values.

The assembler must write the generated object code into
some output device.
The generated object program will be loaded into memory for
execution.


Simple SIC Object Program


The simple SIC object program contains three types of records.
Header record: contains the program name, starting address, and length.
- Col. 1      H
- Col. 2-7    Program name
- Col. 8-13   Starting address of object program (hexadecimal)
- Col. 14-19  Length of object program in bytes (hexadecimal)

Text record: contains the translated (i.e., machine code) instructions and
data of the program, together with an indication of the addresses where
these are to be loaded.
- Col. 1      T
- Col. 2-7    Starting address for object code in this record (hexadecimal)
- Col. 8-9    Length of object code in this record in bytes (hexadecimal)
- Col. 10-69  Object code, represented in hexadecimal (2 columns per byte of
              object code)

End record: marks the end of the object program and specifies the address in
the program where execution is to begin.
- Col. 1      E
- Col. 2-7    Address of first executable instruction in object program (hexadecimal)

To avoid confusion, the term column is used to refer to positions within object program records.
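
The three record layouts above can be sketched as small formatting helpers. This is a minimal illustration under stated assumptions: the function names are invented here, and the program name COPY used in the usage note is an assumption (the slides do not name the program in this text).

```python
def header_record(name: str, start: int, length: int) -> str:
    """Col. 1 'H', col. 2-7 name, col. 8-13 start, col. 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list[str]) -> str:
    """Col. 1 'T', col. 2-7 start, col. 8-9 length in bytes, col. 10-69 code."""
    body = "".join(object_codes)
    if len(body) % 2 != 0:
        raise ValueError("object code must be whole bytes (2 hex digits each)")
    n_bytes = len(body) // 2
    if n_bytes > 30:  # columns 10-69 hold at most 60 hex digits
        raise ValueError("a Text record holds at most 30 bytes")
    return f"T{start:06X}{n_bytes:02X}{body}"


def end_record(first_exec: int) -> str:
    """Col. 1 'E', col. 2-7 address of the first executable instruction."""
    return f"E{first_exec:06X}"
```

With a starting address of 1000 and a length of 107A (hex), `header_record("COPY", 0x1000, 0x107A)` produces `HCOPY  00100000107A`. The records are written without the `^` separators, which appear in listings only for readability.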

Simple SIC Object Program (Cont.)


[Figure: object program for the SIC example, with annotations:]
- Header record: the program's start address and the length of the
  object program, 2079 - 1000 + 1 = 107A (hex).
- Text records: each gives the start address of the record and its
  length (e.g., 30 bytes).
- End record: points to the start address of the program.
- ^ is used to separate fields visually. It is not present in
  the actual object program.


Two Passes of Assemblers


Pass 1 (define symbols)
The first pass does little more than scan the source program for label
definitions and assign addresses.
- 1. Assign addresses to all statements in the program.
- 2. Save values (addresses) assigned to all labels for use in Pass 2.
- 3. Perform some processing of assembler directives.
This includes processing that affects address assignment, such as determining
the length of data areas defined by BYTE, RESW, etc.

Pass 2
The second pass assembles instructions and generates the object
program.
- 1. Assemble instructions: Translating operation codes and looking up
addresses.
- 2. Generate data values defined by BYTE, WORD, etc.
- 3. Perform processing of assembler directives not done during Pass 1.
- 4. Write the object program and the assembly listing.

Data Structures in Assembling


Two major internal data structures are used in assembling:
Operation Code Table (OPTAB)
- Used to look up mnemonic operation codes and translate them to their
machine language equivalents.

Symbol Table (SYMTAB)
- Used to store values (addresses) assigned to labels.

In addition, the assembler needs a Location Counter (LOCCTR):
LOCCTR is a variable used to help in the assignment of addresses.
LOCCTR is initialized to the beginning address specified in the
START statement.
After each source statement is processed, the length of the
assembled instruction or data area to be generated is added to
LOCCTR.
- When we reach a label in the source program, the current value of
LOCCTR gives the address to be associated with that label.
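
How LOCCTR and SYMTAB interact during Pass 1 can be sketched as follows. This is a simplified illustration for plain SIC (every instruction is 3 bytes), not the algorithm from the slides verbatim: the tuple-based input format and the names `pass1` and `byte_length` are assumptions made for the example.

```python
def byte_length(operand: str) -> int:
    """Length in bytes of a BYTE operand such as C'EOF' or X'F1'."""
    body = operand[2:-1]
    return len(body) if operand[0] == "C" else len(body) // 2


def pass1(lines, optab):
    """Assign addresses to statements and collect label values.

    lines: iterable of (label, opcode, operand) tuples; label may be "".
    optab: set or dict of valid mnemonics (plain SIC, 3 bytes each).
    Returns (symtab, intermediate, program_length).
    """
    symtab, intermediate = {}, []
    start = locctr = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            start = locctr = int(operand, 16)  # START operand is hexadecimal
            intermediate.append((locctr, label, opcode, operand))
            continue
        intermediate.append((locctr, label, opcode, operand))
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr  # the label gets the current LOCCTR
        if opcode in optab or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_length(operand)
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, intermediate, locctr - start
```

The `intermediate` list plays the role of the intermediate file: each entry carries the address that Pass 1 assigned to the statement.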

Operation Code Table (OPTAB)


OPTAB must contain the mnemonic operation code and its machine
language equivalent.
In more complex assemblers, OPTAB might also contain information
about instruction format and length.
OPTAB in two passes:
In Pass 1, OPTAB is used to look up and validate operation codes in the
source program.
In Pass 2, OPTAB is used to translate the operation codes to machine
language.

The SIC/XE machine has instructions of different lengths, so
Pass 1 uses OPTAB to find the instruction length for incrementing LOCCTR.
Pass 2 uses OPTAB to tell which instruction format to use in assembling the
instruction.

OPTAB is usually organized as a hash table, with the mnemonic operation
code as the key.
OPTAB is usually predefined and is a static table.
Organizing OPTAB as a hash table provides fast retrieval with a minimum of
searching.
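
A hash-table OPTAB that also records the instruction format might look like the sketch below. The table is deliberately tiny and the layout (a Python dict mapping a mnemonic to an opcode and format) is an assumption for illustration; the opcode values are the SIC/XE ones used in this chapter's examples.

```python
# mnemonic -> (opcode, format); format 3 entries may also be used as format 4
OPTAB = {
    "LDA":   (0x00, 3),
    "STL":   (0x14, 3),
    "J":     (0x3C, 3),
    "JSUB":  (0x48, 3),
    "RSUB":  (0x4C, 3),
    "COMPR": (0xA0, 2),
    "CLEAR": (0xB4, 2),
    "TIXR":  (0xB8, 2),
}


def instruction_length(mnemonic: str) -> int:
    """Length in bytes that Pass 1 adds to LOCCTR for this mnemonic."""
    extended = mnemonic.startswith("+")
    opcode, fmt = OPTAB[mnemonic.lstrip("+")]  # KeyError: invalid opcode
    if extended:
        if fmt != 3:
            raise ValueError(f"{mnemonic}: only format 3 can be extended")
        return 4
    return fmt
```

Pass 1 would call `instruction_length` to advance LOCCTR, while Pass 2 would use the same entry to pick the opcode and format.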

Symbol Table (SYMTAB)


SYMTAB includes the name and value (address) for each
label in the source program, together with flags to indicate
error conditions.
SYMTAB may also contain other information about the data
area or instruction labeled (e.g., type or length).
During Pass 1, labels are entered into SYMTAB as they are
encountered in the source program, along with their assigned
addresses (from LOCCTR).
During Pass 2, symbols used as operands are looked up in
SYMTAB to obtain the addresses to be inserted in the assembled
instructions.

SYMTAB is usually organized as a hash table for efficiency
of insertion and retrieval.
Entries in SYMTAB are rarely deleted, so that efficiency of deletion
is not an important consideration.
A prime table length often gives good hashing efficiency.


Communications Between Two Passes


Certain information must be communicated between the
two passes. For this reason,
Pass 1 usually writes an intermediate file that contains
each source statement with its assigned address, error
indicators, etc.
- The intermediate file retains the results of certain operations that
may be performed during Pass 1, so as to avoid repeating many
of the table-searching operations.
- E.g.,
The operand field is scanned for symbols and addressing flags.
Pointers into OPTAB and SYMTAB may be retained for each used
operation code and symbol.


Algorithm for Assembler


For simplicity, we assume the source lines are written in a
fixed format with fields LABEL, OPCODE, and OPERAND.
We assume that Pass 1 writes four fields to each line of the
intermediate file:
LOC, LABEL, OPCODE, and OPERAND.
(LOC is the assigned machine address.)

If one of these fields contains a character string that
represents a number, we denote its numeric value with the
prefix #. E.g., #[OPERAND]
The simplified algorithm for Pass 1 and Pass 2 of the
assembler is listed in the following four slides.


Algorithm for Pass 1


[Figure: pseudocode for Pass 1, with annotations:]
- Handle the first line to initialize the location counter.
- Comment lines.
- Handle the LABEL field of the current input line.
- The instruction length of the SIC machine is 3 bytes.


Algorithm for Pass 1 (Cont.)


[Figure: pseudocode for Pass 1 (continued), with annotations:]
- Handle the OPCODE field of the current input line.
- WORD: declare a WORD constant.
- RESW: reserve # of words.
- RESB: reserve # of bytes.
- BYTE: declare a BYTE constant.


Algorithm for Pass 2


[Figure: pseudocode for Pass 2, with annotations:]
- Output the current line to the listing file, which is intended for
  debugging. Each line of the listing file includes:
  Line, Loc, LABEL, OPCODE, OPERAND, and Object code.
- The current line is an instruction.
- Handle the OPERAND of an instruction: use the operand
  address to replace the symbol in the OPERAND field.


Algorithm for Pass 2 (Cont.)


[Figure: pseudocode for Pass 2 (continued), with annotations:]
- Handle the OPERAND of an instruction: there is no symbol
  in the OPERAND field.
- The current line declares a BYTE or WORD constant.
- The current Text record is full, so finalize it
  and initialize a new Text record.


Machine-Dependent Assembler Features

Machine-Dependent Assembler Features


Many real machines have certain architectural features that
are similar to those we consider in SIC/XE machine.
Indirect addressing is indicated by the prefix @ to the operand.
Immediate operands are denoted with the prefix #.
The assembler directive BASE is used in conjunction with base
relative addressing.
The extended instruction format (Format 4) is specified with the
prefix + added to the operation code in the source statement.

The main advantages of SIC/XE, compared to SIC:
Register-to-register instructions can be used. E.g.,
- COMPR A, S
- TIXR T

Immediate and indirect addressing are supported.

Programmers need to specify the addressing mode to be used.


[Figure: SIC/XE program listing. Columns: Line number, Label,
Instruction, Operand, Object code.]


Advantage of SIC/XE Architecture


Register-to-register instructions are faster than
the corresponding register-to-memory operations.
Register-to-register instructions are shorter and do not
require another memory reference.
E.g., Changing COMP to COMPR results in an
improvement in execution speed.

When using immediate addressing, the operand
is already present in the instruction and need not
be fetched from anywhere.
Using indirect addressing often avoids
the need for another instruction.


[Figure: SIC/XE program listing with object code (relocatable program).
Columns: Line number, Machine address, Label, Instruction, Operand,
Object code.]


Instruction Formats and Addressing Modes

The START statement specifies a beginning program
address of 0.
This indicates a relocatable program. The program will be translated as if
it were really to be loaded at machine address 0.

Register-to-register instructions such as CLEAR (line 125)
and COMPR (line 150) only require the assembler to
Convert the mnemonic operation code (using OPTAB).
Change each register mnemonic to its numeric equivalent. (Pass 2)
- This is often done by preloading the register names and their values into SYMTAB.

Register-to-memory instructions are assembled with either
program-counter relative or base relative addressing.
The assembler must calculate a displacement to be assembled as
part of the object instruction.
The displacement must be small enough to fit in the 12-bit field in
the instruction.


Instruction Formats and Addressing Modes (Cont.)

If the displacements are too large, the 4-byte extended
instruction format (Format 4) must be used to contain the
full memory address.

Line  Loc   Label  Instruction  Operand  Object code
15    0006  CLOOP  +JSUB        RDREC    4B101036

opcode = 48
n i x b p e
1 1 0 0 0 1
addr = 01036

Translation precedence of our assembler:
If the extended format is not specified,
- First, program-counter relative addressing is attempted.
- Then base relative addressing is used if the required
displacement is out of range for program-counter relative addressing.

The displacement for program-counter relative addressing is between
-2048 and 2047.
The displacement for base relative addressing is between
0 and 4095.

Program-Counter (PC) Relative Addressing


The computation that the assembler needs to perform is the target
address calculation in reverse.
The PC is advanced before the instruction is executed.

Line  Loc   Label   Instruction  Operand  Object code
10    0000  FIRST   STL          RETADR   17202D
95    0030  RETADR  RESW         1

opcode = 14
n i x b p e
1 1 0 0 1 0
disp = 02D (030 - 003 = 02D)

Line  Loc   Label   Instruction  Operand  Object code
15    0006  CLOOP   +JSUB        RDREC    4B101036
40    0017          J            CLOOP    3F2FEC

opcode = 3C
n i x b p e
1 1 0 0 1 0
disp = 0006 - (PC) = 0006 - 001A = -14 (hex) = FEC (2's complement in 12 bits)
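
Packing the opcode, the flag bits, and the displacement into a format 3 instruction can be sketched as below. The function name `encode_format3` is hypothetical; the bit layout (6-bit opcode, n, i, x, b, p, e, 12-bit disp) is the one shown in the examples above.

```python
def encode_format3(opcode: int, n: int, i: int, x: int, b: int, p: int,
                   disp: int) -> str:
    """Assemble a 3-byte SIC/XE instruction (e = 0) as 6 hex digits."""
    first_byte = (opcode & 0xFC) | (n << 1) | i   # opcode shares a byte with n, i
    flags = (x << 3) | (b << 2) | (p << 1)        # x b p e, with e = 0
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"
```

Using the two examples from this slide: `encode_format3(0x14, 1, 1, 0, 0, 1, 0x30 - 0x03)` gives `17202D`, and `encode_format3(0x3C, 1, 1, 0, 0, 1, 0x0006 - 0x001A)` gives `3F2FEC`, because the negative displacement is truncated to its 12-bit two's complement form.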

Base Relative Addressing


Base relative addressing references the base register, so the
programmer must tell the assembler what the base register will contain.
The statement BASE LENGTH (line 13) informs the
assembler that the base register will contain the
address of LENGTH.
The preceding instruction (LDB #LENGTH) loads this value into the register
during program execution.
The assembler assumes for addressing purposes that register B contains this
address until it encounters another BASE statement.
If register B is later used for another purpose, the programmer must use
another assembler directive (perhaps NOBASE) to
inform the assembler that the contents of the base register can no longer be
relied upon for addressing.
The programmer must provide instructions that load the proper value into the
base register during execution.
BASE and NOBASE are assembler directives and produce no executable code.


Base Relative Addressing (Cont.)


Register B would contain 0033.

Line  Loc   Label   Instruction  Operand    Object code
12    0003          LDB          #LENGTH    69202D
13                  BASE         LENGTH
100   0033  LENGTH  RESW         1
105   0036  BUFFER  RESB         4096
160   104E          STCH         BUFFER, X  57C003
175   1056  EXIT    STX          LENGTH     134000

Line 12 (LDB #LENGTH):
opcode = 68
n i x b p e
0 1 0 0 1 0
disp = 02D (33 - 06 = 2D)

Line 160 (STCH BUFFER, X):
opcode = 54
n i x b p e
1 1 1 1 0 0
disp = 003 (36 - 33 = 03)

Base relative addressing is adopted for line 160
because the displacement for PC relative addressing is out of range:
PC = 1051 and BUFFER = 0036, so the magnitude of the
displacement is larger than 7FF.

Line 175 (STX LENGTH):
opcode = 10
n i x b p e
1 1 0 1 0 0
disp = 000 (33 - 33 = 00)


PC Relative vs. Base Relative Addressing


Suppose we choose PC relative addressing before
choosing base relative addressing.

Line  Loc   Label   Instruction  Operand  Object code
12    0003          LDB          #LENGTH  69202D
13                  BASE         LENGTH
20    000A          LDA          LENGTH   032026
100   0033  LENGTH  RESW         1
175   1056  EXIT    STX          LENGTH   134000

Line 20 (LDA LENGTH), PC relative:
opcode = 00, disp = 026 (33 - 0D = 26)
n i x b p e
1 1 0 0 1 0

Line 175 (STX LENGTH), base relative (reg. B = 33):
opcode = 10
n i x b p e
1 1 0 1 0 0
disp = 000 (33 - 33 = 00)

If we used PC relative addressing for line 175, the
displacement would be too large to fit in the
12-bit disp field:
0x1059 - 0x33 = 0x1026 > 0x7FF (= 2047)


Immediate Addressing
The assembly of an instruction that specifies immediate addressing is
simpler because no memory reference is involved.
The assembler converts the immediate operand to its internal representation and
inserts it into the instruction.

Line  Loc   Instruction  Operand  Object code
55    0020  LDA          #3       010003
133   103C  +LDT         #4096    75101000
12    0003  LDB          #LENGTH  69202D

Line 55 (LDA #3): immediate addressing.
PC relative addressing is not combined because 3 is a value.
opcode = 00
n i x b p e
0 1 0 0 0 0
disp = 003

Line 133 (+LDT #4096): Format 4.
The operand (4096) is too large to fit in
the 12-bit displacement field.
opcode = 74
n i x b p e
0 1 0 0 0 1
addr = 01000

Line 12 (LDB #LENGTH):
The value of a symbol is
the address assigned to it.
Therefore, #LENGTH = 33.
opcode = 68
n i x b p e
0 1 0 0 1 0
disp = 02D (33 - 06 = 2D)
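
The first two cases (a numeric immediate operand in format 3 or format 4) can be sketched as follows. The helper name `assemble_immediate` and its `extended` parameter are assumptions for illustration; the symbol case (#LENGTH), which combines immediate with PC relative addressing, is deliberately not covered.

```python
def assemble_immediate(opcode: int, value: int, extended: bool = False) -> str:
    """Assemble an instruction with a numeric immediate operand (n=0, i=1).

    extended=False -> format 3, value must fit the 12-bit disp field
    extended=True  -> format 4 ('+' prefix), value goes in the 20-bit field
    """
    first_byte = (opcode & 0xFC) | 0b01  # n = 0, i = 1
    if extended:
        if not 0 <= value <= 0xFFFFF:
            raise ValueError("operand does not fit in 20 bits")
        word = (first_byte << 24) | (0b0001 << 20) | value  # x=b=p=0, e=1
        return f"{word:08X}"
    if not 0 <= value <= 0xFFF:
        raise ValueError("operand does not fit in 12 bits; use format 4")
    word = (first_byte << 16) | value                       # x=b=p=e=0
    return f"{word:06X}"
```

`assemble_immediate(0x00, 3)` reproduces `010003` for LDA #3, and `assemble_immediate(0x74, 4096, extended=True)` reproduces `75101000` for +LDT #4096.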


Indirect Addressing

Jump to the place whose address is stored in the
variable RETADR.

Line  Loc   Label   Instruction  Operand  Object code
70    002A          J            @RETADR  3E2003
95    0030  RETADR  RESW         1

Indirect addressing:
opcode = 3C
n i x b p e
1 0 0 0 1 0
disp = 003 (030 - 02D = 003)


Program Relocation
We usually do not know exactly when jobs will be submitted,
or exactly how long they will run.
In such a situation, the actual starting address of the program is not
known until load time.

SIC:
Line  Loc   Instruction  Operand  Object code
55    101B  LDA          THREE    00102D

opcode = 00
x = 0
addr = 102D

The operand refers to the absolute
address 102D, so the
program cannot be relocated to
other addresses.

The assembler does not know the actual location where the program
will be loaded, but it can identify for the loader those parts of the
object program that need modification.
An object program that contains the information necessary to perform
this kind of modification is called a relocatable program.


Example of Program Relocation


[Figure: the example program loaded at different starting addresses;
in each case RDREC lies at offset 1036.]
No matter where the
program is loaded,
RDREC is always
1036 bytes past the
starting address of the
program.


Program Relocation
Some values in the object program depend on the starting address of
the program.
In order to support program relocation, the values depending on the starting
address of the program need to be relocated when program is loaded.

Solve the relocation problem as follows:
When the assembler generates the object code for the +JSUB instruction,
it inserts the address of RDREC relative to the start of the program.
- This is the reason we initialized the location counter to 0 for the assembly.

The assembler produces a command for the loader to add the beginning
address of the program to the address field in the +JSUB instruction at
load time.
The command for the loader must be a part of the object program.


Program Relocation (Cont.)


Modification record
- Col. 1    M
- Col. 2-7  Starting location of the address field to be modified,
            relative to the beginning of the program (hexadecimal).
- Col. 8-9  Length of the address field to be modified, in half-bytes
            (hexadecimal).
The length is stored in half-bytes because the address field to be
modified may not occupy an integral number of bytes.
E.g., the address field in the +JSUB instruction occupies 5 half-bytes.

If the length of the to-be-modified field is an odd number of half-bytes, it is
assumed to begin in the middle of the first byte at the starting location.
E.g., for the instruction +JSUB RDREC (4B101036) at location 0006,
the corresponding Modification record would be M^000007^05.

Line  Loc   Label  Instruction  Operand  Object code
15    0006  CLOOP  +JSUB        RDREC    4B101036

The other two +JSUB instructions need to be modified as well.

Line  Loc   Instruction  Operand  Object code
35    0013  +JSUB        WRREC    4B10105D
65    0026  +JSUB        WRREC    4B10105D
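
Generating a Modification record for a format 4 instruction, and the patch the loader would then apply, can be sketched as follows. Both function names are hypothetical, and the load address 5000 (hex) in the usage note is an assumed value chosen for illustration.

```python
def modification_record(instr_loc: int, half_bytes: int = 5) -> str:
    """Record for a format 4 instruction located at instr_loc.

    The 20-bit address field starts one byte after the instruction start.
    """
    return f"M{instr_loc + 1:06X}{half_bytes:02X}"


def apply_modification(code: bytearray, field_start: int, half_bytes: int,
                       load_addr: int) -> None:
    """What a loader might do: add load_addr to the address field in place."""
    n_bytes = (half_bytes + 1) // 2
    raw = int.from_bytes(code[field_start:field_start + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1           # low half-bytes hold the field
    patched = (raw & ~mask) | ((raw + load_addr) & mask)
    code[field_start:field_start + n_bytes] = patched.to_bytes(n_bytes, "big")
```

The three +JSUB instructions at 0006, 0013, and 0026 give `M00000705`, `M00001405`, and `M00002705`. Applying the first record to `4B101036` with an assumed load address of 5000 (hex) turns it into `4B106036`; the flag half-byte in front of the address field is left untouched.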

Program Relocation (Cont.)


PC relative or base relative addressing needs no
modification because operands are relative to the PC or base
register, instead of being absolute addresses.
The only parts of the program that require modification at
load time are those that specify direct addresses.
In the SIC/XE program,
the only such direct
addresses are found
in extended format
(4-byte) instructions,
i.e., the three +JSUB
instructions.


Machine-Independent Assembler Features

Sample Program without Object Code

[Figure: sample program listing without object code, continued over
two slides.]


Sample Program with Object Code

[Figure: sample program listing with object code; the final lines are:]

Line  Loc   Label  Instruction  Operand  Object code
235   106E         TIXR         T        B850
240   1070         JLT          WLOOP    3B2FEF
245   1073         RSUB                  4F0000
255                END          FIRST
      1076  *      =X'05'                05


Literals
Literals are constants written in the operand of instructions.
This avoids having to define the constant elsewhere in the program
and make up a label for it.
In a SIC/XE program, a literal is identified with the prefix =, followed
by a specification of the literal value, using the same notation as in
the BYTE statement.

Line  Loc   Label   Instruction  Operand  Object code
45    001A  ENDFIL  LDA          =C'EOF'  032010
      002D  *       =C'EOF'               454F46
215   1062  WLOOP   TD           =X'05'   E32011
      1076  *       =X'05'                05

Line 45 (LDA =C'EOF'):
opcode = 00
n i x b p e
1 1 0 0 1 0
disp = 010 (02D - 01D = 010)

Line 215 (TD =X'05'):
opcode = E0
n i x b p e
1 1 0 0 1 0
disp = 011 (1076 - 1065 = 011)

The notation used for literals varies from assembler to
assembler.


Literal vs. Immediate Operand


With a literal, the assembler generates the specified value
as a constant at some other memory location.
The address of this generated constant is used as the target
address for the machine instruction.

Line  Loc   Label   Instruction  Operand  Object code
45    001A  ENDFIL  LDA          =C'EOF'  032010
      002D  *       =C'EOF'               454F46

opcode = 00
n i x b p e
1 1 0 0 1 0
disp = 02D - 01D = 010

With immediate addressing, the operand value is
assembled as part of the machine instruction.

Line  Loc   Instruction  Operand  Object code
55    0020  LDA          #3       010003

opcode = 00
n i x b p e
0 1 0 0 0 0
disp = 003


Literal Pool
All of the literal operands used in a program are gathered together into
one or more literal pools.
Normally, literals are placed into a pool at the end of the program.
The assembly listing of a program containing literals usually includes a
listing of this literal pool.

The LTORG directive allows literals to be placed into a pool at some other
location in the object program.
When the assembler encounters an LTORG statement, it creates a literal
pool that contains all of the literal operands used since the previous LTORG.
This literal pool is placed in the object program at the location where the
LTORG directive was encountered.

E.g., if we had not used the LTORG statement on line 93, the literal
=C'EOF' would be placed in the pool at the end of the program.
This literal pool would begin at address 1073, which is too far away from the
instruction referencing it to allow PC relative addressing.
The problem is the large amount of storage reserved for BUFFER.


Duplicate Literals
A duplicate literal is a literal used in more than one place in
the program.
E.g., the literal =X'05'

The easiest way to recognize duplicate literals is by
comparison of the character strings defining them.
If we use the character string defining a literal to recognize
duplicates, we must be careful of literals whose value
depends on their location in the program.
E.g., we allow literals that refer to the current value of the location
counter (denoted by the symbol *).

Line  Loc   Instruction  Operand
13    0003  LDB          =*       (* equals 3)
55    0020  LDA          =*       (* equals 20)

The same literal name *
has different values.


Literal Table (LITTAB)


For each used literal, LITTAB contains
The literal name
The operand value and length
The address assigned to the operand when it is placed in
a literal pool.

LITTAB is often organized as a hash table, using
the literal name or value as the key.


Literal Table (LITTAB) (Cont.)


The creation of LITTAB
Pass 1
- 1. As each literal operand is recognized during Pass 1, the assembler
searches LITTAB for the specified literal name (or value).
- 2. If the literal is already present in the table, no action is needed.
- 3. If the literal is not present, the literal is added to LITTAB (leaving the address
unassigned).
- 4. When Pass 1 encounters an LTORG statement or the end of the program,
the assembler makes a scan of the literal table.
Each literal currently in the table is assigned an address (unless such an address has
already been filled in).
- 5. As these addresses are assigned, the location counter is updated to reflect
the number of bytes occupied by each literal.

Pass 2
- 1. The operand address for use in generating object code is obtained by
searching LITTAB for each literal operand encountered.
- 2. The data values specified by the literal in each literal pool are inserted at the
appropriate places in the object program.
- 3. If a literal value represents an address in the program (e.g., location counter
value), the assembler must also generate the appropriate Modification record.
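
The Pass 1 handling of LITTAB described above can be sketched as a small class. This is an illustration under assumptions: the class and method names are invented, only =C'...' and =X'...' literals are handled (not location-dependent literals such as =*), and Python's insertion-ordered dict stands in for the hash table.

```python
def literal_value(name: str) -> str:
    """Hex object code for a literal such as =C'EOF' or =X'05'."""
    kind, body = name[1], name[3:-1]
    if kind == "C":
        return "".join(f"{ord(ch):02X}" for ch in body)  # ASCII assumed
    if kind == "X":
        return body.upper()
    raise ValueError(f"unsupported literal: {name}")


class LiteralTable:
    def __init__(self):
        self.entries = {}  # literal name -> {"value", "length", "address"}

    def add(self, name: str) -> None:
        """Steps 1-3: enter the literal once, leaving its address unassigned."""
        if name not in self.entries:
            value = literal_value(name)
            self.entries[name] = {"value": value,
                                  "length": len(value) // 2,
                                  "address": None}

    def place_pool(self, locctr: int) -> int:
        """Steps 4-5: at LTORG or END, assign addresses and advance LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr
```

With the example program, adding =C'EOF' and placing a pool at 002D assigns it address 002D; the duplicate =X'05' is entered only once and receives address 1076 when the pool at the end of the program is placed.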

Symbol-Defining Statements
Symbol definition:
Label: The value of a label on an instruction or data area
is the address assigned to the statement on which it
appears.
EQU: The assembler directive EQU allows the
programmer to define symbols and specify their values.
(Similar to #define or a macro in the C language.)
ORG: The assembler directive ORG indirectly assigns
values to symbols.


EQU
Symbol EQU value
This statement defines the given symbol (i.e., enters it into SYMTAB)
and assigns to it the value specified.
The value may be given as a constant or as any expression.
One common use of EQU is to establish symbolic names that can be
used for improved readability in place of numeric values.

        +LDT  #4096

can be written as

MAXLEN  EQU   4096
        +LDT  #MAXLEN

Another common use of EQU is defining mnemonic names for registers.

A       EQU   0
X       EQU   1
L       EQU   2


ORG
ORG value
When this statement is encountered during assembly of a program, the
assembler resets its location counter (LOCCTR) to the specified value.
- Since values of symbols used as labels are taken from LOCCTR, the ORG
statement will affect the values of all labels defined until the next ORG.

Sometimes ORG can be useful in label definition. Consider a table STAB
of 100 entries, each with the fields SYMBOL (6 bytes), VALUE (3 bytes),
and FLAGS (2 bytes):

LDA VALUE, X
(fetch the VALUE field from the table entry indicated by register X)

Defining the fields with EQU:

STAB    RESB  1100      (100 entries)
SYMBOL  EQU   STAB
VALUE   EQU   STAB+6
FLAGS   EQU   STAB+9

Defining the fields with ORG:

STAB    RESB  1100
        ORG   STAB
SYMBOL  RESB  6
VALUE   RESW  1
FLAGS   RESB  2
        ORG   STAB+1100
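
The effect of ORG followed by a run of RESB/RESW statements can be mimicked with a running location counter. The sketch below is illustrative only: `layout_fields` is a made-up helper, and the address 2000 (hex) for STAB is an arbitrary assumption.

```python
def layout_fields(base: int, fields):
    """Assign addresses as 'ORG base' plus consecutive RESB/RESW would.

    fields: list of (name, size_in_bytes) in source order.
    Returns (symbols, entry_size).
    """
    locctr = base                 # ORG resets LOCCTR to the given value
    symbols = {}
    for name, size in fields:
        symbols[name] = locctr    # label takes the current LOCCTR
        locctr += size
    return symbols, locctr - base


STAB = 0x2000  # hypothetical address of the table
symbols, entry_size = layout_fields(
    STAB, [("SYMBOL", 6), ("VALUE", 3), ("FLAGS", 2)])
```

The resulting offsets match the EQU definitions (STAB, STAB+6, STAB+9), and the entry size of 11 bytes times 100 entries accounts for the 1100 bytes reserved.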


Restrictions of EQU and ORG


All symbols used on the right-hand side of the EQU
statement must have been defined previously in the program.

Allowed:
ALPHA   RESW  1
BETA    EQU   ALPHA

Not allowed:
BETA    EQU   ALPHA
ALPHA   RESW  1

ORG requires that all symbols used to specify the new
location counter value must have been previously defined.

Not allowed:
        ORG   ALPHA
BYTE1   RESB  1
BYTE2   RESB  1
BYTE3   RESB  1
        ORG
ALPHA   RESB  1


Expressions
Most assemblers allow the use of expressions wherever
a single operand is permitted.
Each expression must be evaluated by the assembler to produce a
single operand address or value.
Arithmetic expressions are usually formed by using the operators +, -, *,
and /.
The most common special term is the current value of the location
counter (often designated by *).

The values of terms and expressions are either relative or
absolute.
A constant is an absolute term.
Labels on instructions and data areas, and references to the location
counter value are relative terms.
A symbol whose value is given by EQU (or some similar assembler
directive) may be either an absolute term or a relative term
depending on the expression used to define its value.


Expression (Cont.)
Expressions are classified as either absolute expressions
or relative expressions depending on the type of value
they produce.
An expression that contains only absolute terms is an absolute
expression.
Absolute expressions may also contain relative terms, provided that:
- The relative terms occur in pairs, and the two terms in each pair have
opposite signs.
A relative expression is one in which all relative terms except one can be
paired in this way, and the remaining unpaired relative term is positive.
- A relative term or expression represents some value that may be written
as (S + r), where
S is the starting address of the program and
r is the value of the term or expression relative to the starting address.
- A relative term usually represents some location within the program.


Expression (Cont.)
Line  Loc   Label   Instruction  Operand
107   1000  MAXLEN  EQU          BUFEND-BUFFER
(The Loc column shows the value associated with the symbol.)

Both BUFEND and BUFFER are relative terms, each representing an
address within the program, such that the expression represents an
absolute value:
- The difference between the addresses is the length of the buffer area in bytes.

BUFEND+BUFEND, 100-BUFFER, or 3*BUFFER are considered errors.

To determine the type of an expression, we must keep track of the types
of all symbols defined in the program.
For this purpose, we need a flag in the
symbol table to indicate the type of value
(absolute or relative) in addition to the
value itself.
With the Type information, the assembler
can easily determine the type of each
expression and generate Modification records
in the object program for relative values.

Symbol  Type  Value
RETADR  R     0030
BUFFER  R     0036
BUFEND  R     1036
MAXLEN  A     1000
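
Determining the type of an expression from the Type flags in SYMTAB can be sketched as follows. This is a simplified illustration: the function name and the term-list input format are assumptions, and only + and - are handled, so an expression such as 3*BUFFER cannot even be written in this form.

```python
def expression_type(terms, symtab):
    """Evaluate a sum of signed terms and classify the result.

    terms:  list of (sign, term), sign is +1 or -1, term is an int constant
            or a symbol name.
    symtab: name -> (type, value) with type "R" (relative) or "A" (absolute).
    Returns (type, value); raises ValueError for an illegal expression.
    """
    value = 0
    net_relative = 0  # positive relative terms minus negative ones
    for sign, term in terms:
        if isinstance(term, int):
            value += sign * term
            continue
        kind, symbol_value = symtab[term]
        value += sign * symbol_value
        if kind == "R":
            net_relative += sign
    if net_relative == 0:
        return "A", value  # all relative terms cancel in pairs
    if net_relative == 1:
        return "R", value  # exactly one unpaired positive relative term
    raise ValueError("expression is neither absolute nor relative")
```

With the symbol table above, BUFEND-BUFFER evaluates to the absolute value 1000 (hex), while BUFEND+BUFEND and 100-BUFFER are rejected.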


Program Blocks and Control Sections


Many assemblers provide features that allow more
flexible handling of the source and object program.
Some allow the generated machine instructions and
data to appear in a different order from the corresponding
source statements. (Program Blocks)
Some result in the creation of several independent
parts of the object program. (Control Sections)

The program parts maintain their identity and are
handled separately by the loader.
Program blocks: Segments of code that are rearranged
within a single object program unit.
Control sections: Segments that are translated into
independent object program units.


Example of Multiple Program Blocks

[Figure: source listing of the example program divided into program blocks (two slides)]

Program Block Example


In this example, three program blocks are used.
First (unnamed) program block:
- The executable instructions of the program.

The second (named CDATA) program block:


- All data areas that are a few words or less in length.

The third (named CBLKS) program block:


- All data areas that consist of larger blocks of memory.

The assembler directive USE indicates which portions of the source
program belong to the various blocks.
At the beginning of the program, statements are assumed to be part of the
unnamed (default) block.
If no USE statements are included, the entire program belongs to this single
block.
- Lines 123, 208: resume default block.
- Lines 92, 183, 252: CDATA block.
- Line 103: CBLKS block.

Program Block Example (Cont.)


Each program block may actually contain several separate
segments of the source program.
The assembler will logically rearrange these segments to
gather together the pieces of each block.
Each block is assigned addresses in the object
program, with the blocks appearing in the same order in
which they were first begun in the source program.
The result is the same as if the programmer had physically
rearranged the source statements to group together all the
source lines belonging to each block.

Program Block Accomplishment


The assembler implements program blocks by maintaining a
separate location counter for each program block. (In Pass 1)
Each label in the program is assigned an address that is relative to the start
of the block that contains it.
When labels enter the symbol table, the block name or number is stored
along with the assigned relative address.
Block table, created at the end of Pass 1:

Block name   Block number   Address   Length
(default)    0              0000      0066
CDATA        1              0066      000B
CBLKS        2              0071      1000

For code generation during Pass 2, the assembler needs the address of
each symbol relative to the start of the object program (not
the start of an individual program block).
The assembler simply adds the location of the symbol, relative to the start of
its block, to the assigned starting address of that block.
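
The bookkeeping described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a full Pass 1: each statement is a hypothetical tuple (label, opcode, operand, size in bytes) with the size already known, and the names pass1 and absolute_address are invented for the example.

```python
def pass1(statements):
    """Keep one location counter per program block and record block-relative symbols."""
    locctr = {"": 0}   # block name -> location counter; "" is the default block
    order = [""]       # blocks in the order in which they were first begun
    symtab = {}        # label -> (block name, address relative to that block)
    current = ""
    for label, opcode, operand, size in statements:
        if opcode == "USE":
            current = operand or ""          # USE with no operand resumes the default block
            if current not in locctr:
                locctr[current] = 0
                order.append(current)
            continue
        if label:
            symtab[label] = (current, locctr[current])
        locctr[current] += size
    # End of Pass 1: assign each block a starting address in the object program.
    start, address = {}, 0
    for name in order:
        start[name] = address
        address += locctr[name]
    return symtab, start, locctr

def absolute_address(symtab, start, symbol):
    """Pass 2: block starting address plus the symbol's location within its block."""
    block, relative = symtab[symbol]
    return start[block] + relative
```

With the block table shown above, the same addition gives 0066 + 0003 = 0069 for a symbol at relative location 0003 in CDATA.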

Example of Multiple Program Blocks with Object Code

[Figure: three-slide listing of the example program with columns Line, Loc/Block, Label, Instr, Operand, Object code]

Multiple Program Blocks

Line  Loc/Block  Label   Instr  Operand  Object code
20    0006  0            LDA    LENGTH   032060
100   0003  1    LENGTH  RESW   1

(Line 20 is in the default block; line 100 is in CDATA.)

For line 20: opcode = 00
n i x b p e
1 1 0 0 1 0
disp = 069-009 = 060

The value of the symbol MAXLEN (line 107) is shown without a block
number because MAXLEN is an absolute symbol (not relative to the
start of any program block).
SYMTAB shows the value of the operand (the symbol LENGTH) as relative
location 0003 within program block 1 (CDATA).
The desired target address for this instruction is 0003 + 0066 = 0069.
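
The object code for line 20 can be reproduced with a short calculation. The sketch below assumes a format 3 instruction using program-counter relative addressing (p = 1, b = 0, e = 0); the function name format3 and its parameters are chosen for this illustration only.

```python
def format3(opcode, target, pc, n=1, i=1, x=0):
    """Assemble a SIC/XE format 3 instruction using PC-relative addressing."""
    disp = target - pc
    if not -2048 <= disp <= 2047:
        raise ValueError("target is out of range for PC-relative addressing")
    b, p, e = 0, 1, 0
    first_byte = (opcode & 0xFC) | (n << 1) | i        # 6-bit opcode plus n and i
    flags = (x << 3) | (b << 2) | (p << 1) | e         # x, b, p, e
    word = (first_byte << 16) | (flags << 12) | (disp & 0xFFF)
    return f"{word:06X}"

# LDA LENGTH on line 20: the target address is 0069, and the program counter
# holds 0009 (the address of the next instruction) when the instruction executes.
print(format3(0x00, 0x0069, 0x0009))
```

The printed value matches the object code listed for line 20.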

The separation of the program into blocks has considerably reduced the
addressing problem.
The large buffer area is moved to the end of the object program, so that
- 1. Extended format instructions are no longer needed. (Lines 15, 35, and 65)
- 2. The base register is no longer necessary. (Lines 13 and 14, the LDB
and BASE instructions, are deleted.)
- 3. Literal placement and references are simplified. (We include a LTORG in
the CDATA block to ensure literals are placed ahead of large data areas.)


Multiple Program Blocks (Cont.)


The use of program blocks could satisfy both machine and human
factors.
Machine considerations suggest that the parts of the object program appear
in memory in a particular order.
Human factors suggest that the source program should be in a different
order.

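The assembler can simply write the object code as it is generated
during Pass 2 and insert the proper load address in each Text record.

[Figure: object program. Its annotations mark the Text records generated from Lines 5-70, Lines 125-180, Line 185, Lines 210-245, and Line 253. The two records that belong to CDATA carry load addresses 66+06=6C and 66+07=6D.]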

Multiple Program Blocks (Cont.)


It does not matter that the Text records of the
object program are not in sequence by address.
The loader will simply load the object code from each
record at the indicated address.
When this loading is completed,
- The generated code from the default block will occupy relative
locations 0000 through 0065;
- The generated code and reserved storage for CDATA will occupy
locations 0066 through 0070.
- The storage reserved for CBLKS will occupy locations 0071
through 1070.


Multiple Program Blocks and Loading Process

[Figure: how the program blocks are arranged when the object program is loaded]

Control Sections and Program Linking


A control section is a part of the program that maintains its
identity after assembly.
Each control section can be loaded and relocated independently of
the others.
Different control sections are most often used for subroutines or
other logical subdivisions of a program.
Control sections allow the programmer to assemble, load, and
manipulate each part separately, which enhances flexibility.

Control sections need some means of being linked together.
References to instructions or data in another control section
(external references) need extra information during linking.
Because control sections are independently loaded and relocated,
the assembler is unable to process these references in the usual
way.
- The assembler generates information for each external reference that
will allow the loader to perform the required linking.

Example of Control Sections

[Figure: three-slide source listing of the example program divided into control sections]

Control Section Example


In this example, there are three control sections:
One for the main program.
- The START statement identifies the beginning of the assembly
and gives a name (COPY) to the first control section.

One for the RDREC subroutine.


- The CSECT statement on line 109 begins the control section
RDREC.

One for the WRREC subroutine.
- The CSECT statement on line 193 begins the control section
WRREC.


Control Sections
Control sections are handled separately by the assembler.
Symbols that are defined in one control section may not be used
directly by another control section. They must be identified as
external references for the loader to handle.

Two assembler directives are used to solve the external
reference problem:
EXTDEF (external definition):
- Name symbols as external symbols that are defined in this control
section and may be used by other sections.
- Control section names (e.g., COPY, RDREC, and WRREC) are
automatically considered to be external symbols (Line 7).

EXTREF (external reference):


- Name symbols that are used in this control section and are defined
elsewhere.
- E.g., BUFFER, BUFEND, and LENGTH are defined in the control section
named COPY and made available to the other section (Lines 122, 193).

Example of Control Sections with Object Code

[Figure: three-slide listing of the control-section example with object code]

Using External References


Line  Loc   Label   Instr  Operand        Object code
15    0003  CLOOP   +JSUB  RDREC          4B100000
160   0017          +STCH  BUFFER,X       57900000

For line 15, opcode = 48. The address cannot be assembled now because
RDREC is an external reference.
The operand (RDREC) is named in the EXTREF statement for the
control section, so this is an external reference.
The assembler has no idea where the control section containing
RDREC will be loaded, so it cannot assemble the address for this
instruction.
- Relative addressing is not possible at assembly time.
- An extended format instruction must be used to
provide room for the actual address to be inserted.

Line  Loc   Label   Instr  Operand        Object code
190   0028  MAXLEN  WORD   BUFEND-BUFFER  000000
107   1000  MAXLEN  EQU    BUFEND-BUFFER

Line 190: BUFEND and BUFFER are external references here, so the word is assembled as zero.
Line 107: BUFEND and BUFFER are local references, so the value can be calculated at assembly time.


Using External References (Cont.)


Conflicting definitions (e.g., MAXLEN on lines 107
and 190) cause no problem.
A reference to MAXLEN in the control section COPY
would see the definition on line 107.
A reference to MAXLEN in the control section RDREC
would see the definition on line 190.

We need two new record types in the object


program and a change in a previously defined
record type to include information for the loader to
insert the proper values when required.

Define and Refer Records


A Define record gives information about external symbols that are
defined in this control section (named by EXTDEF).
Define record
Col. 1      D
Col. 2-7    Name of external symbol defined in this control section.
Col. 8-13   Relative address of symbol within this control section (hexadecimal).
Col. 14-73  Repeat information in Col. 2-13 for other external symbols.
A Refer record lists symbols that are used as external references by the
control section (named by EXTREF).
Refer record (revised)
Col. 1      R
Col. 2-7    Name of external symbol referred to in this control section.
Col. 8-73   Names of other external reference symbols.

Define and Refer Records (Cont.)


The symbol used for modification may be defined either in
this control section or in another one.
Modification record (revised)
Col. 1      M
Col. 2-7    Starting address of the field to be modified, relative to the beginning of the control section (hexadecimal).
Col. 8-9    Length of the field to be modified, in half-bytes (hexadecimal).
Col. 10     Modification flag (+ or -).
Col. 11-16  External symbol whose value is to be added to or subtracted from the indicated field.
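
The column layouts of the Define, Refer, and Modification records translate directly into fixed-width formatting. The sketch below is an illustration only: the function names are invented, the symbol addresses in the usage lines are illustrative values, and it does not split a record when the symbols would run past column 73.

```python
def define_record(symbols):
    """symbols: list of (name, relative address). Cols 2-7 name, cols 8-13 address, repeated."""
    return "D" + "".join(f"{name:<6}{address:06X}" for name, address in symbols)

def refer_record(names):
    """Each external reference name occupies 6 columns."""
    return "R" + "".join(f"{name:<6}" for name in names)

def modification_record(address, half_bytes, flag, symbol):
    """Cols 2-7 address, cols 8-9 length in half-bytes, col 10 flag, cols 11-16 symbol."""
    if flag not in "+-" or len(flag) != 1:
        raise ValueError("modification flag must be + or -")
    return f"M{address:06X}{half_bytes:02X}{flag}{symbol:<6}"

# Illustrative use: BUFFER defined at relative address 000033 (made-up value),
# and the 20-bit (5 half-byte) address field of a format 4 +JSUB RDREC
# that starts at relative address 0004.
print(define_record([("BUFFER", 0x33)]))
print(refer_record(["RDREC", "WRREC"]))
print(modification_record(0x4, 5, "+", "RDREC"))
```

Names shorter than six characters are padded with blanks so that every field keeps its column position.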


Object Program for Control Sections


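- Control Section COPY
Each control section has its own Header and End records.
In Define records, each external symbol is listed with its relative
address within the control section.
In Refer records, the referenced external symbols have no
address information available.

For +JSUB on Line 15, the Modification record directs the loader to add
the value of RDREC to the address field that begins at relative address 0004.

Line  Loc   Label   Instr  Operand  Object code
15    0003  CLOOP   +JSUB  RDREC    4B100000

The instruction occupies bytes 0003 through 0006; its address field is the
last 5 half-bytes, starting in byte 0004.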


Object Program for Control Sections


- Control Sections RDREC and WRREC

For the word on Line 190, the Modification records add BUFEND and
subtract BUFFER, because the assembler generates an
initial value of zero for this word.

Line  Loc   Label   Instr  Operand        Object code
190   0028  MAXLEN  WORD   BUFEND-BUFFER  000000


Restrictions
Both terms in each pair must be relative within the same
control section.
If two terms represent relative locations in the same control section,
their difference is an absolute value.
If two terms are in different control sections, their difference
has a value that is unpredictable.
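For example:
BUFEND-BUFFER (legal: both symbols are in the same control section)
RDREC-COPY (illegal: the symbols are in different control sections)

When an expression involves external references, the
assembler cannot in general determine whether the
expression is legal.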
The pairing of relative terms to test legality cannot be done
without knowing which of the terms occur in the same
control sections.
The assembler can generate Modification records so the loader can
finish the evaluation.
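
How a loader might finish the evaluation can be sketched as follows. This is a hypothetical illustration, not a complete linking loader: memory is a bytearray, estab is an assumed external symbol table mapping each name to its final address, and the addresses used in the example are made up.

```python
def apply_modifications(memory, section_start, modifications, estab):
    """Apply (relative address, length in half-bytes, flag, symbol) entries to loaded code."""
    for relative, half_bytes, flag, symbol in modifications:
        nbytes = (half_bytes + 1) // 2
        position = section_start + relative
        field = int.from_bytes(memory[position:position + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1            # only the low half-bytes are modified
        delta = estab[symbol] if flag == "+" else -estab[symbol]
        updated = (field & ~mask) | ((field + delta) & mask)
        memory[position:position + nbytes] = updated.to_bytes(nbytes, "big")

# Illustrative addresses (assumptions): BUFFER at 0033, BUFEND at 1033, RDREC at 4000.
estab = {"BUFFER": 0x0033, "BUFEND": 0x1033, "RDREC": 0x4000}
memory = bytearray(0x30)
memory[0x03:0x07] = bytes.fromhex("4B100000")         # +JSUB RDREC, address field zero
apply_modifications(memory, 0, [(0x04, 5, "+", "RDREC")], estab)
# The word at 0028 starts as zero; BUFEND is added and BUFFER is subtracted.
apply_modifications(memory, 0, [(0x28, 6, "+", "BUFEND"), (0x28, 6, "-", "BUFFER")], estab)
```

Because the arithmetic is done modulo the field width, the two Modification records for one word can be applied in either order.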

Assembler Design Options


One-Pass Assemblers
The main problem in trying to assemble a program in one
pass involves forward references.
The assembler does not know what address to insert in the
translated instruction.
Forward references to data items can be avoided by requiring that all such
areas be defined before they are used; this restriction is not too severe.
Forward references to labels on instructions cannot be eliminated
as easily, because programs need forward jumps.

There are two types of one-pass assembler.
One produces object code directly in memory for immediate
execution. (Load-and-go assembler)
The other produces the usual kind of object program for later
execution.

Load-and-Go Assembler
No program is written out, and no loader is needed.
It is useful in a system that is oriented toward program development and
testing.
E.g., a university computing system for student use.

Because programs are re-assembled nearly every time they are run,
efficiency of the assembly process is an important consideration.
The handling of forward references becomes less difficult because the
object program is built in memory.
If an instruction operand is a symbol that has not yet been defined, the
operand address is omitted when the instruction is assembled.
- The address of the operand field of the instruction that refers to the undefined
symbol is added to a forward reference list associated with the symbol table
entry.
- When the definition for a symbol is encountered, the forward reference list for that
symbol is scanned, and the proper address is inserted into any instructions
previously generated.

At the end of assembly, the assembler searches SYMTAB for the
value of the symbol named in the END statement (in this case: FIRST)
and jumps to this location to begin execution.
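
The forward reference list can be sketched in a few lines. The class below is a hypothetical illustration, not the assembler's actual data structure: it assumes standard SIC instructions, where the operand field is the two bytes after the opcode and holds the x bit plus a 15-bit address, and it stores either a resolved address or a list of pending operand-field addresses for each symbol.

```python
class LoadAndGo:
    """Sketch of forward-reference handling for object code kept in memory."""

    def __init__(self, size=0x8000):
        self.memory = bytearray(size)
        self.symtab = {}  # name -> address (int), or list of pending operand-field addresses

    def reference(self, name, field_address):
        """An instruction whose operand field is at field_address refers to name."""
        entry = self.symtab.setdefault(name, [])
        if isinstance(entry, list):
            entry.append(field_address)      # undefined: the field stays zero for now
        else:
            self._store(field_address, entry)

    def define(self, name, address):
        """The definition of name is encountered: patch every earlier reference."""
        entry = self.symtab.get(name, [])
        if not isinstance(entry, list):
            raise ValueError(f"duplicate symbol {name}")
        for field_address in entry:
            self._store(field_address, address)
        self.symtab[name] = address

    def _store(self, field_address, address):
        x_bit = self.memory[field_address] & 0x80   # keep the indexing bit intact
        self.memory[field_address] = x_bit | ((address >> 8) & 0x7F)
        self.memory[field_address + 1] = address & 0xFF
```

For example, a reference to RDREC from the operand field at 2013 is recorded in the list, and a later define("RDREC", ...) fills in that field with whatever address the definition receives.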

One-Pass Assembler Example

[Figure: two-slide listing of the example SIC program, with its forward references marked]

Object Code and Symbol Table

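[Figure: object code in memory and the symbol table; the forward reference lists shown hold the operand-field addresses 2013, 201C, and 201F]

The first forward reference occurred on line 15 because the operand
(RDREC) was not yet defined.
RDREC is entered into the symbol table as an undefined symbol, and the
address of the operand field (2013) is inserted into a list associated
with that entry.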


Object Code and Symbol Table (Cont.)


[Figure: object code in memory and the symbol table after line 160 is scanned; the forward reference lists shown hold the addresses 201F and 2030]


One-Pass Assembler
A one-pass assembler usually works with absolute programs.
It produces object programs as output and is usually adopted on systems
where external working-storage devices are not available or are slow.

If the actual address is assigned by the assembler, the location counter
is initialized to the actual program starting address at assembly
time.
When the definition of a symbol is encountered, instructions that made
forward references to that symbol may no longer be available in
memory for modification.
In general, they have already been written out as part of a Text record in
the object program.
- In this case, the assembler must generate another Text record with the correct
operand address.
- When the program is loaded, this address will be inserted into the instruction by
the action of the loader.

Object Code from One-Pass Assembler


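[Figure: object program from the one-pass assembler. The first Text record covers Lines 10-40; later Text records supply the addresses of ENDFIL, WRREC, and RDREC. The record for RDREC replaces the address at 2013.]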


Multi-Pass Assemblers
Constraints in two-pass assemblers:
Any symbol used on the right-hand side (i.e., in the expression giving the
value of the new symbol) of assembler directives (e.g., EQU and ORG) must be
defined previously in the source program.
E.g., ALPHA  EQU   BETA
      BETA   EQU   DELTA
      DELTA  RESW  1
- BETA cannot be assigned a value in the first pass because DELTA has not yet
been defined. As a result, ALPHA cannot be evaluated during the second pass.

Prohibiting forward references in symbol definitions is not a serious
inconvenience for the programmer.

Multi-pass assemblers can resolve such symbol definitions.
Forward references in symbol definitions are saved during Pass 1.
Additional passes through these stored definitions are made as the
assembly progresses.
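
One way to picture the stored definitions is a table that remembers, for each undefined symbol, which other symbols are waiting on it. The sketch below is an assumption-laden illustration rather than the algorithm of a specific assembler: expressions are limited to sums and differences, each given as a list of (sign, term) pairs, and the class and method names are invented.

```python
class SymbolTable:
    """Sketch of multi-pass resolution of symbol definitions with forward references."""

    def __init__(self):
        self.values = {}    # symbol -> resolved value
        self.pending = {}   # symbol -> (terms, set of symbols still undefined)
        self.waiting = {}   # undefined symbol -> symbols whose definitions depend on it

    def _evaluate(self, terms):
        return sum(sign * (self.values[t] if isinstance(t, str) else t)
                   for sign, t in terms)

    def equ(self, name, terms):
        """Handle name EQU expression; save the definition if it cannot be evaluated yet."""
        missing = {t for _, t in terms if isinstance(t, str) and t not in self.values}
        if not missing:
            self.define(name, self._evaluate(terms))
            return
        self.pending[name] = (terms, missing)
        for symbol in missing:
            self.waiting.setdefault(symbol, []).append(name)

    def define(self, name, value):
        """Record a value, then revisit every saved definition that was waiting on name."""
        self.values[name] = value
        for dependent in self.waiting.pop(name, []):
            terms, missing = self.pending[dependent]
            missing.discard(name)
            if not missing:
                del self.pending[dependent]
                self.define(dependent, self._evaluate(terms))
```

For the ALPHA, BETA, DELTA example, both EQU statements are saved as pending; once DELTA receives its address, BETA and then ALPHA are evaluated in turn.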

Multi-Pass Assembler Example

[Figure: symbol table after HALFSZ is defined in terms of MAXLEN]
- MAXLEN has not yet been defined, so no value for HALFSZ can be computed.
- The entry for HALFSZ stores the defining expression and indicates that one
symbol in it (i.e., MAXLEN) is undefined.
- MAXLEN also enters the table, with the flag * identifying it as undefined.
- The entry for MAXLEN lists the symbols that depend on MAXLEN (here, HALFSZ).

Multi-Pass Assembler Example (Cont.)

[Figure: two slides of symbol table snapshots, including an entry whose defining expression has two unresolved symbols]

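Multi-Pass Assembler Example (Cont.)

Suppose that the location counter is hexadecimal 1034 when Line 4 is read.
- 1034 becomes the address of BUFFER.
[Figure: symbol table snapshot; one dependent entry is marked as evaluated and another has its count of undefined symbols updated to &1]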


Implementation Examples


Selected Examples
We focus on some of the most interesting or
unusual features of each assembler.
We are also particularly interested in areas where
the assembler design differs from the basic
algorithm and data structures.
Selected examples:
Pentium(x86)
SPARC
PowerPC

MASM Assembler
MASM assembler is for Pentium and other x86 systems.
A programmer of an x86 system views memory as a
collection of segments.
An MASM assembler language program is written as a collection of
segments.
Each segment is defined as belonging to a particular class
(corresponding to its contents).
- Commonly used classes are CODE, DATA, CONST and STACK.

During program execution, segments are addressed via the


x86 segment registers.
Code segments are addressed using register CS.
Stack segments are addressed using register SS.
Data segments (including constant segments) are normally
addressed using DS, ES, FS, or GS.

MASM Assembler - Segments


Segment registers are automatically set by the system loader when a
program is loaded for execution.
Register CS is set to indicate the segment that contains the starting label
specified in the END statement of the program.
Register SS is set to indicate the last stack segment processed by the
loader.

Segment registers can be specified by the programmer or auto-selected
by the assembler.
By default, the assembler assumes that all references to data
segments use register DS.
It is possible to collect several segments into a group and use ASSUME
to associate a segment register with the group.

ASSUME ES:DATASEG2

This tells the assembler to assume that register ES
indicates the segment DATASEG2.
Any references to labels that are defined in
DATASEG2 will be assembled using register ES.


MASM Assembler Segments (Cont.)


Registers DS, ES, FS, and GS must be loaded by the
program before they can be used to address data
segments.
E.g., MOV  AX, DATASEG2
      MOV  ES, AX

These instructions set ES to indicate the data segment
DATASEG2.

The ASSUME directive is similar to the BASE directive in
SIC/XE.
The BASE directive tells an SIC/XE assembler the contents of
register B; the programmer must provide executable instructions to
load this value into the register.
ASSUME tells MASM the contents of a segment register; the
programmer must provide instructions to load this register when the
program is executed.

MASM Assembler JMP Instruction


Forward references to labels could cause problems.
E.g., JMP  TARGET

- If the label TARGET occurs in the program before the JMP instruction, the
assembler can tell whether this is a near jump or a far jump.
- If this is a forward reference to TARGET, the assembler does not know how many
bytes to reserve for the instruction.

By default, MASM assumes that a forward jump is a near jump. If the
target of the jump is in another code segment, the programmer must
warn the assembler by writing: (Similar to extended format in SIC/XE)
      JMP  FAR PTR TARGET
If the jump address is within 128 bytes of the current instruction, the
programmer can specify the shorter (2-byte) near jump by writing:
      JMP  SHORT TARGET
If the JMP to TARGET is a far jump and the programmer does not
specify FAR PTR, a problem occurs.
Later versions of MASM can repeat Pass 1 to generate
correct location counter values.

MASM Assembler Instruction


There are many other instructions in which the length of an
assembled instruction depends on the operands that are
used.
The operands of an ADD instruction may be registers, memory
locations, or immediate operands.
- Immediate operands may occupy from 1 to 4 bytes in the instruction.
- An operand that specifies a memory location may take varying amounts
of space in the instruction, depending on the location of the operand.

Pass 1 of an x86 assembler must be considerably more complex than
Pass 1 of a SIC assembler.
- The first pass of x86 assembler must analyze the operands of an
instruction, in addition to looking at the operation code.
- The operation code table must also be more complicated, since it must
contain information on which addressing modes are valid for each
operand.


MASM Assembler - References


Segments in an MASM source program can be written in more than one part.
If a SEGMENT directive specifies the same name as a previously defined segment, it
is considered to be a continuation of that segment.
All of the parts of a segment are gathered together by the assembly process.
Segments can perform a similar function to the program blocks in SIC/XE.

References between segments that are assembled together are automatically
handled by the assembler.
External references between separately assembled modules must be handled by the
linker.
The MASM directive PUBLIC has approximately the same function as the SIC/XE
directive EXTDEF.
The MASM directive EXTERN has approximately the same function as the SIC/XE
directive EXTREF.

The object program from MASM may be in several different formats to allow
easy and efficient execution of the program in a variety of operating
environments.
MASM can also produce an instruction timing listing that shows the number
of clock cycles required to execute each machine instruction. (This allows the
programmer to exercise a great deal of control in optimizing timing-critical
sections of code).

(SunOS) SPARC Assembler


The SPARC assembler runs on Sun Microsystems (SunOS) systems.
A SPARC assembler language program is divided into units
called sections.
The assembler provides a set of predefined section names.
It is also possible to define other sections, specifying
section attributes such as executable and writable.
.TEXT     Executable instructions
.DATA     Initialized read/write data
.RODATA   Read-only data
.BSS      Uninitialized data areas

SPARC Assembler
The assembler maintains a separate location counter for each named
section.
Each time the assembler switches to a different section, it also switches to
the location counter associated with that section.
Sections are similar to the program blocks in SIC.
References between different sections are resolved by the linker, not by
the assembler.

Symbols used in a source program are assumed to be local to that


program.
Symbols that are used in linking separately assembled programs may
be declared to be either global or weak.
A global symbol is either a symbol that is
- Defined in the program and made accessible to others or
- Referenced in a program and defined externally.
- (This combines EXTDEF and EXTREF in SIC)

A weak symbol is similar to a global symbol.
- The definition of a weak symbol can be
overridden by a global symbol with the same name.
- Weak symbols may remain undefined when the program is linked, without
causing an error.

SPARC Assembler Object File


The object file written by the SPARC assembler
contains
Translated version of the segments of the program.
A list of relocation and linking operations that need to be
performed.
A symbol table that describes the symbols used during
relocation and linking.

References between different segments of the


same program are resolved when the program is
linked.

SPARC Assembler Delay Slot


SPARC branch instructions (including
subroutine calls) are delayed branches.
This is an unusual feature directly related to the machine
architecture.
The instruction immediately following a branch
instruction is actually executed before the branch is
taken.
E.g., The ADD instruction is executed before the
conditional branch BLE.
      CMP  %L0, 10
      BLE  LOOP
      ADD  %L2, %L3, %L4

The ADD is in the delay slot: it is executed regardless of whether or
not the conditional branch is taken.

SPARC Assembler Delay Slot (Cont.)


To simplify debugging, SPARC assembly language
programmers often place NOP (no-operation) instructions in
delay slots when a program is written.
This code is later rearranged to move useful instructions into the
delay slots.

Before:
LOOP: .
      .
      ADD  %L2, %L3, %L4
      CMP  %L0, 10
      BLE  LOOP
      NOP

After:
LOOP: .
      .
      CMP  %L0, 10
      BLE  LOOP
      ADD  %L2, %L3, %L4

The CMP instruction sets the condition codes
that must be tested by the BLE, so it
cannot be moved to the delay slot.

SPARC Assembler Delay Slot (Cont.)


LOOP: ADD  %L2, %L3, %L4
      .
      .
      CMP  %L0, 10
      BLE  LOOP
      NOP

If the ADD instruction is moved to the delay
slot, it creates a problem: on the last
execution of the loop, the ADD instruction
should not be executed.

The SPARC architecture defines a solution to this problem.
A conditional branch instruction like BLE can be annulled.
- If a branch is annulled, the instruction in its delay slot is executed if the
branch is taken, but not executed if the branch is not taken.

Annulled branches are indicated in SPARC assembler language by
writing ",A" following the operation code.

LOOP: .
      .
      CMP  %L0, 10
      BLE,A LOOP
      ADD  %L2, %L3, %L4


SPARC Assembler Delay Slot (Cont.)


The SPARC assembler provides warning
messages to alert the programmer to possible
problems with the delay slots.
For example:
A label on an instruction in a delay slot usually indicates
an error.
A segment that ends with a branch instruction (with
nothing in the delay slot) is also likely to be incorrect.
- Before the branch is executed, the machine will attempt to
execute whatever happens to be stored at the memory location
immediately following the branch.

AIX Assembler
The AIX assembler is for PowerPC and other similar
systems designed by IBM.
The AIX assembler includes support for various models of
PowerPC microprocessors.
The programmer can declare which architecture is
being used with the assembler directive .MACHINE.
When the object program is generated, the assembler
includes a flag that indicates which processors are
capable of running the program.

AIX Assembler USING Directive


PowerPC load and store instructions use a base
register and a displacement value to specify an
address in memory.
Any general-purpose register (except GPR0) can be
used as a base register.
Decisions about which registers to use are left to the
programmer.

In a long program, it is not unusual to have several
different base registers in use at the same time.
The programmer uses the .USING directive to specify
which registers are available for use as base registers,
and the contents of these registers.
(Similar to the BASE statement in SIC/XE)


AIX Assembler USING Directive (Cont.)


.USING  LENGTH, 1
.USING  BUFFER, 4

These statements identify GPR1 and GPR4 as base registers.
- GPR1 would be assumed to contain the
address of LENGTH.
- GPR4 would be assumed to contain the
address of BUFFER.

The programmer must provide instructions to place these
values into the registers at execution time.
If a base register is to be used later for some other purpose,
the programmer uses the .DROP statement to indicate that
this register is no longer available for addressing purposes.


AIX Assembler USING Directive (Cont.)


A base register table is used to remember which of the
GPRs are currently available as base registers, and what
base addresses they contain.
Processing a .USING statement causes an entry in this table to be
created or modified.
Processing a .DROP statement removes the corresponding table
entry.

For each instruction whose operand is an address in


memory, the assembler scans the table to find a base
register that can be used to address the operand.
If more than one register can be used, the assembler selects the
base register that results in the smallest signed displacement.
If no suitable base register is available, the instruction cannot be
assembled.
(The displacement calculation is the same as SIC/XE).
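
The base register table lends itself to a small sketch. The class below is a hypothetical illustration, not the AIX assembler's implementation: it assumes a signed 16-bit displacement field, reads "smallest signed displacement" as smallest in magnitude, and uses made-up addresses for the symbols in the usage example.

```python
class BaseRegisterTable:
    """Sketch of the table maintained by .USING and .DROP statements."""

    def __init__(self):
        self.table = {}  # register number -> base address the register is assumed to hold

    def using(self, base_address, register):
        self.table[register] = base_address   # create or modify the entry

    def drop(self, register):
        self.table.pop(register, None)        # register no longer available for addressing

    def select(self, target):
        """Return (register, displacement) for an operand address, or raise ValueError."""
        candidates = [
            (target - base, register)
            for register, base in self.table.items()
            if -32768 <= target - base <= 32767   # must fit a signed 16-bit displacement
        ]
        if not candidates:
            raise ValueError("no suitable base register; instruction cannot be assembled")
        displacement, register = min(candidates, key=lambda c: (abs(c[0]), c[1]))
        return register, displacement

# Illustrative addresses (assumptions): LENGTH at 1000 and BUFFER at 2000 (hexadecimal).
table = BaseRegisterTable()
table.using(0x1000, 1)   # .USING LENGTH, 1
table.using(0x2000, 4)   # .USING BUFFER, 4
print(table.select(0x2008))
```

Here an operand 8 bytes past BUFFER is addressed through GPR4 with displacement 8, since GPR1 would need a much larger displacement.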


AIX Assembler Register and Displacement


The AIX assembler language allows the programmer to
write base registers and displacements explicitly in the
source program.

Specify an operand address that is 8 bytes
past the address contained in GPR4:

L 2, 8(4)

This form of addressing may be useful when some register is known
to contain the starting address of a table or data record, and the
programmer wishes to refer to a fixed location within that table or
record.
The assembler simply inserts the specified values into the object
code instruction: base register GPR4 and displacement 8 in this
example.
The base register table is not involved.
The register used in this way need not have appeared in a .USING
statement.
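
As a rough illustration of the explicit form, the following Python sketch splits an operand such as 8(4) into its base register and displacement. The function name and return shape are hypothetical; the only syntax assumed is the displacement(register) form shown in the example, with register numbers 0 to 31.

```python
import re

# Hypothetical helper: the operand syntax disp(reg) is taken from the example
# "L 2, 8(4)"; the function name and return shape are illustrative only.
_EXPLICIT = re.compile(r"^\s*(-?\d+)\s*\(\s*(\d+)\s*\)\s*$")


def parse_explicit_operand(operand):
    """Split 'disp(reg)' into (base_register, displacement).

    Returns None when the operand is not in explicit form; an assembler
    would then fall back to the base register table.
    """
    match = _EXPLICIT.match(operand)
    if match is None:
        return None
    displacement, register = int(match.group(1)), int(match.group(2))
    if not 0 <= register <= 31:  # assuming GPR0 to GPR31
        raise ValueError(f"invalid GPR number: {register}")
    return register, displacement


print(parse_explicit_operand("8(4)"))    # (4, 8)
print(parse_explicit_operand("BUFFER"))  # None
```

The values are taken straight from the source text, which matches the point above that no table lookup or .USING statement is needed for this form.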

AIX Assembler Control Sections


An AIX assembler program can be divided into
control sections using the .CSECT directive.
Each control section has an associated storage
mapping class that describes the kind of data it
contains.
Some commonly used classes are PR (executable
instructions), RO (read-only data), RW (read/write data),
and BS (uninitialized read/write data).

One control section may consist of several different
parts of the source program, which are gathered together
by the assembler.
AIX control sections combine some of the features of
SIC control sections and program blocks.
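
The gathering of scattered source fragments into one control section can be sketched as follows. This is a minimal Python model, not the assembler's real data structure: the function name, the fragment tuples, and the section names and sizes are all invented for illustration.

```python
def gather_sections(fragments):
    """Group fragments by control section, in order of first appearance.

    Each fragment is (section_name, storage_class, size_in_bytes). The result
    maps each section name to its class, total size, and the starting offset
    of each of its fragments within the gathered section.
    """
    sections = {}
    for name, storage_class, size in fragments:
        sec = sections.setdefault(
            name, {"class": storage_class, "size": 0, "offsets": []}
        )
        if sec["class"] != storage_class:
            raise ValueError(f"section {name} redeclared as {storage_class}")
        sec["offsets"].append(sec["size"])  # fragment starts where the last ended
        sec["size"] += size
    return sections


# Hypothetical source: "main" is written in two separate parts.
source = [
    ("main", "PR", 24),
    ("consts", "RO", 8),
    ("main", "PR", 16),
    ("vars", "RW", 4),
]
layout = gather_sections(source)
print(layout["main"])  # {'class': 'PR', 'size': 40, 'offsets': [0, 24]}
```

The second part of "main" is placed directly after the first, much as a program block is assembled contiguously even when its source lines are interleaved with other blocks.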


AIX Assembler Control Sections (Cont.)


The AIX assembler supports dummy sections.
Data items in a dummy section do not actually become
part of the object program.
- They serve only to define labels within the section.
- Dummy sections are most commonly used to describe the layout
of a record or table that is defined externally.

AIX also provides common blocks, which are
uninitialized blocks of storage that can be shared
between independently assembled programs.


AIX Assembler Linking of Control Sections


Linking of control sections is similar to SIC.
The .GLOBL directive makes a symbol available to the linker.
The .EXTERN directive declares that a symbol is defined in
another source module.

The AIX assembler provides different linking
methods.
The programmer can create a table of contents (TOC)
for the assembled program.
- The TOC contains addresses of control sections and global
symbols defined within the control sections.
- The program retrieves the needed address from the TOC.
- If all references to external symbols are made in this way, the TOC
entries are the only parts of the program involved in relocation
and linking when the program is loaded.
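
The effect of routing every external reference through the TOC can be sketched in Python. Everything here is a simplified, hypothetical model: the function names, the symbol addresses, and the load address are invented, and a real TOC is a data area in the object program rather than a Python list.

```python
def build_toc(symbols):
    """Return (toc, slot_of): a list of addresses and a symbol -> slot map."""
    slot_of = {name: i for i, name in enumerate(symbols)}
    toc = [symbols[name] for name in symbols]
    return toc, slot_of


def relocate(toc, load_address):
    """Relocation at load time touches only the TOC entries."""
    return [addr + load_address for addr in toc]


def lookup(toc, slot_of, name):
    """What the running program does: fetch the address from its TOC slot."""
    return toc[slot_of[name]]


# Hypothetical program-relative addresses for three symbols.
symbols = {"RDREC": 0x0040, "WRREC": 0x0090, "BUFFER": 0x0100}
toc, slot_of = build_toc(symbols)

loaded_toc = relocate(toc, 0x5000)  # program assumed loaded at 0x5000
print(hex(lookup(loaded_toc, slot_of, "BUFFER")))  # 0x5100
```

Because the instructions refer only to TOC slots, none of them needs to be patched when the program is loaded; only the slot contents change.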


AIX Assembler Two-Pass Structure


First pass
Write a listing file that contains warnings and error
messages.
If errors are found during the first pass, the assembler
terminates and does not continue to the second pass.

Second pass
Read the source program again, instead of using an
intermediate file.
- This means that any warning messages that were generated during
Pass 1 are lost.

The assembly listing will contain only errors and
warnings generated during Pass 2.

AIX Assembler Object Program


Assembled control sections are placed into the object
program according to their storage mapping class.
.TEXT includes executable instructions, read-only data, and various
kinds of debugging tables.
.DATA includes read/write data and TOC entries.
.BSS includes uninitialized data.

When the object program is generated, the assembler first
writes all of the .TEXT control sections,
followed by all of the .DATA control sections.
The TOC is written after the other .DATA control sections.

Relocation and linking operations are specified by entries in
a relocation table (similar to the Modification records for
SIC).
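
The ordering rule above can be sketched in Python. The mapping from storage classes to object sections follows the slide; the class label "TOC", the function name, and the decision to leave .BSS sections out of the written order (since they hold no initialized content) are assumptions made for this illustration.

```python
# Storage mapping class -> object program section. "TOC" is a placeholder
# label for TOC entries, not necessarily the assembler's own class name.
SECTION_OF = {"PR": ".TEXT", "RO": ".TEXT", "RW": ".DATA", "TOC": ".DATA", "BS": ".BSS"}


def write_order(csects):
    """Return control section names in the order they would be written.

    csects is a list of (name, storage_class) in source order. All .TEXT
    sections come first, then .DATA sections with TOC entries last. .BSS
    sections are omitted here (an assumption of this sketch).
    """
    def sort_key(item):
        index, (_, storage_class) = item
        section_rank = 0 if SECTION_OF[storage_class] == ".TEXT" else 1
        toc_rank = 1 if storage_class == "TOC" else 0
        return (section_rank, toc_rank, index)  # index keeps source order

    written = [
        (i, cs) for i, cs in enumerate(csects) if SECTION_OF[cs[1]] != ".BSS"
    ]
    return [name for _, (name, _) in sorted(written, key=sort_key)]


# Hypothetical control sections in the order they appear in the source.
program = [
    ("toc", "TOC"),
    ("vars", "RW"),
    ("main", "PR"),
    ("scratch", "BS"),
    ("consts", "RO"),
]
print(write_order(program))  # ['main', 'consts', 'vars', 'toc']
```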