
CHAPTER 2

ASSEMBLER

Role of Assembler

Source Program -> Assembler -> Object Code

General Design Procedure

The following procedure is generally used to design an assembler:
1. Define the problem statement
2. Define the data structures
3. Define the format of the data structures
4. Define the algorithm
5. Check for modularity
6. Repeat steps 1 to 5 for all modules

Forward Reference Problem

An assembly language program can use symbols, which are names
associated with data or instructions.
A symbol may be referenced before it is defined. This is called a
forward reference.
One way to solve this problem is to make two passes over the source
program. The first pass defines the symbols by assigning an address
to each of them, and the second pass looks up those addresses while
generating the code.
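
The two-pass idea can be sketched on a toy program. This is only an illustration: the statement tuples, the fixed statement length of 4 bytes and the function names are assumptions made for the example, not part of any real assembler.

```python
# Toy source: (label, op-code, operand). Every statement is assumed to
# occupy 4 bytes, which is a simplification for illustration only.
SOURCE = [
    (None,    "B",  "DONE"),   # forward reference: DONE is defined later
    ("LOOP",  "A",  "VALUE"),  # forward reference: VALUE is defined later
    ("DONE",  "BR", None),
    ("VALUE", "DC", None),
]
STATEMENT_LENGTH = 4  # hypothetical fixed length


def pass1(source):
    """Pass 1: assign an address to every label."""
    symbols, lc = {}, 0
    for label, _op, _operand in source:
        if label is not None:
            symbols[label] = lc
        lc += STATEMENT_LENGTH
    return symbols


def pass2(source, symbols):
    """Pass 2: replace each symbolic operand by its address from pass 1."""
    resolved = []
    for _label, op, operand in source:
        address = symbols[operand] if operand is not None else None
        resolved.append((op, address))
    return resolved


symbols = pass1(SOURCE)
program = pass2(SOURCE, symbols)
```

Because pass 2 starts only after pass 1 has seen the whole program, the reference to DONE in the first statement resolves even though DONE is defined two statements later.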

Design of two pass assembler


IBM 360/370 System

Elements of Assembly language

Mnemonic operation codes: symbolic names given to the machine
instructions. They eliminate the need to memorize the numeric
op-codes.
Pseudo-op: an instruction to the assembler, acted on during the
assembly of the program.
Machine-op: an actual machine instruction.
The general format of an assembly language statement is:
[label] <op-code> operand(s)

Symbols: names associated with data or instructions. They can be
used as operands in a program.
Literal: an operand with the syntax =<value>. The assembler creates
a data area for literals that contains the constant values.
E.g. =F'10'

Location Counter: holds the address of the instruction currently
being assembled.

Meaning of some pseudo-op

START: indicates the start of the source program.
END: indicates the end of the source program.
EQU: associates a symbol with an address specification.
    <symbol> EQU <address spec>
USING: tells the assembler which register is used as the base
register and what its contents are.
DROP: makes the base register unavailable.
LTORG: tells the assembler to place all literals collected so far at
this point in the program instead of at the end.
DC: define constant.
DS: define storage.

Addressing scheme

A 1,250(0,15)

Here 250 is the offset, 0 is the index register and 15 is the base
register.
This is an add instruction which adds a number to the contents of
register 1. The location of the number is calculated as:
location = offset + contents of index register
           + contents of base register
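
The address calculation can be sketched as follows. The register contents are invented for the example, and `effective_address` is a hypothetical helper name. The sketch also relies on the System/360 convention that a 0 in the index or base field means that no register takes part.

```python
def effective_address(offset, index_reg, base_reg, registers):
    """location = offset + contents of index register + contents of base register.

    On System/360 a register number of 0 in the index or base field
    means that no register contributes to the address.
    """
    index = registers[index_reg] if index_reg != 0 else 0
    base = registers[base_reg] if base_reg != 0 else 0
    return offset + index + base


registers = [0] * 16
registers[15] = 1000  # hypothetical contents of base register 15

# A 1,250(0,15): offset 250, no index register, base register 15
address = effective_address(250, 0, 15, registers)
```

With these assumed contents the operand of the add instruction is fetched from location 1250.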

5 Instruction Formats

RR (register-register)
RX (register-indexed)
RS (register-storage)
SI (storage-immediate)
SS (storage-storage)

Two passes

Pass 1: Define symbols and literals
    Find the length of machine instructions
    Maintain the location counter
    Remember the values of symbols until pass 2
    Process some pseudo-ops
    Remember literals

Pass 2: Generate the object program
    Look up the values of symbols
    Generate instructions
    Generate data
    Process pseudo-ops

Pass 1 requires the following databases:

Source program
Location counter (LC), which stores the location of each
instruction
Machine Operation Table (MOT), which gives the symbolic mnemonic
of each instruction and its length
Pseudo Operation Table (POT), which gives the symbolic mnemonic
of each pseudo-op and the action to be taken for it in pass 1
Symbol Table (ST), which stores each label along with its value
Literal Table (LT), which stores each literal and its
corresponding address
A copy of the input, which will be used by pass 2
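
A minimal sketch of how pass 1 might use these databases is given below. The table contents are cut down for illustration (RX instructions are 4 bytes long and RR instructions 2 bytes long on System/360), and several simplifications are assumptions of the example: every DC or DS reserves one fullword, literals are fullwords placed at END, and the tuple layout of the source is invented.

```python
MOT = {"L": 4, "A": 4, "ST": 4, "AR": 2, "LR": 2}   # mnemonic -> length in bytes
POT = {"START", "USING", "DC", "DS", "END"}          # pseudo-ops known to pass 1
FULLWORD = 4


def pass1(source):
    lc = 0
    symbol_table, literal_table = {}, {}
    pending_literals = []   # literals seen so far, in order of appearance
    copy = []               # copy of the input for pass 2
    for label, op, operand in source:
        if op == "START":
            lc = int(operand)
        if label:
            symbol_table[label] = lc
        copy.append((lc, label, op, operand))
        if op in MOT:
            if operand and "=" in operand:
                literal = operand[operand.index("="):]
                if literal not in pending_literals:
                    pending_literals.append(literal)
            lc += MOT[op]
        elif op in ("DC", "DS"):
            lc += FULLWORD  # assumption: one fullword per DC/DS
        elif op == "END":
            # Align to a fullword boundary, then assign literal addresses.
            lc = (lc + FULLWORD - 1) // FULLWORD * FULLWORD
            for literal in pending_literals:
                literal_table[literal] = lc
                lc += FULLWORD
        elif op not in POT:
            raise ValueError(f"unknown op-code: {op}")
    return symbol_table, literal_table, copy


SOURCE = [
    ("JOHN", "START", "0"),
    (None,   "USING", "*,15"),
    (None,   "L",     "1,FIVE"),
    (None,   "A",     "1,=F'4'"),
    (None,   "ST",    "1,TEMP"),
    ("FIVE", "DC",    "F'5'"),
    ("TEMP", "DS",    "1F"),
    (None,   "END",   None),
]
symbol_table, literal_table, copy = pass1(SOURCE)
```

Pass 1 does not need to know the value of FIVE or TEMP when it meets them as operands. It only has to advance the location counter by the instruction length, which the MOT supplies.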

Format of databases

Machine-op Table (MOT)

Pseudo-op Table (POT)

Symbol Table (ST)

Literal Table (LT)

Flowchart

Example

Pass 2 requires the following databases:

Copy of the source program
Location counter (LC)
Machine Operation Table (MOT)
Pseudo Operation Table (POT)
Symbol Table (ST) generated by pass 1
Base Table (BT), which indicates which register is used as the
base register and what its contents are
A work space (INST), which holds each instruction as its various
parts are assembled
A work space (PRINTLINE), which is used to produce a printed listing
A work space (PUNCH CARD), which is used to convert the assembled
instructions into the format needed by the loader
An output deck of assembled instructions in the format needed by
the loader
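
One use of the base table in pass 2 is to turn a symbol's address into a base register and displacement pair. The sketch below assumes the base table is a mapping from register number to the contents promised by USING, and it picks the register that gives the smallest displacement. The System/360 displacement field is 12 bits wide, so the displacement must lie between 0 and 4095. The function name and the table layout are hypothetical.

```python
MAX_DISPLACEMENT = 4095  # 12-bit displacement field


def base_displacement(address, base_table):
    """Return (register, displacement) for an address, or raise if unreachable."""
    best = None
    for register, contents in base_table.items():
        displacement = address - contents
        if 0 <= displacement <= MAX_DISPLACEMENT:
            if best is None or displacement < best[1]:
                best = (register, displacement)
    if best is None:
        raise ValueError("address is not addressable from any base register")
    return best


# After "USING *,15" at location 0, register 15 is assumed to contain 0.
base_table = {15: 0}
first = base_displacement(12, base_table)

# With a second base register covering the next 4096 bytes:
base_table[12] = 4096
second = base_displacement(4100, base_table)
```

Here `first` is register 15 with displacement 12, and `second` falls outside the range of register 15, so register 12 is chosen with displacement 4.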

Base Table

Flowchart

Design of one pass assembler


IBM PC
Intel 8088

One pass processing

Analysis Phase
    Isolate the label, mnemonic op-code and operand fields
    If a label is present, enter (symbol, LC contents) in the
    Symbol Table
    Perform LC processing

Synthesis Phase
    Obtain the machine op-code
    Obtain the address from the Symbol Table
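
The first two steps of the analysis phase can be sketched as below. The sketch assumes a simplified statement syntax in which a label is the first field and ends with a colon, and a comment starts with a semicolon. The function name `analyse` and the dictionary used as symbol table are hypothetical.

```python
def analyse(statement, lc, symtab):
    """Isolate label, mnemonic and operands; enter the label in symtab."""
    text = statement.split(";", 1)[0].strip()   # drop any comment
    fields = text.split(None, 1)
    label = None
    if fields and fields[0].endswith(":"):
        label = fields[0][:-1]
        symtab[label] = lc                       # (symbol, LC contents)
        text = fields[1] if len(fields) > 1 else ""
        fields = text.split(None, 1)
    mnemonic = fields[0].upper() if fields else None
    operands = [o.strip() for o in fields[1].split(",")] if len(fields) > 1 else []
    return label, mnemonic, operands


symtab = {}
with_label = analyse("AGAIN: JNE AGAIN ; loop back", 0x10, symtab)
without_label = analyse("mov ax, bx", 0x12, symtab)
```

LC processing and the synthesis phase would follow from the returned mnemonic and operands; they are left out to keep the sketch short.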

Addressing
Segment based addressing scheme is used.

Code segment(CS)
Data segment(DS)
Stack segment(SS)
Extra Segment(ES)

Assembler directives
1. EQU: associates a symbol with an address specification.
   <symbol> EQU <address spec>
2. ORG: sets the location counter to the specified address.
   ORG <address spec>
3. ASSUME: tells the assembler which segment register contains
   the segment base.
   ASSUME <register> : <segment name>
4. SEGMENT: indicates the start of a segment.
5. ENDS: indicates the end of a segment.

Databases required
1. Source program
2. Mnemonic Operation Table (MOT), which gives the symbolic
   mnemonic of each instruction.
3. Symbol Table (ST), which stores each label along with its
   relevant information.
4. Segment Register Table (SRTAB), which stores information
   about segment names and segment registers.
5. Forward Reference Table (FRT), which stores information
   about forward references.
6. Cross Reference Table (CRT), which lists all references to a
   symbol in ascending order of statement number.

Mnemonic Table (MOT)

Mnemonic op-code   Machine op-code   Alignment/format information   Routine id
(6)                (2)               (1)                            (4)
                                                                    binary
JNE                75H               00H                            R2

Symbol Table

Segment Register Table Array (SRTAB)

Forward Reference Table (FRT)

Pointer   SRTAB #   Instruction Address   Usage Code   Source statement #
(2)       (1)       (2)                   (1)          (2)

Cross reference table (CRT)

Pointer to next entry (2)

Source statement # (2)
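
If the numbers in parentheses are read as field widths in bytes, which is an assumption about the slides, the FRT and CRT entry layouts can be written down with Python's `struct` module. The little-endian byte order and the sample field values are likewise chosen only for illustration.

```python
import struct

# "<" selects little-endian byte order with no padding between fields.
# H is an unsigned 2-byte field, B an unsigned 1-byte field.
FRT_ENTRY = struct.Struct("<HBHBH")  # pointer, SRTAB #, instruction address,
                                     # usage code, source statement #
CRT_ENTRY = struct.Struct("<HH")     # pointer to next entry, source statement #

# Hypothetical FRT entry: end of list, SRTAB 1, instruction at 0102H,
# usage code 0, referenced in source statement 7.
packed = FRT_ENTRY.pack(0, 1, 0x0102, 0, 7)
unpacked = FRT_ENTRY.unpack(packed)
```

Under this reading an FRT entry takes 8 bytes and a CRT entry takes 4 bytes. The pointer field lets the entries for one symbol be chained into a linked list.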

The stepwise processing is as follows:

Initialize the parameters: LC=0, size=0, srtab_no=1,
SYMTAB_segment_entry=0; clear ERRTAB and SRTAB_ARRAY.
Read a statement from the source program.
Examine the op-code field to check whether it is a pseudo-op or a
machine-op.
If it is a machine-op, search the MOT for a match for the op-code
and call the appropriate routine.

Every type of statement requires different processing. The
statements are processed in the following way.

If it is an EQU pseudo-op then
    Evaluate the expression in the operand field
    Make an entry for the label in SYMTAB
    Set offset = value of operand
    Enter stmt_no in the CRT list of the label in the operand field
    Process forward references to the label
    size=0

If it is an ASSUME statement then
    Create a new SRTAB and make an entry for the segment register
    and the SYMTAB_segment_entry of the segment name in the
    operand field
    srtab_no=srtab_no+1
    size=0

If it is a SEGMENT statement then
    Make an entry for the label in SYMTAB with segment_name=true
    size=0
    LC=0
    SYMTAB_segment_entry=entry no. in SYMTAB

If it is an ENDS statement then
    SYMTAB_segment_entry=0

If it is a DC statement then
    Align LC according to the specification in the operand field
    Assemble the constant, if any
    size=size of memory required

If it is an imperative statement then
    If the operand is a symbol, make an entry in the CRT
    If the operand symbol is already defined, check its alignment
    and addressability and generate the address specification for
    the symbol using its SYMTAB entry;
    otherwise make an appropriate entry for the symbol in SYMTAB
    Assemble the instruction in machine_code_buffer
    size=size of instruction

If size!=0 then
    If a label is present, make an appropriate entry for the symbol
    in SYMTAB with the current LC
    Move the contents of machine_code_buffer to the address
    code_area_address
    code_area_address=code_area_address+size
    Process forward references to the symbol
    Enter errors in ERRTAB
    List the statement along with its errors contained in ERRTAB
    Clear ERRTAB

If it is an END statement then
    Report undefined symbols from SYMTAB
    Produce the cross reference listing
    Write code_area into the output file
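
The handling of forward references in a single pass can be sketched with a much reduced version of the steps above. The sketch keeps only a code area, a symbol table and a forward reference table, and it handles three one-operand or no-operand instructions. It uses the 8088 encodings 75H for JNE, EBH for a short JMP and 90H for NOP, where the operand byte is a displacement relative to the address of the next instruction. The statement tuples, the function name and the dictionary-based FRT are assumptions of the example; segments, CRT and ERRTAB are left out.

```python
OPCODES = {"JNE": 0x75, "JMP": 0xEB, "NOP": 0x90}  # JMP means the short form here


def assemble(source):
    code = bytearray()   # code area
    symtab = {}          # symbol -> offset
    frt = {}             # symbol -> positions of operand bytes awaiting it
    for label, mnemonic, operand in source:
        lc = len(code)
        if label:
            symtab[label] = lc
            # Process forward references to the newly defined symbol.
            for patch_at in frt.pop(label, []):
                code[patch_at] = (lc - (patch_at + 1)) & 0xFF
        code.append(OPCODES[mnemonic])
        if operand is not None:
            if operand in symtab:
                # Backward reference: the displacement is known at once.
                code.append((symtab[operand] - (len(code) + 1)) & 0xFF)
            else:
                # Forward reference: leave a hole and remember where it is.
                frt.setdefault(operand, []).append(len(code))
                code.append(0)
    undefined = sorted(frt)   # reported at END
    return bytes(code), undefined


SOURCE = [
    (None,   "JNE", "SKIP"),     # forward reference
    ("TOP",  "NOP", None),
    ("SKIP", "JMP", "TOP"),      # backward reference
    (None,   "JMP", "MISSING"),  # never defined
]
code_area, undefined = assemble(SOURCE)
```

When SKIP is defined at offset 3, the hole left by the first statement is patched with the displacement 1. MISSING is never defined, so it remains in the forward reference table and is reported at the end.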