
Assembler

Ravi Bhushan Thakur



Role of Assembler

Source Program -> Assembler -> Object Code -> Linker -> Executable Code -> Loader

Introduction to Assemblers
• Fundamental functions
  • translating mnemonic operation codes to their machine language equivalents
  • assigning machine addresses to symbolic labels

• Machine dependency
  • different machine instruction formats and codes

Assembler Directives
• Pseudo-instructions
  • Not translated into machine instructions
  • Provide information to the assembler to organize the program better
  • May vary from assembler to assembler

• Basic assembler directives
  • START, END, BYTE, WORD
  • Global: exports symbols
  • Extern: imports symbols from other modules
  • Common
  • Section: changes the section of the output file
  • BITS: sets the target processor mode

Assembler’s functions
• Convert mnemonic operation codes to their machine language equivalents
• Convert symbolic operands to their equivalent machine addresses
• Build the machine instructions in the proper format
• Convert the data constants to internal machine representations
• Write the object program and the assembly listing

Data Structures

• Operation Code Table (OPTAB), also called the Machine Opcode Table (MOT)
• Symbol Table (SYMTAB)
• Location Counter (LOCCTR)
• Pseudo Opcode Table (POT)

OPTAB (Operation Code Table)
• Used to look up mnemonic operation codes and translate them into their machine language equivalents
• Contains each mnemonic operation code and its machine language equivalent
• In more complex assemblers, also contains information such as instruction format and length
• Content
  • mnemonic, machine code, (instruction format, length), etc.
• Characteristic
  • static table
• Implementation
  • array or hash table, for easy searching

OPTAB (Operation Code Table)
MOT Structure

Mnemonic   Size   Opcode

* The basic structure of the MOT may vary with the instruction set and with the designer of the assembler.
* Opcode: the machine code corresponding to the mnemonic.

Organization of MOT:
a. Hash function
b. Binary search tree
c. Linked list implementation
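
As a rough illustration of the hash-based organization, here is a minimal Python sketch of an OPTAB lookup. The entries follow the MOT of the worked example later in this deck; the dict layout and the `lookup_opcode` helper are assumptions made for illustration, not a prescribed design.

```python
# Illustrative OPTAB as a Python dict (a hash table): mnemonic -> (opcode, length).
# The entries follow the MOT of the worked example; the layout itself is an assumption.
OPTAB = {
    "MOVER": (1, 1), "MOVEM": (2, 1), "ADD": (3, 1), "SUB": (4, 1),
    "MULT": (5, 1), "DIV": (6, 1), "BC": (7, 1), "COMP": (8, 1),
    "PRINT": (9, 1), "READ": (10, 1),
}


def lookup_opcode(mnemonic):
    """Return (opcode, length) for a machine mnemonic, or None if it is not in OPTAB."""
    return OPTAB.get(mnemonic.upper())


print(lookup_opcode("ADD"))    # (3, 1)
print(lookup_opcode("START"))  # None: START is a directive, not a machine opcode
```

Because the table is static, it can be built once when the assembler starts and never modified afterwards.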

SYMTAB (Symbol Table)
• Used to store values (addresses) assigned to labels
  • Includes the name and value for each label
  • Flags to indicate error conditions, e.g. duplicate definition of labels
  • May contain other information, such as the type or length of the data area or instruction labeled

• Content
  • label name, value, flag, (type, length), variables, constants, procedures, etc.
• Characteristic
  • dynamic table (insert, delete, search)
• Implementation
  • hash table; the keys are non-random, so the hashing function must handle such keys well

SYMTAB (Symbol Table)
• SYMTAB is an essential data structure used by the assembler to remember information about the identifiers appearing in the program being assembled.
• The symbols used may vary from one assembler to another for the same assembly language.
• Fundamental operations:
  a. Insertion of a new symbol
  b. Lookup or search
  c. Modification of the information stored earlier in the table for a symbol

Name   Type        Location
X      variable    Offset of X
L1     label       Offset of L1
pqr    procedure   Offset of pqr
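
The three fundamental operations can be sketched in Python as follows. This is only a minimal illustration: the `Symbol` and `SymbolTable` names, the field names, and the choice to flag (rather than overwrite) a duplicate definition are assumptions, and a dict stands in for the hash table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symbol:
    name: str
    type: str                        # e.g. "variable", "label", "procedure"
    location: Optional[int] = None   # offset; None until it is known
    duplicate: bool = False          # error flag: symbol defined more than once


class SymbolTable:
    """Hypothetical SYMTAB backed by a dict (hash table)."""

    def __init__(self):
        self._entries = {}

    def insert(self, name, type_, location=None):
        existing = self._entries.get(name)
        if existing is not None:
            existing.duplicate = True   # keep the first definition, flag the error
            return existing
        symbol = Symbol(name, type_, location)
        self._entries[name] = symbol
        return symbol

    def lookup(self, name):
        return self._entries.get(name)

    def modify(self, name, **fields):
        symbol = self._entries[name]
        for field_name, value in fields.items():
            setattr(symbol, field_name, value)
        return symbol


table = SymbolTable()
table.insert("X", "variable", 4)
table.insert("L1", "label", 10)
table.modify("X", location=8)
print(table.lookup("X"))
```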

POT (Pseudo Opcode Table)
• Used to store the pseudo opcodes supported by the assembler.
• These are used to reserve memory space and possibly initialize it.
• e.g.
  • DB: Define Byte
  • DW: Define Word
  • DD: Define Double Word
  • DQ: Define Quad Word (can hold a double-precision float)
  • DT: Define Ten Bytes (can hold an extended-precision float)
  • RESB: Reserve Byte
  • RESW: Reserve Word
  • RESD: Reserve Double Word

POT (Pseudo Opcode Table)
• Structure

Pseudo-opcode   Type   Size   Initializable

• The POT has far fewer entries than the MOT, so its organization does not matter much.

LOCCTR (Location Counter)
• LOCCTR
  • Used to help in the assignment of addresses
  • Initialized to the beginning address specified in the START statement
  • After each source statement is processed, the length of the assembled instruction or data area to be generated is added to it
  • When a label is reached, the current value of LOCCTR gives the address of that label
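
The bookkeeping described above can be sketched in a few lines of Python. The `assign_addresses` helper and its input format (a list of label and length pairs) are assumptions made purely for illustration.

```python
def assign_addresses(start, statements):
    """Walk the statements, recording LOCCTR for each label.

    statements: list of (label_or_None, length) pairs, where length is the
    size of the instruction or data area the statement generates.
    Returns (label -> address, final LOCCTR).
    """
    locctr = start                      # initialized from the START statement
    addresses = {}
    for label, length in statements:
        if label is not None:
            addresses[label] = locctr   # LOCCTR gives the label's address
        locctr += length                # advance past the generated code or data
    return addresses, locctr


labels, end = assign_addresses(100, [(None, 1), ("LOOP", 1), (None, 1), ("A", 3)])
print(labels, end)   # {'LOOP': 101, 'A': 103} 106
```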

A Simple Machine
Assume a simple hypothetical accumulator-based processor. Considerations:
• All memory load and store operations go through the accumulator register A.
• Arithmetic and logical operations mostly use A as the source and destination register.
• There are two more 32-bit GPRs, B and C, and one index register I (arithmetic is also permitted on I).
• Addressing modes are immediate and indirect through I.
• All memory addresses are 32 bits wide.

Manual Assembler
• Machine opcode table for a simple processor (table not reproduced)
• Program to add 10 numbers (listing not reproduced)
• Symbol table (not reproduced)
• Code generated (not reproduced)

Two Pass Assembler
• The two-pass assembly process scans the input assembly language program twice.
• These scans are known as Pass-1 and Pass-2.

Two Pass Assembler
• Read from each input line
  • LABEL, OPCODE, OPERAND, LITERALS

Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes
Tables consulted: OPTAB and SYMTAB (Pass 1); SYMTAB (Pass 2)

Pass-1 of Two Pass Assembler

Symbol table: it holds information about the symbols defined in the program.

Name   Type   Location   Size   Section_Id   Is_Global

Location stores the offset of the symbol from the start of its section.
Is_Global is a boolean value, i.e. either true or false.
Apart from these two tables there is one more, the Common Table, which contains the name and size of each common variable.

Flow chart of Pass-I

Pass-1 of Two Pass Assembler

The main task of this pass is to scan the input file and compute the offsets of all symbols appearing in the program. During this scan it fills in several tables, including:
Section table: it holds detailed information about all sections present in the program.

Name   Size   Attributes   Pointer to content

At the end of Pass-2 the pointer will point to the content of the section after translation.

Pass-1 of Two Pass Assembler
• At the beginning, Pass-1 initializes the global list, in which all global declarations will be stored.
• The location counter is initialized to 0. The pass then scans one line at a time, updating the tables as needed, until the end-of-file marker is reached.
• Each line is parsed into components such as label, mnemonic, and operands. A mnemonic can be a machine instruction, a pseudo opcode, or an assembler directive.
• Labels, pseudo opcodes, and variables are placed in the symbol table with their locations as given by the location counter (locctr, or lc).
• For a machine instruction, lc is advanced by the size of the instruction, which depends on the system. For a pseudo opcode, lc is advanced by the amount of memory required.
• For assembler directives:
  • For a section directive indicating the beginning of a new section, the size of the last section in the section table is set to the current lc value. A new entry is then created in the section table and lc is set to 0.
  • For an extern declaration, the symbol is put into the symbol table with type external.
  • For a global declaration, the symbols are put in the global list. Similarly, common symbols are put into the common table; they also have entries in the symbol table with type common.
• When the end of file is reached, the section table is updated to mark the size of the last section as lc. The symbol table is also updated to set the Is_Global field to true for all entries corresponding to symbols in the global list.
• Control is passed to Pass-2.
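
The section, extern, and global handling described above can be sketched as follows. This is a simplified Python illustration, not a full Pass-1: lines are assumed to be pre-parsed into tuples, statement sizes are given directly, the common table is omitted, and the tuple format and field names are hypothetical.

```python
def pass1(lines):
    """Sketch of Pass-1 section/symbol bookkeeping.

    lines: pre-parsed tuples (an assumed format):
      ("section", name) | ("global", name) | ("extern", name)
      ("stmt", label_or_None, size)   # instruction or pseudo opcode
    Assumes a section directive appears before the first statement.
    """
    sections = []        # section table: [{"name", "size"}]
    symbols = {}         # symbol table: name -> entry
    global_list = []
    lc = 0
    for kind, *args in lines:
        if kind == "section":
            if sections:
                sections[-1]["size"] = lc      # close the previous section
            sections.append({"name": args[0], "size": 0})
            lc = 0                             # offsets restart in each section
        elif kind == "global":
            global_list.append(args[0])
        elif kind == "extern":
            symbols[args[0]] = {"type": "external", "location": None,
                                "section": None, "is_global": False}
        elif kind == "stmt":
            label, size = args
            if label is not None:
                symbols[label] = {"type": "label", "location": lc,
                                  "section": len(sections) - 1, "is_global": False}
            lc += size
    if sections:
        sections[-1]["size"] = lc              # end of file closes the last section
    for name in global_list:
        if name in symbols:
            symbols[name]["is_global"] = True
    return sections, symbols


sections, symbols = pass1([
    ("global", "main"), ("extern", "printf"),
    ("section", ".data"), ("stmt", "msg", 6),
    ("section", ".text"), ("stmt", "main", 5), ("stmt", None, 2),
])
print(sections)
print(symbols["main"])
```

Marking globals only at the end of the file lets a global declaration appear before the symbol it names is defined.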

A Simple Program

Pass-1 of Two Pass Assembler
(Section & Symbol Tables)

Pass-2 of Two Pass Assembler
• This phase is responsible for generating the code.
• It uses the tables created in Pass-1 and writes the generated code to the object file.
• First, the object file offset and lc are initialized to 0. The source program lines are then read one by one and the corresponding object code is generated.
• Each source line is parsed into its components, separating mnemonics from symbols.
• For machine instructions, the addresses of the operands are found via the symbol table.
• For pseudo opcodes such as DB and DW, the corresponding initialization values are also written to the object file; for RESB, RESW, etc., only the object file offset is updated, reserving the space in the object file.

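
The difference between initializing and reserving pseudo opcodes can be sketched as below. This is an assumption-laden illustration in Python: DB is taken as one byte and DW as a two-byte little-endian word (the x86 convention), a reserved gap is zero-filled only when later data has to be written after it, and the `emit_data` helper is hypothetical.

```python
import struct


def emit_data(directives):
    """Return (object bytes, final offset) for a list of (pseudo_opcode, operand)."""
    out = bytearray()
    offset = 0
    for op, operand in directives:
        if op == "resb":
            offset += operand           # reserve bytes: only the offset moves
            continue
        if op == "resw":
            offset += 2 * operand       # reserve words (assumed 2 bytes each)
            continue
        if op == "db":
            chunk = struct.pack("<B", operand)
        elif op == "dw":
            chunk = struct.pack("<H", operand)
        else:
            raise ValueError(f"unknown pseudo opcode: {op}")
        out.extend(b"\x00" * (offset - len(out)))   # fill any reserved gap
        out.extend(chunk)                           # write the initial value
        offset += len(chunk)
    return bytes(out), offset


data, offset = emit_data([("db", 1), ("resb", 2), ("dw", 0x1234)])
print(data.hex(), offset)   # 0100003412 5
```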
Pass-2 of Two Pass Assembler
Flowchart

Machine code after Pass-2

Explanation of Two Pass Assembler
with an example

• In this example, there are three types of statements:
  1. Imperative statements: actions to be performed, e.g. machine opcodes
  2. Declarative statements: such as pseudo opcodes, e.g. DS, DC
  3. Assembler directives: START, ORIGIN, etc.

• Let us assume that this program uses the following assembler directives:
  • START: sets the address at which the first word of the target program is placed, e.g. START 100 sets lc to 100 in Pass-1.
  • END: marks the end of the source program.
  • ORIGIN: sets lc to a given address value, e.g. ORIGIN 200, ORIGIN label+20.
  • EQU: assigns the address of one symbol to another label, e.g. LABEL EQU LOOP.
  • LTORG: specifies where literals are to be placed. At each LTORG statement, and at END, the assembler allocates memory to the literals in the current literal pool and then clears the pool.
• There is no intermediate code for ORIGIN and EQU.

MOT Table
Index  Mnemonic  Type  OP-Code  Length
1      MOVER     IS    01       1
2      MOVEM     IS    02       1
3      ADD       IS    03       1
4      SUB       IS    04       1
5      MULT      IS    05       1
6      DIV       IS    06       1
7      BC        IS    07       1
8      COMP      IS    08       1
9      PRINT     IS    09       1
10     READ      IS    10       1

ASSEMBLER DIRECTIVE Table

Index  Mnemonic  Type  OP-Code  Length
1      START     AD    01       -
2      END       AD    02       -
3      EQU       AD    03       -
4      ORIGIN    AD    04       -
5      LTORG     AD    05       -

POT Table

Index  Mnemonic  Type  OP-Code  Length
01     DS        DL    06       -
02     DC        DL    07       -

Register Table
Apart from the above three tables, there is one more table, which lists the available registers of the system. With these tables as prerequisites, we can start Pass-1 of the two-pass assembler.

Register No  Name
01           AREG
02           BREG
03           CREG
04           DREG

Source Program
        START   100
        MOVER   AREG, A
LOOP:   PRINT   B
        ADD     BREG, ='9'
        SUB     BREG, D
        COMP    CREG, ='23'
        LTORG

A       DS      3
LABEL   EQU     LOOP
        ORIGIN  500
L1:     MULT    CREG, ='7'
        SUB     BREG, ='93'
        LTORG

B       DC      10
        MOVEM   CREG, ='7'
        PRINT   ='7'
D       DC      8
        END

Generating intermediate code
SOURCE PROGRAM       LC   INTERMEDIATE CODE
START 100                 (AD, 01) (C, 100)
MOVER AREG, A        100  (IS, 01) 01 (S, 01)
LOOP: PRINT B        101  (IS, 09) -- (S, 03)
ADD BREG, ='9'       102  (IS, 03) 02 (L, 01)
SUB BREG, D          103  (IS, 04) 02 (S, 04)
COMP CREG, ='23'     104  (IS, 08) 03 (L, 02)
LTORG                105  (AD, 05) -- 009
                     106  (AD, 05) -- 023
A DS 3               107  (DL, 01) -- 03
LABEL EQU LOOP            NO INTERMEDIATE CODE
ORIGIN 500                NO INTERMEDIATE CODE
L1: MULT CREG, ='7'  500  (IS, 05) 03 (L, 03)
SUB BREG, ='93'      501  (IS, 04) 02 (L, 04)
LTORG                502  (AD, 05) -- 007
                     503  (AD, 05) -- 093
B DC 10              504  (DL, 02) -- 010
MOVEM CREG, ='7'     505  (IS, 02) 03 (L, 05)
PRINT ='7'           506  (IS, 09) -- (L, 05)
D DC 8               507  (DL, 02) -- 008
END                  508  (AD, 02) -- 007

SYMBOL TABLE
SYM_NO  SYMBOL  ADDRESS
1       A       107
2       LOOP    101
3       B       504
4       D       507
5       LABEL   101
6       L1      500

LITERAL TABLE
LIT_NO  LITERAL  ADDRESS
1       ='9'     105
2       ='23'    106
3       ='7'     502
4       ='93'    503
5       ='7'     508
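
The address bookkeeping behind the table above can be reproduced with a short Python sketch. It is a simplification under stated assumptions: the program is supplied as pre-parsed (label, opcode, operands) triples, ORIGIN takes only a numeric operand, EQU refers only to already defined symbols, every instruction and constant occupies one word, and the `pass1` function and its data layout are hypothetical. It builds only the symbol and literal tables, not the intermediate code.

```python
IMPERATIVE = {"MOVER", "MOVEM", "ADD", "SUB", "MULT", "DIV", "BC", "COMP", "PRINT", "READ"}

# (label, opcode, operands) triples for the program traced in the table above.
PROGRAM = [
    (None, "START", ["100"]),
    (None, "MOVER", ["AREG", "A"]),
    ("LOOP", "PRINT", ["B"]),
    (None, "ADD", ["BREG", "='9'"]),
    (None, "SUB", ["BREG", "D"]),
    (None, "COMP", ["CREG", "='23'"]),
    (None, "LTORG", []),
    ("A", "DS", ["3"]),
    ("LABEL", "EQU", ["LOOP"]),
    (None, "ORIGIN", ["500"]),
    ("L1", "MULT", ["CREG", "='7'"]),
    (None, "SUB", ["BREG", "='93'"]),
    (None, "LTORG", []),
    ("B", "DC", ["10"]),
    (None, "MOVEM", ["CREG", "='7'"]),
    (None, "PRINT", ["='7'"]),
    ("D", "DC", ["8"]),
    (None, "END", []),
]


def pass1(program):
    symtab = {}       # symbol -> address
    littab = []       # [literal, address] in allocation order
    pool_start = 0    # index of the first literal in the current pool
    lc = 0
    for label, op, operands in program:
        if op in ("START", "ORIGIN"):
            lc = int(operands[0])               # numeric operand assumed
        elif op == "EQU":
            symtab[label] = symtab[operands[0]]
        elif op in ("LTORG", "END"):
            for entry in littab[pool_start:]:   # place the pending literals
                entry[1] = lc
                lc += 1
            pool_start = len(littab)            # start a fresh pool
        else:
            if label:
                symtab[label] = lc
            if op == "DS":
                lc += int(operands[0])          # reserve that many words
            elif op == "DC" or op in IMPERATIVE:
                pool = [lit for lit, _ in littab[pool_start:]]
                for operand in operands:
                    if op != "DC" and operand.startswith("=") and operand not in pool:
                        littab.append([operand, None])
                        pool.append(operand)
                lc += 1
            else:
                raise ValueError(f"unknown opcode: {op}")
    return symtab, [(lit, addr) for lit, addr in littab]


symtab, littab = pass1(PROGRAM)
print(symtab)
print(littab)
```

A literal that reappears after an LTORG, such as ='7' here, gets a new entry because the earlier pool has already been allocated and closed.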

Pass -2 of Two Pass Assembler
• With Pass-1 of the two-pass assembler complete, we are ready to move on to the second pass of the assembly process.

• The main data structures available after Pass-1 are the symbol table, the literal table, and the intermediate code of the source program.

• These three are the prerequisites for the second pass.

• With the help of these data structures, the final target code (object code) is generated during Pass-2.

Object Code Generation
INTERMEDIATE CODE             OBJECT CODE
     (AD, 01) (C, 100)        01-100
100  (IS, 01) 01 (S, 01)      100  01  01  107
101  (IS, 09) -- (S, 03)      101  09  --  504
102  (IS, 03) 02 (L, 01)      102  03  02  105
103  (IS, 04) 02 (S, 04)      103  04  02  507
104  (IS, 08) 03 (L, 02)      104  08  03  106
105  (AD, 05) -- 009          105  --  --  009
106  (AD, 05) -- 023          106  --  --  023
107  (DL, 01) -- 03           107  --  --  --
     NO INTERMEDIATE CODE
500  (IS, 05) 03 (L, 03)      500  05  03  502
501  (IS, 04) 02 (L, 04)      501  04  02  503
502  (AD, 05) -- 007          502  --  --  007
503  (AD, 05) -- 093          503  --  --  093
504  (DL, 02) -- 010          504  --  --  010
505  (IS, 02) 03 (L, 05)      505  02  03  508
506  (IS, 09) -- (L, 05)      506  09  --  508
507  (DL, 02) -- 008          507  --  --  008
508  (AD, 02) -- 007          508  --  --  007

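
For the imperative statements, the Pass-2 step shown above amounts to replacing each (S, n) or (L, n) operand with the address stored in the symbol or literal table. A minimal Python sketch, using the table contents from this example; the `to_object` helper and its argument format are assumptions for illustration only.

```python
# Tables produced by Pass-1 for this example: entry number -> address.
SYMTAB = {1: 107, 2: 101, 3: 504, 4: 507, 5: 101, 6: 500}
LITTAB = {1: 105, 2: 106, 3: 502, 4: 503, 5: 508}


def to_object(lc, opcode, reg, operand):
    """Translate one imperative intermediate-code row into an object-code line.

    operand is ("S", n) for symbol number n or ("L", n) for literal number n;
    reg is a register number, or None when the instruction uses no register.
    """
    kind, index = operand
    if kind == "S":
        address = SYMTAB[index]
    elif kind == "L":
        address = LITTAB[index]
    else:
        raise ValueError(f"unknown operand kind: {kind}")
    reg_text = "--" if reg is None else f"{reg:02d}"
    return f"{lc} {opcode:02d} {reg_text} {address}"


print(to_object(100, 1, 1, ("S", 1)))      # 100 01 01 107
print(to_object(506, 9, None, ("L", 5)))   # 506 09 -- 508
```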
One-Pass Assemblers
• Unlike a two-pass assembler, a one-pass assembler scans the input source file only once. Hence no intermediate code is generated.
• To handle the forward reference problem, a one-pass assembler uses a technique known as backpatching.
• As in the two-pass assembler, the symbol table and the literal table must be filled in. In addition, one more table is needed: the Table of Incomplete Instructions (TII).
• To understand the working of a one-pass assembler, let us take a source program similar to the one used for the two-pass assembler.
• As prerequisites we have the MOT, the POT and the register table.

One-Pass Assemblers
 Problem Associated
 Forward reference:
 Consider an assembly code extract:
 ….
 X

Source Program
        START   100
        MOVER   AREG, A
        PRINT   B
        ADD     BREG, ='9'
        SUB     BREG, D
        COMP    CREG, ='23'
        LTORG

A       DS      3
LABEL   EQU     A
        ORIGIN  500
L1:     MULT    CREG, ='7'
B       DC      10
        MOVEM   CREG, ='7'
D       DC      8
        END

One-Pass Assemblers
SOURCE PROGRAM       LC   TARGET PROGRAM
START 100                 01-100
MOVER AREG, A        100  01  01  107
PRINT B              101  09  --  501
ADD BREG, ='9'       102  03  02  105
SUB BREG, D          103  04  02  503
COMP CREG, ='23'     104  08  03  106
LTORG                105  --  --  9
                     106  --  --  23
A DS 3               107  --  --
LABEL EQU A               NO CODE
ORIGIN 500                NO CODE
L1: MULT CREG, ='7'  500  05  03  504
B DC 10              501  --  --  10
MOVEM CREG, ='7'     502  02  03  504
D DC 8               503  --  --  8
END                  504  --  --  7

SYMBOL TABLE
SYM_NO  SYMBOL  ADDRESS
1       A       107
2       B       501
3       D       503
4       LABEL   107
5       L1      500

TII TABLE
LC_NO  INCOMPLETE INS.
100    A
101    B
102    ='9'
103    D
104    ='23'
500    ='7'
502    ='7'

LITERAL TABLE
LIT_NO  LITERAL  ADDRESS
1       ='9'     105
2       ='23'    106
3       ='7'     504

One-Pass Assemblers
• Main problem
  • forward references
    • data items
    • labels on instructions

• Solution
  • data items: require all such areas to be defined before they are referenced
  • labels on instructions: no good solution

• Two types of one-pass assembler
  • load-and-go
    • produces object code directly in memory for immediate execution
  • the other
    • produces the usual kind of object code for later execution

Forward Reference in One-pass Assembler
• For any symbol that has not yet been defined:
  1. Omit the address translation.
  2. Insert the symbol into SYMTAB and mark it as undefined.
  3. Add the address of the operand field that refers to the undefined symbol to a list of forward references associated with the symbol table entry.
  4. When the definition of the symbol is encountered, insert the proper address for the symbol into every previously generated instruction on the forward reference list.
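
The four steps above can be sketched in Python as follows. This is a minimal illustration of backpatching with a per-symbol forward reference list; the `OnePassAssembler` class, its method names, and the in-memory instruction format are assumptions, and the addresses come from the one-pass example earlier in this deck.

```python
class OnePassAssembler:
    """Hypothetical sketch: forward references are backpatched on definition."""

    def __init__(self):
        self.memory = {}   # address -> [opcode, register, operand address or None]
        self.symtab = {}   # symbol -> {"address": int or None, "refs": [addresses]}

    def _entry(self, symbol):
        # Step 2: an unseen symbol enters SYMTAB marked undefined (address None).
        return self.symtab.setdefault(symbol, {"address": None, "refs": []})

    def reference(self, at, opcode, reg, symbol):
        """Assemble the instruction at address `at` whose operand is `symbol`."""
        entry = self._entry(symbol)
        if entry["address"] is None:
            entry["refs"].append(at)    # step 3: remember where to patch later
        # Step 1: the operand address stays None while the symbol is undefined.
        self.memory[at] = [opcode, reg, entry["address"]]

    def define(self, symbol, address):
        """Record the definition of `symbol` and patch the pending references."""
        entry = self._entry(symbol)
        entry["address"] = address
        for at in entry["refs"]:        # step 4: backpatch earlier instructions
            self.memory[at][2] = address
        entry["refs"].clear()


asm = OnePassAssembler()
asm.reference(100, 1, 1, "A")       # MOVER AREG, A  (A not yet defined)
asm.reference(101, 9, None, "B")    # PRINT B        (B not yet defined)
asm.define("A", 107)                # A DS 3
asm.define("B", 501)                # B DC 10
print(asm.memory[100], asm.memory[101])   # [1, 1, 107] [9, None, 501]
```

Any symbol whose reference list is still non-empty at the end of the program was used but never defined, which the assembler can report as an error.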

