
SAFFRONY INSTITUTE OF TECHNOLOGY

2150708 System Programming (SP)


Assembler
Q. 1

Describe elements of Assembly language.


An assembly language provides three basic features:
1. Mnemonic operation codes
2. Symbolic operands
3. Data declaration
Let us consider an assembly instruction
MOVER AREG,X

MOVER is a mnemonic opcode for the operation to be performed.

AREG is a register operand in a symbolic form.

X is a memory operand in a symbolic form.

Let us consider another instruction, this time for data declaration:
X DS 1

DS (Declare Storage) reserves an area of memory.

The name of the variable is X.

It reserves a memory area of 1 word and associates the name X with it.

Q. 2

Explain types of Assembly statement


1) Imperative statement
An imperative statement indicates an action to be performed during the execution of the
assembled program.
Each imperative statement typically translates into one machine instruction.
These are executable statements.
Some examples of imperative statements are given below:
MOVER BREG,X
STOP
READ X
PRINT Y
ADD AREG,Z
2) Declaration statement
Declaration statements are for reserving memory for variables.
The syntax of the declaration statements is as follows:
[Label] DS <constant>
[Label] DC <value>
DS stands for Declare Storage; DC stands for Declare Constant.
The DS statement reserves an area of memory and associates a name with it.
A DS 10
The above statement reserves 10 words of memory for variable A.
The DC statement constructs memory words containing constants.
ONE DC 1
The above statement associates the name ONE with a memory word containing the value 1.

Prepared By: TEJAS PATEL

An assembly program can use constants in two ways: as immediate operands and as
literals.
Many machines support immediate operands in machine instructions. Ex: ADD AREG, 5
The hypothetical machine used here does not support immediate operands as a part of the
machine instruction. It can still handle literals.
A literal is an operand with the syntax =<value>. Ex: ADD AREG,=5
It differs from a constant because its location cannot be specified in the assembly program.
3) Assembler Directive
Assembler directives instruct the assembler to perform certain actions during the assembly
of a program.
I.

START
This directive indicates that the first word of the target program should be placed in the
memory word with address <constant>.
START <constant>
Ex: START 500
The target program is stored from memory location 500 onwards.

II.

END
This directive indicates the end of the source program.
The operand indicates the address of the instruction where the execution of the program
should begin.
By default it is the first instruction of the program.
END <operand 2>
Execution control should transfer to the label given in the operand field.

III.

ORIGIN
This directive is like the START directive: it indicates the address of the next
consecutive instruction or data.
The format of this statement is as follows:
ORIGIN <operand2>
The operand may be a constant, a symbol or a symbolic expression.
The ORIGIN directive is useful when the machine code is not stored in consecutive
memory locations.

Assembly program              LC

        START   100
LOOP    MOVER   BREG,=2       100
        MOVER   AREG,N        101
        ADD     AREG,=1       102
        ORIGIN  LOOP
NEXT    BC      ANY,LOOP      100

IV.

EQU
This directive simply associates the name <symbol> with <operand>, where
<operand> may be a constant or a symbol.

<symbol> EQU <operand2>
Ex: A EQU B
The address of B is assigned to A in the symbol table.

V.

LTORG
This directive allocates memory to all literals of the current pool and updates the literal
table and the pool table.
The format of this instruction is as follows:
LTORG
If no LTORG statement is present, literals are placed after the END statement.

Q. 3

Explain assembly scheme.

OR

Explain analysis and synthesis phases of an assembler by clearly stating their tasks. OR
Design specification of an assembler.
Analysis Phase
The primary function performed by the analysis phase is the building of the symbol table.
For this purpose it must determine the address of each symbolic name.
Some addresses can be determined directly; others must be inferred. This function is
called memory allocation.
To implement memory allocation, a data structure called the location counter (LC) is used. It is
initialized to the constant specified in the START statement.
We refer to the processing involved in maintaining the location counter as LC processing.
Tasks of Analysis phase
1. Isolate the label, mnemonic opcode, and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC content>) in a new entry of the
symbol table.
3. Check the validity of the mnemonic opcode.
4. Perform LC processing.
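
The four analysis tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mnemonic set, the helper names split_fields and analyse, and the rule that every statement except DS occupies one word are all hypothetical choices made for the example, and every statement is assumed to be non-empty.

```python
# Hypothetical mnemonic set for the example; a real assembler would use OPTAB.
MNEMONICS = {"START", "END", "READ", "PRINT", "MOVER", "MOVEM", "ADD",
             "SUB", "MULT", "COMP", "BC", "STOP", "DS", "DC"}


def split_fields(line):
    """Task 1: isolate the label, mnemonic opcode and operand fields."""
    tokens = line.split()
    label = None
    # A leading token that is not a known mnemonic is treated as a label.
    if len(tokens) > 1 and tokens[0] not in MNEMONICS:
        label, tokens = tokens[0], tokens[1:]
    opcode = tokens[0]
    operands = " ".join(tokens[1:])
    return label, opcode, operands


def analyse(source):
    """Build the symbol table by LC processing (assumes 1-word instructions)."""
    symtab = {}
    loc_cntr = 0
    for line in source:
        label, opcode, operands = split_fields(line)
        if opcode not in MNEMONICS:          # Task 3: validity check
            raise ValueError(f"invalid opcode: {opcode}")
        if opcode == "START":                # LC is initialised from START
            loc_cntr = int(operands)
            continue
        if opcode == "END":
            break
        if label is not None:                # Task 2: enter (symbol, LC)
            symtab[label] = loc_cntr
        # Task 4: LC processing; DS reserves as many words as its operand says.
        loc_cntr += int(operands) if opcode == "DS" else 1
    return symtab


program = [
    "START 101",
    "READ N",
    "AGAIN MULT BREG, TERM",
    "STOP",
    "N DS 1",
    "TERM DS 2",
    "ONE DC 1",
    "END",
]
print(analyse(program))
```

Because the label test relies on the mnemonic set, a misspelt opcode at the start of a statement is first taken for a label and then rejected when the next token fails the validity check.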

[Figure: Source program -> Analysis phase -> Synthesis phase -> Target program. The synthesis phase uses the mnemonics table (columns: mnemonic, opcode, length; entries ADD 01, SUB 02) and the symbol table (columns: symbol, address; entries AGAIN 104, N 113), which is built by the analysis phase.]


Synthesis Phase
Consider the assembly statement
MOVER BREG, ONE
We must have the following information to synthesize the machine instruction corresponding
to this statement:
1. Address of the name ONE.
2. Machine operation code corresponding to the mnemonic MOVER.
The first item of information depends on the source program; hence it must be made available
by the analysis phase.
The second item of information does not depend on the source program; it depends on the
assembly language.
Based on the above discussion, we consider the use of two data structures during the synthesis
phase:
1. Symbol table:
Each entry in the symbol table has two primary fields: name and address. This table is
built by the analysis phase.
2. Mnemonics table:
An entry in the mnemonics table has two primary fields: mnemonic and opcode.
Tasks of Synthesis phase
1. Obtain the machine opcode through look up in the mnemonics table.
2. Obtain the address of the memory operand from the symbol table.
3. Synthesize a machine instruction.
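
As a rough illustration of these three tasks, the Python sketch below synthesizes one instruction in the opcode (2 digit), register (1 digit), memory operand (3 digit) layout used by the machine code listings in these notes. The table contents follow the opcodes and register codes that appear in the notes; the function name synthesize and the dictionary names are assumptions made for the example.

```python
# Mnemonics table: mnemonic -> machine opcode (values as used in these notes).
MNEMONIC_TABLE = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5}
# Register codes as used in the intermediate code discussion.
REGISTER_CODES = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def synthesize(mnemonic, register, symbol, symtab):
    """Return the machine instruction as 'opcode register address' text."""
    opcode = MNEMONIC_TABLE[mnemonic]      # task 1: look up the mnemonics table
    address = symtab[symbol]               # task 2: look up the symbol table
    # Task 3: put the fields together in the 2-1-3 digit layout.
    return f"{opcode:02d} {REGISTER_CODES[register]} {address:03d}"


# Symbol table as it would be left by the analysis phase (address assumed).
print(synthesize("MOVER", "BREG", "ONE", {"ONE": 115}))
```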

Q. 4

Explain single pass and two pass assembler.

OR

Write difference between one pass and two pass assembler.

OR

Pass structure of assembler.


Two pass translation
Two pass translation consists of pass I and pass II.
LC processing is performed in the first pass and symbols defined in the program are
entered into the symbol table; hence the first pass performs analysis of the source program.
So, two pass translation of an assembly language program can handle forward references easily.
The second pass synthesizes the target form using the address information found in the
symbol table.
The first pass constructs an intermediate representation (IR) of the source program, which
is used by the second pass.
The IR consists of two main components: data structures and IC (intermediate code).
Single pass translation
A one pass assembler requires one scan of the source program to generate machine code.
The problem of forward references is tackled using a process called back patching.
The operand field of an instruction containing a forward reference is left blank initially.
A table of instructions containing forward references, called the table of
incomplete instructions (TII), is maintained separately.
This table is used to fill in the addresses in the incomplete instructions.
The address of each forward referenced symbol is put in the blank field with the help of
the back patching list.
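
Back patching can be sketched as follows. The sketch is a simplification under stated assumptions: each statement is given as a hypothetical (label, opcode, operand symbol) tuple, every statement occupies one word, and the "machine code" is kept as [opcode, address] pairs rather than real machine words.

```python
def assemble_one_pass(statements, start=0):
    """Single scan with back patching; returns (code, tii)."""
    symtab = {}
    tii = []     # table of incomplete instructions: (index in code, symbol)
    code = []
    loc_cntr = start
    for label, opcode, operand in statements:
        if label is not None:
            symtab[label] = loc_cntr
        if operand is None:
            code.append([opcode, None])
        elif operand in symtab:
            code.append([opcode, symtab[operand]])    # backward reference
        else:
            code.append([opcode, None])               # operand field left blank
            tii.append((len(code) - 1, operand))      # remember it for later
        loc_cntr += 1
    # Back patching: fill the blank fields once all symbols are defined.
    for index, symbol in tii:
        if symbol not in symtab:
            raise ValueError(f"undefined symbol: {symbol}")
        code[index][1] = symtab[symbol]
    return code, tii


statements = [
    (None, "MOVER", "X"),     # forward reference to X
    ("LOOP", "ADD", "X"),     # forward reference to X
    (None, "BC", "LOOP"),     # backward reference, resolved immediately
    ("X", "DS", None),
]
code, tii = assemble_one_pass(statements, start=100)
print(code)
```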

Q. 5

Explain Data structures of assembler pass I

OR

Explain the role of mnemonic opcode table, symbol table, literal table, and pool table in
assembling process of assembly language program.

OR

Describe the following data structures: OPTAB, SYMTAB, LITTAB & POOLTAB.


OPTAB
A table of mnemonic opcodes and related information.
OPTAB contains the fields mnemonic opcode, class and mnemonic info.
The class field indicates whether the opcode belongs to an imperative statement (IS), a
declaration statement (DL), or an assembler directive (AD).
For an imperative statement, the mnemonic info field contains the pair (machine code, instruction
length); otherwise it contains the id of a routine to handle the declaration or directive
statement.

Mnemonic opcode    Class    Mnemonic info
MOVER              IS       (04,1)
DS                 DL       R#7
START              AD       R#11
.
.

SYMTAB
A SYMTAB entry contains the fields symbol name, address and length.
Some addresses can be determined directly, e.g. the address of the first instruction in the
program; others must be inferred.
To find the address of a program element we must fix the addresses of all program elements preceding it.
This function is called memory allocation.

Symbol    Address    Length
LOOP      202
NEXT      214
LAST      216
          217
BACK      202
          218


LITTAB
A table of literals used in the program.
A LITTAB entry contains the field literal and address.
The first pass uses LITTAB to collect all literals used in a program.
Awareness of different literal pools is maintained using the auxiliary table POOLTAB.
This table contains the literal number of the starting literal of each literal pool.
At any stage, the current literal pool is the last pool in the LITTAB.
On encountering an LTORG statement (or the END statement), literals in the current pool
are allocated addresses starting with the current value in LC and LC is appropriately
incremented.
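
The interplay of LITTAB and POOLTAB can be sketched in Python as below. This is an illustrative model, not the notes' exact layout: the class name LiteralTables and its method names are hypothetical, list indices are 0-based (the notes count entries from 1), and every literal is assumed to occupy one word.

```python
class LiteralTables:
    """Minimal LITTAB/POOLTAB model for pass I."""

    def __init__(self):
        self.littab = []      # entries are [literal, address]
        self.pooltab = [0]    # index in littab of the first literal of each pool

    def add_literal(self, literal):
        """Collect a literal; its address is not known yet."""
        self.littab.append([literal, None])

    def allocate_pool(self, loc_cntr):
        """On LTORG or END: give addresses to the current pool, return new LC."""
        for entry in self.littab[self.pooltab[-1]:]:
            entry[1] = loc_cntr
            loc_cntr += 1                      # one word per literal (assumption)
        self.pooltab.append(len(self.littab))  # the next pool starts here
        return loc_cntr


tables = LiteralTables()
tables.add_literal("=5")
tables.add_literal("=1")
lc_after_ltorg = tables.allocate_pool(211)     # LTORG met with LC = 211
tables.add_literal("=1")
lc_after_end = tables.allocate_pool(219)       # END met with LC = 219
print(tables.littab, tables.pooltab)
```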

[Figure: LITTAB (columns: literal no., literal, address) alongside POOLTAB (column: literal no.) with entries #1 and #3.]
Q. 6

Detail design of two pass assembler.

Pass I
Algorithm for Pass I
1) loc_cntr = 0 (default value);
   pooltab_ptr = 1; POOLTAB[1] = 1;
   littab_ptr = 1;
2) While next statement is not an END statement
   a) If a label is present then
        this_label = symbol in label field;
        Enter (this_label, loc_cntr) in SYMTAB.
   b) If an LTORG statement then
        (i) Process the literals in LITTAB to allocate memory and put the address in the
            address field. Update loc_cntr accordingly.
        (ii) pooltab_ptr = pooltab_ptr + 1;
        (iii) POOLTAB[pooltab_ptr] = littab_ptr;
   c) If a START or ORIGIN statement then
        loc_cntr = value specified in operand field;
   d) If an EQU statement then
        (i) this_address = value specified in <address spec>;
        (ii) Correct the SYMTAB entry for this_label to (this_label, this_address);
   e) If a declaration statement then
        (i) code = code of the declaration statement;
        (ii) size = size of memory area required by DC/DS;
        (iii) loc_cntr = loc_cntr + size;
        (iv) Generate IC (DL, code) ..
   f) If an imperative statement then
        (i) code = machine opcode from OPTAB;
        (ii) loc_cntr = loc_cntr + instruction length from OPTAB;
        (iii) If operand is a literal then
                this_literal = literal in operand field;
                LITTAB[littab_ptr] = this_literal;
                littab_ptr = littab_ptr + 1;
              else
                this_entry = SYMTAB entry number of operand;
                Generate IC (IS, code)(S, this_entry);
3) (Processing of END statement)
   a) Perform step 2(b).
   b) Generate IC (AD, 02).
   c) Go to pass II.

Intermediate code forms:


Intermediate code consists of a set of IC units, each unit consisting of the following three
fields:
1. Address
2. Representation of the mnemonic opcode
3. Representation of operands
Mnemonics field
The mnemonics field contains a pair of the form
(statement class, code)
where statement class can be one of IS, DL, and AD, standing for imperative statement,
declaration statement and assembler directive respectively.
For an imperative statement, code is the instruction opcode in the machine language.
For declarations and assembler directives, code is an ordinal number within the class.
Thus, (AD, 01) stands for assembler directive number 1, which is the directive START.
Codes for the various declaration statements and assembler directives:

Declaration statement    Code
DC                       01
DS                       02

Assembler directive      Code
START                    01
END                      02
ORIGIN                   03
EQU                      04
LTORG                    05

The information in the mnemonics field is assumed to have the same representation in all
the variants.


Intermediate code for Imperative statement


Variant I
The first operand is represented by a single digit number which is a code for a register or a
condition code.

Register     Code
AREG         01
BREG         02
CREG         03
DREG         04

Condition    Code
LT           01
LE           02
EQ           03
GT           04
GE           05
ANY          06


The second operand, which is a memory operand, is represented by a pair of the form
(operand class, code)
Where operand class is one of the C, S and L standing for constant, symbol and literal.
For a constant, the code field contains the internal representation of the constant itself.
Ex: the operand descriptor for the statement START 200 is (C,200).
For a symbol or literal, the code field contains the ordinal number of the operand's entry in
SYMTAB or LITTAB.
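
The Variant I encoding of an imperative statement can be sketched in Python as below. The opcode, register and condition codes are the ones used in these notes; the function name ic_variant1, the use of plain lists for SYMTAB and LITTAB, and entering a symbol into SYMTAB on first reference are simplifying assumptions for the example.

```python
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
           "COMP": 6, "BC": 7, "READ": 9, "PRINT": 10}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
CONDITIONS = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}


def ic_variant1(mnemonic, operands, symtab, littab):
    """Return the Variant I IC unit (without the address field) as text."""
    mnemonic_field = f"(IS,{OPCODES[mnemonic]:02d})"
    operand_field = ""
    for operand in operands:
        if operand in REGISTERS:                 # first operand: register code
            operand_field += f"({REGISTERS[operand]})"
        elif operand in CONDITIONS:              # or condition code
            operand_field += f"({CONDITIONS[operand]})"
        elif operand.startswith("="):            # literal: new LITTAB entry
            littab.append(operand)
            operand_field += f"(L,{len(littab):02d})"
        else:                                    # symbol: SYMTAB entry number
            if operand not in symtab:
                symtab.append(operand)
            operand_field += f"(S,{symtab.index(operand) + 1:02d})"
    return f"{mnemonic_field} {operand_field}".strip()


symtab, littab = [], []
units = [
    ic_variant1("READ", ["A"], symtab, littab),
    ic_variant1("MOVER", ["AREG", "A"], symtab, littab),
    ic_variant1("SUB", ["AREG", "=1"], symtab, littab),
    ic_variant1("BC", ["GT", "LOOP"], symtab, littab),
    ic_variant1("STOP", [], symtab, littab),
]
print("\n".join(units))
```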
Variant II
This variant differs from variant I of the intermediate code because in variant II symbols,
condition codes and CPU registers are not processed during pass I.
So their operand fields are left in source form in the IC produced by pass I.

Program                       Variant I               Variant II

        START  200            (AD,01) (C,200)         (AD,01) (C,200)
        READ   A              (IS,09) (S,01)          (IS,09) A
LOOP    MOVER  AREG, A        (IS,04) (1)(S,01)       (IS,04) AREG, A
        ..
        SUB    AREG, =1       (IS,02) (1)(L,01)       (IS,02) AREG,(L,01)
        BC     GT, LOOP       (IS,07) (4)(S,02)       (IS,07) GT, LOOP
        STOP                  (IS,00)                 (IS,00)
A       DS     1              (DL,02) (C,1)           (DL,02) (C,1)
        LTORG                 (AD,05)                 (AD,05)
        ..


Comparison of the variants


Variant I                                      Variant II
Extra work in pass I                           Extra work in pass II
Simplifies tasks in pass II                    Simplifies tasks in pass I
Pass I occupies more memory than pass II       Memory utilization of the two passes is
                                               better balanced

Pass II (Algorithm)
It has been assumed that the target code is to be assembled in the area named code_area.
1. code_area_address = address of code_area;
   pooltab_ptr = 1;
   loc_cntr = 0;
2. While next statement is not an END statement
   a) Clear machine_code_buffer;
   b) If an LTORG statement
      i) Process the literals in LITTAB and assemble the literals in machine_code_buffer.
      ii) size = size of memory area required for literals;
      iii) pooltab_ptr = pooltab_ptr + 1;
   c) If a START or ORIGIN statement
      i) loc_cntr = value specified in operand field;
      ii) size = 0;
   d) If a declaration statement
      i) If a DC statement then assemble the constant in machine_code_buffer;
      ii) size = size of memory area required by DC/DS;
   e) If an imperative statement
      i) Get operand address from SYMTAB or LITTAB;
      ii) Assemble instruction in machine_code_buffer;
      iii) size = size of instruction;
   f) If size ≠ 0 then
      i) Move contents of machine_code_buffer to the address code_area_address + loc_cntr;
      ii) loc_cntr = loc_cntr + size;
3. Processing of END statement
   a) Perform steps 2(b) and 2(f).
   b) Write code_area into the output file.

Q. 7

Explain error reporting of assembler.


Error reporting in pass I
Listing errors in the first pass has the advantage that the source program need not be preserved
till pass II.
But a listing produced in pass I can report only certain errors, not all.
In the program below, errors are detected at statements 9 and 21.
Statement 9 gives an invalid opcode error because MVER does not match any mnemonic
in OPTAB.
Statement 21 gives a duplicate definition error because an entry for A already exists in the symbol
table.
The undefined symbol B at statement 10 is harder to detect during pass I; this error can be
detected only after completing pass I.

Sr. no.   Statement            Address
1         START 200
2         MOVER AREG, A        200
..
9         MVER BREG, A         207
          ** ERROR ** Invalid opcode
10        ADD BREG, B          208
14        A DS 1               209
..
21        A DC 5               227
          ** ERROR ** Duplicate definition of symbol A
..
35        END
          ** ERROR ** Undefined symbol B in statement 10

Error reporting in pass II


During pass II, data structures such as SYMTAB are available.

Error indication at statement 10 is therefore easy: the symbol table is searched for an entry for
B, and if no match is found, the error is reported.
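
The three kinds of error discussed above can be sketched in Python. The statement tuples (number, label, opcode, referenced symbols), the function name report_errors and the message wording are hypothetical; the point is only that invalid opcodes and duplicate definitions are caught while scanning, whereas undefined symbols are caught once the symbol table is complete.

```python
def report_errors(statements, optab):
    """Return a list of (statement number, message) pairs."""
    errors = []
    defined = set()      # symbols entered in the symbol table so far
    referenced = []      # (statement number, symbol) pairs seen in operands
    for number, label, opcode, symbols in statements:
        if label is not None:
            if label in defined:         # detectable during the scan
                errors.append((number, f"duplicate definition of symbol {label}"))
            defined.add(label)
        if opcode not in optab:          # detectable during the scan
            errors.append((number, "invalid opcode"))
        referenced.extend((number, symbol) for symbol in symbols)
    # Undefined symbols can only be reported once every definition is known.
    for number, symbol in referenced:
        if symbol not in defined:
            errors.append((number, f"undefined symbol {symbol}"))
    return errors


optab = {"MOVER", "ADD", "DS", "DC"}
statements = [
    (2, None, "MOVER", ["A"]),
    (9, None, "MVER", ["A"]),
    (10, None, "ADD", ["B"]),
    (14, "A", "DS", []),
    (21, "A", "DC", []),
]
errors = report_errors(statements, optab)
print(errors)
```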


Q. 8

Write N! program and its equivalent machine code.

Sr.   Assembly program               LC      Opcode      Register operand   Memory operand
no.                                          (2 digit)   (1 digit)          (3 digit)

1             START  101
2             READ   N               101)    09          0                  113
3             MOVER  BREG, ONE       102)    04          2                  115
4             MOVEM  BREG, TERM      103)    05          2                  116
5     AGAIN   MULT   BREG, TERM      104)    03          2                  116
6             MOVER  CREG, TERM      105)    04          3                  116
7             ADD    CREG, ONE       106)    01          3                  115
12            MOVEM  CREG, TERM      107)    05          3                  116
13            COMP   CREG, N         108)    06          3                  113
14            BC     LE, AGAIN       109)    07          2                  104
15            MOVEM  BREG, RESULT    110)    05          2                  114
16            PRINT  RESULT          111)    10          0                  114
17            STOP                   112)    00          0                  000
18    N       DS     1               113)
19    RESULT  DS     1               114)
20    ONE     DC     1               115)    00          0                  001
21    TERM    DS     1               116)
22            END

Q.9

Generate intermediate code and symbol table for the following programs


Program-1

        START   100
        READ    A
        READ    B
        READ    C
        MOVER   AREG,A
        ADD     AREG,B
        ADD     AREG,C
        MULT    AREG,C
        MOVEM   AREG,RESULT
        PRINT   RESULT
        STOP
A       DS      1
B       DS      1
C       DS      1
RESULT  DS      1
        END

Program-1 IC in variant-I

(AD,01)    (C,100)
(IS,09)    (S,01)
(IS,09)    (S,02)
(IS,09)    (S,03)
(IS,04)    (01)(S,01)
(IS,01)    (01)(S,02)
(IS,01)    (01)(S,03)
(IS,03)    (01)(S,03)
(IS,05)    (01)(S,04)
(IS,10)    (S,04)
(IS,00)
(DL,02)    (C,01)
(DL,02)    (C,01)
(DL,02)    (C,01)
(DL,02)    (C,01)
(AD,02)

Program-1 Symbol table

Symbol    Address
A         110
B         111
C         112
RESULT    113

Program-2

        START   101
        READ    A
        READ    B
        MOVER   BREG,A
        MULT    BREG,B
        MOVEM   BREG,D
        STOP
A       DS      1
B       DS      1
D       DS      1
        END

Program-2 symbol table

Symbol    Address
A         107
B         108
D         109

Program-2 Variant-I         Program-2 Variant-II

(AD,01) (C,101)             (AD,01) (C,101)
(IS,09) (S,01)              (IS,09) A
(IS,09) (S,02)              (IS,09) B
(IS,04) (2)(S,01)           (IS,04) BREG,A
(IS,03) (2)(S,02)           (IS,03) BREG,B
(IS,05) (2)(S,03)           (IS,05) BREG,D
(IS,00)                     (IS,00)
(DL,02) (C,01)              (DL,02) (C,01)
(DL,02) (C,01)              (DL,02) (C,01)
(DL,02) (C,01)              (DL,02) (C,01)
(AD,02)                     (AD,02)

Compiler

Q.1

List out aspects of compilation and its implementation issues.

Two aspects of compilation are:

a) Generate code to implement the meaning of a source program in the execution domain (target
code generation)
b) Provide diagnostics for violations of PL semantics in a program (error reporting)

There are four issues involved in implementing these aspects (Q. What are the issues in
code generation in relation to compilation of expression? Explain each issue in
brief. (June-13 GTU))

1. Data types: The semantics of a data type require a compiler to ensure that variables of
a type are assigned or manipulated only through legal operations.
The compiler must generate type specific code to implement an operation.

2. Data structures: To compile a reference to an element of a data structure, the
compiler must develop a memory mapping to access the memory word allocated to
the element.

3. Scope rules: The compiler performs operations called scope analysis and name resolution
to determine the data item designated by the use of a name in the source program.

4. Control structure: Control structures include conditional transfer of control,
conditional execution, iteration control and procedure calls. The compiler must
ensure that the source program does not violate the semantics of control structures.

Issues in the design of a code generator are:

1. Input to the Code Generator
The input to the code generator consists of the intermediate representation of the source
program.
There are several forms of intermediate language, such as postfix notation,
quadruples, and syntax trees or DAGs.
The detection of semantic errors should be done before submitting the input to the code
generator.
The code generation phase requires completely error free intermediate code as input.
2. Target program
The output of the code generator is the target program. The output may take a variety
of forms: absolute machine language, relocatable machine language, or assembly
language.
Producing an absolute machine language program as output has the advantage that it can
be placed in a fixed location in memory and immediately executed.
Producing a relocatable machine language program as output allows subprograms to be
compiled separately. A set of relocatable object modules can be linked together and loaded
for execution by a linking loader.
Producing an assembly language program as output makes the process of code generation
somewhat easier. We can generate symbolic instructions and use the macro facilities of the
assembler to help generate code.


3. Memory management
Mapping names in the source program to addresses of data objects in run time memory is
done cooperatively by the front end and the code generator. We assume that a name in a
three-address statement refers to a symbol table entry for the name.
4. Instruction selection
If we do not care about the efficiency of the target program, instruction selection is
straightforward. Producing efficient code, however, requires special handling. For example,
the sequence of statements
a := b + c
d := a + e
would be translated statement by statement into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth statement is redundant, since the value of a is already in R0, so we can
eliminate it.
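
A simple way to remove this kind of redundant load is a peephole pass over adjacent instructions. The Python sketch below is illustrative only: instructions are modelled as hypothetical (op, source, destination) tuples in the MOV source, destination order used above, and the set of register names is an assumption.

```python
def remove_redundant_loads(code, registers=("R0", "R1")):
    """Drop 'MOV x, R' when it immediately follows 'MOV R, x'."""
    result = []
    for instr in code:
        if result:
            prev = result[-1]
            is_store_then_load = (
                prev[0] == "MOV" and instr[0] == "MOV"
                and prev[1] in registers     # previous: MOV R, x (store)
                and instr[1] == prev[2]      # current loads the same x ...
                and instr[2] == prev[1]      # ... back into the same R
            )
            if is_store_then_load:
                continue                     # value is already in the register
        result.append(instr)
    return result


code = [
    ("MOV", "b", "R0"),
    ("ADD", "c", "R0"),
    ("MOV", "R0", "a"),
    ("MOV", "a", "R0"),    # redundant: a was just stored from R0
    ("ADD", "e", "R0"),
    ("MOV", "R0", "d"),
]
optimized = remove_redundant_loads(code)
print(optimized)
```

The check only looks at two neighbouring instructions, so it is safe only when no jump can land on the second one; a real code generator would also have to take labels into account.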


5. Register allocation
Instructions with register operands are usually shorter and faster than those with
operands in memory.
The use of registers is often subdivided into two sub problems:
During register allocation, we select the set of variables that will reside in registers at a
point in the program.
During a subsequent register assignment phase, we pick the specific register that a
variable will reside in.
6. Choice of evaluation order
The order in which computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.
Picking the best order is another difficult, NP-complete problem.
7. Approaches to code generation
The most important criterion for a code generator is that it produces correct code.
Correctness takes on special significance because of the number of special cases that a code
generator must face.
Given the premium on correctness, designing a code generator so it can be easily
implemented, tested, and maintained is an important design goal.

Q.2

What is Memory binding? Explain types of memory allocation.

Memory Binding:

A memory binding is an association between the 'memory address'
attribute of a data item and the address of a memory area.

Three important tasks of memory allocation are:

1. Determine the amount of memory required to represent the value of a data item.
2. Use an appropriate memory allocation model to implement the lifetimes and scopes of data
items.
3. Determine appropriate memory mappings to access the values in a non scalar data item,
e.g. values in an array.

Memory allocation is mainly divided into two types:

1. Static binding
2. Dynamic binding

Static memory allocation


In static memory allocation, memory is allocated to a variable before the execution of a
program begins.
Static memory allocation is typically performed during compilation.
No memory allocation or deallocation actions are performed during the execution of a
program. Thus, variables remain permanently allocated.
Dynamic memory allocation
In dynamic memory allocation, memory bindings are established and destroyed during the
execution of a program.
Dynamic memory allocation has two flavors: automatic allocation and program controlled
allocation.
In automatic dynamic allocation, memory is allocated to the variables declared in a
program unit when the program unit is entered during execution and is deallocated when
the program unit is exited. Thus the same memory area may be used for the variables of
different program units.
In program controlled dynamic allocation, a program can allocate or deallocate memory at
arbitrary points during its execution.
In both automatic and program controlled allocation, the address of the
memory area allocated to a program unit cannot be determined at compilation time.
Dynamic memory allocation techniques
1. Explicit allocation

Explicit Allocation of Fixed Sized Blocks

It is the simplest form of dynamic allocation.

By linking the blocks in a list, allocation and deallocation can be done quickly with little or
no storage overhead.

Initialization of the area is done by using a portion of each block for a link to the next
block. A pointer, available, points to the first block.

Allocation consists of taking a block off the list and deallocation consists of putting the
block back on the list. We can treat each block as a variant record.

There is no space overhead because the user program can use the entire block for its own
purposes.

When the block is de-allocated, the compiler routine uses some of the space from the
block itself to link it into the list of available blocks.
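
The free list of fixed-size blocks can be sketched in Python by modelling memory as a list of words. The class name FixedBlockHeap, the use of -1 as the end-of-list marker and the choice of the first word of a free block as its link field are assumptions made for the illustration.

```python
class FixedBlockHeap:
    """Free list threaded through the free blocks themselves."""

    def __init__(self, n_blocks, block_size):
        self.block_size = block_size
        self.memory = [0] * (n_blocks * block_size)
        # Initialization: the first word of each block links to the next block.
        for i in range(n_blocks):
            is_last = i + 1 == n_blocks
            self.memory[i * block_size] = -1 if is_last else (i + 1) * block_size
        self.available = 0 if n_blocks else -1   # points to the first free block

    def allocate(self):
        """Take a block off the list and return its starting address."""
        if self.available == -1:
            raise MemoryError("no free block")
        block = self.available
        self.available = self.memory[block]
        return block

    def deallocate(self, block):
        """Put the block back on the list, reusing its own first word as link."""
        self.memory[block] = self.available
        self.available = block


heap = FixedBlockHeap(n_blocks=3, block_size=4)
first = heap.allocate()
second = heap.allocate()
heap.deallocate(first)
reused = heap.allocate()      # the block just freed is handed out again
third = heap.allocate()
print(first, second, reused, third)
```

Both operations touch only the head of the list, which is why allocation and deallocation take constant time.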
Explicit Allocation of Variable-Sized Blocks

When blocks are allocated and de-allocated, storage can become fragmented; that is, the
heap may consist of alternate blocks that are free and in use.

Fragmentation will not occur if blocks are of fixed size, but it occurs if they are of
variable size.

One method for allocating variable sized blocks is the first fit method. When a block of size s is
allocated, we search for the first free block of size f ≥ s (where f is the size of the free block).
This block is then subdivided into a used block of size s and a free block of size (f - s).
This incurs a time overhead, as we have to search for a free block that is
large enough.

When a block is de-allocated, we check to see if it is next to a free block. If possible, the
de-allocated block is combined with a free block next to it to create a larger free block.

Combining adjacent free blocks into a larger free block prevents further fragmentation
from occurring.
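
First fit allocation with merging of adjacent free blocks can be sketched as follows. The sketch keeps the free blocks in a separate sorted Python list of (start, length) pairs, which is a simplification of keeping the bookkeeping inside the heap itself; the class and method names are hypothetical.

```python
class FirstFitHeap:
    """Variable-sized blocks: first fit allocation, coalescing on release."""

    def __init__(self, size):
        self.free = [(0, size)]              # sorted list of (start, length)

    def allocate(self, s):
        for i, (start, f) in enumerate(self.free):
            if f >= s:                       # first free block with f >= s
                if f == s:
                    del self.free[i]
                else:                        # split: used s, free (f - s)
                    self.free[i] = (start + s, f - s)
                return start
        raise MemoryError("no free block is large enough")

    def deallocate(self, start, s):
        self.free.append((start, s))
        self.free.sort()
        merged = [self.free[0]]
        for block_start, length in self.free[1:]:
            last_start, last_length = merged[-1]
            if last_start + last_length == block_start:   # adjacent: combine
                merged[-1] = (last_start, last_length + length)
            else:
                merged.append((block_start, length))
        self.free = merged


heap = FirstFitHeap(100)
a = heap.allocate(30)
b = heap.allocate(20)
c = heap.allocate(10)
heap.deallocate(a, 30)
heap.deallocate(b, 20)        # merges with the freed block before it
free_after_merge = list(heap.free)
d = heap.allocate(45)         # fits only because the two blocks were merged
print(free_after_merge, d)
```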
2. Implicit De-allocation

Implicit de-allocation requires cooperation between the user program and the run-time
package, because the run-time package needs to know when a storage block is no longer in
use.

This cooperation is implemented by fixing the format of storage blocks.

The first problem is that of recognizing block boundaries. If the size of blocks is fixed,
then position information can be used.

For example, if each block occupies 20 words then a new block begins every 20 words.
Otherwise, we keep the size of a block in the inaccessible storage attached to the block, so
we can determine where the next block begins.

The second problem is that of recognizing whether a block is in use. We assume that a block is in
use if it is possible for the user program to refer to the information in the block.

The reference may occur through a pointer or after following a sequence of pointers, so the
compiler needs to know the position in storage of all pointers.

Two approaches can be used for implicit de-allocation:

Reference counts

We keep track of the number of blocks that point directly to the present block. If this count
ever drops to 0, the block can be de-allocated because it cannot be referred to, i.e. the
block has become garbage that can be collected. Maintaining the reference counts can be
costly. Reference counts are best used when pointers between blocks never appear in cycles.

Marking Techniques

An alternative approach is to temporarily suspend execution of the user program and use
the frozen pointers to determine which blocks are in use. This approach requires all the
pointers into the heap to be known.

Conceptually, we pour paint into the heap through these pointers. Any block that is reached
by the paint is in use and the rest can be de-allocated.

In more detail, we go through the heap and mark all blocks unused. Then we follow
pointers, marking as used any block that is reached in the process. A final sequential scan
of the heap allows all blocks still marked unused to be collected.
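
The marking technique can be sketched in Python. The heap is modelled here as a dictionary from a block name to the list of blocks it points to, and roots stands for the pointers into the heap held by the user program; both the model and the function name mark_and_sweep are assumptions made for the illustration.

```python
def mark_and_sweep(heap, roots):
    """Return the set of blocks that are not reachable from the roots."""
    used = {block: False for block in heap}    # mark every block unused
    pending = list(roots)
    while pending:                             # follow pointers from the roots
        block = pending.pop()
        if not used[block]:
            used[block] = True                 # reached by the "paint"
            pending.extend(heap[block])
    # Final scan: whatever is still marked unused can be collected.
    return {block for block, in_use in used.items() if not in_use}


heap = {
    "a": ["b"],
    "b": [],
    "c": ["d"],    # c and d point to each other but nothing else reaches them
    "d": ["c"],
    "e": [],
}
garbage = mark_and_sweep(heap, roots=["a"])
print(sorted(garbage))
```

Blocks c and d form a cycle, so their reference counts would never drop to 0, yet the marking pass still identifies them as garbage.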

Q.3

Explain memory allocation in block structured language.


A block is a sequence of statements containing local data and declarations, enclosed
within delimiters.
Ex:
A
{
Statements
..
}
The delimiters mark the beginning and the end of the block. Blocks can be nested; for
example, block B2 can be completely defined within block B1.
Finding the scope of a variable means checking its visibility within the block.
The following rules are used to determine the scope of a variable:
1. Variable X is accessible within block B1 if it can be accessed by any statement situated
in block B1.
2. Variable X is accessible to any statement in block B2 if block B2 is situated in block B1.
There are two types of variable in a block structured language:
1. Local variable
2. Non local variable
To understand local and non local variables, consider the following example:
Procedure A
{
int x,y,z
Procedure B
{
int a,b
}
Procedure C
{
int m,n
}
}

Procedure    Local variables    Non local variables
A            x,y,z
B            a,b                x,y,z
C            m,n                x,y,z

Variables x, y and z are local to procedure A but non local to blocks B and
C, because these variables are not defined locally within blocks B and C but are accessible
within these blocks.

Q.4

Explain activation record.


The activation record is a block of memory used for managing the information needed by a
single execution of a procedure. Its fields are:

Return value
Actual parameters
Control link
Access link
Saved machine status
Local variables
Temporaries

1. Temporary values: Temporary variables are needed during the evaluation of
expressions. Such variables are stored in the temporaries field of the activation record.
2. Local variables: Data that is local to the executing procedure is stored
in this field of the activation record.
3. Saved machine registers: This field holds information regarding the status of the machine
just before the procedure is called. It contains the registers and the program counter.
4. Control link: This field is optional. It points to the activation record of the calling
procedure. This link is also called the dynamic link.
5. Access link: This field is also optional. It refers to non-local data in other activation
records. This field is also called the static link field.
6. Actual parameters: This field holds information about the actual parameters. These
actual parameters are passed to the called procedure.
7. Return values: This field is used to store the result of a function call.

Q.5

What is side effect? Explain parameter passing methods.


Side effect: A side effect of a function call is a change in the value of a variable that is not local to the called function.
Parameter passing mechanisms
1. Call by value:

This is the simplest method of parameter passing.

The actual parameters are evaluated and their values are passed to the called procedure, where they initialize the formal parameters.

Operations on the formal parameters do not change the values of the actual parameters.

Example: Languages like C and C++ use call by value as their default parameter passing method.

2. Call by value-result:

This extends the call by value mechanism by copying the value of each formal parameter back to the corresponding actual parameter at return.

Thus side effects are realized at return.

3. Call by reference:

This method is also called call by address or call by location.

The address of the actual parameter is passed to the formal parameter.

4. Call by name:

This is a less popular method of parameter passing.

The procedure is treated like a macro: the procedure body is substituted for the call in the caller, with the actual parameters substituted for the formals.
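The four mechanisms can be contrasted with a rough simulation. Python has a single built-in passing convention, so everything here is an assumption made for illustration: a one-element list stands in for an address, a dictionary stands in for the caller's variables, and a zero-argument function (a thunk) stands in for the textual substitution of call by name.

```python
def increment_by_value(x):
    x = x + 1                  # changes only the local copy of the argument
    return x


def increment_by_reference(cell):
    cell[0] = cell[0] + 1      # the one-element list plays the role of an address


def increment_by_value_result(env, name):
    formal = env[name]         # copy in: the formal gets the value of the actual
    formal = formal + 1
    env[name] = formal         # copy back: the side effect appears at return


def twice_by_name(thunk):
    return thunk() + thunk()   # the argument expression is re-evaluated at each use


a = 10
increment_by_value(a)

b = [10]
increment_by_reference(b)

env = {"c": 10}
increment_by_value_result(env, "c")

calls = []


def argument():
    calls.append(1)            # records how often the actual parameter is evaluated
    return 5


d = twice_by_name(argument)
```

Only the by-value call leaves the caller's variable unchanged; the by-name call evaluates its argument once per use rather than once per call.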

The actual parameters can be surrounded by parentheses to preserve their integrity.

The local names of the called procedure and the names of the calling procedure are kept distinct.

Q.6

Explain operand descriptor and register descriptor with example


An operand descriptor has the following fields:
1. Attributes: Contains the subfields type, length and miscellaneous information.
2. Addressability: Specifies where the operand is located and how it can be accessed. It has two subfields:
Addressability code: Takes the values 'M' (operand is in memory) and 'R' (operand is in register). Other addressability codes, e.g. address in register ('AR') and address in memory ('AM'), are also possible.
Address: Address of a CPU register or memory word.
Ex: a*b
MOVER AREG, A
MULT AREG, B
Three operand descriptors are used during code generation. Assuming a and b to be integers occupying 1 memory word each, these are:

Descriptor   Attributes   Addressability
#1           (int, 1)     M, Address(a)
#2           (int, 1)     M, Address(b)
#3           (int, 1)     R, Address(AREG)

Register descriptors
A register descriptor has two fields:
1. Status: Contains the code free or occupied to indicate the register status.
2. Operand descriptor #: If status = occupied, this field contains the descriptor number of the operand contained in the register.

Register descriptors are stored in an array called Register_descriptor. One register descriptor exists for each CPU register.

In the above example, the register descriptor for AREG after generating code for a*b would be

Status: Occupied    Operand descriptor #: #3

This indicates that register AREG contains the operand described by descriptor #3.
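The bookkeeping for a*b can be sketched as follows. The class names and the dictionary layout are assumptions made for illustration; only the field names, the codes 'M' and 'R', and the two generated instructions come from the notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperandDescriptor:
    type: str
    length: int
    addressability_code: str   # 'M' = operand in memory, 'R' = operand in register
    address: str


@dataclass
class RegisterDescriptor:
    status: str = "free"
    operand_descriptor: Optional[int] = None


operand_descriptors = {}
register_descriptor = {"AREG": RegisterDescriptor()}
code = []

# Descriptors #1 and #2 describe the memory operands a and b.
operand_descriptors[1] = OperandDescriptor("int", 1, "M", "A")
operand_descriptors[2] = OperandDescriptor("int", 1, "M", "B")

# Code for a*b: load a into AREG, then multiply by b.
code.append("MOVER AREG, A")
code.append("MULT AREG, B")

# Descriptor #3 describes the partial result, which now lives in AREG.
operand_descriptors[3] = OperandDescriptor("int", 1, "R", "AREG")
register_descriptor["AREG"] = RegisterDescriptor("occupied", 3)
```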

Q.7

Explain intermediate codes for an expression


There are two types of intermediate representation
1. Postfix notation
2. Three address code.
1) Postfix notation

Postfix notation is a linearized representation of a syntax tree.

It is a list of the nodes of the tree in which a node appears immediately after its children.

The postfix notation of x = -a*b + -a*b is

x a uminus b * a uminus b * + =

where uminus denotes the unary minus operator.
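Postfix notation is what a post-order walk of the syntax tree produces. The sketch below assumes a tree encoded as nested tuples (operator, child, ...) with strings as leaves, and writes the unary minus as uminus; both choices are illustrative.

```python
def postfix(node):
    # A leaf is a name; an inner node is (operator, child, ...).
    if isinstance(node, str):
        return [node]
    operator, *children = node
    out = []
    for child in children:
        out.extend(postfix(child))   # children first ...
    out.append(operator)             # ... then the node itself
    return out


neg_a_times_b = ("*", ("uminus", "a"), "b")
tree = ("=", "x", ("+", neg_a_times_b, neg_a_times_b))
result = " ".join(postfix(tree))
```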

2) Three address code

In three address code, at most three addresses are used to represent a statement. The general form of a three address code statement is a := b op c

where a, b and c are operands that can be names or constants, and op is an operator.

For a statement like a = b+c+d the three address code will be
t1=b+c
t2=t1+d
a=t2

Here t1 and t2 are temporary names generated by the compiler. At most three addresses are allowed in each statement; hence this representation is called three-address code.
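A minimal sketch of how a compiler might generate such code from an expression tree is given below. The tuple encoding of the tree and the function name three_address_code are assumptions; the temporaries are numbered t1, t2, ... as in the notes, and the final copy into the target is emitted as its own statement.

```python
def three_address_code(statement):
    target, expression = statement
    code = []
    counter = 0

    def gen(node):
        nonlocal counter
        if isinstance(node, str):        # a name or constant needs no code
            return node
        op, left, right = node
        left_name = gen(left)
        right_name = gen(right)
        counter += 1
        temp = f"t{counter}"             # fresh compiler-generated temporary
        code.append(f"{temp}={left_name}{op}{right_name}")
        return temp

    code.append(f"{target}={gen(expression)}")
    return code


result = three_address_code(("a", ("+", ("+", "b", "c"), "d")))
```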

Q.8

Explain implementation of three address code

There are three representations used for three address code: quadruples, triples and indirect triples.

Quadruple representation
A quadruple is a structure with at most four fields: op, arg1, arg2 and result.
The op field holds the internal code for the operator, the arg1 and arg2 fields hold the two operands, and the result field holds the result of the expression.

Consider the input statement x := -a*b + -a*b. Its three address code is

(0) t1 := uminus a
(1) t2 := t1 * b
(2) t3 := uminus a
(3) t4 := t3 * b
(4) t5 := t2 + t4
(5) x := t5

and the quadruple representation is

Number   Op       Arg1   Arg2   Result
(0)      uminus   a             t1
(1)      *        t1     b      t2
(2)      uminus   a             t3
(3)      *        t3     b      t4
(4)      +        t2     t4     t5
(5)      :=       t5            x

Triples

In the triple representation the use of temporary variables is avoided: an operand refers either to a symbol table entry or, by number, to the triple that computes the value.

For the expression x := -a*b + -a*b the triple representation is as given below

Number   Op       Arg1   Arg2
(0)      uminus   a
(1)      *        (0)    b
(2)      uminus   a
(3)      *        (2)    b
(4)      +        (1)    (3)
(5)      :=       x      (4)

Indirect Triples

In the indirect triple representation a separate list of pointers to triples is kept. The statement list gives the order of execution, and each entry points to a triple instead of holding the triple itself.

Statement
(0)      (11)
(1)      (12)
(2)      (13)
(3)      (14)
(4)      (15)
(5)      (16)

Number   Op       Arg1   Arg2
(11)     uminus   a
(12)     *        (11)   b
(13)     uminus   a
(14)     *        (13)   b
(15)     +        (12)   (14)
(16)     :=       x      (15)
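The relationship between the three representations can be sketched in code. The tuple layouts and the helper to_triples are assumptions for illustration; triple numbers start at 0 here, whereas the indirect triple listing may use any numbering for the triples.

```python
quadruples = [
    ("uminus", "a", None, "t1"),
    ("*", "t1", "b", "t2"),
    ("uminus", "a", None, "t3"),
    ("*", "t3", "b", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "x"),
]


def to_triples(quads):
    defined_at = {}   # temporary name -> number of the triple computing it

    def ref(arg):
        # A temporary is replaced by a reference to the triple that computes it.
        return f"({defined_at[arg]})" if arg in defined_at else arg

    triples = []
    for number, (op, arg1, arg2, result) in enumerate(quads):
        if op == ":=":
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            defined_at[result] = number
    return triples


triples = to_triples(quadruples)

# Indirect triples: a separate statement list holds pointers (here, indices)
# to the triples, so statements can be reordered without renumbering triples.
statement_list = list(range(len(triples)))
```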

Q.9

Explain code optimization methods


I. Compile Time Evaluation

Compile time evaluation means shifting computations from run time to compile time.

There are two methods used to obtain compile time evaluation.

1. Folding

In the folding technique, the computation of a constant expression is done at compile time instead of run time.

Example: length = (22/7) * d

Here folding is applied by performing the computation of 22/7 at compile time.

2. Constant propagation

In this technique, a variable whose value is known to be constant is replaced by that value, and the resulting constant expression is computed at compile time.

Example: pi = 3.14; r = 5;
Area = pi * r * r

Here, at compile time, pi is replaced by 3.14 and r by 5; the computation of 3.14 * 5 * 5 is then done during compilation.
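Both techniques can be sketched on a tiny expression tree. The tuple encoding and the function name fold are assumptions, and the division is carried out as real-number division; a compiler for a language with integer division would fold 22/7 differently.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def fold(expr, constants=None):
    constants = constants or {}
    if isinstance(expr, str):
        return constants.get(expr, expr)       # constant propagation
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    left = fold(left, constants)
    right = fold(right, constants)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)            # folding: evaluate now
    return (op, left, right)                   # leave the rest for run time


length = fold(("*", ("/", 22, 7), "d"))
area = fold(("*", ("*", "pi", "r"), "r"), {"pi": 3.14, "r": 5})
```

In the first call only the constant sub expression 22/7 is evaluated and the multiplication by d remains; in the second call the whole expression reduces to a single number.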

II. Common Sub Expression Elimination

A common sub expression is an expression that appears repeatedly in the program and whose value has already been computed.

If the operands of the sub expression have not changed since that computation, the earlier result is used instead of recomputing it each time.

Example:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t4 := 4 * i
t5 := n
t6 := b[t4] + t5

The above code can be optimized using common sub expression elimination:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t5 := n
t6 := b[t1] + t5

The common sub expression t4 := 4 * i is eliminated because its value is already available in t1 and the value of i does not change between the definition of t1 and the use of t4.
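The elimination within one basic block can be sketched as below. The statement encoding (target, op, operands) and the operator names index, copy and index_add are assumptions made for this illustration, and the sketch further assumes that an eliminated temporary and its replacement are not reassigned later in the block.

```python
def eliminate_common_subexpressions(block):
    available = {}    # (op, operands) -> name that already holds the value
    alias = {}        # eliminated temporary -> temporary that replaces it
    optimized = []
    for target, op, operands in block:
        operands = tuple(alias.get(x, x) for x in operands)
        key = (op, operands)
        if key in available:
            alias[target] = available[key]    # reuse the earlier result
            continue                          # and drop this statement
        # A new value for target kills expressions that use it or are held in it.
        available = {k: v for k, v in available.items()
                     if target not in k[1] and v != target}
        alias.pop(target, None)
        if target not in operands:
            available[key] = target
        optimized.append((target, op, operands))
    return optimized


block = [
    ("t1", "*", ("4", "i")),
    ("t2", "index", ("a", "t1")),
    ("t3", "*", ("4", "j")),
    ("t4", "*", ("4", "i")),
    ("t5", "copy", ("n",)),
    ("t6", "index_add", ("b", "t4", "t5")),   # stands for b[t4] + t5
]
optimized = eliminate_common_subexpressions(block)
```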

III. Loop invariant computation (Frequency reduction)

Loop invariant optimization is obtained by moving computations whose result does not change inside the loop to a point just before the loop is entered.

This method is also called code motion.

Example:

while(i<=max-1)
{
    sum=sum+a[i];
    i++;
}

can be optimized as

N=max-1;
while(i<=N)
{
    sum=sum+a[i];
    i++;
}
IV. Strength Reduction

The strength of certain operators is higher than that of others, i.e. they are more expensive to execute.

For instance, the strength of * is higher than that of +.

In this technique, higher strength operators are replaced by lower strength operators.

Example:
for(i=1;i<=50;i++)
{
    count = i * 7;
}

Here count takes the values 7, 14, 21 and so on, up to 350.

This code can be rewritten using strength reduction as follows:

temp=7;
for(i=1;i<=50;i++)
{
    count = temp;
    temp = temp+7;
}
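That the two loops compute the same sequence can be checked with a short sketch. The function names are hypothetical and the values of count are collected in a list only so that they can be compared.

```python
def with_multiplication(limit=50):
    # Original loop: one multiplication per iteration.
    return [i * 7 for i in range(1, limit + 1)]


def with_strength_reduction(limit=50):
    # Rewritten loop: the multiplication is replaced by a running addition.
    counts = []
    temp = 7
    for _ in range(1, limit + 1):
        counts.append(temp)
        temp = temp + 7
    return counts


original = with_multiplication()
reduced = with_strength_reduction()
```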

V. Dead Code Elimination

A variable is said to be live at a point in a program if the value it contains is used subsequently.

On the other hand, a variable is said to be dead at a point in a program if the value it contains is never used afterwards. Code that only computes such values, or that can never be executed, is dead code, and it can be eliminated as an optimization.

Example:
i=0;
if(i==1)
{
    a=x+5;
}

The if statement is dead code because its condition can never be satisfied; hence the statement can be eliminated.

VI. Code Motion

The aim is to improve the execution time of the program by reducing the evaluation frequency of expressions.

The evaluation of an expression is moved from one part of the program to another so that it is evaluated less frequently.

Loops are usually executed several times.

We can move loop-invariant statements out of the loop.

Example:
a = 200;
while (a > 0)
{
    b = x + y;
    if (a % b == 0)
        printf("%d", a);
    a--;
}

The statement b = x + y is executed on every iteration of the loop. Because it is loop invariant, we can move it outside the loop. It will then be executed only once.

a = 200;
b = x + y;
while (a > 0)
{
    if (a % b == 0)
        printf("%d", a);
    a--;
}

Q.10

Define the following terms


1) Static pointer: Access to a non-local variable can be implemented using a reserved pointer called the static pointer.
2) Display: To speed up access to non-local variables, an array of pointers is maintained; this array is called the display.
3) Optimizing transformation: A rule for rewriting a segment of a program to improve its execution efficiency without affecting its meaning.
4) Local optimization: The optimizing transformations are applied over small segments of a program consisting of a few statements.
5) Global optimization: The optimizing transformations are applied over a program unit.
6) Basic block: A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or branching in between.

Q.11

Explain control flow property


Program flow graph: A program flow graph for a program P is a directed graph Gp = (N, E, n0) where
N: set of basic blocks in P
E: set of directed edges (bi, bj) indicating the possibility of control flow from the last statement of bi to the first statement of bj
n0: start node of P.
The following are the control flow properties:
1) Predecessor and successor: If (bi, bj) ∈ E, bi is a predecessor of bj and bj is a successor of bi.
2) Paths: A path is a sequence of edges such that the destination node of one edge is the source node of the following edge.
3) Ancestors and descendants: If a path exists from bi to bj, bi is an ancestor of bj and bj is a descendant of bi.
4) Dominators and post dominators: Block bi is a dominator of block bj if every path from n0 to bj passes through bi.
bi is a post dominator of bj if every path from bj to the exit node passes through bi.
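The dominator property can be computed with a simple iterative scheme, sketched here. The graph encoding (a dictionary from each block to its successors), the block names and the function name dominators are assumptions for illustration; every block must appear as a key.

```python
def dominators(successors, start):
    nodes = set(successors)
    predecessors = {n: set() for n in nodes}
    for source, targets in successors.items():
        for target in targets:
            predecessors[target].add(source)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes - {start}:
            pred_sets = [dom[p] for p in predecessors[n]]
            common = set.intersection(*pred_sets) if pred_sets else set()
            new = {n} | common
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


# A diamond: b0 branches to b1 and b2, which both continue to b3.
graph = {"b0": ["b1", "b2"], "b1": ["b3"], "b2": ["b3"], "b3": []}
dom = dominators(graph, "b0")
```

In the diamond, neither b1 nor b2 dominates b3 because a path through the other branch avoids it; only b0 and b3 itself do.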

Q.12

Explain Data flow property


Data Flow Properties

Before discussing the data flow properties, consider some basic terminology used in defining them.

A program point containing a definition is called a definition point.

A program point at which a reference to a data item is made is called a reference point.

A program point at which an expression is evaluated is called an evaluation point.

For example:

W1: x=3      Definition point
W2: y=x      Reference point
W3: z=a*b    Evaluation point

I. Available expression

An expression x+y is available at a program point w if and only if, along all paths reaching w:

1. The expression x+y is evaluated on the path (it is available at its evaluation point).

2. No definition of any operand of the expression (here either x or y) follows its last evaluation along the path. In other words, neither of the two operands is modified after the last evaluation and before w.

B1: t1=4*i
B2: t2=c+d[t1]
B3: t2=4*i
B4: t4=a[t2]

The expression 4 * i is an available expression for B2, B3 and B4 because no block changes i before control reaches B4.
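The property can be computed by a standard iterative data flow analysis, sketched below for the four blocks above. The flow graph edges (B1 to B2, B1 to B3, B2 to B4, B3 to B4) are an assumption, since the figure is not reproduced in these notes; the block encoding and the function name are likewise illustrative.

```python
def available_expressions(blocks, predecessors, entry):
    # blocks: name -> list of (target, expression, operands of the expression)
    universe = {expr for stmts in blocks.values() for _, expr, _ in stmts}
    operands_of = {expr: ops for stmts in blocks.values() for _, expr, ops in stmts}

    def transfer(name, available):
        available = set(available)
        for target, expr, _ in blocks[name]:
            available.add(expr)                  # the block evaluates expr
            available = {e for e in available    # defining target kills users
                         if target not in operands_of[e]}
        return available

    avail_in = {name: set() if name == entry else set(universe)
                for name in blocks}
    changed = True
    while changed:
        changed = False
        for name in blocks:
            if name == entry:
                continue
            outs = [transfer(p, avail_in[p]) for p in predecessors[name]]
            new_in = set.intersection(*outs) if outs else set()
            if new_in != avail_in[name]:
                avail_in[name] = new_in
                changed = True
    return avail_in


blocks = {
    "B1": [("t1", "4*i", ("i",))],
    "B2": [("t2", "c+d[t1]", ("c", "d", "t1"))],
    "B3": [("t2", "4*i", ("i",))],
    "B4": [("t4", "a[t2]", ("a", "t2"))],
}
predecessors = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
avail = available_expressions(blocks, predecessors, "B1")
```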

II. Reaching definition

A definition D reaches a point P if there is a path from D to P along which D is not killed.

A definition D of variable x is killed when there is a redefinition of x.

In the example flow graph, the definition D1 is a reaching definition for block B2, but D1 is not a reaching definition for block B3, because it is killed by definition D2 in block B2.

III. Live variable

A variable x is live at point p if there is a path from p to the exit along which the value of x is used before it is redefined. Otherwise the variable is said to be dead at that point.

Q.13

Write a short note on Interpreter.

An interpreter is a language processor which bridges an execution gap without generating a machine language program.

The main components of an interpreter are:
Data store
Symbol table
Data manipulation routines

Types of interpreter
1) Pure interpreter

Source program + Data -> Interpreter -> Result

2) Impure interpreter

Source program -> IR; IR + Data -> Interpreter -> Result

It is a program that executes instructions written in a high-level language.

It is a high-level programming language translator that translates and runs the program at the same time.

It converts one program statement into machine language, executes it, and then proceeds to the next statement. This differs from regular executable programs, which are presented to the computer as binary-coded instructions.

Interpreted programs remain in the source language the programmer wrote in, which is human-readable text.

Interpreters are not much different from compilers. They also convert the high level language into machine readable binary equivalents.

Each time an interpreter gets high level language code to be executed, it converts the code into an intermediate code before converting it into machine code.

Each part of the code is interpreted and then executed separately, in sequence. If an error is found in a part of the code, the interpreter stops without translating the next part of the code.

The advantage of an interpreter is that it does not need to go through the compilation stage during which machine instructions are generated.

Compilation can be time-consuming if the program is long. The interpreter, on the other hand, can immediately execute high-level programs.

For this reason, interpreters are sometimes used during the development of a program, when a programmer wants to add small sections at a time and test them quickly.

Interpreter characteristics:

Relatively little time is spent analyzing and processing the program.

The resulting code is some sort of intermediate code.

The resulting code is interpreted by another program.

Program execution is relatively slow.

Q.14

Compare one pass and two pass of compiler.


One pass compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
A one pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly.
The difficult part is to resolve future label references and assemble code in one pass.
A one-pass compiler does not "look back" at code it has previously processed.
It is also called a narrow compiler.
A one-pass compiler is faster than a two-pass compiler.
It is unable to generate efficient programs because the compiler has limited scope.
Languages like Pascal can be implemented by a one-pass compiler.

Source code -> Compiler -> Machine code (the compiler also reports errors)
Two pass compiler
A two pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass).
In the first pass, it only looks for label definitions and enters them in the symbol table.
In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Compilers use an Intermediate Representation (IR) for the program being compiled.
A two-pass compiler has a wide scope.
Each pass takes the result of the previous pass as its input and creates an intermediate output.
It is also called a wide compiler.
Languages like C++ can be implemented by a two-pass compiler.

Source code -> Front end -> IR -> Back end -> Machine code (both the front end and the back end report errors)

Language Processor

Q.1

Explain following terms.


Semantics: It represents the rules of meaning of a domain.
Semantic gap: It represents the difference between the semantics of two domains.
Application domain: The designer expresses the ideas in terms related to the application domain of the software.
Execution domain: To implement the ideas of the designer, their description has to be interpreted in terms related to the execution domain of the computer system.
Specification gap: The gap between the application domain and the PL domain is called the specification and design gap.
Execution gap: The gap between the semantics of programs written in different programming languages.

Application domain --(specification gap)--> PL domain --(execution gap)--> Execution domain

Language processor: A language processor is software which bridges a specification or execution gap.
Language translator: A language translator bridges an execution gap to the machine language of a computer system.
Detranslator: It bridges the same execution gap as a language translator, but in the reverse direction.
Preprocessor: It is a language processor which bridges an execution gap but is not a language translator.
Language migrator: It bridges the specification gap between two programming languages.
Interpreter: An interpreter is a language processor which bridges an execution gap without generating a machine language program.
Source language: The language in which the program to be processed is written. There are more than a thousand high level languages that can serve as source languages.
Target language: The language into which the program is translated. Each machine has its own machine language, which serves as a target language.
Problem oriented language: Programming language features directly model aspects of the application domain, which leads to a very small specification gap. Such a programming language can only be used for a specific application; hence it is called a problem oriented language.
Procedure oriented language: A procedure oriented language provides general purpose facilities required in most application domains. Such a language is independent of specific application domains and results in a large specification gap which has to be bridged by an application designer.

Q.2

Explain Language processing activity


There are mainly two types of language processing activity, which bridge the semantic gap between source language and target language:
1. Program generation activities
2. Program execution activities
Program generation
A program generation activity aims at the automatic generation of a program.
A program generator is software which accepts the specification of a program and generates a program in the target language.
The program generator introduces a new domain between the application domain and the programming language domain, called the program generator domain.

Program specification -> Program generator -> Target program (the program generator also reports errors)

Program Execution
Two popular models for program execution are translation and interpretation.
Translation
The program translation model bridges the execution gap by translating a program written in a PL, called the source program, into an equivalent program in the machine or assembly language of the computer system, called the target program.

Source program -> Translator -> M/c language program (target program); the translator also reports errors, and the target program is then executed on its data.

Interpretation
The interpreter reads the source program and stores it in its memory.
The CPU uses the program counter (PC) to note the address of the next instruction to be executed.
Each statement is subjected to the interpretation cycle, which consists of the following steps:
1. Fetch the statement.
2. Analyse the statement and determine its meaning, i.e. the computation to be performed and its operands.
3. Execute the meaning of the statement.

Figure: the interpreter, with its PC, works on a memory holding the source program and data, and reports errors.
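The interpretation cycle can be sketched with a toy interpreter. The statement forms accepted here (name = operand, name = operand op operand, print name) are a hypothetical mini-language invented for this illustration and are not part of the notes.

```python
def interpret(program, data=None):
    memory = dict(data or {})      # data store of the interpreter
    output = []

    def value(operand):
        # An operand is either an integer literal or the name of a variable.
        return int(operand) if operand.lstrip("-").isdigit() else memory[operand]

    pc = 0                         # program counter over source statements
    while pc < len(program):
        words = program[pc].split()                       # 1. fetch
        if len(words) == 2 and words[0] == "print":       # 2. analyse
            output.append(value(words[1]))                # 3. execute
        elif len(words) == 3 and words[1] == "=":
            memory[words[0]] = value(words[2])
        elif len(words) == 5 and words[1] == "=" and words[3] in ("+", "*"):
            left, right = value(words[2]), value(words[4])
            memory[words[0]] = left + right if words[3] == "+" else left * right
        else:
            raise ValueError(f"error in statement {pc}: {program[pc]}")
        pc += 1
    return output


result = interpret(["a = b + i", "print a"], {"b": 2, "i": 3})
```

No target program is produced at any point: each statement is analysed and its meaning carried out immediately.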

Q.3

What is phases and passes of compiler?


Language processing = analysis of SP + synthesis of TP.
Each language processor consists of mainly two phases:
1. Analysis phase
2. Synthesis phase
The analysis phase uses each component of the source language to determine relevant information concerning a statement in the source program. Thus, analysis of a source statement consists of lexical, syntax and semantic analysis (front end).
The synthesis phase is concerned with the construction of target language statements. It includes mainly two activities: memory allocation and code generation (back end).

Language processor: Source program -> Analysis phase -> Synthesis phase -> Target program (both phases report errors)

Language processing could be performed on a statement by statement basis, that is, analysis of a source statement could be immediately followed by synthesis of the equivalent target statements. This may not be feasible due to:
Forward reference: A forward reference of a program entity is a reference to the entity which precedes its definition in the program.
This problem can be solved by postponing the generation of target code until more information concerning the entity becomes available.
It leads to the multipass model of language processing.
Language processor pass: A language processor pass is the processing of every statement in a source program to perform a language processing function.
In Pass I: Perform analysis of the source program and note relevant information.
In Pass II: Analyse the source program once again and generate target code using the type information noted in Pass I.
In this scheme the language processor performs certain processing more than once.
This can be avoided by using an intermediate representation (IR) of the source program.
An intermediate representation is a representation of a source program which reflects the effect of some, but not all, analysis and synthesis tasks performed during language processing.

Source program -> Front end -> Intermediate representation (IR) -> Back end -> Target program
Phases of Language processor (Toy compiler)

Lexical Analysis (Scanning)
Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. ids, constants, keywords etc., and enters them into different tables.
This classification may be based on the nature of a string or on the specification of the source language. (For example, while an integer constant is a string of digits with an optional sign, a reserved id is an id whose name matches one of the reserved names mentioned in the language specification.)
Lexical analysis builds a descriptor, called a token, for each lexical unit. We represent a token as code#no, where code identifies the lexical class and no is the entry number in the corresponding table.
Consider the following code

i: integer;
a,b: real;
a=b+i;

With i, a and b entered in the symbol table in the order of their declaration, the statement a=b+i is represented as the string of tokens

Id#2 Op#1 Id#3 Op#2 Id#1
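A scanner for this one statement form might be sketched as follows. The regular expression, the operator table and the rule that identifiers are numbered in order of declaration (i, a, b) are assumptions made for this illustration; the last one is chosen to agree with the intermediate code shown later in this answer, where Id#1 is i.

```python
import re

OPERATORS = ["=", "+", "-", "*", "/"]   # Op#1 is '=', Op#2 is '+', and so on


def tokenize(statement, identifiers):
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|[=+\-*/]", statement):
        if lexeme in OPERATORS:
            tokens.append(f"Op#{OPERATORS.index(lexeme) + 1}")
        else:
            if lexeme not in identifiers:
                identifiers.append(lexeme)       # new entry in the id table
            tokens.append(f"Id#{identifiers.index(lexeme) + 1}")
    return tokens


identifiers = ["i", "a", "b"]   # entered while processing the declarations
tokens = tokenize("a=b+i;", identifiers)
```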

Syntax analysis (parsing)

Syntax analysis processes the string of tokens to determine the statement class and to check whether the given statement is syntactically valid.
It then builds an IC which represents the structure of the statement. The IC is passed to semantic analysis to determine the meaning of the statement.

Semantic Analysis
Semantic analysis of declaration statements differs from the semantic analysis of imperative statements.
The former results in the addition of information to the symbol table, e.g. type, length and dimensionality of variables.
The latter identifies the sequence of actions necessary to implement the meaning of a source statement.
In both cases the structure of a source statement guides the application of the semantic rules.
When semantic analysis determines the meaning of a sub tree in the IC, it adds information to a table or adds an action to the sequence of actions.
It then modifies the IC to enable further semantic analysis. The analysis ends when the tree has been completely processed. The updated tables and the sequence of actions constitute the IR produced by the analysis phase.
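Steps in the semantic analysis of a = b + i (each line shows the tree after one step):
=( (a, real), +( (b, real), (i, int) ) )
=( (a, real), +( (b, real), (i*, real) ) )
=( (a, real), (temp, real) )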

Intermediate representation
The IR contains intermediate code and tables.
Symbol table
No.   Symbol   Type   Length   Address
1     i        int
2     a        real
3     b        real
4     i*       real
5     temp     real

Intermediate code
1. Convert (Id#1) to real, giving (Id#4)
2. Add (Id#4) to (Id#3), giving (Id#5)
3. Store (Id#5) in (Id#2)
Memory allocation
Memory allocation is a simple task given the presence of the symbol table. The memory requirement of an identifier is computed from its type, length and dimensionality, and memory is allocated to it.
The address of the memory area is entered in the symbol table. After memory allocation, the symbol table looks as shown below.

No.   Symbol   Type   Length   Address
1     i        int             2000
2     a        real            2001
3     b        real            2002
Code generation
The synthesis phase may decide to hold the values of i* and temp in machine registers and may generate the assembly code

CONV_R AREG, I
ADD_R AREG, B
MOVEM AREG, A

Q.4

Explain following terms


Formal language: A language L can be considered to be a collection of valid sentences. Each sentence consists of a sequence of words, and each word consists of a sequence of letters or graphic symbols acceptable in L. A language specified in this manner is known as a formal language.
Terminal symbol: The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. Lower case letters a, b, c are used to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol of L.
Nonterminal symbol: A nonterminal symbol (NT) is the name of a syntax category of a language. An NT is written as a single capital letter or as a name enclosed between < and >.
Production: A production, also called a rewriting rule, is a rule of the grammar. A production has the form
Nonterminal symbol ::= string of Ts and NTs
Ex: <article> ::= a | an | the
Grammar: A grammar G of a language LG is a quadruple (Σ, SNT, S, P) where
Σ = alphabet of LG, the set of terminals
SNT = set of NTs
S = distinguished symbol (start symbol)
P = set of productions
Binding: A binding is the association of an attribute of a program entity with a value.
Binding time: Binding time is the time at which a binding is performed.
Static binding: A static binding is a binding performed before the execution of a program begins.
Dynamic binding: A dynamic binding is a binding performed after the execution of a program has begun.

Q.5

Explain Derivation, Reduction and Parse tree?


Derivation
Let production P1 of grammar G be of the form
P1: A ::= α
and let β be a string such that β = γAθ; then replacement of A by α in string β constitutes a derivation according to production P1.
The derivation operation helps to generate valid strings.

Reduction
Let production P1 of grammar G be of the form
P1: A ::= α
and let σ be a string such that σ = γαθ; then replacement of α by A in string σ constitutes a reduction according to production P1.
The reduction operation helps to recognize valid strings.

Consider the grammar G
<sentence> ::= <noun phrase><verb phrase>
<noun phrase> ::= <article><noun>
<verb phrase> ::= <verb><noun phrase>
<article> ::= a | an | the
<noun> ::= boy | apple
<verb> ::= ate

Derivation example: the following strings are sentential forms of LG
<noun phrase><verb phrase>
<article><noun><verb phrase>
The boy <verb phrase>
The boy <verb><noun phrase>
The boy ate <article><noun>
The boy ate an apple

Reduction example: according to the grammar we perform the following reductions
The boy ate an apple
<article> boy ate an apple
<article><noun> ate an apple
<article><noun><verb> an apple
<article><noun><verb><article> apple
<article><noun><verb><article><noun>
<noun phrase><verb><article><noun>
<noun phrase><verb><noun phrase>
<noun phrase><verb phrase>
<sentence>
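Recognition of a string against grammar G can be sketched with a small top-down recognizer. The dictionary encoding of the productions and the function names are assumptions for illustration; the approach works here because the grammar has no left recursion, and it treats upper and lower case alike.

```python
GRAMMAR = {
    "<sentence>": [["<noun phrase>", "<verb phrase>"]],
    "<noun phrase>": [["<article>", "<noun>"]],
    "<verb phrase>": [["<verb>", "<noun phrase>"]],
    "<article>": [["a"], ["an"], ["the"]],
    "<noun>": [["boy"], ["apple"]],
    "<verb>": [["ate"]],
}


def derive(symbol, words, position=0):
    """Return every position that can follow a string derived from symbol."""
    if symbol not in GRAMMAR:                         # terminal symbol
        matches = position < len(words) and words[position] == symbol
        return [position + 1] if matches else []
    ends = []
    for alternative in GRAMMAR[symbol]:               # try each RHS alternative
        positions = [position]
        for part in alternative:
            positions = [end for p in positions for end in derive(part, words, p)]
        ends.extend(positions)
    return ends


def is_sentence(text):
    words = text.lower().split()
    return len(words) in derive("<sentence>", words)


valid = is_sentence("The boy ate an apple")
invalid = is_sentence("boy the ate an apple")
```

The recognizer checks syntax only, so a string such as "the apple ate a boy" is also accepted by G.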

Parse tree
A sequence of derivations or reductions reveals the syntactic structure of a string with respect to G.
We depict the syntactic structure in the form of a parse tree.
A derivation according to the production A ::= α gives rise to an elemental parse tree whose root is A and whose children are the symbols of α.

Ex:
<sentence>
    <noun phrase>
        <article>  The
        <noun>     boy
    <verb phrase>
        <verb>     ate
        <noun phrase>
            <article>  an
            <noun>     apple

Q.6

Give classification of grammars


Type-0 grammars
These grammars, known as phrase structure grammars or unrestricted grammars, contain productions of the form
A -> X
where both A and X can be strings of Ts and NTs.
Type-1 grammars
These grammars are also known as context sensitive grammars because their productions specify that derivation or reduction of strings can take place only in specific contexts.
A grammar G is said to be context sensitive if all its productions are of the form
X -> Y
where
X is a combination of Ts and NTs with at least one NT,
Y is a non-empty combination of Ts and NTs, and
the length of Y is greater than or equal to the length of X.
Type-2 grammars
A Type-2 grammar is also called a context free grammar.
A grammar is said to be context free if all its productions are of the form
A -> X
where A is a single NT and X is a combination of Ts and NTs.
Type-3 grammars (regular grammars)
Left linear grammar
A grammar is said to be left linear if, in every production, an NT can appear only as the leftmost symbol of the RHS.
A -> Ba | a
Right linear grammar
A grammar is said to be right linear if, in every production, an NT can appear only as the rightmost symbol of the RHS.
A -> a | aB
Operator grammar
An operator grammar is a grammar none of whose productions contains two or more consecutive NTs in any RHS alternative.

Linker & Loader

Q.1

Define the following terms


1) Translation time address: The translation time address is used at translation time. This address is assigned by the translator.
2) Link time address: The link time address is used at link time. This address is assigned by the linker.
3) Load time address: The load time address is used at load time. This address is assigned by the loader.
4) Translated origin: Address of the origin assumed by the translator.
5) Linked origin: Address of the origin assumed by the linker while producing a binary program.
6) Load origin: Address of the origin assumed by the loader while loading the program for execution.

Q.2

Describe in detail how relocation and linking is performed.

Program relocation is the process of modifying the addresses used in the address sensitive instructions of a program such that the program can execute correctly from the designated area of memory.

If linked origin ≠ translated origin, relocation must be performed by the linker.

If load origin ≠ linked origin, relocation must be performed by the loader.

Let AA be the set of absolute addresses (instruction or data addresses) used in the instructions of a program P.

AA ≠ ∅ implies that program P assumes its instructions and data to occupy memory words with specific addresses.

Such a program, called an address sensitive program, contains one or more of the following:

An address sensitive instruction: an instruction which uses an address a_i ∈ AA.

An address constant: a data word which contains an address a_i ∈ AA.

An address sensitive program P can execute correctly only if the start address of the memory area allocated to it is the same as its translated origin.

To execute correctly from any other memory area, the address used in each address sensitive instruction of P must be corrected.

Performing relocation

Let the translated and linked origins of program P be t_origin_P and l_origin_P, respectively.

Consider a symbol symb in P.

Let its translation time address be t_symb and its link time address be l_symb.

The relocation factor of P is defined as

relocation_factor_P = l_origin_P - t_origin_P        .....(1)

Note that relocation_factor_P can be positive, negative or zero.

Consider a statement which uses symb as an operand. The translator puts the address t_symb
in the instruction generated for it. Now,

t_symb = t_origin_P + d_symb

where d_symb is the offset of symb in P. Hence

l_symb = l_origin_P + d_symb

Using (1),

l_symb = t_origin_P + relocation_factor_P + d_symb
       = t_origin_P + d_symb + relocation_factor_P
       = t_symb + relocation_factor_P        .....(2)

Let IRR_P designate the set of instructions requiring relocation in program P. Following (2),
relocation of program P can be performed by computing the relocation factor for P and
adding it to the translation time address(es) in every instruction i ∈ IRR_P.
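
The relocation rule can be illustrated with a minimal Python sketch. The representation of an instruction as an (opcode, register, operand address) tuple, the function name relocate and the sample origins 500 and 900 are assumptions made for illustration only; they are not part of the notes.

```python
def relocate(code, irr, t_origin, l_origin):
    """Return a relocated copy of `code`.

    code: dict mapping translated address -> (opcode, register, operand address)
    irr:  set of translated addresses of instructions requiring relocation
    """
    relocation_factor = l_origin - t_origin            # equation (1)
    relocated = {}
    for t_addr, (opcode, reg, operand) in code.items():
        if t_addr in irr:
            operand += relocation_factor                # equation (2)
        # every word also moves to its own link time address
        relocated[t_addr + relocation_factor] = (opcode, reg, operand)
    return relocated


# Hypothetical two-word program translated with origin 500, linked at 900.
program = {500: ("READ", 0, 540), 501: ("STOP", 0, 0)}
relocated = relocate(program, irr={500}, t_origin=500, l_origin=900)
print(relocated[900])
```

Only the instruction listed in the relocation set has its operand changed; the STOP word is moved but not modified.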

Linking

Consider an application program AP consisting of a set of program units SP = {Pi}.

A program unit Pi interacts with another program unit Pj by using addresses of Pj's
instructions and data in its own instructions.

To realize such interactions, Pj and Pi must contain public definitions and external references
as defined in the following: (Explain public definition and external reference)

Public definition: a symbol pub_symb defined in a program unit which may be
referenced in other program units.

External reference: a reference to a symbol ext_symb which is not defined in the
program unit containing the reference.

Q.3

What is program relocation? Explain characteristics of self-relocating programs.


Definition (program relocation): Program relocation is the process of modifying the addresses used in
the address sensitive instruction of a program such that the program can execute correctly from the
designated area of memory.
Self Relocating Programs

A self relocating program is a program which can perform the relocation of its own address
sensitive instructions.

It contains the following two provisions for this purpose:

A table of information concerning the address sensitive instructions exists as a part of
the program.

Code to perform the relocation of address sensitive instructions also exists as a part
of the program. This is called the relocating logic.

The start address of the relocating logic is specified as the execution start address of the
program.

Thus the relocating logic gains control when the program is loaded in memory for the
execution.

It uses the load address and the information concerning address sensitive instructions to
perform its own relocation.

Execution control is now transferred to the relocated program.

A self relocating program can execute in any area of the memory.

This is very important in time sharing operating systems where the load address of a
program is likely to be different for different executions.

Q.4

Explain design of linker


1. Relocation & linking requirement in segmented addressing

Use of the segmented addressing structure reduces the relocation requirement of a
program.

Sr.No.   Statement                                    Offset
0001     DATA_HERE  SEGMENT
0002     ABC        DW      25                        0000
0003     B          DW      ?                         0002
0012     SAMPLE     SEGMENT
0013                ASSUME  CS:SAMPLE, DS:DATA_HERE
0014                MOV     AX, DATA_HERE             0000
0015                MOV     DS, AX                    0003
0016                JMP     A                         0005
0017                MOV     AL, B                     0008
0027     A          MOV     AX, BX                    0196
0043     SAMPLE     ENDS
0044                END

Consider the above program. The ASSUME statement declares the segment registers
CS and DS to be available for memory addressing.

Hence all memory addressing is performed using suitable displacements from their
contents.

The translation time address of A is 0196. In statement 16, a reference to A is assembled
as a displacement of 0196 from the content of the CS register.

This avoids the use of an absolute address; hence the instruction is not address
sensitive. No relocation is needed if segment SAMPLE is to be loaded at
address 2000, because the CS register would be loaded with the address 2000 by a
calling program.

The effective operand address would be calculated as <CS>+0196, which is the
correct address 2196.

A similar situation exists with the reference to B in statement 17. The reference to B
is assembled as a displacement of 0002 from the content of the DS register.

Since the DS register would be loaded with the execution time address of
DATA_HERE, the reference to B would be automatically relocated to the correct address.
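
The address calculation described here can be sketched in Python. This follows the simplified model of these notes, in which the effective address is the content of the segment register plus the displacement; the function name and the second load address 5000 are illustrative assumptions.

```python
def effective_address(segment_register, displacement):
    """Effective address = content of the segment register + displacement."""
    return segment_register + displacement


OFFSET_A = 196   # displacement of A within segment SAMPLE (from the listing)

cs = 2000                                   # SAMPLE loaded at address 2000
address_of_a = effective_address(cs, OFFSET_A)

cs = 5000                                   # hypothetical: SAMPLE loaded elsewhere
address_of_a_moved = effective_address(cs, OFFSET_A)

print(address_of_a, address_of_a_moved)
```

The instruction itself still holds only the displacement 196 in both cases, which is why it needs no relocation when the segment is moved.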

2. Linking requirement

In FORTRAN all program units are translated separately, hence all sub program calls
and common variable references require linking.

Pascal procedures are typically nested inside the main program; hence procedure
references do not require linking.

In C, program files are translated separately, so only function calls that
cross file boundaries and references to global data require linking.

A name table (NTAB) is defined for use in program linking. Each entry of the table
contains the following fields:
Symbol: symbolic name of an external reference or an object module
Linked_address: for a public definition, this field contains linked address of the
symbol. For an object module, it contains the linked origin of the object module.

Q.5

Write a brief note on MS-DOS linker

We discuss the design of a linker for the Intel 8088/80x86 processors which resembles LINK
of MS DOS in many respects.

It may be noted that the object modules of MS DOS differ from the Intel specifications in
some respects.

Object Module Format (Explain object module of the program)

An Intel 8088 object module is a sequence of object records, each object record describing
specific aspects of the programs in the object module.

There are 14 types of object records containing the following five basic categories of
information:

Binary image (i.e. code generated by a translator)

External references

Public definitions

Debugging information (e.g. line number in source program).

Miscellaneous information (e.g. comments in the source program).

We only consider the object records corresponding to first three categories-a total of eight
object record types.

Each object record contains variable length information and may refer to the contents of
previous object records.

Each name in an object record is represented in the following format:

length (1 byte) | name

THEADR, LNAMES and SEGDEF records

THEADR record

80H | length | T-module name | check-sum

The module name in the THEADR record is typically derived by the translator from the source
file name.

This name is used by the linker to report errors.

An assembly programmer can specify the module name in the NAME directive.

LNAMES record

96H | length | name-list | check-sum

The LNAMES record lists the names for use by SEGDEF records.

SEGDEF record

98H | length | attributes (1-4) | segment length (2) | name index (1) | check-sum

A SEGDEF record designates a segment name using an index into the name list of the LNAMES record.

The attributes field of a SEGDEF record indicates whether the segment is relocatable or
absolute, whether (and in what manner) it can be combined with other segments, as also the
alignment requirement of its base address (e.g. byte, word or paragraph, i.e. 16 byte,
alignment).

Stack segments with the same name are concatenated with each other, while common
segments with the same name are overlapped with one another.

The attribute field also contains the origin specification for an absolute segment.

EXTDEF and PUBDEF records

EXTDEF record

8CH | length | external reference list | check-sum

PUBDEF record

90H | length | base (2-4) | name | offset | ... | check-sum

The EXTDEF record contains a list of external references used by the programs of this
module.

A FIXUPP record designates an external symbol name by using an index into this list.

A PUBDEF record contains a list of public names declared in a segment of the object module.

The base specification identifies the segment.

Each (name, offset) pair in the record defines one public name, specifying the name of the
symbol and its offset within the segment designated by the base specification.

LEDATA records

A0H | length | segment index (1-2) | data offset (2) | data | check-sum

An LEDATA record contains the binary image of the code generated by the language
translator.

Segment index identifies the segment to which the code belongs, and offset specifies the
location of the code within the segment.


FIXUPP record

9CH | length | locat (1) | fix dat (1) | frame datum (1) | target datum (1) | target offset (2) | ... | checksum

A FIXUPP record contains information for one or more relocation and linking fixups to be
performed.

The locat field contains a numeric code called loc code to indicate the type of a fixup.

The meanings of these codes are given in the table below.

Loc code    Meaning
0           Low order byte is to be fixed.
1           Offset is to be fixed.
2           Segment is to be fixed.
3           Pointer (i.e., segment: offset) is to be fixed.

locat also contains the offset of the fixup location in the previous LEDATA record.

The frame datum field, which refers to a SEGDEF record, identifies the segment to which the
fixup location belongs.

The target datum and target offset fields specify the relocation or linking information.

Target datum contains a segment index or an external index, while target offset contains an
offset from the name indicated in target datum.

The fix dat field indicates the manner in which the target datum and target offset fields are to
be interpreted.

The numeric codes used for this purpose are given in the table below.

Code    Contents of target datum and offset fields
0       Segment index and displacement.
2       External index and target displacement.
4       Segment index (offset field is not used).
6       External index (offset field is not used).

MODEND record

8AH | length | type (1) | start addr (5) | check-sum

The MODEND record signifies the end of the module, with the type field indicating whether it
is the main program.

This record also optionally indicates the execution start address.

This has two components: (a) the segment, designated as an index into the list of segment
names defined in SEGDEF record(s), and (b) an offset within the segment.
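
The common shape of these records (type byte, length, contents, check-sum) can be sketched in Python. The notes do not state the width of the length field or the check-sum rule; the sketch assumes the usual Intel convention of a 2-byte little-endian length that counts the contents plus the check-sum byte, and a check-sum chosen so that all bytes of the record sum to 0 modulo 256. The helper names are hypothetical.

```python
def build_record(rec_type, payload):
    """Assemble one object record: type, length, payload, check-sum."""
    length = len(payload) + 1                        # payload plus the check-sum byte
    body = bytes([rec_type]) + length.to_bytes(2, "little") + payload
    checksum = (-sum(body)) % 256                    # makes the byte sum 0 modulo 256
    return body + bytes([checksum])


def parse_record(data):
    """Split one record into (type, payload, check-sum is valid)."""
    rec_type = data[0]
    length = int.from_bytes(data[1:3], "little")
    payload = data[3:3 + length - 1]
    valid = sum(data[:3 + length]) % 256 == 0
    return rec_type, payload, valid


def parse_name(payload):
    """A name is stored as a 1-byte length followed by the characters."""
    n = payload[0]
    return payload[1:1 + n].decode("ascii")


module_name = b"SAMPLE"
theadr = build_record(0x80, bytes([len(module_name)]) + module_name)
rec_type, payload, valid = parse_record(theadr)
print(hex(rec_type), parse_name(payload), valid)
```

The same two helpers would apply to the other record types; only the interpretation of the payload differs from record to record.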

Q.6

What is an overlay? Explain overlay structured program and its execution.

An overlay is part of a program (or software package) which has the same load origin as
some other part of the program.

Overlay is used to reduce the main memory requirement of a program.

Overlay structured program

We refer to a program containing overlays as an overlay structured program. Such a
program consists of

A permanently resident portion, called the root.

A set of overlays.

Execution of an overlay structured program proceeds as follows:

To start with, the root is loaded in memory and given control for the purpose of execution.

Other overlays are loaded as and when needed.

Note that the loading of an overlay overwrites a previously loaded overlay with the same load
origin.

This reduces the memory requirement of a program.

It also makes it possible to execute programs whose size exceeds the amount of memory
which can be allocated to them.

The overlay structure of a program is designed by identifying mutually exclusive modules,
that is, modules which do not call each other.

Such modules do not need to reside simultaneously in memory.

Execution of an overlay structured program

For linking and execution of an overlay structured program in MS DOS the linker produces a
single executable file at the output, which contains two provisions to support overlays.

First, an overlay manager module is included in the executable file.

This module is responsible for loading the overlays when needed.

Second, all calls that cross overlay boundaries are replaced by an interrupt producing
instruction.

To start with, the overlay manager receives control and loads the root.

A procedure call which crosses overlay boundaries leads to an interrupt.

This interrupt is processed by the overlay manager and the appropriate overlay is loaded into
memory.
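
The behaviour of the overlay manager can be sketched as follows. This is a toy model under stated assumptions: overlays are identified by name, each has a load origin, and a call across overlay boundaries is represented by an ordinary method call instead of an interrupt. The class and attribute names are hypothetical.

```python
class OverlayManager:
    def __init__(self, overlays):
        # overlays: name -> load origin of that overlay
        self.overlays = overlays
        self.resident = {}      # load origin -> name of the overlay currently loaded there
        self.loads = 0          # number of times an overlay had to be loaded

    def call(self, name):
        """Handle a call that crosses overlay boundaries."""
        origin = self.overlays[name]
        if self.resident.get(origin) != name:
            # Loading overwrites whatever overlay shared this load origin.
            self.resident[origin] = name
            self.loads += 1
        return f"running {name} at {origin}"


manager = OverlayManager({"A": 1000, "B": 1000, "C": 1500})
manager.call("A")
manager.call("A")      # already resident, no load needed
manager.call("B")      # same load origin as A, so A is overwritten
manager.call("C")
print(manager.loads, manager.resident)
```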

When each overlay is structured into a separate binary program, as in IBM mainframe
systems, a call which crosses overlay boundaries leads to an interrupt which is attended by
the OS kernel.

Control is now transferred to the OS loader to load the appropriate binary program.

Q.7

Explain different loading schemes


1) Compile & go loader

The assembler is loaded in one part of memory and places the assembled program directly into its
assigned memory locations.

After the loading process is complete, the assembler transfers the control to the starting
instruction of the loaded program.

Advantages
The user need not be concerned with the separate steps of compilation, assembling,
linking, loading, and executing.

Execution speed is generally much superior to interpreted systems.

They are simple and easier to implement.

Figure: Source program -> Compile & go assembler -> Program loaded in memory (alongside the assembler)

Disadvantages
There is wastage in memory space due to the presence of the assembler.

The code must be reprocessed every time it is run.

2) Absolute loader

It is a simple type of loader scheme which fits object code into main memory without
relocation.

This loader accepts the machine text and places it into main memory at the location prescribed by the
translator.

Advantage

Very simple

Disadvantage

Programmer must specify load address

In an environment with multiple subroutines, the programmer is required to do the linking.

3) Subroutine linkage loader

A program unit Pi interacts with another program unit Pj by using addresses of Pj's instructions
and data in its own instructions.

To realize such interactions, Pj and Pi must contain public definitions and external references.

Public definition: a symbol defined in a program unit which may be referenced in other program units.

External reference: a reference to a symbol which is not defined in the program unit containing the reference.

ENTRY statement: lists the public definitions of the program unit.

EXTRN statement: lists the symbols to which external references are made in the program
unit.
4) Relocating loader (BSS loader)

To avoid reassembling all subroutines when a single subroutine is changed, and to
perform the tasks of allocation and linking for the programmer, the general class of relocating
loaders was introduced.

The binary symbolic loader (BSS) is an example of a relocating loader.

The output of an assembler using the BSS loader is

1. Object program
2. References to other programs to be accessed
3. Information about address sensitive entities.

Let us consider a program segment as shown below

Offset=10    ADD AREG,X
Offset=30    X DS 1

In the above program the address of variable X in the instruction ADD AREG, X will be 30.

If this program is loaded from the memory location 500 for execution then the address of X
in the instruction ADD AREG, X must become 530.

Before loading:
Offset=10    ADD AREG,X
Offset=30    X DS 1

After loading (load origin = 500):
             ADD AREG,X
530          X DS 1

Use of a segment register makes a program address insensitive.

The actual address is given by content of segment register + address of operand in instruction.

So, 500+30=530 is the actual address of variable X.

5) Direct linking loader

It is a general relocatable loader and is perhaps the most popular loading scheme presently
used.

Advantages

Allows multiple segments

Allows multiple data segments

Flexible intersegment referencing

Accessing ability

Relocation facility

Disadvantage

Not suitable in multitasking

6) Dynamic loader

It uses an overlay structure scheme.

In order for the overlay structure to work, it is necessary for the module loader to load the
various procedures as they are needed.

The portion of the loader that actually interprets the calls and loads the necessary procedures
is called the overlay supervisor or flipper.

This overlay scheme is called dynamic loading or load on call (LOCAL).

Q.8

An algorithm for first pass of a linker


1. Extract load_origin from the command line.
2. Repeat step 3 for each module to be linked.
3. Select the next object module from the command line. For each record in the object
module:
(a) If an LNAMES record then enter the names from the record in the name directory
(NAMED).
(b) If a SEGDEF record then
    (i) i = name index from the record
    (ii) segment_name = NAMED[i]
    (iii) If an absolute segment then enter (segment_name, segment_addr) in ESD.
    (iv) If the segment is relocatable then
         Align load_origin with the next paragraph. It should be a multiple of 16.
         Enter (segment_name, load_origin) in ESD.
         load_origin = load_origin + segment length.
(c) If a PUBDEF record then
    (i) i = base
    (ii) segment_name = NAMED[i], symbol = name
    (iii) segment_addr = load address of segment_name in ESD
    (iv) symbol_addr = segment_addr + offset
    (v) Enter (symbol, symbol_addr) in ESD.
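
The first pass can be sketched in Python as follows. The sketch assumes that the object records have already been decoded into tuples; the tuple layouts, the 1-based name index, the segment lengths and the function names are illustrative assumptions, not the actual record encoding.

```python
def align_paragraph(address):
    """Round an address up to the next multiple of 16 (one paragraph)."""
    return (address + 15) // 16 * 16


def first_pass(load_origin, modules):
    """Build the ESD (name -> address) for a list of object modules."""
    esd = {}
    for records in modules:                       # steps 2 and 3
        named = [None]                            # name directory; index 0 unused
        for rec in records:
            kind = rec[0]
            if kind == "LNAMES":                  # 3(a)
                named.extend(rec[1])
            elif kind == "SEGDEF":                # 3(b)
                _, name_index, seg_length, abs_addr = rec
                segment_name = named[name_index]
                if abs_addr is not None:          # absolute segment
                    esd[segment_name] = abs_addr
                else:                             # relocatable segment
                    load_origin = align_paragraph(load_origin)
                    esd[segment_name] = load_origin
                    load_origin += seg_length
            elif kind == "PUBDEF":                # 3(c)
                _, base, symbol, offset = rec
                segment_addr = esd[named[base]]
                esd[symbol] = segment_addr + offset
    return esd


# Hypothetical decoded records for one module.
module = [
    ("LNAMES", ["DATA_HERE", "SAMPLE"]),
    ("SEGDEF", 1, 4, None),        # DATA_HERE, 4 bytes long, relocatable
    ("SEGDEF", 2, 200, None),      # SAMPLE, 200 bytes long, relocatable
    ("PUBDEF", 2, "A", 196),       # A is defined at offset 196 in SAMPLE
]
esd = first_pass(2000, [module])
print(esd)
```

The second segment does not start at 2004: its origin is first aligned to the next paragraph boundary, as step (b)(iv) requires.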

Q.9

Object module of linker

The object module of a program contains all information necessary to relocate and link the
program with other programs.

The object module of a program P consists of 4 components:

1. Header: The header contains translated origin, size and execution start address of P.

2. Program: This component contains the machine language program corresponding to P.

3. Relocation table (RELOCTAB): This table describes IRR_P. Each RELOCTAB entry contains a
single field:
Translated address: Translated address of an address sensitive instruction.

4. Linking table (LINKTAB): This table contains information concerning the public definitions and
external references in P.
Each LINKTAB entry contains three fields:
Symbol: Symbolic name.
Type: PD/EXT, indicating whether public definition or external reference.
Translated address: For a public definition, this is the address of the first memory word
allocated to the symbol. For an external reference, it is the address of
the memory word which is required to contain the address of the symbol.

Example:

Statement                        Address    Code
        START   500
        ENTRY   TOTAL
        EXTRN   MAX, ALPHA
        READ    A                500)       + 09 0 540
LOOP                             501)
        .
        .
        MOVER   AREG, ALPHA      518)       + 04 1 000
        BC      ANY, MAX         519)       + 06 6 000
        .
        .
        BC      LT, LOOP         538)       + 06 1 501
        STOP                     539)       + 00 0 000
A       DS      1                540)
TOTAL   DS      1                541)
        END

1. Translated origin = 500, size = 42, execution start address = 500.
2. Machine language instructions shown in the Code column.
3. Relocation table
   500
   538
4. Linking table
   ALPHA    EXT    518
   MAX      EXT    519
   A        PD     540
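
As an illustration, the four components of this object module can be written down as a Python structure and then relocated and linked. The dictionary layout, the function name, the linked origin 900 and the link time addresses chosen for ALPHA and MAX are assumptions made for this sketch; only the table contents come from the example.

```python
object_module = {
    "header": {"translated_origin": 500, "size": 42, "start": 500},
    # translated address -> [opcode, register, operand address]
    "code": {500: [9, 0, 540], 518: [4, 1, 0], 519: [6, 6, 0],
             538: [6, 1, 501], 539: [0, 0, 0]},
    "reloctab": [500, 538],
    "linktab": [("ALPHA", "EXT", 518), ("MAX", "EXT", 519), ("A", "PD", 540)],
}


def relocate_and_link(module, linked_origin, ntab):
    """Return the linked code; ntab maps symbol -> link time address."""
    factor = linked_origin - module["header"]["translated_origin"]
    code = {addr: list(word) for addr, word in module["code"].items()}
    for addr in module["reloctab"]:               # relocation
        code[addr][2] += factor
    for symbol, kind, addr in module["linktab"]:  # record public definitions
        if kind == "PD":
            ntab[symbol] = addr + factor
    for symbol, kind, addr in module["linktab"]:  # resolve external references
        if kind == "EXT":
            code[addr][2] = ntab[symbol]
    return {addr + factor: word for addr, word in code.items()}


# Hypothetical link time addresses of symbols defined in other modules.
ntab = {"ALPHA": 1200, "MAX": 1300}
linked = relocate_and_link(object_module, 900, ntab)
print(linked[900], linked[918], linked[919], linked[938])
```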

MacroProcessors

Q.1

Explain macro, macro definition and Macro call


Macro: A macro is a unit of specification for program generation through expansion.
Macro definition: A macro definition is enclosed between a macro header statement and a macro end statement.

A macro definition consists of

1. Macro prototype statement: it declares the macro name and the formal parameter list.

2. One or more model statements: statements from which assembly statements can be generated.

3. Macro preprocessor statements: used to perform auxiliary functions.

Macro call: A macro is called by writing the macro name in the mnemonic field, followed by a set of actual
parameters.
<macro name> [<actual parameter name>]

Q.2

Explain macro expansion


A macro call leads to macro expansion. During macro expansion, the macro call statement is replaced
by a sequence of assembly statements.
Each expanded statement is marked with a + preceding its label field.
Two key notions concerning macro expansion are:
1. Expansion time control flow
2. Lexical substitution
1. Flow of control during expansion

This determines the order in which model statements are visited during macro expansion.

Default flow of control during macro expansion is sequential.

A preprocessor statement can alter the flow of control during expansion such that some model
statements are either never visited during expansion (conditional expansion) or are repeatedly visited
during expansion (expansion time loop).

The flow of control during macro expansion is implemented using a macro expansion
counter (MEC).

Algorithm:
1. MEC:= statement number of first statement following the prototype statement;
2. While statement pointed by MEC is not a MEND statement
(a) If a model statement then
(i) Expand the statement.
(ii) MEC:= MEC+1;
(b) Else (i.e. a preprocessor statement)
(i) MEC:= new value specified in the statement;
3. Exit from macro expansion

MEC is set to point at the statement following the prototype statement. It is incremented by 1 after
expanding a model statement.
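
The MEC algorithm can be sketched in Python. In this sketch a macro body is a list of tuples, statement 0 is the prototype, and the only preprocessor statement modelled is an unconditional transfer that names the statement number to continue from; this representation and the sample macro body are assumptions for illustration.

```python
def expand(body):
    """Expand a macro body; body[0] is the prototype statement."""
    output = []
    mec = 1                                   # step 1: first statement after the prototype
    while body[mec][0] != "MEND":             # step 2
        kind = body[mec][0]
        if kind == "MODEL":
            output.append("+ " + body[mec][1])    # 2(a)(i): expand the statement
            mec += 1                              # 2(a)(ii)
        else:
            mec = body[mec][1]                    # 2(b)(i): new value given in the statement
    return output                             # step 3


# Hypothetical macro body: the transfer at statement 2 skips statement 3.
body = [
    ("PROTOTYPE", "SAMPLE_MACRO"),
    ("MODEL", "MOVER AREG, X"),
    ("AGO", 4),
    ("MODEL", "ADD AREG, Y"),
    ("MODEL", "MOVEM AREG, X"),
    ("MEND",),
]
expanded = expand(body)
print(expanded)
```

The statement at position 3 is never visited, which is the conditional expansion case mentioned above.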


2. Lexical substitution
A model statement consists of 3 types of strings:
1. An ordinary string, which stands for itself.
2. The name of a formal parameter, which is preceded by the character &.
3. The name of a preprocessor variable, which is also preceded by the character &.
During lexical expansion, strings of type 1 are retained without substitution. Strings of types 2 and 3
are replaced by the values of the formal parameters or preprocessor variables.
2.1 Positional parameters

A positional formal parameter starts with the '&' sign and is defined in the operand field of the
macro name.

The actual parameters in a call on a macro using positional parameters are simply ordinary
strings.

The value of the first actual parameter of the macro call is assigned to the first positional formal
parameter defined in the operand field of the macro name.

The value of the second actual parameter of the macro call is assigned to the second positional formal
parameter defined in the operand field of the macro name.

Similarly, the value of the nth actual parameter is assigned to the nth positional formal parameter
defined in the operand field of the macro name.

A positional parameter is always used in place of operand 2.

The value of a positional parameter should not be a keyword.

2.2 Keyword parameters

A keyword formal parameter starts with &KW string or &OP string or &REG or &CC depending
on macro processor. It is defined in operand field of macro name.

A keyword formal parameter ends with = sign depending on macro processor. It is defined in
operand field of macro name.

A formal keyword parameter may or may not have a default value. Again, this depends on the macro
processor.

The actual parameter in a call on a macro using keyword parameters is simply an ordinary
string if it is used as a positional parameter.

A keyword parameter is always used in place of the mnemonic instruction or in place of
operand 1.

The value of a keyword parameter is always a keyword, such as ADD, SUB, AREG, BREG, LT or LE.

2.3 Label parameters

A label formal parameter starts with &LAB string depending on macro processor. It is defined
in operand field of macro name.

A label formal parameter ends with = sign depending on macro processor. It is defined in
operand field of macro name.

Every label formal parameter should not have any default value. Again this depends on macro
processor.

The actual parameter of macro call on macro using label parameter is simply ordinary string if
they are used as a positional parameter.

A label parameter is always used in place of the label field.

The value of a label parameter should not be a keyword.

2.4 Macros with mixed parameter lists

A macro may be defined to use all kinds of parameters, i.e. positional parameters, keyword parameters and
label parameters.

Q.3

Explain types of parameter


Positional parameter
A positional formal parameter is written as &<parameter name>.
The value of a positional parameter XYZ is determined by the rule of positional association as
follows:
1. Find the ordinal position of XYZ in the list of formal parameters in the macro prototype
statement.
2. Find the actual parameter specification occupying the same ordinal position in the list
of actual parameters in the macro call statement.

Keyword parameter
Keyword parameters are used for the following purposes:
1. A default value can be assigned to the parameter.
2. During a macro call, a keyword parameter is specified by its name. It takes the
following form:
<parameter name>=<parameter value>

MACRO
INCR &VARIABLE=X, &INCR=Y, &REG=AREG
MEND

VARIABLE is a keyword parameter with default value X.
INCR is a keyword parameter with default value Y.
REG is a keyword parameter with default value AREG.
The position of a keyword parameter in a macro call is not important.
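
Keyword association for the INCR prototype above can be sketched in Python. The function name and the two sample calls are assumptions for illustration; the defaults are those given in the prototype.

```python
def bind_keyword_parameters(defaults, actuals):
    """Return parameter values for a call written as NAME=value specifications."""
    values = dict(defaults)                   # start from the default values
    for spec in actuals:
        name, value = spec.split("=", 1)
        if name not in values:
            raise KeyError(f"unknown keyword parameter: {name}")
        values[name] = value                  # an actual parameter overrides the default
    return values


# Defaults taken from: INCR &VARIABLE=X, &INCR=Y, &REG=AREG
defaults = {"VARIABLE": "X", "INCR": "Y", "REG": "AREG"}

# Two hypothetical calls that differ only in the order of the parameters.
first = bind_keyword_parameters(defaults, ["INCR=B", "VARIABLE=A"])
second = bind_keyword_parameters(defaults, ["VARIABLE=A", "INCR=B"])
print(first)
```

Both calls give the same association, and REG keeps its default value AREG because neither call mentions it.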

Q.4

Compare the features of subroutine and macros with respect to following: (i) Execution
Speed (ii) Processing requirement by assembler (iii) Flexibility and generality

Macros use string replacement for their invocation whereas subroutines use calls.

Due to this replacement, a macro can exist as multiple copies in the program whereas
a subroutine exists as only one copy.

Because multiple copies are possible, you cannot obtain a macro's address, whereas you can
obtain a subroutine's address.

Macros can be faster since they do not have the calling and return time penalty.

Macros can be harder to debug since the replaced text may be hard to recognize in the resulting code.

(i) Execution speed

MACRO

Each macro call is replaced by the macro definition when the program is assembled, i.e. the
macro is expanded into the main program.

This process does not require any stack manipulation operation during the execution of the program.

It requires extra processing time for expansion, but only once.

Speed of its object code is very fast because it does not require any stack manipulation.

SUBROUTINE

At the time of execution, execution control transfers to the subroutine, and after
execution of the subroutine it returns to the main program again and executes the
remaining instructions.

This process requires stack manipulation operations during the execution of the program.
That means it stores the current address on the stack and execution control goes to the
subroutine; after execution of the subroutine it pops the address from the stack and returns to the main
program.

It does not require extra processing time for expansion, but every subroutine
call requires stack manipulation operations.

Speed of its object code is slower because it requires stack manipulation at
each subroutine call.

(ii) Processing requirement by assembler

MACRO

In assembly language, a macro should be defined before the main program.

In a high level language, a macro can be defined anywhere, i.e. before or after the main program. This
depends on the high level language.

A macro call statement in assembly language is as follows:

[Label] <Macro name> [<parameter list>]

Example:
FACTORIAL A, FACT

where FACTORIAL is the name of the macro and A, FACT is the list of actual parameters.

SUBROUTINE

In assembly language as well as in a high level language, a subroutine can be defined anywhere,
i.e. before or after the main program.

A subroutine call in assembly language is as follows:

[Label] CALL <Subroutine name>

Example:
CALL FACTORIAL

where FACTORIAL is the name of the subroutine.

(iii) Flexibility and generality

MACRO

In assembly level programming, facilities like looping and nested looping can be used in macros.

In high level programming, looping and nested looping should not be used in macros.

Its object code requires a large amount of main memory as well as secondary memory.

It is suitable in a real time operating system or environment.

Here the time factor is more important than space.

SUBROUTINE

We can use looping and nesting in subroutines in low level as well as in high level programming.

Its object code requires a smaller amount of main memory as well as secondary memory.

It is not suitable in a real time operating system or environment.

Here the space factor is more important than time.

Q.5

Explain nested macro calls OR


Define two macros of your choice to illustrate nested calls to these macros. Also show their
corresponding expansion.

A model statement in a macro may constitute a call on another macro. Such calls are known
as nested macro calls.

We refer to the macro containing the nested call as the outer macro and the called macro as
the inner macro.

Expansion of nested macro calls follows the last-in-first-out (LIFO) rule. Thus, in a structure of
nested macro calls, expansion of the latest macro call (i.e. the innermost macro call in the
structure) is completed first.

Example
The definition of the INCR_D macro is given below.

        MACRO
        INCR_D    &MEM_VAL=, &INCR_VAL=, &REG=AREG
        MOVER     &REG, &MEM_VAL
        ADD       &REG, &INCR_VAL
        MOVEM     &REG, &MEM_VAL
        MEND

Macro COMPUTE defined below contains a nested call on macro INCR_D defined above.

        MACRO
        COMPUTE   &FIRST, &SECOND
        MOVEM     BREG, TMP
        INCR_D    &FIRST, &SECOND, REG=BREG
        MOVER     BREG, TMP
        MEND

The expanded code for the call

        COMPUTE   X, Y

is described as follows.

COMPUTE X, Y
    + MOVEM BREG, TMP        [1]
    INCR_D X, Y
        + MOVER BREG, X      [2]
        + ADD   BREG, Y      [3]
        + MOVEM BREG, X      [4]
    + MOVER BREG, TMP        [5]
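
The LIFO order of nested expansion can be sketched in Python with a recursive expander. The macro table layout and the function name are assumptions for illustration, and the sketch treats MEM_VAL and INCR_VAL of INCR_D as positional parameters because the nested call passes them by position.

```python
import re

# name -> (positional formal parameters, keyword defaults, model statements)
MACROS = {
    "INCR_D": (["MEM_VAL", "INCR_VAL"], {"REG": "AREG"},
               ["MOVER &REG, &MEM_VAL", "ADD &REG, &INCR_VAL", "MOVEM &REG, &MEM_VAL"]),
    "COMPUTE": (["FIRST", "SECOND"], {},
                ["MOVEM BREG, TMP", "INCR_D &FIRST, &SECOND, REG=BREG", "MOVER BREG, TMP"]),
}


def expand(statement):
    """Return the list of assembly statements generated for one statement."""
    name, _, rest = statement.partition(" ")
    if name not in MACROS:
        return ["+ " + statement]                 # ordinary statement, not a macro call
    formals, defaults, body = MACROS[name]
    values = dict(defaults)
    position = 0
    for actual in [a.strip() for a in rest.split(",")]:
        if "=" in actual:                         # keyword parameter
            key, value = actual.split("=", 1)
            values[key] = value
        else:                                     # positional parameter
            values[formals[position]] = actual
            position += 1
    expanded = []
    for model in body:
        line = re.sub(r"&(\w+)", lambda m: values[m.group(1)], model)
        expanded.extend(expand(line))             # a nested call is completed first
    return expanded


result = expand("COMPUTE X, Y")
for line in result:
    print(line)
```

The recursion plays the role of the stack: expansion of COMPUTE is suspended while the inner call on INCR_D is expanded, and resumes afterwards.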

Q.6

Advanced macro facilities

1. Alteration of flow of control during expansion


Expansion time statements OR (Explain expansion time statements AIF and AGO for macro
programming)

AIF

An AIF statement has the syntax:

AIF (<expression>) <sequencing symbol>

where <expression> is a relational expression involving ordinary strings, formal parameters
and their attributes, and expansion time variables.

If the relational expression evaluates to true, expansion time control is transferred to the
statement containing <sequencing symbol> in its label field.

AGO

An AGO statement has the syntax:

AGO <sequencing symbol>

It unconditionally transfers expansion time control to the statement containing <sequencing
symbol> in its label field.

Expansion time loops OR (Explain expansion time loop)

It is often necessary to generate many similar statements during the expansion of a macro.

This can be achieved by writing similar model statements in the macro.

Expansion time loops can be written using expansion time variables (EVs) and expansion time
control transfer statements AIF and AGO.

Example

        MACRO
        CLEAR   &X, &N
        LCL     &M
&M      SET     0
        MOVER   AREG, =0
.MORE   MOVEM   AREG, &X+&M
&M      SET     &M + 1
        AIF     (&M NE &N) .MORE
        MEND

The LCL statement declares M to be a local EV.

Consider, for example, the call CLEAR B, 3.

At the start of expansion of the call, M is initialized to zero.

The expansion of the model statement MOVEM AREG, &X+&M thus leads to generation of the
statement MOVEM AREG, B.

The value of M is incremented by 1 and the model statement MOVEM.. is expanded
repeatedly until its value equals the value of N.
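
The effect of this expansion time loop can be mimicked in Python. The call CLEAR B, 3 used here is an assumed example, and the function name is hypothetical; the loop variable m plays the role of the EV M.

```python
def expand_clear(x, n):
    """Generate the statements produced by expanding CLEAR x, n."""
    output = ["+ MOVER AREG, =0"]
    m = 0                                       # &M SET 0
    while True:
        suffix = f"+{m}" if m else ""           # &X+&M, written as plain &X when M is 0
        output.append(f"+ MOVEM AREG, {x}{suffix}")   # .MORE MOVEM AREG, &X+&M
        m += 1                                  # &M SET &M+1
        if m == n:                              # AIF falls through once M equals N
            break
    return output


generated = expand_clear("B", 3)
for line in generated:
    print(line)
```

The single MOVEM model statement is visited three times, once for each value of M from 0 to 2.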

2. Expansion time variable or (Explain expansion time variable with example)

Expansion time variables (EV's) are variables which can only be used during the expansion of
macro calls.

A local EV is created for use only during a particular macro call.

A global EV exists across all macro calls situated in a program and can be used in any macro
which has a declaration for it.

Local and global EV's are created through declaration statements with the following syntax:

LCL <EV specification> [, <EV specification> .. ]

GBL <EV specification> [, <EV specification> .. ]

<EV specification> has the syntax &<EV name>, where <EV name> is an ordinary string.

Values of EV's can be manipulated through the preprocessor statement SET.

A SET statement is written as:

<EV specification> SET <SET-expression>

where <EV specification> appears in the label field and SET in the mnemonic field.

A SET statement assigns the value of <SET-expression> to the EV specified in <EV specification>.
Example

        MACRO
        CONSTANTS
        LCL     &A
&A      SET     1
        DB      &A
&A      SET     &A+1
        DB      &A
        MEND

The local EV A is created.

The first SET statement assigns the value '1' to it.

The first DB statement thus declares a byte constant 1.

The second SET statement assigns the value '2' to A and the second DB statement declares a
constant '2'.

3. Attributes of formal parameter

An attribute is written using the syntax

<attribute name> ' <formal parameter spec>

It represents information about the value of the formal parameter, i.e. about the corresponding actual parameter.

The type, length and size attributes have the names T, L and S.

Example

        MACRO
        DCL_CONST  &A
        AIF     (L'&A EQ 1) .NEXT
        --
.NEXT   --
        --
        MEND

Here expansion time control is transferred to the statement having .NEXT in its label field only if the actual parameter corresponding to the formal parameter A has a length of 1.

Q.7

Explain lexical and semantic expansion OR


Explain tasks involved in macro expansion.
Lexical expansion:

Lexical expansion implies replacement of a character string by another character string during program generation.

Lexical expansion is typically used to replace occurrences of formal parameters by the corresponding actual parameters.

Semantic expansion:

Semantic expansion implies generation of instructions tailored to the requirements of a specific usage.

Semantic expansion is characterized by the fact that different uses of a macro can lead to codes which differ in the number, sequence and opcodes of instructions.

E.g.: generation of type-specific instructions for manipulation of byte and word operands.

It can be achieved by a combination of advanced macro facilities like AIF and AGO statements and expansion time variables.

In the CLEAR macro, the number of MOVEM AREG, .. statements generated by a call on CLEAR is determined by the value of the second parameter of CLEAR.

Macro EVAL is another instance of conditional expansion, wherein one of two alternative code sequences is generated depending on the peculiarities of the actual parameters of a macro call.

The example below illustrates semantic expansion using the type attribute.


Example

        MACRO
        CREATE_CONST  &X, &Y
        AIF     (T'&X EQ B) .BYTE
&Y      DW      25
        AGO     .OVER
.BYTE   ANOP
&Y      DB      25
.OVER   MEND

This macro creates a constant 25 with the name given by the 2nd parameter.

The type of the constant matches the type of the first parameter.

Q.8

Describe task and data structures considered for the design of a macro preprocessor

Macro preprocessor

The macro preprocessor accepts an assembly program containing macro definitions and calls
and translates it into an assembly program which does not contain any macro definition or
calls.

Below figure shows a schematic of a macro preprocessor.

The program form output by the macro preprocessor can now be handed over to an assembler to obtain the target language form of the program.

[Figure: schematic of a macro preprocessor. Program with macro definitions and calls -> Macro preprocessor -> Program without macros -> Assembler -> Target program]

Following are the tasks of a macro preprocessor:


1. Identify macro calls in the program.
2. Determine the values of formal parameters.
3. Maintain the values of expansion time variables declared in a macro.
4. Organize expansion time control flow.
5. Determine the values of sequencing symbols.
6. Perform expansion of a model statement.
Data Structures

The tasks listed above identify the key data structures of the macro preprocessor. To obtain a detailed design of the data structures it is necessary to apply the practical criteria of processing efficiency and memory requirements.

The tables APT, PDT and EVT contain pairs which are searched using the first component of the pair as a key; for example, the formal parameter name is used as the key to obtain its value from APT. This search can be eliminated if the position of an entity within a table is known when its value is to be accessed. We will see this in the context of APT.

The value of a formal parameter ABC is needed while expanding a model statement using it, viz.

MOVER AREG, &ABC

Let the pair (ABC, ALPHA) occupy entry #5 in APT. The search in APT can be avoided if the model statement appears as

MOVER AREG, (P, 5)

in the MDT, where (P, 5) stands for the words 'parameter #5'.

Thus, macro expansion can be made more efficient by storing an intermediate code for a statement, rather than its source form, in the MDT.

All parameter names can be replaced by pairs of the form (P, n) in the model statements and preprocessor statements stored in the MDT.

An interesting offshoot of this decision is that the first component of the pairs stored in APT is no longer used during macro expansion, e.g. the information (P, 5) appearing in a model statement is sufficient to access the value of formal parameter ABC. Hence APT containing (<formal parameter name>, <value>) pairs is replaced by another table called APTAB which only contains <value>'s.

To implement this simplification, ordinal numbers are assigned to all parameters of a macro. A table named parameter name table (PNTAB) is used for this purpose.

Parameter names are entered in PNTAB in the same order in which they appear in the prototype statement.

The entry # of a parameter's entry in PNTAB is now its ordinal number. This number is used to replace the parameter name in the model and preprocessor statements of the macro while storing them in the MDT.

In effect, the information (<formal parameter name>, <value>) in APT has been split into two tables:
PNTAB - which contains the formal parameter names.
APTAB - which contains the formal parameter values (i.e. the actual parameters).
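
The split into PNTAB and APTAB can be illustrated with a small Python sketch. The function names encode and expand and the sample parameter lists are assumptions made only for this illustration; the point is that expansion indexes APTAB directly and never searches by name.

```python
import re


def encode(statement, pntab):
    """Replace each &name by (P,n), n being its 1-based position in PNTAB."""
    def repl(match):
        return f"(P,{pntab.index(match.group(1)) + 1})"
    return re.sub(r"&(\w+)", repl, statement)


def expand(encoded, aptab):
    """Replace each (P,n) by the n-th value of APTAB; no name search is needed."""
    return re.sub(r"\(P,(\d+)\)",
                  lambda m: aptab[int(m.group(1)) - 1],
                  encoded)


pntab = ["X", "Y", "ABC"]        # hypothetical prototype parameters &X, &Y, &ABC
aptab = ["A", "B", "ALPHA"]      # hypothetical actual parameters of one call

mdt_entry = encode("MOVER AREG, &ABC", pntab)   # done once, at definition time
print(mdt_entry)
print(expand(mdt_entry, aptab))                 # done at every call
```

The name lookup in PNTAB happens once, when the definition is stored; every later call pays only for an index into APTAB.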

Other data structures are given below:

Table                                       Fields in each entry
Macro Name Table (MNT)                      Macro name, Number of positional parameters (#PP),
                                            Number of keyword parameters (#KP),
                                            Number of expansion time variables (#EV),
                                            MDT pointer (MDTP), KPDTAB pointer (KPDTP),
                                            SSTAB pointer (SSTP)
Parameter Name Table (PNTAB)                Parameter name
EV Name Table (EVNTAB)                      EV name
SS Name Table (SSNTAB)                      SS name
Keyword Parameter Default Table (KPDTAB)    Parameter name, default value
Macro Definition Table (MDT)                Label, Opcode, Operands
Actual Parameter Table (APTAB)              Value
EV Table (EVTAB)                            Value
SS Table (SSTAB)                            MDT entry #

Q.9

Explain design specification task for macro preprocessor with suitable example

Design Overview

We begin the design by listing all tasks involved in macro expansion.

1. Identify macro calls in the program.
2. Determine the values of formal parameters.
3. Organize expansion time control flow.
4. Maintain the values of expansion time variables declared in a macro.
5. Determine the values of sequencing symbols.
6. Perform expansion of a model statement.

The following 4 step procedure is followed to arrive at a design specification for each task:

1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain the information.
4. Determine the processing necessary to perform the task.

Application of this procedure to each of the preprocessor tasks is described as follows.


Identify macro calls
A table called the macro name table (MNT) is designed to hold the names of macros defined in a
program. A macro name is entered in this table when a macro definition is processed. While
processing a statement in the source program, the preprocessor compares the string found in its
mnemonic field with the macro names in MNT. A match indicates that the current statement is a
macro call.
Determine value of formal parameters
A table called the actual parameter table (APT) is designed to hold the values of formal parameters
during the expansion of a macro call. Each entry in the table is a pair
(<formal parameter name>,<value>)
Two items of information are needed to construct this table, names of formal parameters, and default
values of keyword parameters. For this purpose, a table called parameter default table (PDT) is used
for each macro. This table would be accessible from the MNT entry of a macro and would contain
pairs of the form (<formal parameter name>, <default value>). If a macro call statement does not
specify a value for some parameter par, its default value would be copied from PDT to APT.
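
A minimal Python sketch of this step is shown below. The function build_apt, the representation of PDT as a dictionary and the keyword syntax NAME=value in the call are assumptions made for illustration only.

```python
def build_apt(formals, pdt, actuals):
    """Build APT for one macro call.

    formals: positional parameter names, in prototype order
    pdt:     {keyword parameter name: default value}
    actuals: operands of the call, e.g. ["A", "REG=BREG"]
    """
    apt = dict(pdt)                          # start from the defaults in PDT
    positional = [a for a in actuals if "=" not in a]
    for name, value in zip(formals, positional):
        apt[name] = value                    # positional parameters by order
    for a in actuals:
        if "=" in a:
            name, value = a.split("=", 1)
            apt[name] = value                # keyword value overrides default
    return apt


print(build_apt(["X", "Y"], {"REG": "AREG"}, ["A", "B"]))
print(build_apt(["X", "Y"], {"REG": "AREG"}, ["A", "B", "REG=BREG"]))
```

In the first call REG keeps its default from PDT; in the second the value given in the call replaces it.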
Maintain expansion time variables
An expansion time variables table(EVT) contains pairs of the form
(<EV name>, <value>)
The value field of a pair is accessed when a preprocessor statement or a model statement under
expansion refers to an EV.

Organize expansion time control flow
The macro definition table (MDT) stores the set of preprocessor statements and model statements of a macro. The flow of
control during macro expansion determines when a model statement is to be visited for expansion. For this purpose a macro
expansion counter (MEC) points to the MDT entry of the statement currently being processed. It
is updated after expanding a model statement or on processing a macro preprocessor statement.
Determine values of sequencing symbols
A sequencing symbols table (SST) is maintained to hold this information. The table contains pairs of
the form
(<sequencing symbol name>, <MDT entry #>)
where <MDT entry #> is the number of the MDT entry which contains the model statement defining
the sequencing symbol.
Perform expansion of a model statement
This is a trivial task given the following:
1. MEC points to the MDT entry containing the model statement.
2. Values of formal parameters and EV's are available in APT and EVT, respectively.
3. The model statement defining a sequencing symbol can be identified from SST.
4. Expansion of a model statement is achieved by performing a lexical substitution for the parameters and EV's used in the model statement.

Q.10

Write a macro that moves n numbers from the first operand to the second operand, where n
is specified as the third operand of the macro.

        MACRO
        MOVEALL  &source, &dest, &N
        LCL      &M
&M      SET      0
.NEXT   MOVER    AREG, &source + &M
        MOVEM    AREG, &dest + &M
&M      SET      &M + 1
        AIF      (&M NE &N) .NEXT
        MEND
Q.11

Write a macro which takes B, C and D as parameters and calculates B*C+C*D.

        MACRO
        EVAL    &X, &Y, &Z
        MOVER   AREG, &X
        MUL     AREG, &Y
        MOVEM   AREG, &X
        MOVER   AREG, &Y
        MUL     AREG, &Z
        ADD     AREG, &X
        MEND
Q.12

Draw a flow chart and explain simple one pass macro processor.

[Figure: flow chart of a simple one pass macro processor. Start; MDTC = 1, MNTC = 1; read a line from the source; if it is the MACRO pseudo-op, update MNT and PNTAB, then read and store lines (MDTC++) until MEND; otherwise search the opcode in MNT: if found, replace formal parameters and write the expansion into the output; if not found, write the line into the output source file; on END, go for assembly]

In this type of preprocessor only one pass is used both to construct the data structures and to use them.
It is called a preprocessor because it processes the program before the translator does. It is shown in the figure.

[Figure: source code with macros -> one pass macro processor (using MNT, MDT, PNTAB, APTAB, SSTAB, KPDTAB) -> source code without macros]

Figure: One pass macro processor


Data Structure
Macro name table (MNT): This is used to store all information of a macro definition, that is the macro
name, MDTP and the total number of positional parameters.
Macro definition table (MDT): This is used to store the body of the macro definition.
Parameter name table (PNTAB): This is used to store all positional parameter names of the macro
definition.
Keyword parameter default table (KPDTAB or KPT): This is used to store all keyword parameter
names of the macro definition with their default values.
EV Name table (EVNTAB or EVT): This is used to store all expansion time variable names of the macro
definition with their type (global or local).
SS Name table (SSNTAB): This is used to store all labels (sequencing symbols) of the macro definition.
SS Table (SSTAB): This is used to store the MDT entry where each sequencing symbol is defined.
EV Table (EVTAB): This is used to store the current values of the expansion time variables of the macro
definition.
Actual parameter table (APTAB): This is used to store the actual parameters given in the macro
call.
Algorithm:
Step 1: Initialize all pointer variables.
        MDTP=1, MNTP=1, KPTP=1, LC=1.
Step 2: Read the LCth line from the source code, i.e. the input program.
Step 3: Isolate label, instruction and operand from the line.
Step 4: If instruction = "MACRO"
    If yes
        4.1: LC=LC+1.
        4.2: Read the LCth line from the source code, i.e. the input program.
        4.3: Isolate label, instruction and operand from the line.
        4.4: Enter the macro name in MNT.
             Find out the total number of parameters, keyword parameters and expansion time variables
             and store it in MNT.
             Store the value of all pointers in MNT.
        4.5: Update PNTAB, KPDTAB, EVNTAB, SSNTAB, SSTAB.
        4.6: Increment all the pointers of the updated tables.
        4.7: MNTP=MNTP+1.
        4.8: LC=LC+1.
        4.9: Read the LCth line from the source code, i.e. the input program.
        4.10: Isolate label, instruction and operand from the line and store it into MDT at the MDTP location.
        4.11: MDTP=MDTP+1.
        4.12: If instruction = "MEND"
              If yes
                  LC=LC+1.
                  Go to step 2.
              If no
                  Go to step 4.8.
    If no
        Go to step 5.
Step 5: Search the instruction in MNT.
Step 6: If the instruction is found in MNT
    If yes
        6.1: Find out the actual parameters and store them in APTAB.
        6.2: Find out MDTP from MNT.
        6.3: Locate the macro definition in MDT at the MDTP position.
        6.4: Adjust all model statements as follows.
             6.4.1: Replace formal parameters with actual parameters using PNTAB, KPDTAB and APTAB.
             6.4.2: Replace each expansion time variable name with its value using EVNTAB and EVTAB.
             6.4.3: Find out labels from SSNTAB and their addresses from SSTAB, and replace each
                    sequencing label with its sequence number.
        6.5: Write all these adjusted model statements in the output source file.
        6.6: LC=LC+1.
        6.7: Go to step 2.
    If no
        6.8: If instruction = "END"
             If yes
                 Go to Assembler.
             If no
                 Write the line in the output source file. LC=LC+1.
                 Go to step 2.
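
The overall flow of the algorithm can be sketched in Python as follows. This is a deliberately reduced illustration, not the complete algorithm: it assumes positional parameters only, no labels on macro calls, no nested definitions and no AIF/AGO/SET statements, and the function name preprocess is hypothetical.

```python
import re


def preprocess(lines):
    """One pass over the source: store definitions, expand calls."""
    mnt = {}      # macro name -> (PNTAB, index of first body line in MDT)
    mdt = []      # stored body lines with parameters encoded as (P,n)
    out = []
    it = iter(lines)
    for line in it:
        fields = line.split()
        if fields == ["MACRO"]:
            proto = next(it).replace(",", " ").split()    # prototype statement
            name, pntab = proto[0], [p.lstrip("&") for p in proto[1:]]
            mnt[name] = (pntab, len(mdt))
            for body in it:                               # copy body up to MEND
                if body.split() == ["MEND"]:
                    mdt.append("MEND")
                    break
                mdt.append(re.sub(
                    r"&(\w+)",
                    lambda m: f"(P,{pntab.index(m.group(1)) + 1})",
                    body.strip()))
        elif fields and fields[0] in mnt:                 # macro call
            _, mdtp = mnt[fields[0]]
            aptab = line.replace(",", " ").split()[1:]
            while mdt[mdtp] != "MEND":
                out.append(re.sub(
                    r"\(P,(\d+)\)",
                    lambda m: aptab[int(m.group(1)) - 1],
                    mdt[mdtp]))
                mdtp += 1
        else:
            out.append(line.strip())                      # ordinary statement
    return out


source = ["MACRO", "INCR &A, &B", "MOVER AREG, &A", "ADD AREG, &B",
          "MOVEM AREG, &A", "MEND", "START", "INCR X, Y", "STOP", "END"]
for statement in preprocess(source):
    print(statement)
```

Because definitions are stored as they are met, a macro must be defined before its first call; this is the usual restriction of a one pass design.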

Parsing

Q.1

What is parsing? Explain types of parsing.


Parsing or syntactic analysis is the process of analyzing a string of symbols according to the rules of a formal grammar.

Parsing is a technique that takes an input string and produces either a parse tree, if the string is a
valid sentence of the grammar, or an error message indicating that the string is not a valid sentence of
the given grammar.
There are mainly two types of parsing:
1. Top down parsing: A top down parser for a given grammar G tries to derive a string
through a sequence of derivations starting with the start symbol.
Top down parsing methods are:
Top down parsing (with backtracking / without backtracking)
Recursive descent parser
LL(1) parser
2. Bottom up parsing: In bottom up parsing, the source string is reduced to the start
symbol of the grammar. Bottom up parsing is also called shift reduce parsing.
Bottom up parsing methods are:
Naive bottom up parsing
Operator precedence parsing
Q.2

Explain parse tree and abstract syntax tree.


A set of derivations applied to generate a string can be represented using a tree. Such a tree is
known as a parse tree.
An abstract syntax tree represents the structure of a source string in a more economical
manner.
EX: Write unambiguous production rules (grammar) for arithmetic expressions containing +, -, *, / and ^ (exponentiation). Construct the parse tree and abstract syntax tree for:
<id> - <id> * <id> ^ <id> + <id>. (GTU DEC_11)
Unambiguous grammar for arithmetic expressions containing +, -, *, / and ^
E->E-T|T
T->T*F|F
F->F/G|G
G->G^H|H
H->H+I|I
I-><id>

Parse tree

[Figure: parse tree for <id> - <id> * <id> ^ <id> + <id>, rooted at E]

Abstract Syntax Tree

[Figure: abstract syntax tree for <id> - <id> * <id> ^ <id> + <id>]

Q.3

Explain left factoring and left recursion.


Left Factoring:
For each non-terminal A with two or more alternatives (production rules) with a common non-empty
prefix α, say
A -> αβ1 | ... | αβn | γ1 | ... | γm
convert it into
A -> αA' | γ1 | ... | γm
A' -> β1 | ... | βn
EX:
A -> xByA | xByAzA | a
B -> b
Left factored, the grammar becomes
A -> xByAA' | a
A' -> zA | ε
B -> b
Left Recursion:
A grammar is left-recursive if we can find some non-terminal A which will eventually derive
a sentential form with itself as the leftmost symbol.
Immediate left recursion occurs in rules of the form

A -> Aα | β

where α and β are sequences of non-terminals and terminals, and β doesn't start with A.

For example, a rule such as

E -> E + T

is immediately left-recursive.
A rule of the form A -> Aα | β can be replaced by the non-left-recursive productions

A -> βA'
A' -> αA' | ε

The general algorithm to remove immediate left recursion follows.

A -> Aα1 | ... | Aαn | β1 | ... | βm
where:
A is a left-recursive nonterminal,
each α is a sequence of non-terminals and terminals that is not null (α ≠ ε),
each β is a sequence of non-terminals and terminals that does not start with A.
Replace the A-production by the production:

A -> β1A' | ... | βmA'
and create a new nonterminal

A' -> ε | α1A' | ... | αnA'
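
Removal of immediate left recursion is mechanical enough to be written as a short function. The Python sketch below assumes a hypothetical representation in which each alternative is a list of symbols and the empty list stands for ε; the new nonterminal is named by appending a prime to the old one.

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm without left recursion.

    alternatives: list of right-hand sides, each a list of symbols.
    Returns a dict of productions; [] stands for the empty string.
    """
    new_nt = nt + "'"
    # alternatives that start with A contribute their tails (the alphas)
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # the remaining alternatives are the betas
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:
        return {nt: alternatives}            # nothing to do
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


print(remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

For E -> E + T | T this yields E -> T E' and E' -> + T E' | ε. Indirect left recursion is not handled by this sketch.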
Q.4

Top down parsing methods


1) Naive top down parsing or brute force parsing
Naive top down parsing algorithm
1. Current sentential form (CSF) := 'S';
2. Let CSF be of the form βAπ, such that β is a string of Ts and A is the leftmost NT in CSF.
   Exit with success if CSF = α, where α is the source string.
3. Make a derivation A => β1Bδ according to a production A ::= β1Bδ of G such that β1 is a string
   of Ts. This makes CSF = ββ1Bδπ.
4. Go to step 2.

Ex:
Consider the grammar
S->aAb
A->cd | c
and derive the string acb.
The parser first tries A->cd, which fails because d does not match b; it then backtracks and tries A->c, which succeeds.

[Figure: derivation tree rooted at S for the string acb, showing backtracking]
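
Backtracking over alternatives can be sketched with a short recursive Python function. The dictionary GRAMMAR and the function derive are assumptions made for this illustration; terminals are single characters and the sketch would not terminate on a left-recursive grammar.

```python
# Grammar of the example: S -> aAb, A -> cd | c
GRAMMAR = {"S": [["a", "A", "b"]], "A": [["c", "d"], ["c"]]}


def derive(symbols, text):
    """Return True if the list of symbols can derive exactly `text`."""
    if not symbols:
        return text == ""
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # nonterminal: try each alternative in turn; a failed alternative
        # simply returns False, which is the backtracking step
        return any(derive(alt + rest, text) for alt in GRAMMAR[head])
    # terminal: it must match the next input character
    return text.startswith(head) and derive(rest, text[len(head):])


print(derive(["S"], "acb"))
print(derive(["S"], "acdb"))
print(derive(["S"], "ab"))
```

For the string acb the alternative A -> cd is tried first and abandoned, after which A -> c succeeds.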

2) Top down parsing without backtracking


Elimination of backtracking in top down parsing has several advantages:
parsing becomes more efficient, and it becomes possible to perform semantic actions and
precise error reporting during parsing.
We use left factoring to ensure that the RHS alternatives will produce a unique terminal symbol in
the first position.
Consider the grammar
E -> T+E | T
T -> V*T | V
V -> Id
Perform left factoring on the given grammar.
Now the grammar will be
E -> TE'
E' -> +E | ε
T -> VT'
T' -> *T | ε
V -> Id
Now parsing of the string <id>+<id>*<id>

Sr No.   CSF                      Symbol          Prediction
1        E                        <id>            E -> TE'
2        TE'                      <id>            T -> VT'
3        VT'E'                    <id>            V -> <id>
4        <id>T'E'                 +               T' -> ε
5        <id>E'                   +               E' -> +E
6        <id>+E                   <id>            E -> TE'
7        <id>+TE'                 <id>            T -> VT'
8        <id>+VT'E'               <id>            V -> <id>
9        <id>+<id>T'E'            *               T' -> *T
10       <id>+<id>*TE'            <id>            T -> VT'
11       <id>+<id>*VT'E'          <id>            V -> <id>
12       <id>+<id>*<id>T'E'       end of input    T' -> ε
13       <id>+<id>*<id>E'         end of input    E' -> ε
14       <id>+<id>*<id>           -               -

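3) Recursive descent parser

A top down parser that executes a set of recursive procedures to process the input
without backtracking is called a recursive descent parser, and the parsing is called recursive
descent parsing.
Ex:
S -> E
E -> VE'
E' -> +VE' | ε
V -> Id
The recursive descent method for the above grammar is given below; E1() stands for E', and advance() is assumed to move to the next input symbol.
S()
{
    E();
}
E()                         /* E -> VE' */
{
    V();
    E1();
}
E1()                        /* E' -> +VE' | ε */
{
    if (next symbol == +)
    {
        advance();          /* consume + */
        V();
        E1();
    }
}
V()                         /* V -> Id */
{
    if (next symbol == Id)
    {
        advance();          /* consume Id */
        return;
    }
    else
    {
        print("error");
    }
}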
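
A runnable version of the same idea is sketched below in Python, with one method per nonterminal of the grammar S -> E, E -> VE', E' -> +VE' | ε, V -> Id. The class name Parser, the helper accepts and the end marker "$" are assumptions made for this illustration; E1 plays the role of E'.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]      # "$" is an assumed end marker
        self.pos = 0

    def next_symbol(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def S(self):                          # S -> E
        self.E()
        if self.next_symbol() != "$":
            raise SyntaxError("unexpected symbol " + self.next_symbol())

    def E(self):                          # E -> V E'
        self.V()
        self.E1()

    def E1(self):                         # E' -> + V E' | epsilon
        if self.next_symbol() == "+":
            self.advance()
            self.V()
            self.E1()

    def V(self):                          # V -> Id
        if self.next_symbol() == "Id":
            self.advance()
        else:
            raise SyntaxError("Id expected, found " + self.next_symbol())


def accepts(tokens):
    """Return True if the token list is a sentence of the grammar."""
    try:
        Parser(tokens).S()
        return True
    except SyntaxError:
        return False


print(accepts(["Id", "+", "Id"]))
print(accepts(["Id", "+"]))
```

Each procedure looks only at the next symbol to decide what to do, so no backtracking is needed.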


4) LL(1) parser OR Describe working of LL(1) parser and parse the given string

An LL(1) parser is a table driven parser that scans the source string from left to right and produces a leftmost derivation.

The '1' in LL(1) indicates that the grammar uses a look-ahead of one source symbol, that is, the
prediction to be made is determined by the next source symbol.

A major advantage of LL(1) parsing is its amenability to automatic construction by a parser generator.

Consider the grammar given below:

E  ::= TE'
E' ::= +TE' | ε
T  ::= FT'
T' ::= *FT' | ε
F  ::= (E) | <id>

FIRST and FOLLOW for each NT

NT     FIRST       FOLLOW
E      {(, id}     {$, )}
E'     {+, ε}      {$, )}
T      {(, id}     {+, $, )}
T'     {*, ε}      {+, $, )}
F      {(, id}     {+, *, $, )}

Predictive parsing table

Non-terminal   Source symbol
               <id>         +            *            (           )          -|
E              E=>TE'                                 E=>TE'
E'                          E'=>+TE'                              E'=>ε      E'=>ε
T              T=>FT'                                 T=>FT'
T'                          T'=>ε        T'=>*FT'                 T'=>ε      T'=>ε
F              F=><id>                                F=>(E)

A parsing table entry PT(nti, tj) indicates what prediction should be made if nti is the leftmost
NT in a sentential form and tj is the next source symbol.
A blank entry in PT indicates an error situation.
A source string is assumed to be enclosed between the symbols '|-' and '-|'.
Hence the parser starts with the sentential form |- E -|.
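
The use of the parsing table can be sketched in Python with an explicit stack. The dictionary TABLE below encodes the standard LL(1) entries for the grammar E ::= TE', E' ::= +TE' | ε, T ::= FT', T' ::= *FT' | ε, F ::= (E) | <id>; the names TABLE, NONTERMINALS and ll1_parse are assumptions made for this illustration, and the token "id" stands for <id>.

```python
# (nonterminal, next source symbol) -> right-hand side to predict
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "-|"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "-|"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens):
    """Return the list of predictions made, or raise SyntaxError."""
    tokens = tokens + ["-|"]
    stack = ["-|", "E"]                   # top of stack is the end of the list
    pos, predictions = 0, []
    while stack:
        top, symbol = stack.pop(), tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, symbol))
            if rhs is None:               # blank entry: error situation
                raise SyntaxError(f"no prediction for ({top}, {symbol})")
            predictions.append((top, rhs))
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == symbol:
            pos += 1                      # terminal matched, move ahead
        else:
            raise SyntaxError(f"expected {top}, found {symbol}")
    return predictions


for nt, rhs in ll1_parse(["id", "*", "id", "*", "id", "+", "id"]):
    print(nt, "=>", " ".join(rhs) or "epsilon")
```

Each prediction is chosen from the pair (leftmost nonterminal, next source symbol) alone, which is exactly what a look-ahead of one symbol means.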
The sequence of predictions made by the parser for the source string <id>*<id>*<id>+<id> can be given as follows:

Current sentential form            Symbol    Prediction
|- E -|                            <id>      E => TE'
|- TE' -|                          <id>      T => FT'
|- FT'E' -|                        <id>      F => <id>
|- <id>T'E' -|                     *         T' => *FT'
|- <id>*FT'E' -|                   <id>      F => <id>
|- <id>*<id>T'E' -|                *         T' => *FT'
|- <id>*<id>*FT'E' -|              <id>      F => <id>
|- <id>*<id>*<id>T'E' -|           +         T' => ε
|- <id>*<id>*<id>E' -|             +         E' => +TE'
|- <id>*<id>*<id>+TE' -|           <id>      T => FT'
|- <id>*<id>*<id>+FT'E' -|         <id>      F => <id>
|- <id>*<id>*<id>+<id>T'E' -|      -|        T' => ε
|- <id>*<id>*<id>+<id>E' -|        -|        E' => ε
|- <id>*<id>*<id>+<id> -|          -         -

Q.5

Define the following terms:


1) Simple precedence: a grammar symbol a precedes symbol b, where each of a, b is a T or NT
of G, if in a sentential form ...ab..., a should be reduced prior to b in a bottom up parse.
2) Simple precedence grammar: a grammar G is a simple precedence grammar if for all
terminal and nonterminal symbols a, b of G, a unique precedence relation exists for a, b.
3) Simple phrase: α is a simple phrase of the sentential form ...α... if there exists a production of
the grammar A ::= α and α -> A is a reduction in the sequence of reductions ... -> ... -> S.
4) Handle: a handle of a sentential form is the leftmost simple phrase in it.
5) Handle pruning: the process of discovering a handle and reducing it to the appropriate LHS NT is
known as handle pruning.

Q.6

Bottom up parsing methods


1) Naive bottom up parsing algorithm
1. SSM := 1; n := 0;
2. r := n;
3. Compare the string of r symbols to the left of SSM with all RHS alternatives in G which have a
   length of r symbols.
4. If a match is found with a production A ::= α, then
       reduce the string of r symbols to NT A;
       n := n - r + 1;
       go to step 2;
5. r := r - 1;
   if (r > 0), then go to step 3;
6. If no more symbols exist to the right of SSM then
       if current string form = 'S' then
           exit with success;
       else report error and exit with failure;
7. SSM := SSM + 1;
   n := n + 1;
   go to step 2;
2) Operator Precedence Parsing


What is operator precedence parsing? Show the operator precedence matrix for the following
operators: +, -, *, (, ). Parse the following string: |- <id> + <id> * <id> -| (GTU Dec_11, Jan_13)

Operator precedence parsing is based on bottom-up parsing techniques and uses a precedence
table to determine the next action.

The table is easy to construct and is typically hand-coded.

This method is ideal for applications that require a parser for expressions and where embedding
full compiler technology would be overkill.

Disadvantages
1. It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
2. It is applicable only to a small class of grammars.
3. It is difficult to decide which language is recognized by the grammar.

Advantages
1. Simple.
2. Powerful enough for expressions in programming languages.

Operator precedence matrix for the operators +, -, *, /, id, (, ):

[Table: operator precedence matrix with LHS operators as rows and RHS operators as columns; each entry is <., =. or .>]

Now consider the grammar E->E+E | E*E | id and the string id+id*id.

We will follow these steps to parse the given string:
1. Scan the input string until the first .> is encountered.
2. Scan backward until <. is encountered.
3. The handle is the string between <. and .>.

|- <. id .> + <. id .> * <. id .> -|     Handle id is obtained between <. and .>. Reduce this by E->id.
E + <. id .> * <. id .> -|               Handle id is obtained between <. and .>. Reduce this by E->id.
E + E * <. id .> -|                      Handle id is obtained between <. and .>. Reduce this by E->id.
E + E * E                                Remove all nonterminals.
+ *                                      Insert |- and -|.
|- + * -|                                Place the relations between the operators.
|- <. + <. * .> -|                       The * operator is surrounded by <. and .>, which indicates that * becomes the handle; we have to reduce E*E.
|- <. + .> -|                            + becomes the handle. Hence reduce E+E.
|- -|                                    Parsing done.

Operator precedence parsing (stack based algorithm)

Consider parsing of the string

|- <id>a + <id>b * <id>c -|

according to the grammar, where <id>a represents a.

The figure below shows the steps in its parsing.

Figures (a)-(c) show the stack and the AST when the current operator is '+', '*' and '-|'
respectively.

In Fig. (c), TOS operator .> current operator.

This leads to reduction of '*'. Figure (d) shows the situation after the reduction.

The new TOS operator, i.e. '+', .> current operator.

This leads to reduction of '+' as shown in Fig. (e).

[Figure: panels (a)-(e) showing the current operator, the stack (SB, TOS) and the AST at each step of parsing a + b * c]

Q.7

Explain Shift Reduce parser


Shift reduce parser attempts to construct parse tree from leaves to root.
Thus it works on the same principle of bottom up parser.
A shift reduce parser requires following data structures
1) Input buffer 2) Stack
The parser performs following basic operation
1) Shift
2) Reduce
3) Accept
4) Error
Ex: consider the grammar E->E-E | E*E | id perform shift reduce parsing for string id-id*id
Stack        Input buffer     Action
$            id-id*id$        Shift
$id          -id*id$          Reduce E->id
$E           -id*id$          Shift
$E-          id*id$           Shift
$E-id        *id$             Reduce E->id
$E-E         *id$             Shift
$E-E*        id$              Shift
$E-E*id      $                Reduce E->id
$E-E*E       $                Reduce E->E*E
$E-E         $                Reduce E->E-E
$E           $                Accept

Q.8

Compare top down and bottom up parser.


Top down parser
A parser is top-down if it discovers a parse tree top to bottom.
A top-down parse corresponds to a preorder traversal of the parse tree.
A leftmost derivation is applied at each derivation step.
Top-down parsers come in two forms:

Recursive-Descent Parsing
  Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
  It is a general parsing technique, but not widely used.
  Not efficient.

Predictive Parsing
  Predicts the production rule to be applied using lookahead tokens.
  No backtracking.
  Efficient.
  Needs a special form of grammars (LL(1) grammars).
  Recursive predictive parsing is a special form of recursive descent parsing without backtracking.
  Non-recursive (table driven) predictive parser is also known as LL(1) parser.

Bottom up parser

Bottom-up parsers build parse trees from the leaves and work up to the root.
Bottom-up syntax analysis is also known as shift-reduce parsing.

An easy-to-implement form of shift-reduce parsing is operator precedence parsing.

Bottom-up parsers use two techniques:

Shift-reduce parsing
  Shift input symbols until a handle is found. Then reduce the substring to the nonterminal on the LHS of the corresponding production.

Operator-precedence parsing
  Based on shift-reduce parsing.
  Identifies handles based on precedence rules.

The general method of shift-reduce parsing is called LR parsing.


Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the
leaves (the bottom) and working up towards the root (the top).

At each reduction step a particular substring matching the right side of a production is replaced
by the symbol on the left of that production, and if the substring is chosen correctly at each
step, a rightmost derivation is traced out in reverse.

Q.9

Regular expression and DFA for declaring a variable in c language.


Regular expressions to declare a variable in C language, where d denotes a digit and l denotes a letter:

integer                               [+|-](d)+
real number                           [+|-](d)+.(d)+
real number with optional fraction    [+|-](d)+.(d)*
identifier                            l(l|d)*

DFA for declaring a variable in C

The figure shows a DFA for recognizing identifiers, unsigned integers and unsigned real numbers
with fractions. The DFA has 3 final states Id, Int and Real corresponding to identifier, unsigned
integer and unsigned real respectively. Note that a string like '25.' is invalid because it leaves
the DFA in state S2, which is not a final state.
State      Next symbol
           l        d        .
Start      Id       Int
Id         Id       Id
Int                 Int      S2
S2                  Real
Real                Real

Figure: DFA for integers, real numbers and identifiers
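
A DFA of this kind is conveniently simulated with a transition table. The Python sketch below assumes the states Start, Id, Int, S2 and Real with l standing for a letter and d for a digit; the names TRANSITIONS, classify and recognize are hypothetical.

```python
# (state, symbol class) -> next state; missing entries mean "reject"
TRANSITIONS = {
    ("Start", "l"): "Id", ("Start", "d"): "Int",
    ("Id", "l"): "Id", ("Id", "d"): "Id",
    ("Int", "d"): "Int", ("Int", "."): "S2",
    ("S2", "d"): "Real",
    ("Real", "d"): "Real",
}
FINAL_STATES = {"Id", "Int", "Real"}


def classify(ch):
    """Map a character to its symbol class: letter, digit or itself."""
    if ch.isalpha():
        return "l"
    if ch.isdigit():
        return "d"
    return ch


def recognize(text):
    """Return the final state reached, or None if the string is invalid."""
    state = "Start"
    for ch in text:
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return None
    return state if state in FINAL_STATES else None


for sample in ["x25", "25", "25.5", "25."]:
    print(sample, recognize(sample))
```

The string '25.' ends in S2, which is not a final state, so it is rejected.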

Q.10

Write algorithm for operator precedence parsing.


Data structures:
Stack: each stack entry is a record with two fields, operator and operand_pointer.
Node: a node is a record with three fields, symbol, left_pointer and right_pointer.

Functions:
newnode(operator, l_operand_pointer, r_operand_pointer) creates a node with appropriate
pointer fields and returns a pointer to the node.

TOSM denotes the stack entry just below TOS.

1. TOS := SB-1; SSM := 0;
2. Push '|-' on the stack.
3. SSM := SSM+1;
4. If the current source symbol is an operand, then
       x := newnode(source symbol, null, null);
       TOS.operand_pointer := x;
       go to step 3;
5. While TOS operator .> current operator,
       x := newnode(TOS operator, TOSM.operand_pointer, TOS.operand_pointer);
       pop an entry off the stack;
       TOS.operand_pointer := x;
6. If TOS operator <. current operator, then
       push the current operator on the stack;
       go to step 3;
7. If TOS operator =. current operator, then
       if TOS operator = '|-' then exit successfully;
       if TOS operator = '(', then
           temp := TOS.operand_pointer;
           pop an entry off the stack;
           TOS.operand_pointer := temp;
           go to step 3;
8. If no precedence is defined between the TOS operator and the current operator then report error and exit
   unsuccessfully.
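
The algorithm can be sketched in Python as follows. Several things here are assumptions made for the illustration: only the operators + and * are supported, with * binding tighter and both left associative; the precedence relations are computed by the hypothetical function relation rather than read from a matrix; AST nodes are plain tuples (operator, left, right); and malformed operand sequences such as two adjacent operands are not detected.

```python
PREC = {"+": 1, "*": 2}                   # assumed precedence levels
OPERATORS = {"+", "*", "(", ")", "-|"}


def relation(left, right):
    """Return '<', '>', '=' or None for (TOS operator, current operator)."""
    if left == "|-":
        return "=" if right == "-|" else (None if right == ")" else "<")
    if left == "(":
        return "=" if right == ")" else (None if right == "-|" else "<")
    if right == "(":
        return None if left == ")" else "<"
    if right in (")", "-|") or left == ")":
        return ">"
    return ">" if PREC[left] >= PREC[right] else "<"   # left associative


def parse(tokens):
    """Return the AST as nested tuples (operator, left, right)."""
    stack = [["|-", None]]                # entries: [operator, operand_pointer]
    for tok in tokens + ["-|"]:
        if tok not in OPERATORS:          # step 4: operand becomes a leaf
            stack[-1][1] = tok
            continue
        while relation(stack[-1][0], tok) == ">":      # step 5: reduce
            op, right = stack.pop()
            stack[-1][1] = (op, stack[-1][1], right)
        rel = relation(stack[-1][0], tok)
        if rel == "<":                    # step 6: push the current operator
            stack.append([tok, None])
        elif rel == "=":                  # step 7
            if stack[-1][0] == "|-":
                return stack[-1][1]       # whole string reduced
            _, inner = stack.pop()        # "(" matched by ")"
            stack[-1][1] = inner
        else:                             # step 8: no relation defined
            raise SyntaxError(f"no precedence between {stack[-1][0]} and {tok}")
    raise SyntaxError("unexpected end of input")


print(parse(["a", "+", "b", "*", "c"]))
print(parse(["(", "a", "+", "b", ")", "*", "c"]))
```

For a + b * c the operator * is reduced before +, so the multiplication becomes the right child of the addition in the AST.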

Q.11

Write a complete grammar for an arithmetic expression containing the operators +, *, $
using recursive specification and Backus-Naur Form (BNF), where $ is the exponentiation
operator.
<exp> ::= <exp> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= <factor> $ <primary> | <primary>
<primary> ::= <id> | <const> | (<exp>)
<id> ::= <letter> | <id>[<letter>|<digit>]
<const> ::= [+|-]<digit> | <const><digit>
<letter> ::= a|b|c|...|z
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
