
SYSTEM PROGRAMMING

Introduction

The designer expresses ideas in terms related to the application domain of the software. To implement these ideas, their description has to be converted into terms related to the execution domain of the computer system. The difference between the two domains is called the semantic gap:

Application Domain |-------- Semantic Gap --------| Execution Domain

The semantic gap has many consequences. The most important are long development times, large development effort and poor software quality. Programming languages (PLs) tackle these issues. Implementing software in a PL introduces a new domain, the PL domain, which splits the semantic gap into a specification gap and an execution gap:

Application Domain |-- Specification Gap --| PL Domain |-- Execution Gap --| Execution Domain

Language Processor: A language processor is software which bridges a specification gap or an execution gap.
Source Program (SP): The program that is input to a language processor. The language in which it is written is the source language (e.g. C, C++).
Target Program (TP): The output of a language processor. The language in which it is written is the target language (e.g. machine language, assembly language).
Language Translator: Bridges an execution gap to the machine language of a computer system, e.g. a compiler or an assembler.
De-compiler / De-assembler / De-translator: Bridges the same execution gap, but in the reverse direction.
Preprocessor: A language processor which bridges an execution gap but is not a language translator.

C++ program --> C++ preprocessor --> C program (errors are reported)
C++ program --> C++ translator --> machine language program (errors are reported)

Interpreters: An interpreter is a language processor which bridges an execution gap without generating a machine language program.

Language Types
Problem Oriented Languages: A classical solution is to develop a PL whose domain is very close or identical to the application domain. PL features then directly model aspects of the application domain, which leads to a small specification gap. Such PLs can be used only for specific applications, hence they are called problem oriented languages. They have a large execution gap:

Application Domain |- Specification Gap -| PL Domain |-------- Execution Gap --------| Execution Domain

Procedure Oriented Languages: These provide general purpose facilities required in most application domains. Such a language is independent of specific application domains and results in a large specification gap, e.g. C, C++.

Program Translation: The program translation model bridges the execution gap by translating a program written in a PL, called the source program (SP), into an equivalent program in the machine language or assembly language of the computer system, called the target program (TP). The target program is then executed on its data, and errors detected during translation are reported:

Source Program (SP) --> Translator --> Target Program (TP, machine language program) --> executed with Data
(the translator reports Errors)

Program Interpretation: The interpreter reads the source program and stores it in its memory. During interpretation it takes a source statement, determines its meaning and performs actions which implement it. The interpretation cycle closely resembles the execution cycle of the CPU.

Execution Cycle:
1. Fetch the instruction.
2. Decode the instruction and determine the operation to be performed.
3. Fetch the operands.
4. Execute the instruction.

Interpretation Cycle:
1. Fetch the statement.
2. Analyze the statement and determine its meaning.
3. Fetch the operands.
4. Execute the meaning of the statement.
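To make the interpretation cycle concrete, here is a minimal Python sketch of an interpreter for a hypothetical mini-language with statements such as `x = 2 + 3` and `print x`. The mini-language and the function name `interpret` are assumptions made for illustration. Each loop iteration performs the four steps of the cycle, and no target program is ever produced.

```python
import operator

# Hypothetical mini-language: "target = operand op operand" or "print name".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}

def interpret(program: list[str]) -> list[int]:
    memory: dict[str, int] = {}   # values of the program's variables
    output: list[int] = []
    pc = 0
    while pc < len(program):
        stmt = program[pc]                 # 1. fetch the statement
        pc += 1
        tokens = stmt.split()              # 2. analyze it to determine its meaning
        if tokens[0] == "print":
            output.append(memory[tokens[1]])
            continue
        target, _, left, op, right = tokens
        # 3. fetch operands: variable names from memory, numbers directly
        a = memory[left] if left.isidentifier() else int(left)
        b = memory[right] if right.isidentifier() else int(right)
        memory[target] = OPS[op](a, b)     # 4. execute the meaning
    return output
```

For example, `interpret(["x = 2 + 3", "y = x * 4", "print y"])` analyzes and executes each statement in turn and returns `[20]`.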

Language Processing
Language Processing = Analysis of SP + Synthesis of TP

We refer to the collection of language processor components engaged in analyzing a source program as the analysis phase of the language processor. The components engaged in synthesizing a target program are collectively known as the synthesis phase.

Analysis Phase:

1. Lexical analyzer, which governs the formation of valid lexical units.
2. Syntax analyzer, which governs the formation of valid statements in the source language.
3. Semantic analyzer, which associates meaning with valid statements of the language.

Synthesis Phase: The synthesis phase is concerned with the construction of target language statements which have the same meaning as a source statement. It consists of:
1. Creation of data structures in the target program.
2. Generation of target code.

Phases and passes of Language Processor:

SP --> Language Processor [ Analysis Phase --> Synthesis Phase ] --> TP
(each phase reports Errors)
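The lexical analyzer's job of forming valid lexical units can be sketched in a few lines of Python. The token classes (`ID`, `NUM`, `OP`, `PUNCT`) and the function name `tokenize` are assumptions chosen for illustration, not part of the notes. Any character that cannot start a valid lexical unit is reported as an error, just as in the phase diagram above.

```python
import re

# Assumed token classes for a small C-like language.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),   # identifiers
    ("NUM", r"\d+"),                     # integer constants
    ("OP", r"[=+\-*/]"),                 # operators
    ("PUNCT", r"[();]"),                 # punctuation
    ("SKIP", r"\s+"),                    # white space (discarded)
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (class, lexeme) pairs; raise on an invalid lexical unit."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"invalid lexical unit at position {pos}: {text[pos]!r}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Applied to `Interest = (p * r * n) / 100;` it yields twelve lexical units, starting with `("ID", "Interest")`; the syntax analyzer would then check that this sequence forms a valid statement.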

Analysis of a source program cannot be immediately followed by synthesis of the target statements, for two reasons:
1. Forward references
2. Memory requirements and language processor organization

Forward Reference: A forward reference of a program entity is a reference to the entity which precedes its definition in the program. E.g.

   Interest = (p * r * n) / 100;
   ...
   float Interest;

The reference to Interest in the assignment statement is a forward reference because the declaration of Interest occurs later in the program. Since the data type of Interest is not known while the assignment statement is processed, correct code cannot be generated for it in a statement-by-statement manner. This leads to the multi-pass model of language processing.

Language Processor Pass: A language processor pass is the processing of every statement in a source program, or its equivalent representation, to perform a language processing function (or a set of functions). E.g.
Pass I: Perform analysis of the source program and note relevant information.
Pass II: Perform synthesis of the target program.

The language processor thus performs certain processing more than once. In Pass I it analyzes the source program to note the type information. In Pass II it analyzes the source program again to generate target code using the type information noted in Pass I.

Intermediate Representation: An intermediate representation (IR) is a representation of a source program which reflects the effect of some, but not all, analysis and synthesis tasks performed during language processing.

Language Processor: S.P. --> Analysis Phase (front end) --> Intermediate Repn. (IR) --> Synthesis Phase (back end) --> T.P.

The first pass performs analysis of the source program and records its results in the intermediate representation. The second pass reads and analyzes the IR, instead of the source program, to perform synthesis of the target program. This avoids repeated processing of the source program. The first pass is concerned exclusively with source language issues, so it is called the front end of the language processor. The second pass is concerned with program synthesis for a specific target language, so it is called the back end. Because they communicate only through the IR, the front end and back end of a language processor need not co-exist in memory, which reduces the memory requirement of the language processor.
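The forward reference to Interest, and how two passes resolve it, can be shown with a minimal Python sketch. It assumes a hypothetical toy language in which `float NAME` or `int NAME` declares a variable and `NAME = expression` assigns to it; `FSTORE` and `ISTORE` are made-up target instructions used only to show that code selection depends on the type. The expression itself is not translated here.

```python
DECL_TYPES = ("int", "float")

def note_types(program: list[str]) -> dict[str, str]:
    """Pass I: analyze every statement and note the type of each variable."""
    types = {}
    for stmt in program:
        words = stmt.split()
        if words[0] in DECL_TYPES:
            types[words[1]] = words[0]
    return types

def generate_code(program: list[str], types: dict[str, str]) -> list[str]:
    """Pass II: synthesize code using the types noted in Pass I."""
    code = []
    for stmt in program:
        words = stmt.split()
        if words[0] in DECL_TYPES:
            continue                      # declarations produce no code here
        target = words[0]
        if target not in types:
            raise NameError(f"{target} is never declared")
        # Hypothetical instructions: the choice needs the type of the target.
        store = "FSTORE" if types[target] == "float" else "ISTORE"
        code.append(f"{store} {target}")
    return code

program = ["Interest = (p * r * n) / 100", "float Interest"]
```

`generate_code(program, note_types(program))` returns `["FSTORE Interest"]`, although the assignment comes before the declaration. A single statement-by-statement pass, which would see only `generate_code(program, {})`, raises `NameError` at the assignment.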

Assemblers
Assembly language is a low-level programming language; it is machine dependent. An assembly language statement has the format:

[Label] <Operation-code> <Operand1>[, <Operand2>]

Features of assembly language:
1. Mnemonic operation codes: These free us from remembering numeric op-codes. They also help the assembler detect mistakes in writing operation codes.
2. Symbolic operands: Symbolic names can be given to data items or instructions. The assembler binds these symbolic names to memory locations.
3. Data declarations: Data items can be declared in a convenient notation, so we do not have to convert them into machine format.

Mnemonic op-code table:

   Mnemonic   Op-code
   STOP       00
   ADD        01
   SUB        02
   MULT       03
   MOVER      04
   MOVEM      05
   COMP       06
   BC         07
   DIV        08
   READ       09
   PRINT      10

Example program and its machine code (op-code, register operand, memory operand address). Register codes: BX = 2, CX = 3; in BC the first operand is a condition code (LE = 2).

   Address   Statement               Machine code
             START 101
   101       READ N                  09 0 113
   102       MOVER BX, ONE           04 2 115
   103       MOVEM BX, TERM          05 2 116
   104       LOOP MULT BX, TERM      03 2 116
   105       MOVER CX, TERM          04 3 116
   106       ADD CX, ONE             01 3 115
   107       MOVEM CX, TERM          05 3 116
   108       COMP CX, N              06 3 113
   109       BC LE, LOOP             07 2 104
   110       MOVEM BX, RESULT        05 2 114
   111       PRINT RESULT            10 0 114
   112       STOP                    00 0 000
   113       N DS 1
   114       RESULT DS 1
   115       ONE DC 1                00 0 001
   116       TERM DS 1
             END

Assembly language statements:
1. Imperative statements (e.g. ADD, MULT)
2. Declaration statements (e.g. DS, DC)
3. Assembler directives (e.g. START, END)

Assembler phases: 1. Analysis phase 2. Synthesis phase

1. Analysis Phase
The primary function of the analysis phase is building the symbol table. For this, it must associate an address with each program element (e.g. instructions, data). A location counter (LC) is used for memory allocation; it contains the address of the next instruction of the target program. The analysis phase:
a. checks the validity of the mnemonic op-code;
b. makes an entry in the symbol table when a new label is found in a statement;
c. finds the length of each instruction and adds it to the LC, so that the LC points to the address of the next instruction.
This is called LC processing.

2. Synthesis Phase
a. Obtain the machine op-code corresponding to the mnemonic from the mnemonics table.
b. Obtain the address of a memory operand from the symbol table.
c. Synthesize a machine instruction or the machine form of a constant, as needed.

Figure: SP --> Analysis Phase --> Synthesis Phase --> TP. The analysis phase consults the mnemonics table (op-code and length of each mnemonic, e.g. ADD 01, SUB 02) and builds the symbol table (e.g. LOOP 104, N 113); the synthesis phase uses both tables.
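LC processing for the example program above can be sketched in Python as follows. The tuple representation `(label, mnemonic, operands)` and the name `build_symtab` are assumptions for illustration; the sketch also assumes, as the listing shows, that every imperative statement and every DC occupies one word while `DS n` reserves n words.

```python
MNEMONICS = {"STOP", "ADD", "SUB", "MULT", "MOVER", "MOVEM", "COMP", "BC",
             "DIV", "READ", "PRINT", "START", "END", "DS", "DC"}

# The example program: (label or None, mnemonic, operand field)
PROGRAM = [
    (None, "START", "101"), (None, "READ", "N"), (None, "MOVER", "BX, ONE"),
    (None, "MOVEM", "BX, TERM"), ("LOOP", "MULT", "BX, TERM"),
    (None, "MOVER", "CX, TERM"), (None, "ADD", "CX, ONE"),
    (None, "MOVEM", "CX, TERM"), (None, "COMP", "CX, N"), (None, "BC", "LE, LOOP"),
    (None, "MOVEM", "BX, RESULT"), (None, "PRINT", "RESULT"), (None, "STOP", ""),
    ("N", "DS", "1"), ("RESULT", "DS", "1"), ("ONE", "DC", "1"),
    ("TERM", "DS", "1"), (None, "END", ""),
]

def build_symtab(program):
    """Analysis phase: check mnemonics, bind each label to the current LC."""
    lc = 0
    symtab = {}
    for label, mnemonic, operands in program:
        if mnemonic not in MNEMONICS:
            raise ValueError(f"invalid mnemonic {mnemonic}")
        if mnemonic == "START":
            lc = int(operands)        # LC starts at the address given by START
            continue
        if mnemonic == "END":
            break
        if label:
            symtab[label] = lc        # symbol gets the address of this statement
        # add the length of the statement so LC points to the next one
        lc += int(operands) if mnemonic == "DS" else 1
    return symtab
```

`build_symtab(PROGRAM)` reproduces the symbol table entries shown in the figure (LOOP 104, N 113) together with RESULT 114, ONE 115 and TERM 116.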

Pass Structure of Assemblers


Two Pass Translation: Two pass translation of an assembly language program can handle forward references easily. LC processing is done in the first pass, and the symbols defined in the program are entered in the symbol table. The second pass synthesizes the target program using the symbol table. Thus the first pass performs analysis of the source program, while the second pass performs synthesis of the target program. The first pass also constructs an IR of the source program.

Single Pass Translation: LC processing and construction of the symbol table are done in the same way as in two pass translation. Forward references are tackled using a process called backpatching:
1. The operand field of an instruction containing a forward reference is left blank initially. E.g. in MOVER BX, ONE the symbol ONE is a forward reference, so the address of the second operand is left blank.
2. An entry for the forward reference is made in the TII (table of incomplete instructions), e.g. (102, ONE), since MOVER BX, ONE occupies address 102 in the program above.
3. When the END statement is processed, the symbol table contains the addresses of all symbols and the TII contains information about all forward references. The assembler now processes each TII entry and puts the symbol's address into the concerned instruction, completing it.
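Backpatching with a TII can be sketched in Python. The op-codes are a subset of the mnemonic table above and the register codes follow the listing (BX = 2, CX = 3); the function name `assemble_one_pass`, the `[op-code, register, address]` word format and the small test program are assumptions. BC and its condition codes are not handled in this sketch.

```python
OPCODES = {"STOP": 0, "ADD": 1, "MULT": 3, "MOVER": 4, "MOVEM": 5, "READ": 9, "PRINT": 10}
REGS = {"BX": 2, "CX": 3}

def assemble_one_pass(program, start):
    """Single pass assembly; forward references are completed by backpatching."""
    lc, symtab, tii, code = start, {}, [], {}
    for label, mnemonic, operands in program:
        if label:
            symtab[label] = lc
        if mnemonic == "DS":
            lc += int(operands)
            continue
        if mnemonic == "DC":
            code[lc] = [0, 0, int(operands)]
            lc += 1
            continue
        ops = [o.strip() for o in operands.split(",") if o.strip()]
        reg = REGS[ops[0]] if len(ops) == 2 else 0
        addr = 0
        if ops:
            symbol = ops[-1]
            if symbol in symtab:
                addr = symtab[symbol]
            else:
                addr = None                 # leave the operand field blank
                tii.append((lc, symbol))    # remember the incomplete instruction
        code[lc] = [OPCODES[mnemonic], reg, addr]
        lc += 1
    # END reached: every symbol is defined, so complete each TII entry.
    for address, symbol in tii:
        code[address][2] = symtab[symbol]
    return code, tii

program = [(None, "READ", "N"), (None, "MOVER", "BX, ONE"), (None, "PRINT", "N"),
           (None, "STOP", ""), ("N", "DS", "1"), ("ONE", "DC", "1")]
```

With `start=101`, the TII collects `(101, 'N')`, `(102, 'ONE')` and `(103, 'N')`; after backpatching, the instruction at 102 becomes `[4, 2, 106]` because ONE is defined at 106.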

Pass I of Assembler: Pass I uses the following data structures: OPTAB (mnemonic/op-code table), SYMTAB (symbol table), LITTAB (literal table) and POOLTAB (literal pool table).

OPTAB
   Mnemonic op-code   Class                        Mnemonic info
   ADD                IS (Imperative)              (01, 1)
   DS                 DL (Declarative)             R#7
   START              AD (Assembler Directive)     R#11

SYMTAB
   Symbol   Address   Length
   LOOP     202       1
   NEXT     214       1
   LAST     216       1
   A        217       1

LITTAB
   #   Literal   Address
   1   =abc
   2   =2
   3   ...

POOLTAB
   Literal no.
   #1
   #3

OPTAB contains the mnemonic op-code, class and mnemonic info. The class indicates whether the op-code is an imperative statement (IS), a declaration statement (DL) or an assembler directive (AD). For an IS, the mnemonic info is the pair (machine op-code, instruction length); for DL and AD it names the routine which processes the statement (e.g. R#7). LITTAB contains an entry for each literal.

Processing of an assembly statement begins with processing of its label: if the label field contains a symbol, the symbol and the current LC are entered into SYMTAB. The assembler then interprets the OPTAB entry of the mnemonic and checks its class. If it is an imperative statement, the length of the machine instruction is added to the LC and is also entered in the SYMTAB entry of the label, if any. If it is a declaration statement or an assembler directive, the routine mentioned in the mnemonic info is called. For example, for DS routine R#7 is called, which finds the amount of memory required by the statement and updates the LC and the SYMTAB entry.

The first pass uses LITTAB to collect all literals used in a program. Awareness of the different literal pools is maintained using the auxiliary pool table (POOLTAB), which contains the literal number of the starting literal of each literal pool.

PASS I ALGORITHM:
1. loc_cntr := 0 (default value); pooltab_ptr := 1; POOLTAB[1] := 1; littab_ptr := 1;
2. While the next statement is not an END statement:
   a. If a label is present then
         this_label := symbol in label field;
         Enter (this_label, loc_cntr) in SYMTAB.
   b. If an LTORG statement then
         Allocate memory to the literals of the current pool, updating their addresses in LITTAB and loc_cntr;
         pooltab_ptr := pooltab_ptr + 1;
         POOLTAB[pooltab_ptr] := littab_ptr.
   c. If a START or ORIGIN statement then
         loc_cntr := value specified in the operand field.
   d. If an EQU statement then
         this_addr := value of the address in the operand field;
         Correct the SYMTAB entry for this_label (address := this_addr).
   e. If a declaration statement (DS/DC) then
         Calculate the storage size required by the DC/DS statement;
         loc_cntr := loc_cntr + size;
         Generate IC for the statement.
   f. If an imperative statement then
         code := machine op-code from OPTAB;
         loc_cntr := loc_cntr + length of the instruction;
         If the operand is a literal then enter it in LITTAB (littab_ptr := littab_ptr + 1)
         else enter the symbol in SYMTAB;
         Generate IC.
3. (Processing of the END statement)
   a. Perform step 2b.
   b. Generate IC.
   c. Go to Pass II.
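The Pass I algorithm can be sketched in Python as below. Op-codes follow the mnemonic table above; the register codes for AX and DX and the condition codes other than LE are assumptions, as are the IC tuple layout, the `(label, mnemonic, operands)` statement format and the use of 0-based POOLTAB indices (the notes count from 1). For brevity the IC refers to symbols by name rather than by SYMTAB entry number, START/ORIGIN take a constant operand, EQU takes a symbol, and duplicate literals within one pool are not merged.

```python
# mnemonic -> (class, machine op-code for IS)
OPTAB = {
    "STOP": ("IS", 0), "ADD": ("IS", 1), "SUB": ("IS", 2), "MULT": ("IS", 3),
    "MOVER": ("IS", 4), "MOVEM": ("IS", 5), "COMP": ("IS", 6), "BC": ("IS", 7),
    "DIV": ("IS", 8), "READ": ("IS", 9), "PRINT": ("IS", 10),
    "DS": ("DL", None), "DC": ("DL", None), "START": ("AD", None),
    "ORIGIN": ("AD", None), "EQU": ("AD", None), "LTORG": ("AD", None), "END": ("AD", None),
}
REGS = {"AX": 1, "BX": 2, "CX": 3, "DX": 4}
CONDS = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}

def pass_one(program):
    loc_cntr = 0
    symtab = {}                  # symbol -> address
    littab = []                  # [literal, address] pairs
    pooltab = [0]                # LITTAB index of the first literal of each pool
    ic = []                      # intermediate code

    def allocate_literals():     # give addresses to the current pool's literals
        nonlocal loc_cntr
        for entry in littab[pooltab[-1]:]:
            entry[1] = loc_cntr
            loc_cntr += 1

    for label, mnemonic, operands in program:
        cls, opcode = OPTAB[mnemonic]              # KeyError: invalid mnemonic
        if label and mnemonic != "EQU":            # step 2a
            symtab[label] = loc_cntr
        if mnemonic == "END":                      # step 3
            allocate_literals()
            ic.append(("AD", "END"))
            break
        if mnemonic == "LTORG":                    # step 2b
            allocate_literals()
            pooltab.append(len(littab))            # next pool starts here
            ic.append(("AD", "LTORG"))
        elif mnemonic in ("START", "ORIGIN"):      # step 2c
            loc_cntr = int(operands)
            ic.append(("AD", mnemonic, loc_cntr))
        elif mnemonic == "EQU":                    # step 2d
            symtab[label] = symtab[operands.strip()]
            ic.append(("AD", "EQU"))
        elif cls == "DL":                          # step 2e
            value = int(operands)
            ic.append(("DL", mnemonic, value))
            loc_cntr += value if mnemonic == "DS" else 1
        else:                                      # step 2f: imperative
            ops = [o.strip() for o in operands.split(",") if o.strip()]
            first = REGS.get(ops[0], CONDS.get(ops[0], 0)) if len(ops) == 2 else 0
            operand = None
            if ops:
                last = ops[-1]
                if last.startswith("="):
                    littab.append([last, None])
                    operand = ("L", len(littab) - 1)
                else:
                    operand = ("S", last)
            ic.append(("IS", opcode, first, operand))
            loc_cntr += 1                          # each instruction is one word
    return symtab, littab, pooltab, ic

EXAMPLE = [(None, "START", "200"), (None, "MOVER", "BX, ='5'"),
           ("LOOP", "ADD", "BX, ='1'"), (None, "LTORG", ""), (None, "MOVEM", "BX, X"),
           (None, "SUB", "BX, ='1'"), (None, "STOP", ""), ("X", "DS", "1"),
           (None, "END", "")]
```

For `EXAMPLE`, the first pool (='5', ='1') is placed at 202 and 203 by LTORG, and the second pool (the later ='1') is placed at 208 when END is processed, so POOLTAB ends up as `[0, 2]`.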

PASS II ALGORITHM:
1. code_area_address := address of code_area; pooltab_ptr := 1; loc_cntr := 0;
2. While the next statement is not an END statement:
   a. Clear machine_code_buffer.
   b. If a label is present: it was already entered in SYMTAB during Pass I, so no action is needed.
   c. If an LTORG statement then
         Process the literals of the current pool similar to the processing of constants in a DC statement, i.e. assemble the literals in machine_code_buffer;
         size := size of memory area required for the literals;
         pooltab_ptr := pooltab_ptr + 1.
   d. If a START or ORIGIN statement then
         loc_cntr := value specified in the operand field;
         size := 0.
   e. If a declaration statement (DS/DC) then
         If a DC statement then assemble the constant in machine_code_buffer;
         size := size of memory area required by the DC/DS statement.
   f. If an imperative statement then
         Get the operand address from SYMTAB or LITTAB;
         Assemble the instruction in machine_code_buffer;
         size := size of the instruction.
   g. If size <> 0 then
         Move the contents of machine_code_buffer to the address code_area_address + loc_cntr;
         loc_cntr := loc_cntr + size.
3. (Processing of the END statement)
   a. Perform steps 2c and 2g.
   b. Write code_area into the output file.
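A matching Python sketch of Pass II is given below. It consumes a hard-coded intermediate code in an assumed tuple format (`("IS", op-code, register, ("S", name) or ("L", literal index))`, `("DL", "DS"/"DC", value)`, `("AD", directive, ...)`) with SYMTAB, LITTAB and 0-based POOLTAB values matching that IC. The code area is modelled as a dictionary keyed by address instead of `code_area_address + loc_cntr`, and literals are assumed to be numeric, written like `='5'`.

```python
def pass_two(ic, symtab, littab, pooltab):
    code_area = {}           # address -> (op-code, register, memory operand)
    loc_cntr = 0
    pooltab_ptr = 0

    def current_pool():
        """Literals of the current pool, assembled like DC constants."""
        if pooltab_ptr >= len(pooltab):
            return []
        start = pooltab[pooltab_ptr]
        end = pooltab[pooltab_ptr + 1] if pooltab_ptr + 1 < len(pooltab) else len(littab)
        return [(0, 0, int(lit.strip("='"))) for lit, _ in littab[start:end]]

    for entry in ic:
        buffer, size = [], 0                               # step 2a
        kind, what = entry[0], entry[1]
        if kind == "AD" and what in ("LTORG", "END"):      # step 2c / step 3a
            buffer = current_pool()
            size = len(buffer)
            pooltab_ptr += 1
        elif kind == "AD" and what in ("START", "ORIGIN"): # step 2d
            loc_cntr = entry[2]
        elif kind == "DL":                                 # step 2e
            if what == "DC":
                buffer = [(0, 0, entry[2])]
            size = 1 if what == "DC" else entry[2]
        elif kind == "IS":                                 # step 2f
            _, opcode, reg, operand = entry
            addr = 0
            if operand is not None:
                tag, ref = operand
                addr = symtab[ref] if tag == "S" else littab[ref][1]
            buffer = [(opcode, reg, addr)]
            size = 1
        if size:                                           # step 2g
            for offset, word in enumerate(buffer):
                code_area[loc_cntr + offset] = word
            loc_cntr += size
    return code_area

IC = [("AD", "START", 200), ("IS", 4, 2, ("L", 0)), ("IS", 1, 2, ("L", 1)),
      ("AD", "LTORG"), ("IS", 5, 2, ("S", "X")), ("IS", 2, 2, ("L", 2)),
      ("IS", 0, 0, None), ("DL", "DS", 1), ("AD", "END")]
SYMTAB = {"LOOP": 201, "X": 207}
LITTAB = [["='5'", 202], ["='1'", 203], ["='1'", 208]]
POOLTAB = [0, 2]
```

`pass_two(IC, SYMTAB, LITTAB, POOLTAB)` places the first literal pool at 202 and 203 when LTORG is reached and the last pool at 208 when END is reached; address 207 (reserved by DS) receives no code.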

A SINGLE PASS ASSEMBLER FOR IBM PC

A single pass assembler for the Intel 8088 processor used in the IBM PC.

The Architecture of Intel 8088: The Intel 8088 microprocessor supports 8 and 16 bit arithmetic, and also provides special instructions for string manipulation. The CPU contains the following features:
1. Data registers AX, BX, CX and DX (their 8-bit halves are AH/AL, BH/BL, CH/CL and DH/DL)
2. Index registers SI and DI
3. Stack pointer registers BP and SP
4. Segment registers CS (code), SS (stack), DS (data) and ES (extra)

Figure: 8088 register set (data, stack, index and segment registers) and a 1 MB RAM holding the code segment and the data segment.

The Intel 8088 provides addressing capability for 1 MB of primary memory. The memory is used to store three components of a program: program code, data and stack. The code, stack and data segment registers contain the start addresses of these three components. A segment register holds the segment start address divided by 16, so the 20-bit address of a particular instruction or data item is obtained by adding its 16-bit logical address (offset) to the segment start address, i.e. 16 x (segment register) + offset. Since an offset is 16 bits long, a segment cannot be larger than 2^16 bytes = 64 KB, and a large program may contain more than one segment.
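The 8088 address computation can be checked with a small Python function. The name `physical_address` is an assumption for illustration. The segment register value is shifted left by 4 bits (multiplied by 16) and the offset is added; because the 8088 has 20 address lines, the result wraps around at 1 MB.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit 8088 physical address from a 16-bit segment value and 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # segment start address = segment value * 16; the sum wraps at 1 MB (20 bits)
    return ((segment << 4) + offset) & 0xFFFFF
```

For instance, segment 1234H with offset 0010H gives physical address 12350H. Varying the offset from 0000H to FFFFH within one segment covers exactly 65536 bytes, which is the 64 KB segment limit.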

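8088 Addressing Modes (the mode shown is that of the second, source operand):

   Addressing mode      Example                  Description
   Immediate            MOV SUM, 1234H           data = 1234H
   Register             MOV SUM, AX              AX contains the data
   Direct               MOV SUM, [1234H]         data at address 1234H
   Register indirect    MOV SUM, [BX]            data at the address contained in BX
   Register indirect    MOV SUM, CS:[BX]         as above, with segment override: CS supplies the segment base
   Based                MOV SUM, 12H[BX]         data at 12H + (BX)
   Indexed              MOV SUM, 34H[SI]         data at 34H + (SI)
   Based and indexed    MOV SUM, 56H[SI][BX]     data at 56H + (SI) + (BX)

Note: SUM is used as the destination throughout only to show the source addressing modes. The 8088 does not allow both operands of MOV to be in memory, so the forms with a memory source would need a register destination (e.g. MOV AX, [BX]) in real code.

Intel 8088 instructions:
Arithmetic instructions: ADD AL, BL    ADD AL, 12H[SI]    ADD AX, 3456H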

Control Transfer Instructions: branching.

String instruction example: move 80 (50H) bytes from the source index address to the destination index address:
   MOV SI, 100h
   MOV DI, 200h
   MOV CX, 50h
   CLD
   REP MOVSB

Assembler Directives

Declarations:
   A DB 25             ; byte initialized to 25
   B DW ?              ; uninitialized word
   C DD 6 DUP(0)       ; 6 double words initialized to 0

EQU & PURGE:
   XYZ DB ?
   ABC EQU XYZ         ; ABC is a new name for XYZ
   PURGE ABC           ; ABC is no longer a name for XYZ
   ABC EQU 25          ; ABC can now be redefined

SEGMENT, ENDS: SEGMENT and ENDS declare the start and end of a segment:
   CODESG SEGMENT
      ... instructions ...
   CODESG ENDS
   DATASG SEGMENT
      ...
   DATASG ENDS

ASSUME: ASSUME SEGMENT_REG : SEGMENT_NAME, e.g.
   ASSUME DS:DATASG

PROC, ENDP: PROC and ENDP delimit the body of a procedure:
   FACTORIAL PROC
      ...
   FACTORIAL ENDP

Example Assembly Language Program

   Sr No.   Statement                          Offset
   001      CODE SEGMENT
   002           ASSUME CS:CODE, DS:DATA
   003           MOV AX, DATA                  0000
   004           MOV DS, AX                    0003
   005           MOV CX, LENGTH STRNG          0005
   006           MOV COUNT, 0000               0008
   007           MOV SI, OFFSET STRNG          0011
   008           ASSUME ES:DATA, DS:NOTHING
   009           MOV AX, DATA                  0014
   010           MOV ES, AX                    0017
   011      COMP: CMP [SI], 'A'                0019
   012           JNE NEXT                      0022
   013           MOV COUNT, 1                  0024
   014      NEXT: INC SI                       0027
   015           DEC CX                        0029
   016           JNE COMP                      0030
   017      CODE ENDS
   018      DATA SEGMENT
   019           ORG 1
   020      COUNT DB ?                         0001
   021      STRNG DW 50 DUP(?)                 0002
   022      DATA ENDS
   023           END

Forward Reference: A symbolic name may be forward referenced in a variety of ways. When it is used as a data operand in a statement, its assembly is straightforward. An entry can be made in the table of incomplete instructions (TII). This entry identifies the bytes in the code where the address of the referenced symbol should be put. When the symbol's definition is encountered, this entry is analyzed to complete the instruction. For example, when the definition of COUNT is encountered in statement 20, information concerning the forward references to COUNT can be found in the TII.

Segment Registers Table: An ASSUME statement indicates that a segment register contains the base address of a segment. The assembler represents this information by a pair of the form (segment register, segment name). This information is stored in a segment registers table (SRTAB), which is updated on processing an ASSUME statement. To process a reference to a symbol symb in an assembly statement, the assembler accesses the symbol table entry of symb and finds (seg_symb, offset_symb), where seg_symb is the name of the segment containing the definition of symb. It uses the information in SRTAB to find the register which contains seg_symb; let it be register r. It then synthesizes the pair (r, offset_symb), which is put in the address field of the target instruction.

A new SRTAB is created while processing each ASSUME statement. This SRTAB differs from the old SRTAB only in the entries for the segment registers named in the ASSUME statement. Since many SRTABs exist at any time, an array named SRTAB_ARRAY is used to store them. This array is indexed using a counter srtab_no.
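The SRTAB_ARRAY mechanism can be sketched in Python. The function names `assume` and `address_of` are hypothetical, SRTABs are modelled as dictionaries from register to segment name (the notes store SYMTAB entry numbers of segments), and SYMTAB maps a symbol to `(segment name, offset)`. Treating `NOTHING` as removing a register's entry is an assumption consistent with the SRTAB#2 contents described for the example program.

```python
def assume(srtab_array, specs):
    """ASSUME: copy the current SRTAB, overwrite the named registers, append it."""
    new = dict(srtab_array[-1]) if srtab_array else {}
    for register, segment in specs:
        if segment == "NOTHING":
            new.pop(register, None)      # register no longer addresses a segment
        else:
            new[register] = segment      # overwrites an existing entry for register
    srtab_array.append(new)
    return len(srtab_array)              # srtab_no of the new SRTAB (1-based)

def address_of(symb, symtab, srtab_array, srtab_no):
    """Synthesize (segment register, offset) for symb using SRTAB number srtab_no."""
    seg_symb, offset_symb = symtab[symb]
    for register, segment in srtab_array[srtab_no - 1].items():
        if segment == seg_symb:
            return register, offset_symb
    raise LookupError(f"{symb} is not addressable: no register contains {seg_symb}")
```

For the example program, `assume(srtabs, [("CS", "CODE"), ("DS", "DATA")])` builds SRTAB#1 and `assume(srtabs, [("ES", "DATA"), ("DS", "NOTHING")])` builds SRTAB#2; with COUNT at offset 1 of DATA, the reference assembles as `("DS", 1)` under SRTAB#1 and as `("ES", 1)` under SRTAB#2.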

Forward Reference Table: Instead of the TII, a forward reference table (FRT) is used. Each FRT entry contains:
(a) the address of the instruction whose operand field contains the forward reference;
(b) the symbol to which the forward reference is made;
(c) the kind of reference (e.g. T: analytic operator TYPE, D: data address, S: self-relative address, L: length, F: offset);
(d) the number of the SRTAB to be used for assembling the reference.

LC Processing: LC processing in this algorithm differs from LC processing in the first pass of a two pass assembler (see the Pass I algorithm above) in one significant respect. In the Intel 8088, the unit of memory allocation is a byte; however, certain entities require their starting byte to be aligned on specific boundaries in the address space. For example, a word requires alignment on an even boundary, i.e. it must have an even start address. Such alignment requirements may force some bytes to be left unused during memory allocation. Hence, while processing declaration statements and imperatives, the assembler first aligns the LC on the boundary required by the entity, and only then allocates memory to it.
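LC alignment can be sketched in Python. The names `align` and `allocate` and the `(name, directive, count)` input format are assumptions; only DB (1 byte, no alignment) and DW (2 bytes, even address) are modelled, matching the word-alignment rule stated above. Bytes skipped to reach the boundary are counted as unused.

```python
SIZE = {"DB": 1, "DW": 2}        # bytes per item
ALIGNMENT = {"DB": 1, "DW": 2}   # a word must start at an even address

def align(lc: int, boundary: int) -> tuple[int, int]:
    """Round lc up to a multiple of boundary; return (aligned LC, bytes left unused)."""
    aligned = -(-lc // boundary) * boundary     # ceiling division
    return aligned, aligned - lc

def allocate(declarations, origin=0):
    """Align the LC for each declaration, then allocate its memory."""
    lc, offsets, unused = origin, {}, 0
    for name, directive, count in declarations:
        lc, skipped = align(lc, ALIGNMENT[directive])
        unused += skipped
        offsets[name] = lc
        lc += SIZE[directive] * count
    return offsets, lc, unused
```

For the data segment of the example program (ORG 1, then COUNT DB and STRNG DW 50 DUP(?)), this gives COUNT at offset 1 and STRNG at offset 2 with no unused bytes. Declaring a DB at offset 0 followed by a DW, by contrast, leaves byte 1 unused and places the word at offset 2.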

SRTAB: Two SRTABs would be built for the example program above. SRTAB#1 contains the pairs (CS, CODE) and (DS, DATA), while SRTAB#2 contains the pairs (CS, CODE) and (ES, DATA). While statement 6 is processed, SRTAB#1 is the current SRTAB, hence the FRT entry for this statement is (008, COUNT, D, SRTAB#1). Similarly, the FRT entry for statement 13 is (024, COUNT, D, SRTAB#2). These entries are processed on encountering the definition of COUNT, giving the address pairs (DS, 001) and (ES, 001). (Note that FRT entries would also exist for statements 5, 7 and 12. However, none of them requires the use of a base register.)
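Processing of the FRT when COUNT is defined can be sketched in Python. The `FrtEntry` class and the `resolve` function are hypothetical names; SRTABs are modelled as dictionaries from register to segment name. Only kind D (data address) entries are completed here; entries of other kinds, such as the L entry for statement 5, are left pending because their processing is not described in the notes.

```python
from dataclasses import dataclass

@dataclass
class FrtEntry:
    address: int     # address of the instruction containing the forward reference
    symbol: str      # symbol to which the forward reference is made
    kind: str        # "D" data address, "L" length, "F" offset, ...
    srtab_no: int    # SRTAB to be used when assembling the reference

def resolve(frt, symbol, segment, offset, srtab_array):
    """On defining symbol in segment at offset, complete its kind-D FRT entries."""
    completed, pending = {}, []
    for entry in frt:
        if entry.symbol != symbol or entry.kind != "D":
            pending.append(entry)            # other symbols or kinds: not handled here
            continue
        srtab = srtab_array[entry.srtab_no - 1]
        registers = [reg for reg, seg in srtab.items() if seg == segment]
        if not registers:
            raise LookupError(f"{symbol} is not addressable at {entry.address:04d}")
        completed[entry.address] = (registers[0], offset)
    return completed, pending

SRTABS = [{"CS": "CODE", "DS": "DATA"}, {"CS": "CODE", "ES": "DATA"}]
FRT = [FrtEntry(5, "STRNG", "L", 1), FrtEntry(8, "COUNT", "D", 1),
       FrtEntry(24, "COUNT", "D", 2)]
```

`resolve(FRT, "COUNT", "DATA", 1, SRTABS)` completes the instructions at 008 and 024 with `("DS", 1)` and `("ES", 1)` respectively, and leaves the STRNG entry pending.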

Figure: Flowchart of the single pass assembler. For each statement: clear ERRTAB; read the next statement; look up the mnemonic in the MOT and call the appropriate routine (ASSUME: seal the present SRTAB and form a new SRTAB; declarations: align LC, allocate storage, convert constants; imperatives: obtain the machine op-code and data alignment requirements, evaluate operand expressions, and for a symbolic operand enter it in the CRT, then, if it is already defined, check alignment and addressability and generate the segment base and offset, otherwise enter the SRTAB #, instruction address, usage code and statement number in FRT); assemble the instruction into the target code buffer; if a label is present, enter it in SYMTAB and CRT and process forward references to it; update LC; list the statement, report errors and clear ERRTAB. At END: report undefined symbols and pending FRT entries, produce the cross-reference listing and stop.

1. code_area_address := address of code_area; srtab_no := 1; LC := 0; stmt_no := 1; SYMTAB_segment_entry := 0; Clear ERRTAB, SRTAB_ARRAY.
2. While the next statement is not an END statement:
   (a) Clear machine_code_buffer.
   (b) If a label is present then this_label := symbol in label field;
   (c) If an EQU statement
       (i) this_addr := value of operand expression;
       (ii) Make an entry for this_label in SYMTAB with offset := this_addr; Defined := yes; owner_segment := owner_segment of operand symbol; source_stmt_# := stmt_no;
       (iii) Enter stmt_no in the CRT list of the label in the operand field;
       (iv) Process forward references to this_label;
       (v) size := 0;
   (d) If an ASSUME statement
       (i) Copy the SRTAB in SRTAB_ARRAY[srtab_no] into SRTAB_ARRAY[srtab_no + 1];
       (ii) srtab_no := srtab_no + 1;
       (iii) For each register mentioned in the statement: this_register := the register; this_segment := entry number of the SYMTAB entry of the segment appearing with it in the operand field; make the entry (this_register, this_segment) in SRTAB_ARRAY[srtab_no] (this overwrites an existing entry for this_register);
       (iv) size := 0;
   (e) If a SEGMENT statement
       (i) Make an entry for this_label in SYMTAB;
       (ii) Set segment name? := true;
       (iii) SYMTAB_segment_entry := entry no. in SYMTAB;
       (iv) LC := 0;
       (v) size := 0;
   (f) If an ENDS statement then SYMTAB_segment_entry := 0;
   (g) If a declaration statement
       (i) Align LC according to the specification in the operand field;
       (ii) Assemble the constant(s), if any, in machine_code_buffer;
       (iii) size := size of memory area required;
   (h) If an imperative statement
       (i) If the operand is a symbol symb then enter stmt_no in the CRT list of symb;
       (ii) If the operand symbol is already defined then
               Check its alignment and addressability;
               Generate the address specification (segment register, offset) for the symbol using its SYMTAB entry and SRTAB_ARRAY[srtab_no];
            else
               Make an entry for the symbol in SYMTAB with Defined := no;
               Enter (srtab_no, LC, usage code, stmt_no) in FRT;
       (iii) Assemble the instruction in machine_code_buffer;
       (iv) size := size of the instruction;
   (i) If size <> 0 then
       (i) If a label is present then make an entry for this_label in SYMTAB with owner_segment := SYMTAB_segment_entry; Defined := yes; offset := LC; source_stmt_# := stmt_no;
       (ii) Move the contents of machine_code_buffer to the address code_area_address;
       (iii) code_area_address := code_area_address + size; LC := LC + size;
       (iv) Process forward references to the symbol. Check for alignment and addressability errors. Enter errors in ERRTAB;
       (v) List the statement with the errors contained in ERRTAB;
       (vi) Clear ERRTAB.

Compilers
Aspects of Compilation: A compiler bridges the semantic gap between the PL domain and the execution domain. The two aspects of the compilation process are:
1. Generate code to implement the meaning of the source program in the execution domain.
2. Provide diagnostics for violations of PL semantics in the source program.

PL features which a compiler must handle: 1. Data types 2. Data structures 3. Scope rules 4. Control structures.

1. Data Type: A data type is the specification of (a) the legal values for variables of the type and (b) the legal operations on the legal values of the type. E.g. in
      int i, j; j = i + 2;
   the addition and the assignment are legal operations on int values, whereas in
      int i, x[10]; x = i + 5;
   the assignment is illegal because x is an array, and the compiler must report it.
2. Data Structures: A language permits the declaration and use of data structures like arrays, stacks, records and linked lists. To compile a reference to an element of a data structure, the compiler must develop a memory mapping strategy to access the memory allocated to the element, e.g. for arrays and structures.
3. Scope Rules: Scope rules determine the accessibility of variables declared in different blocks of a program. The scope of a program entity is the part of the program where the entity is accessible. E.g.
      main() { int i = 2; { int i = 5; } }
   Here the inner block declares its own i, so the outer i is not accessible inside the inner block.
4. Control Structures: The control structure of a language is the collection of language features for altering the flow of control during program execution. This includes branching, conditional execution, iteration control and procedure or function calls.

Memory Allocation Strategy of Compilers: Memory allocation involves three tasks:
1. Determine the amount of memory required to represent the value of a data item.
2. Use an appropriate memory allocation model to implement the lifetimes and scopes of data items.
3. Determine appropriate memory mappings to access the values in a non-scalar data item, e.g. the value of an array element.
The first task is accomplished during semantic analysis of data declaration statements.

Memory allocation models: 1. Static and dynamic memory allocation 2. Memory allocation in block structured languages

Definition: A memory binding is an association between the memory address attribute of a data item and the address of a memory area. Memory allocation is the procedure used to perform memory binding. The binding ceases to exist when the memory is de-allocated. Memory binding can be static or dynamic in nature.

In static memory allocation, memory is allocated to a variable before the execution of the program begins. Static memory allocation is typically performed during compilation. No memory allocation or de-allocation is performed during execution, so variables remain permanently allocated: the allocation to a variable exists even if the program unit in which it is defined is not active. In dynamic memory allocation, memory bindings are established and destroyed during the execution of the program. A typical example of static memory allocation is FORTRAN; dynamic memory allocation is used in PL/I, Pascal, Ada, C etc.

Dynamic memory allocation has two types: 1. automatic allocation and 2. program controlled allocation. In automatic allocation, memory binding is performed at the execution start time of a program unit: memory is allocated to the variables declared in a program unit when the unit is entered during execution, and de-allocated when the unit is exited. In program controlled allocation, memory binding is performed during the execution of a program unit.

Program controlled dynamic allocation: A program can allocate and de-allocate memory at arbitrary points during its execution. In dynamic memory allocation, the address of the memory area of a variable cannot be determined at compile time. Dynamic memory allocation is implemented using stacks and heaps, so accesses go through pointers and are slower than accesses to statically allocated memory. Automatic dynamic memory allocation is implemented using a stack (LIFO) structure, whereas program controlled dynamic memory allocation is implemented using a heap.

Advantages of dynamic memory allocation:
1. Recursion can be implemented easily, because memory is allocated when a program unit is entered during execution.
2. It can support data structures whose sizes are determined dynamically, e.g. array x[m,n].

Figure: Static and dynamic memory allocation for program units A, B and C.
Static: Code A, Data A, Code B, Data B, Code C and Data C are all allocated for the whole execution.
Dynamic: Code A, Code B and Code C stay allocated, while the data area changes over time: Data A only; then Data A and Data B; then Data A and Data C.

Memory allocation in block structured languages: A block is a program unit which can contain data declarations. A program in a block structured language is a nested structure of blocks. Block structured languages use dynamic memory allocation.

Scope Rule: A variable declared in a block is accessible in that block and in all its inner blocks, provided the same variable name is not re-declared in an inner block.

Scope of a variable: If a variable var_i is created with name name_i in block B:
a. var_i can be accessed in any statement in block B;
b. var_i can be accessed in any statement situated in a block B' which is enclosed in B, unless B' contains a declaration using the same name.

Memory allocation and access: Automatic dynamic memory allocation is implemented using the extended stack model. Each record in the stack has two reserved pointers. Each stack record accommodates the variables for one activation of a block, hence we call it an activation record (AR).
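The scope rule can be sketched as a compile-time lookup in Python. The `Block` class is a hypothetical name; it models only name visibility (as a compiler's symbol table would), not run-time memory. A lookup searches the current block and then each enclosing block, so an inner declaration of the same name hides the outer one.

```python
class Block:
    """Declarations of one block plus a link to its enclosing block."""
    def __init__(self, enclosing=None):
        self.enclosing = enclosing
        self.variables = {}

    def declare(self, name, value):
        self.variables[name] = value

    def lookup(self, name):
        block = self
        while block is not None:             # search innermost block outward
            if name in block.variables:
                return block.variables[name]
            block = block.enclosing
        raise NameError(f"{name} is not accessible here")
```

For `main() { int i = 2; { int i = 5; } }`, a block nested inside the inner block sees `i` as 5, while the outer block still sees 2; a variable declared only in the outer block remains visible in every enclosed block.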

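Figure: Activation record layout. 0(ARB) and 1(ARB) hold the two reserved pointers, followed by the local variables (e.g. x). ARB points to the start of the current activation record and TOS to the top of the stack.

AR_i == Activation record for the i-th activation of block A.

Dynamic Pointer: The first reserved pointer in a block's AR points to the activation record of its dynamic parent, i.e. the block activation that was current when this block was entered. It is called the dynamic pointer and has the address 0(ARB). The dynamic pointer is used for de-allocating an AR.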

Sample Program:
   { int x;              /* block a */
     { char y;           /* block b */
       { int z, w;       /* block c */
         z = 5; x = z;
       }
     }
   }

Figure: While block c executes, the stack holds AR_a (x), AR_b (y) and AR_c (z, w), each beginning with its reserved pointers; ARB points to AR_c and TOS to the top of AR_c.

DYNAMIC ALLOCATION AND ACCESS

Accessing Non-local Variables: A non-local variable nl_var of a block b_use is a local variable of some block b_defn enclosing b_use. According to the rules of a block structured language, when b_use is in execution, b_defn must be active. Hence AR_b_defn exists in the stack, and nl_var is accessed at the address

   start address of AR_b_defn + d_nl_var

where d_nl_var is the displacement of nl_var in the AR.

Static Pointer: Access to non-local variables is implemented using the second reserved pointer in the activation record. This pointer, which has the address 1(ARB), is called the static pointer. It points to the AR of the block's static parent, i.e. the textually enclosing block.
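The extended stack model with dynamic and static pointers can be simulated in Python. The `ActivationStack` class, its method names and the one-word-per-slot stack are assumptions for illustration. Each AR starts with the dynamic pointer at 0(ARB) and the static pointer at 1(ARB), followed by the locals; a non-local variable is reached by following the static pointer once per level of nesting difference and then adding its displacement.

```python
class ActivationStack:
    """Extended stack model: each AR = [dynamic ptr, static ptr, locals...]."""
    def __init__(self):
        self.mem = []       # the stack, one word per slot
        self.arb = None     # ARB: start address of the current AR

    def enter_block(self, static_parent, n_locals):
        new_arb = len(self.mem)                 # new AR starts just above TOS
        self.mem.append(self.arb)               # 0(ARB): dynamic pointer
        self.mem.append(static_parent)          # 1(ARB): static pointer
        self.mem.extend([0] * n_locals)         # locals at displacements 2, 3, ...
        self.arb = new_arb
        return new_arb

    def exit_block(self):
        old_arb = self.arb
        self.arb = self.mem[old_arb]            # restore ARB via dynamic pointer
        del self.mem[old_arb:]                  # de-allocate the AR

    def address(self, level_difference, displacement):
        """Address of a variable declared level_difference blocks outward."""
        arb = self.arb
        for _ in range(level_difference):
            arb = self.mem[arb + 1]             # follow the static pointer
        return arb + displacement
```

Simulating the sample program: enter block a with one local (x), block b with one local (y) and block c with two locals (z, w). Then `z = 5` writes to `address(0, 2)`, and `x = z` copies it to `address(2, 2)`, which follows two static pointers back to AR_a.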

Displays: When the difference between the nesting levels of b_use and b_defn is large, accessing non-local variables through a chain of static pointers is expensive. A display is an array used to improve the efficiency of non-local access: it holds the addresses of the ARs of the currently accessible enclosing blocks, so any non-local variable can be reached with a single indirection.

Symbol Table Requirements: A symbol table is required to implement dynamic memory allocation and access.

Recursion: Recursive procedures are characterized by the fact that many invocations of a procedure co-exist during the execution of a program. A copy of the local variables must be allocated for each invocation. This can be done using the extended stack model of memory allocation.

Limitations of stack based memory allocation:
1. Stack based allocation is not adequate for program controlled memory allocation using malloc and free; the compiler must use a heap for such allocations.
2. Accesses to such variables are implemented through pointers associated with the individual variables instead of through ARs.
3. Garbage collection or free lists are used to implement the reuse of de-allocated memory.
4. A stack is also inadequate for multi-activity or multi-threaded programs.
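A display can be sketched in Python as follows. The function name `build_display` and the hard-coded stack contents are assumptions: the stack holds the sample program's three ARs (a at 0, b at 3, c at 6), each starting with its dynamic and static pointers. This sketch rebuilds the display by walking the static chain once; compilers typically update the display incrementally on block entry and exit instead.

```python
def build_display(mem, arb, level):
    """display[k] = start address of the accessible AR at nesting level k (outermost = 1)."""
    display = [None] * (level + 1)
    for k in range(level, 0, -1):
        display[k] = arb
        arb = mem[arb + 1]          # static pointer of that AR
    return display

# Stack while block c (level 3) runs: AR_a at 0 (x = 5), AR_b at 3 (y), AR_c at 6 (z = 5, w)
mem = [None, None, 5,   # AR_a: dynamic ptr, static ptr, x
       0, 0, 0,         # AR_b: dynamic ptr, static ptr, y
       3, 3, 5, 0]      # AR_c: dynamic ptr, static ptr, z, w
display = build_display(mem, 6, 3)
```

Here `display` is `[None, 0, 3, 6]`, so x (level 1, displacement 2) is accessed at `display[1] + 2` without following any chain of static pointers.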

COMPILATION OF EXPRESSIONS:
Example: a*b+cd/e

Issues in code generation for expressions are:
1. Determination of the evaluation order of the operators in an expression.
2. Selection of the instructions to be used in the target code.
3. Use of registers and handling of partial results.

The evaluation order depends on operator precedence: an operator which has a higher precedence than its neighbors must be evaluated first. The choice of instructions in the target code depends on:
1. the type and length of each operand;
2. the addressability of each operand, i.e. where the operand is located and how it can be accessed.

An operand descriptor is used to maintain the type, length and addressability of each operand. A partial result is the value of some sub-expression computed while evaluating an expression. For efficiency, partial results are maintained in CPU registers, but some of them have to be moved to memory when no CPU register is free. A register descriptor is used to maintain details of which partial result is stored in which register.

Operand Descriptor: It contains the following fields:
1. Attributes: the type, length and miscellaneous information.
2. Addressability: specifies where the operand is located and how it can be accessed.
   a. Addressability code: M for an operand in memory, R for an operand in a register.
   b. Address: address of the CPU register or memory word.

Code generation for the expression a * b:
   MOVER AREG, A
   MULT AREG, B

Operand descriptors:
   #   Attributes   Addressability    Descriptor of
   1   (int, 1)     M, addr(a)        a
   2   (int, 2)     M, addr(b)        b
   3   (int, 3)     R, addr(AREG)     a * b

Register Descriptor: A register descriptor has two fields:
1. Status: contains the code free or occupied to indicate the register status.
2. Operand descriptor #: if status = occupied, this field contains the number of the descriptor of the operand contained in the register.

E.g. the register descriptor for AREG after generating code for a * b is: Status = Occupied, Operand descriptor # = 3.

Generation of Instructions: When the parser reduces an operator op, it calls the function codegen with op and the descriptors of its operands as parameters. A single instruction can evaluate op if one of the operands is in a register and the other is in memory. If both operands are in memory, an instruction is first generated to move one of them into a register, and then op is evaluated.

Saving Partial Results: If all registers are occupied when op is to be evaluated, a register r is freed by copying its value into a temporary location in memory, and the descriptor of that partial result is updated to point to the temporary.

codegen(operator, opd1, opd2)                    /* e.g. codegen(*, a, b) */
{
  if opd1.addressability_code = R                /* case 1: opd1 is in AREG */
    if operator = +  generate ADD AREG, opd2     /* similarly for other operators */
  else if opd2.addressability_code = R
          and operator is commutative            /* case 2: opd2 is in AREG */
    if operator = +  generate ADD AREG, opd1     /* similarly for * */
  else                                           /* case 3 */
  {
    if Register_descr.status = occupied          /* save the partial result */
    {
      generate MOVEM AREG, Temp[j]
      Operand_descr[Register_descr.operand_descr#] := (<type>, (M, Addr(Temp[j])))
      j := j + 1
    }
    generate MOVER AREG, opd1
    if operator = +  generate ADD AREG, opd2     /* similarly for other operators */
  }
  /* create a new descriptor: the result is in register AREG */
  i := i + 1
  Operand_descr[i] := (<type>, (R, Addr(AREG)))
  Register_descr := (occupied, i)
  return i
}
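The three cases of codegen can be tried out with a small Python sketch for a machine with a single register, AREG. The class and field names, the TEMPn naming of temporaries and the opcode table are assumptions for illustration; the operand length attribute is omitted to keep the sketch short.

```python
# Sketch of codegen for a machine with one register, AREG, following the
# three cases described above. Class and field names are hypothetical.
from dataclasses import dataclass


@dataclass
class OperandDescriptor:
    type: str        # attribute: operand type (length omitted here)
    addr_code: str   # addressability code: 'M' = in memory, 'R' = in register
    address: str     # memory name, temporary name, or register name


class CodeGen:
    OPCODE = {"+": "ADD", "-": "SUB", "*": "MULT", "/": "DIV"}
    COMMUTATIVE = {"+", "*"}

    def __init__(self):
        self.code = []     # generated instructions
        self.descr = []    # operand descriptor table
        self.areg = None   # register descriptor: descriptor held in AREG, None = free
        self.temps = 0     # number of temporaries used so far

    def operand(self, name, typ="int"):
        """Create a descriptor for a variable in memory; return its number."""
        self.descr.append(OperandDescriptor(typ, "M", name))
        return len(self.descr) - 1

    def codegen(self, op, i1, i2):
        d1, d2 = self.descr[i1], self.descr[i2]
        opc = self.OPCODE[op]
        if d1.addr_code == "R":                               # case 1
            self.code.append(f"{opc} AREG, {d2.address}")
        elif d2.addr_code == "R" and op in self.COMMUTATIVE:  # case 2
            self.code.append(f"{opc} AREG, {d1.address}")
        else:                                                 # case 3
            if self.areg is not None:                         # save partial result
                temp = f"TEMP{self.temps}"
                self.temps += 1
                self.code.append(f"MOVEM AREG, {temp}")
                saved = self.descr[self.areg]
                saved.addr_code, saved.address = "M", temp
            self.code.append(f"MOVER AREG, {d1.address}")
            self.code.append(f"{opc} AREG, {d2.address}")
        # New descriptor: the result is now in AREG.
        self.descr.append(OperandDescriptor(d1.type, "R", "AREG"))
        self.areg = len(self.descr) - 1
        return self.areg


gen = CodeGen()
a, b, c, d = (gen.operand(n) for n in "ABCD")
ab = gen.codegen("*", a, b)       # a * b
cd = gen.codegen("*", c, d)       # c * d: AREG is occupied, so a * b is saved
gen.codegen("+", ab, cd)          # (a * b) + (c * d)
print("\n".join(gen.code))
```

For a * b + c * d this emits MOVER/MULT for a * b, saves AREG with MOVEM AREG, TEMP0, computes c * d, and finishes with ADD AREG, TEMP0. For a non-commutative operator whose right operand is in AREG (for example z - a * b), case 2 is skipped and case 3 saves the register first, so the operands are not swapped.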

Consider the expression a * b + c * d * (e + f) + c * d, i.e. ((a * b) + ((c * d) * (e + f))) + (c * d). Its tree is:

                +
              /   \
            +       *
          /   \    / \
         *     *  c   d
        / \   / \
       a   b *   +
            / \ / \
           c  d e  f

Intermediate code for Expression


Postfix String Conversion: Consider the expression a + b * c + d * e ^ f. Its postfix string is a b c * + d e f ^ * +. The postfix string is very popular in non-optimizing compilers because it is easy to generate and to use. Code can be generated from a postfix string using a stack of operand descriptors. Operand descriptors are pushed on the stack as operands appear in the string. When an operator of arity k appears in the string, k descriptors are popped off the stack, and a descriptor for the partial result is then pushed on the stack. So, in the example, the stack contains the descriptor of a and that of the partial result b * c when the first + is encountered.

Triples: A triple is a representation of an elementary operation in the form of a pseudo-machine instruction.

A triple has three fields:

Operator | Operand1 | Operand2

Each operand of a triple is either a variable, a constant, or the result of some evaluation represented by another triple.

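Triples for the postfix string a b c * + d e f ^ * +:

#   Operator   Operand1   Operand2
1   *          b          c
2   +          (1)        a
3   ^          e          f
4   *          d          (3)
5   +          (2)        (4)

Here (n) denotes the result of triple n.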
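The conversion to postfix and the stack-based generation of triples described above can be sketched in Python. The conversion uses the well-known shunting-yard (operator precedence) method, which the notes do not name; the precedence table, the right associativity of ^ and the "(n)" notation for triple results are assumptions for illustration. All operators here are binary, so k = 2.

```python
# Sketch: infix -> postfix (shunting-yard method), then postfix -> triples
# using a stack of operands, as described above. Names are illustrative.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOC = {"^"}


def to_postfix(tokens):
    out, stack = [], []
    for tok in tokens:
        if tok in PREC:
            # Pop operators that must be evaluated before tok.
            while stack and stack[-1] in PREC and (
                PREC[stack[-1]] > PREC[tok]
                or (PREC[stack[-1]] == PREC[tok] and tok not in RIGHT_ASSOC)
            ):
                out.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                out.append(stack.pop())
            stack.pop()                  # discard "("
        else:
            out.append(tok)              # operand goes straight to the output
    while stack:
        out.append(stack.pop())
    return out


def to_triples(postfix):
    """Each operator pops its k = 2 operands and pushes a reference to its triple."""
    triples, stack = [], []
    for tok in postfix:
        if tok in PREC:
            right = stack.pop()
            left = stack.pop()
            triples.append((tok, left, right))
            stack.append(f"({len(triples)})")   # result of triple number n
        else:
            stack.append(tok)
    return triples


postfix = to_postfix("a + b * c + d * e ^ f".split())
print(" ".join(postfix))          # a b c * + d e f ^ * +
for n, t in enumerate(to_triples(postfix), start=1):
    print(n, *t)
```

The sketch keeps the operands of every triple in left-to-right order, so for triple 2 it produces + a (1); since + is commutative this is equivalent to listing (1) first.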

Expression Trees
The postfix evaluation order may not lead to the most efficient code for an expression, so the compiler must analyze an expression to find the best evaluation order for its operators. An expression tree is a syntax tree which depicts the structure of an expression.

COMPILATION OF CONTROL STRUCTURES


The control structure of a programming language is the collection of language features which govern the sequencing of control through a program.

Control Transfer, Conditional Execution and Iterative Constructs: Control transfer is implemented through conditional and unconditional gotos. When the target language of a compiler is machine language, compiling a control transfer is similar to assembling a forward or backward goto in an assembly program. Control structures such as if, for and while cause a significant semantic gap between the PL domain and the execution domain because their control transfers are implicit rather than explicit. This gap is bridged in two steps:
1. The control structure is mapped into an equivalent program containing explicit gotos.
2. This program is translated into an assembly language program.

Example: the program

    if e1 then S1 else S2;
    S3;
    while (e2)
        S1; S2; ... Sn;
    end while;
    <other statements>

is first mapped into the equivalent program with explicit gotos:

    if NOT e1 then goto int1;
    S1;
    goto int2;
int1: S2;
int2: S3;
int3: if NOT e2 then goto int4;
    S1; S2; ... Sn;
    goto int3;
int4: <other statements>

The if statement part is then translated into:

    {instructions for NOT e1}
    BC  <cond>, INT1
    {instructions for S1}
    BC  ANY, INT2
INT1: {instructions for S2}
INT2: {instructions for S3}
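Step 1 of the translation (mapping if and while into explicit gotos) can be sketched as a small recursive lowering function in Python. The tuple-based statement format and the intN label generator are hypothetical choices made for illustration; the output reproduces the goto form of the example above, with each label on its own line.

```python
# Sketch of step 1 above: mapping if/while into explicit gotos.
# Statements are hypothetical tuples:
#   ("simple", text) | ("if", cond, then_list, else_list) | ("while", cond, body_list)
import itertools


def lower(program):
    counter = itertools.count(1)
    out = []

    def new_label():
        return f"int{next(counter)}"

    def emit(stmt):
        kind = stmt[0]
        if kind == "simple":
            out.append(f"    {stmt[1]};")
        elif kind == "if":
            _, cond, then_part, else_part = stmt
            l_else, l_end = new_label(), new_label()
            out.append(f"    if NOT {cond} then goto {l_else};")
            for s in then_part:
                emit(s)
            out.append(f"    goto {l_end};")       # skip the else part
            out.append(f"{l_else}:")
            for s in else_part:
                emit(s)
            out.append(f"{l_end}:")
        elif kind == "while":
            _, cond, body = stmt
            l_test, l_exit = new_label(), new_label()
            out.append(f"{l_test}: if NOT {cond} then goto {l_exit};")
            for s in body:
                emit(s)
            out.append(f"    goto {l_test};")      # backward goto to the test
            out.append(f"{l_exit}:")
        else:
            raise ValueError(f"unknown statement kind: {kind}")

    for stmt in program:
        emit(stmt)
    return out


program = [
    ("if", "e1", [("simple", "S1")], [("simple", "S2")]),
    ("simple", "S3"),
    ("while", "e2", [("simple", "S1"), ("simple", "S2")]),
    ("simple", "other statements"),
]
print("\n".join(lower(program)))
```

Because emit calls itself for nested statement lists, nested if and while constructs receive fresh labels automatically.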

PROCEDURE or FUNCTION Calls: Consider the function call in the statement

    x = function1(y, z) + b * c;

Executing this statement executes the body of function1, whose value is returned to the calling program. In addition, the function call may also result in side effects.

Side Effects: A side effect of a function or procedure call is a change in the value of a variable which is not local to the called function. A procedure call only achieves side effects; it does not return a value.

While implementing a function call, the compiler must ensure that:
1. The actual parameters are accessible in the called function.
2. The called function is able to produce side effects according to the rules of the PL.
3. Control is transferred to, and returned from, the called function.
4. The function value is returned to the calling program.
5. All other aspects of the execution of the calling program are unaffected by the function call.

The compiler uses a set of features to implement function calls:
1. Parameter list: contains a descriptor for each actual parameter of the function call.
2. Save area: the called function saves all CPU registers in this area before starting execution.
3. Calling conventions: rules shared by the calling and called functions which specify
   a. how the parameter list is accessed,
   b. how the save area is accessed,
   c. how control is transferred and how return is implemented,
   d. how the function value is returned to the calling program.

Parameter Passing Mechanisms:
1. Call by value: The values of the actual parameters are passed to the called function and assigned to the corresponding formal parameters. Values are passed in one direction only, i.e. the function cannot return a value through these parameters, and it cannot produce side effects through its parameters. Call by value is generally used for the built-in functions of a language. Its main advantage is simplicity: the function can allocate memory to its formal parameters and copy the value of each actual parameter into its location at every call.
2. Call by value-result: This mechanism extends call by value by copying the values of the formal parameters back into the corresponding actual parameters at return. Side effects are thus realized at return. This mechanism incurs higher overheads.
3. Call by reference: The address of the actual parameter is passed to the called function. If the parameter is an expression, its value is computed and stored in a temporary location, and the address of that location is passed. If the parameter is an array element, its address is similarly computed at the time of the call. Changes in the values of formal parameters are reflected in the corresponding actual parameters of the calling function.
4. Call by name: This mechanism has the same effect as if every occurrence of a formal parameter in the body of the called function were replaced by the name of the corresponding actual parameter. The actual parameter corresponding to a formal parameter can therefore change dynamically during the execution of the function (e.g. an array element whose subscript changes), which makes call by name powerful.

Mechanisms used in some languages:
C: call by value (call by reference is simulated by passing pointers)
Pascal: call by value, call by reference
PL/I: call by reference only
Ada: choice of call by value-result or call by reference
ALGOL 60: call by value and call by name
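The four mechanisms can be compared by simulating them in Python on one procedure. The procedure q(i, x): i := i + 1; x := 0, the call q(k, A[k]), the Ref get/set model of a parameter location and the 0-based subscripts are all hypothetical choices for illustration; copy-back for value-result is assumed to run left to right.

```python
# Sketch comparing call by value, value-result, reference and name on the
# hypothetical procedure   q(i, x): i := i + 1; x := 0
# called as q(k, A[k]).  A formal parameter is modelled as a Ref (get/set pair).


class Ref:
    def __init__(self, get, set_):
        self.get, self.set = get, set_


def local_cell(value):
    """A fresh location holding a copy of value (the formal's own storage)."""
    cell = [value]
    return Ref(lambda: cell[0], lambda v: cell.__setitem__(0, v))


def var_ref(env, name):
    return Ref(lambda: env[name], lambda v: env.__setitem__(name, v))


def elem_ref(env, arr, idx):
    i = env[idx]                       # subscript evaluated once, at call time
    return Ref(lambda: env[arr][i], lambda v: env[arr].__setitem__(i, v))


def elem_by_name(env, arr, idx):
    # Subscript re-evaluated at every access (call by name).
    return Ref(lambda: env[arr][env[idx]],
               lambda v: env[arr].__setitem__(env[idx], v))


def q(i, x):
    i.set(i.get() + 1)
    x.set(0)


def call(mode):
    env = {"k": 0, "A": [10, 20, 30]}      # 0-based subscripts in this sketch
    if mode == "value":
        q(local_cell(env["k"]), local_cell(env["A"][env["k"]]))
    elif mode == "value-result":
        actual_i, actual_x = var_ref(env, "k"), elem_ref(env, "A", "k")
        fi, fx = local_cell(actual_i.get()), local_cell(actual_x.get())
        q(fi, fx)
        actual_i.set(fi.get())             # copy back at return
        actual_x.set(fx.get())
    elif mode == "reference":
        q(var_ref(env, "k"), elem_ref(env, "A", "k"))
    elif mode == "name":
        q(var_ref(env, "k"), elem_by_name(env, "A", "k"))
    return env


for mode in ("value", "value-result", "reference", "name"):
    print(mode, call(mode))
```

Call by value leaves k and A unchanged; value-result and reference both set k to 1 and A[0] to 0; call by name sets A[1] to 0 instead, because the subscript k is re-evaluated after i := i + 1. Value-result and reference coincide here; they differ when the same variable is passed for two parameters (aliasing).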

Interpreters
Notation: $t_c$ = average compilation time per statement, $t_e$ = average execution time per statement, $t_i$ = average interpretation time per statement.

Compilers and interpreters both analyze a source statement to determine its meaning. During compilation, analysis of a statement is followed by code generation, while during interpretation it is followed by actions which implement its meaning. Assume $t_c \approx t_i$ and $t_c = 20\,t_e$, i.e. executing a statement is 20 times faster than compiling it.

Consider a program P. size(P) denotes the number of statements in P, and stmts_executed(P) denotes the number of statements executed in a run of P. Let size(P) = 200, and suppose a run of P executes 20 statements, then 10 iterations of a loop of 8 statements, then 20 statements for printing the results, so that stmts_executed(P) = 20 + 10 × 8 + 20 = 120. The CPU time required for compilation and for interpretation is:

Compilation model:
$$200\,t_c + 120\,t_e = 200\,t_c + 6\,t_c = 206\,t_c$$

Interpretation model:
$$120\,t_i = 120\,t_c$$

Uses of Interpreters:
1. Efficiency in certain environments, and simplicity.
2. It is better to use an interpreter for a program P for which stmts_executed(P) < size(P).
3. Interpreters are simpler to develop than compilers because they do not involve code generation.
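The arithmetic of the two cost models can be checked with a few lines of Python. The function names are illustrative; Fraction is used so that $t_e = t_c / 20$ stays exact, and all times are expressed in units of $t_c$.

```python
# Worked check of the two cost models above, in units of tc.
# Fraction keeps the arithmetic exact; function names are illustrative.
from fractions import Fraction


def compilation_time(size, executed, tc, te):
    return size * tc + executed * te      # compile every statement, then execute


def interpretation_time(executed, ti):
    return executed * ti                  # interpret only the statements executed


tc = Fraction(1)
te = tc / 20          # tc = 20 x te
ti = tc               # ti assumed equal to tc
size = 200
executed = 20 + 10 * 8 + 20

print(executed)                                   # 120
print(compilation_time(size, executed, tc, te))   # 206
print(interpretation_time(executed, ti))          # 120
```

With these assumptions interpretation wins whenever the statements executed are fewer than about size(P) × 20/19, which is why stmts_executed(P) < size(P) is a sufficient condition.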

Components of Interpreters
1. Symbol table: holds information concerning the entities in the source program.
2. Data store: contains the values of the data items declared in the program being interpreted. The data store consists of a set of components; each component is an array containing elements of one type.
3. Data manipulation routines: a set of routines containing a routine for every legal data manipulation action in the source language.

Process: On analyzing a source statement, if the statement is a declaration statement, the interpreter locates a component of the data store in which the value of the declared item will be stored, and records this memory mapping in the symbol table. If the statement is a = b + c, where a, b and c are variables of the same type, the interpreter executes the following routines:

    add(b, c, result);
    assign(a, result);

Advantages:
1. The meaning of a source statement is implemented by executing a routine rather than by generating code. This simplifies the interpretation process.
2. Avoiding the generation of machine language instructions makes the interpreter portable.

Design and Operation of Interpreter


program interpreter(source, output);
type
  symentry = record
    symbol  : array [1..10] of character;
    type    : character;
    address : integer;
  end;
var
  symtab : array [1..100] of symentry;
  rvar   : array [1..100] of real;
  ivar   : array [1..100] of integer;
  r_tos  : 1..100;
  i_tos  : 1..100;

procedure assign(addr1 : integer; value : integer);
begin
  ivar[addr1] := value;
end;

procedure addrealint(addr1, addr2 : integer);
begin
  rvar[r_tos] := rvar[addr1] + ivar[addr2];
end;

procedure add(sym1, sym2 : symentry);
begin
  ...
  if (sym1.type = real) and (sym2.type = int) then
    addrealint(sym1.address, sym2.address);
  ...
end;

begin { main program }
  r_tos := 100;
  i_tos := 100;
  ... analyze each source statement and call the appropriate routine ...
end.

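E.g. consider the program:

    real a, b
    integer c
    c = 7
    b = 1.2
    a = b + c

Symbol table:
Symbol   Type   Address
a        real   8
b        real   13
c        int    5

(Figure: data store. c is held in ivar; a and b are held in rvar; i_tos and r_tos mark the tops of ivar and rvar.)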
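The organisation above (symbol table, one data-store array per type, data manipulation routines) can be sketched as a tiny Python interpreter that runs the example program. The statement syntax it accepts, the sequential address allocation (so the addresses differ from 8, 13 and 5 in the table) and the class name are assumptions for illustration.

```python
# Sketch of the interpreter organisation above: a symbol table, a data store
# with one array per type (rvar, ivar) and data manipulation routines.
# Statement syntax handled here is a hypothetical minimum: declarations,
# "x = constant" and "x = y + z".


class Interpreter:
    def __init__(self, size=100):
        self.symtab = {}                         # name -> (type, address)
        self.rvar = [0.0] * (size + 1)           # 1-based, like array [1..100]
        self.ivar = [0] * (size + 1)
        self.r_tos = self.i_tos = size           # temporaries at the top
        self.next_free = {"real": 1, "int": 1}   # hypothetical allocation policy

    def declare(self, name, typ):
        addr = self.next_free[typ]
        self.next_free[typ] += 1
        self.symtab[name] = (typ, addr)          # memory mapping of the declaration

    def store(self, typ):
        return self.rvar if typ == "real" else self.ivar

    def assign(self, name, value):
        typ, addr = self.symtab[name]
        self.store(typ)[addr] = float(value) if typ == "real" else int(value)

    def addrealint(self, addr1, addr2):
        self.rvar[self.r_tos] = self.rvar[addr1] + self.ivar[addr2]
        return self.rvar[self.r_tos]

    def add(self, name1, name2):
        (t1, a1), (t2, a2) = self.symtab[name1], self.symtab[name2]
        if t1 == "real" and t2 == "int":
            return self.addrealint(a1, a2)
        if t1 == "int" and t2 == "real":
            return self.addrealint(a2, a1)       # addition is commutative
        if t1 == "int":
            self.ivar[self.i_tos] = self.ivar[a1] + self.ivar[a2]
            return self.ivar[self.i_tos]
        self.rvar[self.r_tos] = self.rvar[a1] + self.rvar[a2]
        return self.rvar[self.r_tos]

    def execute(self, stmt):
        words = stmt.replace(",", " ").split()
        if words[0] in ("real", "integer"):      # declaration statement
            typ = "real" if words[0] == "real" else "int"
            for name in words[1:]:
                self.declare(name, typ)
            return
        target, expr = (part.strip() for part in stmt.split("="))
        if "+" in expr:
            left, right = (part.strip() for part in expr.split("+"))
            self.assign(target, self.add(left, right))
        else:
            self.assign(target, float(expr) if "." in expr else int(expr))


interp = Interpreter()
for s in ["real a, b", "integer c", "c = 7", "b = 1.2", "a = b + c"]:
    interp.execute(s)
typ, addr = interp.symtab["a"]
print(typ, interp.rvar[addr])
```

The statement a = b + c is carried out by calling add, which dispatches to addrealint because b is real and c is integer; no machine code is generated at any point.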

Pure and Impure Interpreters
Pure interpreters: The source program is retained in its source form throughout its interpretation. This incurs substantial analysis overheads while interpreting statements.
Impure interpreters: An impure interpreter performs some preliminary processing of the source program to reduce the analysis overhead during interpretation. E.g. a preprocessor converts the program into an intermediate representation (IR), which is then interpreted.

Linkers
Execution of a program written in a language L involves the following steps:
1. Translation (compilation) of the program.
2. Linking of the program with other programs or library files.
3. Relocation of the program so that it can execute from the memory area allotted to it.
4. Loading of the program into memory for the purpose of execution.

(Figure: Source program -> Translator -> object modules -> Linker -> binary programs -> Loader -> binary program)

Relocation is performed by the linker or the loader for one of two reasons:
1. The same translated addresses may have been used in the object modules of library programs that need to be linked with our program.
2. The operating system may require that a program execute from a specific area of memory.

Addresses of Program Entities:
1. Translation time address: the address assigned by the translator (compiler).
2. Linked address: the address assigned by the linker.
3. Load time address: the address assigned by the loader.

Origins of a Program:
1. Translated origin: the address of the origin (start address) assigned to the program by the translator.
2. Linked origin: the address of the origin assigned by the linker while producing a binary program.
3. Load origin: the address of the origin assigned by the loader while loading the program for execution.

Relocation and Linking:


Address-Sensitive Program: a program which contains address-sensitive instructions (instructions that use absolute memory addresses) or data words that contain such addresses.

Program Relocation: the process of modifying the addresses used in the address-sensitive instructions of a program such that the program executes correctly from the designated area of memory.
If link origin ≠ translated origin, relocation must be performed by the linker.
If load origin ≠ link origin, relocation must be performed by the loader.

Absolute loader: a loader used when load origin = link origin; it does not perform relocation.
Relocating loader: a loader which performs relocation.

Relocation Factor
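Let t_origin_P be the translated origin and l_origin_P the linked origin of a program P, and let d_symb be the displacement of a symbol symb from the origin of P. The relocation factor of P is

$$\text{relocation\_factor}_P = \text{l\_origin}_P - \text{t\_origin}_P$$

The translated address t_symb and the linked address l_symb of symb are

$$t_{symb} = \text{t\_origin}_P + d_{symb}$$
$$l_{symb} = \text{l\_origin}_P + d_{symb} = \text{t\_origin}_P + \text{relocation\_factor}_P + d_{symb} = t_{symb} + \text{relocation\_factor}_P$$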

Linking is the process of binding an external reference to the correct link time address.

Binary Program: A binary program is a machine language program comprising a set of program units SP such that, for every P_i ∈ SP:
1. P_i has been relocated to the memory area starting at its link origin.
2. Linking has been performed for the external references of P_i.

Object Module: The object module of a program contains all the information necessary to relocate and link the program with other programs. The object module of a program P consists of:
1. Header: contains the translated origin, size and execution start address of P.
2. Program: contains the machine language instructions of P.
3. Relocation table (RELOCTAB): describes the relocation requirements of P. Each RELOCTAB entry contains the translated address of an address-sensitive instruction.
4. Linking table (LINKTAB): contains information about the public definitions and external references of P. Each LINKTAB entry contains:
   Symbol: the symbolic name
   Type: PD (public definition) or EXT (external reference)
   Translated address: for a PD, the address of the first memory word of the symbol; for an EXT, the address of the memory word required to contain the address of the symbol.

Example: program P

Statement                  Address   Code
      START  500
      ENTRY  TOTAL
      EXTERN MAX, ALPHA
      READ   A             500       09 0 540
LOOP  ...                  501
      ...
      MOVER  AREG, ALPHA   518       04 1 000
      BC     ANY, MAX      519       06 6 000
      ...
      BC     LT, LOOP      538       06 1 501
      STOP                 539       00 0 000
A     DS     1             540
TOTAL DS     1             541
      END

Translated origin of P = 500, so the translated address of symbol A = 540. If the linked origin of P = 900, the linked address of A = 940.

Program Q

Statement                  Address   Code
      START  200
      ENTRY  ALPHA
      ...
ALPHA DS     25            231       00 0 025
TOT   DS     1
      END

Suppose program P is linked with program Q; ALPHA is external in P and is defined in Q. Linked origin of P = 900 and length of P = 42, so linked origin of Q = 942 and linked address of ALPHA = 942 + (231 - 200) = 973. The linker puts the address 973 in P wherever ALPHA is used in P.

Object module of P:
1. Header: translated origin = 500, size = 42, execution start address = 500.
2. Machine language instructions.
3. Relocation table (translated addresses of address-sensitive instructions): 500, 538.
4. Linking table:
   Symbol   Type   Translated address
   ALPHA    EXT    518
   MAX      EXT    519
   A        PD     540

Design of a Linker
The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute.

Relocation Algorithm:
1. program_linked_origin := <link origin> from the linker command.
2. For each object module:
   a. t_origin := translated origin of the object module;
      OM_size := size of the object module.
   b. relocation_factor := program_linked_origin - t_origin.
   c. Read the machine language program into work_area.
   d. Read the RELOCTAB of the object module.
   e. For each entry in RELOCTAB:
      translated_addr := address in the RELOCTAB entry;
      address_in_work_area := address of work_area + translated_addr - t_origin;
      add relocation_factor to the operand address in the word with address address_in_work_area.
   f. program_linked_origin := program_linked_origin + OM_size.

E.g. link origin = 900, t_origin = 500 and address of work_area = 300, so relocation_factor = 400. For the first RELOCTAB entry (500, READ A): address_in_work_area = 300 + 500 - 500 = 300. For the second entry (538): address_in_work_area = 300 + 538 - 500 = 338.
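The relocation algorithm can be sketched in Python and applied to program P. The word layout [opcode, register, operand], the use of None for words not shown in the listing and the dictionary format of an object module are assumptions. So that several modules can share one work area, the sketch adds each module's offset (program_linked_origin - link origin) to the address in the work area, as the linking algorithm does; with a single module this offset is 0.

```python
# Sketch of the relocation algorithm above, applied to program P.
# A word is modelled as [opcode, register, operand address]; None marks words
# not shown in the listing. All modules share one work area, so each module is
# placed at offset program_linked_origin - link_origin within it.


def relocate(modules, link_origin):
    program_linked_origin = link_origin
    work_area = []
    for om in modules:
        t_origin = om["t_origin"]
        relocation_factor = program_linked_origin - t_origin
        base = program_linked_origin - link_origin       # module's offset in work_area
        work_area.extend(list(w) if w else None for w in om["code"])
        for translated_addr in om["reloctab"]:
            addr_in_work_area = base + translated_addr - t_origin
            work_area[addr_in_work_area][2] += relocation_factor
        program_linked_origin += om["size"]
    return work_area


# Program P: translated origin 500, size 42, RELOCTAB = 500, 538.
P_WORDS = {500: (9, 0, 540), 518: (4, 1, 0), 519: (6, 6, 0),
           538: (6, 1, 501), 539: (0, 0, 0)}
P = {"t_origin": 500, "size": 42, "reloctab": [500, 538],
     "code": [P_WORDS.get(a) for a in range(500, 542)]}

out = relocate([P], link_origin=900)
print(out[0], out[38])    # READ A and BC LT, LOOP after relocation
```

After relocation READ A refers to 940 and BC LT, LOOP to 901, while the words at 518 and 519 are untouched because their operands are external references, which are handled by linking rather than relocation.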

Linking Requirements: In FORTRAN, all program units are translated separately; hence all subprogram calls and all references to common variables require linking. Pascal procedures are nested inside the main program; hence procedures do not require linking. In C, program files are translated separately; thus only function calls that cross file boundaries and references to global data require linking.

The linker processes all object modules being linked and builds a table of all public definitions and their load time addresses. This name table (NTAB) is used in program linking. Each NTAB entry contains:
1. symbol: the name of a public definition or of an object module;
2. linked_address: for a public definition, the linked address of the symbol; for an object module, its linked origin.

Linking: when program P and program Q are linked, the linker builds the following NTAB:

Symbol   Linked address
P        900
A        940
Q        942
ALPHA    973

Linking Algorithm
1. program_linked_origin := <link origin> from the linker command.
2. For each object module:
   a. t_origin := translated origin of the object module;
      OM_size := size of the object module.
   b. relocation_factor := program_linked_origin - t_origin.
   c. Read the machine language program into work_area.
   d. Read the LINKTAB of the object module.
   e. For each LINKTAB entry with type = PD:
      name := symbol;
      linked_address := translated_address + relocation_factor;
      enter (name, linked_address) in NTAB.
   f. Enter (object_module_name, program_linked_origin) in NTAB.
   g. program_linked_origin := program_linked_origin + OM_size.
3. For each object module:
   a. t_origin := translated origin of the object module;
      program_linked_origin := linked address of the object module from NTAB.
   b. For each LINKTAB entry with type = EXT:
      address_in_work_area := address of work_area + program_linked_origin - <link origin> + translated_address - t_origin;
      search for the symbol in NTAB and copy its linked address;
      add the linked address to the operand address in the word with address address_in_work_area.
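The two phases of the linking algorithm (building NTAB, then resolving EXT entries) can be sketched in Python for programs P and Q. The module dictionaries, the word layout and the size of Q (57 words, i.e. 200 to 256) are assumptions; the sketch performs linking only, not relocation, and reports external references such as MAX for which no public definition exists among the modules being linked.

```python
# Sketch of the linking algorithm above for programs P and Q. Words are
# [opcode, register, operand]; None marks words not shown. Q's size (57) and
# the layout of its words are assumptions made for illustration.


def link(modules, link_origin):
    ntab = {}
    origin = link_origin
    for om in modules:                                   # step 2: build NTAB
        relocation_factor = origin - om["t_origin"]
        for symbol, typ, t_addr in om["linktab"]:
            if typ == "PD":
                ntab[symbol] = t_addr + relocation_factor
        ntab[om["name"]] = origin
        origin += om["size"]

    work_area = [list(w) if w else None for om in modules for w in om["code"]]
    unresolved = []
    for om in modules:                                   # step 3: resolve EXT
        om_origin = ntab[om["name"]]
        for symbol, typ, t_addr in om["linktab"]:
            if typ != "EXT":
                continue
            addr_in_work_area = om_origin - link_origin + t_addr - om["t_origin"]
            if symbol in ntab:
                work_area[addr_in_work_area][2] += ntab[symbol]
            else:
                unresolved.append(symbol)                # no public definition found
    return ntab, work_area, unresolved


P_WORDS = {500: (9, 0, 540), 518: (4, 1, 0), 519: (6, 6, 0),
           538: (6, 1, 501), 539: (0, 0, 0)}
P = {"name": "P", "t_origin": 500, "size": 42,
     "linktab": [("ALPHA", "EXT", 518), ("MAX", "EXT", 519), ("A", "PD", 540)],
     "code": [P_WORDS.get(a) for a in range(500, 542)]}
Q = {"name": "Q", "t_origin": 200, "size": 57,
     "linktab": [("ALPHA", "PD", 231)],
     "code": [(0, 0, 25) if a == 231 else None for a in range(200, 257)]}

ntab, work_area, unresolved = link([P, Q], link_origin=900)
print(ntab)
print(work_area[518 - 500], unresolved)
```

The resulting NTAB matches the table above (P 900, A 940, Q 942, ALPHA 973), and the MOVER AREG, ALPHA word receives the operand address 973.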

SELF-RELOCATING PROGRAMS
The manner in which a program can be modified, or can modify itself, to execute from a given load origin can be used to classify programs into the following:
1. Non-relocatable programs
2. Relocatable programs
3. Self-relocating programs

A non-relocatable program is a program which cannot be executed in any memory area other than the area starting at its translated origin. Non-relocatable programs do not contain information about their address-sensitive instructions, e.g. a hand-coded machine language program.
Object modules are relocatable programs, which can be modified to execute from another area of memory.
A self-relocating program is a program which can perform the relocation of its own address-sensitive instructions. A self-relocating program contains two provisions for this purpose:
1. A table of information concerning its address-sensitive instructions exists as a part of the program.
2. Code to perform the relocation of the address-sensitive instructions, called the relocation logic, also exists as a part of the program.
-- Self-relocating programs are less efficient than relocatable programs.
-- No linker is needed to relocate a self-relocating program.
-- A self-relocating program needs to find its load address before it can execute its relocation logic.

MS-DOS Linker
The MS-DOS linker uses a two-pass strategy. In the first pass, the object modules are processed to collect information concerning segments and public definitions. The second pass performs relocation and linking.

Object Module of MS-DOS: An Intel 80x86 object module is a sequence of object records. Each object record describes a specific aspect of the program in the object module. There are 14 types of object records, which contain the following five categories of basic information:
1. Binary image
2. External references
3. Public definitions
4. Debugging information (e.g. line numbers in the program)
5. Miscellaneous information (e.g. comments in the program)

Record   Description
THEADR   Translator header record
LNAMES   List of names record
SEGDEF   Segment definition record
EXTDEF   External names definition record
PUBDEF   Public names definition record
LEDATA   Enumerated data record
FIXUPP   Fixup record
MODEND   Module end record

Pass I
1. program_linked_origin := <load origin>.
2. Repeat step 3 for each object module.
3. Select an object module and process its records:
   a. If an LNAMES record: enter the names in NAMELIST.
   b. If a SEGDEF record:
      b1. i := name index; segment_name := NAMELIST[i]; segment_addr := start address in the attributes.
      b2. If the segment is absolute, enter (segment_name, segment_addr) in NTAB.
      b3. If the segment is relocatable and cannot be combined with other segments:
          - align the address contained in program_linked_origin on the next word boundary;
          - enter (segment_name, program_linked_origin) in NTAB;
          - program_linked_origin := program_linked_origin + segment_length.
   c. If a PUBDEF record:
      c1. i := base; segment_name := NAMELIST[i]; symbol := name.
      c2. segment_addr := load address of segment_name in NTAB.
      c3. symbol_addr := segment_addr + offset.
      c4. Enter (symbol, symbol_addr) in NTAB.
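Pass I can be sketched in Python as a loop over simplified object records that fills NTAB. The tuple layouts below are hypothetical simplifications, not the real 80x86 record formats; the 2-byte word used for alignment and the example segment names are assumptions, and segments that can be combined with other segments are not handled, as in the steps above.

```python
# Sketch of Pass I above: build NTAB from simplified object records.
# Record layouts are hypothetical tuples, not the real 80x86 record formats:
#   ("LNAMES", [names])                                  -> 1-based name indices
#   ("SEGDEF", name_index, absolute, start_addr, length)
#   ("PUBDEF", base_name_index, symbol, offset)
# WORD (2 bytes) is an assumed alignment unit; combinable segments are not handled.

WORD = 2


def align(addr, unit=WORD):
    return (addr + unit - 1) // unit * unit


def pass_one(object_modules, load_origin):
    ntab = {}
    program_linked_origin = load_origin
    for records in object_modules:
        namelist = [None]                        # index 0 unused: names are 1-based
        for rec in records:
            kind = rec[0]
            if kind == "LNAMES":
                namelist.extend(rec[1])
            elif kind == "SEGDEF":
                _, i, absolute, start_addr, length = rec
                segment_name = namelist[i]
                if absolute:
                    ntab[segment_name] = start_addr
                else:                            # relocatable, not combinable
                    program_linked_origin = align(program_linked_origin)
                    ntab[segment_name] = program_linked_origin
                    program_linked_origin += length
            elif kind == "PUBDEF":
                _, i, symbol, offset = rec
                segment_addr = ntab[namelist[i]]
                ntab[symbol] = segment_addr + offset
    return ntab


module = [
    ("LNAMES", ["CODE1", "DATA1"]),
    ("SEGDEF", 1, False, None, 19),
    ("SEGDEF", 2, False, None, 10),
    ("PUBDEF", 2, "ALPHA", 4),
]
print(pass_one([module], load_origin=1000))
```

Here CODE1 occupies 1000 to 1018, DATA1 is aligned up to 1020, and the public symbol ALPHA at offset 4 in DATA1 receives the address 1024.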

Pass II
1. list_of_object_modules := names of the object modules in the linker command.
2. Repeat step 3 until list_of_object_modules is empty.
3. Select an object module and process its object records:
   a. If an LNAMES record: enter the names in NAMELIST.
   b. If a SEGDEF record: i := name index; segment_name := NAMELIST[i]; segment_addr := start address in the attributes.
   c. If an EXTDEF record:
      1. external_name := name from the EXTDEF record.
      2. If external_name is not found in NTAB:
         -- locate the object module which contains external_name;
         -- add that object module to list_of_object_modules;
         -- perform the first pass of the linker for the new object module.
      3. Enter (external_name, load address from NTAB) in EXTTAB.
   d. If an LEDATA record:
      i := segment index; d := data offset;
      program_load_origin := SEGTAB[i].load_addr;
      address_in_work_area := address of work_area + program_load_origin - <load origin> + d;
      move the data from the LEDATA record into memory starting at address_in_work_area.
   e. If a FIXUPP record, then for each FIXUPP specification:
      f := offset from the locate field;
      fix_up_address := address_in_work_area + f;
      perform the required fix-up using a load address from SEGTAB or EXTTAB.
   f. If a MODEND record: if a start address is specified, compute its load address and record it in the executable file.

LINKING FOR OVERLAYS
An overlay is a part of a program (or software package) which has the same load origin as some other part of the program.

Overlay-structured program: an overlay-structured program consists of:
1. A permanently resident portion, called the root.
2. A set of overlays.
