
2.0 INTRODUCTION
Computers have changed a lot since the days when people communicated with them through on and off switches denoting primitive instructions. With present-day computers, interaction has become more user-friendly because of advances in hardware and software tools. One category of software which assists in the mechanics of software development is system software. Assemblers, linkers/loaders, compilers and operating systems all belong to the realm of system software. In the previous unit, we discussed several components of programming languages, the basic definitions of assemblers, compilers and interpreters, and the differences among them. In this unit our focus will be on the implementation and use of assemblers. We will also cover broadly the use of macro processors, loaders and linkers.

This unit is organised as follows:
1. In section 2.2, we discuss the broad objectives of a translator.
2. In section 2.3, we identify different types of translators and explain their functioning.
3. Section 2.4 presents techniques for implementing assemblers.
4. Sections 2.5 and 2.6 describe the functioning of macro processors and loaders in detail.
5. At the end, in section 2.7, we summarise the main issues discussed in this unit.

2.1 OBJECTIVES
At the end of this unit, you will be able to:
differentiate several types of translators;
implement an assembler;
explain the functioning of a macro processor;
compare several loader schemes.

2.2 ADVANTAGES OF A TRANSLATOR


A translator permits the programmer to express his/her thought process (algorithm) in a language other than that of the machine which is to execute the algorithm. The reason is that machine language is very ill-designed for human-computer communication. The disadvantages of writing a computer program in machine language are well explained in the first unit of this course. Some of the important benefits of writing a program in a language other than machine language are as follows:

Increase in the programmer's productivity: What can be written more easily can be written more rapidly.
Machine independence: The programmer can be concerned more with the problem than with the inner details of the machine on which the program is to be developed.

The input to a translation program is expressed in a source language. A source language might be assembly language or any high level language. The output of a translator (figure 1) is a target language.

Fig. 1: Translator

The target language is often the machine language of some computer, which is then able to execute the algorithm. Sometimes a non-machine (intermediate) language is chosen as the target language; it is in turn the source language for another translator, which then yields the desired result.

2.3 TYPES OF TRANSLATORS


Interpreters, compilers and assemblers are the three types of translators which have been introduced earlier (Unit 1, Course 2). In this section, we will discuss some software tools which help execute programs. Their functioning is also similar to that of a translator. For the simplest assembler, the target language is machine language, and its source language has instructions in one to one correspondence with those of machine language, but with symbolic names for both operators and operands. The translator just converts each assembler language instruction into the corresponding machine language instruction. Less elementary assemblers translate the source program into a target language form which is combined with other software tools, such as library programs, the linker and the loader, before execution.

Macro Processor
Many programs contain sequences of instructions which are repeated in identical form. The repetitious writing of such a sequence is avoided by the macro processor, which allows a sequence of source language code to be defined once and then referred to by name each time it is needed. Each time this name occurs in a program, the sequence of code is substituted at that point.

Example

        A     1, X        ADD contents of X to register 1
        A     2, X        ADD contents of X to register 2
        A     1, X        ADD contents of X to register 1
        A     2, X        ADD contents of X to register 2
X       DC    F'6'        X is a fullword constant with value 6

In the above program the sequence

        A     1, X
        A     2, X

occurs twice. It can occur many times also. A macro facility permits us to attach a name to this sequence and to use this name in its place. Most assembler languages support macros. An IBM-360 type assembler language supports a macro language that allows us to specify the above as a macro definition and then refer to the definition later. A macro processor effectively constitutes a separate language processor with its own language. We attach a name to a sequence by means of a macro instruction definition, which is formed in the following manner:

Start of definition             MACRO
Macro name                      CNTR (for example)
Sequence to be abbreviated
End of macro definition         MEND

Therefore, the above example can be expressed in a macro language as shown in Figure 2(a). In this case, the macro processor replaces each macro call with the lines

        A     1, X
        A     2, X

This process of replacement is called expanding the macro. The expanded source is shown in Figure 2(b).

        MACRO
        CNTR
        A     1, X
        A     2, X
        MEND
        .
        .
        .
        CNTR
        .
        .
        CNTR
        .
        .
X       DC    F'6'

Figure 2(a)

        A     1, X
        A     2, X
        .
        .
        A     1, X
        A     2, X
        .
        .
X       DC    F'6'

Figure 2(b): Expanded Source

Linker

For modularity, it is better to break a program into several modules (subroutines). It is even better to put common routines, such as reading a hexadecimal number or writing a hexadecimal number, which could be used by many other programs, into a separate file. These files are assembled (translated) separately. After each has been successfully assembled, they can be linked together to form one large file, which constitutes the computer program. The program that links several programs is called the linker. The linker produces a link file which contains the binary codes for all the component modules. The linker also produces a link map which contains the address information about the linked files. The linker, however, does not assign absolute addresses to the program. It only assigns contiguous relative addresses to all the modules linked, starting from zero. This form of program is said to be relocatable, because it can be put anywhere in main memory to be run. This form of code can even be carried to other machines of the same kind, or compatible with the present machine, and run successfully.

The linker is also called a binder or linkage editor. It takes as input independently translated programs whose original source language representations include symbolic references to each other. Its task is to resolve these symbolic references and produce a single program. There is typically little difference between the linker's source and target languages. In most implementations, linking is performed by the loader. In this unit also, the linker is considered to be a part of the loader.
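The linking step described above can be sketched in Python. This is only an illustration under stated assumptions: the module format (a dictionary with `name`, `code` and `defines`), the function name `link` and the sample modules are hypothetical and do not come from any real linker. The sketch assigns contiguous relative addresses starting from zero, records them in a link map, and resolves symbolic references between modules.

```python
def link(modules):
    """Link relocatable modules into one program addressed from zero.

    Each module is a dict in a hypothetical format chosen for this sketch:
      "name":    module name
      "code":    list of words; an int is final, a str is a symbolic reference
      "defines": {symbol: offset of the symbol within this module}
    """
    link_map = {}   # module name -> base address relative to program start
    symbols = {}    # symbol -> address relative to program start
    base = 0
    for module in modules:                 # first sweep: assign addresses
        link_map[module["name"]] = base
        for symbol, offset in module["defines"].items():
            if symbol in symbols:
                raise ValueError(f"multiply defined symbol: {symbol}")
            symbols[symbol] = base + offset
        base += len(module["code"])

    program = []
    for module in modules:                 # second sweep: resolve references
        for word in module["code"]:
            if isinstance(word, str):
                if word not in symbols:
                    raise ValueError(f"undefined symbol: {word}")
                program.append(symbols[word])
            else:
                program.append(word)
    return program, link_map


# Two made-up modules that refer to each other symbolically.
main = {"name": "MAIN", "code": [3, "COUNT", 0, "READHEX"],
        "defines": {"START": 0}}
lib = {"name": "LIB", "code": [12, "COUNT", 11, 0],
       "defines": {"READHEX": 0, "COUNT": 3}}

program, link_map = link([main, lib])
print(program)    # [3, 7, 0, 4, 12, 7, 11, 0]
print(link_map)   # {'MAIN': 0, 'LIB': 4}
```

All addresses in the result are still relative to zero, so the linked program remains relocatable; turning them into actual storage addresses is left to the loader.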

Loader
A loader is a program that places programs into main memory and prepares them for execution. The loader's target language is machine language; its source language is nearly machine language. Loading is intimately bound up with the storage management function of operating systems and is usually performed later than assembly or compilation. The period of execution of a user's program is called execution time. The period of translating a user's source program is called assembly or compile time. Load time refers to the period of loading and preparing an object program for execution. Loading will be discussed further in section 2.6.

Check Your Progress 1


Question 1: What are the main advantages of a translator in the development of software?
Question 2: Why are linkers and loaders also called translators?

2.4 ASSEMBLER IMPLEMENTATION


An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader (Figure 3).

Fig. 3: Assembler

For example, externally defined symbols (such as library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory and place the values of these symbols in the calling program. In this section, we will discuss the different approaches to the design of an assembler and its related programs.

2.4.1 Assembler and its related Program


The assembler-language program contains three kinds of entities. Absolute entities include operation codes, numeric and string constants, and fixed addresses. The values of absolute entities are independent of which storage locations the resulting machine code will eventually occupy. Relative entities include the addresses of instructions and of working storage. These are fixed only with respect to each other, and are normally stated relative to the address of the beginning of the module. An externally defined entity is used within a module but not defined within it. Whether it is absolute or relative is not necessarily known at the time the module is translated.

The object program includes identification of which addresses are relative, which symbols are defined externally, and which internally defined symbols are expected to be referenced externally. In the modules in which the latter are used, they are considered to be externally defined. These external references are resolved for two or more object programs by a linker. The linker accepts the several object programs as input and produces a single program ready for loading, hence termed a load module. The load module is free of external references and consists essentially of machine-language code accompanied by a specification of which addresses are relative. When the actual main storage locations to be occupied by the program become known, a relocating loader reads the program into storage and adjusts the relative addresses to refer to those actual locations. The output from the loader is a machine-language program ready for execution. The overall process is depicted in Figure 4.

If only a single source-language module containing no external references is translated, it can be loaded directly without intervention by the linker. In some programming systems the format of linker output is sufficiently compatible with that of its input to permit the linking of a previously produced load module with some new object modules.
The functions of linking and loading are sometimes both effected by a single program, called a linking loader. Despite the convenience of combining the linking and loading functions, it is important to realize that they are distinct functions, each of which can be performed independently of the other.
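The adjustment made by a relocating loader can be illustrated with a short Python sketch. The encoding of a load module as a list of `(word, is_relative)` pairs and the function name `relocate` are assumptions made for this illustration only; real load module formats differ.

```python
def relocate(load_module, load_address):
    """Return the words as they would sit in storage at load_address.

    load_module is a list of (word, is_relative) pairs, a hypothetical way
    of recording "which addresses are relative". Absolute entities (for
    example operation codes and constants) are copied unchanged; relative
    addresses are adjusted by the actual load address.
    """
    return [word + load_address if is_relative else word
            for word, is_relative in load_module]


# One three-word instruction: an operation code followed by two operand
# addresses that were assembled relative to the start of the module.
module = [(13, False), (33, True), (35, True)]
print(relocate(module, 200))   # [13, 233, 235]
print(relocate(module, 0))     # [13, 33, 35]
```

Because the module itself is not modified, the same load module can be placed at different storage locations simply by calling the function with a different load address.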

Fig. 4: Program Translation Steps

2.4.2 Load and Go Assembler

The simplest assembler program is the load and go assembler. It accepts as input a program whose instructions are essentially in one to one correspondence with those of machine language, but with symbolic names used for operators and operands. It produces machine language as output, which is loaded directly into main memory and executed. The translation is usually performed in a single pass over the input program text. The resulting machine language program occupies storage locations which are fixed at the time of translation and cannot be changed subsequently. The program can call library subroutines, provided that they occupy locations other than those required by the program. No provision is made for combining separate subprograms translated in this manner.

The load and go assembler forgoes the advantages of modular program development. Among the most important of these are:
(1) the ability to design, code and test different program components in parallel;
(2) a change in one particular module does not require scanning the rest of the program.

Most assemblers are therefore designed to satisfy the desire to create programs in modules. These module assemblers generally perform a two-pass translation. During the first pass the assembler examines the assembler-language program and collects the symbolic names into a table. During the second pass, the assembler generates code which is not quite in machine language. It is rather in a similar form, sometimes called "relocatable code" and here called object code. The program module in object-code form is typically called an object module.

2.4.3 One-Pass Module Assembler


The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.

2.4.4 Two Pass Assembler


Most assemblers are designed in two passes (stages); therefore, they are called two-pass assemblers. The pass-wise grouping of tasks in a two-pass assembler is given below:

Pass I
Separate the symbols, mnemonic op-codes and operand fields.
Determine the storage requirement for every assembly language statement and update the location counter.
Build the symbol table (the table that is used to store each label and its corresponding value).

Pass II
Generate object code.

FUNCTION
The program of Figure 5, although written in a hypothetical assembler language, contains the basic elements which need to be translated into machine language. (It is not essential for students to understand the meaning of each statement of the program.) For ease of reference, each instruction is identified by a line number, which is not part of the program. Each instruction in our language contains either an operation specification (lines 1-15) or a storage specification (lines 16-21). An operation specification is a symbolic operation code, which may be preceded by a label and must be followed by 0, 1 or 2 operand specifications, as appropriate to the operation. A storage specification is a symbolic instruction to the assembler. In our assembler language, it must be preceded by a label and must be followed, if appropriate, by a constant. Labels and operand specifications are symbolic addresses; every operand specification must appear somewhere in the program as a label.

Line   Label   Operation   Operand 1   Operand 2
  1            COPY        ZERO        OLDER
  2            COPY        ONE         OLD
  3            READ        LIMIT
  4            WRITE       OLD
  5    FRONT   LOAD        OLDER
  6            ADD         OLD
  7            STORE       NEW
  8            SUB         LIMIT
  9            JMPPOS      FINAL
 10            WRITE       NEW
 11            COPY        OLD         OLDER
 12            COPY        NEW         OLD
 13            JMP         FRONT
 14    FINAL   WRITE       LIMIT
 15            STOP
 16    ZERO    CONST       0
 17    ONE     CONST       1
 18    OLDER   SPACE
 19    OLD     SPACE
 20    NEW     SPACE
 21    LIMIT   SPACE

Figure 5: Sample Assembler-Language Program

Operation Code
Symbolic   Machine   Length   No. of Operands   Action
ADD          02        2            1           ACC <- ACC + OPD1
JMP          00        2            1           Jump to OPD1
JMPNEG       05        2            1           Jump to OPD1 if ACC < 0
JMPPOS       01        2            1           Jump to OPD1 if ACC > 0
JMPZERO      04        2            1           Jump to OPD1 if ACC = 0
COPY         13        3            2           OPD2 <- OPD1
DIVIDE       10        2            1           ACC <- ACC / OPD1
LOAD         03        2            1           ACC <- OPD1
MULT         14        2            1           ACC <- ACC x OPD1
READ         12        2            1           OPD1 <- input stream
STOP         11        1            0           Stop execution
STORE        07        2            1           OPD1 <- ACC
SUB          06        2            1           ACC <- ACC - OPD1
WRITE        08        2            1           Output stream <- OPD1

Figure 6: Instruction Set

Our hypothetical machine has a single accumulator and a main storage of unspecified size. Its 14 instructions are listed in Figure 6. The first column shows the symbolic operation code and the second gives the machine-language equivalent (in decimal). The fourth column specifies the number of operands, and the last column describes the action which ensues when the instruction is executed. In that column "ACC", "OPD1" and "OPD2" refer to the contents of the accumulator, of the first operand location, and of the second operand location, respectively. The length of each instruction in words is 1 greater than the number of its operands. Thus if the machine has 12-bit words, an ADD instruction is 2 words, or 24 bits, long. The table's third column, which is redundant, gives the instruction length. If our hypothetical computer had a fixed instruction length, the third and fourth columns could both be omitted.

The storage specification SPACE reserves one word of storage which presumably will eventually hold a number; there is no operand. The storage specification CONST also reserves a word of storage; it has an operand which is the value of a number to be placed in that word by the assembler.

The instructions of the program are presented in four fields, and might indeed be constrained to such a format on the input medium. The label, if present, occupies the first field. The second field contains the symbolic operation code or storage specification, which will henceforth be referred to simply as the operation. The third and fourth fields hold the operand specifications, or simply operands, if present.

Although it is not at all important to our discussion to understand what the example program does, the foregoing specifications of the machine and of its assembler language reveal the algorithm. The program simply computes the so-called Fibonacci numbers (0, 1, 1, 2, 3, 5, 8, ...). The same program is also written in the BASIC programming language in Unit 1, Course 2.
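For readers who find it easier to follow the algorithm in a high-level language, the following Python sketch mirrors the program of Figure 5 statement by statement, using the actions listed in Figure 6. It is only an illustration: the function name `fibonacci_program` is hypothetical, the value normally obtained by READ is passed in as an argument, and the values produced by WRITE are collected in a list.

```python
def fibonacci_program(limit):
    """Mirror the assembler program of Figure 5, statement by statement."""
    output = []
    older = 0                     # COPY  ZERO  OLDER
    old = 1                       # COPY  ONE   OLD
    #                               READ  LIMIT (limit is the argument)
    output.append(old)            # WRITE OLD
    while True:                   # FRONT:
        new = older + old         # LOAD OLDER / ADD OLD / STORE NEW
        if new - limit > 0:       # SUB LIMIT / JMPPOS FINAL
            break
        output.append(new)        # WRITE NEW
        older = old               # COPY  OLD   OLDER
        old = new                 # COPY  NEW   OLD
        #                           JMP   FRONT
    output.append(limit)          # FINAL: WRITE LIMIT
    return output                 # STOP


print(fibonacci_program(10))      # [1, 1, 2, 3, 5, 8, 10]
```

Note that the program as written prints the limit itself as its last output value, and that its first output is 1 rather than 0, because the initial value of OLDER is never written.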
Now that we have seen the elements of an assembler-language program, we can ask what functions the assembler must perform in translating it. Here is the list:
Replace symbolic addresses by numeric addresses.
Replace symbolic operation codes by machine operation codes.
Reserve storage for instructions and data.
Translate constants into machine representation.

The assignment of numeric addresses can be performed without prior knowledge of what actual locations will eventually be occupied by the assembled program. It is necessary only to generate addresses relative to the start of the program. We shall assume that our assembler normally assigns addresses starting at 0. In translating line 1 of our example program, the resulting machine instruction will therefore be assigned address 0 and occupy 3 words, because COPY instructions are 3 words long. Hence the instruction corresponding to line 2 will be assigned address 3, the READ instruction will be assigned address 6, the WRITE instruction of line 4 will be assigned address 8, and so on to the end of the program. But what addresses will be assigned to the operands named ZERO and OLDER? These addresses must be inserted in the machine-language representation of the first instruction.

Implementation
The assembler uses a counter to keep track of machine-language addresses. Because these addresses will ultimately specify locations in main storage, the counter is called the location counter. Before assembly, the location counter is initialized to zero. After each source line has been examined on the first pass, the location counter is incremented by the length of the machine-language code which will ultimately be generated to correspond to that source line. When the assembler first encounters line 1 of the example program, it cannot replace the symbols ZERO and OLDER by addresses, because those symbols make forward references to source-language program lines not yet reached by the assembler. The most straightforward way to cope with the problem of forward references is to examine the entire program text once, before attempting to complete the translation.

During that examination, the assembler determines the address which corresponds to each symbol, and places both the symbols and their addresses in a symbol table. This is possible because each symbol used in an operand field must also appear as a label. The address corresponding to a label is just the address of the machine-language code generated for the line on which that label appears. Creation of the symbol table requires one pass over the source text. During a second pass, the assembler uses the addresses collected in the symbol table to perform the translation. As each symbolic address is encountered in the second pass, the corresponding numeric address is substituted for it in the object code.

Two of the most common logical errors in assembler-language programming involve improper use of symbols. If a symbol appears in the operand field of some instruction, but nowhere in a label field, it is undefined. If a symbol appears in the label fields of more than one instruction, it is multiply defined. In building the symbol table on the first pass, the assembler must examine the label field of each instruction to permit it to associate the location counter value with each symbol. Multiply-defined symbols will be found on this pass. Undefined symbols, on the other hand, will not be found on the first pass unless the assembler also examines operand fields for symbols. Although this examination is not required for construction of the symbol table, normal practice is to perform it anyhow, because of its value in early detection of program errors. There are many ways to organize a symbol table. The organisation of a symbol table will not be discussed in this unit.

The state of processing after line 3 is shown in Figure 7. During processing of line 1, the symbols ZERO and OLDER were encountered and entered into the first two positions of the symbol table. The operation COPY was identified, and the instruction length information from Figure 6 was used to advance the location counter from 0 to 3. During processing of line 2, two more symbols were encountered and entered in the symbol table, and the location counter was advanced from 3 to 6. Line 3 yielded the fifth symbol, LIMIT, and caused incrementation of the location counter from 6 to 8. At this point the symbol table holds five symbols, none of which yet has an address. The location counter holds the address 8, and processing is ready to continue from line 4. Neither the line numbers nor the addresses shown in part (a) of the figure are actually part of the source-language program. The addresses record the history of incrementation of the location counter; the line numbers permit easy reference. Clearly, the assembler needs not only a location counter, but also a line counter to keep track of which source line is being processed.

Line   Address   Label   Operation   Operand 1   Operand 2
  1       0              COPY        ZERO        OLDER
  2       3              COPY        ONE         OLD
  3       6              READ        LIMIT

(a) Source text scanned

Symbol   Address
ZERO       --
OLDER      --
ONE        --
OLD        --
LIMIT      --

Location counter: 8
Line counter: 4

(b) Symbol table; counters

Figure 7: First Pass After Scanning Line 3

During processing of line 4 the symbol OLD is encountered for the second time. Because it is already in the symbol table, it is not entered again. During processing of line 5, the symbol FRONT is encountered in the label field. It is entered into the symbol table, and the current location counter value, 10, is entered with it as its address. Figure 8 displays the state of the translation after line 9 has been processed.
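The first pass described above can be sketched in Python. This is a minimal illustration, not the implementation of any particular assembler: the names `LENGTH`, `PROGRAM` and `first_pass` and the representation of each source line as a `(label, operation, operands)` triple are assumptions of the sketch. The instruction lengths are those of Figure 6, the mnemonics SUB and JMPPOS are used as in Figure 6, and CONST and SPACE are each taken to reserve one word.

```python
# Lengths in words (Figure 6); CONST and SPACE each reserve one word.
LENGTH = {"ADD": 2, "JMP": 2, "JMPNEG": 2, "JMPPOS": 2, "JMPZERO": 2,
          "COPY": 3, "DIVIDE": 2, "LOAD": 2, "MULT": 2, "READ": 2,
          "STOP": 1, "STORE": 2, "SUB": 2, "WRITE": 2,
          "CONST": 1, "SPACE": 1}

# The program of Figure 5 as (label, operation, operands) triples.
PROGRAM = [
    (None, "COPY", ["ZERO", "OLDER"]),
    (None, "COPY", ["ONE", "OLD"]),
    (None, "READ", ["LIMIT"]),
    (None, "WRITE", ["OLD"]),
    ("FRONT", "LOAD", ["OLDER"]),
    (None, "ADD", ["OLD"]),
    (None, "STORE", ["NEW"]),
    (None, "SUB", ["LIMIT"]),
    (None, "JMPPOS", ["FINAL"]),
    (None, "WRITE", ["NEW"]),
    (None, "COPY", ["OLD", "OLDER"]),
    (None, "COPY", ["NEW", "OLD"]),
    (None, "JMP", ["FRONT"]),
    ("FINAL", "WRITE", ["LIMIT"]),
    (None, "STOP", []),
    ("ZERO", "CONST", ["0"]),
    ("ONE", "CONST", ["1"]),
    ("OLDER", "SPACE", []),
    ("OLD", "SPACE", []),
    ("NEW", "SPACE", []),
    ("LIMIT", "SPACE", []),
]


def first_pass(program, origin=0):
    """Build the symbol table; return it with the final location counter."""
    symbols = {}        # symbol -> address (None until its label is seen)
    defined = set()     # labels seen so far, to catch multiple definitions
    location = origin   # the location counter
    for label, operation, operands in program:
        if label is not None:
            if label in defined:
                raise ValueError(f"multiply defined symbol: {label}")
            defined.add(label)
            symbols[label] = location
        if operation != "CONST":        # a CONST operand is a number
            for symbol in operands:     # enter symbols met in operand fields
                symbols.setdefault(symbol, None)
        location += LENGTH[operation]
    undefined = [s for s, address in symbols.items() if address is None]
    if undefined:
        raise ValueError(f"undefined symbols: {undefined}")
    return symbols, location


symbols, location = first_pass(PROGRAM)
print(symbols)
print(location)   # 39
```

The resulting table holds ZERO 33, OLDER 35, ONE 34, OLD 36, LIMIT 38, FRONT 10, NEW 37 and FINAL 30, in the order in which the symbols were first encountered, which agrees with the symbol table of Figure 9. Examining the operand fields as well as the label fields is what allows undefined symbols to be reported at the end of the first pass.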

Line   Address   Label   Operation   Operand 1   Operand 2
  1       0              COPY        ZERO        OLDER
  2       3              COPY        ONE         OLD
  3       6              READ        LIMIT
  4       8              WRITE       OLD
  5      10      FRONT   LOAD        OLDER
  6      12              ADD         OLD
  7      14              STORE       NEW
  8      16              SUB         LIMIT
  9      18              JMPPOS      FINAL
 10      20              WRITE       NEW
 11      22              COPY        OLD         OLDER
 12      25              COPY        NEW         OLD
 13      28              JMP         FRONT
 14      30      FINAL   WRITE       LIMIT
 15      32              STOP
 16      33      ZERO    CONST       0
 17      34      ONE     CONST       1
 18      35      OLDER   SPACE
 19      36      OLD     SPACE
 20      37      NEW     SPACE
 21      38      LIMIT   SPACE

(a) Source text scanned

Symbol   Address
ZERO       33
OLDER      35
ONE        34
OLD        36
LIMIT      38
FRONT      10
NEW        37
FINAL      30

Location counter: 39
Line counter: 22

(b) Symbol table; counters

Figure 9

The XX can be thought of as a specification to the loader, which will eventually process the object code, that the content of the location corresponding to address 35 does not need to have any specific value loaded. The loader can then just skip over that location. Some assemblers specify a particular value for reserved storage locations anyway, often zeros. There is no logical requirement to do so, however, and the user unfamiliar with his assembler is ill-advised to count on a particular value.

Address   Length   Machine Code
  00        3      13 33 35
  03        3      13 34 36
  06        2      12 38
  08        2      08 36
  10        2      03 35
  12        2      02 36
  14        2      07 37
  16        2      06 38
  18        2      01 30
  20        2      08 37
  22        3      13 36 35
  25        3      13 37 36
  28        2      00 10
  30        2      08 38
  32        1      11
  33        1      00
  34        1      01
  35        1      XX
  36        1      XX
  37        1      XX
  38        1      XX

Figure 10: Object Code Generated on 2nd Pass

The specifications CONST and SPACE do not correspond to machine instructions. They are really instructions to the assembler program. Because of this, we shall refer to them as assembler instructions. Another common designation for them is pseudo-instructions. Neither term is really satisfactory. Of the two types of assembler instructions in our example program, one results in the generation of machine code and the other in the reservation of storage. Later we shall see assembler instructions which result in neither of these actions. The assembler must be able to tell assembler instructions apart from machine operations. One organization is to use a separate table, which is usually searched before the operation code table is searched. Another is to include both machine operations and assembler instructions in the same table. A field in the table entry then identifies the type to the assembler.

A few variations to the foregoing process can be considered. Some of the translation can actually be performed during the first pass. Operation fields must be examined during the first pass to determine their effect on the location counter. The second-pass table lookup to determine the machine operation code can be obviated at the cost of producing intermediate text which holds the machine operation code and instruction length in addition to the source text. Another translation which can be performed during the first pass is that of constants, e.g. from source-language decimal to machine-language binary. The translation of any symbolic addresses which refer backward in the text, rather than forward, could be performed on the first pass, but it is more convenient to wait for the second pass and treat all symbolic addresses uniformly. A minor variation is to assemble addresses relative to a starting address other than 0. The location counter is merely initialized to the desired address.
If, for example, the value 200 is chosen, the symbol table would appear as in Figure 11. The object code corresponding to line 1 would be 200 3 13 233 235.

Symbol   Address
ZERO       233
OLDER      235
ONE        234
OLD        236
LIMIT      238
FRONT      210
NEW        237
FINAL      230

Figure 11: Symbol Table with Starting Location 200

If it were known at assembly time that the program is to reside at location 200 for execution, then full object code with address and length need not be generated. The machine code alone would suffice. In this event the result of translation would be the following 39-word sequence.

13 233 235  13 234 236  12 238  08 236  03 235  02
236  07 237  06 238  01 230  08 237  13 236 235  13
237 236  00 210  08 238  11  00  01  XX  XX  XX  XX
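The generation of object code on the second pass, with a starting address other than 0, can be sketched as follows. This is a hedged illustration: the names `OPCODE`, `SYMBOLS` and `second_pass`, and the representation of each line as an `(operation, operands)` pair, are assumptions of the sketch. The operation codes come from Figure 6 and the symbol table from Figure 11; labels are omitted because they have already been processed on the first pass. Only the first three lines of the program are translated here.

```python
# Machine operation codes in decimal (Figure 6).
OPCODE = {"ADD": 2, "JMP": 0, "JMPNEG": 5, "JMPPOS": 1, "JMPZERO": 4,
          "COPY": 13, "DIVIDE": 10, "LOAD": 3, "MULT": 14, "READ": 12,
          "STOP": 11, "STORE": 7, "SUB": 6, "WRITE": 8}

# Symbol table of Figure 11 (starting location 200).
SYMBOLS = {"ZERO": 233, "OLDER": 235, "ONE": 234, "OLD": 236,
           "LIMIT": 238, "FRONT": 210, "NEW": 237, "FINAL": 230}


def second_pass(program, symbols, origin):
    """Return (address, length, words) records like the rows of Figure 10."""
    location = origin
    records = []
    for operation, operands in program:
        if operation == "CONST":
            words = [int(operands[0])]      # translate the constant
        elif operation == "SPACE":
            words = ["XX"]                  # reserved word, no value loaded
        else:
            # Substitute the machine operation code and numeric addresses.
            words = [OPCODE[operation]] + [symbols[s] for s in operands]
        records.append((location, len(words), words))
        location += len(words)
    return records


excerpt = [("COPY", ["ZERO", "OLDER"]),
           ("COPY", ["ONE", "OLD"]),
           ("READ", ["LIMIT"])]
records = second_pass(excerpt, SYMBOLS, 200)
for record in records:
    print(record)
# (200, 3, [13, 233, 235])
# (203, 3, [13, 234, 236])
# (206, 2, [12, 238])

# When the load address is known at assembly time, the machine code alone
# suffices: concatenate the words and drop the addresses and lengths.
machine_code = [word for _, _, words in records for word in words]
print(machine_code)   # [13, 233, 235, 13, 234, 236, 12, 238]
```

The first record reproduces the object code quoted above for line 1, and the concatenated words are the first eight words of the 39-word sequence.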

2.5 MACRO PROCESSOR


The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block. In this section we will discuss an assembler macro facility and its implementation within an assembler.

2.5.1 Macro Definition and Usage


In section 2.3 we have already given one example of a macro. In this section we will present one more example to highlight the salient aspects of a macro processor. The example closely resembles the assembly language instructions of Intel's 8-bit microprocessors.

Example
Macro program:

MACRO
INCRMT    &A, &B
LOAD      &A              Macro definition
ADD       &B
STORE     &A
ENDM
___________
___________
___________
INCRMT    X,Y
ENDM

Macro expansion:

LOAD      X
ADD       Y
STORE     X

Figure 12

A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus, a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence. The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. The appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In Figure 12, INCRMT X,Y is shown to lead to insertion of the assembly statements

LOAD      X
ADD       Y
STORE     X

in its place. All macro calls in a program are expanded in this fashion.

Defining a Macro
Let us take another look at the macro definition unit appearing in Figure 13.

Macro program:

MACRO
INCRMT    &A, &B
LOAD      &A              Macro definition
ADD       &B
STORE     &A
ENDM
___________
___________
___________
INCRMT    X,Y
ENDM

Macro expansion:

LOAD      X
ADD       Y
STORE     X

Figure 13

The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written. The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.

Positional Parameters
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character '&'. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character '&'. These are known as actual parameters. The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In Figure 12, this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first formal parameter, and so on. Consider the prototype and macro call statements once again:

INCRMT    &A,&B     ...   prototype
INCRMT    X,Y       ...   macro call

We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:

LOAD      X
ADD       Y
STORE     X
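Positional correspondence can be illustrated with a few lines of Python. This is a simplified sketch: the helper names `split_statement` and `expand_call` are hypothetical, statements are assumed to consist of an operation followed by a comma-separated operand list, and labels and keyword parameters are not handled.

```python
def split_statement(text):
    """Split 'OP A,B' into ('OP', ['A', 'B']); operands may be absent."""
    parts = text.split(None, 1)
    operands = ([o.strip() for o in parts[1].split(",")]
                if len(parts) > 1 else [])
    return parts[0], operands


def expand_call(prototype, model, call):
    """Expand one macro call by positional parameter substitution."""
    name, formals = split_statement(prototype)
    call_name, actuals = split_statement(call)
    if call_name != name or len(actuals) != len(formals):
        raise ValueError("call does not match prototype")
    # Pair the i-th formal parameter with the i-th actual parameter.
    binding = dict(zip(formals, actuals))
    expanded = []
    for statement in model:
        operation, operands = split_statement(statement)
        # Replace each formal parameter by its actual parameter.
        operands = [binding.get(o, o) for o in operands]
        expanded.append(f"{operation} {','.join(operands)}".strip())
    return expanded


model = ["LOAD &A", "ADD &B", "STORE &A"]
print(expand_call("INCRMT &A,&B", model, "INCRMT X,Y"))
# ['LOAD X', 'ADD Y', 'STORE X']
```

Swapping the actual parameters in the call, as in INCRMT Y,X, would pair Y with &A and X with &B and so produce LOAD Y, ADD X, STORE Y, which shows that only the position of a parameter in its list matters.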

2.5.2 Schematics for Macro-Expansion


In the last section, we touched upon the fundamental aspects of macro expansion. From the discussion, it appears that the process of macro expansion is similar to language translation. The source program containing macro definitions and calls is translated into an assembly language program without any macro definitions or calls. This program form can now be handed over to a conventional assembler to obtain the target language form of the program. In such a schematic (Figure 14), the process of macro expansion is completely segregated from the process of assembling the program. The translator which performs macro expansion in this manner is called a macro pre-processor. The advantage of this scheme is that any existing conventional assembler can be enhanced in this manner to incorporate macro processing. It reduces the programming cost involved in making a macro facility available to programmers using a computer system. The disadvantage is that this scheme is probably not very efficient, because of the time spent in generating assembly language statements and processing them again for the purpose of translation to the target language.

Fig. 14: A pre-processor based scheme for macro assembly

As against this schematic of prefixing a conventional assembler with a macro pre-processor, it is possible to design a macro assembler which not only processes macro definitions and macro calls for the purpose of expansion, but also assembles the expanded statements along with the original assembly statements. The macro assembler should require fewer passes over the program than the pre-processor scheme, which holds out a promise of better efficiency. For the sake of simplicity, however, in this section we will discuss the issues related to the design of a macro pre-processor rather than its actual implementation.

2.5.3 Issues related to the Design of a Macro Pre Processor


In this section we will discuss issues related to the design of a macro pre-processor but not the actual implementation. Our discussion regarding the definition and use of macros in an assembly program has brought out to some extent the working principles of a macro pre-processor. To summarise, we should be able to differentiate between macro names and invalid operation code mnemonics. On thus recognising a call on a macro, we should be able to access the text of its definition so that we can expand the call. For generating a statement during expansion, we need a simple scheme for substituting each appearance of a formal parameter with its value. Correspondence between a formal parameter and its value has to be established for this purpose. It is desirable that, instead of performing this action for every appearance of a formal parameter, the correspondence between formal parameters and their values be established once and for all at the start of macro expansion. Considerations of positional and keyword correspondence would thus be localised to the start of macro expansion only. This has the further advantage that no distinction need be made between keyword and positional parameters during macro expansion. These considerations lead to the following scheme:

Step 1: Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT).
ii) store the entire macro definition in the Macro Definition Table (MDT).
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT.

Step 2: Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT.
ii) obtain information from the MNT regarding the position of the macro definition in the MDT.
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters).
iv) expand the macro call by following the procedure given in Step 3.

Step 3: Process the statements in the macro definition as found in the MDT in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables.

In order to have a complete working scheme within the above framework, we need to finalise the following details:
i) Method of establishing correspondence between a formal parameter and its value.
ii) Method of sequencing through the statements comprising a macro definition in expansion time order.
iii) Method of expanding a model statement.
iv) Allocation of storage for expansion time variables and access to their values during expansion.
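The three steps above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: parameters are positional, macro definitions are not nested, the conditional assembly statements AIF and AGO and expansion time variables are not handled, and the function names (build_tables, expand_program) and table layouts are hypothetical choices for this sketch.

```python
import re


def build_tables(source_lines):
    """Step 1: enter every macro in the MNT and store its body in the MDT."""
    mnt = {}       # macro name -> (index of first model statement in MDT, formals)
    mdt = []       # model statements of all macros, each body closed by "ENDM"
    program = []   # statements that are not part of any macro definition
    lines = iter(source_lines)
    for line in lines:
        if line.strip() == "MACRO":
            prototype = next(lines).split(None, 1)
            name = prototype[0]
            formals = ([p.strip() for p in prototype[1].split(",")]
                       if len(prototype) > 1 else [])
            mnt[name] = (len(mdt), formals)
            for body_line in lines:          # copy the body up to and including ENDM
                mdt.append(body_line.strip())
                if mdt[-1] == "ENDM":
                    break
        else:
            program.append(line.strip())
    return mnt, mdt, program


def expand_program(source_lines):
    """Steps 2 and 3: detect macro calls and expand them from the MDT."""
    mnt, mdt, program = build_tables(source_lines)
    output = []
    for line in program:
        parts = line.split(None, 1)
        if parts and parts[0] in mnt:
            start, formals = mnt[parts[0]]
            actuals = ([a.strip() for a in parts[1].split(",")]
                       if len(parts) > 1 else [])
            # Correspondence is established once, at the start of expansion.
            binding = dict(zip(formals, actuals))
            index = start
            while mdt[index] != "ENDM":
                output.append(re.sub(r"&\w+",
                                     lambda m: binding.get(m.group(0), m.group(0)),
                                     mdt[index]))
                index += 1
        else:
            output.append(line)
    return output


SOURCE = """MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
INCRMT X,Y
END""".splitlines()

print(expand_program(SOURCE))
```

Because the binding between formal and actual parameters is built once per call, the substitution loop itself does not need to know whether the correspondence was positional or by keyword.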

Check Your Progress 2


Question 1: Write down the important tasks performed in a two-pass assembler.

Question 2: Why is a load-and-go assembler simpler than a one-pass assembler?

Question 3: What does the object program produced by an assembler/compiler include?

2.6 LOADERS
The purpose of this section is to discuss the various functions of a loader. As discussed earlier in this unit, the loader is a program which accepts object code and prepares it for execution. Object code produced by an assembler/compiler cannot be executed without modification. As many as four more functions must be performed first. These functions, which are performed by a loader, are:

i. Allocation of space in main memory for the programs.
ii. Linking of programs with each other, for example with library programs.
iii. Adjustment of all address dependent locations, such as address constants, to correspond to the allocated space. This is also called relocation.
iv. Physically loading the machine instructions and data into memory.

Figure 15 shows the function of a loader.

Fig. 15: Function of a loader.

Let us examine the need for some of these functions of the loader.

Linking:
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a 'stand-alone' nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory. For example, consider a program written in a high level language like C. Such a program may contain calls on certain Input/Output functions like printf( ), scanf( ) etc., which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an Input/Output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
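A minimal Python sketch of this idea is given below. It assumes that every module has already been placed in memory and simply makes the addresses of the defined symbols known to the modules that call them. The module layout (the "defines" and "calls" keys) and all addresses are hypothetical values chosen for illustration.

```python
def link(modules):
    """Resolve external references between modules that are already placed.

    Each module is a dict with
      "defines": symbol -> address of that symbol,
      "calls":   list of symbols to which the module transfers control.
    Returns, for every module, a table from called symbol to its address.
    """
    global_symbols = {}
    for module in modules.values():
        for symbol, address in module["defines"].items():
            if symbol in global_symbols:
                raise ValueError(f"symbol {symbol!r} is defined more than once")
            global_symbols[symbol] = address

    resolved = {}
    for name, module in modules.items():
        table = {}
        for symbol in module["calls"]:
            if symbol not in global_symbols:
                raise KeyError(f"unresolved external reference {symbol!r} in {name}")
            table[symbol] = global_symbols[symbol]
        resolved[name] = table
    return resolved


# Hypothetical layout: program A calls printf, which lives in a library module.
MODULES = {
    "A":     {"defines": {"main": 200}, "calls": ["printf"]},
    "stdio": {"defines": {"printf": 100, "scanf": 130}, "calls": []},
}
print(link(MODULES))
```

A reference to a symbol that no module defines is reported as an error, which corresponds to the situation where a required function is not present in memory.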

Relocation:
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150. If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus, A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste. It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program. The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments from a single source file.) The part of a loader which performs relocation is called a relocating loader.
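The adjustment of address fields can be sketched in Python as follows. The representation of a word as an (opcode, operand, is_address) triple and the opcode names JUMP and LOADI are assumptions made for this illustration; the origins 100 and 150 follow the example in which A and printf( ) were both translated with start address 100.

```python
def relocate(segment, translated_origin, load_origin):
    """Return the segment with every address field adjusted for its new origin.

    segment is a list of (opcode, operand, is_address) triples; is_address
    tells whether the operand is an address that depends on where the
    segment is loaded.
    """
    constant = load_origin - translated_origin   # the relocation constant
    relocated = []
    for opcode, operand, is_address in segment:
        if is_address:
            operand += constant                  # adjust the address field only
        relocated.append((opcode, operand))
    return relocated


# A was translated with origin 100 but is loaded at 150, just after printf.
A_SEGMENT = [("JUMP", 120, True), ("LOADI", 5, False)]
print(relocate(A_SEGMENT, translated_origin=100, load_origin=150))
```

Only the operand marked as an address changes (120 becomes 170); the constant 5 is left alone, which is why the loader needs relocation information in addition to the code itself.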

2.6.1 Loader Schemes


There are several schemes for accomplishing the four loading functions. These schemes are: (i) Absolute Loader (ii) Relocating Loader (iii) Direct Linking Loader (iv) Dynamic Loading (v) Dynamic Linking.

Absolute Loader: The task of an absolute loader is virtually trivial. The loader simply accepts the machine language code produced by the assembler and places it into main memory at the location specified by the assembler.

Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer. The input to a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

Direct Linking Loader: It is a general relocatable loader, and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.

The other two loader schemes will be discussed in the next section.

2.6.2 Dynamic Loading and Linking


There are numerous variations on the previously presented loader schemes. One disadvantage of the direct-linking loader, as presented, is that it is necessary to allocate, relocate, link, and load all of the subroutines each time in order to execute a program. Since there may be tens and often hundreds of subroutines involved, especially when we include utility routines such as SQRT etc., this loading process can be extremely time-consuming. Furthermore, even though the loader program may be smaller than the assembler, it does absorb a considerable amount of space. These problems can be solved by dividing the loading process into two separate programs: a binder and a module loader.

A binder is a program that performs the same functions as the direct-linking loader in binding subroutines together, but rather than placing the relocated and linked text directly into memory, it outputs the text as a file. This output file is in a format ready to be loaded and is typically called a load module. The module loader merely has to physically load the module into main memory. The binder essentially performs the functions of allocation, relocation, and linking; the module loader merely performs the function of loading.

There are two major classes of binders. The simplest type produces a load module that looks very much like a single absolute loader file. This means that the specific memory allocation of the program is performed at the time that the subroutines are bound together. A more sophisticated binder, called a linkage editor, can keep track of the relocation information so that the resulting load module can be further relocated and thereby loaded anywhere in memory. In this case the module loader must perform additional allocation and relocation as well as loading, but it does not have to worry about the complex problems of linking. In both cases, a program that is to be used repeatedly need only be bound once and can then be loaded whenever required. The first binder is relatively simple and fast. The second one (the linkage editor) is somewhat more complex but allows a more flexible allocation and loading scheme.
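The division of work between a binder of the linkage editor kind and a module loader can be sketched in Python as below. The module format (the "name", "text" and "externals" keys), the opcode names and the load address 500 are assumptions made for this illustration, and every module is assumed to be entered at its first word.

```python
def bind(modules):
    """Binder in the linkage-editor style: merge modules into one load module.

    Each module is a dict with
      "name":      module name,
      "text":      list of (opcode, operand) words,
      "externals": {word index: name of the module that word must address}.
    Addresses in the result are relative to the start of the load module.
    """
    start, size = {}, 0
    for module in modules:                    # allocation inside the load module
        start[module["name"]] = size
        size += len(module["text"])

    text, relocation = [], []
    for module in modules:
        base = start[module["name"]]
        for index, (opcode, operand) in enumerate(module["text"]):
            if index in module["externals"]:
                operand = start[module["externals"][index]]   # linking
                relocation.append(base + index)               # keep relocation info
            text.append((opcode, operand))
    return {"text": text, "relocation": relocation}


def load_module(load_mod, load_address):
    """Module loader: relocate the bound text and place it in memory."""
    memory = {}
    for index, (opcode, operand) in enumerate(load_mod["text"]):
        if index in load_mod["relocation"]:
            operand += load_address           # relative address becomes absolute
        memory[load_address + index] = (opcode, operand)
    return memory


MAIN = {"name": "MAIN", "text": [("CALL", None), ("HALT", 0)], "externals": {0: "SQRT"}}
SQRT = {"name": "SQRT", "text": [("RET", 0)], "externals": {}}

BOUND = bind([MAIN, SQRT])          # binding is done once
print(load_module(BOUND, 500))      # loading can be repeated at any address
```

The binder resolves the call from MAIN to SQRT only once; the module loader then needs nothing beyond the relocation list to place the same load module at whatever address is available.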

Dynamic Loading
In each of the previous loader schemes we have assumed that all of the subroutines needed are loaded into main memory at the same time. If the total amount of memory required by all these subroutines exceeds the amount available, as is common with large programs on small computers, there is trouble! There are several hardware techniques, such as paging and segmentation, that attempt to solve this problem; these issues will be discussed in Block 2 of this course. In this section we will present conventional dynamic loading schemes based upon the use of a binder prior to loading.

Usually the subroutines of a program are needed at different times: for example, pass 1 and pass 2 of an assembler are mutually exclusive (pass 1 and pass 2 should not simultaneously occupy memory resources). By explicitly recognizing which subroutines call other subroutines, it is possible to produce an overlay structure that identifies mutually exclusive subroutines. Figure 16 illustrates a program consisting of five subprograms (A, B, C, D and E) that require 100K bytes of memory. The arrows indicate that subprogram A only calls B, D and E; subprogram B only calls C and E; subprogram D only calls E; and subprograms C and E do not call any other routines. Figure 16(a) highlights the interdependencies between the procedures. Note that procedures B and D are never in use at the same time; neither are C and E. If we load only those procedures that are actually to be used at any particular time, the amount of memory needed is equal to the longest path of the overlay structure. This happens to be 70K for the example in Figure 16(b): procedures A, B and C. Figure 16(c) illustrates a storage assignment for each procedure consistent with the overlay structure.

In order for the overlay structure to work, it is necessary for the module loader to load the various procedures as they are needed. We will not go into the specific details, but there are many binders capable of processing and allocating an overlay structure. The portion of the loader that actually intercepts the calls and loads the necessary procedure is called the overlay supervisor or simply the flipper. This overall scheme is called dynamic loading or load-on-call.

Fig. 16 : Dynamic Loading
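The rule that the memory needed equals the longest path of the overlay structure can be checked with a short Python sketch. The call structure is the one described in the text; the individual procedure sizes are not stated there, so the values below are assumptions chosen only so that they total 100K. The sketch also assumes that the call structure contains no cycles.

```python
def overlay_memory(sizes, calls, root):
    """Memory needed by an overlay structure: the heaviest call path from root."""
    def heaviest(procedure):
        # Size of this procedure plus the heaviest chain of procedures below it.
        below = max((heaviest(callee) for callee in calls.get(procedure, [])),
                    default=0)
        return sizes[procedure] + below
    return heaviest(root)


# Assumed sizes in K bytes (hypothetical; they add up to the 100K of the text).
SIZES_K = {"A": 20, "B": 20, "C": 30, "D": 10, "E": 20}
# Call structure from the text: A calls B, D, E; B calls C, E; D calls E.
CALLS = {"A": ["B", "D", "E"], "B": ["C", "E"], "D": ["E"]}

print(sum(SIZES_K.values()))                  # memory if everything is loaded
print(overlay_memory(SIZES_K, CALLS, "A"))    # memory with the overlay structure
```

With these assumed sizes the heaviest path is A, B and C, so the overlay structure needs less memory than loading all five procedures at once.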

Dynamic Linking
The major disadvantage of all of the previous loading schemes is that if a subroutine is referenced but never executed (e.g. if the programmer had placed a call statement in his program but this statement was never executed because a condition was not satisfied), the loader would still incur the overhead of linking the subroutine. Furthermore, all of these schemes require the programmer to explicitly name all procedures that might be called.

A very general type of loading scheme is called dynamic linking. This is a mechanism by which loading and linking of external references are postponed until execution time. The loader loads only the main program. If the main program should execute a transfer instruction to an external address, or should reference an external variable (that is, a variable that has not been defined in this procedure segment), the loader is called. Only then is the segment containing the external reference loaded. An advantage here is that no overhead is incurred unless the procedure to be called or referenced is actually used. A further advantage is that the system can be dynamically reconfigured. The major drawback of this type of loading scheme is the considerable overhead and complexity incurred, due to the fact that most of the binding process is postponed until execution time.

Now we will discuss the implementation of the simplest type of loader scheme, which is called an absolute loader.
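The postponement of loading and linking until a segment is first referenced can be sketched in Python as follows. The class DynamicLinker and its attributes are hypothetical names for this illustration, and ordinary Python functions stand in for the code of the external segments.

```python
import math


class DynamicLinker:
    """Loads and links a segment only when it is first referenced."""

    def __init__(self, library):
        self.library = library   # segment name -> callable standing in for its code
        self.loaded = {}         # segments that have been loaded and linked so far
        self.load_count = 0      # how many times the loader had to be called

    def call(self, name, *args):
        if name not in self.loaded:              # first reference: call the loader
            if name not in self.library:
                raise LookupError(f"no segment named {name!r}")
            self.loaded[name] = self.library[name]
            self.load_count += 1
        return self.loaded[name](*args)


linker = DynamicLinker({"SQRT": math.sqrt, "ABS": abs})


def main_program(x):
    if x >= 0:
        return linker.call("SQRT", x)
    return linker.call("ABS", x)     # referenced, but not executed when x >= 0


print(main_program(16.0))
print(sorted(linker.loaded))
```

After the call with a non-negative argument only SQRT has been loaded; ABS is named in the program but costs nothing until the branch that uses it is actually executed. The check on every call is a small picture of the run-time overhead that the text mentions as the drawback.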

2.6.3 Implementation of an Absolute Loader


Absolute loaders are simple to implement but they do have disadvantages. First, the programmer must specify to the assembler the address in memory where the program is to be loaded. Further, if there are multiple functions to be called within a program, the programmer must remember the address of each and use that absolute address explicitly in his other functions in order to link them. Figure 17 illustrates the operation of an absolute loader. The programmer must be careful not to assign two functions to the same or overlapping addresses.
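A minimal Python sketch of an absolute loader is given below. It places each segment at the address recorded for it and rejects segments that claim the same or overlapping addresses. The function name absolute_load, the (name, start, words) format and the sample addresses are assumptions made for this illustration.

```python
def absolute_load(memory, segments):
    """Place each segment at the address the assembler recorded for it.

    segments is a list of (name, start_address, words). A ValueError is
    raised if two segments claim the same or overlapping addresses.
    """
    occupied = []   # (start, end_exclusive, name) of the segments placed so far
    for name, start, words in segments:
        end = start + len(words)
        for other_start, other_end, other_name in occupied:
            if start < other_end and other_start < end:
                raise ValueError(f"{name} overlaps {other_name}")
        for offset, word in enumerate(words):
            memory[start + offset] = word        # loading is a plain copy
        occupied.append((start, end, name))
    return memory


# Illustrative layout: a main program at 100..300 and sqrt at 400..450.
memory = absolute_load({}, [("first.c", 100, ["w"] * 201),
                            ("sqrt", 400, ["s"] * 51)])
print(min(memory), max(memory))
```

The loader itself does no allocation, linking or relocation; the overlap check merely detects a mistake that the programmer would otherwise have to avoid by hand.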

Figure 17: Absolute Loader

The program first.c is assigned to locations 100-300 and the sqrt function is assigned to locations 400-450. If changes were made to first.c that increased its length to more than 300 bytes, the end of first.c (at 100+300 = 400) would overlap the start of sqrt (at 400). It would then be necessary to assign sqrt to a new address. Furthermore, it would also be necessary to modify all other functions that referred to sqrt. In situations where dozens of subroutines are being used, this manual shuffling can get very complex, tedious and wasteful of time and memory.

In an absolute loading scheme the four loader functions are accomplished as follows: allocation by the programmer, linking by the programmer, relocation by the assembler, and loading by the loader.

Macro program showing the macro definition, the macro call and the macro expansion:

    MACRO
    INCRMT   &A,&B
    LOAD     &A
    ADD      &B
    STORE    &A
    ENDM
    ...
    INCRMT   X,Y
    ...

Macro expansion:

    LOAD     X
    ADD      Y
    STORE    X

Check Your Progress 3


Question 1: Explain the important functions of a loader.

Question 2: List the different types of loader schemes and explain the advantages and disadvantages of the dynamic linking loader (dynamic linker) scheme.

2.7 SUMMARY
In this unit we introduced several features of assemblers. The important topics covered in this unit include:

Design of an assembler
Concepts and design of a macro processor
Several loader schemes and implementation of an absolute loader

To understand this unit, one is required to understand a bit of microprocessors and their assembly language programming.

2.8 MODEL ANSWERS


Check Your Progress 1
1. The important advantages are:
i. It increases the programmer's productivity.
ii. It supports machine independence. A programmer has to be concerned more about the problem than the inner details of the machine on which a program has to be developed.
iii. It also supports portability. A program written in a high level language is not dependent upon the architecture of a machine.

2. Because both linkers and loaders change software from one form to another. A linker takes as input independently translated programs whose original source-language representations include symbolic references to each other. Its output is a single program in which all these symbolic references have been resolved. The loader takes a program produced by an assembler or compiler as input and produces its output in executable form by assigning physical memory addresses to instructions.

Check Your Progress 2


1. The pass-wise tasks performed in a two-pass assembler are:

Pass I
Separate the symbol, mnemonic op-code and operand fields.
Determine the storage requirement for every assembly language statement and update the location counter.
Build the symbol table.

Pass II
Generate object code.

2. No model answer.

3. The assembler translates an assembly language program and produces an object program for loaders and linkers. The object program includes identification of relative addresses and of external references, which are further resolved by a linker.

Check Your Progress 3


1. The important functions of a loader are as follows:
i. Allocation of memory space for programs.
ii. Linking of a program with externally defined routines like library software.
iii. Relocation, which means adjustment of all address dependent locations to correspond to the allocated space.
iv. Physically loading the machine instructions and data into memory.

2. There are several types of loader schemes:
i. Absolute loader
ii. Relocating loader
iii. Direct linking loader
iv. Dynamic loader
v. Dynamic linker

Dynamic Linking Loader: A major disadvantage of the other loading schemes (absolute loader, relocating loader, direct linking loader) is that if any library routine is referenced but never executed, the loader will still incur the overhead of linking the subroutine. Furthermore, all of these schemes require the programmer to explicitly specify the names of all procedures that might be called. Dynamic linking is a very general type of loading scheme which postpones loading and linking of external references until execution time. The first advantage here is that no overhead is incurred unless the procedure to be called or referenced is actually used. The second advantage is that the system can be dynamically reconfigured. The disadvantage of this scheme is the considerable overhead and complexity that result from postponing most of the binding process until execution time.

2.9 FURTHER READINGS


1. Systems Programming by John J. Donovan, McGraw-Hill International Book Company.
2. Introduction to System Software by D.M. Dhamdhere, Tata McGraw-Hill.
