
July 2011 MC0073 System Programming 4 Credits

(Book ID: B0811)

Assignment Set 1 (60 Marks)


Answer all questions. Each question carries TEN marks.

1) Explain the following:

A) Lexical Analysis Ans:

The lexical analyzer is the interface between the source program and the compiler. The lexical analyzer reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens. Each token represents a sequence of characters that can be treated as a single logical entity. Identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses are typical tokens. There are two kinds of tokens: specific strings, such as IF or a semicolon, and classes of strings, such as identifiers, constants, or labels.
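A minimal scanner sketch in Python illustrates this carving of characters into classified tokens; the token classes, patterns, and sample input are illustrative assumptions, not any particular compiler's definitions:

    import re

    # Each pair is (token class, pattern); keywords are tried before the
    # more general identifier pattern so that IF is not scanned as an id.
    TOKEN_SPEC = [
        ("KEYWORD",    r"\b(IF|THEN|ELSE)\b"),
        ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
        ("CONSTANT",   r"\d+"),
        ("OPERATOR",   r"[+\-*/=<>]"),
        ("PUNCT",      r"[(),;]"),
        ("SKIP",       r"\s+"),        # whitespace is discarded, not emitted
    ]

    def tokenize(source):
        """Read the source one position at a time, carving it into tokens."""
        tokens, pos = [], 0
        while pos < len(source):
            for name, pattern in TOKEN_SPEC:
                match = re.match(pattern, source[pos:])
                if match:
                    if name != "SKIP":
                        tokens.append((name, match.group()))
                    pos += match.end()
                    break
            else:
                raise SyntaxError("illegal character at position %d" % pos)
        return tokens

    print(tokenize("IF (count = 10) THEN total;"))
    # [('KEYWORD', 'IF'), ('PUNCT', '('), ('IDENTIFIER', 'count'), ...]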
B) Syntax Analysis Ans:

The parser has two functions. It checks that the tokens appearing in its input, which is the output of the lexical analyzer, occur in patterns that are permitted by the specification for the source language. It also imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler. The second aspect of syntax analysis is to make explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together.
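As a sketch of the tree-like structure the parser imposes, consider the statement x3 = y + 3 (revisited in Question 4 below): the tokens y, +, and 3 are grouped into an expression subtree, which is then attached to the assignment node. Represented as nested Python tuples (an illustrative lightweight encoding, not a prescribed format):

    # Each node is (label, children...); the grouping makes the hierarchy
    # explicit: the addition is a subtree of the assignment.
    parse_tree = ("assign",
                  ("id", "x3"),
                  ("add",
                      ("id", "y"),
                      ("number", 3)))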

2. What is RISC and how is it different from CISC? Ans:

CISC: A Complex Instruction Set Computer (CISC) supplies a large number of complex instructions at the assembly language level. Assembly language is a low-level computer programming language in which each statement corresponds to a single machine instruction. CISC instructions facilitate the extensive manipulation of low-level computational elements and events such as memory, binary arithmetic, and addressing. The goal of the CISC architectural philosophy is to make microprocessors easy and flexible to program and to provide for more efficient memory use. The CISC philosophy was unquestioned during the 1960s, when early computing machines such as the popular Digital Equipment Corporation PDP-11 family of minicomputers were being programmed in assembly language and memory was slow and expensive. CISC machines merely used the then-available technologies to optimize computer performance.

Their advantages included the following:
(1) A new processor design could incorporate the instruction set of its predecessor as a subset of an ever-growing language; there was no need to reinvent the wheel, code-wise, with each design cycle.
(2) Fewer instructions were needed to implement a particular computing task, which led to lower memory use for program storage and fewer time-consuming instruction fetches from memory.
(3) Simpler compilers sufficed, as complex CISC instructions could be written that closely resembled the instructions of high-level languages. In effect, CISC made a computer's assembly language more like a high-level language to begin with, leaving the compiler less to do.

Some disadvantages of the CISC design philosophy are as follows:
(1) The first advantage listed above could be viewed as a disadvantage: the incorporation of older instruction sets into new generations of processors tended to force growing complexity.
(2) Many specialized CISC instructions were not used frequently enough to justify their existence. The existence of each instruction needed to be justified because each one requires the storage of more microcode in the central processing unit (the final and lowest layer of code translation), which must be built in at some cost.
(3) Because each CISC command must be translated by the processor into tens or even hundreds of lines of microcode, it tends to run slower than an equivalent series of simpler commands that do not require so much translation. All translation requires time.
(4) Because a CISC machine builds complexity into the processor, where all its various commands must be translated into microcode for actual execution, the design of CISC hardware is more difficult and the CISC design cycle correspondingly long; this means delay in getting to market with a new chip.

The terms CISC and RISC (Reduced Instruction Set Computer) were coined at this time to reflect the widening split in computer-architectural philosophy.

RISC: The Reduced Instruction Set Computer, or RISC, is a microprocessor CPU design philosophy that favors a simpler set of instructions that all take about the same amount of time to execute. The most common RISC microprocessors are AVR, PIC, ARM, DEC Alpha, PA-RISC, SPARC, MIPS, and IBM's PowerPC.

RISC characteristics:
- Small number of machine instructions: fewer than 150
- Small number of addressing modes: fewer than 4
- Small number of instruction formats: fewer than 4
- Instructions of the same length: 32 bits (or 64 bits)
- Single-cycle execution
- Load/store architecture
- Large number of GPRs (General Purpose Registers): more than 32
- Hardwired control
- Support for HLL (High Level Language)

RISC and x86: However, despite many successes, RISC has made few inroads into the desktop PC and commodity server markets, where Intel's x86 platform remains the dominant processor architecture (Intel is facing increased competition from AMD, but even AMD's processors implement the x86 platform, or a 64-bit superset known as x86-64). There are two main reasons for this. One, the very large base of proprietary PC applications is written for x86, whereas no RISC platform has a similar installed base, and this meant PC users were locked into the x86. The second is that, although RISC was indeed able to scale up in performance quite quickly and cheaply, Intel took advantage of its large market by spending vast amounts of money on processor development. Intel could spend many times as much as any RISC manufacturer on improving low-level design and manufacturing. The same could not be said about smaller firms like Cyrix and NexGen, but they realized that they could apply pipelined design philosophies and practices to the x86 architecture, either directly as in the 686 and MII series, or indirectly (via extra decoding stages) as in the Nx586 and AMD K5. Later, more powerful processors such as the Intel P6 and AMD K6 had similar RISC-like units that executed a stream of micro-operations generated from decoding stages that split most x86 instructions into several pieces. Today, these principles have been further refined and are used by modern x86 processors such as the Intel Core 2 and AMD K8. The first available chip deploying such techniques was the NexGen Nx586, released in 1994 (while the AMD K5 was severely delayed and released in 1995). As of 2007, the x86 designs (whether Intel's or AMD's) are as fast as (if not faster than) the fastest true RISC single-chip solutions available.

RISC vs CISC:

    CISC                                               RISC
    Emphasis on hardware                               Emphasis on software
    Includes multi-clock complex instructions          Single-clock, reduced instructions only
    Memory-to-memory: "LOAD" and "STORE"               Register-to-register: "LOAD" and "STORE"
    incorporated in instructions                       are independent instructions
    Small code sizes, high cycles per second           Low cycles per second, large code sizes
    Transistors used for storing complex instructions  Spends more transistors on memory registers

3. Explain the following with respect to the design specifications of an Assembler: A) Data Structures Ans:

The second step in our design procedure is to establish the databases that we have to work with.

Pass 1 Data Structures:
1. Input source program.
2. A Location Counter (LC), used to keep track of each instruction's location.
3. A table, the Machine-Operation Table (MOT), that indicates, for each instruction, the symbolic mnemonic and its length (two, four, or six bytes).
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and the action to be taken for each pseudo-op in pass 1.
5. A table, the Symbol Table (ST), used to store each label and its corresponding value.
6. A table, the Literal Table (LT), used to store each literal encountered and its corresponding assignment location.
7. A copy of the input to be used by pass 2.

Pass 2 Data Structures:
1. Copy of the source program input to pass 1.
2. Location Counter (LC).
3. A table, the Machine-Operation Table (MOT), that indicates, for each instruction, the symbolic mnemonic, length (two, four, or six bytes), binary machine opcode, and instruction format.
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and the action to be taken for each pseudo-op in pass 2.
5. A table, the Symbol Table (ST), prepared by pass 1, containing each label and its corresponding value.
6. A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.
7. A work space, INST, used to hold each instruction as its various parts are assembled together.
8. A work space, PRINT LINE, used to produce a printed listing.
9. A work space, PUNCH CARD, used prior to actual output for converting the assembled instructions into the format needed by the loader.
10. An output deck of assembled instructions in the format needed by the loader.

Format of Data Structures

The third step in our design procedure is to specify the format and content of each of the data structures. Pass 2 requires a machine-operation table (MOT) containing the name, length, binary code and format; pass 1 requires only the name and length. Instead of using two different tables, we construct a single MOT. The machine-operation table (MOT) and pseudo-operation table (POT) are examples of fixed tables: their contents are not filled in or altered during the assembly process. The following table depicts the format of the machine-op table (MOT), at 6 bytes per entry:

    Mnemonic opcode   Binary opcode   Instruction length   Instruction format   Not used here
    (4 bytes,         (1 byte,        (2 bits,             (3 bits,             (3 bits)
    characters)       hexadecimal)    binary)              binary)

    "Abbb"            5A              10                   001
    "AHbb"            4A              10                   001
    "ALbb"            5E              10                   001
    "ALRb"            1E              01                   000
    ...

(b represents the blank character)
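A sketch of how the MOT and symbol table might be represented in a high-level language, using Python for illustration (the field names and the halfword interpretation of the 2-bit length field are assumptions based on the table above, not a prescribed layout):

    from collections import namedtuple

    # One fixed MOT entry per mnemonic, mirroring the 6-byte format above:
    # binary opcode, instruction length (in halfwords, per the 2-bit field),
    # and the 3-bit instruction-format code.
    MOTEntry = namedtuple("MOTEntry", "binary_opcode length_halfwords format_code")

    MOT = {
        "A":   MOTEntry(0x5A, 0b10, 0b001),   # Add
        "AH":  MOTEntry(0x4A, 0b10, 0b001),   # Add Halfword
        "AL":  MOTEntry(0x5E, 0b10, 0b001),   # Add Logical
        "ALR": MOTEntry(0x1E, 0b01, 0b000),   # Add Logical Register
    }

    symbol_table = {}   # label -> value (address), filled in during pass 1

    def process_label(label, location_counter):
        """Pass 1: a label's value is the current location counter."""
        symbol_table[label] = location_counter

    entry = MOT["A"]
    print(entry.length_halfwords * 2)   # instruction length in bytes: 4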

B) Pass 1 & Pass 2 Assembler flow chart Ans:

Pass Structure of Assemblers

This section discusses two-pass and single-pass assembly schemes.

Two-pass translation: Two-pass translation of an assembly language program can handle forward references easily. LC processing is performed in the first pass, and symbols defined in the program are entered into the symbol table. The second pass synthesizes the target form using the address information found in the symbol table. In effect, the first pass performs analysis of the source program while the second pass performs synthesis of the target program. The first pass constructs an intermediate representation (IR) of the source program for use by the second pass. This representation consists of two main components: data structures, e.g. the symbol table, and a processed form of the source program. The latter component is called intermediate code (IC).

Single-pass translation: LC processing and construction of the symbol table proceed as in two-pass translation. The problem of forward references is tackled using a process called backpatching.

The operand field of an instruction containing a forward reference is left blank initially. The address of the forward-referenced symbol is put into this field when its definition is encountered.
Look at the following instructions:

            START   101
            READ    N               101)  + 09 0 113
            MOVER   BREG, ONE       102)  + 04 2 115
            MOVEM   BREG, TERM      103)  + 05 2 116
    AGAIN   MULT    BREG, TERM      104)  + 03 2 116
            MOVER   CREG, TERM      105)  + 04 3 116
            ADD     CREG, ONE       106)  + 01 3 115
            MOVEM   CREG, TERM      107)  + 05 3 116
            COMP    CREG, N         108)  + 06 3 113
            BC      LE, AGAIN       109)  + 07 2 104
            MOVEM   BREG, RESULT    110)  + 05 2 114
            PRINT   RESULT          111)  + 10 0 114
            STOP                    112)  + 00 0 000
    N       DS      1               113)
    RESULT  DS      1               114)
    ONE     DC      '1'             115)  + 00 0 001
    TERM    DS      1               116)
            END

In the above program, the instruction corresponding to the statement MOVER BREG, ONE can be only partially synthesized, since ONE is a forward reference. Hence the instruction opcode and address of BREG will be assembled to reside in location 101. The need for inserting the second operand's address at a later stage can be indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair (<instruction address>, <symbol>), e.g. (101, ONE) in this case. By the time the END statement is processed, the symbol table would contain the addresses of all symbols defined in the source program, and TII would contain information describing all forward references. The assembler can now process each entry in TII to complete the concerned instruction. For example, the entry (101, ONE) would be processed by obtaining the address of ONE from the symbol table and inserting it in the operand address field of the instruction with assembled address 101. Alternatively, entries in TII can be processed in an incremental manner: when the definition of some symbol symb is encountered, all forward references to symb can be processed.
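A compact sketch of this TII-based backpatching in Python (the instruction representation, an [opcode, register, address] triple, is a simplifying assumption rather than the book's exact format):

    symbol_table = {}   # symbol -> address, filled as definitions are seen
    tii = []            # Table of Incomplete Instructions: (instr_addr, symbol)
    code = {}           # instr_addr -> [opcode, register, operand_address]

    def assemble_ref(instr_addr, opcode, reg, symbol):
        """Emit an instruction; leave the operand blank on a forward reference."""
        if symbol in symbol_table:
            code[instr_addr] = [opcode, reg, symbol_table[symbol]]
        else:
            code[instr_addr] = [opcode, reg, None]     # blank operand field
            tii.append((instr_addr, symbol))           # e.g. (101, "ONE")

    def define_symbol(symbol, address):
        """On a definition, record the address and patch pending references."""
        symbol_table[symbol] = address
        for instr_addr, sym in [e for e in tii if e[1] == symbol]:
            code[instr_addr][2] = address              # backpatch operand field
            tii.remove((instr_addr, sym))

    assemble_ref(101, "04", 2, "ONE")   # forward reference: operand left blank
    define_symbol("ONE", 115)           # definition encountered; 101 is patched
    print(code[101])                    # ['04', 2, 115]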

Design of a Two-Pass Assembler

Tasks performed by the passes of a two-pass assembler are as follows:

Pass I:
1. Separate the symbol, mnemonic opcode and operand fields.
2. Build the symbol table.
3. Perform LC processing.
4. Construct the intermediate representation.

(A minimal code sketch of these tasks follows the description of Pass II below.)

Pass II: Synthesize the target program. Pass I performs analysis of the source program and synthesis of the intermediate representation while Pass II processes the intermediate representation to synthesize the target program. The design details of assembler passes are discussed after introducing advanced assembler directives and their influence on LC processing.
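A minimal sketch of the four Pass I tasks in Python (the statement format, the one-word instruction length, and the field conventions are simplifying assumptions):

    def pass_one(source_lines, start_lc=0):
        """Separate fields, build the symbol table, perform LC processing,
        and emit a simple intermediate representation for Pass II."""
        symbol_table, intermediate_code = {}, []
        lc = start_lc
        for line in source_lines:
            fields = line.split()
            # Task 1: separate the label, mnemonic, and operand fields
            # (a label is assumed to start in column one).
            label = fields.pop(0) if line[0] != " " else None
            mnemonic, operands = fields[0], fields[1:]
            # Task 2: a label is defined with the current LC value.
            if label:
                symbol_table[label] = lc
            # Task 4: record a processed form of the statement for Pass II.
            intermediate_code.append((lc, mnemonic, operands))
            # Task 3: LC processing (assume one word per statement here).
            lc += 1
        return symbol_table, intermediate_code

    st, ic = pass_one(["        MOVER BREG, ONE",
                       "AGAIN   MULT BREG, TERM"], start_lc=102)
    print(st)   # {'AGAIN': 103}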
4. Define the following: A) Parsing

B) Scanning
C)Token

Ans: Parsing:

Parsing transforms input text or a string into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Lexical analysis creates tokens from a sequence of input characters, and it is these tokens that are processed by a parser to build a data structure such as a parse tree or an abstract syntax tree. Conceptually, the parser accepts a sequence of tokens and produces a parse tree. In practice this might not occur:
1. The source program might have errors. Shamefully, we will do very little error handling.
2. Real compilers produce (abstract) syntax trees, not parse trees (concrete syntax trees). We don't do this, for the pedagogical reasons given previously.

There are three classes of grammar-based parsers:
1. Universal
2. Top-down
3. Bottom-up

The universal parsers are not used in practice as they are inefficient; we will not discuss them. As expected, top-down parsers start from the root of the tree and proceed downward, whereas bottom-up parsers start from the leaves and proceed upward. The commonly used top-down and bottom-up parsers are not universal; that is, there are (context-free) grammars that cannot be used with them.

The LL and LR parsers are important in practice. Hand-written parsers are often LL; specifically, the predictive parsers we looked at in chapter two are for LL grammars. The LR grammars form a larger class. Parsers for this class are usually constructed with the aid of automatic tools.
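A hand-written LL parser is typically a recursive-descent parser: one function per nonterminal, choosing a production with one token of lookahead. A minimal sketch in Python for the grammar E -> T ('+' T)* and T -> number | '(' E ')' (the grammar and token format are illustrative choices):

    def parse_expr(tokens, pos=0):
        """E -> T ('+' T)* : returns (subtree, next position)."""
        left, pos = parse_term(tokens, pos)
        while pos < len(tokens) and tokens[pos] == "+":
            right, pos = parse_term(tokens, pos + 1)
            left = ("add", left, right)    # left-associative grouping
        return left, pos

    def parse_term(tokens, pos):
        """T -> number | '(' E ')' : one-token lookahead picks the production."""
        if tokens[pos] == "(":
            tree, pos = parse_expr(tokens, pos + 1)
            assert tokens[pos] == ")", "expected ')'"
            return tree, pos + 1
        return ("number", int(tokens[pos])), pos + 1

    tree, _ = parse_expr(["(", "1", "+", "2", ")", "+", "3"])
    print(tree)
    # ('add', ('add', ('number', 1), ('number', 2)), ('number', 3))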

Scanning and token: There are three phases of analysis, with the output of one phase the input of the next. Each of these phases changes the representation of the program being compiled. The phases are called lexical analysis or scanning, which transforms the program from a string of characters to a string of tokens; syntax analysis or parsing, which transforms the program into some kind of syntax tree; and semantic analysis, which decorates the tree with semantic information. The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following C statements

    x3 = y + 3;
    x3 = y + 3 ;
    x3 = y+ 3 ;

but not

    x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3, and ;. A token is a <token-name, attribute-value> pair. [Figure 1.0: hierarchical decomposition of the above statement into tokens]

For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to gather information about the identifiers and to pass this information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The number 3 is mapped to <number, something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. On the other hand, there can be a difference between how this 3 should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table entry where this kind of 3 is stored; another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.

Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. That does not mean the blanks are unnecessary: consider int x; versus intx;. The blank between int and x is clearly necessary, but it does not become part of any token. Blanks inside strings are an exception: they are part of the token (or, more likely, of the table entry pointed to by the second component of the token). Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below). Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree.
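A sketch of this lexeme-to-token mapping with a symbol table, in Python (the table layout and 1-based indexing are illustrative assumptions chosen to match the <id,1>, <id,2> examples above):

    import re

    symbol_table = []   # entry i-1 holds information about identifier <id,i>

    def intern(lexeme):
        """Return the (1-based) symbol-table index of an identifier,
        creating a new entry the first time the lexeme is seen."""
        for i, entry in enumerate(symbol_table, start=1):
            if entry["lexeme"] == lexeme:
                return i
        symbol_table.append({"lexeme": lexeme})
        return len(symbol_table)

    def tokens_for(statement):
        """Map each lexeme of the statement to a <token-name, attribute> pair."""
        out = []
        for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", statement):
            if lexeme[0].isdigit():
                out.append(("number", int(lexeme)))   # <number, value>
            elif lexeme in ("=", "+", ";"):
                out.append((lexeme, None))            # one-of-a-kind symbols
            else:
                out.append(("id", intern(lexeme)))    # <id, symbol-table index>
        return out

    print(tokens_for("x3 = y + 3;"))
    # [('id', 1), ('=', None), ('id', 2), ('+', None), ('number', 3), (';', None)]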

5. Describe the process of Bootstrapping in the context of Linkers Ans:

In computing, bootstrapping refers to a process where a simple system activates another, more complicated system that serves the same purpose. It is a solution to the chicken-and-egg problem of starting a certain system without the system already functioning. The term is most often applied to the process of starting up a computer, in which a mechanism is needed to execute the software program that is responsible for executing software programs (the operating system).

Bootstrap loading: The discussions of loading up to this point have all presumed that there's already an operating system, or at least a program loader, resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is how the first program is loaded into the computer. In modern computers, the first program the computer runs after a hardware reset invariably is stored in a ROM known as the bootstrap ROM, as in "pulling one's self up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system's address space. The bootstrap ROM occupies the top 64K of the address space, and ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk, or, if that fails, the first block of the first hard disk, into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program, which in turn loads in the operating system and starts it. (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.)

Why not just load the operating system directly? Because you can't fit an operating system loader into 512 bytes. The first-level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk. The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it's tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user-mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs. None of this matters much to the application-level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example); others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

Software Bootstrapping & Compiler Bootstrapping: Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g. ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler for a higher-level language, and so on,

until one can have a graphical IDE and an extremely high-level programming language.

Compiler Bootstrapping: In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include gcc, GHC, OCaml, BASIC, PL/I and, more recently, the Mono C# compiler.
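One common consequence of self-hosting is the multi-stage bootstrap check (used, for instance, in GCC's three-stage build): compile the compiler with an existing binary, recompile it with the result, and verify that further recompilation changes nothing. A sketch of that check in Python, where the compiler paths and command-line flags are hypothetical assumptions:

    import filecmp
    import subprocess

    def build(compiler, source_dir, output):
        """Invoke `compiler` on the compiler's own sources (hypothetical CLI)."""
        subprocess.run([compiler, "--build-dir", source_dir, "-o", output],
                       check=True)

    # Stage 1: an existing compiler (an older release, or a cross-compiler)
    # builds the new compiler from source.
    build("/usr/bin/oldcc", "compiler-src/", "stage1-cc")

    # Stage 2: the freshly built compiler rebuilds itself.
    build("./stage1-cc", "compiler-src/", "stage2-cc")

    # Stage 3: the self-built compiler builds itself once more. A healthy
    # bootstrap reaches a fixed point: stage 2 and stage 3 are identical.
    build("./stage2-cc", "compiler-src/", "stage3-cc")
    assert filecmp.cmp("stage2-cc", "stage3-cc", shallow=False)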
6. Describe the procedure for design of a Linker. Ans:

Design of a Linker

Relocation and linking requirements in segmented addressing: The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of a segmented addressing structure reduces the relocation requirements of a program.

Implementation example: a linker for MS-DOS. Consider a program written in the assembly language of the Intel 8088. The ASSUME statement declares the segment registers CS and DS to be available for memory addressing. Hence all memory addressing is performed by using suitable displacements from their contents. The translation-time address of A is 0196. In statement 16, a reference to A is assembled as a displacement of 196 from the contents of the CS register. This avoids the use of an absolute address, hence the instruction is not address-sensitive. Now no relocation is needed if segment SAMPLE is to be loaded with address 2000 by a calling program (or by the OS). The effective operand address would be calculated as <CS> + 0196, which is the correct address 2196. A similar situation exists with the reference to B in statement 17. The reference to B is assembled as a displacement of 0002 from the contents of the DS register. Since the DS register would be loaded with the execution-time address of DATA_HERE, the reference to B would be automatically relocated to the correct address.

Though use of segment registers reduces the relocation requirements, it does not completely eliminate the need for relocation. Consider statement 14,

    MOV AX, DATA_HERE

which loads the segment base of DATA_HERE into the AX register preparatory to its transfer into the DS register. Since the assembler knows DATA_HERE to be a segment, it makes provision to load the higher-order 16 bits of the address of DATA_HERE into the AX register. However, it does not know the link-time address of DATA_HERE, hence it assembles the MOV instruction in the immediate operand format and puts zeroes in the operand field. It also makes an entry for this instruction in RELOCTAB so that the linker will put the appropriate address in the operand field. Inter-segment calls and jumps are handled in a similar way.

Relocation is somewhat more involved in the case of intra-segment jumps assembled in the FAR format. For example, consider the following program:

    FAR_LAB  EQU  THIS FAR    ; FAR_LAB is a FAR label
             JMP  FAR_LAB     ; A FAR jump

Here the displacement and the segment base of FAR_LAB are to be put in the JMP instruction itself. The assembler puts the displacement of FAR_LAB in the first two operand bytes of the instruction, and makes a RELOCTAB entry for the third and fourth operand bytes, which are to hold the segment base address. A statement like ADDR_A DW OFFSET A (which defines an address constant) does not need any relocation, since the assembler can itself put the required offset in the bytes. In summary, the only RELOCTAB entries that must exist for a program using segmented memory addressing are for the bytes that contain a segment base address. For linking, however, both the segment base address and the offset of the external symbol must be computed by the linker. Hence there is no reduction in the linking requirements.
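A sketch of how a linker might process RELOCTAB entries of this kind, in Python (the object-code layout and the two-byte patching are simplifying assumptions; real MS-DOS object modules use a richer record format):

    def relocate(code, reloctab, segment_base):
        """Patch each RELOCTAB entry: write the load-time segment base into
        the two operand bytes that the assembler filled with zeroes."""
        for offset in reloctab:
            code[offset]     = segment_base & 0xFF          # low byte first
            code[offset + 1] = (segment_base >> 8) & 0xFF   # then high byte
        return code

    # MOV AX, imm16 assembled with zeroes in its operand field (bytes 1-2)
    # and a RELOCTAB entry pointing at those bytes (B8 = MOV AX, immediate).
    code = bytearray([0xB8, 0x00, 0x00])
    reloctab = [1]
    relocate(code, reloctab, 0x2000 >> 4)   # segment (paragraph) number of 2000h
    print(code.hex())                       # 'b80002': operand is now 0200h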
