
Compiler Design

VISUE

Q.1 What are the challenges of compiler design?
Ans. Compiler writing is not an easy job; it is very challenging and requires knowledge of many fields of computer science. Some challenges of compiler design:
1) Many variations:
- Many programming languages (FORTRAN, C++, Java)
- Many programming paradigms (e.g. object-oriented, functional)
- Many computer architectures (e.g. MIPS, SPARC, Intel, Alpha)
- Many operating systems (e.g. Linux, Solaris, Windows)
2) Qualities of a good compiler:
- The compiler itself must be bug-free.
- It must generate correct machine code.
- The generated machine code must run fast.
- The compiler itself must be portable.
- It must print good diagnostic and error messages.
- The generated code must work well with existing debuggers.
- It must have consistent and predictable optimization.
3) Building a compiler requires knowledge of:
- Programming languages
- Theory of automata and context-free languages
- Algorithms and data structures
- Computer architecture
- Software engineering

Q.2 Explain static and dynamic checking.


Ans. A compiler must check that the source program follows both the syntactic and semantic conventions of the source language. Checking done by the compiler is called static checking, while checking done during the execution of the target program is called dynamic checking. In principle any check can be done dynamically, if the target code carries the type of an element along with the value of the element. A sound type system eliminates the need for dynamic checking of type errors, because it guarantees that type errors cannot occur when the target program runs. A language is strongly typed if its compiler can guarantee that the programs it accepts will execute without type errors. Some checks can be done only dynamically. For example, given the declarations
table: array [0..255] of char; i: integer
and the expression table[i], a compiler cannot in general guarantee that during execution the value of i will lie in the range 0 to 255.
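The array-index example above is the classic case of a purely dynamic check. A minimal sketch in Python (the helper name `checked_index` is illustrative, not from the original):

```python
def checked_index(table, i):
    # Dynamic check: the compiler cannot prove at compile time that
    # i lies in 0..len(table)-1, so the test must run at execution time.
    if not (0 <= i < len(table)):
        raise IndexError(f"index {i} out of range 0..{len(table) - 1}")
    return table[i]

table = ["a"] * 256          # models: table: array [0..255] of char
print(checked_index(table, 10))    # in range: returns the element
```

A statically typed language can still need this check, because the value of i is only known at run time.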

Q.3 What is an activation record? Draw one activation record with the fields those are generally present in many languages.
Ans. Each execution of a procedure is referred to as an activation of the procedure. If a procedure is non-recursive then there exists only one activation of the procedure at a time, whereas if a procedure is recursive, several activations of that procedure may be alive at the same time. The information needed by a single execution, a single activation, of a procedure is managed using a contiguous block of storage called an activation record or activation frame, consisting of a collection of fields. The activation record contains the following information:
1. Temporary values used during expression evaluation
2. Local data of the procedure
3. Saved machine status information (PC, registers, return address)
4. Access link for access to non-local names


5. The actual parameters
6. The returned value, used by the called procedure to return a value to the calling procedure
7. (Optional) Control link, which points to the activation record of the caller

A typical activation record layout:

    Return value
    Actual parameters
    Control link
    Access link
    Saved machine status
    Local data
    Temporaries
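The fields above can be sketched as a simple record type; this is an illustrative model only (field names are assumptions), not a real run-time layout:

```python
from dataclasses import dataclass, field

@dataclass
class ActivationRecord:
    # Fields in the order listed above (names are illustrative)
    return_value: object = None
    actual_params: list = field(default_factory=list)
    control_link: "ActivationRecord" = None   # caller's record
    access_link: "ActivationRecord" = None    # enclosing scope's record
    machine_status: dict = field(default_factory=dict)  # saved PC, registers
    local_data: dict = field(default_factory=dict)
    temporaries: dict = field(default_factory=dict)

# A call pushes a new record whose control link points to the caller's.
main_ar = ActivationRecord()
callee_ar = ActivationRecord(actual_params=[2, 3], control_link=main_ar)
```

On return, the callee's record is popped and the control link tells the run-time system which record becomes current again.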

Q.4 What is the importance of the symbol table in compiler design?

Ans. Symbol Table: A symbol table is a data structure that associates a record with each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.
Importance: The symbol table must have an efficient mechanism for accessing the information held in the table as well as for adding new entries. The symbol table is a useful abstraction that helps the compiler ascertain and verify the semantics, or meaning, of a piece of code. Its two most important functions in a compiler are: first, it caches useful information about the various symbols in the source program for later use during code generation; second, it supports the type-checking mechanism that determines the semantic correctness of a program.
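A minimal hash-based sketch of such a structure (the class and attribute names are illustrative assumptions):

```python
class SymbolTable:
    """Hash-based symbol table: insert and lookup are O(1) on average."""
    def __init__(self):
        self._entries = {}

    def insert(self, name, **attrs):
        # attrs carries the identifier's attributes, e.g. type, scope, offset
        self._entries[name] = attrs

    def lookup(self, name):
        # returns the attribute record, or None if the name is undeclared
        return self._entries.get(name)

st = SymbolTable()
st.insert("count", type="int", scope=0, offset=4)
entry = st.lookup("count")
```

A real compiler would layer scopes (a stack of such tables) on top of this, but the insert/lookup interface stays the same.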

Q.5 List and Explain the error recovery strategies.


Ans. There are four different strategies used for error recovery, listed below:
1. Panic mode
2. Phrase level
3. Error productions
4. Global correction

Panic Mode Recovery: This is the simplest strategy a parser can employ to recover from a syntax error, and it can be used by most parsing methods. On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as a semicolon or end, whose role in the source program is clear. The compiler designer must select the synchronizing tokens appropriate for the source language. While panic mode correction often skips a considerable amount of input without checking it for additional errors, it has the advantages of simplicity and, unlike some other methods considered later, it is guaranteed not to go into an infinite loop. In situations where multiple errors in the same statement are rare, this method may be quite adequate.
Phrase Level Recovery: On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. A typical local correction would be to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon. We must be careful to choose replacements that do not lead to infinite loops, as would be the case, for example, if we always inserted something on the input ahead of the current input symbol. This type of replacement can correct any input string and has been used in several error-repairing compilers. The method was first used with top-down parsing. Its major drawback is the difficulty it has in coping with situations in which the actual error occurred before the point of detection.
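Panic mode can be sketched in a few lines; this is a simplified model over a token list (the function name and token set are assumptions for illustration):

```python
def panic_mode_recover(tokens, pos, sync={";", "end"}):
    """On an error at tokens[pos], discard input symbols one at a time
    until a synchronizing token (a delimiter) is found; return the
    position just past it, where parsing can resume."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return pos + 1 if pos < len(tokens) else pos

tokens = ["x", "=", "@", "@", ";", "y", "=", "2", ";"]
resume = panic_mode_recover(tokens, 2)   # error detected at the first "@"
# resume == 5, i.e. parsing continues at "y"
```

Note how everything between the error point and the semicolon is skipped without being checked for further errors, exactly the trade-off described above.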


Q.6 What do you mean by syntax directed definitions and translation schemes?
Ans. Syntax Directed Definitions: A syntax-directed definition is a generalization of a grammar in which each grammar symbol has an associated set of attributes, partitioned into two subsets called the synthesized and inherited attributes of that grammar symbol. In a syntax-directed definition, each grammar production A -> α has associated with it a set of semantic rules of the form b := f(c1, c2, ..., ck), where f is a function and either:
1. b is a synthesized attribute of A and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production, or
2. b is an inherited attribute of one of the grammar symbols on the right side of the production and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production.
Terminals are assumed to have synthesized attributes only, as the definition does not provide any semantic rules for terminals.
Equivalently, a syntax-directed definition generalizes a context-free grammar by associating a set of attributes with each node in a parse tree. Each attribute gives some information about the node; for example, an attribute associated with an expression node may give its value, its type, or its location in memory. As an example, suppose the grammar contains the production X -> YZ, so a node for X in a parse tree has nodes for Y and Z as children, and further suppose the nodes X, Y and Z have associated attributes X.a, Y.a and Z.a respectively. If the semantic rule X.a := Y.a + Z.a is associated with the production X -> YZ, then the parser adds the a attributes of nodes Y and Z together and sets the a attribute of node X to their sum.
Translation Scheme: In a syntax-directed translation scheme we embed the semantic rules into the grammar itself, and each semantic rule can only use information computed by already executed semantic rules. Compared with syntax-directed definitions, definitions describe only the relationships among attributes associated with grammar symbols, whereas a translation scheme also indicates the order in which the rules are to be evaluated.
A translation scheme is a convenient way of describing an L-attributed definition. Ex: assume the grammar has a production A -> XY, and further assume that A, X and Y have inherited attributes A.i, X.i and Y.i and synthesized attributes A.s, X.s and Y.s respectively. Because we have an L-attributed definition:
X.i can only be a function of A.i, that is X.i = f(A.i)
Y.i can only be a function of A.i, X.i and X.s, that is Y.i = g(A.i, X.i, X.s)
A.s is a function of A.i, X.i, X.s, Y.i and Y.s, that is A.s = h(A.i, X.i, X.s, Y.i, Y.s)
A translation scheme would embed these rules in the production A -> XY as follows:
A -> {X.i := f(A.i);} X {Y.i := g(A.i, X.i, X.s);} Y {A.s := h(A.i, X.i, X.s, Y.i, Y.s);}
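The rule X.a := Y.a + Z.a from the example above can be evaluated by a single bottom-up walk of the parse tree. A minimal sketch (tree encoding and function name are assumptions):

```python
def synthesize(node):
    """Bottom-up evaluation of a synthesized attribute 'a':
    an interior node for X -> Y Z is encoded as the pair (Y, Z) and
    gets X.a := Y.a + Z.a; a leaf carries its attribute value directly."""
    if isinstance(node, tuple):
        y, z = node
        return synthesize(y) + synthesize(z)
    return node                      # leaf: its value is its attribute

# parse tree for X -> Y Z with Y.a = 3 and Z.a = 4
print(synthesize((3, 4)))            # 7
```

Because the attribute is purely synthesized, the evaluation order is simply post-order, which is why S-attributed definitions fit bottom-up parsing so naturally.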

Q.7 Discuss the principle sources of optimization.


Ans. A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise it is called global. Local transformations are usually performed first.
1. Function-Preserving Transformations: Common subexpression elimination, copy propagation, dead-code elimination and constant folding are common examples of such function-preserving transformations.
2. Common Subexpressions: An occurrence of an expression E is called a common subexpression if E was previously computed and the values of the variables in E have not changed since the previous computation. We can avoid recomputing the expression by using the previously computed value.
3. Copy Propagation: When the common subexpression in c := d + e is eliminated, the algorithm uses a new variable t to hold the value of d + e. Since control may reach c := d + e either after the assignment to a or after the assignment to b, it would be incorrect to replace c := d + e by either c := a or c := b.
4. Dead-Code Elimination: A variable is live at a point in a program if its value can be used subsequently; otherwise it is dead at that point. Dead or useless code consists of statements that compute values that never get used.


While the programmer is unlikely to introduce any dead code intentionally, it may appear as the result of previous transformations. Loop optimizations include code motion, which moves code outside a loop; induction-variable elimination; and reduction in strength.
5. Code Motion: An important modification that decreases the amount of code in a loop is code motion. This transformation takes an expression that yields the same result independent of the number of times a loop is executed and places the expression before the loop.
6. Induction Variables and Reduction in Strength: While code motion is not applicable to the quicksort example, the other two transformations are; loops are usually processed inside out.
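Local common-subexpression elimination, the first transformation named above, can be sketched over a basic block of three-address tuples. This is a simplified model (the tuple encoding and the 'copy' pseudo-op are assumptions), not a full implementation:

```python
def local_cse(block):
    """Local common-subexpression elimination over one basic block.
    block is a list of (dst, op, a, b) three-address tuples; 'copy'
    stands for a simple assignment dst := a."""
    available, out = {}, []
    for dst, op, a, b in block:
        key = (op, a, b)
        if key in available:
            # E was computed before and its operands are unchanged:
            # reuse the earlier result instead of recomputing E
            out.append((dst, "copy", available[key], None))
        else:
            out.append((dst, op, a, b))
        # assigning dst invalidates every available expression using dst
        available = {k: v for k, v in available.items()
                     if dst not in k[1:] and v != dst}
        if dst not in (a, b):
            available[key] = dst
    return out

block = [("t1", "+", "d", "e"),
         ("a",  "copy", "t1", None),
         ("t2", "+", "d", "e"),      # same (op, args): common subexpression
         ("c",  "copy", "t2", None)]
optimized = local_cse(block)          # t2 becomes a copy of t1
```

The kill step is what makes the transformation safe: once a variable is redefined, every remembered expression mentioning it is discarded.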

Q.8 Discuss the issues in the design of a code generator.


Ans. While the details depend on the target language and operating system, issues such as memory management, instruction selection, register allocation and evaluation order are inherent in almost all code generation problems.
INPUT TO THE CODE GENERATOR: The input to the code generator consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the intermediate representation.
TARGET PROGRAM: The output of the code generator is the target program. Like the intermediate code, this output may take a variety of forms. An absolute machine language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed. Producing an assembly language program as output makes the process of code generation somewhat easier: it generates symbolic instructions and can use macro facilities.
INSTRUCTION SELECTION: The nature of the instruction set of the target machine determines the difficulty of instruction selection. The quality of the generated code is determined by its speed and size. A target machine with a rich instruction set may provide several ways of implementing a given operation. Ex: the statement a := a + 1 can be implemented as
MOV a, R0
ADD #1, R0
MOV R0, a
Instruction speeds are needed to design good code sequences.
REGISTER ALLOCATION: Instructions involving register operands are usually shorter and faster than those involving operands in memory. Therefore efficient utilization of registers is particularly important in generating good code. During register allocation, we select the set of variables that will reside in registers at a point in the program. During a subsequent register assignment phase, we pick the specific register that a variable will reside in. Ex: the multiplication instruction M x, y.
CHOICE OF EVALUATION ORDER: The order in which computations are performed can affect the efficiency of the target code.
Initially, we shall avoid these problems by generating code for the three-address statements in the order in which they have been produced by the intermediate code generator.
APPROACHES TO CODE GENERATION:
1. Undoubtedly the most important criterion for a code generator is that it produce correct code.
2. Correctness takes on special significance because of the number of special cases that a code generator might face.
3. Given the premium on correctness, designing a code generator so it can be easily implemented, tested and maintained is an important design goal.

Q.9 What is compiler? List its advantages.


Ans. COMPILER: A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).

    Source Program --> [ Compiler ] --> Target Program
                            |
                            v
                      Error Messages


Advantages: A one-pass compiler is fast, since all the compiler code is located in memory at once, and it can process the source text without the overhead of the operating system having to shut down one process and start another. In a multi-pass compiler, the output of each pass is stored on disk and must be read back in each time the next pass starts; on the other hand, a multi-pass compiler does not impose the same restrictions upon the user, and each of its passes can be regarded as a mini-compiler, having an input source written in one intermediate language and producing an output written in another intermediate language.

Q.10 Give the difference between the NFA and DFA.


Ans.
NFA                                             DFA
1. Nondeterministic Finite Automaton            1. Deterministic Finite Automaton
2. A state can have zero, one or more           2. Each state has exactly one
   transitions for the same input symbol           transition for each input symbol
3. Thompson's construction yields an NFA        3. Not constructed this way
   with epsilon-transitions and a single
   final state
4. Usually has fewer states than the            4. May need many more states; in the
   equivalent DFA                                  worst case the number of states is
                                                   exponential in that of the NFA
5. Slower recognizer                            5. Faster recognizer
6. Regular expression to NFA conversion         6. Direct regular expression to DFA
   is easy                                         conversion is harder
7. Implementation is not easy                   7. Implementation is easy
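The row about recognizer speed comes down to the simulation loop. A sketch of both, for the language (a|b)*abb (the transition-table encoding is an assumption for illustration):

```python
def run_dfa(delta, start, accepting, s):
    """DFA: exactly one next state per (state, symbol); a missing
    entry means the input is rejected. One table lookup per character."""
    state = start
    for ch in s:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in accepting

def run_nfa(delta, start, accepting, s):
    """NFA: delta maps (state, symbol) to a SET of states; we must track
    every state the automaton could be in (subset simulation), which is
    why the NFA is the slower recognizer."""
    states = {start}
    for ch in s:
        states = {t for q in states for t in delta.get((q, ch), ())}
    return bool(states & accepting)

# DFA for (a|b)*abb
delta = {(0, "a"): 1, (0, "b"): 0,
         (1, "a"): 1, (1, "b"): 2,
         (2, "a"): 1, (2, "b"): 3,
         (3, "a"): 1, (3, "b"): 0}
# NFA for (a|b)*abb: state 0 may loop or begin matching on 'a'
ndelta = {(0, "a"): {0, 1}, (0, "b"): {0},
          (1, "b"): {2}, (2, "b"): {3}}

print(run_dfa(delta, 0, {3}, "aabb"))    # True
print(run_nfa(ndelta, 0, {3}, "aabb"))   # True
```

The DFA does constant work per input character; the NFA's per-character cost grows with the size of the current state set.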

Q.11 Explain DAG with an example.


Ans. A directed acyclic graph (DAG) for an expression identifies the common subexpressions in the expression. Like a syntax tree, a DAG has a node for every subexpression of the expression; an interior node represents an operator and its children represent its operands. The difference is that a node in a DAG representing a common subexpression has more than one parent, whereas in a syntax tree the common subexpression would be represented as a duplicated subtree.
Ex: a + a * (b - c) + (b - c) * d
The leaf for a has two parents, because a is common to the two subexpressions a and a * (b - c). Likewise, both occurrences of the common subexpression b - c are represented by the same node, which also has two parents. A DAG is obtained if the function constructing a node first checks whether an identical node already exists. For example, before constructing a new node with label OP and fields with pointers LEFT and RIGHT, mknode(OP, LEFT, RIGHT) can check whether such a node has already been constructed; if so, mknode(OP, LEFT, RIGHT) can return a pointer to the previously constructed node. The constructing function mkleaf can behave similarly.
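The mknode check described above is naturally implemented with a hash table keyed on (op, left, right); a minimal sketch (class name and node encoding are assumptions):

```python
class DAGBuilder:
    """mknode-style construction: before creating a node, check whether
    an identical node (same op, left, right) already exists, and if so
    return the previously constructed one."""
    def __init__(self):
        self._cache = {}

    def mknode(self, op, left=None, right=None):
        key = (op, left, right)
        if key not in self._cache:
            self._cache[key] = key      # node identity = its key tuple here
        return self._cache[key]

    def mkleaf(self, name):
        return self.mknode(name)

# a + a*(b-c) + (b-c)*d : both (b - c) occurrences share one node
d = DAGBuilder()
bc1 = d.mknode("-", d.mkleaf("b"), d.mkleaf("c"))
bc2 = d.mknode("-", d.mkleaf("b"), d.mkleaf("c"))
# bc1 and bc2 are the very same node object
```

This technique is often called value numbering or hash-consing; the cache guarantees that structurally identical subexpressions collapse to a single node.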

Q.12 Explain Peephole Optimization.


Ans. Statement-by-statement code generation strategies often produce target code that contains redundant instructions and suboptimal constructs. The quality of such target code can be improved by applying optimizing transformations to the target program. The term "optimizing" is somewhat misleading, because there is no guarantee that the resulting code is optimal under any mathematical measure.


A simple but effective technique for locally improving the target code is peephole optimization, a method of improving the target program by examining a short sequence of target instructions (called the peephole) and replacing these instructions by a shorter or faster sequence whenever possible. Although we discuss peephole optimization as a technique for improving the quality of the target code, the technique can also be applied directly after intermediate code generation to improve the intermediate representation. The peephole is a small, moving window on the target program. The code in the peephole need not be contiguous, although some implementations do require this. It is characteristic of peephole optimization that each improvement may spawn opportunities for additional improvements. The following are characteristic peephole optimizations:
- Redundant-instruction elimination
- Flow-of-control optimizations
- Algebraic simplification
- Use of machine idioms
Redundant Instruction Elimination: Consider the instruction sequence
MOV R0, a
MOV a, R0
We can delete the second instruction, because whenever it is executed the first ensures that the value of a is already in register R0 (provided the second instruction does not have a label). While such target code would not be generated if the code generation algorithm discussed earlier were used, it might be produced by a naive translation.
Unreachable Code: Another opportunity for peephole optimization is the removal of unreachable instructions. An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions.
Flow-of-Control Optimizations: Unnecessary jumps can be eliminated in either the intermediate code or the target code by peephole optimizations, for example a jump to a jump.
Algebraic Simplification: There is no end to the amount of algebraic simplification that can be attempted through peephole optimization, such as removing statements like x := x + 0.
Reduction in Strength: Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine.
Use of Machine Idioms: The target machine may have hardware instructions that implement certain specific operations efficiently, and these idioms can also be used in the generated code.
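The redundant-load case above is easy to sketch as a one-window peephole pass; this simplified model ignores labels (instructions are (op, src, dst) tuples, an encoding assumed for illustration):

```python
def peephole_redundant_moves(code):
    """Delete the second instruction of a pair
         MOV R0, a
         MOV a, R0
    since it merely loads back the value just stored.
    code is a list of (op, src, dst) tuples; labels are not modelled,
    so a real pass would also check that the load is not a jump target."""
    out = []
    for ins in code:
        if (out and ins[0] == "MOV" and out[-1][0] == "MOV"
                and ins[1] == out[-1][2] and ins[2] == out[-1][1]):
            continue                     # redundant load: drop it
        out.append(ins)
    return out

code = [("MOV", "R0", "a"), ("MOV", "a", "R0"), ("ADD", "#1", "R0")]
improved = peephole_redundant_moves(code)
# improved == [("MOV", "R0", "a"), ("ADD", "#1", "R0")]
```

Running such passes repeatedly until no change occurs captures the "each improvement spawns further opportunities" behaviour described above.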

Q.13 Explain Panic mode.


Ans. Definition: Panic mode is invoked when the lexical analyzer is unable to proceed because none of the patterns for tokens matches a prefix of the remaining input. Perhaps the simplest recovery strategy is panic mode recovery: successive characters are deleted from the remaining input until a well-formed token can be found. The recovery technique may occasionally confuse the parser, but in an interactive computing environment it may be quite adequate.

Q.14 Define LL, LR, SLR, CLR, LALR, Grammar.


Ans. LL: A grammar is said to be LL if each non-terminal that appears on the left-hand side of more than one production has the LL property, i.e. the correct alternative can be chosen by looking at the next input symbol (left-to-right scan, leftmost derivation).
LR(0)/SLR: An LR(0) item is a production with a dot at some position in the right-hand side of the production. For the production
A -> XYZ
the LR(0) items are:
A -> .XYZ
A -> X.YZ
A -> XY.Z
A -> XYZ.
LR(1)/CLR: An LR(1) item of a grammar G is a production of G with a dot at some position of the right-hand side and a lookahead attached, written [A -> α.β, a]. The closure is computed as follows:
1. Add every item in I to closure(I).
2. Repeat: for every item of the form [A -> α.Bβ, a] in closure(I) and for every production B -> γ, add [B -> .γ, b] to closure(I) for every b in FIRST(βa).
LALR: An LALR parsing table can be obtained by the following steps:
- First obtain the LR(1) items.
- Then calculate the canonical collection, as in CLR(1) parsing table construction.
- Finally, merge those states of the canonical collection that have the same LR(0) core but different lookaheads.
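Generating the dotted items for a production is mechanical; a minimal sketch (the item encoding as (head, before-dot, after-dot) is an assumption):

```python
def lr0_items(head, body):
    """All LR(0) items for one production: the dot placed at every
    position of the right-hand side, from leftmost to past the end."""
    return [(head, body[:i], body[i:]) for i in range(len(body) + 1)]

for head, before, after in lr0_items("A", "XYZ"):
    print(f"{head} -> {before}.{after}")
# A -> .XYZ
# A -> X.YZ
# A -> XY.Z
# A -> XYZ.
```

A production with a right-hand side of length n always yields n + 1 items, the last one (dot at the end) being the reduce item.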

Q.15 How is the sentinel technique used in input buffering?


Ans. Sentinel technique: In input buffering, the code that advances the forward pointer must perform tests. Each time we move the forward pointer we must check that we have not moved off one half of the buffer; if we have, we must reload the other half. Code to advance the forward pointer:

    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else forward := forward + 1

Except at the ends of the buffer halves, this code requires two tests for each advance of the forward pointer. We can reduce the two tests to one if we extend each buffer half to hold a sentinel character at the end. The sentinel is a special character that cannot be part of the source program (typically eof). Most of the time the code performs only one test, to see whether forward points to eof. Only when we reach the end of a buffer half or the end of the file do we perform more tests, so the average number of tests per input character is very close to 1.
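A toy simulation of the sentinel idea (buffer halves are modelled as strings and the reload logic is simplified; names are assumptions):

```python
EOF = "\0"   # sentinel character, assumed never to occur in source text

def scan(buffer_halves):
    """Walk a sequence of buffer halves, each terminated by the sentinel.
    In the common case a single comparison (ch != EOF) suffices; the
    extra 'which half / real end of input?' logic runs only when the
    sentinel is actually seen."""
    out, i, pos = [], 0, 0
    while True:
        ch = buffer_halves[i][pos]
        pos += 1
        if ch != EOF:                # the single test per ordinary character
            out.append(ch)
            continue
        i += 1                       # sentinel hit: end of this half
        pos = 0
        if i == len(buffer_halves):  # no more halves: real end of input
            return "".join(out)

halves = ["int " + EOF, "x;" + EOF]  # each half ends with the sentinel
print(scan(halves))                  # int x;
```

The payoff is exactly the one stated above: the expensive boundary checks are folded into the rare sentinel branch, leaving close to one test per input character.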

Q.16 Explain the terms


Ans. Binding of Names: In programming languages, the term environment refers to a function that maps a name to a storage location, and the term state refers to a function that maps a storage location to the value held there. Using the terms l-value and r-value, the environment maps a name to an l-value and the state maps the l-value to an r-value. A binding is the dynamic counterpart of a declaration. As we have seen, more than one activation of a recursive procedure can be alive at the same time.

STATIC NOTION                        DYNAMIC COUNTERPART
1. Definition of a procedure         1. Activations of the procedure
2. Declaration of a name             2. Bindings of the name
3. Scope of a declaration            3. Lifetime of a binding

Left Recursive Grammar: Left recursion in a grammar can be eliminated by rewriting the offending productions. For example, consider a non-terminal A with two productions
A -> Aα | β
where α and β are sequences of terminals and non-terminals that do not start with A. This is rewritten as
A  -> βA'
A' -> αA' | ε
Handle: A handle of a right-sentential form γ is a production A -> β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
Closure of a Set of Items: If I is a set of items for a grammar, then closure(I) is the set of items constructed from I by the following two rules:
(a) Every item in I is in closure(I).
(b) If A -> α.Xβ is in closure(I) and X -> γ is a production, then add X -> .γ to closure(I).
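The left-recursion rewrite above can be sketched mechanically; this simplified version handles one non-terminal with immediate left recursion (the encoding of productions as symbol tuples is an assumption):

```python
def eliminate_left_recursion(A, productions):
    """Rewrite  A -> A alpha | beta  as
         A  -> beta A'
         A' -> alpha A' | epsilon
    productions is a list of right-hand sides, each a tuple of symbols;
    the empty tuple () stands for epsilon."""
    Ap = A + "'"
    recursive = [p[1:] for p in productions if p and p[0] == A]   # the alphas
    others = [p for p in productions if not p or p[0] != A]       # the betas
    new_A = [beta + (Ap,) for beta in others]
    new_Ap = [alpha + (Ap,) for alpha in recursive] + [()]
    return {A: new_A, Ap: new_Ap}

# E -> E + T | T   becomes   E -> T E',   E' -> + T E' | epsilon
g = eliminate_left_recursion("E", [("E", "+", "T"), ("T",)])
```

The result accepts exactly the same language but can now be parsed top-down, since no production starts with the defining non-terminal.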


Q.17 Explain register allocation and assignment in code generation.


Ans. Instructions involving register operands are usually shorter and faster than those involving operands in memory. Therefore efficient utilization of registers is important in generating good code. The use of registers is often subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in registers at a point in the program.
2. During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
Finding an optimal assignment of registers to variables is difficult, even with single register values. The problem is further complicated by the hardware and/or the operating system of the machine: certain machines require register pairs for some operands and results. For example:
Multiplication: M x, y
Division: D x, y
where the 64-bit dividend occupies an even/odd register pair whose even register is x, and y represents the divisor. After division, the even register holds the remainder and the odd register the quotient.

Q.18 Explain register allocation and assignment in code generation.


Ans. Instructions involving register operands are usually shorter and faster than those involving operands in memory. Therefore, efficient utilization of registers is important in generating good code. One approach to register allocation and assignment is to assign specific values in an object program to certain registers. The simple code generation algorithm uses registers to hold values only for the duration of a single basic block. Global register allocation instead assigns some fixed number of registers to hold the most active values in each inner loop; the selected values may be different in different loops. Registers not already allocated may be used to hold values local to one block. This approach has the drawback that the fixed number of registers is not always the right number to make available for global register allocation.
Usage Counts: We count a saving of one unit for each use of x in a block B of loop L that is not preceded by a definition of x in B, and a saving of two units if x is assigned in B and live on exit from B, since the store of x can then be avoided. The total benefit of keeping x in a register over loop L is thus approximately:

    sum over all blocks B in L of  ( use(x, B) + 2 * live(x, B) )

where use(x, B) is the number of times x is used in B prior to any definition of x, and live(x, B) is 1 if x is live on exit from B and assigned a value in B, and 0 otherwise.
Register Assignment for Outer Loops: Having assigned registers and generated code for inner loops, we may apply the same idea to progressively larger loops. If an outer loop L1 contains an inner loop L2, the names allocated registers in L2 need not be allocated registers in L1 - L2. However, if name x is allocated a register in loop L1 but not L2, we must store x on entrance to L2 and load x if we leave L2 and enter a block of L1 - L2. Similarly, if we choose to allocate x a register in L2 but not L1, we must load x on entrance to L2 and store x on exit from L2. We leave as an exercise the derivation of a criterion for selecting names to be allocated registers in an outer loop L, given that choices have already been made for all loops nested within L.
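The usage-count formula is a straightforward sum over the loop's blocks; a minimal sketch (each block is summarized as a pair, an encoding assumed for illustration):

```python
def usage_count(x, loop_blocks):
    """Approximate benefit of keeping x in a register over loop L:
        sum over blocks B in L of  use(x, B) + 2 * live(x, B)
    Each block is summarized as (uses_before_def, assigned_and_live_out):
    use(x, B) = uses of x in B prior to any definition of x, and
    live(x, B) = 1 if x is assigned in B and live on exit from B."""
    total = 0
    for uses_before_def, assigned_and_live in loop_blocks:
        total += uses_before_def + 2 * (1 if assigned_and_live else 0)
    return total

# x used once before any redefinition in B1; assigned and live out of B2
print(usage_count("x", [(1, False), (0, True)]))   # 1 + 2 = 3
```

The candidates with the highest counts get the fixed number of globally allocated registers.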


Handle: A handle of a right-sentential form γ is a production A -> β and a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is, if S =>* αAw => αβw, then A -> β in the position following α is a handle of αβw. The string w to the right of the handle contains only terminal symbols.
Handle Pruning: A rightmost derivation in reverse, often called a canonical reduction sequence, is obtained by handle pruning. That is, we start with a string of terminals w which we wish to parse. If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some as yet unknown rightmost derivation.

Q.19 Explain parameter transmission with reference to procedure.


Ans. Parameter transmission refers to the ways in which parameters are passed to a procedure. When parameters are passed by value, the value of the actual parameter is used to initialize the corresponding formal parameter, which acts as a local variable in the sub-program.

Q.20 Parameter transmission method


Ans. Parameter passing methods are the ways in which parameters are transmitted to and/or from called sub-programs. A variety of methods has been developed by language designers.
1. Pass by Value: The value of the actual parameter is used to initialize the corresponding formal parameter, which acts as a local variable in the sub-program. It can be understood by the following program.

Call: add(2, 3)   /* actual transfer of the values 2 and 3 */

    int add(int a, int b)
    {
        int c = a + b;
        return c;
    }

Pass by value is normally implemented by an actual data transfer. The main disadvantage of the pass-by-value method, if physical moves are done, is that additional storage is required for the formal parameters, either in the called sub-program or in some area outside both the caller and the called sub-program. Storage and the move operation can be costly if the parameter is large, such as a long array.
2. Pass by Reference: In the pass-by-reference method we pass a reference (address) of the actual parameter to the formal parameter. The actual parameters are thus shared with the called sub-program.

Call: action(&sa, sb)   /* passing of a reference to sa */

    void action(int *x, int y)
    {
        ...
    }


There are several disadvantages to the pass-by-reference method. First, access to a formal parameter will most likely be slower, because an extra level of indirect addressing is needed compared with when data values are transmitted. Another serious problem of pass by reference is that aliases can be created. This is expected, because pass by reference makes the caller's access paths available to the called sub-program.
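The visible difference between the two methods can be demonstrated in a few lines. This Python sketch simulates pass by reference through a shared mutable object (Python itself passes object references by value; the one-element list is an illustrative stand-in for an address):

```python
def add(a, b):
    # pass-by-value semantics: a and b are the callee's own copies;
    # rebinding them has no effect on the caller's variables
    a = a + b
    return a

def action(x, y):
    # simulating pass by reference with a shared mutable object:
    # the assignment through x is visible to the caller (an alias)
    x[0] = x[0] + y

n = 2
result = add(n, 3)   # result == 5, but n is still 2 afterwards
sa = [10]
action(sa, 5)        # sa[0] is now 15: the caller observes the update
```

The second call shows exactly the aliasing hazard described above: `x` inside `action` and `sa` in the caller name the same storage.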

Q.21 Explain Heap Allocation.


Ans. One useful run-time organization is the heap: a large block of storage that may be partitioned at will into smaller blocks.

[Figure: heap allocation. Fixed locations for the names X, Y and Z hold pointers into the heap; each points to a block containing that name's value, and the free blocks between them are linked into an available-space list.]

Here we see pointers from fixed locations representing three names x, y and z. These fixed locations might be allocated statically or might be on a stack. They point to blocks of memory in the heap, and the value of each name, including a data description giving the block length, is kept in the block. The portions of the heap not currently in use are linked together in an available-space list: each free block contains a pointer to the next free block and information about how long the free block is. One method of reclaiming storage is to attach a use count to every block, telling how many pointers point to it; when the use count reaches zero, the block is made free. Another problem of heap management is fragmentation: when we free a block we must do something to attach it to adjacent free blocks, if any, or the available-space list will come to consist of many little blocks, none of which is sufficient to hold a large block of data. A different reclamation method works as follows: when there is no more available space, each block is checked to see whether there is a path of pointers from the location associated with some name to the block. If not, the block is placed on the available-space list. This process is called garbage collection.
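The available-space list can be sketched as a first-fit allocator; this is a deliberately minimal model (coalescing of adjacent free blocks, which the text notes is needed to fight fragmentation, is omitted):

```python
class FreeListHeap:
    """First-fit free-list sketch: the heap is modelled as a list of
    (offset, size) free blocks; allocation carves space out of the
    first block that is large enough."""
    def __init__(self, size):
        self.free = [(0, size)]          # one big free block initially

    def alloc(self, n):
        for i, (off, size) in enumerate(self.free):
            if size >= n:
                if size == n:
                    del self.free[i]     # block consumed exactly
                else:
                    self.free[i] = (off + n, size - n)
                return off               # address of the allocated block
        return None                      # fragmentation / out of space

h = FreeListHeap(256)
a = h.alloc(100)     # offset 0
b = h.alloc(100)     # offset 100
c = h.alloc(100)     # None: only 56 units remain in one block
```

The failed third allocation illustrates the fragmentation problem: total free space may suffice even when no single free block does.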

Q.22 What is code optimization? Explain with the help of an example.


Ans. A good optimizing compiler can improve the target program by perhaps a factor of two in overall speed, in comparison with a compiler that generates code carefully but without using the specialized techniques generally referred to as code optimization. The code optimization phase attempts to improve the intermediate code, so that faster-running machine code will result. Some optimizations are trivial. Example: a naive algorithm generates the intermediate code using an instruction for each operator in the tree representation after semantic analysis, even though there is a better way to perform the same calculation, using the two instructions
temp1 := id3 * 60.0
id1 := id2 + temp1


There is nothing wrong with this simple algorithm. A three-address statement is an abstract form of intermediate code; in an actual compiler these statements can be implemented in any of the following ways: (1) quadruples, (2) triples, (3) indirect triples, which we then compare, including a single-array representation.
Quadruples: A quadruple is a record structure with four fields, which we shall call OP, ARG1, ARG2 and RESULT. This representation of three-address statements is known as quadruples. Here OP contains an internal code for the operator. For a statement A := B op C, we put B in ARG1, C in ARG2 and A in RESULT.
Ex: for the statements
t1 := -b
t2 := c + d
t3 := t1 * t2
a  := t3
the quadruples are:

        OP      ARG1    ARG2    RESULT
(0)     uminus  b               t1
(1)     +       c       d       t2
(2)     *       t1      t2      t3
(3)     :=      t3              a

Triples: To avoid entering temporary names into the symbol table, we can refer to a temporary value by the position of the statement that computes it. In this representation the ARG1 and ARG2 fields are either pointers into the symbol table or pointers into the triple structure itself. Since only three fields are used, this intermediate code format is known as triples.

        OP      ARG1    ARG2
(0)     uminus  b
(1)     +       c       d
(2)     *       (0)     (1)
(3)     :=      a       (2)

Indirect Triples: another implementation of three-address code is to list pointers to triples rather than listing the triples themselves. This implementation is naturally called indirect triples:

STATEMENT
(0)     (14)
(1)     (15)
(2)     (16)
(3)     (17)

        OP      ARG1    ARG2
(14)    uminus  B
(15)    +       C       D
(16)    *       (14)    (15)
(17)    :=      A       (16)
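Triples and indirect triples can likewise be modeled with plain lists; an integer in an argument field stands for a pointer to an earlier triple, while a string names a symbol-table entry. The evaluator below is an illustrative sketch, not part of the text:

```python
# A := -B * (C + D) as triples: integer arguments point at earlier triples.
triples = [
    ("uminus", "B", None),   # (0)
    ("+",      "C", "D"),    # (1)
    ("*",      0,   1),      # (2)  arguments are triples (0) and (1)
    (":=",     "A", 2),      # (3)
]

# Indirect triples: a statement list of indices into the triple structure;
# reordering statements only permutes this list, not the triples themselves.
statement_list = [0, 1, 2, 3]

def eval_triple(i, env, memo):
    if i in memo:
        return memo[i]
    op, a1, a2 = triples[i]
    # An int argument refers to another triple; a str names a variable.
    val = lambda a: eval_triple(a, env, memo) if isinstance(a, int) else env[a]
    if op == "uminus":
        memo[i] = -val(a1)
    elif op == "+":
        memo[i] = val(a1) + val(a2)
    elif op == "*":
        memo[i] = val(a1) * val(a2)
    elif op == ":=":
        env[a1] = val(a2)
        memo[i] = env[a1]
    return memo[i]

env = {"B": 2, "C": 3, "D": 4}
for s in statement_list:
    eval_triple(s, env, memo={})
```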

Q.23 Comparison of Representations: The use of Indirection.


Ans. The difference between triples and quadruples is a matter of how much indirection is present in the representation. In the quadruple notation, the location of each temporary can be accessed immediately via the symbol table from wherever it is needed, that is, at the three-address statements defining or using the temporary. An important benefit of quadruples appears in an optimizing compiler, where we often move statements around: since references to temporaries go through the symbol table, statements can be moved without rewriting the statements that use them. With triples, by contrast, moving a statement that defines a temporary value requires us to change all pointers to that statement in the ARG1 and ARG2 arrays. This problem makes triples difficult to use in an optimizing compiler.
Single-array representation: both triples and quadruples waste some space, since fields will occasionally be empty. If space is important, one can use a single array and store either triples or quadruples consecutively. Since the operator determines which fields are actually in use, we can decode the single array if we follow each operator by only those of ARG1, ARG2 and RESULT that it uses. The disadvantage of this representation is seen if we try to examine the statements in reverse order, since we cannot tell just by looking at a word whether it represents an operator or an operand.

Q.24 Define the following


Ans.
L-Attributed: the class of L-attributed definitions encompasses virtually all translations that can be performed without explicit construction of a parse tree.
S-Attributed: a syntax-directed definition that uses synthesized attributes exclusively is said to be S-attributed.
Compiler: a compiler is a computer program that translates a program written in one computer language into another computer language.
Interpreter: an interpreter translates one program statement into machine language, executes it, and then proceeds to the next statement.
Parse Tree: a graphical representation of a derivation that filters out the choices regarding replacement order. This representation is called a parse tree.
Syntax Tree: a tree in which each leaf represents an operand and each interior node an operator is called a syntax tree.
Grammar: a suitable notation must be used to specify the constructs of a language; for this we use a grammar. Mathematically a grammar G is G = (VT, VN, S, P), where VT is the set of terminals, VN the set of non-terminals, S the start symbol and P the set of productions.
Ambiguous Grammar: a grammar that produces more than one parse tree for the same sentence is said to be ambiguous.
Synthesized Attribute: an attribute is said to be synthesized if its value at a parse-tree node is determined from attribute values at the children of the node. Synthesized attributes have the desirable property that they can be evaluated during a single bottom-up traversal of the parse tree.
Inherited Attribute: an inherited attribute is one whose value at a node in a parse tree is defined in terms of attributes at the parent and/or siblings of that node.
NFA: a nondeterministic finite automaton (NFA) is a mathematical model that consists of: a set of states S; a set of input symbols Σ; a transition function move that maps state-symbol pairs to sets of states; a state s0 distinguished as the start state; and a set of states F distinguished as the accepting states.
DFA: a deterministic finite automaton is a special case of a nondeterministic finite automaton in which: 1. no state has an ε-transition, i.e. a transition on input ε, and 2. for each state s and input symbol a, there is at most one edge labeled a leaving s.
Token: the lexical-analysis phase separates the characters of the source language into groups that logically belong together; these groups are called tokens.
Pattern: in general there is a set of strings in the input for which the same token is produced as output; this set of strings is described by a rule called a pattern.
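The DFA definition above can be sketched concretely; the particular machine below (accepting strings over {a, b} that end in "ab") is an illustrative choice, not from the text:

```python
# A DFA: states {0, 1, 2}, alphabet {a, b}, a transition function with at
# most one edge labeled with each symbol leaving each state, start state 0,
# accepting states {2}. There are no epsilon-transitions.
delta = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
start, accepting = 0, {2}

def accepts(s):
    state = start
    for ch in s:
        state = delta[(state, ch)]   # deterministic: exactly one next state
    return state in accepting
```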


Lexeme: a lexeme is a sequence of characters in the source program that is matched by the pattern for a token. In other words, a group of characters forming a meaningful sequence is called a lexeme.

Q.25 Explain the Symbol Table.


Ans. A compiler needs to collect and use information about the names appearing in the source program. This information is entered into a data structure called a symbol table. The information collected in the symbol table is used during several stages of the compilation process. The symbol-table mechanism must make sure that the innermost occurrence of an identifier is always found first, and that names are removed from the active portion of the symbol table when they are no longer active.
The contents of a symbol table: a symbol table is merely a table with two fields, a name field and an information field. Some capabilities of the symbol table are:
o Determine whether a given name is in the table
o Add a new name to the table
o Access the information associated with a given name
o Add new information for a given name
o Delete a name or group of names from the table
The information associated with a name in the symbol table includes:
o The string of characters denoting the name. If the same identifier can be used in more than one block or procedure, then an indication of which block or procedure the name belongs to must also be given.
o The attributes of the name.
o Parameters, such as the number of dimensions of arrays and the upper and lower limits along each dimension.
o An offset describing the position in storage to be allocated for the name.
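The requirement that the innermost occurrence of an identifier is found first, and that names become inactive when their block closes, can be sketched with a stack of scopes; the class and method names here are illustrative, not from the text:

```python
# Symbol table as a stack of scopes: lookup searches innermost-first, and
# closing a block removes its names from the active portion of the table.
class SymbolTable:
    def __init__(self):
        self.scopes = [{}]            # the global scope

    def enter_block(self):
        self.scopes.append({})

    def exit_block(self):
        self.scopes.pop()             # the block's names become inactive

    def insert(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        for scope in reversed(self.scopes):   # innermost occurrence first
            if name in scope:
                return scope[name]
        return None

st = SymbolTable()
st.insert("x", "global int")
st.enter_block()
st.insert("x", "local real")
inner = st.lookup("x")     # finds the innermost x
st.exit_block()
outer = st.lookup("x")     # the local x is gone; the global one is found
```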

Q.26 State whether the following statements are true or false.


Ans.
1. The structure of the source language has a strong effect on the number of passes of a compiler. TRUE
2. An NFA is more powerful than a DFA. FALSE
3. A CFG is more powerful than a regular expression. TRUE

Q.27 List data structure.


Ans. The conceptually simplest and easiest-to-implement data structure for a symbol table is a linear list of records, called the list data structure. Here we use a single array, or equivalently several arrays, to store names and their associated information:

NAME1   INFO1
NAME2   INFO2
...
NAMEn   INFOn
AVAILABLE


New names are added to the list in the order in which they are encountered. To retrieve information about a name we search from the beginning of the array up to the position marked by the pointer AVAILABLE, which indicates the beginning of the empty portion of the array. When the name is located, the associated information can be found in the words following it. If we reach AVAILABLE without finding the name, we have detected the use of an undeclared name. If the symbol table contains n names, the work necessary to insert a new name is proportional to n. To find the data about a name we shall on average search n/2 names, so the cost of an inquiry is also proportional to n. To insert n names and make m inquiries, the total work is cn(n + m), where c is a constant representing the time necessary for a few machine operations. In a medium-sized program we might have n = 100 and m = 1000, so several hundred thousand machine operations are spent on bookkeeping.
Advantage: one advantage of the list organization is that the minimum possible space is taken. In a simple compiler, the space taken by the symbol table may consume most of the space used for the compiler's data. If space is at a premium, it may well pay to use the inefficient list organization for the symbol table.
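The list organization above can be sketched as follows; using a Python list in place of the fixed array, with its end playing the role of AVAILABLE, is an illustrative simplification:

```python
# Linear-list symbol table: an array of (name, info) records. Lookup scans
# from the front up to AVAILABLE, so it costs time proportional to n;
# insertion must first check the name is not already present, so it is
# proportional to n as well, as the text states.
table = []

def lookup(name):
    for n, info in table:          # search up to AVAILABLE
        if n == name:
            return info
    return None                    # reached AVAILABLE: undeclared name

def insert(name, info):
    if lookup(name) is None:       # O(n) duplicate check
        table.append((name, info))

insert("i", {"type": "integer"})
insert("prod", {"type": "real"})
```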

Q.28 How is lexical analysis different from syntax analysis?


Ans. Lexical analysis is the interface between the source program and the compiler. The lexical analyzer passes the two components of each token to the parser: the first is a code for the token type, and the second is the value, a pointer to the place in the symbol table reserved for the specific value found. In syntax analysis, the parser checks that the tokens appearing in its input, which is the output of the lexical analyzer, occur in patterns that are permitted by the specification for the source language.

Q.29 Classify the symbol table and explain any one of them with example.
Ans. For the symbol-table mechanism we would like a scheme that allows us to add new entries and find existing entries in a table efficiently. Three symbol-table mechanisms are discussed: 1. lists, 2. trees, 3. hash tables.
Search trees: a more efficient approach to symbol-table organization is to add two link fields, LEFT and RIGHT, to each record, and to use these fields to link the records into a binary search tree. This tree has the property that all names NAMEj accessible from NAMEi by following the link LEFT(i) and then following any sequence of links precede NAMEi in alphabeticalical order, and analogously all names reached through RIGHT(i) follow NAMEi. The algorithm to look for NAME in a binary search tree, where p is initially a pointer to the root, is:

while p <> null do
    if NAME = NAME(p) then /* NAME found; take action on success */
    else if NAME < NAME(p) then p := LEFT(p)
    else /* NAME(p) < NAME */ p := RIGHT(p)

If names are encountered in a random order, the average length of a path in the tree will be proportional to log n, where n is the number of names. Since each search follows one path from the root, the expected time needed to enter n names and make m inquiries is proportional to (n + m) log n. If n is greater than about 50, there are clear advantages to the binary search tree over the linear list, and probably over the self-organizing list as well. If efficiency is paramount, however, there is an even better method than the binary search tree: the hash table.
Hash table: this scheme gives us the capability of performing m accesses on n names in time proportional to n(n + m)/k, for any constant k of our choosing. Since k can be made as large as we like, this method is generally superior to linear lists or search trees and is the method of choice for symbol tables in most situations, especially if storage is not particularly costly. The basic hashing scheme is illustrated in the figure. Two tables, a hash table and a storage table, are used. The hash table consists of k words numbered 0, 1, ..., k-1.
These words are pointers into the storage table, to the heads of k separate linked lists.


Each record in the symbol table appears on exactly one of these lists. To determine whether NAME is in the symbol table, we apply to NAME a hash function h such that h(NAME) is an integer between 0 and k-1. It is on the list numbered h(NAME) that the record for NAME belongs, so the average list is n/k records long if there are n names in the table. The function h should distribute names uniformly among the k lists and should be easy to compute for names consisting of strings of characters.

[Figure: a hash table of k header words, each pointing into a storage table of records (NAME, DATA, LINK); AVAILABLE marks the unused portion of the storage table.]
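The two-table scheme can be sketched as follows; the table size k, the particular hash function, and the record layout are illustrative choices, not prescribed by the text:

```python
# Hash table with separate chaining: k header words, each the head of a
# linked list of records in a storage table. Each record is
# (NAME, DATA, LINK), where LINK is the index of the next record on the
# same list, or None.
K = 8                      # number of hash-table words (k)
hash_table = [None] * K    # heads of the k lists
storage = []               # the storage table

def h(name):
    # A simple hash on the characters of the name (illustrative choice).
    return sum(ord(c) for c in name) % K

def insert(name, data):
    i = h(name)
    storage.append((name, data, hash_table[i]))  # link to the old head
    hash_table[i] = len(storage) - 1             # new head of list h(name)

def lookup(name):
    p = hash_table[h(name)]
    while p is not None:                         # scan list h(name) only
        n, data, link = storage[p]
        if n == name:
            return data
        p = link
    return None

insert("count", "integer")
insert("rate", "real")
```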

Q.30 Loop Optimization


Ans. Loop optimization is the most valuable machine-independent optimization, because a program's inner loops are good candidates for improvement. The execution efficiency of a loop can be improved by moving out code which is loop-invariant; such code is then executed just once, prior to loop entry, rather than repeatedly during execution of the loop. The major tasks of loop optimization are:
1. Detecting loops in a program
2. Deducing the feasibility of moving certain code out of the loop
Example: compute the dot product of two vectors A and B.

begin
    PROD := 0;
    I := 1;
    do begin
        PROD := PROD + A[I] * B[I];
        I := I + 1
    end while I <= 20
end

Loop optimization is performed on the intermediate text rather than on the source program. The intermediate code for this loop is:

(1)  PROD := 0
(2)  I := 1
(3)  T1 := 4 * I
(4)  T2 := addr(A) - 4
(5)  T3 := T2[T1]
(6)  T4 := addr(B) - 4
(7)  T5 := T4[T1]
(8)  T6 := T3 * T5
(9)  PROD := PROD + T6
(10) I := I + 1
(11) if I <= 20 goto (3)
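The point of the optimization is that the address computations addr(A) - 4 and addr(B) - 4 are loop-invariant and can be moved before loop entry. A sketch of the effect in Python, with plain lists standing in for the arrays and 0-based indexing (the function names and data are illustrative):

```python
# Dot product of two 20-element vectors, before and after loop-invariant
# code motion. In the unoptimized version the base-address computations
# are (notionally) redone on every iteration; in the optimized version
# they are executed once, prior to loop entry.
A = list(range(1, 21))
B = list(range(1, 21))

def dot_unoptimized():
    prod, i = 0, 1
    while True:
        base_a = A          # T2 := addr(A) - 4, recomputed every iteration
        base_b = B          # T4 := addr(B) - 4, recomputed every iteration
        prod += base_a[i - 1] * base_b[i - 1]
        i += 1
        if i > 20:
            return prod

def dot_optimized():
    base_a, base_b = A, B   # invariant code moved out of the loop
    prod, i = 0, 1
    while True:
        prod += base_a[i - 1] * base_b[i - 1]
        i += 1
        if i > 20:
            return prod
```

Both versions compute the same result; only the amount of work done inside the loop differs.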

Q.31 Define the following.


Ans. Strength Reduction: reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators. For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation routine. Fixed-point multiplication or division by a power of two is cheaper to implement as a shift. Floating-point division by a constant can be implemented as multiplication by a constant, which may be cheaper.
Code Motion: the running time of a program may be improved if we decrease the length of one of its loops, especially an inner loop, even if we increase the amount of code outside the loop. Example:

while CHAR = ' ' do CHAR := GETCHAR()

Here GETCHAR() is assumed to return the next character on an input file. In many situations it might be quite normal that the condition CHAR = ' ' is false the first time around, in which case the statement CHAR := GETCHAR() would be executed zero times. An important source of modification of this type is called code motion, where we take a computation that yields the same result independent of the number of times through the loop and move it out of the loop.
Loop Jamming: a related idea, called loop jamming, is to merge the bodies of two loops. It is necessary that each loop be executed the same number of times.
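The strength-reduction examples above can be sketched directly; the "expensive"/"cheap" pairings mirror the text, while the function names and test values are illustrative:

```python
# Strength reduction: replace an expensive operation with an equivalent
# cheaper one on the target machine.

def square_expensive(x):
    return x ** 2          # exponentiation-routine style

def square_cheap(x):
    return x * x           # a single multiplication

def times8_expensive(n):
    return n * 8           # fixed-point multiplication by a power of two

def times8_cheap(n):
    return n << 3          # ... implemented as a left shift

def div4_expensive(x):
    return x / 4.0         # floating-point division by a constant

def div4_cheap(x):
    return x * 0.25        # ... multiplication by the reciprocal constant
```

Each cheap version computes the same value as its expensive counterpart; whether it is actually faster depends on the target machine, which is the point of doing this substitution in the compiler back end.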

