VISUE
Q.1 What are the challenges of compiler design?
Ans. Compiler writing is not an easy job. It is very challenging and demands knowledge of various fields of computer science. Some challenges of compiler design:
1) Many variations:
- Many programming languages (e.g. FORTRAN, C++, Java)
- Many programming paradigms (e.g. object-oriented, functional)
- Many computer architectures (e.g. MIPS, SPARC, Intel, Alpha)
- Many operating systems (e.g. Linux, Solaris, Windows)
2) Qualities of a good compiler:
- The compiler itself must be bug-free.
- It must generate correct machine code.
- The generated machine code must run fast.
- The compiler itself must be portable.
- It must print good diagnostics and error messages.
- The generated code must work well with existing debuggers.
- It must have consistent and predictable optimization.
3) Building a compiler requires knowledge of:
- Programming logic
- Theory of automata and context-free languages
- Algorithms and data structures
- Computer architecture
- Software engineering
Q.3 What is an activation record? Draw one activation record with the fields that are generally present in many languages.
Ans. Each execution of a procedure is referred to as an activation of the procedure. If a procedure is non-recursive, there exists only one activation of the procedure at a time, whereas if a procedure is recursive, several activations of that procedure may be live at the same time. The information needed by a single execution (a single activation) of a procedure is managed using a contiguous block of storage called an activation record or activation frame, consisting of a collection of fields. The activation record contains the following information:
1. Temporary values used during expression evaluation
2. Local data of the procedure
3. Saved machine status information (program counter, registers, return address)
4. Access link, for access to non-local names
Compiler Design
5. The actual parameters
6. The return value, used by the called procedure to return a value to the calling procedure
7. (Optional) Control link, which points to the activation record of the caller
A typical activation record layout (top to bottom):
Return value
Actual parameters
Control link
Access link
Saved machine status
Local data
Temporaries
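The record layout above can be sketched as a data structure. This is a minimal illustrative sketch in Python, not from the source; the field names mirror the list above and the linkage between caller and callee is shown with the control link.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class ActivationRecord:
    """One activation of a procedure; field names are illustrative."""
    return_value: Any = None
    actual_params: list = field(default_factory=list)
    control_link: Optional["ActivationRecord"] = None  # points at the caller's record
    access_link: Optional["ActivationRecord"] = None   # enclosing scope, for non-local names
    saved_status: dict = field(default_factory=dict)   # program counter, registers
    local_data: dict = field(default_factory=dict)
    temporaries: dict = field(default_factory=dict)

# A call pushes a new record whose control link points at the caller's record.
main = ActivationRecord()
callee = ActivationRecord(actual_params=[2, 3], control_link=main)
```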
Ans. Symbol Table: A symbol table is a data structure containing a record, with fields for the attributes of the identifier, for each identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly.
Importance: The symbol table must have an efficient mechanism for accessing the information held in the table as well as for adding new entries. The symbol table is a useful abstraction that helps the compiler ascertain and verify the semantics, or meaning, of a piece of code. Its two most important functions in a compiler are: first, it caches useful information about the various symbols of the source program for later use during code generation; second, it supports the type-checking mechanism that determines the semantic correctness of a program.
Panic-Mode Recovery: This is one of the strategies a parser can employ to recover from a syntactic error. It is the simplest method to implement and can be used by most parsing methods. On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as a semicolon or end, whose role in the source program is clear. The compiler designer must select the synchronizing tokens appropriate for the source language. While panic-mode correction often skips a considerable amount of input without checking it for additional errors, it has the advantage of simplicity and, unlike some other methods considered later, it is guaranteed not to go into an infinite loop. In situations where multiple errors in the same statement are rare, this method may be quite adequate.
Phrase-Level Recovery: On discovering an error, a parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. A typical local correction is replacing a comma by a semicolon, deleting an extraneous semicolon, or inserting a missing semicolon. We must be careful to choose replacements that do not lead to infinite loops, as would be the case, for example, if we always inserted something ahead of the current input symbol. This type of replacement can correct any input string and has been used in several error-repairing compilers. The method was first used with top-down parsing. Its major drawback is the difficulty it has in coping with situations in which the actual error occurred before the point of detection.
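The panic-mode strategy can be sketched in a few lines. This is an illustrative Python sketch, not from the source: on an error, tokens are discarded until a member of an assumed synchronizing set (here the semicolon and the keyword end) is reached, and parsing resumes just past it.

```python
def panic_mode_skip(tokens, pos, sync=(";", "end")):
    """Discard tokens from position pos until a synchronizing token
    is found; return the index just past that token."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return min(pos + 1, len(tokens))

toks = ["x", "=", "@", "#", ";", "y", "=", "1", ";"]
resume = panic_mode_skip(toks, 2)  # error detected at "@"
# parsing resumes at toks[resume], which is "y"
```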
Q.6 What do you mean by syntax-directed definitions and translation schemes?
Ans. Syntax-Directed Definitions: A syntax-directed definition is a generalization of a context-free grammar in which each grammar symbol has an associated set of attributes, partitioned into two subsets called the synthesized and inherited attributes of that grammar symbol. In a syntax-directed definition, each grammar production A -> α has associated with it a set of semantic rules of the form b := f(c1, c2, ..., ck), where f is a function and either:
1. b is a synthesized attribute of A and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production, or
2. b is an inherited attribute of one of the grammar symbols on the right side of the production and c1, c2, ..., ck are attributes belonging to the grammar symbols of the production.
In a syntax-directed definition, terminals are assumed to have synthesized attributes only, as the definition does not provide semantic rules for terminals.
Put another way, a syntax-directed definition generalizes a context-free grammar by associating a set of attributes with each node in a parse tree. Each attribute gives some information about the node; for example, attributes associated with an expression node may give its value, its type, or its location in memory. As an example, suppose the grammar contains the production X -> YZ, so a node for X in a parse tree has nodes for Y and Z as children, and further suppose that the nodes for X, Y and Z have associated attributes X.a, Y.a and Z.a respectively. If the semantic rule X.a := Y.a + Z.a is associated with the production X -> YZ, then the parser adds the a attributes of nodes Y and Z and sets the a attribute of node X to their sum.
Translation Schemes: In a syntax-directed translation scheme we embed the semantic rules into the grammar itself, and each semantic rule can use only information computed by already executed semantic rules. Compared with syntax-directed definitions, the definitions describe only the relationships among attributes associated with grammar symbols, while a translation scheme also fixes the order in which the rules are evaluated.
A translation scheme is a convenient way of describing an L-attributed definition. For example, assume the grammar has a production A -> XY, and further assume that A, X and Y have inherited attributes A.i, X.i and Y.i and synthesized attributes A.s, X.s and Y.s respectively. Because we have an L-attributed definition:
- X.i can only be a function of A.i, that is, X.i = f(A.i)
- Y.i can only be a function of A.i, X.i and X.s, that is, Y.i = g(A.i, X.i, X.s)
- A.s is a function of A.i, X.i, X.s, Y.i and Y.s, that is, A.s = h(A.i, X.i, X.s, Y.i, Y.s)
The semantic rules would be embedded in the production A -> XY as follows:
A -> {X.i := f(A.i)} X {Y.i := g(A.i, X.i, X.s)} Y
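The idea of executing a semantic rule at the point where a production is recognized can be shown with a tiny evaluator. This is an illustrative sketch, not from the source, for the assumed grammar E -> E + T | T, T -> digit, where the synthesized attribute E.val is computed by the embedded action as each production body is matched.

```python
def parse_expr(tokens):
    """Recursive-descent evaluator: the semantic rule
    E.val := E1.val + T.val runs as soon as E -> E + T is recognized."""
    pos = 0

    def term():
        nonlocal pos
        val = int(tokens[pos])   # T.val := digit.lexval
        pos += 1
        return val

    val = term()                 # E -> T
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        val = val + term()       # E.val := E1.val + T.val
    return val
```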
While the programmer is unlikely to introduce dead code intentionally, it may appear as the result of previous transformations. Other important loop transformations are code motion, which moves code outside a loop, and induction-variable elimination with reduction in strength.
5. Code Motion: An important modification that decreases the amount of code in a loop is code motion. This transformation takes an expression that yields the same result independent of the number of times the loop is executed and places the expression before the loop.
6. Induction Variables and Reduction in Strength: While code motion is not applicable to the quicksort example, the other two transformations are; loops are usually processed inside out.
[Figure: a compiler translating a source program into a target program]
Advantages: A one-pass compiler is fast, since all the compiler code is located in memory at once. It can process the source text without the overhead of the operating system having to shut down one process and start another. In a multipass compiler, the output of each pass is stored on disk and must be read back in each time the next pass starts. A multipass compiler, however, does not impose the same restrictions upon the user, and each pass can be regarded as a mini-compiler, having as input a source written in one intermediate language and producing an output written in another intermediate language.
NFA (Non-deterministic Finite Automaton) vs DFA (Deterministic Finite Automaton):

NFA:
- A state can have more than one transition for some input symbol.
- Thompson's construction yields ε-transitions and a single final state.
- Generally has fewer states than the equivalent DFA.
- Slower recognizer.
- Converting a regular expression to an NFA is easy.
- Direct implementation (automation) is not easy.

DFA:
- A state has at most one transition for each input symbol, and no ε-transitions.
- There is no such single-final-state guarantee.
- Has more states than the NFA; in the worst case the number of states is exponential.
- Faster recognizer.
- Converting a regular expression directly to a DFA is not easy.
- Direct implementation (automation) is easy.
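The reason an NFA is the slower recognizer can be seen by simulating one: at every step a whole set of states must be tracked, including states reachable by ε-moves. This is an illustrative sketch, not from the source; the transition table, state numbers, and the example NFA for the language a*b are assumptions.

```python
def eps_closure(states, delta):
    """All states reachable from `states` via epsilon ('') moves."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, ""), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(start, finals, delta, word):
    """Simulate the NFA on `word`, carrying a set of current states."""
    cur = eps_closure({start}, delta)
    for ch in word:
        moved = set()
        for s in cur:
            moved |= delta.get((s, ch), set())
        cur = eps_closure(moved, delta)
    return bool(cur & finals)

# Hypothetical NFA for a*b: state 0 loops on 'a', moves to final state 1 on 'b'.
delta = {(0, "a"): {0}, (0, "b"): {1}}
```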
A simple but effective technique for locally improving the target code is peephole optimization: a method of improving the target program by examining a short sequence of target instructions, called the peephole, and replacing these instructions by a shorter or faster sequence whenever possible. Although we discuss peephole optimization as a technique for improving the quality of the target code, the technique can also be applied directly after intermediate code generation to improve the intermediate representation. The peephole is a small, moving window on the target program. The code in the peephole need not be contiguous, although some implementations do require this. It is characteristic of peephole optimization that each improvement may spawn opportunities for additional improvements. The following are characteristic peephole optimizations:
- Redundant-instruction elimination
- Flow-of-control optimization
- Algebraic simplification
- Use of machine idioms
Redundant Instruction Elimination: Consider the instruction sequence
MOV R0, a
MOV a, R0
The second instruction can be deleted (provided it has no label), since the value of a is already in R0 whenever it executes. While such target code would not be generated if a careful code-generation algorithm were used, it can arise from a straightforward translation.
Unreachable Code: Another opportunity for peephole optimization is the removal of unreachable instructions. An unlabeled instruction immediately following an unconditional jump may be removed. This operation can be repeated to eliminate a sequence of instructions.
Flow-of-Control Optimizations: Unnecessary jumps can be eliminated in either the intermediate code or the target code by peephole optimization.
Algebraic Simplification: There is no end to the amount of algebraic simplification that can be attempted through peephole optimization.
Reduction in Strength: Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine.
Use of Machine Idioms: The target machine may have hardware instructions that implement certain specific operations efficiently, and these can be used in the generated code.
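The redundant-load case can be sketched as a tiny peephole pass. This is an illustrative Python sketch, not from the source: instructions are assumed to be (op, dst, src) tuples, and a MOV that immediately stores back what the previous MOV loaded is dropped (correct only when the dropped instruction carries no label, as noted above).

```python
def peephole(code):
    """Remove the second of a pair MOV x,y / MOV y,x."""
    out = []
    for op, dst, src in code:
        if (out and op == "MOV" and out[-1][0] == "MOV"
                and out[-1][1] == src and out[-1][2] == dst):
            continue  # redundant: stores back a value just loaded
        out.append((op, dst, src))
    return out

prog = [("MOV", "R0", "a"),   # load a into R0
        ("MOV", "a", "R0"),   # redundant store, removed by the pass
        ("ADD", "R0", "b")]
optimized = peephole(prog)
```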
Q.15 How is the sentinel technique used in input buffering?
Ans. Sentinel Technique: In input buffering, the code that advances the forward pointer must perform tests. Each time we move the forward pointer, we must check that we have not moved off one half of the buffer; if we have, we must reload the other half. The code for advancing the forward pointer is:

if forward at end of first half then begin
    reload second half;
    forward := forward + 1
end
else if forward at end of second half then begin
    reload first half;
    move forward to beginning of first half
end
else forward := forward + 1

Except at the ends of the buffer halves, this code requires two tests for each advance of the forward pointer. We can reduce the two tests to one if we extend each buffer half to hold a sentinel character at the end. The sentinel is a special character that cannot be part of the source program. Most of the time the code then performs only one test, to see whether forward points to the sentinel. Only when we reach the end of a buffer half or the end of the file do we perform more tests, so the average number of tests per input character is very close to 1.
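The scheme can be sketched in Python. This is an illustrative sketch, not from the source: the buffer-half size and the choice of the NUL character as sentinel are assumptions. The inner loop performs exactly one test per character (is this the sentinel?); extra work happens only at a half boundary.

```python
EOF = "\0"  # sentinel: assumed never to occur in the source text

def chars(source, half=4):
    """Yield the characters of `source` via sentinel-terminated buffer halves."""
    buffers = [source[i:i + half] + EOF for i in range(0, len(source), half)]
    for buf in buffers:
        fwd = 0
        while True:
            ch = buf[fwd]
            fwd += 1
            if ch == EOF:    # the single common-path test
                break        # boundary reached: "reload" the next half
            yield ch
```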
Static notion vs dynamic counterpart:
1. Static notion: definition of a procedure. Dynamic counterpart: activations of the procedure.
Left-Recursive Grammar: Left recursion can be eliminated by rewriting the offending productions. For example, consider a non-terminal A with two productions A -> Aα | β, where α and β are sequences of terminals and non-terminals that do not start with A. The left recursion is removed by rewriting these as A -> βA' and A' -> αA' | ε.
Handle: A handle of a right-sentential form γ is a production A -> β together with a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
Closure of a Set of Items: If I is a set of items for a grammar, then closure(I) is the set of items constructed from I by the following two rules:
(a) Every item in I is in closure(I).
(b) If A -> α.Xβ is in closure(I) and X -> γ is a production, then add the item X -> .γ to closure(I).
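The rewrite A -> Aα | β into A -> βA', A' -> αA' | ε is mechanical enough to code. This is an illustrative sketch, not from the source: grammar symbols are assumed to be strings, productions lists of symbols, and the empty list stands for ε.

```python
def eliminate_left_recursion(nt, productions):
    """Rewrite immediate left recursion for non-terminal `nt`:
    A -> A alpha | beta  becomes  A -> beta A',  A' -> alpha A' | epsilon."""
    alphas = [p[1:] for p in productions if p and p[0] == nt]
    betas = [p for p in productions if not p or p[0] != nt]
    if not alphas:
        return {nt: productions}       # nothing to do
    new = nt + "'"
    return {
        nt:  [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],  # [] is epsilon
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | epsilon
g = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]])
```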
Handle: A handle of a right-sentential form γ is a production A -> β together with a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is, if S ⇒* αAw ⇒ αβw, then A -> β in the position following α is a handle of αβw. The string w to the right of the handle contains only terminal symbols.
Handle Pruning: A rightmost derivation in reverse, often called a canonical reduction sequence, is obtained by handle pruning. That is, we start with a string of terminals w which we wish to parse. If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some as yet unknown rightmost derivation.
[Figure: a call Add(2, 3), with the actual values 2 and 3 transferred to the called procedure]
Pass by value is normally implemented by an actual data transfer. The main disadvantage of the pass-by-value method, if physical moves are done, is that additional storage is required for the formal parameters, either in the called subprogram or in some area outside both the caller and the called subprogram. The storage and the move operation can be costly if the parameter is large, such as a long array.
2. Parameter Passing by Reference: In the pass-by-reference method we pass a reference to the actual parameter to the formal parameter. The actual parameter is shared with the called subprogram.
[Figure: a reference to the actual parameter being passed to the called subprogram]
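The observable difference between the two mechanisms can be simulated. This is an illustrative sketch, not from the source; since Python itself passes object references, a one-element list is used here as a stand-in for a reference parameter, while rebinding a plain parameter models pass by value.

```python
def callee_by_value(x):
    x = x + 1            # rebinding: the caller's variable is unaffected
    return x

def callee_by_reference(cell):
    cell[0] = cell[0] + 1  # mutation through the shared reference

a = 5
callee_by_value(a)       # a is copied in; the caller's a stays 5

b = [5]                  # one-element list standing in for a reference
callee_by_reference(b)   # the caller sees the update: b[0] is now 6
```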
There are several disadvantages to the pass-by-reference method. First, access to a formal parameter will most likely be slower, because an extra level of indirect addressing is needed compared with transmitting data values. Another serious problem of pass by reference is that aliases can be created. This is to be expected, because pass by reference makes access paths available to the called subprograms.
[Figure: heap storage in which blocks holding the values of X, Y and Z are interleaved with free blocks; fixed locations for the names X, Y and Z point into the heap]
Here we see pointers from fixed locations representing three names X, Y and Z. These fixed locations might be allocated statically or might be on a stack. They point to blocks of memory in the heap, and the value of each name, including a data description giving the block length, is kept in the block. The portions of the heap not currently in use are linked together in an available-space list; that is, each free block contains a pointer to the next free block and information about how long the free block is. One management method is to attach a use count to every block, telling how many pointers to the block exist; when the use count reaches zero, the block is made free. Another problem of heap management is fragmentation: when we free a block we must do something to merge it with adjacent free blocks, if any, or the available-space list will come to consist of many little blocks, none of which is sufficient to hold a large block of data. An alternative is garbage collection: when there is no more available space, each block is checked to see whether there is a path of pointers from the location associated with some name to the block. If not, the block is placed on the available-space list.
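The use-count scheme can be sketched directly. This is an illustrative Python sketch, not from the source; the Block and Heap classes and their method names are assumptions, and the free list simply records the sizes of freed blocks.

```python
class Heap:
    def __init__(self):
        self.free_list = []      # the available-space list

class Block:
    """Heap block with a use count; freed when the count drops to zero."""
    def __init__(self, heap, size):
        self.heap, self.size, self.count = heap, size, 1  # one initial pointer

    def add_ref(self):
        self.count += 1

    def release(self):
        self.count -= 1
        if self.count == 0:
            self.heap.free_list.append(self.size)  # back on the space list

heap = Heap()
b = Block(heap, 64)
b.add_ref()    # a second pointer now reaches the block
b.release()    # one pointer dropped: count is 1, block still live
b.release()    # last pointer dropped: block returns to the free list
```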
There is nothing wrong with this simple algorithm. The three-address statement is an abstract form of intermediate code. In an actual compiler these statements can be implemented in any of the following ways: (1) quadruples, (2) triples, (3) indirect triples; we then compare the representations, including a single-array representation.
Quadruples: A quadruple is a record structure with four fields, which we shall call OP, ARG1, ARG2 and RESULT. This representation of three-address statements is known as quadruples. Here OP contains an internal code for the operator. For example, A := B op C puts B in ARG1, C in ARG2 and A in RESULT. For the statement A := -B * (C + D):
T1 := -B
T2 := C + D
T3 := T1 * T2
A := T3

      OP      ARG1  ARG2  RESULT
(0)   uminus  B           T1
(1)   +       C     D     T2
(2)   *       T1    T2    T3
(3)   :=      T3          A
Triples: To avoid entering temporary names into the symbol table, we can let the statement that computes a temporary value stand for that value. In this representation, ARG1 and ARG2, the arguments of OP, are either pointers into the symbol table or pointers into the structure itself. Since three fields are used, this intermediate-code format is known as triples.

      OP      ARG1  ARG2
(0)   uminus  B
(1)   +       C     D
(2)   *       (0)   (1)
(3)   :=      A     (2)
Indirect Triples: Another implementation of three-address code is to list pointers to triples rather than listing the triples themselves. This implementation is naturally called indirect triples.

STATEMENT
(0)   (14)
(1)   (15)
(2)   (16)
(3)   (17)

      OP      ARG1  ARG2
(14)  uminus  B
(15)  +       C     D
(16)  *       (14)  (15)
(17)  :=      A     (16)
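The quadruple representation is easy to build programmatically. This is an illustrative Python sketch, not from the source, generating the quadruples for the running example A := -B * (C + D); quadruples are tuples (OP, ARG1, ARG2, RESULT) and temporaries get generated names.

```python
def gen_quads():
    """Quadruples for A := -B * (C + D)."""
    quads, n = [], 0

    def newtemp():
        nonlocal n
        n += 1
        return f"t{n}"

    t1 = newtemp(); quads.append(("uminus", "B", None, t1))
    t2 = newtemp(); quads.append(("+", "C", "D", t2))
    t3 = newtemp(); quads.append(("*", t1, t2, t3))
    quads.append((":=", t3, None, "A"))
    return quads

quads = gen_quads()
```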
Q.23 Comparison of Representations: The use of Indirection.
Ans. The difference between triples and quadruples is a matter of how much indirection is present in the representation. In the quadruple notation, the location of each temporary can be accessed immediately via the symbol table from wherever it is needed, not only at the three-address statements defining or using the temporary. An important benefit of quadruples appears in an optimizing compiler, where we often move statements around: a statement that defines a temporary value can be moved without disturbing the rest. With triples, by contrast, moving such a statement requires us to change all pointers to that statement in the ARG1 and ARG2 arrays. This problem makes triples difficult to use in an optimizing compiler.
Single-Array Representation: Both triples and quadruples waste some space, since fields will occasionally be empty. If space is important, one can use a single array and store either triples or quadruples consecutively. Since the operator determines which fields are actually in use, we can decode the single array if we follow each operator by the values of ARG1, ARG2 and RESULT. The disadvantage of this representation is seen if we try to examine the statements in reverse order, since we cannot tell just by looking at a word whether it represents an operator or an operand.
Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. In other words, a group of characters forming a meaningful sequence is called a lexeme.
New names are added to the list in the order in which they are encountered. To retrieve information about a name, we search from the beginning of the array up to the position marked by the pointer AVAILABLE, which indicates the beginning of the empty portion of the array. When the name is located, the associated information can be found in the words following it. If we reach AVAILABLE without finding the name, we have detected a fault: the use of an undeclared name. If the symbol table contains n names, the work necessary to insert a new name is proportional to n. To find the data about a name we shall on average search n/2 names, so the cost of an inquiry is also proportional to n. To insert n names and make m inquiries, the total work is c·n(n + m), where c is a constant representing the time needed for a few machine operations. In a medium-sized program we might have n = 100 and m = 1000, so several hundred thousand machine operations are used in bookkeeping.
Advantage: One advantage of the list organization is that the minimum possible space is taken. In a simple compiler, the space taken by the symbol table may consume most of the space used for the compiler's data. If space is at a premium, it may well pay to use the inefficient list organization for the symbol table.
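The linear-list organization can be sketched directly. This is an illustrative Python sketch, not from the source; parallel lists play the role of the array, and the list length plays the role of the AVAILABLE pointer. Insertion is constant time, lookup scans up to n entries.

```python
class ListSymtab:
    """Linear-list symbol table: O(1) insert, O(n) lookup."""
    def __init__(self):
        self.names, self.data = [], []   # len(self.names) acts as AVAILABLE

    def insert(self, name, info):
        self.names.append(name)
        self.data.append(info)

    def lookup(self, name):
        for i, n in enumerate(self.names):   # scan up to AVAILABLE
            if n == name:
                return self.data[i]
        return None                          # undeclared name

st = ListSymtab()
st.insert("x", {"type": "int"})
```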
Q.29 Classify the symbol table and explain any one of them with example.
Ans. For the symbol-table mechanism we would like a scheme that allows us both to add new entries and to find existing entries efficiently. Three symbol-table mechanisms are discussed: 1. Lists, 2. Search trees, 3. Hash tables.
Search Trees: A more efficient approach to symbol-table organization is to add two link fields, LEFT and RIGHT, to each record, and to use these fields to link the records into a binary search tree. This tree has the property that every name NAME(j) accessible from NAME(i) by following the link LEFT(i) and then following any sequence of links precedes NAME(i) in alphabetical order, and symmetrically for RIGHT(i). The algorithm to look for NAME in a binary search tree, where p is initially a pointer to the root, is:

while p ≠ null do
    if NAME = NAME(p) then /* NAME found; take appropriate action */
    else if NAME < NAME(p) then p := LEFT(p)
    else /* NAME(p) < NAME */ p := RIGHT(p)

If names are encountered in a random order, the average length of a path in the tree will be proportional to log n, where n is the number of names. Since each search follows one path from the root, the expected time needed to enter n names and make m inquiries is proportional to (n + m) log n. If n is greater than about 50, there are clear advantages of the binary search tree over the linear list, and probably over the self-organizing list. If efficiency is paramount, however, there is an even better method than the binary search tree: the hash table.
Hash Table: This scheme gives us the capability of performing m accesses on n names in time proportional to n(n + m)/k, for any constant k of our choosing. Since k can be made as large as we like, this method is generally superior to linear lists or search trees, and it is the method of choice for symbol tables in most situations, especially if storage is not particularly costly. The basic hashing scheme uses two tables, a hash table and a storage table. The hash table consists of k words, numbered 0, 1, ..., k - 1.
These words are pointers into the storage table, to the heads of k separate linked lists.
Each record in the symbol table appears on exactly one of these lists. To determine whether NAME is in the symbol table, we apply to NAME a hash function h such that h(NAME) is an integer between 0 and k - 1. It is on the list numbered h(NAME) that the record for NAME belongs. The average list is n/k records long if there are n names in the table, provided h distributes names uniformly among the k lists; h should also be easy to compute for names consisting of strings of characters.
[Figure: a hash table of k entries pointing to chains of records (NAME, DATA, LINK) in the storage table; AVAILABLE marks the start of the free portion of the storage table]
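The chained-hash scheme can be sketched in a few lines. This is an illustrative Python sketch, not from the source; the bucket count k and the character-sum hash function are assumptions chosen for simplicity, with each bucket playing the role of one linked list in the storage table.

```python
class HashSymtab:
    """k-bucket chained hash table for symbol records."""
    def __init__(self, k=211):
        self.k = k
        self.buckets = [[] for _ in range(k)]   # k separate chains

    def _h(self, name):
        return sum(map(ord, name)) % self.k     # simple illustrative hash

    def insert(self, name, info):
        self.buckets[self._h(name)].append((name, info))

    def lookup(self, name):
        for n, info in self.buckets[self._h(name)]:  # scan one chain only
            if n == name:
                return info
        return None

st = HashSymtab()
st.insert("count", {"type": "int"})
```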
Q.31 Define the following.
Ans. Strength Reduction: Reduction in strength replaces expensive operations by equivalent cheaper ones on the target machine. Certain machine instructions are considerably cheaper than others and can often be used as special cases of more expensive operators. For example, x² is invariably cheaper to implement as x*x than as a call to an exponentiation routine. Fixed-point multiplication or division by a power of two is cheaper to implement as a shift. Floating-point division by a constant can be implemented as multiplication by a constant, which may be cheaper.
Code Motion: The running time of a program may be improved if we decrease the length of one of its loops, especially an inner loop, even if we increase the amount of code outside the loop. For example, in
while CHAR = ' ' do CHAR := GETCHAR()
GETCHAR() is assumed to return the next character of an input file. In many situations it might be quite normal for the condition CHAR = ' ' to be false the first time around, in which case the body CHAR := GETCHAR() is executed zero times. An important source of modification of this type is code motion, where we take a computation that yields the same result independent of the number of times through the loop and move it outside the loop.
Loop Jamming: A related idea, called loop jamming, is to merge the bodies of two loops. It is necessary that each loop be executed the same number of times.
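Strength reduction on an induction variable can be shown concretely. This is an illustrative Python sketch, not from the source: the function, its parameters, and the access pattern a[i * step] are assumptions. The multiplication inside the loop is replaced by a running addition, which is the reduction in strength.

```python
def sum_strided(a, limit, step):
    """Sum a[0], a[step], a[2*step], ... without multiplying in the loop.
    The induction variable `addr` tracks what i * step would compute."""
    total, addr = 0, 0
    for i in range(limit):
        total += a[addr]    # instead of a[i * step]
        addr += step        # cheap addition replaces the multiplication
    return total

data = list(range(20))
# picks data[0], data[4], data[8], data[12], data[16]
result = sum_strided(data, 5, 4)
```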