
SVPMs College of Engineering DEPARTMENT OF COMPUTER ENGINEERING

Class :- BE Comp(2013-14) Subject :- Computer Laboratory-l (410446) Staff :- Kumbhar Hemant Ramdas

PRACTICAL ASSIGNMENT NO:-2

Aim: To implement a lexical analyzer for a subset of C using LEX, with error handling.

Concepts:

How does a lexical analyzer for a subset of C work?

The lexical analyzer reads a C source program and produces tokens, which are the basic lexical units of the language. For example, the expression *ptr = 56; contains 10 characters or five tokens: *, ptr, =, 56, and ;. For each token, the lexical analyzer returns its token code and zero or more associated values. The token codes for single-character tokens, such as operators and separators, are the characters themselves. Defined constants (with values that do not collide with the numeric values of significant characters) are used for the codes of tokens that can consist of one or more characters, such as identifiers and constants. For example, the statement *ptr = 56; yields the token stream shown in the second column of Table 2.1 below; the associated values, if there are any, are shown in the third column of Table 2.1.

Token              Token Code   Lexeme Stored In
Operator           '*'
Identifier         ID           "ptr"; symbol-table entry for "ptr"
Operator           '='
Constant           VAL          "56"; literal-table entry for 56
Punctuation Mark   ';'

Table 2.1: Lexical analysis of the statement *ptr = 56;
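The token-code convention described above can be sketched as a small hand-written scanner in C. This is only an illustration, not the LEX solution the assignment asks for: the numeric values chosen for ID and VAL, the END code, the MAX_LEXEME limit, and the function name next_token are all assumptions made for this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Multi-character token codes start above 255 so that they cannot collide
   with the codes of single-character tokens (values are hypothetical). */
enum { END = 0, ID = 256, VAL = 257 };

#define MAX_LEXEME 32

/* Reads one token from *src, advances *src past it, copies the lexeme into
   lexeme[], and returns the token code. Simplification: a lexeme longer than
   MAX_LEXEME - 1 characters is split into several tokens. */
int next_token(const char **src, char lexeme[MAX_LEXEME])
{
    assert(src != NULL && *src != NULL);
    const char *p = *src;
    size_t n = 0;
    int code;

    while (isspace((unsigned char)*p))      /* skip white space */
        p++;

    if (*p == '\0') {                       /* end of input */
        lexeme[0] = '\0';
        *src = p;
        return END;
    }

    if (isalpha((unsigned char)*p) || *p == '_') {
        /* identifier: letter or '_' followed by letters, digits, '_' */
        while ((isalnum((unsigned char)*p) || *p == '_') && n < MAX_LEXEME - 1)
            lexeme[n++] = *p++;
        code = ID;
    } else if (isdigit((unsigned char)*p)) {
        /* integer constant: a run of digits */
        while (isdigit((unsigned char)*p) && n < MAX_LEXEME - 1)
            lexeme[n++] = *p++;
        code = VAL;
    } else {
        /* single-character token: the code is the character itself */
        code = (unsigned char)*p;
        lexeme[n++] = *p++;
    }

    lexeme[n] = '\0';
    *src = p;
    return code;
}
```

Calling next_token repeatedly on the string "*ptr = 56;" returns the codes '*', ID, '=', VAL and ';' in that order, followed by END, with the lexemes "ptr" and "56" left in the buffer for the identifier and the constant.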



The token codes for the operators * and = are the operators themselves, i.e., the numeric values of * and =, respectively, and they have no associated values. The token code for the identifier ptr is the value of the defined constant ID, and the associated values are a saved copy of the identifier string itself and a symbol-table entry for the identifier, if there is one. Likewise, the integer constant 56 returns VAL, and the associated values are the string "56" and a literal-table entry for the integer constant 56. Keywords, such as for, are assigned their own token codes, which distinguish them from identifiers.
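One possible way to separate keywords from identifiers, and to create a symbol-table entry for each new identifier, is sketched below in C. The keyword codes KW_FOR, KW_IF and KW_WHILE, the table sizes, and the function names install and classify are hypothetical names chosen for this sketch; a real symbol table would also record the type and storage position.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token codes; keywords get codes of their own. */
enum { ID = 256, VAL, KW_FOR, KW_IF, KW_WHILE };

struct keyword { const char *name; int code; };

static const struct keyword keywords[] = {
    { "for", KW_FOR }, { "if", KW_IF }, { "while", KW_WHILE },
};

#define MAX_SYMBOLS 64
#define MAX_NAME    32

struct symbol { char name[MAX_NAME]; };

static struct symbol symtab[MAX_SYMBOLS];
static int nsymbols = 0;

/* Returns the index of name in the symbol table, adding it if absent. */
int install(const char *name)
{
    for (int i = 0; i < nsymbols; i++)
        if (strcmp(symtab[i].name, name) == 0)
            return i;                       /* already present */

    assert(nsymbols < MAX_SYMBOLS);         /* sketch: fixed-size table */
    assert(strlen(name) < MAX_NAME);
    strcpy(symtab[nsymbols].name, name);
    return nsymbols++;
}

/* Classifies an identifier-shaped lexeme. A keyword returns its own code and
   sets *entry to -1; anything else returns ID and sets *entry to the index
   of its symbol-table entry. */
int classify(const char *lexeme, int *entry)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(keywords[i].name, lexeme) == 0) {
            *entry = -1;
            return keywords[i].code;
        }
    }
    *entry = install(lexeme);
    return ID;
}
```

With this sketch, classify("for", &e) returns KW_FOR, while classify("ptr", &e) returns ID and stores the index of the entry for "ptr" in e. A second call with "ptr" returns the same index instead of creating a duplicate entry.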

Lexical Specification for C Language
1. Keywords
2. Identifiers
3. Operators: arithmetic, comparison, logical
4. Constants: integer, floating, string
5. Punctuation Marks
6. Comments

Symbol and Literal Tables

These tables are data structures used by the compiler to hold information about source program constructs. This information is incrementally collected by the analysis phase of a compiler and used by the synthesis phase to generate target code.

The symbol table contains information such as the name of a symbol, its type, and its position in storage. Likewise, the literal table contains information such as the literal and its type.

Error handling

It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error. For instance, if the string fi is encountered for the first time in a C program in the context

fi ( a == f(x) ) ...

a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler (probably the parser in this case) handle the error due to transposition of the letters.

However, suppose a situation arises in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input. The simplest recovery strategy is "panic mode" recovery. We delete successive characters from the remaining input, until the lexical analyzer can find a well-formed token at the beginning of what input is left. This recovery technique may confuse the parser, but in an interactive computing environment it may be quite adequate.
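Panic-mode recovery can be sketched in C as a loop that discards characters until one that can begin a token is found. The set of characters accepted by can_start_token is an assumption made for this sketch (letters, digits, underscore, white space, and a handful of operator and punctuation characters); the function names are hypothetical as well. In a LEX specification the same effect is commonly obtained with a final catch-all rule that reports and discards the offending character.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed token-start set for this sketch. End of input also stops the
   skip, so the scanner never runs past the terminating '\0'. */
static int can_start_token(char c)
{
    if (c == '\0')
        return 1;
    return isalnum((unsigned char)c) || isspace((unsigned char)c) ||
           c == '_' || strchr("+-*/%=<>!&|(){}[];,\"", c) != NULL;
}

/* Panic-mode recovery: deletes successive characters from the remaining
   input until a character that can begin a token (or the end of input)
   is reached. Returns the number of characters deleted. */
int panic_skip(const char **src)
{
    assert(src != NULL && *src != NULL);
    int deleted = 0;

    while (!can_start_token(**src)) {
        (*src)++;
        deleted++;
    }
    return deleted;
}
```

For the input "@#$x = 1;" the sketch deletes the three characters @, # and $ and leaves the input positioned at x, where normal scanning can resume. An error message reporting the deleted characters would normally be printed at this point.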


Other possible error-recovery actions are:
1. Delete one character from the remaining input.
2. Insert a missing character into the remaining input.
3. Replace a character by another character.
4. Transpose two adjacent characters.

Transformations like these may be tried in an attempt to repair the input. The simplest such strategy is to see whether a prefix of the remaining input can be transformed into a valid lexeme by a single transformation. This strategy makes sense, since in practice most lexical errors involve a single character. A more general correction strategy is to find the smallest number of transformations needed to convert the source program into one that consists only of valid lexemes, but this approach is considered too expensive in practice to be worth the effort.
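The single-transformation test can be illustrated with a C sketch that checks whether one string can be turned into another by exactly one delete, insert, replace, or transpose. As a simplification, the sketch compares a complete lexeme against a short, assumed keyword list rather than a prefix of the remaining input against all valid lexemes; the names one_edit_away and repair_keyword are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if s can be turned into t by exactly one of the four
   transformations: delete one character, insert one character, replace one
   character, or transpose two adjacent characters. Returns 0 otherwise,
   including when s and t are identical. */
int one_edit_away(const char *s, const char *t)
{
    assert(s != NULL && t != NULL);
    size_t ls = strlen(s), lt = strlen(t);
    size_t i = 0;

    while (s[i] != '\0' && s[i] == t[i])    /* length of common prefix */
        i++;

    if (ls == lt) {
        if (i == ls)
            return 0;                       /* identical strings */
        if (strcmp(s + i + 1, t + i + 1) == 0)
            return 1;                       /* replace s[i] by t[i] */
        if (i + 1 < ls && s[i] == t[i + 1] && s[i + 1] == t[i] &&
            strcmp(s + i + 2, t + i + 2) == 0)
            return 1;                       /* transpose s[i] and s[i+1] */
        return 0;
    }
    if (ls == lt + 1)                       /* delete s[i] */
        return strcmp(s + i + 1, t + i) == 0;
    if (ls + 1 == lt)                       /* insert t[i] before s[i] */
        return strcmp(s + i, t + i + 1) == 0;
    return 0;
}

/* Returns the first keyword (from an assumed list) that lexeme is one
   transformation away from, or NULL if there is none. */
const char *repair_keyword(const char *lexeme)
{
    static const char *const keywords[] = { "if", "for", "while", "int" };

    for (size_t k = 0; k < sizeof keywords / sizeof keywords[0]; k++)
        if (one_edit_away(lexeme, keywords[k]))
            return keywords[k];
    return NULL;
}
```

For example, repair_keyword("whlie") returns "while" (one transposition), and repair_keyword("ptr") returns NULL. Such a check can only suggest a repair: as noted earlier for fi, the lexical analyzer alone cannot know whether the lexeme is a misspelled keyword or a legitimate identifier.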

Programs:

Program 1. Implement a lexical analyzer for a subset of C. The program must include symbol table and literal table creation and an error handling mechanism (use panic mode recovery).

Conclusion:

We have learned which kinds of tokens occur in C and how to write a regular expression for each type of token. Using the LEX tool and regular expressions for tokens, we learned how to separate an input source program into tokens. We also learned symbol table and literal table management, and error handling and recovery.